One model to learn them all

(blog.acolyer.org)

289 points | by mpweiher 2289 days ago

14 comments

  • maxpupmax 2288 days ago
    Wow. Can someone pull a Hacker News and explain to me why I'm allowed to be super pessimistic about this result? I want to believe.
    • nicolashahn 2288 days ago
      To me it looks like they just took a bunch of specialized NN classifiers and glued them together. Not to belittle their work; this is still impressive and an important step towards generalized machine intelligence, but we're still a very long way off.

      The next level above this would be to give it some input without telling the classifier what to do with it; it decides which task it's supposed to do on its own, and then executes.

      • sdenton4 2288 days ago
        That actually seems like a rather higher bar than /we/ have to deal with. Image, auditory, touch and taste data come in on distinct signal paths before going to higher-level feature processing.
        • larister 2288 days ago
          True in normal conditions, but our neocortex is highly adaptable. I can’t remember the studies, but there have been cases where a grid of actuators was taped to the body of a blind man and hooked up to a camera mounted on his head. In time, he was able to “see” edges based on the signals coming from his skin.
          • mod 2288 days ago
            That's amazing.

            I guess the orientation probably didn't even matter. So you could likely wear something like that seamlessly on a patch of skin that stayed out of your way (on your back, perhaps).

      • ratsimihah 2288 days ago
        Wouldn't you "just" need to train a classifier on top to select the best-fitting model? Or use ensemble learning? I dunno. My point is that even without being told the task, it wouldn't be much more generalized.
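
        Something like that, roughly. As a minimal sketch (assuming PyTorch; the gating net and specialist models here are hypothetical stand-ins):

          import torch
          import torch.nn as nn

          # A small gating classifier routes each input to whichever
          # pre-trained specialist it scores highest.
          class ModelSelector(nn.Module):
              def __init__(self, d_in, specialists):
                  super().__init__()
                  self.specialists = nn.ModuleList(specialists)
                  self.gate = nn.Linear(d_in, len(specialists))  # one logit per specialist

              def forward(self, x):                  # x: (batch, d_in)
                  choice = self.gate(x).argmax(-1)   # chosen specialist per input
                  return torch.stack([self.specialists[c](xi)
                                      for c, xi in zip(choice, x)])

        The catch is that the gate itself still has to be trained on labeled (input, task) pairs, so it's not obviously more "general" than what they have.
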
        • nicolashahn 2288 days ago
          Yeah, that would just be one step out of however many dozens/hundreds/thousands more to go before AGI.
    • andreyk 2288 days ago
      No need to be super pessimistic, but there is reason to temper your excitement - this is quite preliminary research (few results indicate much benefit to this yet - "But the results show that even on the ImageNet task, the presence of such blocks does not detract from performance, and may even slightly improve it."), and it is still entirely supervised. Something like this that could learn in a semi-supervised fashion from images and text would really seem revolutionary.
      • sdenton4 2288 days ago
        There's a pretty strong literature around retraining just the top couple of layers of a deep net to target a slightly different objective.

        The interesting possibility opened here is training new feature-processing frontends to work with an established 'conceptual' backend.
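
        E.g. the standard recipe, sketched here with PyTorch/torchvision (the 10-class head is an arbitrary example):

          import torch.nn as nn
          from torchvision import models

          # Freeze a pretrained feature extractor and retrain only a
          # new top layer against the new objective.
          net = models.resnet50(pretrained=True)
          for p in net.parameters():
              p.requires_grad = False                 # keep the learned features
          net.fc = nn.Linear(net.fc.in_features, 10)  # fresh head for a 10-class task
          # ...train as usual; only net.fc receives gradient updates.

        Training new frontends against a frozen 'conceptual' backend would be the same trick pointed the other way.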

      • tripzilch 2287 days ago
        To be fair, their ImageNet results were 86% versus 95% for the state of the art. In ML, those last few percentage points become exceedingly harder to win the closer you get to 100%.
      • innagadadavida 2288 days ago
        If AI really, really worked, Google would've called the team "Google Brainless".

        Hang on tight, we are getting there.

    • jeyoor 2288 days ago
      At first glance, these results appear fairly impressive as far as deep learning transfer studies go.

      However, standard caveats about the limits of those approaches should still apply: e.g. https://spectrum.ieee.org/cars-that-think/transportation/sen...

      In other words, I don't think MultiModel will be immune to fairly straightforward adversarial image attacks (although you might need a different adversarial network to generate them).
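
      (For the curious: the classic fast-gradient-sign attack doesn't even need a second network; a minimal sketch, assuming a differentiable PyTorch classifier:)

        import torch
        import torch.nn.functional as F

        def fgsm(model, x, y, eps=0.03):
            # Nudge the input a small step in the direction that
            # maximally increases the classification loss.
            x = x.clone().detach().requires_grad_(True)
            F.cross_entropy(model(x), y).backward()
            return (x + eps * x.grad.sign()).clamp(0, 1).detach()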

      Furthermore, the problems being addressed by MultiModel (image recognition, natural language processing, machine translation) are problems that already have fairly robust deep learning results.

      I'd be more interested if MultiModel showed significantly better results on problem areas that are currently difficult for standard deep learning approaches.

    • ACow_Adonis 2288 days ago
      Well, first, the caveats: there are indeed quite useful and interesting results that could come from this approach, including discovering unexpected connections/underlying patterns between things humanly/socially viewed as "separate domains", and mixing/ensemble techniques, while not particularly new, can lead to real advances in both performance and insights. With that said, I'm guessing what you mean by "super pessimistic" is being asked to throw water on the notion that there is literally "one model to learn them all".

      At a high level, I will try to do so with three concepts and an analogy.

      First concept: opportunity cost.
      Second concept: objective definition.
      Third concept: qualitative similarity/monism.

      The analogy I use is physical fitness. If someone tried to tell you there was "one fitness regime to rule them all", you would, hopefully, step back and say something like:

      "Well hang on a sec...what even is fitness?...and even once we MIGHT agree that there is some "thing" we're both calling fitness, what if the "thing" we agree on is fundamentally composed of qualitatively different components or atomistic concepts...and if that is the case, is it really conceivable that there is no opportunity cost between maximizing all qualitatively different concepts? How would we even agree on how to compare them?"

      Put into English, I think most relatively advanced thinkers understand this about fitness. There is no "one greatest athlete" and there is no "one ultimate training regime". There is no objective way to rank or compare a tennis player to a linebacker to a golfer to a sprinter to a marathon runner. Additionally, the regimes and body types that make you good at some fundamentally make you worse at others. It might even be worse than that... there might not even be a way to compare or rank athletes transitively WITHIN domains?! Ay caramba!

      To bring it back to data science, what we're being asked to believe by things like "general AI" or "one model to rule them all" is that the problem domain has these kinds of properties:

      1. Composed of things which do not have a fundamental opportunity cost between them. If they do, you cannot have one model to rule them all; you must choose trade-offs.

      2. Can be "objectively" agreed upon in some way: OK, you've come to an agreement on what your trade-offs will be, and you will maximise that instead, but was there anything objective in that decision? In the linked example, the author uses training on the concept of "banana", but maybe there really is no universal concept of "banana", because it is subjectively experienced by every conscious being. Is it right to link the concept of banana to yellow, sweet, sour, disgusting, desirable? Which really just leads into...

      3. That the domain REALLY IS composed of a singular "same type of underlying thing". If the underlying thing in our domain is fundamentally composed of qualitatively different things, conglomerating and comparing them can only be achieved through subjectivity and subjective agreement. There is literally no objective answer to be found. You might find practical similarities or averages or something like that, but there is no fundamental common ground that will make everything happy and work out.

      Now, to be sure, in a practical sense you can usually limit your domain, limit your problems, and limit your social circle sufficiently to get close to one "optimal model" that you all agree on in one limited context, but usually this is a case of extreme finiteness of problem scope and extreme finiteness of social and subjective context.

      Once we expand to anything even remotely close to "all models" or even "most things humans care vaguely about", the whole thing breaks down.

      Personally, not only do I think all these things are not composed of non-opportunity-cost-incurring, objectively defined, qualitatively similar domains, I think all evidence points explicitly to the opposite.

      This of course does not mean that generalising models are not valuable or practically interesting, but you no more have to worry about general AI or "one model to rule them all" than you have to worry about one fitness regime that will make you the best at every sport.

      Of course... you MIGHT have to worry about the social context, that is to say, one idea of sport becoming so pre-eminent socially that it is what everyone thinks of when fitness and sport are mentioned. If you're not into tiddly-winks when it takes over the world, you might be in for a world of social pain...

  • reilly3000 2288 days ago
    I think this nips at the core. The reality we live in is an unlimited stream of inputs. Their notion of attention is important. Modeling the weight of inputs seems to be an interactive computation of bidding on the value of past and future across infinite dimensions. By encoding them as a single input stream they are nudging at what a brain does well when it works right: contextualization.
  • dchichkov 2288 days ago
    I see a comparison with the state of the art on single problems (and it is far from state of the art). I don't see a comparison on the proposed multimodal dataset with a baseline, for example T2T or ByteNet. Do I just have to believe that this is a good model?

    Aside from that, it's nice to see that someone actually makes the effort and deals with the logistics of working with multimodal data. Usually researchers stop at a maximum of three datasets and call it multimodal; it is great to see eight!

  • tw1010 2288 days ago
    One model to overfit it all
  • YeGoblynQueenne 2288 days ago
    For me, as a non-practitioner (I'm happy with my GOFAI, thank you), the big problem with neural nets is that there are too many architectures, each tailored to a specific problem. On the one hand it's great that there's a broad toolset; on the other hand there are so many competing claims about best-in-class performance that it's hard to know what is even the state of the art. There is too much noise, you know?

    So it'd be nice to see a result that reduced the noise a bit. I'm afraid this one doesn't fit the bill. It's not so much reducing architectural options, as piling even more architecture on top of the already sprawling mass of architecture. It's got architecture hanging from its architecture!

    I mean, come on: one component is a gated mixture-of-experts, which is to say, a collective of feed-forward nets. There are just so many layers upon layers of choices to make for each type of network to use for each component of the entire model. How do you make these choices?
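
    (For reference, the gated mixture-of-experts idea itself is compact; a minimal sketch in PyTorch, with sizes made up:)

      import torch
      import torch.nn as nn

      class GatedMoE(nn.Module):
          # Each expert is a feed-forward net; a gating net produces a
          # softmax over experts, and the output is the gate-weighted sum.
          def __init__(self, d_in, d_out, n_experts=4):
              super().__init__()
              self.experts = nn.ModuleList(
                  nn.Linear(d_in, d_out) for _ in range(n_experts))
              self.gate = nn.Linear(d_in, n_experts)

          def forward(self, x):                           # x: (batch, d_in)
              w = torch.softmax(self.gate(x), dim=-1)     # (batch, n_experts)
              outs = torch.stack([e(x) for e in self.experts], dim=-1)
              return (outs * w.unsqueeze(1)).sum(-1)      # (batch, d_out)

    The choices complained about above (how many experts? what gate? what kind of experts?) are exactly the knobs this leaves open.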

    Or, think about how long this specific architecture is going to give state-of-the-art results (as claimed). It uses normalised ReLU convolution blocks - the height of fashion at the moment - but what happens four years from now, when nobody uses those anymore because they're so 2010s?

    ANN research frustrates me like this. It's describing an art form, but I'm not sure that's really the most useful thing to do.

    • npatrick04 2288 days ago
      I get your frustration with architecture-of-architecture. However, this kind of AOA has been going on for a long time in the similar domain of control theory.

      Model Predictive Control is a method of applying a model of a process to estimate its state. Per Wikipedia, it's been used since the 80s. When you apply it to tracking something you aren't in control of, say an enemy plane, you need to use multiple models and then pick the most probable solution.

      It's not elegant, but it's not actually a bad way to get good results.
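
      A toy version of the multiple-model trick, as a sketch (numpy; the motion models and noise figure are made up):

        import numpy as np

        def most_probable_model(models, state, measurement, noise_var=1.0):
            # Propagate each candidate motion model, score it against
            # the measurement, and keep the most probable one.
            scores = []
            for predict in models:
                residual = measurement - predict(state)
                scores.append(np.exp(-residual @ residual / (2 * noise_var)))
            return int(np.argmax(scores))

        constant_velocity = lambda s: s[:2] + s[2:]   # position advances by velocity
        constant_position = lambda s: s[:2]           # target holding still
        state = np.array([0.0, 0.0, 1.0, 1.0])        # (x, y, vx, vy)
        print(most_probable_model([constant_velocity, constant_position],
                                  state, np.array([0.9, 1.1])))  # -> 0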

    • michaelmior 2288 days ago
      I understand the frustration. I'm just getting into ANNs myself and the options are overwhelming. Trying to keep up with the state of the art is definitely a challenge. However, if you have a model that gives you the accuracy you need for whatever problem you're trying to solve, why not just stick with it? One of the nice things about using neural nets these days is that there's a lot of great software and infrastructure available for training and serving models.
    • ghthor 2288 days ago
      You should look at the models being produced by Jeff Hawkins and Numenta. Their neuron model is based on the neurons of our neocortex, which are homogeneous. They've had great success using their ANN across different domains with little to no differences in settings. The magic really is in encoding the information stream into the Sparse Distributed Representation that best represents the features you're looking for.
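
      A toy flavour of that encoding step, as a sketch (a bare-bones scalar encoder; Numenta's real encoders are considerably more sophisticated):

        import numpy as np

        def encode_scalar(value, lo, hi, n_bits=128, n_active=8):
            # Map a scalar onto a sparse binary vector by activating a
            # contiguous block of bits; nearby values share active bits,
            # which is what gives SDRs their semantic overlap.
            sdr = np.zeros(n_bits, dtype=np.uint8)
            frac = (np.clip(value, lo, hi) - lo) / (hi - lo)
            start = int(frac * (n_bits - n_active))
            sdr[start:start + n_active] = 1
            return sdr

        print(encode_scalar(37.0, 0.0, 100.0).nonzero()[0])  # 8 adjacent active bits
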
  • aportnoy 2288 days ago
    One model to rule them all, one model to find them, One model to bring them all, and in the darkness bind them.
  • erikpukinskis 2288 days ago
    Legit question: if human intelligence is fundamentally limited by scale, why aren’t there some humans running around with heads twice as big and twice the neurons? If it’s such an advantage why hasn’t nature selected for it in at least one place on Earth?

    (My answer: above the scale of the human brain there are diminishing returns on more neurons. Two individuals with their own volition are smarter than one individual with double the brain size. We'll do a lot of research in AI to find that there aren't any machine learning problems that can't be run on a computer about the size and speed of a brain, or distributed among some number of independently operating brain-sized computers, and that the problem never was scale but just where to focus.)

    • joshmarlow 2288 days ago
      Here's my understanding/interpretation - if I get any details wrong, I hope someone will correct me!

      Brains are very expensive and, for our body size, we have very large brains; evolution had to build a brain that could operate within the calorie constraints imposed by our ecological niche - presumably a hunter-gatherer niche.

      If you graph adult body volume relative to gestation time across the placental mammals, there's a good correlation, but humans are an anomaly; for our body volume, gestation should take more than 9 months. So we are kind of born prematurely - the reason? So our heads can fit through the birth canal. Significantly bigger brains would likely have entailed altering our walking gait, thus potentially sacrificing our ecological niche (and guaranteed source of those much-needed calories!). Most of this was taken from Pinker ([0]).

      Bostrom ([1]) brings up another interesting point - anything smarter than us would probably have taken longer to evolve than we did. So it might not be inaccurate to think of ourselves as the dumbest species that could build an interplanetary civilization.

      My suspicion is that things much smarter than us are very possible; my reasoning is that it just seems so strange to me that the peak of feasible intelligence just so happens to correspond with what can be fed by a hunter-gatherer diet and fit through a birth canal that fits within the constraints of an upright primate.

      It seems more likely to me that we're a local optimum - we're probably close to the optimal design for something as smart as us, in our ecological niche.

      [0] - https://en.wikipedia.org/wiki/How_the_Mind_Works
      [1] - https://en.wikipedia.org/wiki/Superintelligence:_Paths,_Dang...

      • tormeh 2288 days ago
        Very interesting! I happen to know of some other problems: heat and cross-brain communication time.

        Bird brains are much smarter than ours per unit of weight and volume. Not a little bit, but a lot. I think part of the reason they can do that is that they have much better cooling because their heads are small. Having a bigger brain than we do means that it would need to have a lower activity level because of cooling constraints.

        There's also diminishing returns from size because the interconnects from one point in the brain to another have longer latency. One response to this is to fold the brain, as we do. Regardless, at some point adding more volume stops making sense.

        • joshmarlow 2286 days ago
          I read once that bird brains (or some part of them) are more modular and consist of densely connected components - any idea/references about that?

          Also, this blog-post (and associated links) may interest you - http://www.rifters.com/crawl/?p=6116

      • erikpukinskis 2287 days ago
        The question isn't whether something could exist that's smarter than every human.

        The question is whether something could exist that's smarter than every group of humans.

        Humans are social, so the social group, and in fact groups of social groups, are the highest form of intelligence on the planet. That's the bar to beat, not an individual Homo sapiens in an observation room.

      • YeGoblynQueenne 2288 days ago
        >> My suspicion is that things much smarter than us are very possible; my reasoning is that it just seems so strange to me that the peak of feasible intelligence just so happens to correspond with what can be fed by a hunter-gatherer diet and fit through a birth canal that fits within the constraints of an upright primate.

        I think you're underestimating the intelligence required to survive as a hunter-gatherer, individual or species, in a world that keeps wanting to kill you, where the number of things that want to make it easy for you to eat them is exactly 0 (not counting fruit, which is made to be eaten by birds, not humans) and the number of things that want to eat you is... very big.

        If you think about it, the amount of brainpower needed to survive as a primitive human in the savannah is exactly as much as was required to develop modern technology. We got where we are today because our ancestors were smart enough to not be eaten by lions etc.

        Of course I'm not saying that smarter things than us aren't possible. There's an element of luck involved in evolution. Maybe we're not the most powerful intelligence that could have evolved on Earth; but we're certainly the one that did.

    • omalleyt 2288 days ago
      I think there's a simple explanation for this.

      Increased head size == greater risk of death in childbirth

      Female hips have already evolved to be as large as possible while still being load-bearing.

      It would be interesting to see what the widespread use of Caesarean sections would do to the human brain, if we weren't all going to be transhuman within a few generations anyway.

      As far as whether there are diminishing returns to increased intelligence within one individual, I'd say try taking two people with half of Einstein's IQ and asking them to explain General Relativity.

      • erikpukinskis 2287 days ago
        I'm sure there have been individuals throughout time who had normal heads at birth, but whose heads grew more in childhood.
    • stewbrew 2288 days ago
      > why aren’t there some humans running around with heads twice as big and twice the neurons?

      Because humans don't live in an artificial bubble like ML models do; they are always already entangled in a network (aka society) of similar beings. The ultimate actor isn't the single individual but the group/aggregate/swarm/society. Given current societal demands, the current brain size is sufficiently large. A single nerd with a double-sized brain would most likely not be able to reproduce in human societies as of today.

      We probably shouldn't overrate a single individual's capabilities.

    • YeGoblynQueenne 2288 days ago
      >> Legit question: if human intelligence is fundamentally limited by scale, why aren’t there some humans running around with heads twice as big and twice the neurons? If it’s such an advantage why hasn’t nature selected for it in at least one place on Earth?

      There's nothing saying there won't be such humans, one day. Maybe we'll eventually evolve into Homo sapiens megacephalus, with brains twice the size they are now. Evolution is an ongoing thing, right?

      Btw, I don't think it's ever safe to compare machine learning with human intelligence. So far, it's clear that human learning is completely different to machine learning and human learning is only part of human cognition. It's not easy to use observations on the one to draw conclusions about the other.

    • Veedrac 2288 days ago
      History seems to suggest precisely the opposite hypothesis; namely, that the larger your brain is, the more benefit there is from further growth, at least with this line of brain architecture.

      http://aquatic-human-ancestor.org/anatomy/images/brain-size....

    • jfoutz 2288 days ago
      Does smarter imply more children survive to reproduce?
      • jiggunjer 2288 days ago
        I think it's more a plus for survival. But society has so much technology and so many excess resources that this pressure doesn't have much effect anymore.
    • EamonnMR 2288 days ago
      Giant heads have other disadvantages, having to be born being one major one.
    • marcosdumay 2288 days ago
      With all the problems that big heads carry, human heads have done nothing but grow since our ancestors started walking on two feet.

      I guess the only answer available for you is: evolution does not work that fast.

  • Animats 2288 days ago
    That's encouraging in two directions. Not only are they encoding different subject-matter areas in the same net, they're using the same net design for different subject-matter areas. Standardized net designs may work for a variety of tasks.

    Progress marches on.

  • zbyte64 2288 days ago
    Reminds me a lot of DRAGNN: https://arxiv.org/pdf/1703.04474.pdf

    Both are multi-task endeavors with novel encoding techniques.

  • brabel 2288 days ago
    One model, such a great idea! We could introduce a revolutionary concept for that... like, an Object! Which can be converted to lots of different "formats"! Like an image, or text (e.g. JSON and XML, really a revolution)!!! Why no one has thought about that?! /s
  • d--b 2288 days ago
    Seriously 'one model to learn them all'? Isn't that a tiny bit overreaching?

    The results are not very surprising after the Google Translate post about multi-language translation.

  • oneman 2288 days ago
    Superintelligence is hyperintegration of the metasystem.
  • dakomind 2288 days ago
    We already have modern math, ZF or ZFC set theory and model theory.

    What advantage does this offer?

    What disadvantage of ZF or ZFC does it address?
