AI and Compute


347 points | by gdb 306 days ago


  • dmreedy 306 days ago

    I'm going to draw up some charts about hull displacement on ships from the dawn of time up until about 1950. Then we can have a really informed conversation about the naval power of countries through the ages. I think we need to be ready for the implications of there one day being a battleship the size of the Pacific that will allow its owner to rule the world.

    Forgive the sarcasm, but I'm really put off by the aggressive weak-to-strong generalizations that are going on here. I'm also very excited about AI, but I don't understand how lines like,

    >> But at least within many current domains, more compute seems to lead predictably to better performance, and is often complementary to algorithmic advances.

    can be extrapolated to anything more than a fun conversation to have over drinks, or the plot for a bad sci-fi movie about AI (which, to be fair, are also quite prevalent in the current zeitgeist). We're definitely at a new tier of "kinds of problems computers can solve", but surely experience and history in this space should tell us that we need to expect massive, seemingly insurmountable plateaus before we see the next tier of growth, and that that next tier will be much more a matter of paradigm shift than of growth on a line.

    The systems on this graph all do different things in different ways. It's one thing to abstract over compute power via something like Moore's Law, or societal complexity via the Kardashev scale. But I think we need a much more nuanced set of metrics to provide any kind of insight into the various AI techniques, or an entirely different way of looking at 'intelligence'.

    • blixt 306 days ago

      I'm not sure I'm reading the same thing you're reading. Yes, OpenAI has a mission to work on ways to make AI safer (that is, prevent AI from completing goals in unexpected, "hacky" ways) as it becomes more capable. But I didn't find much of that in this article.

      What I read from this article is that we're currently in a trend where the more compute you throw at machine learning, the better solutions you end up with. Nothing about general AI, it's just that you can train deeper, more complex neural networks that can handle broader and more complex versions of the problems they're designed to handle.

      This can have implications down the line which still have nothing to do with AI becoming smarter (in the general intelligence kind of way) than it is today. If all input/output problems out there have very deep neural networks behind them, and those neural networks are constantly training simultaneously, and that has a positive economic output, we'll see tremendous amounts of FLOPS that make cryptocurrency miners pale by comparison.

      Just as an example, depending on how cloud solutions keep up, maybe startups on the cloud won't be so competitive anymore. It's an interesting trend to point out and keep track of. And yes, OpenAI's involvement will probably be to learn about how this can lead to unsafe use of AI, but again that's not what I saw the topic of this article being.

      • stochastic_monk 306 days ago

        I think the best analogy for ultrascale deep learning (if it’s not a term yet, it soon will be) is a particle accelerator: you throw as much firepower as possible at a task and get continued benefit.

        • e_modad 306 days ago

          Hyperscale is a term already right? I don’t know a lot about HPC, but I think it’s already part of the nomenclature.

      • red75prime 305 days ago

        > next tier will be much more a matter of paradigm shift than of growth on a line.

        There are a limited number of paradigm shifts before AIs get "general" in their name, so the "it already happened before" argument has a natural end. What makes you think it is still applicable?

        Some organizations have access to computational power approaching that of a human brain. We see commercial applications of systems which aren't programmed by hand like expert systems were, and which don't fail miserably like the speech recognition engines of yore. How often do you hear about the curse of dimensionality today?

        Some things have changed from then to now; the question is whether they have changed enough.

        • erikpukinskis 305 days ago

          > Some organizations have access to computational power approaching that of a human brain

          From a physics standpoint it’s not equivalent power actually, it’s equivalent energy.

          Power is work per unit time. Work is energy expended which causes displacement.

          These “brain-equivalent” computers can’t do nearly the work of a human brain in the same amount of time. They use about the same amount of raw computational energy, but they don’t make nearly the same structured waves in the information spaces around them. Their output can only be seen in the absolute quietest of environments. They often run for long periods of time with no obviously informative output.

          Human minds tuned for it are essentially incapable of not producing a constant stream of novel and disruptive insights (work), which is how you get large computational power from a roughly constant computational energy.

        • dbelchamber 306 days ago

          I completely agree. Current AI is excellent (or at least super-human) at learning to do anything where the mechanics of the situation are clear and where the measurement of success is well defined. Beyond that, I'm not sure we've made any convincing strides towards anything truly general.

          • wyattpeak 306 days ago

            While I sort of intuitively agree with you, I think there's a certain amount of circularity in the argument - any problem will seem to have well defined metrics once you've built and studied a machine which can solve it.

            In the sixties, chess was thought to be a problem that required intelligence since it couldn't be brute-forced. We now have machines which can play it well, without brute-forcing, and yet it's seen as entirely procedural.

            • goatlover 306 days ago

              AI people thought chess was a problem that required intelligence. Critics back in the 60s such as Dreyfus probably didn't view chess as the hallmark of intelligence.

              • erikpukinskis 305 days ago

                AGI is the hypothesis that someday the number of viable human cons with well-defined metrics will dwarf the ones without.

              • Veedrac 306 days ago

                The mechanics of speech synthesis are clear? The measure of success of style transfer is well defined?

                • dbelchamber 306 days ago

                  I would say speech synthesis is governed by very clear mechanics.

                  As for style transfer, that is a very specific skill of making the patterns of one style map to the patterns of another. I am not particularly well versed with art, but that process seems well defined to me.

                  Perhaps your issue is with my more generalized definition of "clear" and "well-defined". I meant to use these terms to distinguish between autonomous driving and being a successful human. I really don't think there is anywhere close to a consensus on the latter. To the extent that there is, then yes, AI should be able to do it.

                  • Veedrac 306 days ago

                    The IPA is nowhere close to sufficient for realistic speech synthesis, and style transfer is not just copy and paste. By the same token writing poetry is just "putting words into grammatical constructions that have certain patterns" or mathematics research is just "a form of proof search".

                    Of course we don't have human-level AI right now, but if that's the only thing you're claiming it's pretty vacuous.

                    • dnautics 306 days ago

                      I would say speech synthesis is governed by clear mechanics - and it's not the IPA, it's that the output comes out as a waveform, which has a structure that informs the algorithm.

                      Note that we have great raster-based deep visual effects, but vector is... not there yet (not saying it won't be) - vector is less structured than raster, so the choice of algorithm is less obvious.

                      As for well-defined criteria, I don't think that's really quite the right standard, I think the correct standard is that there is a way of metrizing success on a well-ordered set (like the [0,1] interval), even if it's noisy.
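                      A toy sketch of that standard (all names here are hypothetical, not from any real system): a noisy score on the [0, 1] interval still yields a usable ordering of candidates once it is averaged over enough trials.

```python
import random

# A noisy success metric on [0, 1] (the "well-ordered set" standard):
# individual measurements jitter, but averaging over trials recovers
# a reliable ranking between candidates of different true quality.

def noisy_score(true_quality, noise=0.2):
    """One noisy measurement, clipped back into [0, 1]."""
    sample = true_quality + random.uniform(-noise, noise)
    return min(1.0, max(0.0, sample))

def estimate(true_quality, trials=500):
    """Average many noisy measurements into a stable score."""
    return sum(noisy_score(true_quality) for _ in range(trials)) / trials

random.seed(0)
a = estimate(0.7)  # better candidate
b = estimate(0.5)  # worse candidate
# a ranks above b despite the per-measurement noise
```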

                  • rimliu 306 days ago
                    • 306 days ago
                      • ddtaylor 306 days ago

                        I fiddled with an idea where I wrote unit tests and used them for a scoring function to train a model. Writing the number of tests and encoding the logic for a simple Linked List took orders of magnitude more code than coding the list itself.
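                        A minimal sketch of that setup (hypothetical names; assuming a simple pass/fail check suite): each candidate class is scored by the fraction of checks it passes, and that fraction serves as the training signal. Even at this toy scale the checks and scorer rival the list itself in size, which is the imbalance the parent describes.

```python
def score_candidate(candidate_cls):
    """Grade a candidate linked-list class by the fraction of checks
    it passes; the result can serve as a reward/fitness signal."""
    def check_empty(cls):
        return cls().to_list() == []

    def check_append(cls):
        lst = cls()
        lst.append(1)
        lst.append(2)
        return lst.to_list() == [1, 2]

    checks = [check_empty, check_append]
    passed = 0
    for check in checks:
        try:
            if check(candidate_cls):
                passed += 1
        except Exception:
            pass  # a crashing candidate simply scores lower
    return passed / len(checks)

# A tiny hand-written reference implementation to exercise the scorer.
class Node:
    def __init__(self, value, nxt=None):
        self.value, self.nxt = value, nxt

class LinkedList:
    def __init__(self):
        self.head = None
    def append(self, value):
        node = Node(value)
        if self.head is None:
            self.head = node
            return
        cur = self.head
        while cur.nxt:
            cur = cur.nxt
        cur.nxt = node
    def to_list(self):
        out, cur = [], self.head
        while cur:
            out.append(cur.value)
            cur = cur.nxt
        return out
```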

                    • nunya213 306 days ago

                      But you can easily calculate why it's impossible to build a battleship beyond a certain size and understand why that trend is silly. I challenge you to do something similar with compute and AI. Not that I necessarily disagree with your thesis that better compute != better AI.

                      • daveguy 305 days ago

                        I'd like to see those easy calculations of why it's impossible to build a battleship beyond a certain size. I mean, if you put guns instead of planes on a modern aircraft carrier you'd have a battleship that's a hell of a lot bigger than any battleships in WWII.

                        The hull size argument is apt. I think it's a lot more obvious that throwing more compute power at a neural network isn't going to eventually make a general intelligence.

                        At the very least there will need to be significant breakthroughs in architecture design that will be at least as paradigm shifty as deep learning.

                        • entropie 305 days ago

                          > I mean, if you put guns instead of planes on a modern aircraft carrier you'd have a battleship that's a hell of a lot bigger than any battleships in WWII.

                          This is not true. The biggest battleship ever built was the Yamato:

                          > Length: 256 m (839 ft 11 in) (waterline), 263 m (862 ft 10 in) (overall)

                          The biggest CV (aircraft carrier) right now should be the Nimitz class with

                          > Length: 317 m

                          which seems to support your statement; but battleships are not versatile, and battleship development pretty much stopped after WWII. The Germans had proposals for the H-class battleship, with a total length for the H-44 of 345 m.

                          • daveguy 305 days ago

                            I wasn't saying that battleships are a good strategic choice to develop, just that you can't "easily calculate why it's impossible to build a battleship beyond a certain size".

                            It is possible, it's just not strategically advantageous.

                            Just like throwing more compute at a neural network isn't going to make general AI. The diminishing returns come from how brittle their learning and representations are.

                            • nunya213 305 days ago

                              I'm not a MechE but I don't see why it would be hard to show that past a certain size steel can no longer support the structure of a ship.

                          • roenxi 305 days ago

                            We can estimate the computational power of a human brain. If we throw an order of magnitude more compute than that at a neural network and train it for 20 years, why wouldn't that give us a general intelligence?

                            The issue with physical structures is that eventually the mass and stresses in the macro-structure overcome the strength of the micro-structure. That is why nature stops at, e.g., elephants.

                            It isn't obvious that intelligence suffers from such limits, as the only time the limit of intelligence has been tested (in terms of evolution) was when humans tried it. There is no evidence humans are pushing the limits of what intelligence can achieve. Quite the reverse, honestly, when you look at the performance of computers so far.

                            • daveguy 305 days ago

                              >If we throw an order of magnitude more compute than that at a neural network and train it for 20 years, why wouldn't that give us a general intelligence?

                              It isn't just the compute power that allows us to be intelligent. It's also the bandwidth (which is always an order of magnitude or more behind the compute power) and the algorithm (which we don't have a clue how to create).

                              State of the art visual processing that gets so touted in the press is brittle -- it has to see very similar examples or it will fail. Neural networks don't transfer to new problem domains well at all.

                              Neural networks have no sense of self or agency, and they never will. There are key parts missing (like the ability to experiment with the environment). I'm not saying we will never have general intelligence, just that it's quite a ways away and the algorithms will be significantly different from the neural networks we use today. That said, there will probably be many recognizable components, like backpropagation, recurrent nodes, Bayesian estimations, etc.

                              • crististm 305 days ago

                                Because you are missing the underlying structure that a human brain has a priori! You throw 10x more power at a problem hoping that empty matrices will magically converge to a brain or something better.

                          • CoolGuySteve 306 days ago

                            But isn't that kind of what happened with the Nimitz-class carrier and the range of its aircraft? One country dominating all other militaries via its navy?

                            • dantheman0207 306 days ago

                              It is what happened, but it took a paradigm shift (aircraft carriers) not an increase in computational (fire-)power.

                              The Japanese actually invested heavily in the biggest battleship in the world. It was soundly defeated at sea by an inferior force with an aircraft carrier at Leyte Gulf during World War II.

                            • sgt101 305 days ago

                              I thought that what happened was that the USA was the only major industrialised power to emerge from WW2 with a functional government and economy, and consequently enjoyed 20+ years of unprecedented economic dominance. Subsequently it built whatever military assets it damn well pleased.

                              I would argue that if you want to identify the mechanism of military dominance that the US has used to assert itself in the world you should look to Trident and the Ohio class.

                            • efangs 306 days ago

                              This plot essentially ignores all other computational science. People in HPC have been operating at these scales for a while, and yet they don't make claims about their field taking over the world.

                              • 306 days ago
                              • zach 306 days ago

                                Looking at the trend here, you can see why many business forecasters and economists have predicted that advances in artificial intelligence will create huge new returns to capital. That future is worth reflecting on because it suggests a fundamental change in labor-capital dynamics.

                                Take startups. Right now, many startups can compete on the same basis to hire talent as huge companies. But if companies with huge capital reserves can put their cash directly to work to train AI models, startups will be hard-pressed to compete with "smarter" products. Specialization will not even be much help.

                                Looking at Beating the Averages, PG enthused that, since established companies are so behind the curve on software development technology, there is always a chance for higher-productivity techniques like more productive languages to give smaller teams a real shot at a huge market. Of course, this was in the era before Google was creating new programming languages and before Facebook was widely deploying OCaml and Haskell. And now, AI looks to make the averages even harder to beat.

                                Even today, if you round up the smartest members of a CS grad class, it is going to be quite difficult to directly compete with a machine learning model with access to huge amounts of data and computing resources. Looking further forwards, if machine learning is able to provide "good enough" alternatives to most human-created software, the software startup narrative — that a few talented and determined people can beat billions in resources — may not even be so relevant anymore.

                                • ddtaylor 306 days ago

                                  It's worth noting that some prominent figures in AI/ML are saying we are due for another "AI winter" since it's being oversold again. I don't know if I agree with that, since we are seeing some interesting things, but technically Google is kind of saying they can tentatively pass the Turing Test with phones and meanwhile even a car decked out with extra sensors and 360 LIDAR cannot detect a simple stop sign with mud on it.

                                  • aglionby 306 days ago

                                    > Google is kind of saying they can tentatively pass the Turing Test with phones

                                    This is quite a bold claim, and one I'm not sure they're making. Their promo material suggests that it's limited to quite well-defined domains where conversations aren't really that open-ended, and we haven't seen how it'll perform in the real world.

                                    Relatedly, I don't think headlines like "Google Duplex beat the Turing test: Are we doomed?" [0] are helpful at all. It's disappointingly low-effort clickbait where instead there's plenty of interesting discussion to be had (should machines have to identify themselves as such? What about their use of pauses and fillers?).


                                    • computerex 306 days ago

                                      Right. I personally think the coolest thing about Duplex is the end-to-end synthesis of natural speech. The actual call isn't as impressive to me because that's just hand-coded stuff. IBM Watson has already had success in this regard.

                                      • 306 days ago
                                      • ddtaylor 306 days ago

                                        They aren't explicitly making the claim, but it seems the premise of their demo was "hey look, humans think it's another human", which is somewhat like the Turing Test.

                                      • acdha 306 days ago

                                        > Google is kind of saying they can tentatively pass the Turing Test with phones

                                        Is Google really saying that or just the more breathless commenters? I thought they were pretty good at making it clear that Duplex took a lot of work to do well in very constrained conversational situations.

                                        • ddtaylor 306 days ago

                                          Well, during the original AI winter many were open and honest about the capabilities of early ML and its limits, but what caused the winter itself was its perception by a large audience as a magic bullet, and their disappointment when it didn't work.

                                        • nl 306 days ago

                                          Sorry if this sounds harsh, but this is a bad comment.

                                          > some prominent figures in AI/ML are saying we are due for another "AI winter" since it's being oversold again.

                                          "Some say...". Name one.

                                          We may have a Gartner style "trough of disillusionment", but a 1990's style AI Winter is unlikely. It works too well in too many valuable areas for the money to go away.

                                          > technically Google is kind of saying they can tentatively pass the Turing Test with phones

                                          Could you show us where they claim that? That goes well beyond any statement I've heard Google make, and into the kinds of breathless claims click-bait blogs have tried to make.

                                          > car decked out with extra sensors and 360 LIDAR cannot detect a simple stop sign with mud on it

                                          Do you have a specific example of that? I did Google it, and I couldn't find anything.

                                          Most examples I've seen handle occluded road signs pretty well. There are of course adversarial examples, which are an interesting case, but mud causing a failure like this is surprising to me.

                                          • 306 days ago
                                            • romaniv 305 days ago

                                              >I don't know if I agree with that, since we are seeing some interesting things

                                              There were plenty of interesting results in AI research before the last two AI winters.

                                            • goatlover 306 days ago

                                              > if machine learning is able to provide "good enough" alternatives to most human-created software, the software startup narrative — that a few talented and determined people can beat billions in resources — may not even be so relevant anymore.

                                              Are there any examples where current ML has replaced human-created software, the demand for startups or software engineers?

                                              Seems to me that ML so far has expanded our toolbox of what can be done with software, not replaced programmers, designers, engineers, or really much of anybody yet. All this worry about future automation is imagining that things are going to be different this time, because of recent success with ML in limited domains.

                                              • zach 305 days ago

                                                Seems trivial, but I have met someone whose ardent hobby was programming go-playing programs. With the creation of AlphaGo Zero, one could say all the clever code ever written by humans for the purpose of playing go is obsolete.

                                                More relevantly, I would be surprised if the shift to AI techniques in fraud detection at places like PayPal is not already having an impact on the career paths of the engineers who were tasked with maintaining and tuning the pre-ML fraud system. At one point the top engineers of the original heuristic system could have been considered the most valuable non-management employees at the company. I'm sure they're not out on the streets or anything, but I also assume the next person to take their job will not be nearly as valued.

                                                Also, ML will impact programmer demand in subtle ways. A lot of programming is refactoring, and there is reason to believe we can refactor code, especially in certain languages, automatically to make it more aesthetic. Realistically, that seems likely to decrease demand for programmer hours. Or an ML system that can run over someone's GitHub account or repo may be the new resume screen, and if one scores badly on it that may limit the demand for them personally.

                                                Finally, I have to think that the overall march of software towards more complex integrated systems is already a major cause of the dearth of entry-level programming positions, and ML will accelerate that trend.

                                              • allenleein 306 days ago

                                                I believe we still have a chance. The opportunity exists in the business world because the moment companies become successful is when collective leadership becomes most focused on maintaining that success.

                                                Doing well reduces the incentive to explore other ideas, especially when those ideas conflict with your proven business model.

                                              • ClassAndBurn 306 days ago

                                                That is a staggering rate of increase. I can see a future where this is less centralized; learning could happen in "phases" where a local device improves its model given local data and reports back something centrally that can be combined and used to train a shared model.

                                                This requires hardware to be miniaturized, as non-ML compute has been, and when that happens we'll have the learnings from the current edge-computing push. In the meantime I'm excited to see what developments are made on both the hardware and software sides.

                                              • westoncb 306 days ago

                                                Would someone explain the purpose/origin of using 'compute' as a noun like this instead of a verb?

                                                • dahart 306 days ago

                                                  I don't know the origin specifically, but it's been happening for some time (~decades) in GPU & graphics circles.

                                                  We've had 'compute shaders', for example.

                                                  The purpose from this perspective has been to differentiate general purpose computation on GPUs from fixed-function pipelines and/or graphics-specific functionality. The history of using GPUs for general purpose computation involved a lot of hacking to abuse hardware designed for rasterization to do other kinds of calculations.

                                                  One keyword / search term you can use is "GPGPU" (which stands for general purpose GPU). Here's another article which might shed more light on the history:

                                                  * Also found this possibly relevant note: "When it was first introduced by Nvidia, the name CUDA was an acronym for Compute Unified Device Architecture"

                                                  • westoncb 306 days ago

                                                    That's an interesting example of usage. I was actually familiar with compute shaders but hadn't connected it with the sort of usage we see in the headline.

                                                    So it seems like a big part of how it's being used is to refer to a generalized computation service—some 'function' you're given access to which takes arbitrary programs as a parameter.

                                                    Seems like there's often the implication that how the computation is performed is abstracted over and that more or fewer resources could be applied to it—though that's not necessarily there (absent in the case of compute shaders for instance).

                                                  • tshadley 306 days ago

                                                    An archived 2012 discussion invokes the Oxford English Dictionary to trace the original usage back several hundred years.

                                                    • mannykannot 306 days ago

                                                      In those examples, however, the meaning is, in current usage, 'calculation' or 'computation', not a measure of computational work.

                                                      • tshadley 306 days ago

                                                        > In those examples, however, the meaning is, in current usage, 'calculation' or 'computation', not a measure of computational work.

                                                        So is the OP:

                                                        "We’re releasing an analysis showing that since 2012, the amount of compute [amount of calculation, amount of computation] used in the largest AI training runs has been increasing exponentially with a 3.5 month-doubling time "

                                                        Without loss of meaning, the title could be "AI and Calculation" or "AI and Computation".

                                                        • mannykannot 306 days ago

                                                          To be clearer, I should have written 'a calculation' and 'a computation': in the examples of the older usage, 'compute' is a singular noun referring to a specific calculation. The acceptability of substituting 'compute' for 'computation' under current usage (where the latter is shorthand for computational work or effort, rather than a specific calculation) has no bearing on the usage five centuries ago. Nor does the usage five centuries ago have any particular relevance for today, given how much computation has changed since then.

                                                          • igravious 305 days ago

                                                            I think 'compute' stands in for 'computing power', so 'amount of compute' to me means 'amount of computing power'. If I were to use the terms 'calculation' or 'computation' I'd pluralise them, so you'd get 'amount of calculations' and 'amount of computations' from 'amount of compute'.

                                                      • GuiA 306 days ago

                                                        I think a lot of people in the industry got that word in their vocabulary from its usage in “Amazon EC2” (Elastic Compute Cloud). It’s certainly been used before, but that was one of the first times I remember hearing it in that context.

                                                        • brootstrap 306 days ago

                                                          I don't know the origin, but some of my bosses and managers like to use 'compute' to sound cool and fancy... 'We need to know the execution time and compute cost for this job'. Good luck tracking the 'compute' of our 'job' that uses like 10 AWS tools: EC2, RDS, CloudWatch, S3.

                                                          • 306 days ago
                                                            • 306 days ago
                                                              • mr_toad 306 days ago

                                                                ‘Elastic compute’ (i.e. EC2) goes back to 2006.

                                                              • mooneater 306 days ago

                                                                This implies more centralization, as those with cheap access to vast compute gain a bigger relative edge.

                                                                • sgillen 306 days ago

                                                                  Yes, unfortunately both data and compute will probably become more and more centralized. At least the algorithmic components have a chance to become available to everyone.

                                                                  • samstave 306 days ago

                                                                    Here is an off-the-cuff thought: what if there was (or maybe there already is?) a distributed system, like SETI@home back in the day, and it's a massively distributed general AI that can be used, with people on a mass scale allowing slices of their compute to be part of the system?

                                                                    • PeterisP 306 days ago

                                                                      The current machine learning paradigms are hard to distribute efficiently: they can be parallelized, but they require significant ongoing communication between nodes. You can't really split a problem into separate subtasks and merge them only at the end; you need to transfer all the calculated values after, say, each iteration.

                                                                      I'd guess that having ten extra machines in the same rack would be more valuable than a thousand remote machines with limited network bandwidth.
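
                                                                      A toy sketch of why (hypothetical NumPy model; a plain average stands in for a real allreduce):

```python
import numpy as np

# Toy data-parallel SGD on a least-squares problem. Each "node" computes
# a gradient on its own shard, then every iteration all gradients must
# be averaged (the allreduce). That per-iteration exchange is the
# communication that is cheap within a rack but expensive over the
# open internet.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
true_w = rng.normal(size=10)
y = X @ true_w

n_nodes = 4
shards = np.array_split(np.arange(1000), n_nodes)
w = np.zeros(10)
lr = 0.1

for step in range(500):
    # Parallel part: each node computes a local gradient on its shard.
    grads = [X[idx].T @ (X[idx] @ w - y[idx]) / len(idx) for idx in shards]
    # Communication part: gradients are exchanged and averaged
    # before anyone can take the next step.
    w -= lr * np.mean(grads, axis=0)
```

                                                                      Every pass through the loop needs a full gradient exchange, which is why ten extra machines in the rack can beat a thousand machines over slow links.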

                                                                      • TeMPOraL 306 days ago

                                                                        > it's a massively distributed general AI

                                                                        You've just described a centralized system.

                                                                        Centralization can happen at different layers - not all technical. The ultimate centralization is ownership, as defined legally.

                                                                        • hodgesrm 305 days ago

                                                                          One problem with the SETI model is that a lot of the increase in available compute capacity is due to specialized GPU or TPU processors, which aren't widely available outside of purpose-built data centers. Trying to offload ML workloads to general purpose CPUs would likely be quite wasteful in terms of power consumption unless you can somehow get access to graphics processors.

                                                                          • tlrobinson 306 days ago

                                                                            So Skynet, but it lets you rent parts of itself? I'm sure someone is writing an ICO whitepaper for this right now, if they haven't already...

                                                                            • nostrademons 306 days ago


                                                                              Raised $2.7B in its ICO, currently trading at a market cap of $10B.


                                                                              Raised $257M in its ICO.


                                                                              Raised $232M in its ICO.

                                                                              Those are the 3 largest ICOs of all time, so yes, there is definitely a market for renting part of Skynet.

                                                                              The actual technology may or may not be vaporware or a scam. IMHO the way you build a decentralized P2P system is to give a single really smart programmer enough to live on for a couple years and see what he comes up with, not throw a billion dollars at a Cayman Islands corporation that may or may not use it for anything productive. Sorta like what Ethereum did.

                                                                              • tlrobinson 306 days ago

                                                                                Are any of those platforms suitable for running deep learning algorithms?

                                                                                I think Golem is closer. And some others:

                                                                                But I'm skeptical of distributed computing blockchains. I think (a) it's unlikely a distributed compute network can compete with highly optimized datacenters running TPUs or whatever, and (b) people are unlikely to trust distributed compute networks with their proprietary data (maybe acceptable for CGI rendering and some other specific use cases).

                                                                        • vokep 306 days ago

                                                                          This is why I am working on raspberry pi based neural net things

                                                                          We have learned a lot using big compute that can still inform better efficiency of AI on smaller computing units. The Raspberry Pi is good for this because it is quite limiting, but also quite capable.

                                                                          • tlrobinson 306 days ago

                                                                            Is it really worth training on a Raspberry Pi? It seems vastly underpowered compared to even modest desktop hardware.

                                                                            • mc808 306 days ago

                                                                              It's worth having a self-contained option available. For something like a drone, you don't need all that much computation for on-the-fly PID tuning in response to changing weather or different piloting styles, etc.

                                                                              • chillydawg 306 days ago

                                                                                You've hooked a NN up to a PID? How's that going? It's hard enough tuning those things by hand using squishy human brain networks.

                                                                                • mc808 306 days ago

                                                                                  I haven't done it myself yet, it's on the to-do list. There is a lot of academic material on PID autotuning, not always with neural networks but that seems the most straightforward way. A Raspberry Pi is probably overkill for the job, actually.
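
                                                                                  For what it's worth, the PID loop itself is tiny; the hard part is choosing the three gains, which is exactly what an autotuner would adjust. A minimal sketch (gain values picked arbitrarily):

```python
# Minimal discrete PID controller. kp, ki, kd are the gains an
# autotuner (neural-network based or otherwise) would be tuning.
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Drive a toy first-order plant (dx/dt = u - x) toward setpoint 1.0.
pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.01)
state = 0.0
for _ in range(5000):
    u = pid.update(1.0, state)
    state += (u - state) * 0.01  # Euler step of the plant
```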

                                                                                • tlrobinson 306 days ago

                                                                                  Inference on embedded hardware makes sense, but training, not so much.

                                                                          • tehsauce 306 days ago

                                                                            I think it's important to notice that if we're using the metric of a "300,000x" increase in computing power applied to ML models, the giant increase has mostly been due to parallel computing playing catch-up on decades of Moore's Law all at once. It will hit a wall and die with Moore's Law fairly soon. Physics requires it.

                                                                            • spunker540 306 days ago

                                                                              How is parallelism limited by physics?

                                                                              I thought the point of parallelism is you can throw more chips at a problem and see improved performance. Single chips are limited by physics, but true parallelism scales linearly ad infinitum.

                                                                              Can anyone with more knowledge than me speak to known limits of parallelism? I’d guess it’s not truly infinitely scalable.

                                                                              • ychen306 306 days ago

                                                                                You can't scale linearly ad infinitum because eventually the communication (i.e. memory) cost gets too high.

                                                                                This reminds me of a thought experiment I heard from -- if memory serves -- Scott Aaronson. The gist is that the fastest supercomputer will be on the edge of becoming a black hole: if you run any faster, there will be too much energy concentrated in a given area, thus creating a black hole. Similarly, when you run so many parallel devices (GPUs, CPUs, etc.) together, you will want to put the devices as close to each other as possible (the speed of light limits the rate of communication). You then pump too much heat into a small area, and getting that much heat out is, among other things, a physics problem.

                                                                                • red75prime 306 days ago

                                                                                  That's a very far limit, though. It will not have practical consequences for a long time.

                                                                                  Also, if you don't squeeze as much as you can into a small space, you can scale sublinearly ad infinitum (in practical terms, which don't include heat death of the universe).

                                                                                • sullyj3 306 days ago

                                                                                  If you built a computer with a squillion chips that was a light-year long, it would take a year at minimum to get a message from one side of the computer to the other. The same issue applies on a smaller scale for smaller computers

                                                                                  • sheeshkebab 306 days ago

                                                                                    Parent is probably referring to Amdahl's law, which limits speedup in parallel computing systems.
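
                                                                                    For reference, Amdahl's law says that if only a fraction p of the work parallelizes, n processors give a speedup of 1 / ((1 - p) + p/n), which is capped at 1/(1 - p). A quick illustration:

```python
# Amdahl's law: speedup of a job whose parallel fraction is p,
# run on n processors.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# A job that is 95% parallelizable never speeds up more than
# 1 / (1 - 0.95) = 20x, no matter how many processors you add.
for n in (10, 100, 1000, 10**6):
    print(n, round(amdahl_speedup(0.95, n), 2))
```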

                                                                                    • chas 306 days ago

                                                                                      That doesn’t really apply in this case though because the major thing people are using the increase in parallelism for is running larger computations or more parallel computations of the same size, rather than trying to run the same computation in less time.

                                                                                • nutanc 306 days ago

                                                                                  Though this talks about current trends, I would place my bets on a more radical future where the current algorithms for AI are overhauled and we get much better and faster algorithms which can even work on generic CPUs.

                                                                                  • cmarschner 305 days ago

                                                                                    Cherry-picking a few papers doesn't tell you anything. If anything, it shows what people who pushed the envelope to the extreme have achieved, mostly at Google, where people can afford not to care about cost. 99.9% of the work is done using small numbers of GPUs, and that hasn't changed much in recent years, except for the improvements in GPU architectures. Draw that graph and you get a very different story.

                                                                                    • forapurpose 306 days ago

                                                                                      > Three factors drive the advance of AI: algorithmic innovation, data (which can be either supervised data or interactive environments), and the amount of compute available for training. Algorithmic innovation and data are difficult to track ...

                                                                                      Are algorithmic innovations and improvements in data so difficult to track? Could they be measured by the cost of certain outputs? Or is it that the information about algorithms and data is not easily accessible?

                                                                                      • jfaucett 306 days ago

                                                                                        > On the other hand, cost will eventually limit the parallelism side of the trend and physics will limit the chip efficiency side.

                                                                                        Anyone working on chip architecture care to give their opinion on the next 10-20 years in chip design? It would really interest me to know if chip designers think Moore's law will continue, since that is probably going to be a big factor in the timeline for AGI.

                                                                                        • deepnotderp 306 days ago

                                                                                          Not gonna predict the future 1-2 decades out, since that's a fool's errand, but here's a grab bag of relevant points:

                                                                                          1. Moore's Law is undoubtedly slowing, but in the foreseeable future it will likely continue. On the other hand, Dennard scaling, which is already basically dead, will be the crunch you will likely feel more. Exponentially more transistors aren't too useful if they still consume so much power. To mitigate leakage we moved to FinFETs... which actually made dynamic power worse.

                                                                                          2. You might be interested to know that data movement (predominantly memory access) costs orders of magnitude more than computation, especially relevant to AI compute which requires large amounts of access. These global wires already suck and don't seem to be getting any better in the foreseeable future.

                                                                                          3. Foundries have already been using (and thus expending) "scaling boosters" to reach their density goals. Most of these are one-time use effects that won't provide significant continuous scaling capability.

                                                                                          • p1esk 306 days ago

                                                                                            Analog computing has a lot of yet unrealized potential for machine learning algorithms.

                                                                                            However, currently it does not make sense to build a specialized analog chip to run specific type of ML algorithms, because algorithms are still being actively developed. I don't see GPUs being replaced by ASICs any time soon. And before you point to something like Google's TPU, the line between such ASICs and latest GPUs such as V100 is blurred.

                                                                                            • sanxiyn 306 days ago

                                                                                              I define GPU as something that can efficiently implement DirectX. Hence TPU is not GPU. And I predict ML algorithms will run on non-GPU, soon-ish.

                                                                                              • deepnotderp 306 days ago

                                                                                                Please explain where analog computation has a benefit over digital that outweighs its numerous disadvantages.

                                                                                                • p1esk 306 days ago

                                                                                                  Wait, aren’t you working on analog chips?

                                                                                                  • deepnotderp 306 days ago


                                                                                                    You may have confused me with the Isocline/Mythic guys or a red herring comment. Our approach to deep learning chips is very public and amongst the craziest...A̶n̶d̶ ̶e̶v̶e̶n̶ ̶I̶ ̶w̶o̶u̶l̶d̶n̶'̶t̶ ̶t̶o̶u̶c̶h̶ ̶a̶n̶a̶l̶o̶g̶ ̶c̶o̶m̶p̶u̶t̶a̶t̶i̶o̶n̶

                                                                                                    To clarify: I'm always open to opposing evidence, but based on the data at the moment, I believe that analog computing buys you very little.

                                                                                                    • p1esk 305 days ago

                                                                                                      I'm sure you know both cons and pros of analog computing. As long as you can significantly improve digital tech every year, keep doing that. But as soon as that stops, or becomes too expensive, analog is the way forward.

                                                                                                      • deepnotderp 305 days ago

                                                                                                        Again, what advantage does analog have?

                                                                                                        People seem to assume that analog intrinsically consumes less power, which due to bias and leakage currents isn't true in the general case.

                                                                                            • itchyjunk 306 days ago

                                                                                              So for research, would using a standard petaflop/s-day figure when presenting results be useful? Like, model X might be 1% more accurate than model Y, but for the same baseline petaflop/s-day budget, how do X and Y perform? I'm guessing it might not make sense for all types of research though.

                                                                                              • svantana 306 days ago

                                                                                                Dawnbench [1] is such an effort (you will need to work out the petaflops yourself from time x performance, but it lists cloud computing cost which probably is more relevant), and MLPerf is an upcoming one [2].

                                                                                                [1] [2]

                                                                                                • alfalfasprout 306 days ago

                                                                                                  OpenAI and the other research labs (FAIR, Google Brain, MS Research) are heavily focused on image and speech models, but the reality is the vast majority of models deployed in industry don't need DL and benefit more from intelligent feature engineering and simpler models with good hyperparameter tuning. It's definitely the exception that more compute automatically yields more performance.

                                                                                              • petters 306 days ago

                                                                                                It's not wrong, but the unit "petaflop/s-day" made me smile.

                                                                                                • kbob 306 days ago

                                                                                                  1 petaFLOP/s × 1 day = 86,400 petaFLOP = 8.64e19 FLOP.
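
                                                                                                  Spelled out as unit arithmetic (Python, just the numbers):

```python
# A petaflop/s-day is an amount of computation: a rate times a duration.
PETA = 10 ** 15               # FLOP in one petaFLOP
SECONDS_PER_DAY = 86_400

flop_per_pfs_day = PETA * SECONDS_PER_DAY  # = 8.64e19 FLOP
```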

                                                                                                • 306 days ago
                                                                                                  • forcer 306 days ago

                                                                                                    I don't get it. How does OpenAI know how many resources are thrown at AI calculations worldwide?

                                                                                                    • visarga 306 days ago

                                                                                                      They are reporting only on a few well known papers. They don't know what people are doing in secret.

                                                                                                    • tzahola 306 days ago

                                                                                                      For some reason the word “compute” in this context causes me to throw up in my mouth.

                                                                                                      It used to be that only “coding” could elicit this reaction - nevertheless I’m quite fascinated by this new development.

                                                                                                      • calibas 306 days ago

                                                                                                        I support harsh penalties on anyone who tries to noun a verb.

                                                                                                        • dahart 306 days ago

                                                                                                          Verbing nouns and nouning verbs is probably as old as verbs and nouns.

                                                                                                          These words are all nouned verbs:

                                                                                                          Chair, cup, divorce, drink, dress, fool, host, intern, lure, mail, medal, merge, model, mutter, pepper, salt, ship, sleep, strike, style, train, voice.

                                                                                                          (according to this, anyway:

                                                                                                          Shakespeare verbed nouns.

                                                                                                          "Compute" as a noun is at least 20 years old, according to my memory, and there are several high profile products named this way that are more than 10 years old.

                                                                                                          • bstamour 306 days ago

                                                                                                            Verbing weirds language. Respect your parts of speech!

                                                                                                            • jamesblonde 306 days ago

                                                                                                              No noun is too proper to verbify :)

                                                                                                          • lainga 306 days ago

                                                                                                            Really? The article says "total compute" in the first paragraph and "total computation" in the second. Noun your verbs or don't noun them at all.

                                                                                                            • gdb 306 days ago

                                                                                                              Updated the post :).

                                                                                                              • 306 days ago
                                                                                                                • make3 306 days ago

                                                                                                                  That you care so much about something so unimportant makes me think you should probably not be on HN, but taking care of whatever is making you so cranky

                                                                                                                • sandover 306 days ago

                                                                                                                  It's machine learning. It's not AI. Please, all, let's try hard to use words that mean what they mean.

                                                                                                                  • blixt 306 days ago

                                                                                                                    I think that ship has sailed. The term "AI" for any machine behavior that changes based on input has been in use for over 60 years now. Whether it's the ghosts in Pac-Man, or a disembodied voice that tells you the weather and plays music when you ask it to.

                                                                                                                    • MarkMMullin 306 days ago

                                                                                                                      We should do our best to get it back into port. Part of the whole mess is that the name AI implies things about ML systems that simply aren't true. As a side note, we should also probably start using the word tensor more accurately; we've now enraged enough physics and math folks :-)

                                                                                                                      • blixt 306 days ago

                                                                                                                        The entire English dictionary has evolved into its current state, and there are several words that took on the opposite meaning purely through stubborn ironic use by the masses. As much as I like to be correct in my use of words, I think AI has established itself as a term that will stick around for now.

                                                                                                                        Besides, I really don't think all the stigma comes from the term "artificial intelligence". You don't have to ever mention the term to a child interacting with Alexa; they will nevertheless greatly overestimate "her" ability, I think because of the anthropomorphic nature of their interactions and the black-box implementation that prevents you from knowing the boundaries of what is possible.

                                                                                                                        This is something that video game characters have played on since their inception, making humans imagine much more complex intents and thoughts behind their "stupid" hard-coded behaviors. I'm okay with calling it AI even if it's not close to on par with human intelligence. :)

                                                                                                                        • 306 days ago
                                                                                                                        • sullyj3 306 days ago

                                                                                                                          How is the word tensor misused? I thought it was just an n-dimensional array of numbers?

                                                                                                                          • chas 306 days ago

                                                                                                                            Just as linear transforms can be represented as 2-dimensional arrays of numbers (that is to say, matrices)[0], tensors are a higher-dimensional analogue with a rich theory in their own right and a representation as higher-dimensional arrays of numbers. Looking at a tensor solely as an n-dimensional array ignores important differences in the mathematical behavior of objects that share the same representation. To give an example: different parts of a tensor can behave differently under a change of basis. [1]
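
                                                                                                                            A concrete NumPy sketch of that last point (the change-of-basis matrix P below is chosen arbitrarily): the same 2x2 array transforms differently depending on what kind of object it represents.

```python
import numpy as np

# The same 2x2 array A can represent different tensors, and they
# transform differently under a change of basis P:
#   a linear map (a (1,1)-tensor) becomes    inv(P) @ A @ P
#   a bilinear form (a (0,2)-tensor) becomes  P.T @ A @ P
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
P = np.array([[2.0, 1.0],
              [0.0, 1.0]])  # arbitrary invertible change of basis

as_linear_map    = np.linalg.inv(P) @ A @ P
as_bilinear_form = P.T @ A @ P

# The resulting numbers differ; e.g. the linear map keeps its trace
# (a basis invariant), while the bilinear form does not.
```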



                                                                                                                    • MikkoFinell 306 days ago

                                                                                                                      Is AI defined as "mysterious future thinking computer"? Anything we figure out how to do seems to suddenly fall outside of the definition.

                                                                                                                      • visarga 306 days ago

                                                                                                                        It's the magically shrinking "AI of the gaps". "AI" covers only things we can't yet do in ML.


                                                                                                                        • goatlover 306 days ago

                                                                                                                          Or things involving general intelligence, like Data on Star Trek, which is what people tend to think of when the term AI is used.

                                                                                                                          • sanxiyn 306 days ago

                                                                                                                            We can use the term AGI for things involving general intelligence.

                                                                                                                      • dahart 306 days ago

                                                                                                                        Some of the fun of language is the volume of usage of a name or phrase causes it to become correct.

                                                                                                                      • tw1010 306 days ago

                                                                                                                        To me this just smells like there's some hidden force – not necessarily nefarious but definitely with the power to incentivize an exaggerated lens – pushing OpenAI to make these claims. Maybe it's the desire to keep AI in the limelight as the buzz is fading slightly. Maybe it is SV echo chamber effects, or investors, or a strategy to build hype in order to attract talent to the company. But to me, on a gut level, it doesn't feel completely ethically pure.

                                                                                                                        • damodei 306 days ago

                                                                                                                          I'm the lead author, and I can only speak for myself, but what drove me to spend a lot of time on this post is a sense of caution. I think AI is likely to have amazing positive implications for society, but it also has negative implications, and if it advances faster than expected, we're going to have to be very alert to properly deal with those negative implications.

                                                                                                                          The facts about hardware are hard numbers and difficult to argue with, at least in order-of-magnitude. I agree the implications for AI progress are very open to interpretation (and we acknowledge this in the post), but caution means we should think carefully about the case where the implications are big.

                                                                                                                        • gwern 306 days ago

                                                                                                                          Huh? What are you talking about? The escalating compute involved in DL is obvious to anyone reading the papers; OA is just doing the work of putting numbers on the trend.

                                                                                                                          • PeterisP 306 days ago

                                                                                                                            There's lots of research on doing the same learning with far fewer resources (e.g. a recent paper, or the example visible in this very article of AlphaZero having much, much less compute than AlphaGoZero and doing better anyways), and even without that, simple hardware progress means that random gaming GPUs can handle datasets that were inconvenient a few years ago.

                                                                                                                            I'd say it all depends on the size of datasets - some domains (e.g. unlabeled image data) have "effectively infinite" datasets where the amount of data you can use is limited only by your computing power, but in many other use cases all the data you'll ever get can be processed by a single beefy workstation.

                                                                                                                            More available compute means that we tackle more difficult problems. However, for any single given task it's often not the case that the amount of compute grows. If anything, the graph is not showing the compute required for DL, but the compute available for DL - it gets used simply because it's there.

                                                                                                                            • gwern 306 days ago

                                                                                                                              > There's lots of research on doing the same learning with far fewer resources (e.g. a recent paper, or the example visible in this very article of AlphaZero having much, much less compute than AlphaGoZero and doing better anyways), and even without that

                                                                                                                              AlphaZero could not have been created without going through many, many iterations of AlphaGo, each of which cost several GPU-years, and calling AlphaZero cheap is a serious moving of the goalposts, as it required thousands of TPUs for days; Facebook's recent replication for chess also used thousands of GPUs for 3 weeks. (Zero is cheap only in comparison to the previous AlphaGos, which used weeks or months of hundreds/thousands of GPUs/TPUs.) Note that that is a log graph; flip it to a linear scale and you get an idea of how extraordinarily expensive Zero is compared to everything not named 'AlphaGo'.

                                                                                                                              If anything, this observation implies that AI risk is more dangerous than thought because it implies a 'hardware overhang': it will take vast computational resources to create the first slow inefficient AI but it will then rapidly be optimized (either by itself or human researchers) and able to run far faster/more copies/on more devices/for less money, experiencing a sudden burst in capabilities. Like model compression/distillation where you can take the slow big model you normally train and then turn it into something which is 10x faster or 100x smaller or just plain performs better (see 'born again networks' or ensembling).

                                                                                                                              > simple hardware progress means that random gaming GPUs can handle datasets that were inconvenient a few years ago.

                                                                                                                              ...which means using a lot more compute, yes.
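
                                                                                                                              The compression/distillation idea mentioned above can be sketched in a few lines. Below is a minimal numpy sketch of the standard knowledge-distillation loss (a temperature-softened KL divergence between teacher and student outputs, in the style of 'born again networks'); the logits and names are made up for illustration, not taken from any of the systems discussed:

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution,
    # exposing the teacher's "dark knowledge" about wrong classes.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, T=4.0):
    # KL(teacher || student) on temperature-softened outputs: the small
    # student is trained to mimic the big, slow teacher's soft targets.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([5.0, 1.0, -2.0])        # big model's logits (illustrative)
good_student = np.array([4.8, 0.9, -1.9])   # closely mimics the teacher
bad_student = np.array([-2.0, 1.0, 5.0])    # disagrees with the teacher

# A student that tracks the teacher incurs a much smaller loss.
assert distillation_loss(teacher, good_student) < distillation_loss(teacher, bad_student)
```

                                                                                                                              The point of the sketch is the asymmetry gwern describes: the expensive part is producing the teacher in the first place; once it exists, a far cheaper model can be trained against its outputs.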

                                                                                                                              • geoffreyirving 306 days ago

                                                                                                                                And it's worth emphasizing that we don't need to have particularly high confidence that this trend continues for it to motivate working on AI safety. All we need is the lack of high confidence that it will stop.

                                                                                                                              • damodei 306 days ago

                                                                                                                                It's actually incorrect that AZ got better results than AGZ with less compute. The graph shows the large AGZ, which somewhat exceeded the rating of both the small AGZ and AZ. AZ did slightly outperform the small AGZ, but did so while using a similar amount of compute.

                                                                                                                                On the broader point though, I agree with this. We say that compute and algorithms are complementary in the post. Much of the time, when you come up with an algorithm that allows you to do something that used to cost X compute in 0.2X compute instead, you can use the new algorithm to do something significantly more impressive with the full X compute.

                                                                                                                                • adamnemecek 306 days ago

                                                                                                                                  How can I contact you? I have some questions about adversariality in AI.

                                                                                                                                • geoffreyirving 306 days ago

                                                                                                                                  Yep, we're not claiming this is the compute required for DL, and for specific tasks we expect the compute required to fall over time. But better algorithms actually make compute more important, not less, and would likely amplify the effect of growth in available compute.

                                                                                                                                  For example, if a task is parameterized (by size or difficulty, say), then a better algorithm might change the asymptotic complexity from O(n^3) to O(n^2). A 2x compute increase for the old algorithm would take us from n -> 2^(1/3) n ≈ 1.26n, but the new algorithm would go from n -> 2^(1/2) n ≈ 1.41n.
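
                                                                                                                                  This arithmetic can be checked directly: with cost scaling as n^k, a compute budget multiplied by c multiplies the solvable problem size by c^(1/k). A two-line sketch (the function name is just for illustration):

```python
def size_gain(compute_factor, exponent):
    # With cost ~ n**exponent, multiplying the compute budget by
    # compute_factor multiplies the solvable problem size n by
    # compute_factor**(1/exponent).
    return compute_factor ** (1.0 / exponent)

print(f"O(n^3): n -> {size_gain(2, 3):.2f}n")  # 1.26n
print(f"O(n^2): n -> {size_gain(2, 2):.2f}n")  # 1.41n
```

                                                                                                                                  So the better algorithm not only solves a given instance cheaper, it also converts each doubling of hardware into a larger jump in tractable problem size.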

                                                                                                                              • bluetwo 306 days ago

                                                                                                                                To me it just seems like a stab at trying to make the intangible, tangible.

                                                                                                                                To be taken with a grain of salt.

                                                                                                                                Innovations in algorithms will give us better prediction with less compute power.