Cerebras Systems unveils a 1.2T transistor chip for AI

(venturebeat.com)

227 points | by modeless 1682 days ago

30 comments

  • modeless 1682 days ago
    There are far more transistors in this chip than neurons in the human brain. In 100 of these chips, there are more transistors than there are synapses in the human brain.

    I don't mean to suggest that transistors are equivalent to neurons or synapses; clearly they are very different things (though it is not clear that neurons are much more computationally powerful than transistors, despite assertions from many people that this is the case). But I think it is still useful to compare the complexity of structure of chips vs. the human brain. We are finally approaching the same order of magnitude of complexity in structure.

    Also note that this is not manufactured on TSMC's highest density process. Assuming TSMC's 3nm process development is successful, that will probably be 6x denser than the 16nm process used here.
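
    A back-of-the-envelope version of that comparison, for anyone who wants to check the orders of magnitude (the brain figures are the commonly cited rough estimates, and the 6x density factor is the assumption above):

      # Rough order-of-magnitude comparison; every figure here is a coarse estimate.
      transistors_per_chip = 1.2e12   # Cerebras WSE, per the article
      neurons_in_brain     = 8.6e10   # commonly cited ~86 billion neurons
      synapses_in_brain    = 1.0e14   # commonly cited ~100 trillion synapses

      print(transistors_per_chip / neurons_in_brain)   # ~14 transistors per neuron
      print(synapses_in_brain / transistors_per_chip)  # ~83 chips to match the synapse count
      print(transistors_per_chip * 6)                  # ~7.2e12 if a 3nm-class shrink is ~6x denser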

    • jacquesm 1682 days ago
      A few buckets of sand also contain more elements than there are synapses in the human brain. They clearly lack any structure, and so are computationally stuck at a big fat '0' no matter what you do with the sand (unless you want to turn it into a giant abacus or turn it into integrated circuits).

      Where this chip sits on the scale between a few buckets of sand and an actual working human brain is not so much a function of its structure as of what it does. The brain's finer structures are so complex that even modelling a few neurons crushes that complexity down by so many orders of magnitude that we can only hope we are not accidentally throwing out the useful bits in the simulation.

      That's a roundabout way of saying that I think that:

      > We are finally approaching the same order of magnitude of complexity in structure.

      Is not necessarily true in a way that is relevant. The parts count or basic interconnection may have nothing to do with how the brain is connected internally, nor with how it functions at the lowest levels.

      • Veedrac 1682 days ago
        > A few buckets of sand also contain more elements than there are synapses in the human brain.

        Actually you'd need about a thousand buckets.

        https://www.quora.com/How-many-grains-of-sand-could-fit-in-a...

        ---

        Ultimately I think these numbers do matter. As chips approach the scale of human brains, while running perhaps 1,000,000,000x faster, you remove the question of engineering and scale, and are left with just research.
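
        A rough sensitivity check on that figure, if anyone wants to redo it (bucket size, packing density and grain size are all assumptions here; the point is just that the answer swings by orders of magnitude with grain size):

          # Buckets of sand needed to match ~1e14 synapses, as a function of grain size.
          # Assumptions: a 10-litre bucket and ~60% packing density.
          import math

          bucket_m3, packing, synapses = 0.01, 0.6, 1e14
          for grain_diameter_mm in (0.06, 0.1, 0.5):
              r = grain_diameter_mm / 2 * 1e-3                    # grain radius in metres
              grains_per_bucket = bucket_m3 * packing / ((4 / 3) * math.pi * r ** 3)
              print(grain_diameter_mm, "mm ->", round(synapses / grains_per_bucket), "buckets")
          # From a couple of thousand buckets for very fine sand up to about a million for coarse sand.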

        • azinman2 1682 days ago
          We don’t know which numbers are the ones we need to care about. How much do chemical gradients matter? Inhibitors? Hormones? What other factors are at play?

          I read so many people who have simplified views of both DNNs and real neurons, and who get overly excited about the possibility of somehow getting to AGI just by having enough transistors. There was similar thinking in the 1950s when proto-AIs could play chess.

          Cognition is extraordinary. The theory of mind encompasses many important features that we have no idea how to approach.

          To say all that’s left is research doesn’t really say much at all. It’s like saying that for teleportation all that’s left is research since we have lots of bandwidth, or that for month-long battery life all that’s left is research since we have phones ready to take the batteries.

          The gap between our knowledge of the brain and what’s being applied to machine learning is enormous.

          • Veedrac 1682 days ago
            Brains run on computation. Computation is fungible. The highest bandwidth signal in the brain is through neuron spiking, and there isn't much space for other mechanisms to hide. Take a simple 10 virtual neurons:1 synapse ratio as a rough upper bound of how much computation the brain can be doing, and by necessity the rest is architecture.

            This is not at all like teleportation. It is absurd to claim numbers much larger than my own. 100:1? Where would the computation be? It is completely unnecessary to claim that the brain must have vast amounts of hidden capacity to do what it does, rather than the secret being in the ‘software’, and it goes against most of what we know about the brain, its ancestry, and computation.

            This is not like teleportation. The brain isn't magic. I don't have to claim to know how the brain works to point this out.
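
            To cash out that 10:1 bound as a number, a sketch (every input is a rough estimate or an outright assumption, so read the result as an order of magnitude at best):

              # Back-of-envelope upper bound on the brain's useful "ops per second"
              # under the 10-virtual-units-per-synapse assumption above.
              synapses          = 1e14  # commonly cited rough figure
              mean_spike_rate   = 10    # Hz; a generous average firing rate (assumption)
              units_per_synapse = 10    # the 10:1 ratio used as an upper bound above

              print(f"~{synapses * mean_spike_rate * units_per_synapse:.0e} ops/s")  # ~1e16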

            • ypcx 1682 days ago
              For the in-brain computation, this may be true. However, there may be non-local quantum effects emergent from the functioning of this massively parallel biological neural network apparatus, and information could be received and transmitted outside of the five senses. Whether this information or functionality is necessary for basic functioning of the brain, we don't know yet. I'm guessing it is, but I'm also guessing that we can supplant it with a little more computation.
              • AstralStorm 1681 days ago
                And that butterfly just caused a tornado that killed the original poster.

                Quantum effects, while nonlocal, are of small magnitude; not even microtubules are sensitive enough for this at the temperatures the brain runs at.

                It is nigh certain that the mind does not depend on effects of this magnitude. There's way too much chaos inside even a single neuron, much less a whole network.

            • colordrops 1682 days ago
              Is anything magic? It doesn't really matter when a thing is so inscrutable that it is indistinguishable from magic, especially if you believe that your personal subjective experience is 100% the result of brain function.
              • Veedrac 1682 days ago
                Maybe it's best to wait until after we've hammered away at the problem on warehouses of human-scale hardware for 50 years before calling it inscrutable? The paper that put neural networks on GPUs is ten years old. I get that the problem looks hard, but so do lots of things.
            • ohum 1682 days ago
              Please don’t take this as a personal attack. Your comment and certainty sound like religious fanaticism. The belief that consciousness is merely computation is unproven, and a matter of faith. It always seems a little short-sighted when people claim to know something about consciousness and the mind based on a belief that the brain amounts to a biological computer. As an example of an alternative possibility to the brain as a biological computer: in a holographic universe the brain itself is an emergent system. Or, Buddhists say that the material world as we perceive it is interdependent with mind, which is more fundamental than form. Some interesting phenomena that present problems for the mind as a program running on a biological computer are the placebo/nocebo effects and Wim Hof :). Mind over matter, man.
              • The_rationalist 1682 days ago
                Qualia (consciousness, feelings, senses) might be impossible to create through transistors. But it does not matter; we can build something functionally equivalent in a computable way. This is called a philosophical zombie: https://en.m.wikipedia.org/wiki/Philosophical_zombie
                • ohum 1682 days ago
                  A computing machine (not taking environmental factors like cosmic rays into account (though, why not?)) is deterministic, unless it interfaces with a truly random source of entropy. My physicist friend has said, there may be no randomness, only complexity.

                  Is all deterministic or do conscious entities have free will, or...?

                  The decisions made by your philosophical zombie will either be predetermined by the programming, or seeded by randomness. The decisions made by such a machine cannot be proved to be functionally equivalent to the decisions made by a conscious entity, and belief in such is a matter of faith.

                  Belief in randomness itself is a matter of faith. It’s impossible to prove that any event or measurement is truly random, and yes, there may be hidden variables; there’s no way to test.

                  There is no rational reason to believe that conscious behavior, consciousness, feelings, and all the activity of the mind, are functionally equivalent to a program and/or randomness.

                  Sounds like you are a believer in the religion of materialism.

                  • lostmsu 1682 days ago
                    > There is no rational reason to believe that conscious behavior, consciousness, feelings, and all the activity of the mind, are functionally equivalent to a program and/or randomness.

                    On the contrary, the number of tasks that a human can do but a machine can't is shrinking. That recently started to include art.

                  • The_rationalist 1682 days ago
                    Is everything deterministic, or do conscious entities have free will? Free will vs determinism is a fun debate for beginners in philosophy. But it is not really a debate; the answer is obvious once the question is asked seriously and in a well-defined manner.

                    Firstly, you will agree with all scientists on earth that the world, that matter, is causal and totally predictable in each of its properties through calculation. https://en.m.wikipedia.org/wiki/Causality_(physics) Something contra-causal has never been observed in all of human history. The conception that a human (made of matter, after all) is totally deterministic is consistent and explains human behavior. It is epistemologically weak to ask for an alternative (free will) that needs another premise.

                    Let's define free will: "Some conceive free will to be the capacity to make choices in which the outcome has not been determined by past events." This means the free choice does not come from the past; it comes from nowhere. To make such a thing possible, a brain would need to create a primary cause (sorry, the English Wikipedia page does not yet exist: https://fr.m.wikipedia.org/wiki/Cause_premi%C3%A8re). A primary cause is a cause that comes from nowhere and creates consequences and causes determined by it. Only a primary cause has "free will power". So are there primary causes in the universe? Happily not, otherwise science would be broken and the world chaotic and unpredictable. There is one true candidate for a primary cause: the big bang. To think that a brain is able to break the laws of physics and create a primary cause is maybe not impossible, but it is highly unlikely, totally unscientific, and mostly religious.

                    Many things can be asked about this premise: When does it start? Does a bacterium have this power? An ant? A fish? A cat? Or only humans? Because language allows free will? Or because language allows sufficient complexity to hide the deep causes behind our choices and create an illusion of choice? So free will has no explanatory power, determinism is the most empirically observed thing there is, and supposedly only humans have it (because it is trivial to model mammals deterministically), which adds even more ad-hoc-ness. Another question: what is the frequency of free will in a human? Are 80% of your thoughts free? Or pi%? More ad-hoc-ness; the more frequent you believe free will to be, the weaker your belief is epistemologically.

                    But even after all this, let's say that free will exists. What does it change? Can it allow things that determinism cannot? Firstly, free will is NOT free. Imagine you ask me to solve a problem, and knowledge about A is necessary to find a solution. If I don't know A and it is not deducible from my prior knowledge, then I can't solve the problem, free will or not. My past is insufficient, and a primary cause cannot give me knowledge (sadly ;)). Human choices and thoughts are bounded by knowledge. They are also bounded by fluid intelligence.

                    Humans have two goals in their lives: maximize their own happiness and maximize others' happiness. It is the only meaningful thing in life (qualia). So it is important to understand that for each situation in life there is an optimal choice, and maybe some competing choices that are no better. The goal of a human is to find the most optimal choice, which requires a lot of knowledge and a lot of rationality training, e.g. learning about cognitive biases, logical fallacies, skepticism, the scientific method, more words, etc. The more erudition and rationality you gain, the fewer choices look as interesting as the most optimal one you know of. The ideal omniscient person has no choice to make; she knows which choice is optimal. A choice is only a choice when you don't know the consequences well enough and so you make a bet (which can, e.g., maximize risk/return or minimize it). Once you know the optimal choice, what does free will mean? It means the freedom to make an irrational, worse choice than the optimal one. Such a useless concept (when you know the optimal one).

                    Free will would only apply to a list of choices when you cannot rank any of them; you don't know whether any choice from the list is better than any other. In such a (rare) situation, free will would allow true randomness of choice. What's the point? And more than that, it's refuted. Humanity has built pseudo-random number generators that are by far good enough, and if you wanted to make a random choice from such a list, you would be better off using a PRNG! Because if we empirically measured your supposedly random choices, they would not be random at all. A common demonstration is rock-paper-scissors, where people have great difficulty not repeating patterns.

                    So I have shown 1) that free will is one of the epistemologically weakest possible beliefs on earth, 2) that it has no explanatory power (it does not explain even one thing that determinism cannot), and 3) that it necessarily reduces to true randomness, which would be useless (cf. "what's the point") and is empirically shown to be wrong. Sadly, when you have held a strong belief for a long time, with emotional attachment, reading a sound refutation does not let you change that belief. You probably do not have the cognitive freedom to now think "free will doesn't exist, and if it did it would be useless" (see the irony about freedom?)[1], because human brains are buggy. Firstly, when untrained we are unable to see logical fallacies (in others and in ourselves): https://en.m.wikipedia.org/wiki/List_of_fallacies

                    And we cannot see our own cognitive biases (https://en.m.wikipedia.org/wiki/List_of_cognitive_biases), which is itself a bias, called the bias blind spot.

                    [1] This applies less, or not at all, if you are trained in rationality. Lesswrong.com, Rationalwiki.org and Wikipedia are good places to start.

                    "Sounds like you are believer in the religion of materialism." You got me! Well to be precise, materialism is now called physicalism and it is synonymous to be a believer of the scientific method. I'll take that as a compliment. But it is anthitetic to" religion", Popper would have a heart attack reading your sentence ^^ You know what, you are a physicalist too even if you ignore it. All human progress is driven by science, it would be time to recognize that.

        • jacquesm 1682 days ago
          Using fine sand I make that out to be about 6 buckets, give or take. Besides, this isn't about whether it is 6, 10 or even 100 (or 1000) buckets; it is about how the number itself does not make a huge difference. Before, if you wanted 1.2T transistors you needed 50 or so regular large dies, and presumably there have already been dies with more than the 21 billion transistors I'm referring to here (the NVidia 5K-core chip).

          If that was all it took then we'd have had 'human brains' in a box long ago, after all, 50 GPUs is something that isn't all that rare in plenty of institutions.

          • Veedrac 1682 days ago
            There are 100 trillion synapses in the brain.

            I'm not claiming Cerebras is human scale; I think it's a factor of 100-1000 off personally. I'm just saying I don't think the comparison is meaningless.

            • jacquesm 1682 days ago
              Time will tell. I personally put the time between now and AGI well into the decades, possibly more than a few centuries. This is not a mere matter of engineering, it is one of fundamental understanding, of which we have likely achieved only very little.

              Imagine seeing the wiring diagram of an alien computer with parts roughly 6 orders of magnitude smaller than the ones you are familiar with, without knowing the contents of its memory, and then being expected to make a working copy. Good luck.

    • neural_thing 1682 days ago
      I have written a book about the true computational power of pyramidal neurons. Spoiler: they are A LOT more powerful than transistors. The book is free @ http://www.corticalcircuitry.com/
      • thelittleone 1682 days ago
        I'm a few pages in and have found it fascinating, humorous, compelling and understandable even with zero background in the field. Thanks for sharing.
      • christoph 1682 days ago
        I made the mistake of starting to read this in bed last night and got totally engrossed. Looking forward to finishing it off over the next few days. Many thanks for writing this and sharing it.
      • plainOldText 1682 days ago
        I have started reading the first two chapters of your book and I really like it. It's well written and easy to understand. If it continues to be this wonderful for the remaining chapters I will buy it as well. Thanks for writing it.
    • jdsully 1682 days ago
      I haven’t seen a perceptron circuit done in fewer than 4 transistors plus a few diodes. However, these chips are almost certainly digital and will use a lot more to perform the floating point math.

      Single transistors can’t even compute our simplified model of a neuron let alone the complexities of the real thing.
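
      For reference, the "simplified model" in question is just a weighted sum pushed through a threshold; a minimal sketch (the weights below are arbitrary, chosen only to realise an AND gate):

        # A perceptron: weighted sum of inputs plus a bias, then a hard threshold.
        def perceptron(inputs, weights, bias):
            activation = sum(x * w for x, w in zip(inputs, weights)) + bias
            return 1 if activation > 0 else 0

        # Example: a 2-input AND gate.
        print(perceptron([1, 1], [0.6, 0.6], -1.0))  # -> 1
        print(perceptron([1, 0], [0.6, 0.6], -1.0))  # -> 0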

      • mlyle 1682 days ago
        If we're going to talk about net computational power per unit (neuron/transistor), we also need to note that single transistors do stuff a whole lot faster than biological neurons, too. If we had neurons that were 1,000x faster, could we use fewer of them for the same task?
        • jacquesm 1682 days ago
          Well, we can turn that argument around: if an artificial brain could be faster than a real one, does that mean that right now we have slow artificial brains that are just as good as the real ones? The answer being an obvious 'no' may help answer your question.
          • mlyle 1682 days ago
            Just because we don't know a program that creates a decent artificial brain doesn't mean that such a program doesn't exist for the hardware we have.

            We're almost certainly going to be able to make speed/size trade-offs in AGI, because we're used to making them in all other computing.

            Even biological systems make these types of trade-offs. We're beings optimized for the environment we evolved in, not optimized for "smarts". There are many "choices" in energy balance, reaction time, etc., that have been made.

            Transistors are really powerful and fast computing devices; really our use of them in digital logic in some ways greatly under-utilizes them for ease in design and predictability. Saying that it takes 4 transistors to make a really simple neuron analog in a circuit undersells that those 4 transistors can operate many, many orders of magnitude faster.

            • jcims 1682 days ago
              Faster than what? There are a hundred trillion atoms in a neuron operating in a coordinated fashion to create its global behavior in the context of the brain. How many interactions between those components occur in a billionth of a second? Many.
          • tlb 1682 days ago
            It would be very hard to tell if an AGI running 1000x slower than a brain were intelligent. Human babies don’t talk for about a year, but nobody has tried firing up a putative slow AI and waiting for 1000 years.
            • jacquesm 1682 days ago
              Then let's run 1000 of them in parallel, right?

              You're on to the exact point I'm trying to make. The whole idea that 'just' complexity or 'just' speed is enough to make something intelligent that is not yet intelligent is off-base to such a degree that even the metaphors we reach for to explain why break down.

              I liken intelligence to life: once it exists it is obvious and self-perpetuating, but before it exists it is non-obvious, and no amount of fantasizing about what it would take to make it will get you incrementally closer until one day you've got it. But that day will start out just like the thousands of days before it. It's a quantum leap, a difference in quality, not an incremental one. Yes, enough orders-of-magnitude change can turn a quantitative change into a qualitative one. But so far the proof that this holds for intelligence is eluding us. Maybe we're just not intelligent enough, in the same way that you can't 100% describe a system from within the system.

              • tlb 1682 days ago
                But it doesn't parallelize. Watching 1000 babies for their first 0.365 days isn't going to reveal their potential.

                More practically, deep learning experiments were tried in the 80s, but the results weren't encouraging. If they'd left the experiments running for a sufficiently long time, they might have gotten great results. But they gave it a few days, saw the loss rate plateau and hit ^C, and that was that.

                Most likely, the early deep learning experiments had some parameters wrong. Wisdom about choosing parameters only came once people could run multiple experiments in a week. So it's likely the same with some hypothetical AGI. Early experiments will have some parameters wrong, and it'll either go into seizures or catatonia. We won't be able to get the parameters right until we can run experiments faster than human brains. Say, simulating 5 years of life in a week, or 250x real time.

                • jacobush 1682 days ago
                  Wow, that is a thought I didn't have before. Thanks!

                  Still, we wouldn't even know what real time is, or what the "x" is, rather, until after we have achieved AGI. Also a mind bender.

                  • tlb 1682 days ago
                    Right, we have no idea how much computation it takes. Discoveries usually start with a less efficient algorithm, and then the performance gradually improves once we know what it needs to do. We might need 1000x the compute power to discover something as it'll eventually need once we deploy it. So I'm a big fan of ridiculously fast hardware like this article describes.
                • jacquesm 1682 days ago
                  Yes, we are indeed in violent agreement.
              • mlyle 1682 days ago
                If it's possible to implement AGI on a computer, a 4004 with enough RAM attached can do it... very slowly. So the speed/size trade-off is implicit unless we take a pseudo-spiritualist view that computing machines just can't be intelligent.

                There's two big options out there: either it requires much, much more computation than we can readily employ even to do it relatively "slowly", or we don't know how.

                The former has a little bearing on the balance between a transistor's computational capabilities and a neuron's, but even so is largely orthogonal.

                As to parallelism, it doesn't work that way. If you get nine women pregnant, you don't get a baby per month. If we've made a working but far-too-slow-for-us-to-realize-it AGI, making more of them doesn't help us understand the problem.

                • jacquesm 1682 days ago
                  > we don't know how

                  Exactly.

              • crististm 1682 days ago
                I think that part of the idea of adding more speed and data to solve the AI problem started with Norvig's presentations on how more data resolved the search problem better than using more clever algorithms.

                People are using the 'more data' approach because it makes a (minor) dent at the problem. It's the only tool they have right now.

                It is my opinion that it is not enough to make us smarter in understanding the quantum leap necessary.

        • trott 1682 days ago
          > If we had neurons that were 1,000x faster, could we use fewer of them for the same task?

          Absolutely. In fact, we do: in batched processing (pervasive in deep learning today), CNNs, and other weight-sharing schemes (e.g. transformers), the same "neurons" get re-used many times.
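
          A minimal illustration of that reuse (plain NumPy, no framework; it only shows one small set of weights being applied at every position of every item in a batch):

            # One shared 5-tap filter ("neuron") reused across a whole batch of sequences.
            import numpy as np

            batch = np.random.randn(32, 100)   # 32 sequences, 100 positions each
            kernel = np.random.randn(5)        # a single set of 5 shared weights

            outputs = np.stack([np.convolve(seq, kernel, mode="valid") for seq in batch])
            print(outputs.shape)               # (32, 96): the same 5 weights fired 32*96 times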

          • mkl 1682 days ago
            Those aren't neurons, though. Biology is different, synaptic connections have to physically fit, etc. Individual neurons serve multiple purposes, but I don't think we know if their speed is a limiting factor in that.
    • berdon 1682 days ago
      You're mixing terminology.

      A neuron is not a synapse.

      - There are ~100 billion neurons in the human brain

      - There are 100-1000 trillion synapses in the human brain

      There's strong evidence to suggest that each synapse is more akin to a digital perceptron than a neuron is. Synaptic cleft distance, transmitter types, dendritic structure, reuptake, etc are all factors that can allow for some level of long-term storage and mediation of subsequent neural activation.

      [Edit] Maybe not "mixing" but seemingly comparing apples and oranges.

    • Simon321 1682 days ago
      It takes around 15-20 transistors to make a simple neuron simulation:

      https://www.quora.com/How-many-transistors-can-be-used-to-re...

      But: "Synapses are usually separately modeled in transistors (they are not part of the neuron circuits described above) and dramatically add to the transistor count."

      "[A neuron] has on average 7000 synaptic connections to other neurons."

      So we're not there yet, but it's definitely an improvement!
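
      Plugging those figures in gives a sense of how far off we are (the transistors-per-synapse number below is a pure assumption; synapse circuits vary a lot between designs):

        # Very rough transistor budget for a brain-scale circuit; all inputs are estimates.
        neurons                 = 8.6e10  # ~86 billion
        synapses_per_neuron     = 7000    # figure quoted above
        transistors_per_neuron  = 20      # upper end of the 15-20 range above
        transistors_per_synapse = 10      # assumption for illustration only

        total = neurons * (transistors_per_neuron + synapses_per_neuron * transistors_per_synapse)
        print(f"~{total:.1e} transistors, roughly {total / 1.2e12:.0f} of these 1.2T-transistor wafers")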

  • ChuckMcM 1682 days ago
    Wafer scale integration, the stupid idea that just won't die :-)

    Okay. I'm not quite that cynical, but I was avidly following Trilogy Systems (Gene Amdahl started it to make supercomputers using a single wafer). Conceptually awesome, in practice not so much.

    The thing that broke down in the '80s was that different parts of the system evolve at different rates. As a result your wafer computer was always going to be sub-optimal at something, whether it was memory access or a new I/O channel standard. Changing that part meant all-new wafers, and fab companies knew that every time you change the masks you have to re-qualify everything. Very time consuming.

    I thought the AMD "chiplet" solution to making processors that could evolve outside the interconnect was a good engineering solution to this problem.

    Dave Ditzel, of Transmeta fame, was pushing at one point a 'stackable' chip. Sort of 'chip on chip' like some cell phone SoCs have for memory, but generalized to allow more stacking than just 2 chips. The problem becomes getting the heat out of such a system as only the bottom chip is in contact with the substrate and the top chip with the case. Conceptually though, another angle where you could replace parts of the design without new masks for the other parts.

    I really liked the SeaMicro clusters (tried to buy one at the previous company but it was too far off axis to pass the toy vs tool test). Perhaps they will solve the problems of WSI and turn it into something amazing.

    • Smerity 1682 days ago
      Given that these chips are purely meant for machine learning, the issue that "your wafer computer was always going to be sub-optimal at something" matters less now than in traditional scientific programming setups, especially those of the Trilogy / Transmeta days.

      You have silicon in place to deal with physical defects at the hardware level.

      You have backprop / machine learning in place to deal with physical deficiencies at the software level.

      The programmer operates mostly at the objective / machine learning model level and can tweak the task setup as needed softly influenced by both the hardware architecture (explicit) and potential deficiencies (hopefully implicit).

      The most extreme examples I've seen in papers or my own research: parts of a model left accidentally uninitialized (i.e. the weights were random) not impacting performance enough to be seen as an obvious bug (oops), tuning the size and specifics of a machine learning model to optimize hardware performance (i.e. the use of the adaptive softmax[1] and other "trade X for Y for better hardware efficiency and worry about task performance separately"), and even device placement optimization that outperforms human guided approaches[2].

      Whilst the last was proven across equipment at a Google datacenter with various intermixed devices (CPUs, GPUs, devices separated by more or less latency, models requiring more or less data transferred across various bandwidth channels, ...) it's immediately obvious how this could extend to optimizing the performance of a given model on a sub-optimal wafer (or large variety of sub-optimal wafers) without human input.

      Whilst reinforcement learning traditionally requires many samples, that's perfectly fine when you're running it in an environment with a likely known distribution and can perform many millions of experiments per second testing various approaches.

      For me, working around wafer-level hardware deficiencies with machine learning makes as much or more sense than MapReduce did for working around COTS hardware deficiencies. Stop worrying about absolute quality, which is nearly unattainable anyway, and worry about the likely environment and task you're throwing against it.

      [1]: https://arxiv.org/abs/1609.04309

      [2]: https://arxiv.org/abs/1706.04972

    • samstave 1682 days ago
      I was being facetious:

      I was working right next to Andy Grove and other legendary CPU engineers at Intel's bldg SC12....

      I was young and making comments about "why can't we just do this, or that"

      ---

      it was my golden era.

      Running the Developer Relations Group (DRG) game lab with my best friend Morgan.

      We bought some of the first 42" plasma displays ever made.... and played quake tournaments on them...

      We had a T3 directly to our lab.

      We had the first ever AGP slots, the first test of the unreal engine...

      We had SIX fucking UO accounts logged in side-by-side and ran an EMPIRE in UO.

      We would stay/play/work until 4am. It was fantastic.

      ---

      We got the UO admins ghosting us, wondering how we were so good at the game (recall, everyone else had 56K modems at best... we were on a T3 at fucking Intel...)

      We used to get yelled at for playing the music too loud....

      ---

      Our job was to determine if the Celeron was going to be a viable project by playing games and figuring out if the SIMD instructions were viable... to ensure there was a capability to make a sub-$1,000 PC. Lots of people at Intel thought it was impossible...

      GAMES MADE THAT HAPPEN.

      Intel would then pay a gaming/other company a million dollars to say "our game/shit runs best on Intel Celeron Processors" etc... pushing SIMD (hence why they were afraid of Transmeta and AMD -- since AMD had already won a lawsuit that required Intel to give AMD CPU designs from the past)....

      This was when they were claiming that 14nm was going to be impossible...

      What are they at now?

    • p1necone 1682 days ago
      > Dave Ditzel, of Transmeta fame, was pushing at one point a 'stackable' chip. Sort of 'chip on chip' like some cell phone SoCs have for memory, but generalized to allow more stacking than just 2 chips. The problem becomes getting the heat out of such a system as only the bottom chip is in contact with the substrate and the top chip with the case. Conceptually though, another angle where you could replace parts of the design without new masks for the other parts.

      I'm imagining alternating between cpus and slabs of metal with heatpipes or some sort of monstrous liquid cooling loop running through them.

      • ChuckMcM 1682 days ago
        Actually, diamond is a really good heat conductor. Dave's big idea was that the layer of a chip actually needed to implement the transistors is quite thin (think nanometers thin), and that if you put one of these on top of another, you could use effects like electron tunneling to create links between the two, and even bisected transistors (where the gate is on one of the two and the channel is on the other).

        So here is an article from 4 years ago on diamond as a substrate: https://www.electronicdesign.com/power/applications-abound-s... which talks about how great it is for spreading heat out. And one thought was that you would make a sandwich of interleaved chip slices and diamond slices, still thin enough to enable communication between the two semiconductor layers without having to place solder balls between them.

        In that scenario the package would have an outer ring that would clamp onto the diamond sheets to pull heat out of everything to the case.

        Of course the mass market adopted making chips really thin so that you can make a sleek phone or laptop. Not really conducive to stacks in the package. Perhaps that was the thing that killed it off.

    • samstave 1682 days ago
      Haven't heard of Transmeta in a LONG time.

      I recall when I was at intel in 1996 and I used to work a few feet from Andy Grove... and I would ask naive questions like

      "how come we cant stack mutiple CPUs on top of eachother"

      and make naive statements like:

      "When google figures out how to sell their services to customers (GCP) we are fucked" (this was made in the 2000's when I was on a hike with Intels then head of tech marketing, not 1996) ((During that hike he was telling me abt a secret project where they were able to make a proc that 48 cores)) (((I didnt believe it and I was like "what the fuck are they going to do with it)))?? -- welp this is how the future happens. and here we are.

      and ask financially stupid questions like:

      "what are you working on?" response "trying to figure out how to make our ERP financial system handel numbers in the billions of dollars"

      I made a bunch of other stupid comments... like "Apple is going to start using Intel procs" and was yelled at by my Apple fan boi: "THAT'S NEVER GOING TO FUCKING HAPPEN"

      But im just a moron.

      ---

      But transmeta... there was a palpable fear of them at intel at that time....

      • foobiekr 1682 days ago
        A lot of the transmeta crowd, at least on the hardware side, went on to work at a very wide variety of companies. Transmeta didn't work, and probably was never going to work, as a company and a product, but it made a hell of a good place to mature certain hardware and software engineers, like a VC-funded postdoc program. I worked with a number of them at different companies.
        • PopeDotNinja 1682 days ago
          I was at Transmeta. It was good for my career!
          • samstave 1682 days ago
            So what specifically are you doing now??
            • PopeDotNinja 1682 days ago
              I was a QA Engineer at Transmeta, got an MBA, work in tech recruiting for several years, and now I'm a software engineer.
            • skavi 1682 days ago
              click on their profile
    • petra 1682 days ago
      New improvements (Zeno Semi) do talk about 1T SRAM, and a DRAM chiplet solution will have the same limits in memory size as wafer scale, so maybe the SRAM vs DRAM gap will be close enough?

      And as for I/O, maybe it's possible (thermally) to assemble this on an I/O interposer?

    • nickpsecurity 1682 days ago
      When I speculated that the brain is partly a general-purpose analog computer, this wafer-scale project was one of the ones I found while digging for research along those lines:

      http://web1.kip.uni-heidelberg.de/Veroeffentlichungen/downlo...

      Pretty neat. I don't know how practical.

    • solotronics 1682 days ago
      For cooling, maybe a phase-change liquid such as Novec would work; then you would just need each chip to be exposed to the Novec, and your packaging could be much smaller without heatsinks.
  • program_whiz 1682 days ago
    An amazing stride in computing power. I'm not convinced that the issue is really about more hardware. While more hardware will definitely be useful once we understand AI, we still don't have a fundamental understanding of how AGI works. There seem to be more open questions than solved ones. I think what we have now is best described as:

    "Given a well stated problem with a known set of hypotheses, a metric that indicates progress towards the correct answer, and enough data to statistically model this hypothesis space, we can efficiently search for the local optimum hypothesis."

    I'm not sure doing that faster is going to really "create" AGI out of thin air, but it may be necessary once we understand how it can be done (it may be an exponential time algo that requires massive hardware to execute).
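
    Concretely, that description is more or less local search / gradient descent on a fixed metric; a toy sketch of the loop being described (hypothetical loss, nothing domain-specific):

      # "Search for the local optimum hypothesis": follow a progress metric downhill.
      def loss(w):              # the metric that indicates progress
          return (w - 3.0) ** 2 + 1.0

      def grad(w):
          return 2.0 * (w - 3.0)

      w = -10.0                 # initial hypothesis
      for _ in range(200):
          w -= 0.1 * grad(w)    # small step in the direction the metric improves
      print(w, loss(w))         # converges to the (here also global) optimum at w = 3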

    • narrator 1682 days ago
      Exponential-time algos aren't really practical on any hardware except quantum computers, and even then only for algorithms that benefit from quantum speedup, such as the quantum Fourier transform[1]. The quantum Fourier transform goes from O(n·2^n) on a classical computer to O(n^2) on a quantum computer.

      [1] https://en.wikipedia.org/wiki/Quantum_Fourier_transform
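
      To get a feel for that gap, one can just plug n into the two complexity expressions (illustrative operation counts only, not real gate counts):

        # Classical O(n * 2^n) vs quantum O(n^2) scaling for the Fourier transform on n qubits.
        for n in (10, 20, 30, 40):
            print(f"n={n:2d}  classical ~ {n * 2**n:.1e}  quantum ~ {n**2}")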

    • yters 1682 days ago
      If AGI is a halting oracle, then hardware performance is irrelevant.
    • gridlockd 1682 days ago
      What's there to understand? We know how intelligence "happened". We just need to build a few million human-sized brains, attach the universe, and simulate a few billion years of evolution. Whatever comes out of that could just be called "AGI" by definition.
      • sanxiyn 1682 days ago
        I mean, say, photosynthesis also happened by evolution. There is still a lot to understand about photosynthesis, and we do understand it now, although it took decades of research.
        • gridlockd 1681 days ago
          Sure, but we understand enough so that we can recreate photosynthesis in a lab. There are many drugs which do work, but we don't know exactly how they work.

          The point is, it's likely not a necessary precondition to understand AGI in order to create it.

      • yters 1682 days ago
        We don't know that's how intelligence happened. That's just speculation based on materialist assumptions. Intelligence is most likely immaterial, i.e. abstract concepts, free will, mathematics, consciousness, etc. In which case, it's beyond anything in this physical universe.
        • visarga 1682 days ago
          > it's beyond anything in this physical universe

          Your way of thinking leads to dualism, which has been proven to be a bad approach to the problem of consciousness. Dualism doesn't explain anything, just moves the problem outside of the 'material' realm into fantasy lala-land.

          • yters 1681 days ago
            Hey, if materialism cannot explain reality, then why stick with a bad hypothesis? Sticking your fingers in your ears and calling any alternative 'lala-land' sounds pretty anti-intellectual.
            • visarga 1680 days ago
              I am not a materialist, I am a physicalist. And la-la land is just a fitting metaphor for thinking that conscious experience is explained by a theory that can't be proven or disproven. If you think your consciousness is received by the brain-antenna from the macrocosmic sphere of absolute consciousness (or God, or whatever you call it), then remember your mother and father, and the world that supported you and fed you experiences. They are responsible for your consciousness. Your experience comes from your parents and the world, and you are missing the obvious while embracing a nice fantasy. And if you don't make babies you won't extend your consciousness past your death. Your parents did, and here you are, all conscious and spiritual.
              • yters 1679 days ago
                How do you prove/disprove physicalism?
        • gridlockd 1681 days ago
          > Intelligence is most likely immaterial, i.e. abstract concepts, free will, mathematics, consciousness, etc. In which case, it's beyond anything in this physical universe.

          For the purposes of my argument, intelligence is a set of behaviors that would be described as "intelligent" by our contemporaries. I have no use for an inscrutable philosophical definition of intelligence, consciousness, free will, or any of that stuff.

          I'm convinced that these behaviors arose as a result of natural physical processes and that if we were to reproduce them, we would "most likely" receive similar results. We can already observe this process by simulating simpler life forms[1].

          Of course that's speculation, but it has better foundations than "it's beyond anything in this physical universe". That's the scientific equivalent of "giving up".

          [1] http://openworm.org/

          • yters 1681 days ago
            Why is that the same as 'giving up'? I call it a better hypothesis. It's like saying there is a halting problem and there are problems that are fundamentally unsolvable by computers. If true, it's pointless trying to create an algorithm to solve the halting problem. Instead, perhaps the mind is a halting oracle, and we can progress even further in the sciences by that realization.
            • gridlockd 1681 days ago
              > Why is that the same as 'giving up'? I call it a better hypothesis.

              It's not a hypothesis, because it doesn't explain anything. It's also not falsifiable, so it is not useful for the purposes of science.

              > It's like saying there is a halting problem and there are problems that are fundamentally unsolvable by computers

              The halting problem is a rigorously defined mathematical problem which has a mathematical proof that demonstrates that it is indeed unsolvable. That's entirely different from saying that "there might be such a thing as a halting problem and it's probably unsolvable".

              Furthermore, the halting problem is abstract from physical reality. It is true regardless of anything observed in physical reality. In physical reality, there is no such thing as a program that will not terminate, because there is no such thing as a computer that runs forever. The "physical halting problem" is solvable. The answer is always: The program will halt at some point because of thermodynamics.

              Similarly, it is my position that the physical phenomenon of intelligence can be entirely explained by natural processes involving no metaphysics whatsoever.

              > Instead, perhaps the mind is a halting oracle, and we can progress even further in the sciences by that realization.

              Perhaps it is, but there is no such realization. You haven't even defined what you mean by intelligence and why that would even require a non-materialist explanation.

              • yters 1680 days ago
                Well then, how is materialism falsifiable? If it isn't, how is it a scientific hypothesis?
                • gridlockd 1680 days ago
                  > Well then, how is materialism falsifiable? If it isn't, how is it a scientific hypothesis?

                  It isn't a scientific hypothesis.

                  You use the word "materialism" to segregate yourself philosophically, but I haven't made a philosophical argument.

                  Let's suppose your "hypothesis" is true, but given that it is beyond physical observation, we can never hope to know that it is true or even know the probability of it being true. Then what would be the benefit in assuming that it is true?

                  We could conceivably save some time on trying to figure out AGI and instead do something "more productive". However, that's true of almost any endeavor. Anything you do in life has a chance of failure and an opportunity cost. Just imagine what could have become of us, had we not bothered to write HN comments!

                  • yters 1679 days ago
                    Ok, then if materialism isn't a scientific hypothesis, why is it scientific?
                    • gridlockd 1677 days ago
                      I don't know what "materialism is scientific" would mean exactly. I never made that claim, so don't ask me to defend it.

                      Remember, you are the one who wants to fight this "materialism vs. dualism" battle, not me. I'm not saying your position is wrong. I'm saying your position seems inconsequential at best, but harmful at worst.

                      To illustrate this, let's take a hypothetical "dualist" explanation for disease:

                      "Disease is caused by spirits that attach to our bodies, but we cannot observe these spirits by any physical means and we can never interact with them in any distinguishable way."

                      In other words, we cannot perform experiments, we cannot learn anything, so these spirits might as well not exist at all. There is no practical difference.

                      Now that's all fine and well until you realize that disease is actually caused by physical processes that you can understand and intervene with. Had you been convinced that it was spirits all along, you wouldn't have bothered trying to figure out the physical processes.

                      Today this example may sound ridiculous, but for most of history similar convictions held people back from turning quackery into real medicine.

                      • yters 1677 days ago
                        And on the materialist side we have similar quackery with phrenology, bogus medicines, Darwinian evolution, and the like. There is quackery everywhere. The fact that one can concoct a false explanation within either materialism or dualism means such an example doesn't help determine which paradigm is better.

                        The important question is whether one paradigm allows us to explain reality better than the other, and dualism does this very well. There is no materialistic explanation for consciousness, free will, abstract thought or mathematics. Instead, those who follow materialism strictly end up denying such things, and we end up with incoherent and inaccurate theories about the world.

                        • gridlockd 1677 days ago
                          > And on the materialist side we have similar quackery with phrenology, bogus medicines...

                          Yes, but the key distinction here is that phrenology or bogus medicine are testable. If you claim phrenology predicts something but then statistics show that this isn't the case, phrenology is proven wrong. It's more difficult with medicine, but it's at least possible.

                          Furthermore, we know that many of our "materialist" theories in physics are wrong. We know it because they disagree with experiment. However, they still have enough predictive power to be very useful.

                          > ...Darwinian evolution, and the like.

                          I'm not sure what you mean here. The principles behind Darwinian evolution are easily reproduced. Re-creating billions of years of evolution in a lab would be difficult, of course.

                          > The important question is whether one paradigm allows us to explain reality better than the other, and dualism does this very well.

                          I don't think it does it well at all. An explanation that you can not test isn't useful. In fact, a wrong explanation that you can test is more useful.

                          There's an infinite number of "dualist" explanations, all of which we cannot test. So which one to choose? Why stop at dualism, why not make it trialism? Why not infinitism?

                          Why not just say that there's an infinite number of intangible interactions, that every mind in the universe is connected with every other mind through a mesh spanning an infinite-dimensional hyperplane? You can't prove that this isn't the case, why not believe that one instead?

                          > There is no materialistic explanation for consciousness, free will, abstract thought or mathematics.

                          Well, so what? All of the "dualist" explanations are equally useless, so I might as well do without any explanation whatsoever.

                          > Instead, those who follow materialism strictly end up denying such things, and we end up with incoherent and inaccurate theories about the world.

                          Inaccurate and incoherent theories that are testable can nevertheless be very useful. Coherent theories that are untestable are useless, unless maybe you can turn them into a religious cult.

                          • yters 1677 days ago
                            Why do you assume dualism is not testable?

                            Also, Darwinian evolution is well known to be false. Just read any modern bioinformatics book. There are a whole host of other mechanisms besides Malthusian pressure, random mutation and natural selection that are used to explain evolution nowadays. As Eugene Koonin says, the modern synthesis is really a sort of postmodern evolution, where there is no ultimate explanation for how it works. Koonin even promotes a form of neo-Lamarckianism. Darwin's unique contributions to the theory of evolution have been experimentally discredited. You can easily disprove Darwin yourself with all the genomic data that is online these days.

                            • gridlockd 1677 days ago
                              > Why do you assume dualism is not testable?

                              I assume that your conception of dualism is not testable because, in your own words, it is "beyond anything in this physical universe".

                              If it's beyond the physical universe, it cannot be observed and therefore it can not be tested. Otherwise, it would be part of physics and the physical universe just like gravity, if only hitherto unknown.

                              > Also, Darwinian evolution is well known to be false.

                              So is the general theory of relativity. Yet, it's "right enough" to allow us to make sufficiently accurate predictions about the physical world.

                              Furthermore, one theory being wrong doesn't make other theories "more right".

                              > There are a whole host of other mechanisms besides Malthusian pressure, random mutation and natural selection that are used to explain evolution nowadays.

                              As you say yourself, that's besides random mutation and natural selection, not instead. It would be a miracle if somehow Darwin could have gotten all of the details right with the tools available to him.

                              Also, in science everything is an approximation to some degree, there are always factors you disregard so that you can actually perform a prediction in a finite amount of time. There's variance and uncertainty in every measurement.

                              > As Eugene Koonin says, the modern synthesis is really a sort of postmodern evolution, where there is no ultimate explanation for how it works.

                              There's no "ultimate explanation" for anything. It's turtles all the way down.

                              • yters 1677 days ago
                                It doesn't follow that for something to be observed it must be part of physics. If the physical universe is a medium, like our computers, it can transmit information from other entities without those entities themselves being embedded in it. It sounds like your argument begs the question by first assuming everything that interacts with us must be physical.

                                Darwin's mechanisms do not explain anything in evolution. All of his mechanisms select against increased complexity and diversity, so are useless to explain the origin of species as he originally claimed. Darwinian evolution is dead and died a long time ago. Modern evolution is very much non-Darwinian.

                                • gridlockd 1677 days ago
                                  > It doesn't follow that for something to be observed it must be part of physics.

                                  It's the other way around. If it can be observed, it interacts with matter. If it interacts with matter, it is within the domain of physics, by definition. I don't understand why you have a problem with this.

                                  Physicists are well aware that we do not understand all the interactions and we do keep discovering more and more interesting phenomena, such as quantum entanglement.

                                  > All of his mechanisms select against increased complexity and diversity, so are useless to explain the origin of species as he originally claimed.

                                  I don't know where this criticism comes from, but it sounds like a straw man. Natural selection may select against "complexity and diversity", but random mutation puts "complexity and diversity" back in the game.

                                  • yters 1677 days ago
                                    I think you are equivocating between the laws that govern material interaction and things that interact with matter. They are not the same thing. The laws of physics are developed by breaking matter down to its most uniform and granular components, and then characterizing their interaction. An immaterial soul would not be captured by such an analysis, but its interaction with matter could still be empirically identified.

                                    As for evolution, I see where you are coming from. Randomness, like flipping a coin, is always complex and different. However, if there are only a few specific, long coin sequences you are trying to construct, then flipping a coin does not get you there. There are just too many possibilities within a couple hundred coin flips to check within the lifespan of the universe. And, if you have one of these sequences, then randomly flipping some of the coins will destroy the sequence. Just think about what happens if you flip random bits in a computer program. For the most part, unless you get really, really lucky, it will destroy the computer program. So, while a few random mutations may be very lucky and flip the right nucleotides to create new functionality, the vast majority of mutations are destructive, and will kill off the species before there is the chance to evolve new functionality.
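
                                    Just to make the counting concrete (this only quantifies the blind-search framing; whether that framing applies to evolution is exactly what is disputed below):

                                      # A 200-flip target vs an absurdly generous blind-search budget (made-up trial rate).
                                      sequence_space = 2 ** 200   # ~1.6e60 possible 200-flip sequences
                                      universe_age_s = 4.3e17     # ~13.8 billion years in seconds
                                      trials_per_sec = 1e30       # deliberately generous assumption
                                      print(f"{universe_age_s * trials_per_sec:.1e} trials vs {sequence_space:.1e} sequences")
                                      # ~4.3e47 trials vs ~1.6e60 sequences: blind search falls short by ~13 orders of magnitude.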

                                    • gridlockd 1676 days ago
                                      > I think you are equivocating between the laws that govern material interaction and things that interact with matter. They are not the same thing. The laws of physics are developed by breaking matter down to its most uniform and granular components, and then characterizing their interaction. An immaterial soul would not be captured by such an analysis, but its interaction with matter could still be empirically identified.

                                      If an "immaterial soul" interacted with matter in an empirically identifiable way, it is part of physics. The forces that interact with matter are themselves not made out of matter. To be precise, matter is only that which has a mass, but there are also particles that don't have mass. Their interactions are nevertheless part of physics and we can measure them at least indirectly. They're not abstract constructs, they're very much part of the physical universe.

                                      In that sense, perhaps the term "materialism" is misleading, because ultimately physics is about forces, not matter. Remember, I'm not using that term for myself.

                                      > However, if there are only a few specific, long coin sequences you are trying to construct, then flipping a coin does not get you there. There are just too many possibilities within a couple hundred coin flips to check within the lifespan of the universe.

                                      There isn't just a single coin though. There's about 10^46 molecules in the ocean[1]. That's a lot of interactions over the span of billions of years.

                                      > And, if you have one of these sequences, then randomly flipping some of the coins will destroy the sequence. Just think about what happens if you flip random bits in a computer program. For the most part, unless you get really, really lucky, it will destroy the computer program.

                                      It depends on the bits flipped. Bits flip all the time[2] and people rarely notice, because they're not necessarily important bits. It also depends on whether the program will abort upon error detection. Without memory protection by the operating system, most programs wouldn't terminate, they'd keep trucking along, perhaps producing some garbage here and there. In fact, the difficult part about memory corruption bugs is that the program often won't terminate until well after the corruption has taken place.

                                      Also, computer programs aren't like organisms exposed to nature. In nature, it's possible that a mutation kills you, but it's far more likely that some physical process or another organism kills you.

                                      > So, while a few random mutations may be very lucky and flip the right nucleotides to create new functionality, the vast majority of mutations are destructive, and will kill off the species before there is the chance to evolve new functionality.

                                      The vast majority of mutations are relatively inconsequential, at least for short-term survival. Our DNA mutates all the time, but also we have evolved error correction, which you probably also will not accept as evolving "by random mutation and selection".

                                      [1]https://www.quora.com/How-many-water-molecules-are-in-the-Ea...

                                      [2]https://en.wikipedia.org/wiki/Cosmic_ray#Effect_on_electroni...

                                      • yters 1676 days ago
                                        So, it doesn't really seem we are disagreeing, it's just a matter of terminology. You seem to want to call every interacting thing 'physics', which you are free to do, but then I'm not sure what the value of the term is. And, you agree we already empirically measure many immaterial things. So you seem to agree that in theory an immaterial soul is an empirically testable hypothesis, in which case I'm not sure what your objection is.

                                        As for the number of interactions, we have trouble reasoning about large numbers. Billions of years and molecules sound like unimaginably large numbers, of similar magnitude to trillions or a decillion, even though the latter are many orders of magnitude greater. DNA sequences can be hundreds of billions of base pairs long. So, if we could only depend on random mutation and natural selection, we'd need on the order of 4^(10^11) attempts to hit a particular sequence, which is more trials than even a multiverse of universes could offer.

                                        • gridlockd 1676 days ago
                                          > So, it doesn't really seem we are disagreeing, it's just a matter of terminology. You seem to want to call every interacting thing 'physics', which you are free to do, but then I'm not sure what the value of the term is.

                                          Everything that interacts with matter is in the domain of physics. So if your concept of a "soul" (or whatever makes you a "dualist") can at least in principle be observed, we can count it in. It's then not "beyond this universe".

                                          It would also raise a lot of questions: Are souls "individuals"? If so, does every organism afford a soul? If so, where do new souls come from when the amount of organisms increases? Where do they go if they decrease? Is there really only one soul spanning all organisms? How can we test for any of these things? What are the consequences?

                                          It's perfectly fine to explore such questions, but I am personally not convinced that something like a soul - within or outside the realms of physics - is at all necessary to explain life or intelligence as the phenomena we can already observe. That's what we disagree on.

                                          > DNA sequences can be hundreds of billions of base pairs long.

                                          Perhaps, but the simplest lifeforms alive today have on the order of hundreds of thousands of base pairs.

                                          > So, if we could only depend on random mutation and natural selection, we'd need 4^10^11 attempts to hit a particular sequence, which is more trials than even a multiverse of universes can offer.

                                          Yes, but DNA didn't just form spontaneously, fully assembled. It formed from simpler precursors, which formed from simpler precursors still, all the way down to proteins which formed from the simplest of molecules. Those precursors don't survive because they can not compete with their successors. They could be forming in the oceans right now, but they won't progress because they'll just get eaten.

                                          There have been experiments done reproducing it up to the "protein formation" step. Anything more complex than that is likely going to take too much time to result in a new lifeform - especially one that could survive ex-vitro.

                                          • yters 1675 days ago
                                            Even staying within materialism, it isn't clear that 'everything is physics'. For example, our computers operate according to the laws of physics, but the physical laws tell us nothing about how they operate. To understand how computers operate, we need to know a lot of extra information besides the physical laws. In other words, physical laws tell us nothing about physical conditions. This is why there are many other scientific disciplines besides physics. So, since this notion of 'everything is physics' doesn't even work within materialism, it is hard to see why it would exclude immaterial entities.

                                            And as you point out, the notion that Darwinian mechanisms can account for evolution of complexity and diversity is pure speculation. Which is why modern evolution theory does not use Darwin's theories. It uses mechanisms that we can see operating in the lab and in bioinformatics, such as bacterial horizontal gene transfer and empirically calculated substitution matrices. And the further that bioinformatic algorithms diverge from Darwin's ideas, the better they perform.

                                            • gridlockd 1675 days ago
                                              There's a difference between the physical laws (that we know) and the "domain of physics". There's a great deal we don't yet know about physics that's nevertheless within its domain.

                                              Not everything is in the domain of physics. "What is the meaning of life, the universe, and everything?" is not in the domain of physics. However, physics (or its subsets chemistry and biology) can explain how life may have emerged.

                                              Physics can't rule out that a god snapped his godfingers 5000 years ago and made it all appear the way that it does now. It can't rule out an immortal soul. It can't rule out that we're in a simulation, or that unobservable fairies are actually behind all the forces in the universe. For those cases, I like to apply Occam's Razor - not because it's true, but because it's practical.

                                              > And as you point out, the notion that Darwinian mechanisms can account for evolution of complexity and diversity is pure speculation.

                                              Random mutation and natural selection do result in complexity and diversity. This is a fact, not speculation. You can try it at home, on your computer. We do have genetic algorithms working on those exact principles. They're generally not preferable to more informed algorithms, because they're slow, but they do work.
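
                                              A minimal genetic-algorithm sketch of the kind alluded to here, assuming a toy fitness function (similarity to a fixed target bit string); it only illustrates random mutation plus selection converging on a specific sequence, not biology:

                                              ```python
                                              import random

                                              TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1] * 10  # toy 100-bit target
                                              POP, MUT = 200, 0.01

                                              def fitness(g):
                                                  return sum(a == b for a, b in zip(g, TARGET))

                                              def mutate(g):
                                                  return [1 - b if random.random() < MUT else b for b in g]

                                              pop = [[random.randint(0, 1) for _ in TARGET] for _ in range(POP)]
                                              gen = 0
                                              while max(map(fitness, pop)) < len(TARGET):
                                                  pop.sort(key=fitness, reverse=True)   # selection: fitter half survives
                                                  keep = pop[:POP // 2]
                                                  kids = [mutate(random.choice(keep)) for _ in range(POP - len(keep))]
                                                  pop = keep + kids                     # next generation
                                                  gen += 1

                                              print(f"matched the target after {gen} generations")
                                              ```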

                                              > It uses mechanisms that we can see operating in the lab and in bioinformatics, such as bacterial horizontal gene transfer and empirically calculated substitution matrices. And the further that bioinformatic algorithms diverge from Darwin's ideas, the better they perform.

                                              As we can see from computer simulation, pure selection and mutation is inefficient. If nature can find a shortcut mechanism that "Darwinian evolution" itself does not account for, it's not surprising that it would become a dominating factor. However, what makes you believe that such a mechanism itself could not possibly arise from natural selection and random mutation? Is it the improbability? It's a big universe, you know. We're not even talking about the probability of this happening on all planets, but on any planet in the universe.

                                              • yters 1675 days ago
                                                I think you are missing my point. Physical phenomena are not reducible to physical laws, and cannot be explained by the laws. You cannot derive a binary adder from the laws of electricity, and that is why computer engineering is not a field of physics, but is its own domain with its own set of laws.

                                                As for Darwin's mechanisms, I'm not speaking from personal incredulity, but from what I see reading bioinformatics textbooks and what leading biologists write, not what we are taught in high school. Darwin's mechanisms are only given lip service, and when the rubber meets the road in practice they are completely ignored.

                                                • gridlockd 1675 days ago
                                                  > I think you are missing my point. Physical phenomena are not reducible to physical laws, and cannot be explained by the laws. You cannot derive a binary adder from the laws of electricity, and that is why computer engineering is not a field of physics, but is its own domain with its own set of laws.

                                                  Fair enough, let's just leave it at that.

                                                  > As for Darwin's mechanisms, I'm not speaking from personal incredulity, but from what I see reading bioinformatics textbooks and what leading biologists write, not what we are taught in high school. Darwin's mechanisms are only given lip service, and when the rubber meets the road in practice they are completely ignored.

                                                  Okay, I've done some googling. Now let me take a shot at reconstructing what happened here. You read that one paper[1] that quite boldly claims that the CRISPR mechanism proves that Lamarckism is real. You stumbled upon it because it appeared to confirm something that you already wanted to believe: "Darwinism is wrong". You otherwise have no background in bioinformatics.

                                                  The thing is, that paper doesn't prove Darwin wrong at all - he never ruled out other effects besides random mutation and selection. Darwin just wasn't very fond of Lamarck's work, which wasn't very scientific even by the standards of its time.

                                                  The effects of random mutation and selection have been readily reproduced countless times. Lamarckian effects on the other hand have not been observed until epigenetics (a bit of a stretch), HGT (rather uncommon in eukaryotes like us two) or this here CRISPR (more of a bacteria thing). That's despite lots of experiments that have been performed to prove Lamarckism right. If anything, nature has a lot of obvious Darwinism and maybe some not-so-obvious Lamarckism here and there. To quote the paper:

                                                  "Neither Lamarck nor Darwin were aware of the mechanisms of emergence and fixation of heritable variation. Therefore, it was relatively easy for Lamarck to entertain the idea that phenotypic variation directly translates into heritable (what we now consider genetic or genomic) changes. We now realize that the strict Lamarckian scenario is extremely demanding in that a molecular mechanism must exist for the effect of a phenotypic change to be channeled into the corresponding modification of the genome (mutation). There seems to be no general mechanisms for such reverse genome engineering and it is not unreasonable to surmise that genomes are actually protected from this type of mutation."

                                                  Personally, I actually find the prospect of Lamarckian effects in the evolution of "higher" species exciting and plausible. It's not exactly a proven thing that this is common in nature, as you seem to believe.

                                                  • yters 1675 days ago
                                                    Yes, full disclosure, I became skeptical of Darwin's mechanism once I realized the improbability of it all. I used to be fine with the idea of evolution I was taught in school, but looking into the basic combinatorics makes it all seem highly implausible. Then I thought that if Darwin was correct, his work should at least show up in the mathematically rigorous areas of biology that rely on evolution and need to get results, not the speculative portions of biology. Bioinformatics, with its reliance on homology, seems to fit the bill.

                                                    So, I'm not saying that Lamarckism, or any of these other mechanisms, are common. I am saying that Darwin's mechanisms do not seem to be useful for getting actual bioinformatics work done. For example, look at BLAST's substitution matrix. They started with PAM, which is explicitly based on Darwinian assumptions, and it worked badly. The more they made Darwinism less explicit and relied more on the actual data to compute the matrices, the better BLAST performed. And, reading through a bioinformatics book, it is full of observations that contradict what one would expect if Darwin were correct. So, I have been unable to find a significant impact of Darwin's work in an area where results matter.
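
                                                    For context, the "empirically calculated" matrices referred to here (BLOSUM-style) are log-odds scores derived from observed aligned residue pairs. A minimal sketch of that recipe with made-up counts, not real alignment data:

                                                    ```python
                                                    import math
                                                    from collections import Counter

                                                    # Made-up aligned residue pairs; real matrices (e.g. BLOSUM)
                                                    # are built from large curated alignments, but the log-odds
                                                    # recipe sketched here is the same.
                                                    pairs = ([("A", "A")] * 40 + [("A", "S")] * 8 +
                                                             [("S", "S")] * 30 + [("A", "W")] * 2 + [("W", "W")] * 10)

                                                    pair_n = Counter(tuple(sorted(p)) for p in pairs)
                                                    total_pairs = sum(pair_n.values())
                                                    res_n = Counter()
                                                    for (a, b), n in pair_n.items():
                                                        res_n[a] += n
                                                        res_n[b] += n
                                                    total_res = sum(res_n.values())

                                                    def log_odds(a, b):
                                                        obs = pair_n[tuple(sorted((a, b)))] / total_pairs
                                                        exp = (res_n[a] / total_res) * (res_n[b] / total_res)
                                                        exp *= 1 if a == b else 2          # unordered pair probability
                                                        return round(2 * math.log2(obs / exp))  # half-bit units

                                                    print(log_odds("A", "A"), log_odds("A", "S"), log_odds("A", "W"))
                                                    ```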

                                                    Let me know if you are aware of any area of biology that is mathematically rigorous and relies explicitly on Darwin's mechanisms of random mutation and natural selection to make discoveries and get results.

                                                    • gridlockd 1674 days ago
                                                      Of course using natural selection and random mutation is a pretty dumb method when you can artificially select and selectively mutate instead.

                                                      It's why we don't use genetic algorithms when we have a better method. It's really expensive to simulate a "natural environment" and run it for hundreds, thousands, or even millions of generations.

                                                      Like I said, if nature found a shortcut - and it has, in the case of HGT or CRISPR - it will use that shortcut. It's just more efficient. The same goes for sexual selection, which Darwin didn't pay special attention to, but which is clearly a very important part of "natural" selection.

                                                      In my view, those shortcuts could have developed through Darwinian processes alone, in simpler lifeforms. If you can't conceive of that being possible, that's fine. We can't exactly prove it right through a simulation that is accurate to life on earth.

                                                      • yters 1674 days ago
                                                        It's not a disagreement over whether Darwinian processes maybe happened somewhere at some time. The question is what is the primary process driving evolution, and what do we see in the lab. Darwinism isn't there, as far as I can tell. So, why some irrelevant processes are taught as the big guiding mechanisms of evolution is a mystery to me. Darwinism, insofar as it's a major part of evolution, is clearly dead. And with certainty it is dead insofar as it proposed to be #the# fundamental explanation of origin of species, as Darwin originally proposed.
                                                        • gridlockd 1674 days ago
                                                          > The question is what is the primary process driving evolution, and what do we see in the lab.

                                                          As far as the "state of the art" is concerned, it's still primarily random mutation and natural selection. I highly doubt that Koonin would disagree on this, by the way.

                                                          > Darwinism isn't there, as far as I can tell.

                                                          What do you mean by "we don't see it in the lab"? Of course you don't see natural selection in the lab, that's oxymoronic.

                                                          We can see randomness in the offspring of, say, lab rats. We can selectively breed them to e.g. make them more susceptible to certain diseases. I'd rather call this Mendelian than Darwinian, but the difference is only in how the selection happens.

                                                          What happens in nature over billions of years obviously can't happen in a lab. It's obviously not sensible to use natural selection as a tool when you can just artificially select or even directly manipulate the genome.

                                                          > So, why some irrelevant processes are taught as the big guiding mechanisms of evolution is a mystery to me. Darwinism, insofar as it's a major part of evolution, is clearly dead. And with certainty it is dead insofar as it proposed to be #the# fundamental explanation of origin of species, as Darwin originally proposed.

                                                          It's not irrelevant at all. Again, the fact that you don't observe the effects of natural selection in lab, or that it isn't useful for bioinformatics doesn't make Darwinism irrelevant as an explanation for the origin of species.

                                                          Even if we accept that known processes like HGT or CRISPR aren't Darwinian and that it may have accelerated some evolutionary processes in some of our ancestors, they couldn't be the primary driver of evolution because so far we haven't observed it in any "higher" species beyond bacteria.

                                                          If you want to speculate that there are more such processes hitherto unknown, that's fine. That doesn't mean a eulogy for Darwinism is warranted at this time.

                                                          • yters 1674 days ago
                                                            I'm not sure how to line up what you are saying above with my observation that bioinformatics algorithms work better the further they are from the RM+NS assumption, plus all the other discrepancies I see between theory and practice in bioinformatics.

                                                            Sure, Koonin and others can say RM+NS is primary, but I don't see this in the actual work people do that is quantitative and rigorous vs speculative stories.

                                                            Also, I'm not talking about genetic engineering, although the fact we can do genetic engineering vs just randomly jamming nucleotides together in the hopes something happens is also fairly surprising if it was all just RM+NS.

        • mlyle 1682 days ago
          Whacking your head with a hammer seems to take chunks of it away, though, which is a bit of an argument against it being immaterial. ;)
          • yters 1682 days ago
            Eating a hamburger helps me grow, but that doesn't mean I am a hamburger.

            Or, another analogy, if you smash my computer, I won't be able to type this response to you, but that doesn't mean I am my computer.

            If the brain is an antenna for intelligence, then damaging the antenna will damage the signal, but doesn't damage the source.

            • mlyle 1682 days ago
              No, but you're made up of the pieces that come from the hamburger.

              I get that it may be comforting to think that you are, fundamentally, some incorporeal, magical entity both special and immune to the slings and arrows of outrageous fortune... but there's no evidence for this.

              • yters 1682 days ago
                It seems like all the evidence is for this, and on the other hand there is no evidence for the materiality of the mind. The only reason people believe in mind == brain is due to their materialistic fundamentalism, just like the creationists force their theories onto science.
            • visarga 1682 days ago
              > if the brain is an antenna for intelligence

              And what causes the intelligence that is simply being channeled by such a brain? Is it turtles all the way down?

              Let me counter your position with another: the brain is that which protects the body and the genes. Essentially its job is to find food, stay safe and make babies. But in order to do that we have evolved society, language, culture and technology. It's still simply a fight for life against entropy. The brain learns what actions lead to the best rewards and what actions and situations are dangerous. And if it doesn't, then death acts like a filter and improves the next generation of intelligent agents to make them more attuned to survival. And it all stems from self-replication in an environment shared with many other agents and limited resources.

              You see, my short explanation covers the origin, purpose and evolution of intelligence. The 'brain is an antenna' would work only if you consider the constraints of the environment as the source and the brain as the 'receiver' of signals.

              • yters 1682 days ago
                Your short explanation is full of unfounded speculation, whereas my even shorter explanation relies entirely on direct evidence everyone has access to. The only reason you feel your explanation gets a pass is because you speculate in the name of materialism, which in itself is an incoherent philosophy.
                • mkl 1682 days ago
                  I'm pretty sure I've never even heard of such evidence, let alone seen any. Philosophy is irrelevant here; what do you propose does the thinking if not the brain? Where is the direct evidence you claim?
                  • yters 1682 days ago
                    Consciousness, free will, abstract thought, mathematics. All things that cannot be reduced to matter. All our scientific theories are filtered through all or most of the above, so the above list is much more directly evidenced than anything the sciences say.
                    • mkl 1682 days ago
                      What do you propose does the thinking if not the brain?

                      The only one of those things with a concrete definition is mathematics, which we can do with computers in practice or in principle (I am a mathematician). Computers are made of matter.

                      • yters 1682 days ago
                        Computers do not do mathematics. Rather, they compute a set of rules we give them, which may or may not be consistent. It is up to the programmer to give them rules that are consistent and correspond to some abstract mathematical concept. However, the computational rules themselves are not mathematics.

                        I am not sure what does the thinking, but whatever it is, it cannot be the brain if thinking consists of uncomputable and non physical faculties.

                        • mkl 1682 days ago
                          That's a rather strange and nebulous "definition" of mathematics, and quite contrary to how it actually works; the rules define the abstract concepts that build mathematics, and the rules are followed by people too whenever they do maths, because they are the maths. For example, "numbers" are an abstract mathematical concept, totally unphysical, defined purely by rules, and used by people and computers. Actual mathematics can certainly be done with computers, and is, every day, by mathematicians (among others).

                          There's nothing demonstrably uncomputable or unphysical about thinking. On the contrary, the brain is an enormously complex network of neurons and synapses, clearly intricate enough to physically perform all known mind functions, and all in principle simulatable on a powerful enough computer system. Your magical antenna idea has no basis in reality, it is massively outweighed by real-world evidence, and you have failed to present any actual evidence to support it.

                          • yters 1681 days ago
                            Well, I guess that's that :)

                            I cited a number of first hand pieces of evidence that are more directly evident to everyone than the speculation you provide, and you accuse me of not providing any evidence. I guess there is nothing further to be said.

                    • visarga 1680 days ago
                      > Consciousness, free will, abstract thought, mathematics

                      - I defined consciousness in a concrete way: it's adaptation to the environment based on reinforcement learning, designed by evolution for survival

                      - free will - it's just as real as 1000 angels dancing on a pinhead. Nothing is beyond physical. If it has an effect in this world, it's physical. If it doesn't, then it's just fantasy. Philosophers have tried to settle the 'mind body problem' for hundreds of years and finally conceded that dualism is a misguided path. What you consider free will is just randomness (stochastic neural activity) filtered through experience.

                      - abstract thought - a form of data compression, useful for survival. We use abstractions in order to compress experience in a way that can be applied to novel situations. It would be too difficult to learn the best action for each situation, especially since many situations are novel. So we model the world, compute future outcomes before acting, then act. If it were not so, we would never learn to drive a car, because it would take too many crashes to learn driving the hard way. But we learn to drive without dying 1000 times, and we do many things with few mistakes because we can model in an abstract way the consequences of our current situation and actions.

                      - mathematics - a useful model we rely on, but it's not absolute. It could be formulated in different ways; the current formulation is not the only possible one, nor is it irreducible to matter. It all started when people had more sheep to count than fingers on their hands. The rest is a gradual buildup of model creation.

                      You are attached to a kind of transcendental thinking which is just too burdensome on Occam's razor. You presuppose much more than necessary. Human experience can be explained by the continuous loop of perception, judgement and action, followed by effects and learning from the outcomes.

                      Perception is a form of representation of sensorial information in an efficient and useful way. Judgement is the evaluation of the current situation and possible actions (based on instinct and past experience). Acting is just learned reflex controlled by judgement. They are all actually implemented in neural networks. They process information in a loop with the environment.

                      You don't need any transcendental presupposition to understand consciousness, free will, abstract thought and mathematics. Sorry to be so blunt, but we're evolving past medieval thinking into the bright future of AI, and many things that seemed magical and transcendental have been proven to be just learning (error minimisation).

                      I was once like you, a zealot of spiritual thinking. After many decades and life experiences I now have a much better way to grasp the situation. I don't rely on magic or divinity or anything that surpasses the physical in the way I see the world. You will probably come around at some point and realise how little explanatory power the old theories had, and that the new way of thinking is actually just as poetic as the old one. Nothing was lost; you don't need to defend the old ways. If you decide to learn more about AI, RL and game theory you will be able to philosophically appreciate the wonders of life even more than now. Thousands of years of spiritual tradition stand in contradiction to billions of years of evidence from evolution and the amazing progress of the last decades in understanding the way things work.

                      • yters 1679 days ago
                        How can you falsify your physicalist explanations?
            • sanxiyn 1682 days ago
              Brain functions under electromagnetic shielding, so are you proposing some new physical force?
              • yters 1682 days ago
                I'm proposing the mind is non physical and interacts with the physical brain, analogous to a signal and antenna.
        • gbrown 1682 days ago
          [Citation Needed]
          • dnadler 1682 days ago
            That's precisely the point, right?
            • gbrown 1681 days ago
              The above poster was making magical assertions as if they were well established without presenting evidence.
              • yters 1679 days ago
                The evidence is literally the very things by which we perceive all other scientific evidence: consciousness, abstract thought, mathematics. All three are clearly non physical. People have to make up elaborate, incoherent explanations to try and explain how they are physical. The fact it is so hard, even impossible, to do shows the items are not physical.
                • gbrown 1675 days ago
                  Consciousness is clearly nonphysical? No offense, but that's simply nonsense. Why is consciousness modified by drugs? Why can consciousness be damaged by physical trauma? When you can interact with something in the physical world, that is a pretty strong indication that you're dealing with a physical phenomenon.

                  In fact, it's not even clear what a "non-physical" phenomenon is if you drill deep enough. The existence of abstractions doesn't change granular reality. For example, a neural network implemented on a computer is highly abstracted, but still implemented on physical silicon.

                  I suggest you look up the philosophical history of dualism - you're a bit behind.

                  • yters 1675 days ago
                    If I turn off your computer you cannot respond to me. Does that mean you are your computer?
                    • gbrown 1675 days ago
                      If I smash my computer to atoms, I've destroyed it entirely - not severed a mysterious connection to its incorporeal self.

                      Your assertion is essentially: "I am conscious, therefore consciousness is non-physical". The conclusion does not follow from the observation.

                      • yters 1675 days ago
                        Returning to your counter argument, do you see the connection with my computer analogy?
                        • gbrown 1675 days ago
                          Yes - your computer analogy conflated me (a human person, existing in the physical world) with a nonphysical intelligence, of which we have no evidence and which isn't even characterized in a meaningful way - it's just a placeholder for "mystical stuff I feel but don't understand".

                          If you want to say you believe what you do on faith, or that it's your own spiritual belief and therefore none of my business, I won't begrudge you that and I won't bother you about it. You seem to be asserting, however, that there exists empirical evidence of non-corporeal souls (though you didn't use this word). I disagree on that point, vigorously.

                          • yters 1674 days ago
                            Not quite. Your counter argument is "doing things to the brain affects our mind, which shows our mind is physical." The analogy I offer shows that the fact doing something to X affects Y does not mean that X is Y.

                            So, the evidence you offer that the mind is physical does not actually show the mind is physical.

                            • gbrown 1674 days ago
                              You've either failed to understand, or you're being intentionally obtuse.

                              > Not quite

                              Yes quite.

                              > The analogy I offer shows

                              It does not, for reasons I explained.

                              You're observing physical phenomena, and concluding there must be magic hiding behind them.

                              • yters 1674 days ago
                                Physical and nonphysical are beside the point. The point is X influences Y does not mean Y is X.

                                My claim is that Y is not X.

                                You argue that X influences Y, therefore Y is X.

                                I provide a counter example that shows you cannot infer Y is X just because X influences Y. You need another premise to demonstrate from your example that Y is X.

                                • gbrown 1674 days ago
                                  I see, you want to (poorly and incompletely) reduce my points to a syllogism, and point out that I'm not formally proving the physical nature of the mind (which is more appropriately addressed by the last few hundred years of science than a toy logic problem).

                                  Meanwhile, you have not presented any evidence or argument (or even actual definition) of your magical antenna hypothesis.

                                  "You can't falsify my unfalsifyable hypothesis, therefore it must be true" is not reason, especially when your hypothesis is in no way needed to explain the oberved phenomena (and is, in fact, inconsistent with all empirical observation).

                                  You don't get to play stupid word games and declare that therefore magic is real.

                                  Or rather, you do, but rational people will feel free to ignore you, as I am about to do.

                                  • yters 1661 days ago
                                    I offered you a list of evidence a couple times.
  • Traster 1682 days ago
    I'd love to see some metrics on whether this idea has any merit. Because obviously any problem that can be done on a massive wafer can be done on multiple smaller wafers. The question is, are the compromises to get something working on a massive wafer killing performance to the point where just splitting the problem up efficiently would have been better. It's also important to think: Stuff isn't just going to happen to be the size to fit on this wafer, it's either huge in scale, or not. If it's not, you don't need this wafer, if it is, you probably need to partition your problem onto multiple copies of this wafer anyway. Take their comparison to a Nvidia GPU. It might be 57 times bigger, but I can rent 1000 Nvidia GPUs from AWS at the drop of a hat (more or less).

    So yes, maybe they've done some interesting stuff, but they need some decent benchmarks to show off before we can really distinguish between whether this is the Boeing 747 Dreamliner or a Spruce Goose.

    • dgacmu 1682 days ago
      You get much higher bandwidth (at lower power cost) between the on-wafer chiplets than you would with a multi-chip design or even an interposer.

      The drawback is that you suffer a really interesting thermal problem and have to figure out what to do with the wafer space that doesn't fit in your square -- probably creating lower-scale designs that you sell.

      The second drawback is that you can't really match your available computation to memory in the same way you can with a more conventional GPU. So you have to be able to split your model across chips and train model-parallel. The advantage is that model-parallel lets you throw a heck of a lot more computation at the problem and can help you scale better than using only data parallelism.

      Model-parallel training is typically harder than data-parallel because you need high bandwidth between the computation units. But that's exactly what Cerebras's design is intended to provide.
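
      A minimal sketch of the model-parallel split being described, in generic PyTorch (device names and layer sizes are placeholders, and this is not Cerebras's actual software stack): different layers live on different devices and activations hop between them, which is exactly where inter-unit bandwidth matters.

      ```python
      import torch
      import torch.nn as nn

      # Model parallelism: different layers live on different devices and the
      # activations move between them. (Data parallelism instead copies the whole
      # model to every device and splits the batch.)
      class ModelParallelMLP(nn.Module):
          def __init__(self, dev0="cpu", dev1="cpu"):    # use "cuda:0"/"cuda:1" on GPUs
              super().__init__()
              self.dev0, self.dev1 = dev0, dev1
              self.fc1 = nn.Linear(1024, 4096).to(dev0)  # first chunk of the model
              self.fc2 = nn.Linear(4096, 10).to(dev1)    # second chunk, other device

          def forward(self, x):
              x = torch.relu(self.fc1(x.to(self.dev0)))
              x = x.to(self.dev1)          # the inter-device hop: this transfer is
              return self.fc2(x)           # what high on-wafer bandwidth speeds up

      model = ModelParallelMLP()
      out = model(torch.randn(32, 1024))   # a batch of 32 flows across both "devices"
      ```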

      You also have a yield management issue, where you have to build in the capability to route around dead chips, but that's not too nasty a technical detail. But if your "chip"-level yield (note that their chip is still a replicated array of subunits) is too low, it kills your overall yield. So they're going to be conservative with their manufacturing to keep yield high.

      It's not obviously broken, but it's certainly true we need benchmarks -- but not just benchmarks, time for people to come up with models that are optimized for training/inference on the cerebras platform, which will take even longer.

      • nrp 1682 days ago
        Why even make the final giant chip rectangular? I get that the exposure reticles are rectangular, but since this is tiling a bunch of connected chiplets, why not use the full area of the wafer?
        • petra 1682 days ago
          You need to make sure your I/O lines are imprinted on the edge of the wafer.

          It's much easier to do that by putting the I/O lines at the ends of each "chip", versus a circle that cuts through the middle of the "chip".

    • streetcat1 1682 days ago
      The problem is not whether it is bigger than Nvidia's (which is kind of a strange metric). It is when Amdahl's law kicks in.

      https://en.wikipedia.org/wiki/Amdahl%27s_law
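
      For reference, a quick sketch of what Amdahl's law bounds look like at this core count (the parallel fractions below are arbitrary examples):

      ```python
      def amdahl_speedup(parallel_fraction, n_workers):
          """Upper bound on speedup when only part of the work parallelizes."""
          return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_workers)

      # Even 400,000 cores can't rescue a workload with a modest serial fraction.
      for p in (0.50, 0.95, 0.999):
          print(f"{p:.1%} parallel -> at most {amdahl_speedup(p, 400_000):.0f}x faster")
      # roughly 2x, 20x and 1000x respectively
      ```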

    • pirocks 1682 days ago
      Sorry for being pedantic , but there is no Boeing 747 Dreamliner, perhaps you meant 787?
  • voldacar 1682 days ago
    This is very neat, but I really wish the press would say "neural net" instead of "AI". "AI" just means a computer program that has some ability to reason about data similarly to a human, neural nets are a subset of that

    I guess "AI" gets you the clicks though

    • dkersten 1682 days ago
      > "AI" just means a computer program that has some ability to reason about data similarly to a human

      Not even similarly to a human, especially if you talk to the average business person. Anything software does that is in some way seen as "intelligent" is seen as AI by a layperson. I recently encountered a situation where someone I worked with was demoing automated triggering of actions based on simple conditions (eg metric A > metric B) and the business person being demoed to said something to the effect of "oh so its AI!".

      Anything that is artificially intelligent in some form is seen as AI to someone to the point where the term is quite meaningless.

      Machine Learning is a better term because it's somewhat more specific, and we're not even close to AGI yet, so if it were up to me, I'd just retire the term AI altogether as not being useful.

      • derefr 1682 days ago
        > Anything software does that is in some way seen as "intelligent" is seen as AI by a layperson.

        I mean... isn't that the technically-correct definition? It's artificial. It's intelligent. So it's artificial intelligence.

        Academia can define "AI" as a term however it likes, but in practice people are always just going to interpret it as the brute juxtaposition of the two adjectives that compose it.

        • dkersten 1682 days ago
          Sure, I'm not arguing against it, just pointing out what we think as AI isn't what a non-tech person sees as AI.

          > in practice people are always just going to interpret it as the brute juxtaposition of the two adjectives that compose it.

          Indeed.

    • paulkrush 1682 days ago
      Agreed! Something like this could be really useful in accelerating today's networks, and the AI word just gets in the way.
    • curiousgal 1682 days ago
      > neural net

      Which is basically matrix multiplication.

      • make3 1682 days ago
        Call it "just function approximation" if you want, but calling it just matrix multiplication removes the only important part, that the matrix's value are found through tiny adjustments made to generalize the empirical distribution of a dataset, and end up approximating it's function
      • ericd 1682 days ago
        Well, plus nonlinearities and backprop/gradient descent, to turn those matrix multiplications into a universal function approximator.
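
        To make that concrete, a tiny forward pass with made-up random weights (in practice the weights are learned): two matrix multiplications with a ReLU in between.

        ```python
        import numpy as np

        rng = np.random.default_rng(0)
        x  = rng.standard_normal((32, 100))    # a batch of 32 inputs
        W1 = rng.standard_normal((100, 64))    # weights would normally be learned
        W2 = rng.standard_normal((64, 10))

        hidden = np.maximum(x @ W1, 0.0)       # matrix multiply + ReLU nonlinearity
        logits = hidden @ W2                   # another matrix multiply
        print(logits.shape)                    # (32, 10)
        ```
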
      • voldacar 1682 days ago
        Everything is matrix multiplication at some level!
        • ant6n 1682 days ago
          What about matrix addition?
        • guenthert 1682 days ago
          If you have a hammer ...
          • ivalm 1682 days ago
            In a sense of Hamiltonians as linear operators on quantum states (such as state of the universe)....
  • CaliforniaKarl 1682 days ago
    This is one of the first of what will probably be a number of announcements out of the Hot Chips conference, happening now through Tuesday on the Stanford campus.

    https://www.hotchips.org

  • groundlogic 1682 days ago
    > The 46,225 square millimeters of silicon in the Cerebras WSE house 400,000 AI-optimized, no-cache, no-overhead, compute cores

    > But Cerebras has designed its chip to be redundant, so one impurity won’t disable the whole chip.

    This sounds kinda clever to a semi-layperson. Has this been attempted before? Edit: at this scale. Not single-digit-core CPUs being binned, but 100k+-core chips with some kind of automated core deactivation on failure.

    • baybal2 1682 days ago
      > This sounds kinda clever to a semi-layperson. Has this been attempted before?

      Yes, by Trilogy Systems, and it went bust spectacularly. It raised over $200M of capital (back in the early eighties!) and turned out to be the biggest financial failure in Silicon Valley's history.

      https://en.m.wikipedia.org/wiki/Trilogy_Systems

      • groundlogic 1682 days ago
        First: thanks, this was exactly the kind of historic knowledge I hoped would show up in this thread!

        Gene Amdahl ran this; geeze, no surprise they got funded.

        Do you happen to know how many "compute units" this chip was designed to handle?

        https://en.m.wikipedia.org/wiki/Trilogy_Systems

        > These techniques included wafer scale integration (WSI), with the goal of producing a computer chip that was 2.5 inch on one side. At the time, computer chips of only 0.25 inch on a side could be reliably manufactured. This giant chip was to be connected to the rest of the system using a package with 1200 pins, an enormous number at the time. Previously, mainframe computers were built from hundreds of computer chips due to the size of standard computer chips. These computer systems were hampered through chip-to-chip communication which both slowed down performance as well consumed much power.

        > As with other WSI projects, Trilogy's chip design relied on redundancy, that is replication of functional units, to overcome the manufacturing defects that precluded such large chips. If one functional unit was not fabricated properly, it would be switched out through on-chip wiring and another correctly functioning copy would be used. By keeping most communication on-chip, the dual benefits of higher performance and lower power consumption were supposed to be achieved. Lower power consumption meant less expensive cooling systems, which would aid in lower system costs.

        Edit: '"Triple Modular Redundancy" was employed systematically. Every logic gate and every flip-flop were triplicated with binary two-out-of-three voting at each flip-flop.' This seems like it should it should complicate things quite a bit more dramatically.. they were doing redundancy at a gate-level, rather than a a CPU-core level.

    • MisterTea 1682 days ago
      Remember the tri-core AMD chips? They were quad-core chips where one core was non-functional or didn't meet spec. They simply disabled the bad core and sold it as a tri-core. It didn't do well, though, as who wants to buy a "broken" quad core?
      • opencl 1682 days ago
        This is how most large-ish chips are sold. AMD no longer sells 3 core CPUs but they sell 6 and 4 core models which are partially disabled 8 core dies. The 6 core even seems to be their best selling model because it has a very good price:performance ratio. The Radeon 5700 is a cut down 5700XT, RTX 2060 is a cut down 2070, etc.
    • vardump 1682 days ago
      >> But Cerebras has designed its chip to be redundant, so one impurity won’t disable the whole chip.

      > This sounds kinda clever to a semi-layperson. Has this been attempted before?

      I think every large chip is like that nowadays. It's just a matter of degree.

    • rwmj 1682 days ago
      Every time you see a CPU with 8 cores and one next to it with 6 cores at a lower price, they are generally the same chip but the 6 core one had 2 faulty cores which were disabled at the fab. The reason for the price difference is that chips with all good cores happen infrequently.

      Designing wafer-level integration is not new either. It was in fact very fashionable in the 80s (although not very successful): https://en.wikipedia.org/wiki/Wafer-scale_integration

    • robin_reala 1682 days ago
      Yep, it’s standard practice. The old AMD tri-core chips for example were all physically four cores with a (usually) broken one disabled. Modern AMD chips use multiple chiplets in a single package to improve yields.
      • happycube 1682 days ago
        ... and some of the new Epyc chips use only two cores per die out of eight. Customers still come out ahead since those chips have far more cache than a chip with 2 fully enabled 8 core dies.
    • rrss 1682 days ago
      Yes, every chip of moderate size does this. Selling floorswept chips as lower performance SKUs is very common.
      • groundlogic 1682 days ago
        Sure, that practice is pretty wellknown.

        But at this scale?

    • ryacko 1682 days ago
      It just has to detect damage and route around it. It is just an orchestration scheme in miniature.

      Relevant XKCD: https://xkcd.com/1737/

  • wolf550e 1682 days ago
    whitepaper: https://www.cerebras.net/wp-content/uploads/2019/08/Cerebras...

    I don't expect good yields from a chip that takes up the whole wafer. They must disable cores and pieces of SRAM that are damaged. How is this programmed?

    • keveman 1682 days ago
      > How is this programmed?

      Full disclosure: I am a Cerebras employee.

      There is extensive support for TensorFlow. A wide range of models expressed in TensorFlow will be accelerated transparently.

      • paulsutter 1682 days ago
        He was asking about the implications for yields. Do you route around bad dies/cores, and what are the implications for programming and performance?

        For everyone else: normally a wafer is divided into dies, each of which (loosely) is a chip. Yield is the percentage of good parts, and it's very unlikely that an entire wafer is good. Gene Amdahl estimated that 99.99% yield is needed for successful wafer scale integration:

        https://en.wikipedia.org/wiki/Wafer-scale_integration
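
        A rough feel for why, using the textbook Poisson defect model; the defect density below is an assumed ballpark figure, not TSMC's actual number:

        ```python
        import math

        def poisson_yield(area_mm2, defects_per_mm2):
            """Fraction of parts with zero random defects: Y = exp(-D * A)."""
            return math.exp(-defects_per_mm2 * area_mm2)

        D = 0.001  # assume ~0.1 defects/cm^2, i.e. 0.001/mm^2 (illustrative only)
        print(f"~600 mm^2 GPU-sized die: {poisson_yield(600, D):.0%}")     # ~55%
        print(f"46,225 mm^2 wafer-chip:  {poisson_yield(46_225, D):.2e}")  # ~0
        # Hence the built-in redundancy: route around bad cores rather than
        # demanding a defect-free wafer.
        ```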

      • gwern 1682 days ago
        Looking at the whitepaper, I'm a little surprised how little RAM there is for such an enormous chip. Is the overall paradigm here that you still have relatively small minibatches during training, but each minibatch is now vastly faster?
        • ivalm 1682 days ago
          IIRC they use batch size = 1 and each core only knows about one layer. Which is to say this thing has to be trained very differently from normal SGD (but requires very little memory). There is also the issue that they rely on sparseness, which you get with ReLU activations; if, for example, language models move to GELU activations, they will be somewhat screwed.
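
          The sparsity point is easy to check numerically; a small sketch comparing the fraction of exact zeros each activation produces (using the erf-based GELU formula):

          ```python
          import numpy as np
          from scipy.special import erf

          x = np.random.default_rng(0).standard_normal(1_000_000)

          relu = np.maximum(x, 0.0)
          gelu = 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))     # exact (erf-based) GELU

          print("ReLU exact zeros:", np.mean(relu == 0.0))   # ~0.5
          print("GELU exact zeros:", np.mean(gelu == 0.0))   # ~0.0: small but nonzero
          # Hardware that skips multiply-by-zero benefits from ReLU's exact sparsity,
          # but gets little from GELU unless small values are also thresholded away.
          ```
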
        • IshKebab 1682 days ago
          It's because it's SRAM, not DRAM. Think how much L3 cache your processor has. A few MB probably. That's what this chip's memory is equivalent to.
        • morphle 1682 days ago
          We have up to 160 GB SRAM on our WSI. The rest of the transistors can be a few million cores or reconfigurable Morphle Logic (an open hardware kind of FPGA)

          Our startup has been working on a full Wafer Scale Integration since 2008. We are searching for cofounders. Merik at metamorphresearch dot org

        • Veedrac 1682 days ago
          “full utilization at any batch size, including batch size 1”

          https://www.cerebras.net/

          • gwern 1682 days ago
            That doesn't really mean anything. It (and any other chip) had better be able to run at least batch size 1, and lots of people claim to have great utilization... It doesn't tell me if the limited memory is part of a deliberate tradeoff akin to a throughput/latency tradeoff, or some intrinsic problem with the speedups coming from other design decisions like the sparsity multipliers, or what.
            • Veedrac 1682 days ago
              Most of the chip is already SRAM, I'm not really sure what else you would expect?

              18 GiB × 6 transistors/bit ≈ .93 trillion transistors

              • gwern 1682 days ago
                Well, it could be... not SRAM? It's not the only kind of RAM, and the choice to use SRAM is certainly not an obvious one. It could make sense as part of a specific paradigm, but that is not explained, and hence why I am asking. It may be perfectly obvious to you, but it's not to me.
                • Veedrac 1682 days ago
                  You basically have the option between SRAM, HBM (DRAM), and something new. You can imagine the risks with using new memory tech on a chip like this.

                  The issue with HBM is that it's much slower, much more power hungry (per access, not per byte), and not local (so there are routing problems). You can't scale that to this much compute.

                  • gwern 1682 days ago
                    But HBM and other RAMs are, presumably, vastly cheaper otherwise. (You can keep explaining that, but unless you work for Cerebras and haven't thought to mention that, talking about how SRAM is faster is not actually an answer to my question about what paradigm is intended by Cerebras.)
                    • Veedrac 1681 days ago
                      They say they support efficient execution of smaller batches. They cover this somewhat in their HotChips talk, eg. “One instance of NN, don't have to increase batch size to get cluster scale perf” from the AnandTech coverage.

                      If this doesn't answer your question, I'm stuck as to what you're asking about. They use SRAM because it's the only tried and true option that works. Lots of SRAM means efficient execution of small batch sizes. If your problem fits, good, this chip works for you, and probably easily outperforms a cluster of 50 GPUs. If your problem doesn't, presumably you should just use something else.

      • zackmorris 1682 days ago
        Do you support MATLAB or GNU Octave? I'm looking for the level of abstraction below TensorFlow because I find pure matrix math to be more approachable. Admittedly, I'm not super experienced with TF, so maybe it can encapsulate them.

        Also, do you have a runtime to run the chip as a single 400,000-core CPU with some kind of memory-mapped I/O so that a single 32 or 64 bit address space writes through to the RAM router through virtual memory? I'm hoping to build a powerful Erlang/Elixir or Go machine so I can experiment with other learning algorithms in realtime, outside the constraints of SIMD-optimized approaches like neural nets. Another option would be 400,000 virtual machines in a cluster, each running a lightweight unix/linux (maybe Debian or something like that). Here is some background on what I'm hoping for:

        https://news.ycombinator.com/item?id=20601699

        See my other comments for more. I've been looking for a parallel machine like this since I learned about FPGAs in the late 90s, but so far have not had much success finding any.

      • streetcat1 1682 days ago
        So why are you not publishing benchmarks against nvidia?
        • sanxiyn 1682 days ago
          Cerebras is an MLPerf member, so they will publish MLPerf numbers some day and then we will talk.
          • streetcat1 1682 days ago
            They have probably run the benchmarks (I'd guess many times, and not only against Nvidia). Yet they are not in the white paper.

            I was an SE at a hardware company, and it is the first thing you do as a product manager.

      • The_rationalist 1682 days ago
        How do you achieve this? Tensorflow does not support openCL.
        • rrss 1682 days ago
          I'm sure they wrote a new backend for tensorflow that targets their API. Since the hardware is only for ML, it wouldn't make sense for them to bother trying to implement OpenCL.
    • groundlogic 1682 days ago
      If you divide that giant piece of silicon into 400k processors, and then only use the ones that actually work...

      I wonder if they figure that out every time the CPU boots, or at the factory. At this scale, maybe it makes sense to do it all in parallel at boot. Or, even dynamically during runtime.

      There may be edge case cores that sort of work, and then won't work at different temps, or after aging?

      • kingosticks 1682 days ago
        They'll aim to catch the logic failures and memory failures during wafer test at the factory. This testing is done at room temperature and at hot. There are margins built in to allow for ageing. If they want to ship a decent product they'll also need to repeat the memory testing every boot, and ideally during runtime but maybe the latter isn't a big deal for something like this.

        EDIT: to add a bit more and possibly address the original question (which I think keveman may have misunderstood), there will usually be some hardware dedicated to controlling the chip's redundancy. Part of that is often an OTP fuse-type thing that can be programmed during wafer test to indicate parts of the chip that don't work. Something (software or hardware) will read that during boot and not use those parts of the chip.

        • groundlogic 1682 days ago
          Sure, that makes sense.

          With this many cores it seems like the probability that a core dies during a multi-hour job (or in case it's used for inference, during a very long-lived realtime job) is pretty high, so the software in all layers would need to handle this kind of exception. They probably don't, today, since we haven't seen a 400k core chip before.

  • navaati 1682 days ago
    Wow! 56 (!) times more cores than the biggest Nvidia Volta, a single 46 square millimeters chip (no chiplets like recent AMD chips), and an incredible, whopping 18 GB of SRAM (that's like 18 GB of CPU cache, basically)!

    I don't know if you guys are used to that scale, but I find it monstrous!

    • Tuna-Fish 1682 days ago
      It's wafer scale integration. All the major AI learning chips are probably heading this way.

      That is, CPUs and GPUs are made on silicon wafers, typically 30cm in diameter, with many chips built on a single wafer and then cut from the wafer and packaged into products. The idea of wafer-scale integration is that instead of cutting the wafer, you just build all the communication between the computation elements in the wafer into the wafer and therefore get a "network on wafer", a single very massive "chip".

      The reason to do this is that the lowest-energy way to communicate data is on-chip, and in major AI learning setups, by far most of the power is spent on data movement. By making the largest possible chip, you minimize the data movement power requirements and spare more power for the computation.
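
      A back-of-the-envelope illustration of the energy argument, using order-of-magnitude per-bit figures of the kind often quoted in architecture talks (assumed round numbers, not measured Cerebras figures):

          # Assumed round numbers: moving a bit off-chip costs roughly two orders
          # of magnitude more energy than moving it a short distance on-die.
          PJ_PER_BIT_ON_CHIP = 0.1
          PJ_PER_BIT_OFF_CHIP = 10.0

          bits_moved = 1e15                       # say, 125 TB shuffled during training

          print(bits_moved * PJ_PER_BIT_ON_CHIP * 1e-12, "J on-chip")     # ~100 J
          print(bits_moved * PJ_PER_BIT_OFF_CHIP * 1e-12, "J off-chip")   # ~10,000 J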

      • sanxiyn 1682 days ago
        Wafer scale integration does seem to be a logical conclusion of the integrated circuit. When Jack Kilby invented the IC, it was indeed a stroke of genius.
        • IshKebab 1682 days ago
          Not necessarily. The problem is power density. Imagine jamming 56 CPUs that close to each other. Hard to fit the fans in!

          Apparently this is going to be water cooled.

          • rbanffy 1682 days ago
            It's 1.5 kW of thermal output, about 5 times the biggest Xeon.
    • fartismartass 1682 days ago
      It is monstrous. But the way I read it, it is actually similar to chiplets. The difference is that chiplets are cut from the wafer, individually tested, and then combined into the final chip. Here all parts stay on the wafer and faulty ones are routed around.
      • rbanffy 1682 days ago
        Since they are wiring the chips on the wafer and routing around defects, I imagine a chiplet design on a wafer-sized interposer would help them deal with yield issues. I wonder what their competition will do. It's certainly possible for Nvidia or AMD to bundle GPU dies on top of large interposers and TSMC has already shown large ones, though nothing on this scale.
    • 14 1682 days ago
      I was a little confused when I read your comment and saw 46 square millimeters. I am guessing you meant 46k square millimeters, as the article stated 46,225 square millimeters, which, yes, you are right, is monstrous. Very cool! As a caregiver I often discuss with my clients the cool things that have come into existence in their lifetime, and I wonder, when I get old, what things my children will reflect back on and say, "Grandpa, you were alive when x was invented, how neat." Personally I am hoping it is fusion power.
      • duderific 1682 days ago
        The young people I work with are always shocked when I tell them there was no internet (or at least, nothing terribly useful at the consumer level) when I went to college.
      • navaati 1682 days ago
        Ah, stupid me :). I had no idea of the scale of things and in my native locale the comma is the decimal point...
      • thfuran 1682 days ago
        I'll put my vote in for commodity room-temperature superconductors.
  • lnsru 1682 days ago
    I am just wondering why there is no similar product as an FPGA array. As far as I know, it's the cheapest way to see if there is product/market fit for a semiconductor product. High-speed transceivers as well as memory controllers are included in FPGAs. This single-wafer approach looks very interesting to me. I was an intern at Infineon some time ago and was working on device distribution characterization across a 200 mm wafer. The chips in the middle were performing 2-3 times better than those at the border. So how does Cerebras's chip manage this issue? Are the middle parts throttled, or are low-performing areas near the wafer's border disabled? How much does it cost?.. I can imagine it being shipped on a thermal pad with liquid nitrogen cooling below. There must be some wires bonded for the interface to the host. Very interesting technical project. I am very curious who the clients are for such a huge specialized chip.
    • morphle 1682 days ago
      An FPGA would make it a much more generic product so you can sell to more markets. But you would lose a factor of 200 (a factor of 1000 with a traditional FPGA design) to make transistors reconfigurable.

      If you leave the wafer intact, you get 70,000 mm2. Cerebras cut off almost half of the wafer.

      At 7nm you would get 2.7 trillion transistors with 300 mm wafers, more with 450 mm wafers. You disable those reticle-sized areas with impurities or damage at runtime. (Rough arithmetic on these figures is sketched below.)

      You can cool it with immersive liquid cooling. Instead of wire bonding you can stack chips on top or use free space optics [1].

      [1] https://www.youtube.com/watch?v=7hWWyuesmhs
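
      Rough arithmetic behind the figures above (the 2.7T-at-7nm number is the parent comment's claim; only the wafer geometry and implied densities are checked here):

          import math

          area_300 = math.pi * (300 / 2) ** 2        # ~70,686 mm^2 for an intact 300 mm wafer
          print(f"{area_300:,.0f} mm^2")

          print(f"{2.7e12 / area_300 / 1e6:.0f} MTr/mm^2")   # ~38, implied by the 7nm claim
          print(f"{1.2e12 / 46225 / 1e6:.0f} MTr/mm^2")      # ~26, the announced 16nm chip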

  • sp332 1682 days ago
    There's a photo on the homepage: https://www.cerebras.net/ It's just about 8.5" across. You could just barely print a 1:1 photo of it on a letter-sized sheet of paper, with less than a half-millimeter margin on each side.
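
    The arithmetic, taking the quoted 46,225 mm² as a square die:

        side_mm = 46225 ** 0.5              # 215.0 mm on a side
        letter_width_mm = 8.5 * 25.4        # 215.9 mm
        margin_mm = (letter_width_mm - side_mm) / 2
        print(f"{side_mm:.1f} mm die, {margin_mm:.2f} mm margin per side")   # ~0.45 mm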
    • ohazi 1682 days ago
      I wonder if they bond it to a copper slab. At that scale, the tiniest amount of PCB flex would probably shatter the die/wafer...
      • oITAZt 1682 days ago
        They do not. They have "developed [a] custom connector to connect wafer to PCB": https://imgur.com/a/sXxGbiD
        • ohazi 1682 days ago
          I think "cold plate" is the slab.
          • oITAZt 1682 days ago
            Yes, it is probably a copper slab (they never mentioned), but there is no electrical connection, as the rest of the slides make clear: https://imgur.com/a/Rbd7e4D

            It just provides a thermal connection for water cooling. The electrical connection is made through the PCB (and probably through a thick copper plate on the opposite side of the PCB)

  • goldemerald 1682 days ago
    Is there any comparison of training speed of a neural net for one of these chips and a typical one? I'd be interested to see how long it takes to train an imagenet classifier on one of these compared to other hardware.
  • samcheng 1682 days ago
    I remember watching the (excellent!) Tesla Autonomy Day presentation, and an analyst was asking about 'chiplets' but was met with a bit of dismissal from the Tesla team.

    Maybe THIS is what the analyst had in mind! Pretty cool stuff, although I question how interconnect / inter-processing-unit communication would work.

    Notably, no benchmarks in the press release...

  • rightbyte 1682 days ago
    "The Cerebras software stack is designed to meet users where they are, integrating with open source ML frameworks like TensorFlow and PyTorch"

    What's the instruction-set? They don't say.

    I assume you need to program it in some DSL or Verilog-ish macro-assembler for that monster contraption. Python is probably not what works well ...

    • sanxiyn 1682 days ago
      Your CoreML model can run on Apple Neural Engine, but Apple doesn't expose that hardware's instruction set. This probably works similarly.
  • rbanffy 1682 days ago
    • bhassel 1682 days ago
      > In another break with industry practice, the chip won’t be sold on its own, but will be packaged into a computer “appliance” that Cerebras has designed. One reason is the need for a complex system of water-cooling, a kind of irrigation network to counteract the extreme heat generated by a chip running at 15 kilowatts of power.

      15 kW, yikes.

      • dmitrygr 1682 days ago
        So, Azul Systems 2.0? Clever(ish) hardware with good(ish) results, for too much $$$ for anyone to actually buy?
        • rbanffy 1682 days ago
          The main difference is that what Azul built had more limits - once you run your workload well enough, there is little incentive to have more compute power.

          When it comes to ML, the more compute power you throw at it, the better.

  • natpalmer1776 1682 days ago
    I know very little about this sub-field, but from a layman's perspective, and given the competition between Intel and Nvidia in the field of deep learning, I would not be surprised if Intel tried to acquire or take over this company for their single-chip designs.
  • ineedasername 1682 days ago
    Would this really be more efficient in terms of cost/performance? It seems the specialized nature of the chip pushes the price high enough that you could build equivalent systems with traditional hardware for the same or less, and it would all be known quantities rather than working with something brand new and not as well understood.
    • mv4 1682 days ago
      Specialized AI chips don't seem like a very good business idea to me.

      The way we do things (in AI) today - we may be doing things completely differently tomorrow. It's not like there's a standard everyone has agreed on.

      There is a very real risk these specialized, expensive devices will go the way of the Bitcoin ASIC miner (which saturated secondary markets at a fraction of its original cost).

      Source: I do ML consulting and build AI hardware.

      • sanxiyn 1682 days ago
        Isn't BLAS a standard everyone has agreed on?

        Making a matrix multiplication accelerator seems a pretty safe bet to me. I am less sure about sparsity optimization, but I guess it still works for dense matrices even in the worst case.

      • p1esk 1682 days ago
        The way we do things in AI today is multiplication of two large matrices. Just like we did it 30 years ago: http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf
        • ivalm 1682 days ago
          Sure, but Cerebras isn't just multiplying two large matrices, they are multiplying two large very sparse matrices, relying on ReLU activation to maintain sparsity in all of the layers. We already have BERT/XLNet/other transformer models moving away from ReLU to GELU, which does not result in sparse matrices. "Traditional" activations (tanh, sigmoid, softmax) are not sparse either.
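
          A tiny numpy illustration of the point (random activations, not a real network): ReLU zeroes roughly half the entries exactly, while the usual tanh approximation of GELU leaves them small but nonzero, so there is nothing exact for sparsity hardware to skip.

              import numpy as np

              def relu(x):
                  return np.maximum(x, 0.0)

              def gelu(x):
                  # common tanh approximation of GELU
                  return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

              x = np.random.randn(1_000_000)
              print("ReLU exact zeros:", np.mean(relu(x) == 0.0))   # ~0.5
              print("GELU exact zeros:", np.mean(gelu(x) == 0.0))   # ~0.0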
          • p1esk 1682 days ago
            Good point. I think it's a safe bet to focus on dense dot products in hardware for the foreseeable future. However, in their defense:

            1. It's not clear that supporting sparse operands in hw would result in significant overhead.

            2. DL models are still pretty sparse (I bet even those models with GELU still have lots of very small values that could safely be rounded to zero; see the sketch after this list).

            3. Sparsity might have some benefits (e.g. https://arxiv.org/abs/1903.11257).
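
            On point 2, a minimal magnitude-pruning sketch (random weights, purely illustrative; whether a real model tolerates the zeroing is an empirical question):

                import numpy as np

                w = np.random.randn(1024, 1024).astype(np.float32)
                keep_fraction = 0.2
                cutoff = np.quantile(np.abs(w), 1.0 - keep_fraction)
                w_pruned = np.where(np.abs(w) >= cutoff, w, 0.0)
                print("nonzero fraction:", np.mean(w_pruned != 0.0))   # ~0.2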

        • mv4 1682 days ago
          While the theoretical innovations have mostly been incremental, there has been a lot of progress in the development of "light" deep learning frameworks - so the tasks that previously required massive GPUs can now run on your phone. And this trend will continue.
          • p1esk 1682 days ago
            Last I checked all those light frameworks still have to do good old matrix multiplications. What's changed?
            • mv4 1682 days ago
              I see your point. Fundamentally, the same multiplications.

              However, if we look at TF Lite, for example - its internal operators were tuned for mobile devices, its new model file format is much more compact, and does not need to be parsed before usage. My point is - the hardware requirements aren't growing; instead, the frameworks are getting optimized to use less power.

              • p1esk 1682 days ago
                I wish this was the case. 5 years ago I could train the most advanced, largest DL models in reasonable time (few weeks) on my 4 GPU workstation. Today something like GPT-2 would probably take years to train on 4 GPUs, despite the fact that GPUs I have now are 10 times faster than GPUs I had 5 years ago.
              • sanxiyn 1682 days ago
                This seems targeted for training, not inference. It definitely seems to me compute need is growing for training. (Is TF Lite even relevant at all for training?)
    • morphle 1682 days ago
      The energy cost of the interconnects between many chips is much higher than between the equivalent circuits on a wafer scale integration. The performance of the interconnect between such circuits is much higher. The cost of packaging is 1/78th with wafer scale.
  • kingosticks 1682 days ago
    How do they go and package something this big? Is this supposed to be used or is it just a headline?
  • temac 1682 days ago
    Impressive but is this a good idea?

    Either you need a crazy thermal solution, or it must be way less thermally dense than smaller chips can be. And is it really that much of an advantage to stay on chip compared to going through a PCB, if distances are crazy?

    • deepnotderp 1682 days ago
      They are using vertical water pipes for cooling to solve the thermal density problem. At 15 kW they're doing OK there.
    • gnode 1682 days ago
      I'm surprised this hasn't been done before; a monolithic wafer can achieve denser interconnects, and is arguably simpler than a multi-chip module.

      On the other hand, multi-chip modules can combine the working chips in low-yield wafers, whereas a monolithic wafer would likely contain many failed blocks, uselessly taking up space / distancing the working chips.

      Cooling isn't really a problem, as the TDP scales with area as in other chips. Water cooling or heat pipes can transport the heat away to a large heat sink. 3D / die-stacked chips have a harder cooling problem, potentially requiring something like an intra-chip circulatory system.

    • ivalm 1682 days ago
      It's 15kW, you need a custom liquid cooling solution that they package the chip with.
    • sanxiyn 1682 days ago
      It does seem to enable interconnect with impressive latency and bandwidth. It probably is worth it considering both Google and Nvidia systems are interconnect-limited.
  • bmh 1682 days ago
    In their whitepaper they claim "with all model parameters in on-chip memory, all of the time", yet that entire 15 kW monster has only 18 GB of memory.

    Given the memory vs compute numbers that you see in Nvidia cards, this seems strangely low.

    • baybal2 1682 days ago
      Check the C-suite track record... a pattern of making quick-selling companies on a "wow effect" which then quickly turn defunct and valueless after sale. There were big red flags about Cerebras's claims for quite some time. Some say it is Graphcore on steroids.

      It's not so much the tech side; stuff like that has been tried before (without good results: the higher your reticle fill, the poorer the exposure). It's the business side of their claims that doesn't make sense.

      First, and the biggest one, is the economics. It is completely impossible that a run as small as 100, or even 1,000, wafers could be more economical than a mass-market product, even if you deduct packaging costs.

      On top of that, any process modification or "tweak" for a low-volume run will destroy any economy of scale. And as I understand it, they pretty much brag about doing so.

      Lastly, some tech notes. Maybe they got the issue solved, maybe not: the bigger the chip, the more memory-starved it is, for a simple reason of geometry.

      With the "chip" the size of a wafer, it is gonna be extremely memory starved unless it has more IO than computing devices.

      Then, the thermal ceiling for CMOS is around 100 W per cm², and it is a very hard limit. I see no reason why they brag about beating it when they truly didn't: 20 W per cm² is quite low for HPC (quick numbers below).

      I suspect they are indeed quite limited by thermals, if they had to backpedal on their original claims.
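
      For reference, the power densities implied by the two total-power figures quoted in this thread (1.5 kW and 15 kW) over the 46,225 mm² die; the 20 W/cm² above presumably assumes yet another power number:

          area_cm2 = 46225 / 100                 # 462.25 cm^2
          for watts in (1_500, 15_000):
              print(f"{watts} W -> {watts / area_cm2:.0f} W/cm^2")
          # 1.5 kW -> ~3 W/cm^2, 15 kW -> ~32 W/cm^2; both well under ~100 W/cm^2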

      • simg 1679 days ago
        what's wrong with Graphcore?
    • blihp 1682 days ago
      That's 18GB of static RAM accessible in one clock cycle... the memory on a GPU isn't in the same class of fast. Given the bandwidth and latency of this thing, you'd likely have to use a cluster of machines doing all sorts of pre-, post- and I/O processing just to keep this thing busy.
    • Veedrac 1682 days ago
      18GB is huge! An NVIDIA V100 has 6MB of L2 memory. HBM is off-chip, and vastly (~100x) slower.
      • bmh 1682 days ago
        That's true, but it doesn't match their claim of keeping all of the model on the chip.

        An 18 GB cache is huge for sure, but that's not what they claim.

      • baybal2 1682 days ago
        18GB of very fast memory will still be just as hard to keep fed with data as that 6MB cache
        • Veedrac 1682 days ago
          The idea is that the whole model resides in the fast memory, so you don't need to ‘keep it fed’.
          • dmitrygr 1682 days ago
            44K/core is very little memory
            • Veedrac 1682 days ago
              Indeed, but cores are only responsible for small fragments of the network, so don't need huge amounts of memory.
              • dmitrygr 1682 days ago
                Unless you need to multiply large matrices, where you need access to very large rows and columns...like in...ML applications
                • Veedrac 1682 days ago
                  That's what the absurdly fast interconnect is for. You send the data to where the weights are.
                  • dmitrygr 1682 days ago
                    Absurdly fast != single cycle

                    It will be physically impossible to access that much memory in a single cycle at anything approaching reasonable speeds. I suppose you could do it at 5 Hz :)

                    • Veedrac 1682 days ago
                      A core receives data over the interconnect. It uses its fast memory and local compute to do its part of the matrix multiplication. It streams the results back out when it's done. The interconnect doesn't give you single-cycle access to the whole memory pool, but it doesn't need to.
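
                      A toy numpy sketch of that dataflow, with "cores" modeled as column shards that keep their slice of the weights resident while only the activations move (an assumed sharding, not Cerebras's actual scheme):

                          import numpy as np

                          def sharded_matmul(x, weight_shards):
                              # each "core" multiplies against its resident weight slice
                              partial = [x @ w for w in weight_shards]
                              # the fabric gathers the partial results
                              return np.concatenate(partial, axis=1)

                          x = np.random.randn(32, 512)
                          W = np.random.randn(512, 2048)
                          shards = np.split(W, 8, axis=1)   # 8 pretend cores
                          assert np.allclose(sharded_matmul(x, shards), x @ W)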
                      • dmitrygr 1681 days ago
                        I think it is telling that in one sentence there is a claim that it is faster than Nvidia, and in another, a claim that it runs TensorFlow. I do not think this architecture could do both of those at once. It could not run TensorFlow fast enough (not enough local fast memory) to compete even with a moderate array of GPUs.
        • deepnotderp 1682 days ago
          Hey! You seem knowledgeable, mind emailing me at tapabrata_ghosh [at] vathys (dot) ai ?
  • mschuster91 1682 days ago
    18 GByte of memory with 1 clock cycle latency? That's impressive.
    • bryanlarsen 1682 days ago
      I assume that each byte of that memory has a 1 clock cycle latency to only one of the 400,000 cores on the wafer. That's about 45KByte of memory per core; 1 clock cycle latency to a block that small is quite reasonable.
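
      The arithmetic, taking the 18 GB and 400,000-core figures at face value:

          print(18e9 / 400_000)   # 45,000 bytes, i.e. ~45 KB of local SRAM per core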
    • hwillis 1682 days ago
      I didn't see any indication, but I'd give ~0% chance that the full address space can be accessed in one clock.
    • dkersten 1682 days ago
      Depends on how fast a clock cycle is, I suppose. It's easy to make something one clock cycle if the clock ticks verrrry slowly ;)
  • The_rationalist 1682 days ago
    ASICs do not support CUDA. There is a forked TensorFlow with OpenCL support from AMD, but I doubt people will use it for this ASIC. So how can TensorFlow/CNTK/PyTorch use such hardware?
    • sanxiyn 1682 days ago
      The exact same way TensorFlow supports TPU: by writing another backend. TPU doesn't support CUDA either.
      • The_rationalist 1682 days ago
        So instead of having common optimized kernels for AMD, Intel, all the ARM firms, all FPGAs, and all ASICs, each vendor is reinventing the wheel in its own backend? Not so surprising ^^
  • fnord77 1682 days ago
    since the article didn't post a picture of the bare chip:

    https://i.imgur.com/cMo4w0C.jpg

  • p1esk 1682 days ago
    But... can I play Crysis - oops sorry - can I train ResNet on this thing?
  • riku_iki 1682 days ago
    It will be interesting to see what the price and power consumption are. Does it need a specialized server with a huge power supply module?
    • morphle 1682 days ago
      A 300mm wafer at the 16nm node would cost more than $6500 apiece. The power consumption would be more than 20 kW if all transistors are in use simultaneously. We are designing a special reconfigurable AC-DC and DC-DC power router (inverter) to supply this huge amount of power. Our WSI design will be cooled by liquid immersion.
  • jaboutboul 1682 days ago
    There goes Intel. Behind the curve again...
  • panabee 1682 days ago
    Can anyone comment on the differences between Cerebras and the other chip startups trying to rethink the semiconductor architecture for AI? What are the main technical forks?
  • mv4 1682 days ago
    While this is certainly a very impressive achievement, I am personally interested in small and light AI.

    World's tiniest AI chip? That would get me excited!

    • p1esk 1682 days ago
      What's an "AI chip"? You can build a full adder and call it "an AI chip".
      • mv4 1682 days ago
        By that I mean any chip, from generic (a GPU) to specialized (vision, NLP, etc.) - any chip that makes training or running TF/Caffe models faster.
        • p1esk 1682 days ago
          Faster than what? A tiny chip will be slower than a large chip.
          • mv4 1682 days ago
            faster while having the same form factor, energy usage, cost.
            • p1esk 1682 days ago
              Even under those constraints you can build something that's either fast or general. Pick one.
          • hwillis 1682 days ago
            uuuuuuugh