ARM1 Gate-level Simulation


74 points | by walterbell 134 days ago


  • dasmoth 134 days ago

    The actual simulation is at

  • krylon 134 days ago

    Within my lifetime, we went from CPUs that were simple enough we can now simulate them at the logic-gate level in ____ing Javascript to CPUs complex and powerful enough to emulate yesterday's mainstream CPUs at the ____ing logic-gate level in ____ing Javascript. I would not even be that surprised if the result outperformed the original hardware.

    • Aardwolf 134 days ago

      Nice! One thing though:

      It would be much nicer if dragging panned the view rather than "3D rotating" it. Panning with WASD is too slow and not compatible with some keyboard layouts.

      And of course zooming around the mouse cursor rather than around the center of the screen would also help to zoom towards the part you want.

      The 3D rotation is gimmicky but not actually useful to see the gates, and the current UI just doesn't let me zoom to gates I want without spending too much effort fighting the slow panning and the zooming target.


      • bogomipz 134 days ago

        I had a question. The article states the following:

        >"One very nice thing about the 32-bit instruction set is its pervasive conditional execution, which helps one avoid branching over code. For example, this sequence of instructions resets the register r0 to 0 if its value is equal to or less than zero, or forces its value to 1 if its value is greater than zero:

            CMP   r0, #0 ; if (r0 <= 0)
            MOVLE r0, #0 ;   r0 = 0;
            MOVGT r0, #1 ; else r0 = 1

        Without the conditional moves (MOVLE and MOVGT) after the compare (CMP), you'd have to branch after the compare, which is wasteful."

        How are those two conditional moves after the CMP operation more efficient than branching? Aren't they kind of branches themselves? What would the alternative "branching" sequence look like then?

        • monocasa 134 days ago

          It'd look something like

              cmp  r0,#0
              bgt  1f        @ r0 > 0: skip ahead to the "set to 1" arm
              mov  r0,#0     @ r0 <= 0
              b    2f        @ jump over the other arm
          1:  mov  r0,#1
          2:
          The big deal is the conditional branch (the bgt). If the processor gets it wrong it's a pipeline flush. And best case you still have extra instructions for the branches. The conditional mov example is a fixed cost of a single "wasted" cycle, which matches the best case of the branching example (branch correctly predicted to mov r0,#1 and fall through). The worst case for the branching version is probably somewhere ~15 cycles depending on the uArch, but is still 1 cycle for the conditional move.

          All of that being said, the branching version tends to be nicer for OoO cores since there aren't data dependencies on the flag registers any more, hence why you see RISC ISAs designed for OoO cores removing conditional execution for most instructions (AArch64 and RISC-V stand out here).

          • fanf2 134 days ago

            In the ARM2 era (probably the same for ARM1?) a basic ALU instruction such as MOV took 1 cycle, and a branch took 4 (if taken) or 1 (if not). (There were extra DRAM page cycles every 4 words too.)

            So for a simple if/else, it was usually both less code and faster to use a straight line of conditional instructions. In more complicated cases, if the programmer was feeling clever, it was possible to update the status flags to get three-way (or more!) conditionals in straight-line branchless code. Fun!
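To put rough numbers on this, here is a minimal sketch that counts cycles for the two versions of the r0 example using only the figures quoted above (1 cycle per ALU instruction, 4 for a taken branch, 1 for a branch not taken). It assumes that CMP counts as a basic ALU instruction, that a conditional instruction whose condition fails still costs 1 cycle, and it ignores the DRAM page cycles entirely. The function names are made up for illustration.

```python
# Back-of-the-envelope cycle counts, using the per-instruction costs
# quoted in the comment above. Assumptions (not from the thread):
# CMP costs the same as MOV, and a skipped conditional instruction
# still takes one cycle. DRAM page-boundary cycles are ignored.
ALU_CYCLES = 1
BRANCH_TAKEN_CYCLES = 4
BRANCH_NOT_TAKEN_CYCLES = 1


def conditional_cycles(r0):
    """CMP / MOVLE / MOVGT: all three instructions always pass through."""
    return 3 * ALU_CYCLES


def branching_cycles(r0):
    """cmp / bgt / mov / b / mov: cost depends on which path is taken."""
    if r0 > 0:
        # cmp, bgt (taken), mov r0,#1
        return ALU_CYCLES + BRANCH_TAKEN_CYCLES + ALU_CYCLES
    # cmp, bgt (not taken), mov r0,#0, b (taken)
    return (ALU_CYCLES + BRANCH_NOT_TAKEN_CYCLES
            + ALU_CYCLES + BRANCH_TAKEN_CYCLES)


if __name__ == "__main__":
    for value in (5, 0, -3):
        print(value, conditional_cycles(value), branching_cycles(value))
```

Under these assumptions the conditional version costs 3 cycles on either path, while the branching version costs 6 or 7, and it is also two instructions longer.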

          • TomVDB 134 days ago

            The conditional moves convert into a NOP when the condition is false.

            The idea here is that a branch results in a pipeline flush which takes a couple of cycles to refill.

            In practice, most CPUs have very good branch predictors these days and conditional moves aren’t all that useful anymore.

            That’s probably the reason why they don’t exist for later ISAs such as RISC-V.

            • snarfy 134 days ago

              A real branch would involve a jump instruction somewhere. With a branch, different code executes depending on the condition. With the code above, you get different data depending on the condition.
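A toy model may make the "different data, same code" point concrete. The sketch below is not a real emulator: it only models the N, Z and V flags that CMP sets and the LE and GT conditions, and the helper names (`cmp_flags`, `clamp_sign`) are invented for illustration. Every instruction is fetched on every run; an instruction whose condition fails simply does nothing.

```python
# Toy model of CMP r0,#0 / MOVLE r0,#0 / MOVGT r0,#1 (illustrative only).
MASK = 0xFFFFFFFF


def cmp_flags(a, b):
    """Return (N, Z, V) as set by CMP a, b, i.e. a - b on 32-bit values."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    n = result >> 31                       # sign bit of the result
    z = int(result == 0)                   # result was zero
    # Signed overflow: operands differ in sign and the result's sign
    # differs from the first operand's sign.
    v = ((a ^ b) & (a ^ result)) >> 31
    return n, z, v


# Signed condition codes expressed in terms of the flags.
CONDITIONS = {
    "LE": lambda n, z, v: z == 1 or n != v,
    "GT": lambda n, z, v: z == 0 and n == v,
}


def clamp_sign(r0):
    """Run the three-instruction sequence; return (new r0, instructions fetched)."""
    flags = cmp_flags(r0, 0)
    fetched = 1                            # the CMP itself
    for cond, value in (("LE", 0), ("GT", 1)):
        fetched += 1                       # fetched whether or not it executes
        if CONDITIONS[cond](*flags):
            r0 = value                     # condition passed: the MOV happens
        # condition failed: the instruction behaves like a NOP
    return r0, fetched


if __name__ == "__main__":
    for value in (-5, 0, 7):
        print(value, clamp_sign(value))
```

The instruction count is 3 for every input; only the value left in r0 changes.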

              • bogomipz 134 days ago

                Thanks for all the great explanations and insights. I really appreciate it. Cheers.

            • bogomipz 134 days ago

              The article states:

              >"The ARM2 had pretty much the same instruction set as the ARM1, although featured new multiplication and (later) atomic swap instructions."

              Does this mean that the ARM1 didn't support any atomic operations or were they using something else besides "compare and swap"?

              • jecel 134 days ago

                The ARM1 did not have any atomic operation. You only need those if you have more than one processor. It also lacked the multiply and multiply-accumulate instructions, as stated above. These took multiple cycles, which is not very RISC-like. That is also true of the load multiple and store multiple instructions of the ARM2 (I don't remember if the ARM1 had them). The ARM2 also added the coprocessor interface.

                • jecel 134 days ago

                  Oops - in the analysis of the PLA2 in the ARM1 there are both the load/store multiple instructions and the coprocessor stuff. In fact, together they take up about half of the logic. So I was remembering it wrong, then.

                  • bogomipz 134 days ago

                    Ah, of course there were no multi-cores back then. That makes total sense. Thanks.

                • all2 134 days ago

                  Does anyone else find it slightly entertaining that this is an article from a news outlet titled "The Register"?

                  • krylon 134 days ago

                    I never thought about it, but now that you mention it, it is a great name for an IT news site. ;-)