Ask HN: Which books/resources to understand modern Assembler?

I’d like to learn more about Assembler in order to be able to work with LLVM and JIT as well as to write high performance low-level code. I’m familiar with the basics of x86 but I haven’t touched Assembler in a while, so I’m wondering which resources and in particular books you’d recommend?

269 points | by throwaway63467 12 days ago

42 comments

jstrieb 11 days ago
Not specific to LLVM or JIT, but if you want a visceral intuition for the basics of ARM assembly, I made a free, online game at work (for mobile and desktop) that may help you:
https://ofrak.com/tetris/
I didn't do much ARM before working on the game, but since playing a lot, I'm very quick at reading disassembly, even for instructions not present in the game. It might help you to do the same – the timed game aspect forces you to learn to read the instructions quickly.
The game is like Tetris, but the blocks are ARM assembly instructions. As instructions fall, you can change the operand registers. Locking instructions into the .text section executes them in a CPU emulator running client-side in the browser, so you can immediately see the effects of every action. Your score is stored in memory at the address pointed to by one of the registers, so even though you earn points for each instruction executed without segfaulting, the true goal is to execute instructions that directly change the memory containing the score value.
When I released it a bit less than a year ago, I posted it to Hacker News as a Show HN:
https://news.ycombinator.com/item?id=37083309
- dabber 11 days ago
  This is awesome! Thanks for sharing it again, I guess I missed the original Show HN.
- resonious 11 days ago
  > at work
  What kind of job?
  - mtmail 11 days ago
    "Copyright © 2023 Red Balloon Security" is my guess https://redballoonsecurity.com/
WalterBright 11 days ago
All you really need is an instruction set reference, such as: https://www.felixcloutier.com/x86/index.html and have a compiler that supports an inline assembler, like the D compiler, with Intel assembler syntax and use it like:
```
    private uint asmBitswap32(uint x) @trusted pure
    {
        asm pure nothrow @nogc { naked; }

        version (D_InlineAsm_X86_64)
        {
            version (Win64)
                asm pure nothrow @nogc { mov EAX, ECX; }
            else
                asm pure nothrow @nogc { mov EAX, EDI; }
        }

        asm pure nothrow @nogc
        {
            // Author: Tiago Gasiba.
            mov EDX, EAX;
            shr EAX, 1;
            and EDX, 0x5555_5555;
            and EAX, 0x5555_5555;
            shl EDX, 1;
            or  EAX, EDX;
            mov EDX, EAX;
            shr EAX, 2;
            and EDX, 0x3333_3333;
            and EAX, 0x3333_3333;
            shl EDX, 2;
            or  EAX, EDX;
            mov EDX, EAX;
            shr EAX, 4;
            and EDX, 0x0f0f_0f0f;
            and EAX, 0x0f0f_0f0f;
            shl EDX, 4;
            or  EAX, EDX;
            bswap EAX;
            ret;
        }
    }
```
The compiler will handle all the program setup and teardown, and you can just concentrate on the assembler part. You can also compile programs with the -vasm switch and the compiler will emit the asm corresponding to the code:
```
    int square(int x) { return x * x; }
```
compiling:
```
    dmd -c test.d -vasm
```
prints:
```
    _D4test6squareFiZi:
    0000:   0F AF C0                 imul      EAX,EAX
    0003:   C3                       ret
```
By trying simple expressions like `x * x` and looking at what the compiler generates, and looking at the instructions in the referenced link, you'll get the hang of it pretty quick.
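For readers without a D compiler handy, here is a hedged C sketch of the same bit-reversal steps as `asmBitswap32` above. The name `bitswap32` is made up for illustration. Compiling it with something like `gcc -O2 -S` and comparing the output with the hand-written version is one way to practice reading generated code.

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit word; mirrors the asm above step by step. */
uint32_t bitswap32(uint32_t x)
{
    /* swap adjacent bits, then adjacent bit pairs, then adjacent nibbles */
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);
    /* reverse the byte order, which is what the final bswap does */
    return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}
```

For example, `bitswap32(1)` gives `0x80000000` and `bitswap32(0x12345678)` gives `0x1E6A2C48`.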
- P_I_Staker 10 days ago
  Inline assembly is arguably not needed, and of debatable merit, vs. writing standalone assembly code.
  > an instruction set reference
  I think the programming manual is just as important, especially if you're working with mixed asm and C code.
  One way to think about it: the C programming language has a runtime environment that is everyone's responsibility to maintain. A bug-free compilation should ensure this is protected.
  Without the all-seeing machine's help, it's all on you not to interfere with this environment by following the specified calling conventions.
  Also, don't try to get too tricky. I saw a really bad bug where someone thought it wise to use a macro to place a (C language) add operation directly before a function call, with the function's first instruction checking the overflow flag. The thing is, C has no idea what you're doing and doesn't guarantee that it will happen like that. In fact, the assembly language couldn't even do certain register swaps for function calls without clearing the flag.
  So, basically it "seemed to work" for years. A new compiler caused it to fail testing. Yes, this was a terrible idea and you should know better, but it's important to understand this hard-to-predict and "testing-resistant" risk. Extreme misuse can help illustrate the point.
  P.S. I have actually come to favor standalone asm files for any asm code that is not trivial. It's mostly a preference and a style thing. Developers are more likely to think about the fact that the code has to play nice with C, instead of assuming some help. You should know better, but it can feel like everything is playing nice.
- JonChesterfield 11 days ago
  Reading that makes the GCC inline asm format look really ugly. Some envy. Too used to inline asm looking like
```
  __asm__ volatile("syscall"
                   : "=a"(ret)
                   : "a"(n), "D"(a0), "S"(a1), "d"(a2), "r"(r10), "r"(r8),
                     "r"(r9)
                   : "rcx", "r11", "memory");
```
  - bitwize 11 days ago
    GCC inline assembly looks incredibly cursed. Back in the day, the Borland tool suite (Turbo/Borland C++, Turbo Pascal) had inline assembly that looked more like the D compiler example above.
    GCC does, however, know what to do with a .s file, so you can write your assembly routines outside your C(++) source and just compile them in like a C module, which is what I did last time I was hardcore slinging x86 opcodes.
    - JonChesterfield 11 days ago
      It's the attempt to tell the host language what you're doing with the arguments that makes a real mess. Module scope is roughly the same as putting it in a separate file, i.e. less horrible. C++ has better string literal escaping options which would help.
      It's very easy to get the constraints wrong and have the aggregate still "work", until unrelated changes months later perturb the register allocation slightly such that it no longer runs as hoped.
      It's documented and usable but working with it is never a very good time.
  - WalterBright 11 days ago
    The inline asm syntax was designed to match the Intel asm documentation.
    A hidden feature of the D inline assembler is the compiler knows which registers are read/written, so there's no need for the programmer to explicitly specify it. The compiler will also figure out the right addressing mode for local variables, so the programmer is relieved of that, too.
    - JonChesterfield 10 days ago
      Deriving the read/write behaviour from the instruction definition is so far superior to the gcc approach that I wonder how we ended up here. That is a very good call by the D toolchain.
      - WalterBright 10 days ago
        It's a small investment into the compiler to add it, and a large payoff for the user to not have bugs from manually doing it.
        I proposed doing the same for printf - have the compiler examine the arguments and insert the appropriate formats in the format string.
  - clausecker 10 days ago
    On the other hand, this inline assembly meshes very well with the register allocator and instruction scheduler in modern compilers. So it's perfect for teaching the compiler about the few special purpose instructions it doesn't know about without compromising the performance of the rest of the code.
    The one in the parent comment does not; it's basically a black box and the compiler has to save and restore the entire state of the function around it. Total performance killer unless you write large chunks of code in inline assembly (and then you're probably better off just using an assembly file).
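    A rough sketch of what "meshing with the register allocator" looks like in GCC-style extended asm: the constraint tells the compiler the operand is read and written and may sit in any general register, so it can still schedule and inline around the instruction. The function name `bswap32_asm` is made up, and the asm path assumes GCC or Clang on x86; other targets take the plain C fallback.

```c
#include <stdint.h>

/* Byte-swap via one instruction, letting the compiler choose the register. */
uint32_t bswap32_asm(uint32_t x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* "+r": x is both input and output, in any general-purpose register.
       bswap does not modify the flags, so no clobbers are listed. */
    __asm__("bswap %0" : "+r"(x));
    return x;
#else
    /* Fallback for other targets: plain shifts and masks. */
    return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
#endif
}
```

    Getting the constraint wrong (say, marking `x` as input only) is exactly the kind of mistake that can appear to work until register allocation shifts.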
  - sebazzz 9 days ago
    You could also assemble the assembly using an assembler into a separate object file and export a symbol from there, and let the linker do the job, right?
sargstuff 12 days ago
'Computer Architeture: A Quantitative Apporach" and/or more specific design types (mips, arm, etc) can be found under the Morgan Kaufmann Series in Computer Architeture and Design.
"Getting Started with LLVM Core Libraries: Get to Grips With LLVM Essentials and Use the Core Libraries to Build Advanced Tools"
"The Architecture of Open Source Applications (Volume 1) : LLVM" https://aosabook.org/en/v1/llvm.html
"Tourist Guide to LLVM source code" : https://blog.regehr.org/archives/1453
llvm home page : https://llvm.org/
llvm tutorial : https://llvm.org/docs/tutorial/
llvm reference : https://llvm.org/docs/LangRef.html
learn by examples : C source code to 'llvm' bitcode : https://stackoverflow.com/questions/9148890/how-to-make-clan...
- RheingoldRiver 11 days ago
  You have a typo that will make an amazon search fail on copy paste, corrected version is "Computer Architecture: A Quantitative Approach"
  kinda amazing to me that amazon cannot fix this but it returns 0 results (at least for me)
  - LocalGauge 11 days ago
    The reason is (probably) that you are including the double quotes as well. Amazon's search tool seems to treat double quotes the way Google search does: the quoted part has to appear verbatim in the results.
    - RheingoldRiver 10 days ago
      didn't copy that to amazon, should've really put it between ` ` even though the formatting wouldn't have applied here
t-3 11 days ago
I like this book, it's just as good as The Art of Assembly Language, but much cheaper: https://rayseyfarth.com/asm/index.html
If you are interested in ARM or RISC-V assembly, the concepts are pretty similar but the instructions are different. For any architecture, you're going to have to read the architecture manuals to get a good working knowledge of the instructions and how to use them. An easy way to get started is to write a program in C, then replace the functions with assembly code one by one until your C code is just main() and a header.
ARMv7: https://developer.arm.com/documentation/100076/0200/a32-t32-...
ARMv8: https://developer.arm.com/documentation/ddi0602/2024-03/Base...
alternative: https://www.scs.stanford.edu/~zyedidia/arm64/
RISC-V: https://riscv.org/technical/specifications/
x86: https://www.intel.com/content/www/us/en/developer/articles/t...
web format: http://x86.dapsen.com/
If you like to learn by example (most of these are not great, but good enough to get started):
https://rosettacode.org/wiki/Assembly
https://github.com/TheAlgorithms/AArch64_Assembly
- OldeMold 11 days ago
  I’d like to mention that The Art of Assembly Language didn’t feel like a good resource to me, as it spoke largely about “High Level Assembly” which is an awful lot like C and didn’t really focus on CPU architecture in a particularly technical way.
  I was expecting a bit more of a technical approach and was disappointed. If you’re in the same boat reference manuals and compiling to .s files may be the way to go.
  - t-3 11 days ago
    Ah, I guess I never actually read The Art of Assembly Language - I read the newer version The Art of 64-bit Assembly, which deals with 64-bit x86 assembly.
pizlonator 11 days ago
How I learned:
Step #1: read the arch manual for some CPU. Read most if not all of it. It’s a lot of reading but it’s worth it. My first was PowerPC and my second was x86. By the time I got to arm, I only needed to use the manual as a reference. These days I would start with x86 because the manuals are well written and easily available. And the HW is easily available.
Step #2: compile small programs for that arch using GCC, clang, whatever and then dump disassembly and try to understand the correspondence between your code and the instructions.
- treyd 11 days ago
  The Godbolt compiler explorer can be very helpful for step 2 there. It's neat to see how different compilers codegen the same source.
- rerdavies 11 days ago
  > I would start with x86 because the manuals are well written and easily available
  .. and because the ARM system manuals are just unimaginably awful. ;-P
  - pm215 11 days ago
    Maybe I'm biased because I've spent a lot more time with them, but I rather prefer the Arm manuals, because I found they more reliably have the exact detail and solid pseudocode for how everything behaves. Plus for ages Intel split it up annoyingly into multiple different PDF documents (though it looks like you can get a combined manual now).
chc4 11 days ago
IMO you should just stick some programs in Ghidra/Godbolt and see what they emit, especially for small individual snippets whenever you think "I want to do X, what's the best way of doing it". There really isn't much difference between a "baby's first assembly" program, where you just have movs and like five other common instructions, and the kind of assembly an optimizing compiler emits: it's a matter of recognizing that some operations can be merged into a more specialized one or the addressing mode of another, or you can use a setcc with a results flag from something you already computed, or what have you. The good code that LLVM and JITs emit is, for the most part, not due to much better instruction selection but to much better optimization passes, which learning more about assembly doesn't help with: it's about transforming code in general at a high level, which you would do at the compiler IR step before touching assembly at all.
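Two tiny snippets of the kind worth pasting into a compiler explorer, as a hedged sketch of the "merged into an addressing mode" and setcc points. The function names are made up. On x86-64 at -O2, GCC and Clang will often turn the first into a single lea and the second into a cmp plus setl, but the exact output depends on compiler and flags, so treat the comments as things to check rather than guarantees.

```c
/* Often a single lea on x86-64: the multiply by 4 and both adds fit one
   addressing-mode computation (base + index*4 + displacement). */
int scaled_sum(int a, int b)
{
    return a + b * 4 + 8;
}

/* Often cmp followed by setl (plus a zero-extend): the comparison result
   is materialised from the flags without a branch. */
int is_less(int a, int b)
{
    return a < b;
}
```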
- CalChris 11 days ago
  Everyone should know Godbolt. (I don't know Ghidra; I probably should.)
  First, Godbolt is a useful tool for understanding what a particular compiler will do with your code with different switches set.
  Second, it's a useful tool for communicating because you can enter source, set some switches and see some output. You can (and we do) get a persistent shortened url to submit on StackExchange, HN, … so that people concretely know what you're complaining about.
  Godbolt should get the ACM System Software Award.
- rerdavies 11 days ago
  > much better optimization passes, which learning more about assembly doesn't help with
  I think you have to include an understanding of the underlying processor architecture as part of "learning more about assembly". If you're writing assembler without instruction scheduling in mind, you would be better off not writing assembler at all, and letting the LLVM optimizers do instruction scheduling for you.
  - hnthrowaway0328 11 days ago
    How can one learn instruction scheduling for Intel? Are the algorithms for branch prediction and out-of-order execution revealed in the manual? Thanks.
    - rerdavies 11 days ago
      I don't think it's humanly possible to do it perfectly. Heuristically, the Architecture manual gets you 95% of the way there.
      For really detailed insight into code optimization, the Intel Profiler ($$) gives you a lot of tools for precise instruction scheduling (e.g. an indication of which instructions are stalling during execution of your code, useful analysis of cache miss rates, and which instructions caused those cache misses). ARM also provides a profiler that may do the same for ARM chips, but it is insanely expensive.
      You can make do with Linux stochastic profilers, but it may be helpful to have some utility code that provides dumps of relevant profiling registers for your CPU (e.g. L1, L2, L3 cache miss counts, missed-branch counts, processor stall counts, &c.). I'm not sure what x86 processors provide, but writing code to dump ARM profiling registers proved to be incredibly useful in a recent profiling and optimization misadventure.
      Fwiw, unless you're using instructions that don't map well onto high-level languages, it's pretty difficult to beat well-tweaked GCC-generated code by more than a few percent. I imagine LLVM is the same. Unless you're writing code whose welfare depends on whether it's 3% faster than a competitor, it's probably not worth it to drop into assembler.
      With a bit of tweaking you can even get all the major C/C++ compilers to generate SIMD code that's consistently annoyingly good from non-SIMD C/C++ by encouraging the compilers to perform SIMD vectorization optimizations.
      The other way to learn is to do. Profile EVERYTHING with a stochastic profiler. Tweak based on your necessarily limited understanding of the architecture. Profile again to confirm that your optimization actually is valid. Repeat until done.
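      A minimal sketch of the "encourage the compiler to vectorize" idea mentioned above, assuming a C99 compiler. The function name `saxpy` is just the conventional name for this loop. The `restrict` qualifiers promise that the arrays do not overlap, which removes one common obstacle to auto-vectorization; whether SIMD code is actually emitted depends on the compiler and flags (for example `gcc -O3 -fopt-info-vec` or `clang -O2 -Rpass=loop-vectorize` will report it).

```c
#include <stddef.h>

/* y[i] += a * x[i] for n elements. `restrict` tells the compiler that
   x and y never alias, so it is free to process several elements at once. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

      Comparing the disassembly with and without `restrict` is a cheap way to see what the compiler was worried about.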
    - JonChesterfield 11 days ago
      It's proprietary but somewhat amenable to exploring through experiment.
      The heuristics go something like:
      1. Find out what execution ports your processor has. E.g. it can probably do two 256bit loads from L1 cache each cycle and probably can't do two stores. It can do arithmetic at the same time. Beware collisions between your arithmetic and address calculations.
      2. Look for some indication of what the register files are - you don't want to read from a register immediately and probably don't want to wait too long either, and there's a load of latency hiding renaming going on in the background. This one seems especially poorly documented.
      3. Aim is to order instructions so that the dynamic scheduler has an easier time keeping the ports occupied and so that stalls on register access are unlikely
      4. Choosing different instructions may make that work better in a gnarly NP over NP sort of fashion
      5. Moving redundant or reversible calculations across branches can be a good idea
      The DSP chips are much more fun to schedule in the compiler as branches are usually cheaper and there's probably no reordering happening at runtime.
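      One hedged way to see heuristic 3 from the C side: a sum written with several independent accumulators gives the out-of-order core more adds that do not depend on each other. The name `sum4` and the choice of four accumulators are assumptions for illustration; the best count depends on the add latency and port count of the particular CPU.

```c
#include <stddef.h>

/* Sum with four independent accumulators so consecutive adds do not all
   wait on the previous result. Note: this reorders the float additions,
   so the result can differ slightly from a strict left-to-right sum. */
float sum4(const float *v, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)      /* leftover elements */
        s0 += v[i];
    return (s0 + s1) + (s2 + s3);
}
```

      Timing this against a single-accumulator loop on your own machine is the experiment; the numbers will vary by microarchitecture.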
    - anonymoushn 11 days ago
      Some of the information you want is published by Intel. Some of it is better found on https://uops.info/ or in Peter Cordes's answers on StackOverflow or in the comments section of Agner Fog's web site. Some of it you must determine experimentally. Empirically modern compilers do not fully exploit much of this knowledge, though e.g. LLVM-MCA proves that they contain a good bit of it.
      - zypeh 3 days ago
        For HPC code that aims to run on Intel CPUs, do you recommend a compiler like the Intel oneAPI C compiler over LLVM or GCC, before one starts to profile or invest time in reading the manual for a specific compiler?
      - rerdavies 11 days ago
        Anecdotally, GCC seems to do a startlingly good job of instruction scheduling, on x86 and ARM. I've always wondered what sort of architecture models the big compilers have, and have been meaning to browse the source code to find out for some time. Does anyone know?
- astrange 11 days ago
  > or you can use a setcc with a results flag from something you already computed
  Funny enough, on large CPUs this can be slower than recomputing something, because they don't like long dependency chains and sometimes even have penalties for reading a register that hasn't been written for hundreds of instructions.
woadwarrior01 11 days ago
I'm currently reading Apple's "Apple Silicon CPU Optimization Guide"[1] and it's excellent! Very reminiscent of Intel's Software Developer Manuals[2], which I read a long time ago.
[1]: https://developer.apple.com/documentation/apple-silicon/cpu-...
[2]: https://www.intel.com/content/www/us/en/developer/articles/t...
- CalChris 11 days ago
  Thanks. I didn't know about Apple's optimization guide. When I had wanted to know latencies etc in the past, I'd have to infer what I could from LLVM scheduler resources (written by Apple).
mtreis86 11 days ago
Play through the game Turing Complete; by the end you'll have built your own ISA and solved some puzzles with it. Keep playing to get on the high scores list and you'll turn those assembly routines into ASICs.
zoenolan 12 days ago
https://www.nand2tetris.org/
As a good refresher on assembly and compilers
- emmanueloga_ 11 days ago
  I really like this book; it's more than a refresher imo. It really goes from NAND gates all the way up to building a CPU using a Verilog-like language, its assembler language, then a higher-level language, etc.
  The only thing that is not as nice is the tooling, it's GUI based and uses Java. One of the projects on my backburner is to attempt to write a better toolset for it. Or maybe I should just wait for someone else to do it haha (to be clear, there's nothing wrong with the existing tooling, I'd just rather something not based on Java that I could run on a normal IDE, say, VS Code).
  - deosjr 11 days ago
    That's exactly why I built my own tooling when going through the book. I built everything in Golang and am working on some visualisations in Javascript. Repo: https://github.com/deosjr/nand2tetris Website: https://deosjr.github.io/ (lispmachine part is wip)
- hyperman1 11 days ago
  One thing here that was genius is the supported assembly language. It has only 2 instructions, and they map directly to the hardware: load a value, and do an ALU/jump action. I never imagined you could get the idea of assembly simplified as bare bones as this.
ksherlock 11 days ago
Someday -- not today, not tomorrow, but someday -- you'll probably want to read Agner Fog's optimization manuals.
https://www.agner.org/optimize/#manuals
asalahli 11 days ago
I started with this excellent NASM tutorial[0] then went straight to Intel manuals.[1]
0. https://cs.lmu.edu/~ray/notes/nasmtutorial/
1. https://www.intel.com/content/www/us/en/developer/articles/t...
joncmu 11 days ago
If you want to learn one of the oldest assembly languages for which you can still find a modern computer to run it on, check out IBM Z assembly. There is a great list of resources here: https://idcp.marist.edu/assembler-resources
The one resource they don't list is the ISA manual, called the Principles of Operation; the latest version can be found here: https://publibfp.dhe.ibm.com/epubs/pdf/a227832d.pdf
It is actually pretty amazing at how easy it is to learn other architectures once you understand how one or two work.
- seankurtz 11 days ago
  The benefit to learning s390x assembler over others is that there are actually still experts around that program entirely in the macro assembler, and environments that still expect or require it (technically, it's a solved problem to link against the assembler code from Metal C, but all the docs and APIs are specified entirely in assembler for much of the low-level OS internals, and you really do need to know it to work with it in any significant professional capacity).
  It's tough to learn on x86 or ARM, because there has been a lot of standardization towards C, and that is sorta the expected low-level API/ABI even at the kernel level in 2024. Calling conventions have been standardized, and really very little actual programming is done manually in assembly these days. So it's sort of artificial trying to learn assembly on those ISAs these days. Certainly there are not many APIs specified in assembly. Perhaps one exception would be compiler/linker backends, but that is a whole other can of worms you've gotta learn if you go that route.
  IBM Z is different in that respect. That probably makes it a more difficult and unwieldy platform overall if you just wanna make a quick app and ship (why I'm not advocating you all go out and buy mainframes lol), but for learning how to program assembler "in the real world", it's the last bastion of a much older school of programming and a great learning environment that will challenge some of the assumptions you may have picked up that actually come from Unix or C and not the ISAs themselves.
  It's also, in my humble opinion, a pretty nice macro assembler to actually work with. It's had a lot of development over the years (perhaps as a result of it remaining a significant force in the systems programming world for mainframes long after similar programming interfaces were quietly retired on newer platforms), and so I quite like the actual assembler itself, now known as IBM HLASM.
  I tend to recommend newcomers pick up a free pdf book from here by the esteemed John Ehrman (RIP)
  https://idcp.marist.edu/assembler-resources
  along with the POPs specified above. And make an account on zXplore or similar and play around after doing the (very short) introductory portion.
  - CalChris 11 days ago
    > Perhaps one exception would be compiler/linker backends
    I think for compiler+linker backends it's really important to have a decent reading level of assembly. In llvm, most of the work will be in instruction selection and you'll then read the results. With lld you write a very stylized handler with some switch statements and very stylized macro incantations. And read the results with dis and elfdump.
- fuzztester 11 days ago
  >If you want to learn one of the oldest assembly languages you can still find a modern computer to run it on check out IBM Z assembly.
  6502 assembly on the Commodore 64 was good fun.
  Pretty simple ISA.
CoastalCoder 11 days ago
I don't have a good book to suggest, but one tip you may find helpful:
A typical function has two kinds of assembly code:
(1) The ABI-required logic for functions and function calls, and
(2) Everything else, which can be more or less whatever you want, as long as you don't stomp on the details required by the ABI.
- cellularmitosis 10 days ago
  1) is the bit I seem to have trouble finding good resources for. It seems every "intro to assembly" tutorial spends all this time talking about instructions, and never seems to get to something as basic as "how do I create and call a function in assembly?".
  Any suggestions there?
  - CoastalCoder 10 days ago
    I've only had to deal with this on x86-64 Linux, and I worked from this reference document: https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf (section 3.2) [0]
    I imagine similar documents exist for each OS-architecture pairing, since the ABI is a convention that (ideally) everyone will agree on.
    If you need a definitive answer for some particular OS-architecture pairing, you might ask within a development community that also needs to support that ABI. E.g., gcc, LLVM, lldb, gdb.
    [0] IIRC: Even after reading that doc I was unclear on a few details. I got clarification by writing some minimal examples and looking at the assembly that gcc produces. Note that you may need to disable inlining and optimization, or else you might draw the wrong conclusions from the generated assembly.
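    A minimal example of the kind described in [0], as a hedged sketch. The function names are made up, `__attribute__((noinline))` assumes GCC or Clang, and the register notes in the comments assume the System V AMD64 ABI from the linked document (Windows x64 uses different registers). Compile with something like `gcc -O1 -S` and read the output.

```c
/* noinline should keep the call visible in the generated assembly even
   with optimization enabled (GCC/Clang attribute). */
__attribute__((noinline))
long sum7(long a, long b, long c, long d, long e, long f, long g)
{
    /* System V AMD64: a..f arrive in rdi, rsi, rdx, rcx, r8, r9;
       g is passed on the stack; the result is returned in rax. */
    return a + b + c + d + e + f + g;
}

long call_sum7(long x)
{
    /* Look here in the disassembly for the register setup, the stack
       argument, and the call instruction. */
    return sum7(x, x + 1, x + 2, x + 3, x + 4, x + 5, x + 6);
}
```

    The seventh argument is there on purpose: it is the first one that no longer fits in a register, so the caller's stack handling shows up too.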
    - cellularmitosis 10 days ago
      Thanks for the reply.
      Actually, I ended up finding the answer to my question (“why doesn’t every assembly tutorial cover calling conventions on like, page 2?”), and it was because I was coming at this topic from a C perspective.
      https://stackoverflow.com/a/17309038
      “If you're writing in assembly language, you can do whatever you want. It's only when you want to interact with some external code (maybe a library, maybe a system call) that you need to obey the calling convention.”
      - CoastalCoder 10 days ago
        Personally, I found that approach to learning assembly somewhat difficult.
        If you're first learning assembly in some little simulator, or on bare metal, I can see the point in deferring any discussion about the ABI until later.
        My first assembly project needed to be invoked as a function call from a C/C++ program. So I'd have really benefited from at least a brief explanation about the boilerplate asm you need to be a callable function, how to make function calls / syscalls to print some output, etc.
anonymoushn 11 days ago
The highload.fun wiki[0] links some resources. The Intel optimization manual[1] is also useful.
These resources are mostly aimed at solving problems for which compilers are not very useful, so there are probably other resources that are a better fit.
[0]: https://github.com/Highload-fun/platform/wiki
[1]: https://www.intel.com/content/www/us/en/content-details/6714...
sim7c00 11 days ago
Low-Level Programming by Igor Zhirkov, even though it's not really about assembler specifically. It has a good chapter on it and teaches well how to apply knowledge of machine code/assembly to an architecture/system (amd64 in this case), and then spends a lot of time teaching how to translate that upwards to higher-level languages rather than downward. It teaches you to find out and research stuff yourself too. He's a good teacher.
I know it's not about LLVM and JIT etc., but imho the basics come first, and then moving up. Otherwise it's confusing.
oldmanludd 12 days ago
OpenSecurityTraining2 has some Assembly courses
https://p.ost2.fyi/courses
- anta40 11 days ago
  Ah, so it's the updated version of this: https://opensecuritytraining.info/IntroX86.html
  Let's see...
mtklein 11 days ago
I'd suggest working incrementally from areas of your existing strength. Tweak whatever code base you are most familiar with, starting with a tiny change, and see how the assembly changes. I use objdump -d and git diff --no-index for this all the time.
billsix 11 days ago
I've liked Jonathan Bartlett's books; his newest is "Learn to Program with Assembly"
- rramadass 11 days ago
  Seconded; his books are a very good introduction to assembly programming.
volkadav 11 days ago
If you're looking for introductory material, I'd highly recommend Computer Systems, A Programmer's Perspective by Bryant and O'Hallaron: https://csapp.cs.cmu.edu/ It sounds like the material you're after would mostly be in chapters two or three through five depending on where you'd want to start. The second edition is much cheaper used and follows broadly the same path, though it does have x86-32 in the main text with -64 as an appendix ("web aside"); third swaps that.
maldev 11 days ago
I would highly recommend AMD's developer manual. It's written much more for actual reading, rather than being a pure tech manual with super thick language like Intel's is.
I would also recommend NASM's guide for syntax and such. https://www.nasm.us/xdoc/2.13.03rc1/html/nasmdoc0.html
bombcar 11 days ago
The first thing you’ll learn is that a macro assembler is surprisingly high level; much of what you think of as C-style high level can be done by macros.
jim_lawless 11 days ago
"x64 Assembly Language Step-by-Step: Programming with Linux" (4th edition) by Jeff Duntemann is a pretty good book.
- udev4096 11 days ago
  Hey, thanks for the book suggestion. It's an interesting one and will definitely help me in my journey of learning assembly (just started a few days ago)
andrewstuart 11 days ago
> modern assembler
For long out of date assembler this YouTube channel: https://www.youtube.com/@ChibiAkumas
For modern assembler this YouTube channel: https://www.youtube.com/@WhatsACreel
- snvzz 10 days ago
  >For long out of date assembler
  Note that chibiakumas also covers RISC-V, which is about as modern as it gets.
brianrhall 11 days ago
Assembly Programming and Computer Architecture for Software Engineers https://github.com/brianrhall/Assembly Also helpful is the compiler explorer https://godbolt.org/ Although a more modern way to do cross platform low level tasks is with compiler intrinsics. The book above introduces intrinsics, but Intel has a great intrinsics guide https://www.intel.com/content/www/us/en/docs/intrinsics-guid...
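To give a flavour of intrinsics, here is a hedged sketch that adds four 32-bit integers with SSE2 when the compiler targets it, and falls back to a scalar loop otherwise. The function name `add4_i32` is made up; `_mm_loadu_si128`, `_mm_add_epi32` and `_mm_storeu_si128` are standard SSE2 intrinsics from `<emmintrin.h>`.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* out[i] = a[i] + b[i] for four 32-bit integers. */
void add4_i32(const int32_t *a, const int32_t *b, int32_t *out)
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);   /* unaligned load */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* one packed 32-bit add (paddd), then an unaligned store */
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
#else
    /* Scalar fallback for targets without SSE2. */
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```

Each intrinsic maps more or less directly onto one instruction, while register allocation and scheduling stay with the compiler.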
alexdowad 11 days ago
Aside from what has already been suggested, you could consider reading selected chapters of Intel's programmer manual. I personally read through the whole thing once (well, skimmed some parts).
From my experience, Intel's x86 manual is better and easier to read than AMD's. It's a free download.

Koshkin 11 days ago

A great way to learn assembler is to closely examine code generated by a compiler, e.g. on godbolt.org.
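Something small like this is a reasonable first thing to paste in (a sketch; sum_ints is a made-up example, not from any particular book). At -O0 you will typically see every variable going through the stack; at -O2 the loop usually stays in registers and may get vectorized, so comparing the two outputs is instructive.

```c
#include <stddef.h>

/* Sums n ints. Deliberately simple so the generated assembly
   is short enough to read line by line. */
int sum_ints(const int *v, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```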

[-]

sparkie 11 days ago

Manually with:

    gcc -S -O0 -no-pie -o main.S main.c

It's probably also worth throwing in -nostdlib while learning, to keep it bare-bones; otherwise you will need to link the C runtime.

Assemble the emitted .S file:

    as -o main.o main.S

Link with the C runtime:

    ld -o main -lc main.o crt1.o crti.o crtn.o --entry main

If using -nostdlib, add your own start.S with a simple entry which calls main with no command line arguments.

    BITS 64

    %define SYS_exit 60

            global _start

    _start:
            xor edi, edi
            xor esi, esi
            call main
            xor eax, eax
            mov al, SYS_exit
            xor edi, edi
            syscall

Assemble with:

    nasm -felf64 -o start.o start.S

And link both:

    ld -o main main.o start.o --strip-debug

Check the result:

    objdump -x -S main

To keep it even more minimal, add your own link.ld file instead of using the default:

    OUTPUT_FORMAT("elf64-x86-64")
    OUTPUT(main)
    ENTRY(_start)

    INPUT(main.o)
    INPUT(start.o)

    SECTIONS
    {
        PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); 
        . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
        .text : { *(.text)  }
        .eh_frame : { *(.eh_frame) ; }
        .data : { *(.data) ; }
        .bss : { *(.bss) ; }
    }

Link with:

    ld -T link.ld

[-]

rramadass 11 days ago
Nice!
PSA: Read Hongjiu Lu's classic paper ELF: From The Programmer's Perspective as a preliminary to get an idea of how the above steps fit together.

fuzztester 11 days ago
IMO that is not a good starting point for beginners to assembly.
They can go for it later, after learning assembly basics.

JonChesterfield 11 days ago
There are two aspects to assembler. One is the target machine - learning what instructions, memory, performance characteristics you're dealing with.
The other is the assembler - what syntax it gives you, how it handles macros, whether it optimises, whether it does any semantic analysis. GNU AS is different to NASM is different to flat assembler.
I didn't get much out of reading compiler disassembly relative to handwritten assembly. I'd recommend trying to find some of the latter; it might need to be maths libs, video codecs or similar. I'd be interested in recommendations here, as the asm I learned from was proprietary.
vmchale 11 days ago
I wrote a blog post on writing a JIT that can handle FFI calls: http://blog.vmchale.com/article/jit
If you want the full monty, I think you'll have to read the LLVM documentation on JIT linking: https://llvm.org/docs/JITLink.html
I haven't found any academic papers or tutorials on JIT linking, unfortunately.
rerdavies 11 days ago
Go to the source: the Intel Software Developer Manuals.
https://www.intel.com/content/www/us/en/developer/articles/t...
You will want the first two volumes. For LLVM and JIT work you don't need the last two volumes.
Not kind, or gentle, but certainly definitive and authoritative.
criddell 11 days ago
Lots of people are recommending x86 and I wonder if they are talking mostly about the x86 specifically or would that include x86-64? I’d really like to get better at working with crash dumps and since everything in my world is 64-bit, that’s what I’m seeing.
BTW, if anybody has recommendations for assembly in the context of crash dumps, I’d be very appreciative.
kylecazar 11 days ago
Fond memories of ordering the then-free print volumes of the IA-32 reference manuals from Intel... and actually receiving them.
[-]
- rerdavies 11 days ago
  Fond memories of dropping in to the local Intel sales office to pick up the then-free Intel Reference manuals whenever a new processor came out! No appointment necessary.
vbezhenar 11 days ago
If you're interested in 32-bit ARM, I can recommend the book "Raspberry Pi Assembly Language Programming: ARM Processor Coding".
It describes Cortex M0 assembly language very thoroughly and also touches on multiprocessor programming. You just need two Raspi Picos (one to serve as the programmer), which are readily available.
dragontamer 11 days ago
> as well as to write high performance low-level code
This is different. I would suggest "Intel® 64 and IA-32 Architectures Optimization Reference Manual", as well as https://www.agner.org/optimize/ .
badrabbit 11 days ago
Azeria Labs has a nice ARM assembly tutorial; the lady behind it also has nice books on it that I highly recommend.
Vosporos 11 days ago
The Mario Kart Wii retro-players have you covered with ARMv8: https://mariokartwii.com/armv8/
pyinstallwoes 11 days ago
Build a Forth
KingOfCoders 11 days ago
OT: Not a modern one, but "Z80 Assembly Language Subroutines" has been my favorite computer book for 40+ years.
HarHarVeryFunny 11 days ago
As you probably appreciate, there's a big difference between just being able to write assembler and being able to write optimized assembler that will beat a modern optimizing compiler (else what's the point, other than fun, unless you are the one writing the compiler, which seems to be your interest).
One of the issues with modern processors (wasn't true back in the day with the old 8-bitters) is that the processor is so much faster than memory access that this needs to be taken into consideration when writing optimized code. Instruction timings (number of clock cycles) for memory access are going to vary a lot depending on where the data is being held - in cache or in main memory. Writing optimized code (high level as well as assembler) therefore becomes not just a matter of making the code itself as minimal and fast as possible, but also organizing the program's data access to operate out of cache as much as possible and minimize main memory access. The key is to be sensitive to the layout of your data in memory, and try to have your inner loops/code access nearby (same cache line) data rather than hopping about all over the place. e.g. If you have a 2-D array that's laid out in memory row by row (vs col by col), then you want to access it that way too (work on rows) to take advantage of cache.
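A minimal sketch of the row-by-row point, in C (where 2-D arrays are laid out row by row). The names and the 256x256 size are made up for illustration. Both functions compute the same sum; only the order of memory accesses differs, and how much that matters depends on the array size and the cache sizes of the machine.

```c
#include <stddef.h>

#define ROWS 256
#define COLS 256

/* Row by row: consecutive addresses, so each cache line that is
   fetched gets fully used before moving on. */
long sum_by_rows(int m[ROWS][COLS])
{
    long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += m[r][c];
    return total;
}

/* Column by column: each access is COLS * sizeof(int) bytes away
   from the previous one, so it keeps landing on a different cache line. */
long sum_by_cols(int m[ROWS][COLS])
{
    long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += m[r][c];
    return total;
}
```

Timing the two on a larger array (well beyond your last-level cache) is an easy way to see the effect for yourself.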
I used to write a lot of 8-bit assembler back in the day (as well as more recently for some retro-computing fun), but never x86, so I don't have any specific resources to share. Once you've learned the basics of the instruction set, a good starting point might be to take some simple functions and compile them to assembler both with and without optimization enabled, then try coding the same function yourself in assembler to see if you can beat the compiler. Search for "x86 tricks" type resources too: the things other assembly programmers have learnt about optimizing use of the instruction set and writing fast, compact code.
Note that cache considerations apply to code as well as data, so you want your code to be compact (fit as much of your inner loops into cache as possible), and also to branch as little as possible, for two reasons. First, you want to take advantage of cache by executing consecutive instructions as far as possible, and second, branching kills the pipelining performance of modern processors (you are throwing away work already done), even though they try to mitigate this with branch prediction.
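As a rough illustration of "branch as little as possible", here is a sketch in C with two made-up functions that count elements above a limit. The second replaces the conditional jump in the loop body with arithmetic on the comparison result. Compilers often do this transformation themselves (via setcc/cmov or SIMD), so check the generated assembly before assuming the hand-written version wins.

```c
#include <stddef.h>

/* Branchy: a conditional jump per element, which is hard to
   predict when the data is effectively random. */
size_t count_above_branchy(const int *v, size_t n, int limit)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > limit)
            count++;
    }
    return count;
}

/* Branch-free body: the comparison yields 0 or 1, which is
   added directly instead of being used to decide a jump. */
size_t count_above_branchless(const int *v, size_t n, int limit)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)(v[i] > limit);
    return count;
}
```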
rramadass 11 days ago
Not specific to LLVM/JIT, but for assembly check out the books by Larry Pyeatt (ARM) and Daniel Kusswurm (x86).
fuzztester 11 days ago
For those who want to do x86 assembly first, google Paul Carter assembly language.
It could be one option.
201984 11 days ago
uops.info is a very useful website when you're starting to optimize your code. It shows you the throughput and latency of most x86 instructions as tested on a large range of microarchitectures.
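One way those latency and throughput numbers get used, sketched in C with made-up function names: when an instruction's latency is several cycles but the core can start a new one every cycle or so, a single dependency chain leaves the execution units idle. Splitting the work into independent chains can help. Whether it does on your CPU is exactly what the tables (and a benchmark) will tell you. Note that floating-point addition is not associative, so the two versions can differ in the last bits for general inputs.

```c
#include <stddef.h>

/* One accumulator: every add depends on the previous one,
   so the loop is bound by the add's latency. */
double sum_one_chain(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Four independent accumulators: adds from different chains
   do not depend on each other and can overlap in the pipeline. */
double sum_four_chains(const double *v, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)   /* leftover elements */
        s0 += v[i];
    return (s0 + s1) + (s2 + s3);
}
```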
jmspring 11 days ago
Under no circumstances should knowledge of Assembler be needed to work with LLMs.
Many people that work in DS/ML can barely make it with Python.
[-]
- neandrake 11 days ago
  I think you misread the original post. They’re referring to LLVM the compiler toolchain, not LLMs.