I've written "self-modifying" (really JITed) code for several architectures, mainly ARM, and I when I had to do it for amd64 I was very much surprised by how straightforward it was.
On ARM you have to be very careful to handle the cache correctly when you write self-modifying code, because when you access memory using a regular load or store it's obviously treated like data and goes through the data cache while the instructions are fetched through the instruction cache. So when you write an opcode you have to be careful to flux it out of the data cache (at least up to the point where the caches unify, typically L2 on ARM) and then invalidate the icache to make sure that you get the opcode back.
On modern x86-64 architectures, which typically have a very advanced cache system, I expected to have to deal with that as well. As this article shows, you don't. You just write whatever and you can execute it straight after. When you think about it it's a rather complicated thing for the hardware to implement. I wonder why they do it that way instead of relying on the software to issue flushes in the (relatively rare) situations where a hazard is possible.
So did they keep it that way purely for historical reason and back-compat? Few people write self-modifying code (or even code loaders) these days, it seems like a tricky feature to implement in hardware for relatively little gain.
Depends if the self-modifying code in question is stuff like kernel drivers for popular hardware on widely used operating systems. The x86 manufacturers do bend over backwards for compatibility. You can run 16 bit code on the latest i9 processor if you like, including MSDOS if your motherboard still supports legacy boot modes. That is a remarkable level of backwards compatibility.
Obfuscation. It might be a terrible practice, but has spawned entire languages, including some that have been used on occasion by industry, as well as the puzzle-solving community at large. (Say, integrate it into a `compile --release` flag.)
Right off the back of obfuscation, DRM. If the code modifies itself, especially in unexpected ways, then breaking it becomes harder. (`compile --protect`?)
A lot of the later demo effects on the c64 is using so called "speedcode" which basically is code generating code on the fly to achieve things like dynamically copying data areas. In effect intelligent loop unrolling i asm.
One of the reasons we needed extremely aggressive unrolling was the size/lack of a stack, making recursion or anything that might look like it difficult/limited/impossible, due to needing to store a location to jump back to.
Anything you could do to minimise the amount of things needed to store, and their size, wasn't wasted time.
And though it might very ocassionally be similarly necessary on some chips today, it's more likely you'll work in assembly on those.
The real issue with self-modifying code is that it's basically a security bug waiting to happen, plus you have other complications like having to flush the instruction cache on architectures like ARM. Generally, the benefits are not worth it unless except for very specific cases.
Self modifying code is not a security issue in and of itself, but it requires memory that is writable and executable (though, not necessarily at the same time). This is usually a bad idea because it makes arbitrary code execution much easier if your code has unrelated bugs in it, because it provides attackers a place to write shellcode and get it to run.
There's a security feature called W^X  (also called DEP in Windows). Basically you can use special mode which prevents memory pages to be writeable and executable at the same time, so self-modifying code is not allowed, but it prevents exploits from modifying memory containing executable code. OpenBSD uses it as well.
That's not what DEP is, but it is dependent on DEP. DEP is just another word for the NX bit, that allows a page to be marked as non-executable. With DEP you can still have a page that is RWX if you set its permissions that way.
Before COUs supported stacks, quite a few compilers produced self-modifying code.
Instead of using a stack, many CPUs stored the address to return to in a specific register. If a function wanted to call other functions and return afterwards, it had to store that return address somewhere. Popular solutions where “directly before the start of the function” and “in the jump instruction at the end of the function”. The former is easier for the compiler writer, the latter leads to faster code (the return becomes a simple unconditional jump. With the other approach, you have to load the return address and then jump to it)
And yes, you can’t have reentrant code or even recursion that way. That’s one reason early COBOL and early FORTRAN didn’t support recursion.
I bet it would be the next level (if it doesn't exists already) to be used by top-level agencies, or by bad guys.
Some code already found in the wild couldn't be decrypted (e.g., stuxnet), has obfuscated connection (e.g. https://news.ycombinator.com/item?id=18864895) and is updated as needed (any C&C that sends new modules to the infected machine). Add some capability to self modify and you can have hard times generating signatures ou behaviours that are traced by "antivirus", firewall, and so on.
Self-modifying Lisp, perhaps? Do Turing tapes count?
Really the main use of these techniques (and the article has a telltale in its use of the word "shellcode") is injecting code into a program that was not originally intended to be modifiable. Usually (although not always) across a security boundary.
Yes, that was to some extent the point of using the same representation for code and data (homoiconicity) in Lisp - the ability to edit the code from inside itself. I'm not sure how widely this is used outside the macro system and development environments.
The (old) MIX programs in The Art of Computer Programming by Knuth were often self-modifying, in the sense that they modified jump addresses at the end of routines to return to the right caller. (That is, they set the address to right after the call.)
The book chose that way to do it because it was standard practice at the time. It quickly fell out of fashion, though. Nowadays, we do it with a call stack, and each stack frame holds a return address. (Which is better anyway, since it allows routines to be re-entrant.)
The newer MMIX architecture (in some of the newer books) don't rely on such self-modification.
Sometimes they will also make small changes to code that was generated earlier, typically by changing a branch instruction to point somewhere else.
Dynamic linkers may do that, too, though glibc doesn't normally do it, as far as I know: it prefers to update a pointer to code: same result without needing memory that is both writable and executable and without having to invalidate the instruction cache.
Since the advent of ROP glibc's approach isn't much more secure, though it is saner. A few years ago OpenBSD added the kbind syscall which provides the linker equivalent of W^X. kbind is a kernel-mediated memcpy operation that restricts the code permitted to write to a memory block--specifically, the last piece of code to write to it, which is invariably the linker.
C++ virtual functions are problematic for the same reasons. In C code I've started to avoid function pointers altogether in favor of switch-based dispatch, limiting an attacker to invoking a small, statically defined set of functions, not any arbitrary code in the address space. If I feel the problem demands heavily polymorphic code I'll pull in a scripting language like Lua.
From my delve into such things, in Linux at least the linker is a binary called before the program runs, so its a separate program potentially modifying the code.
Interesting philosophical question whether that's still self modifying though.
The OP's point, though, was that the glibc linker (ld.so) doesn't modify code; it largely limits itself to modifying tables of pointers. For access to shared symbols the compiler emits code that indirects through these tables, such as through the global offset table (GOT). The machine code itself is mostly generated to use relative offsets, so the linker doesn't usually need to change static opcode operands.
It does this because ELF is a newer, more abstract executable format. By contrast Windows and AIX have evolved an older dynamic linking strategy which depends more heavily on the linker patching address constants embedded in the code, presumably because of the better backward compatibility. I'm too young to have had first-hand experience with the details, but I do vaguely remember the Linux transition from a.out to ELF and it seemed rather disruptive (though it was all magic to me).
But the Windows approach isn't rightly self-modifying code, either. It's more like a delayed compilation stage. Self-modifying code implies code that rewrites itself dynamically during runtime. Runtime normally means in the normal course of regular program execution, as opposed to link time. From the perspective of the code, link time is a one- or two-time event--static linking and, optionally, dynamic linking--that initializes the application code prior to its first run.
It's almost always a better idea to create something new, verify it was created successfully, do any syncing, switchover, optionally re-check for correct syncing, and then delete or take down original. That's pretty much how robust clustering works.
It's not hard to reason about from a formal POV-- by "self-modifying" code, you're basically editing the continuation of the current function/statement. It's just not very useful-- by definition, the extra performance that you might gain from directly self-modifying the program's binary code can't be any higher than the overhead of a good interpreter/JIT. That's just not interesting.
Lisp macros are not self-modifying code in any shape or form.
They calculate new syntax tree fragments from existing syntax tree fragments, often using purely functional techniques (no mutation at all, not even of local variables in the macro).
The compiler internals of pretty much any programming language do similar tree to tree transformations: just not ones that the program itself can specify as part of its code. We don't call C self-modifying because a for(;;) loop structure was changed by the compiler into if/goto with generated labels, and then changed again into assembly code.
A macro that mutates the input source code on which it operates wouldn't pass a code review in any competent Lisp shop.
It can happen. For instance (setf (car '(1 2)) 2) is self-modifying code; it tries to replace the 1 with a 2, and that 1 is embedded in the program code itself, in a literal list. This is undefined behavior according to ANSI CL, very similarly to how "abc"++ (modifying a string literal) is undefined behavior in C. Note that this code isn't implementing a macro.
I've never heard of compiled Lisp object code being mutated.
Lisp programs as such can be self-modifying; a common example of that is updating while running: loading new versions of existing modules, replacing old functions with new ones. That doesn't involve modifying code.
While your use case (creation of new syntax tree in a deterministic fashion) is definitely the most common use case of Common Lisp macros - it is not the only one. Common Lisp macros are literally code executed at compile time on the source AST. It needs not make a deterministic change to that AST, needs not be side-effect free, and needs not use only and solely the context of its input AST to output the final AST.
Doesn't mean I suggest doing this, but please do not sell unhygenic CL macros short. :) In fact, in CL one has to go to some lengths to be careful to make macro outputs "safe."
Regardless of any lack of referential hygiene, or reliance on side-effects such that the same piece of code is not expanded the same way twice, CL macros must not mutate the input AST, period. It is undefined behavior, and easily avoidable.
I once read about a clever use of self-modifying x86 code. The 8086 and 8088 are nearly identical chips, with the difference being that the 8086 has 16-bit I/O and the 8088 has 8-bit. The only way for a program to know which chip it's running on is to write a bit of self modifying code that takes advantage of this difference in I/O size. Both chips use prefetch, but they prefetch words, not bytes, and 8086 words are 16-bit, so it fetches twice as many bytes as the 8088. Thus, one can modify a location in RAM just after the current instruction and that change will be seen on the 8088, but not the 8086, which has already prefetched the previous value.
This is all from memory of something I read, probably on Usenet, ages ago. My apologies in advance if I messed up the details.
I wonder if any of the various x86 emulators out there get this difference right.
"Differentiating between 8088s and 8086s is trickier. The easiest way I've found to do it is to modify code that's five bytes ahead of IP. Since the prefetch queue of an 8088 is four bytes and the prefetch queue of an 8086 is six bytes, an instruction five bytes ahead of IP won't have any effect on an 8086 the first time around"
I mean, how else are you going to write self-modifying code? It's kind of implicit in the name of the thing that it's going to need to at a minimum have write and exec permission, and read is kind of important to make sure you're actually modifying the right stuff. Yes it's a terrible idea to have write and execute permissions on the same piece of memory, but that's part of why literally the very first line of the article says it's a terrible idea to write self-mutating code.