12 comments

  • gizmo 11 days ago
    Stealing bits from pointers makes sense sometimes, but in most cases an easier and less error-prone way to increase memory efficiency is to just use indices or relative pointers. Instead of shaving individual bits off a pointer, take advantage of the fact that most pointers in a program are almost exactly alike.

    Instead of storing a 64-bit prev and next pointer for an item in a linked list, just store e.g. two 16-bit offsets. Then next_ptr = (this_ptr & base_mask) + offset * alignment.
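
    Roughly, as a sketch (made-up names and sizes; assumes every node lives in one slab whose size is a power of two, so the slab base falls out of a mask):

        // 16-bit relative "pointers" within one aligned slab of nodes.
        const SLAB_SIZE: usize = 1 << 20;           // 1 MiB slab (power of two)
        const BASE_MASK: usize = !(SLAB_SIZE - 1);  // clears the in-slab offset
        const ALIGN: usize = 16;                    // every node is 16-byte aligned

        #[repr(C, align(16))]
        struct Node {
            value: u64,
            next: u16, // offset of the next node, in units of ALIGN
            prev: u16,
        }

        unsafe fn next_ptr(this: *const Node) -> *const Node {
            let base = (this as usize) & BASE_MASK;              // this_ptr & base_mask
            let off = unsafe { (*this).next } as usize * ALIGN;  // offset * alignment
            (base + off) as *const Node
        }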

    Most of the time, wasting memory by storing full 64-bit pointers is no big deal. But when it is, you can go a lot further than just shaving off a few bits.

    If you want to be extra clever you can encode values in the base address of your pointers by taking advantage of how virtual memory is allocated. Every process has its own virtual memory pages and you can request memory pages at specific address ranges. For instance, if you align all your memory to a 32 GB boundary then you can use 32-bit pointers plus a base pointer without even having to mask. You can also use virtual memory allocation tricks to convey information about alignment, typing, whether a pointer is allowed to be shared between threads, and so on. And again, without having to butcher the pointer itself.

    • jsheard 11 days ago
      Considering how common it is to use indices in lieu of pointers in Rust, it would be nice if it had native NonMax integer types similar to the NonZero types it already has. NonZero gives you the niche-filling optimization (e.g. Option<NonZero> and NonZero are the same size) but with indices you want zero to be a valid value, so it would be natural to use the maximum as the illegal value instead. You can kind of do it by offsetting the index by 1 or inverting the bits then putting it in a NonZero but that's a small extra runtime cost on every use.
      • joseluis 11 days ago
        You can do a handy struct wrapper over a private NonZero that xors the given value with the prohibited value (max in this case) in the new constructor, and does the same in the get method. Xoring is very cheap. That's my favorite way of storing indices/links for nodes, since you can wrap them in Option for free.
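
        Something like this sketch (made-up names, not an existing crate):

            use core::num::NonZeroU32;

            // Stores `index ^ u32::MAX` in the NonZero, so u32::MAX is the one
            // forbidden index and Option<Idx> stays 4 bytes thanks to the niche.
            #[derive(Copy, Clone)]
            struct Idx(NonZeroU32);

            impl Idx {
                fn new(index: u32) -> Option<Idx> {
                    NonZeroU32::new(index ^ u32::MAX).map(Idx) // None only for u32::MAX
                }
                fn get(self) -> u32 {
                    self.0.get() ^ u32::MAX
                }
            }
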
        • o11c 10 days ago
          Addition/subtraction are more likely to optimize well I think, since pointer arithmetic already uses those.
          • jsheard 10 days ago
            Good point, on x86 at least the mov instruction can add/subtract a constant from the address by itself since that pattern is so common.
    • dkjaudyeqooe 11 days ago
      That's really not the application for this, it's tags, which are endlessly useful for characterizing the pointer. For instance you can use the MSBs to establish the type of what's pointed to, or for reference counting. I use the LSB to indicate whether the value is a pointer or an unsigned integer, whereby you can store either an (integer) key to a record in a database, or a pointer to the record data in memory. You could use another bit in the pointer to indicate that the record is dirty.

      Using unused bits in pointers can make your code more efficient and simpler.
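
      E.g., a minimal sketch of that LSB scheme (made-up names; keys lose one bit to the shift):

          // Pointers to Record are at least 2-byte aligned, so the low bit is
          // free: 1 = shifted integer key, 0 = real pointer.
          struct Record { /* record data */ }

          enum KeyOrPtr {
              Key(u64),
              Ptr(*const Record),
          }

          fn pack(v: KeyOrPtr) -> usize {
              match v {
                  KeyOrPtr::Key(k) => ((k as usize) << 1) | 1, // low bit 1 = key
                  KeyOrPtr::Ptr(p) => p as usize,              // low bit 0 = pointer
              }
          }

          fn unpack(w: usize) -> KeyOrPtr {
              if w & 1 == 1 {
                  KeyOrPtr::Key((w >> 1) as u64)
              } else {
                  KeyOrPtr::Ptr(w as *const Record)
              }
          }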

      • westurner 11 days ago
        Can unused bits be used for signed pointers or CRCs?
        • aetherspawn 10 days ago
          I would argue that we shouldn’t bother to build CRC checking of memory into programs and that ECC RAM should just become the norm eventually.

          99.9999% of memory operations aren’t CRC-d so trying to fight against flaky hardware is a bit of an uphill battle.

          • westurner 10 days ago
            Pointer authentication: https://llvm.org/docs/PointerAuth.html :

            > Pointer Authentication is a mechanism by which certain pointers are signed. When a pointer gets signed, a cryptographic hash of its value and other values (pepper and salt) is stored in unused bits of that pointer.

            > Before the pointer is used, it needs to be authenticated, i.e., have its signature checked. This prevents pointer values of unknown origin from being used to replace the signed pointer value.

            ECC can't solve for that because it would only have one key for all processes.

            Given a presumption of side channel writes, pointer auth is another layer of defense that may or may not be worth the loss in performance.

            https://news.ycombinator.com/item?id=21791269#21800181

    • zozbot234 11 days ago
      This requires using arenas extensively in your program or perhaps playing tricks with virtual memory, because your system allocator will give you arbitrary pointers by default and you'll often want to point to an object from a different allocation.
      • gizmo 11 days ago
        Right. Which is why I favor doing your own memory management entirely (which has some huge advantages) or not worrying about memory at all and trusting a garbage collector. I don't think there are many situations left where memory management is too important to leave to a gc but not important enough to do from scratch.
    • gpderetta 11 days ago
      Definitely indices are a great option, but then you need a base pointer and a way to allocate off of it. That can add significant complexity. So it is all a tradeoff.
  • pavlov 11 days ago
    Technically it’s an ointer on PowerPC and SPARC, but a pointe everywhere else.
  • sojuz151 11 days ago
    This looks like something that debuggers would hate, because following pointers would be broken
    • jxf 11 days ago
      I was just thinking this. You'd either need to special-case your debuggers or be able to use something that provided bespoke hooks for pointer-following.
      • gpderetta 11 days ago
        Wrap it in a class and teach the debugger to interpret it. It might be possible with gdb and python pretty-printers for example.
  • malkia 10 days ago
    I've posted before about the MGS PS1 port we did to PC and how the pointer to a planted C4 bomb encoded whether it was on the wall or on the ground - here ->

    https://news.ycombinator.com/item?id=9739731

    to quote me:

    - The game used a tricky pointer hack: basically on the PSX accessing a pointer with the highest-24-bit set meant reading it from the CPU cache, otherwise not (or maybe the other way around). This was used for example to indicate whether the C4 bomb was planted on the ground or on the wall, instead of keeping a boolean/bit flag for it. It was possibly used for some more things. Since it was 24-bit, that meant 16 MB.

  • zozbot234 11 days ago
    Are there ways on Linux to make sure that all user-visible pointers in a process are allocated within some subset of the overall address space? For instance, imagine writing a 64-bit program such that all userspace pointers are allocated within the first 32 bits. This would allow for representing them in memory using four bytes, as is commonly done on 32-bit architectures--without adding more extensive or ad-hoc changes such as a new "x32" binary format. Except that you could also limit to 24-bit (giving 16M of address space), 16-bit (for a "tiny" code model) or anything else. Of course, this would require some amount of support in the linker/loader, userspace libraries and the kernel itself. But the effort might be worthwhile if it can allow for replacing more ad-hoc approaches.
    • ksherlock 11 days ago
      Linux mmap (since 2.6) has a MAP_32BIT flag, but only for x86-64:

      Put the mapping into the first 2 Gigabytes of the process address space. This flag is supported only on x86-64, for 64-bit programs. It was added to allow thread stacks to be allocated somewhere in the first 2 GB of memory, so as to improve context-switch performance on some early 64-bit processors. Modern x86-64 processors no longer have this performance problem, so use of this flag is not required on those systems. The MAP_32BIT flag is ignored when MAP_FIXED is set.
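
      E.g., a quick sketch using the libc crate (Linux x86-64 only, since that's the only place MAP_32BIT exists):

          use libc::{MAP_32BIT, MAP_ANONYMOUS, MAP_FAILED, MAP_PRIVATE};
          use libc::{PROT_READ, PROT_WRITE};
          use std::ptr;

          // Ask the kernel for memory in the low 2 GiB, so the resulting
          // addresses comfortably fit in 32 bits.
          fn alloc_low(len: usize) -> Option<*mut u8> {
              let p = unsafe {
                  libc::mmap(
                      ptr::null_mut(),
                      len,
                      PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT,
                      -1,
                      0,
                  )
              };
              if p == MAP_FAILED { None } else { Some(p as *mut u8) }
          }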

    • fathyb 11 days ago
      An allocator is free to only `mmap` addresses that are within a range. I think jemalloc could allow you to do that using custom arenas.
    • o11c 10 days ago
      ASAN relies on kernel internal details, but this has broken a few times.

      Realistically you should just mmap a huge aligned chunk in the first place, then allocate inside that.

  • willcipriano 11 days ago
    I never had the problem of too many pointers, but if I did I'd try the "arena"[0] approach.

    Preallocate a huge area of memory and use smaller indexes relative to the start instead, provided a pointer is bigger than an int. You might be able to use this library in that approach and keep partial pointers instead (rough sketch below).

    [0]https://www.rfleury.com/p/untangling-lifetimes-the-arena-all...
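
    Rough sketch of what I mean (made-up names):

        // Handles are u32 indexes into one Vec, instead of 64-bit pointers
        // into separately allocated nodes.
        struct Arena<T> {
            items: Vec<T>,
        }

        #[derive(Copy, Clone)]
        struct Handle(u32);

        impl<T> Arena<T> {
            fn new() -> Self {
                Arena { items: Vec::new() }
            }
            fn alloc(&mut self, value: T) -> Handle {
                let idx = u32::try_from(self.items.len()).expect("arena full");
                self.items.push(value);
                Handle(idx)
            }
            fn get(&self, h: Handle) -> &T {
                &self.items[h.0 as usize]
            }
        }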

  • gumby 11 days ago
    > What do you call a pointer with the high bits stolen? An ointer!

    What is the name of a pointer with the low bits stolen? Machines with strict word addressing will ignore the bottom two bits, so you can safely store what you want there.

    Not sure I've used, much less programmed a machine with that strict constraint in a while though. These days CPUs are all designed to implement the C abstract machine. At least the GPU folks are not subject to this constraint.

  • asb 11 days ago
    I somehow hadn't come across this library, but have a whole blog post on the various ways people store data in pointers (and when+why it's safe) https://muxup.com/2023q4/storing-data-in-pointers
    • makz 10 days ago
      Awesome, been looking for this information for ages already.
  • couchand 11 days ago
    Am I misreading the bitmask code? It looks like (in addition to a few other ideas) it's using the old "stick a few extra bits in an aligned pointer", but it seems to be only manipulating high bits, whereas aligned pointer zeroes are low-order bits.

    I'd suggest a heavier investment in testing infrastructure.

    • masklinn 11 days ago
      64 bit architectures don't actually have 64 bit address spaces, both AMD64 and ARM64 have 48 bit address spaces by default (some CPUs have extensions you can enable to request larger address spaces e.g. LVA on ARM64 and 5LP / LA57 on AMD64 but that's opt-in on a per-process basis).

      So while you have 3 bits available at the bottom of the pointer, there are 16 at the top. That's a lot more payload you can smuggle. There are even CPU extensions which tell the processor to ignore some of that (Linear Address Masking for Intel, Upper Address Ignore for AMD, and Top Byte Ignore for ARM).
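
      E.g., a rough sketch of smuggling a 16-bit tag up there (made-up names; assumes ordinary user-space pointers with bit 47 clear and no LAM/TBI-style hardware masking, so the tag must be stripped before dereferencing):

          const TAG_SHIFT: u32 = 48;
          const ADDR_MASK: usize = (1usize << TAG_SHIFT) - 1; // low 48 bits

          fn pack(p: *const u8, tag: u16) -> usize {
              ((tag as usize) << TAG_SHIFT) | (p as usize & ADDR_MASK)
          }

          fn unpack(w: usize) -> (*const u8, u16) {
              // user-space pointers have bit 47 clear, so masking alone yields a
              // canonical address; kernel-half pointers would need sign extension
              ((w & ADDR_MASK) as *const u8, (w >> TAG_SHIFT) as u16)
          }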

      • vlovich123 11 days ago
        Small correction. That's true for 4-level paging where virtual memory is capped at 256 TiB (LAM48). There's a 5-level extension that reduces the wasted space to 7 bits (LAM57) to allow for 128 PiB of virtual space.

        I have no idea what the purpose of the extension is, given that I don't believe you can get that much secondary storage attached to a CPU unless you go to tape, which is pointless for virtual mapping.

        • masklinn 11 days ago
          ...

          I literally mentioned five-level paging in my comment.

          But it's an opt-in extension, you need the kernel to support it, and then you need to ask the kernel to enable it on your behalf. So it's not much of an issue, you just have to not use this sort of extension (it's not the only one, ARM64 also has a similar one, though it only goes up to 52-bit address spaces) together with 16~19 bits of tagging.

          • LegionMammal978 10 days ago
            You do need to opt-in when compiling the kernel, but on Linux it doesn't take anything particularly special on the program's side to enable it. The rule is that non-fixed mmap() will only ever yield 48-bit addresses, until the program specifies a 57-bit address as a hint, after which it can yield 57-bit addresses at will. (This rule has lots of curious consequences for the location of the vDSO, when 5-level paging is enabled and an ELF program uses big addresses for its segments.)
            • vlovich123 10 days ago
              Any idea what the use case is for such large addresses? Is it RDMA or something else? Even with RDMA I find it hard to believe that companies have > 256TiB of RAM installed behind a single switch.
              • LegionMammal978 8 days ago
                I recall hearing one use case where you have a whole lot of threads, and most of them will only use a little memory, but a few will need a whole lot of memory. So you assign each thread a very large segment of the address space upfront, so that they never have to coordinate with other threads at runtime. At 1 GiB of space allocated to each thread, it takes only 262k threads to use up the whole 256 TiB address space.
              • menaerus 9 days ago
                Good question. I also may not understand that, since as of today, and to my knowledge, top-of-the-line Intel Xeons can support up to 4 TB per socket. This means that for an 8-socket system, the largest amount of physical RAM would equate to 32 TB ... which is not even close to the addressable (virtual) memory on even 48-bit systems (256 TB).
    • Sharlin 11 days ago
      The stored representation is packed such that all the stealable bits are contiguous. To get the original pointer value it’s unpacked first.
    • pixelesque 11 days ago
      It looks like it can use both traditional "tagged pointer" alignment bits AND virtual address bits...
      • couchand 11 days ago
        I see that in the readme but I don't see where that's handled in the bitmasking. It appears to be universally masking high-order bits here: https://github.com/irrustible/ointers/blob/1961f75bbb9818d72...
        • formerly_proven 11 days ago
          Read the function above the function you linked to.
          • couchand 11 days ago
            Oh, that's frightening: they shift the entire pointer by the alignment
            • db48x 11 days ago
              Why? Alignment causes some low–order bits to be zero. Shifting the pointer to the right drops those zeros on the floor, leaving high–order bits zero (or sign extended or whatever). Then you can put your tag up there instead. Shifting the value left by the same amount drops the tag and recovers the same pointer for use.
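
              Something like this sketch (assuming 8-byte-aligned pointers, a 48-bit user address space, and a tag that fits in the 19 freed-up bits; not the library's actual code):

                  const ALIGN_BITS: u32 = 3;
                  const ADDR_BITS: u32 = 48;
                  const PACKED_ADDR_BITS: u32 = ADDR_BITS - ALIGN_BITS; // 45

                  fn pack(p: *const u8, tag: u32) -> u64 {
                      // the low ALIGN_BITS are always zero, so drop them...
                      let shrunk = (p as u64) >> ALIGN_BITS;
                      // ...and park the (up to 19-bit) tag above the address
                      ((tag as u64) << PACKED_ADDR_BITS) | shrunk
                  }

                  fn unpack(w: u64) -> (*const u8, u32) {
                      let tag = (w >> PACKED_ADDR_BITS) as u32;
                      let addr = (w & ((1u64 << PACKED_ADDR_BITS) - 1)) << ALIGN_BITS;
                      (addr as *const u8, tag)
                  }
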
              • couchand 10 days ago
                My guideline with pointer tricks is: if you're not just masking bits it's too complicated for real use.
              • vlovich123 11 days ago
                Any idea why it's preferred to do the alignment route instead of storing the bits in the upper part of the pointer & masking?
                • db48x 10 days ago
                  Personal preference, architectural differences, phase of the moon, etc, etc.
  • zgs 11 days ago
    Every time I've seen similar done in the past, it has come back to hurt the instigator.
    • masklinn 11 days ago
      Apple's been using that for more than a decade: https://www.mikeash.com/pyblog/friday-qa-2012-07-27-lets-bui... http://www.sealiesoftware.com/blog/archive/2013/09/24/objc_e...

      OCaml's also been using tagging to mark unboxed types pretty much from the start, nearly 30 years ago: https://dev.realworldocaml.org/runtime-memory-layout.html#:~....

      • duskwuff 11 days ago
        And it's not like Apple isn't aware of the dangers, either; they used top-byte tagging on the 68000, and it took them a lot of time and effort to migrate away from that to support 16+ MB systems.
        • gpderetta 11 days ago
          IIRC the issue was that 68000 would transparently mask the top bits, so code could get away with avoiding the explicit masking, which of course breaks when you move to a larger address space.

          More modern cpus enforce specific bit patterns for the MSBs, so broken code would be caught immediately.

    • jstimpfle 11 days ago
      It's common for libraries to use the 2 lowest bits in tree structures e.g. RB tree. It's not a problem at all since the data structure is controlled by the library. Even if tree nodes are allocated/provided by the user, having a "4-byte aligned" requirement in the API contract isn't a problem at all -- in fact you need to work quite hard to allocate a structure with a pointer that isn't at least 4 byte aligned (i386), or 8 byte aligned (x64).
      • a1369209993 11 days ago
        > in fact you need to work quite hard to allocate a structure with a pointer that isn't at least 4 byte aligned (i386), or 8 byte aligned (x64).

        Well, no, actually; it's:

          p = malloc(size+1)+1;
        
        It's just quite implausible that you'd do that by accident.
        • menaerus 10 days ago
          Well, no, it isn't, since malloc returns a void* and when incrementing a void* it's impossible for the compiler to figure out the offset by which to increment the address. For that operation to succeed/compile you need to cast the result of malloc to a typed pointer first. This means that the resulting alignment will still be correct.

          The actual danger is in reinterpreting the pointers, or thereof some arbitrary chunk of memory, to a type that doesn't necessarily satisfy the alignment requirements for that given memory address.

          • jstimpfle 10 days ago
            gcc will compile this code and advance the pointer address by 1 byte. Yes, it's not standard conformant.

            My point though is that this is unlikely to happen by accident. The code above is clearly constructed. It's more work to type it than the correct version. The code makes no sense at all. 1 byte is wasted and it's in general a bad idea to allocate unaligned pointers.

            And when you for example provide your own allocator for nodes, you'll probably do it like this:

               Treenode treenode_pool[MAX_NODES];
            
            or this:

               Treenode *treenode_pool = allocate(Treenode, node_count);
            
            And the memory will be aligned properly. Even when you use a plain malloc (no type or alignment information given to the allocator) you will get memory that is at least pointer-aligned.
            • menaerus 10 days ago
              > gcc will compile this code and advance the pointer address by 1 byte. Yes, it's not standard conformant.

              https://godbolt.org/z/cjoYvjxGq

                  <source>: In function 'void foo(int)':
                  <source>:4:28: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
                      4 |     int * p = malloc(size) + 1;
                        |               ~~~~~~~~~~~~~^~~
                  <source>:4:28: error: invalid conversion from 'void*' to 'int*' [-fpermissive]
                      4 |     int * p = malloc(size) + 1;
                        |               ~~~~~~~~~~~~~^~~
                        |                            |
                        |                            void*
                  Compiler returned: 1
              • jstimpfle 10 days ago
                First, yes, it did accept the pointer arithmetic, even though it issued a warning.

                Second, the error about implicit conversion from void-pointer to typed-pointer is in C++ only -- not in C, where you won't even get a warning. The error happens because you didn't cast the void pointer to the target type and doesn't have anything to do with the fact that gcc _does_ let you (as a GCC extension) do arithmetic on void pointers.

                If you remove the "+ 1" you'll still see the same error in C++. In C++, change the target pointer to a void pointer:

                   void *ptr = malloc(42) + 1;
                
                or, alternatively, cast the RHS expression properly (doesn't matter with "+ 1" or without)

                   int *ptr = (int *) (malloc(42) + 1);
                
                and the error will go away.
                • menaerus 10 days ago
                  I don't think there's a need to explain to me how pointers work, or what to do to make this error/warning go away, since it should be pretty obvious from my response that this is something I know very well.

                  I was only disputing the claim that you can trivially end up with an unaligned ptr from that line of code, by providing an explanation and a counter-example showing why this is not possible. At least not in C++, because that's the language I had in mind.

                  What you're saying is that it's possible to do void pointer arithmetic in C by using the GNU extension - fine, I can't disagree with that.

                  • jstimpfle 10 days ago
                    You corrected me and I corrected you. No need to take personal offense. I don't care what you know -- I obviously took the compiler output that you presented above as a rebuttal of what I said earlier, and I pointed out why it's not a valid rebuttal.
    • db48x 11 days ago
      Every lisp system that ever existed uses this technique, and it never hurt them any. Emacs or SBCL or Open Genera; they all work perfectly well.
      • gpderetta 11 days ago
        And the modern take on this is NaN Boxing!
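
        Roughly, as a sketch (one possible bit layout, not any particular VM's): a real f64 is stored as its raw bits, everything else is stored as a NaN pattern that ordinary arithmetic never produces.

            // A pointer is a NaN whose top 16 bits are all ones, with the pointer
            // in the low 48 bits. NaNs coming out of arithmetic are canonicalized
            // before storing so they can never look like boxed pointers.
            const BOX_TAG: u64 = 0xFFFF_0000_0000_0000;
            const PAYLOAD: u64 = 0x0000_FFFF_FFFF_FFFF;

            fn box_f64(x: f64) -> u64 {
                if x.is_nan() { f64::NAN.to_bits() } else { x.to_bits() }
            }
            fn box_ptr(p: *const u8) -> u64 { BOX_TAG | (p as u64 & PAYLOAD) }

            fn is_ptr(w: u64) -> bool { (w & BOX_TAG) == BOX_TAG }
            fn unbox_f64(w: u64) -> f64 { f64::from_bits(w) }
            fn unbox_ptr(w: u64) -> *const u8 { (w & PAYLOAD) as *const u8 }
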
        • saagarjha 10 days ago
          I think you and I have a different opinion on what “modern” means ;)
      • rjsw 11 days ago
        Maclisp on PDP-10 and Franz Lisp didn't, they used bibop type checking instead of tagged pointers.
        • spacechild1 10 days ago
          > Franz Lisp

          Off topic, but that's probably the silliest name for a programming language I've seen. I love it!

        • db48x 11 days ago
          That’s merely an elaboration on the same theme.
          • rjsw 11 days ago
            It isn't, pointers in Franz Lisp can be used unmodified in C.
            • db48x 11 days ago
              But those pointers still have bits in them that indicate the type of the object that they point to. It is merely the case that the tag bits are chosen at run time so that they correspond to allocated memory regions, rather than being chosen ahead of time.
              • rjsw 11 days ago
                None of the bits in the pointer have been "stolen" though, which is the point of the referenced article, they are all still available to access the full address space of the CPU.
                • db48x 11 days ago
                  That doesn't mean that they aren’t still there, and don't tell you the type. It’s still a tagged pointer, and because all objects of the same type must be allocated out of the same page (or collection of pages), it still limits how much memory can be dedicated to objects of that type.
      • lmm 11 days ago
        > Every lisp system that ever existed uses this technique, and it never hurt them any.

        Wasn't it part of the reason they ended up with poor "mechanical sympathy" on regular PCs, and got a bad performance reputation as a result?

        • db48x 11 days ago
          No. Their reputation for poor performance mostly arises because the default data structure is a singly–linked list. Although the language makes these lists extremely convenient for the programmer to use, linked lists invariably result in terrible cache locality. High–performance systems that are written in lisp (and they do exist), achieve that performance by avoiding lists and using vectors and arenas instead.

          Of course, it must be said that this performance penalty rarely matters in practice. For most programs, the limiting factor is not performance but the cost of implementation and maintenance. Lisp has many powerful and convenient features that make implementation faster and surer than in any other language, and which usually greatly lower the cost of maintenance as well. Thus by writing a new program in Lisp you can get it off the ground and earning money faster than you could in almost any other language. You can also add features more cheaply as time goes on, allowing you to keep up with your competitors. It is only when your system is nearing completion, and all necessary features have been invented and implemented, that you should think about rewriting select parts of that system in a systems language where more efficiency is possible. Not only will you only understand the system after it is written, but you can measure the actual performance of the system to figure out which parts you need to rewrite, and which parts have acceptable performance already and thus do not need to be rewritten.

          • pfdietz 11 days ago
            Cache locality is improved if the Lisp has a compacting garbage collector, but you still need extra memory for the cdr links. Lisp Machines had cdr-coding that allowed those to be avoided but I don't think that's practical on stock architectures.
            • db48x 11 days ago
              True, but don’t forget that one of the _other_ causes of performance problems in lisp programs is the creation and subsequent collection of garbage. If you know ahead of time which of your data needs to be accessed or created by the most performance–sensitive parts of your program, then you can put that data into vectors to start with and avoid all of the extra work.
              • pfdietz 11 days ago
                Long-lived things end up in older generations and aren't traversed much during GC, which is the moral equivalent of being statically allocated.

                There's overhead if you mutate old objects in a GCed system due to card marking.

                Lisp vectors are vectors of pointers, so there's still an overhead of dereferencing through it. Presumably the objects being pointed to end up compacted together, eventually.

                • db48x 11 days ago
                  I should have been more clear; I meant using vectors as arena allocators, so that all of your data ends up stored contiguously with no extra indirection.
          • mik1998 11 days ago
            > linked lists invariably result in terrible cache locality

            With a compacting GC (ie most of the relevant ones) lists often provide better memory locality than arrays. They only lose at a large scale due to having +8 bytes of memory, which makes arrays occasionally cheaper for large sequences.

            • gpderetta 10 days ago
              Even with perfect cache locality, a linked list will be significantly slower to traverse than an array because of worse pipelining and parallelization. Partially unrolled lists can help, but at the cost of additional complexity.

              Of course not all container accesses require traversal.

            • nequo 11 days ago
              Why do lists provide better memory locality than arrays? The array already stores its elements next to each other in memory. The compacting GC helps arrange the elements of the list close to each other too, right? Then wouldn’t the compacting GC at best even out the performance difference for sequential traversal?
          • jxf 11 days ago
            > For most programs, the limiting factor is not performance but the cost of implementation and maintenance.

            The limiting factor for what? Their commercial success, or something else?

            • db48x 11 days ago
              Success, yes. It need not be direct commercial success; lots of programs are written purely for internal use within some larger organization. And even when there are known performance requirements, Lisp is often still a good choice because the performance is handled by specialized hardware. Consider games, for example. Time to market and rapid iteration are both paramount, while performance is taken care of not by writing a fast renderer but by sending everything to the GPU to be rendered on specialized hardware.
        • samatman 11 days ago
          It's not. Two counterexamples: LuaJIT's interpreter/VM, and Wren's, both use NaN boxing, and are among the fastest VMs out there (we're ignoring JITs as out of scope).

          It isn't tagging pointers that makes things (some Lisps are 'things', some are not) slow: it's pervasive abstraction and indirection. Doing some masking on a pointer is pipelined, out-of-order, one to three cycles, and is absolutely dwarfed by cache misses and conditional mispredictions, let alone the sort of pointer chasing which is common in idiomatic Lisp (or indeed Python) code.

        • bitwize 11 days ago
          I don't know what you mean. Maybe in the early early 80s that was the case, but PCs were still 16-bit back then, and would've been a poor fit for Lisp anyway.

          One of the reasons why the Lisp machines died out is that round about the mid-80s or so, compiler technology improved for generic 32-bit processors, and it became possible to run Lisp software on a VAX, 68k, or RISC CPU faster than it ran on the Lisp machines' bespoke architecture. Back during the first AI hypecycle, the makers of Golden Common Lisp introduced the Hummingboard to the market, billed as an inexpensive solution to have a "Lisp machine in a PC". It was just a 386 on an ISA card (predating Compaq's 386 PC by about a year) with gobs of memory on board; a special build of Golden Common Lisp allowed code to be run on that CPU rather than the main one.

          • lispm 11 days ago
            Symbolics had its own PC implementation of Lisp: CLOE.

            From the Lisp FAQ: "CLOE (Common Lisp Operating Environment) is a cross-development environment for IBM PCs (MSDOS) and Symbolics Genera. It includes CLOS, condition error system, generational garbage collection, incremental compilation, code time/space profiling, and a stack-frame debugger. It costs from $625 to $4000 and requires 4-8mn RAM and a 386 processor. "

            Later they also ported Genera to DEC Alpha. Currently I have access to an implementation which runs on ARM64 and Intel x64.

          • coldtea 11 days ago
            >One of the reasons why the Lisp machines died out is that round about the mid-80s or so, compiler technology improved for generic 32-bit processors, and it became possible to run Lisp software on a VAX, 68k, or RISC CPU faster than it ran on the Lisp machines' bespoke architecture.

            I'd say Lisp machines died out because Lisp died out (commercially). Other languages got popular, the second AI winter didn't help at all either.

            If Lisp itself had fared better (even if it was on generic hardware), Lisp machines could have been saved too, they could still use a VAX, 68k, or RISC CPU underneath, and optimize for the developer experience. Or they'd have turned Lisp machines from hardware into a "Lisp machine" OS or IDE/REPL/etc environment for other OSes succeeding. But none of that took off.

            • bitwize 11 days ago
              > Or they'd have turned Lisp machines from hardware into a "Lisp machine" OS or IDE/REPL/etc environment for other OSes succeeding.

              You mean like Allegro CL? LispWorks? Genera on DEC Ultrix?

              With the exception of Genera, these solutions are available, maintained, and supported today. Even Genera hung on for a good few years. Lisp machines started to wane a few years before the AI winter hit in full force, because the idea of dedicated hardware to run Lisp programs made sense when your alternative was Maclisp struggling on a PDP-10, but became expensive, slow, and fiddly compared to the generic boxes running fast 32-bit processors with, again, much improved compiler tech. Genera on DEC Alpha, even as an interpreted VM, was so much faster than any of Symbolics's bespoke CPU architectures that Symbolics just quit making hardware and declared the Alpha version of Genera the official upgrade path.

              • rjsw 11 days ago
                One road not taken would have been to port the MIT/LMI/TI Lisp Machine software to stock hardware, it was 32-bit until the end so didn't need an Alpha as the host. A fast 68020 with a custom MMU would have been fine. There was only around 12k of "assembler" to rewrite to provide the VM and simple RTOS.
              • coldtea 10 days ago
                >You mean like Allegro CL? LispWorks? Genera on DEC Ultrix?

                None of these cover my whole description, namely the last part:

                "Or they'd have turned Lisp machines from hardware into a "Lisp machine" OS or IDE/REPL/etc environment for other OSes SUCCEEDING".

                I wasn't talking about the mere existence of those.

                My argument was: "if the issue was not Lisp falling in adoption in general, but merely Lisp Machines being dedicated hardware (as the parent claimed), then Lisp OSes for generic hardware and Lisp IDEs/envs for popular OSes would have succeeded."

                • lispm 10 days ago
                  > Or they'd have turned Lisp machines from hardware into a "Lisp machine" OS or IDE/REPL/etc environment for other OSes succeeding.

                  That was a very different business model, very different offering into a different market. One could not move that business model easily, in an existing company to a different market. There were offices, factories, contracts, ... -> costs to get rid of. The thing collapsed, before anything usefully could be scaled down.

                  For example the Symbolics graphics products were very expensive, just the software. A new owner ported it to SGI and Windows NT machines. To be a technically viable product it used a commercial Lisp vendor. It survived a while in that market (modelling/animation/game tools for game studios, animation studios, ...), changed owners again and then died.

                  Lisp (the forbidden word during/after the AI Winter) was a part of the problem, but generally a new business model and customers for it weren't found/sought. For example TI just closed its AI business and never cared about it from then on.

                  Something like Genera was a huge pile of Lisp code written during 1.5 decades. During its best times the OS and maintenance upgrades were already more expensive than a PC.

                  Applications from the Lisp Machine were ported away. One no longer needed the OS and no longer had the OS. Some applications died, some survived, some died later.

                  Some applications (or the development environment) survived for some years on emulators (-> Interlisp-D was ported to SUNs and PCs, Genera was ported to DEC Alpha).

                  • skissane 10 days ago
                    > Something like Genera was a huge pile of Lisp code written during 1.5 decades. During its best times the OS and maintenance upgrades were already more expensive than a PC.

                    Genera's biggest contemporary problem is that John C. Mallery seems to want to just sit on it rather than make it available to people.

                    Likely future revenues must be close to nil, so why not open source it? Apparently he talked about it years ago but never actually has.

                    Yes, a lot of the code is really dated, but if it were open source, maybe some people might freshen some of it up.

                • bitwize 9 days ago
                  Franz and LispWorks are still in business through several boom-bust cycles. You never specified, what are your criteria for success? Some market share percentage? My point was, the second AI winter happened, but interest in Lisp machines waned first, because Lisp on generic processors reached a point where they could run the same code faster for less money.
            • zozbot234 11 days ago
              > Or they'd have turned Lisp machines from hardware into a "Lisp machine" OS

              Emacs is that Lisp OS. But it's still lacking a good text editor.

              • pfdietz 11 days ago
                Ok, that made me laugh.
          • skissane 10 days ago
            > Maybe in the early early 80s that was the case, but PCs were still 16-bit back then, and would've been a poor fit for Lisp anyway.

            There were LISPs for 16 bit MS-DOS. The most famous was arguably XLISP, which was used as the basis for AutoCAD’s AutoLISP extension language, and also the XLISPSTAT statistics package. Another was muLISP, which was resold by Microsoft as Microsoft LISP, and also used as the basis of the muMATH computer algebra system, and also its successor Derive.

            • bitwize 10 days ago
              I'm familiar with Xlisp, Microsoft Lisp, and PC-Scheme. What I'm not sure about is how performant they were, or whether large systems could be built with them.
              • skissane 10 days ago
                > What I'm not sure about is how performant they were,

                They were sufficiently performant to be used in anger. I mentioned some of the real world uses

                > or whether large systems could be built with them

                I mentioned the muMATH computer algebra system, built with muLISP. I think that would count as a large system, at least by the standards of the time.

        • lispm 11 days ago
          No, the reason for less-than-stellar performance of Lisp on the 32-bit x86 CPUs was things like too few registers.

          On 64bit Intel CPUs this is much less of a problem and Lisp runs roughly twice as fast in 64bit mode. I would think that this even holds without taking advantage of wider data words.

          On other architectures this was less of a problem: 32-bit RISC CPUs, and some other CISC CPUs, had more registers.

      • _flux 11 days ago
        OCaml uses the lowest bit, which is nice because (aligned) pointers work as-is.
    • mbrubeck 11 days ago
      This bitvec I wrote for Servo and Gecko uses this technique, and has been shipping in Firefox releases for over six years with no bugs found:

      https://docs.rs/smallbitvec/

      I'm pretty sure you can find several more examples in Firefox, as well as other major browsers.

    • junon 11 days ago
      VM writers use this egregiously. I've not heard of it causing issues. It's not that complicated.

      Lua, Lisps, many others.

    • titzer 11 days ago
      All fast JavaScript VMs use some form of pointer tagging. V8 uses (compressed) tagged pointers and both JSC and Firefox used NaN-boxing.

      WasmGC has a i31ref as part of its reference hierarchy, which is implemented with pointer tagging.

    • pixelesque 11 days ago
      I've used tagged pointers with no issues at all over the years to "smuggle" / store values in...
    • coldtea 11 days ago
      This "trick" has been used since the dawn of time in major platforms
      • adql 11 days ago
        I dunno, pointers were kinda small on 8/16 bit platforms
        • coldtea 10 days ago
          Smalltalk for example used 1 bit (or 2 bit?) tagged pointers
    • adql 11 days ago
      Any example?
    • speed_spread 11 days ago
      If you build your whole app with it, yeah. If you use it tactically for a certain data type it can be very nice. Just don't try to dissimulate its specialness.
  • queuebert 11 days ago
    If the ointer points to a very large block of memory, is it an oinker?
  • gwbas1c 11 days ago
    Could something like this be used to lower memory footprint?
    • menaerus 10 days ago
      Besides making some lock-free algorithms feasible at all and others easier to implement, yes, it can also be used to reduce memory pressure, at the cost of extra CPU cycles, if you know that the metadata you want to track fits in those 16+ bits.
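
      E.g., a sketch of the lock-free angle (made-up names; assumes 48-bit user pointers): a generation counter in the spare bits lets a CAS loop sidestep the classic ABA problem.

          use std::sync::atomic::{AtomicU64, Ordering};

          // A 16-bit generation counter rides in the top bits, so a CAS can tell
          // that a slot was freed and reused even if the same pointer value
          // shows up again.
          const ADDR_MASK: u64 = (1u64 << 48) - 1;

          fn pack(p: *mut u8, generation: u16) -> u64 {
              ((generation as u64) << 48) | (p as u64 & ADDR_MASK)
          }
          fn unpack(w: u64) -> (*mut u8, u16) {
              ((w & ADDR_MASK) as *mut u8, (w >> 48) as u16)
          }

          fn replace_head(head: &AtomicU64, new_ptr: *mut u8) {
              let mut cur = head.load(Ordering::Acquire);
              loop {
                  let (_, generation) = unpack(cur);
                  // bump the generation on every successful swap
                  let next = pack(new_ptr, generation.wrapping_add(1));
                  match head.compare_exchange_weak(
                      cur, next, Ordering::AcqRel, Ordering::Acquire,
                  ) {
                      Ok(_) => return,
                      Err(actual) => cur = actual,
                  }
              }
          }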