> Transactional memory is currently one of the hottest topics in computer-science research, having attracted the focus of researchers in programming languages, computer architecture, and parallel programming, as well as the attention of development groups at major software and hardware companies. The fundamental source of the excitement is the belief that by replacing locks and condition variables with transactions we can make it easier to write correct and efficient shared-memory parallel programs.
Has this research come to fruition? What languages/environments use such an approach today?
Well, I'd say it's a bit muddled. Pure STM systems have seen limited success, mostly on the JVM, particularly in Clojure. But as a technology it hasn't spread widely in industry, despite the fundamental advantage of composability. I'd personally attribute this to a sort of gulf created by two common approaches:
1. It's become very common to structure software as a stateless service cluster backed by a stateful database technology. Most of the databases used in this approach would not be described as STM, but the net effect for the application is similar, even if it involves more layers and complexity.
2. People have gotten pretty comfortable with simple mutex patterns connected by queues of various sorts. This is a sort of worse-is-better situation: a mutex-protected whatever in a toy test or microbenchmark is far simpler and more efficient than STM, but a system composed of many of these mutex-protected islands proves a whole different beast indeed. STM has largely been criticized from the former perspective rather than the latter.
Many people have observed that transactional concurrency control, state-of-the-art garbage collection, and even file systems have been converging on similar features. This is driven by common constraints on both sides: what the hardware can do and what humans expect. In particular with persistent memory, I think you'll see all three of these unify into a single design, because systems that attempt to solve these problems separately will be a very poor match for the hardware.
I think trying to scale out compute + mutable state across multiple CPUs on a single box has somewhat fallen by the wayside outside of specialized applications like databases and distributed query engines, as a local maximum avoided in favour of scalability via more boxes.
There are several forces behind this. Applications are more likely to be web-based or services - i.e. inherently distributed - and less likely to be desktop or traditional client/server, where almost all compute happened on a single server. For distributed services, maximizing statelessness and avoiding mutable shared memory is key to solving a lot of problems: scaling out (no shared memory), redundancy (keep a second copy of the application logic running somewhere, no sync required), recovery and idempotence (if something fails, try again until it succeeds - repeated attempts are safe).
Reliable persistent queues are part of that. They let you bring services up and down and switch over without downtime, or restart after a failure and resume where they left off.
The problems of shared mutable state are best kept in specialized applications: databases, queuing systems, distributed caches, consistent key-value stores. Keeping state consistent in a distributed system is a genuinely hard problem, and STM isn't much help, except perhaps as an implementation detail in some of those specialized applications.
For what it's worth, scaling mutable shared state across multiple CPUs on a single box has fallen by the wayside for databases too. Thread-per-core software architecture has become idiomatic, in part because it scales up efficiently on machines with a very large number of cores. Nothing about this precludes scale-out; it just lets you serve 10-100x the load on a single box compared to the more typical naive design.
Web applications aren't special in this regard. We've just normalized being incredibly wasteful of computing resources when designing them, spinning up huge clusters to serve a workload that could be easily satisfied with a single machine (or a few machines for availability).
Premature scale-out, because we've forgotten how to scale-up, is the root of much evil.
This has been my experience as well. I briefly shifted from embedded systems programming to the web-services world. I found that the scale-up approach has been so maligned that it's considered fine to spin up several hundred EC2 instances running a (poorly written) Java application instead of a single multi-core instance running Erlang that performed much better. Some frameworks even ran web services in PHP and spent all their effort on keeping requests from ever reaching the PHP code, since it was doing something like 10 reqs/sec.
Basically, poor STM performance in .NET comes down to too much mutation and the logging that mutation induces. This is why Clojure's STM, which operates over immutable persistent data structures, is much more widely used.
I have particularly deep experience in various Haskell projects, and I must insist that no Haskell project that is (1) relatively big, (2) has a hint of concurrency, and (3) is developed by several people can go without STM.
I might use MVars myself for problem-space discovery. When there are at least two of us, I will use TVars and insist others should too.
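To illustrate why TVars compose where MVars and plain mutexes don't, here is a minimal sketch using the real `stm` combinators (`orElse`, `retry`); the `takeEither` name is my own:

```haskell
import Control.Concurrent.STM

-- Take a job from whichever queue has one, blocking (via the retry
-- inside readTQueue) until either does. `orElse` composes the two
-- blocking reads into a single atomic transaction; there is no
-- lock-ordering discipline to get wrong.
takeEither :: TQueue a -> TQueue a -> STM a
takeEither q1 q2 = readTQueue q1 `orElse` readTQueue q2

main :: IO ()
main = do
  q1 <- newTQueueIO
  q2 <- newTQueueIO
  atomically $ writeTQueue q2 ("job from q2" :: String)
  job <- atomically (takeEither q1 q2)
  putStrLn job
```

Because `orElse` returns just another STM action, `takeEither` can itself be embedded in a larger `atomically` block alongside further reads and writes, which is exactly the kind of composition a lock-based version makes hard.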
I am not free to name the product, but I worked on a product that has done billions of dollars in revenue and that built a bunch of state-management checkpointing on a custom STM implementation for C (yes, C). It made so, so many things straightforward in terms of corner cases; we also extended transactionality to sync over the wire in some cases.
I also think STM-adjacent designs are gaining some traction - for example, FoundationDB looks an awful lot like an STM system from the point of view of applications, much more than it looks like a traditional database.
It’s widely regarded as a dead end. Conceptually, it’s quite appealing so who knows, it might find new life someday. Garbage collection was a niche technology for decades, as was JIT compilation.
Joe Duffy’s write-up on how STM failed at Microsoft with .NET is illustrative:
http://joeduffyblog.com/2010/01/03/a-brief-retrospective-on-...
Right... it's not hard to invent some scheme that scales to a 2-core computer, but it's not crazy to have 64 or more cores these days, and by Amdahl's law, even 2% serial overhead from communication means you are never going to scale past 50x.
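The 50x figure is just Amdahl's law with a 2% serial fraction; a quick sketch:

```haskell
-- Amdahl's law: maximum speedup on n cores when a fraction s of
-- the work is serial and cannot be parallelized.
amdahl :: Double -> Double -> Double
amdahl s n = 1 / (s + (1 - s) / n)

main :: IO ()
main = do
  print (amdahl 0.02 64)   -- roughly 28x on 64 cores
  print (amdahl 0.02 1e9)  -- approaches the 1/s = 50x ceiling
```

Note that at 64 cores you are already paying for serialization: barely 28x out of a theoretical 64x.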
The problems of making a system really scale as opposed to just scale are bad enough that the best we know how to do is give people simpler primitives that work -- each layer you add to the system is another opportunity to add another limitation to it.
Modern garbage collectors are fairly scalable (e.g. they don't break outright as the heap size increases), and any manual memory management scheme would also risk scaling problems, to the point where you might say that garbage collection is more "scalable" when the thing being scaled is the kind of arbitrary complexity that appears in business applications.
What people complain about with garbage collection are: (1) a fixed constant overhead they think is too high and will probably always think is too high, and (2) long pauses that are manageable in practice unless you are timing events with millisecond precision on a microcontroller.
Thus one of these things is mainstream and the other isn't.
> What people complain about with garbage collection are: (1) a fixed constant overhead they think is too high and will probably always think is too high, and (2) long pauses that are manageable in practice unless you are timing events with millisecond precision on a microcontroller.
I keep hearing this about "modern" GC and it keeps being very much not true in practice. I'm running IntelliJ IDEA (Java) on a laptop with 8GB of RAM, and it keeps freezing every so often. I suspect the GC is causing paging. This is something that would obviously be avoided with reference counting or manual memory management.
You'd prefer your data being larger in memory due to refcount fields and slower to access due to atomic operations on those fields?
It's not like tracing GC is without memory overhead...
Slower is a question of priorities. I'd definitely trade slower overall performance (I don't really need my IDE to be that performant... it's not like it's using 100% CPU, or even 10%!) for a huge improvement in latency (i.e. no catastrophic slowdowns/freezes).
There are very few ways of having zero memory overhead, though.
Also, most of the flaws of Java's memory management seem to be concentrated in the part of the language that says that almost everything has to be an object with a header and on the heap. Remove that and your graph of objects to trace shrinks very substantially.
Interestingly, I was running out of memory, so I gave it more RAM (like 20GB) and it got even worse. Apparently it takes a long time to GC that much memory. I’d also rather amortize that pause. The loss of productivity due to a sudden pause is real.
Yeah. As is pointed out down-thread of the linked post, even when it was confined to a niche, garbage collection did what it claimed it would (eliminate the need for error-prone manual memory management) at an acceptable cost for many systems & applications. JITs were similar.
So far, STM and HTM have followed a path more like VLIW instruction sets: Conceptually appealing for their promise of simplification, but in practice only demonstrating success in specialized uses, with the tradeoffs proving unattractive seemingly everywhere else they've been tried.
I think it is a false equivalence to say that STM and HTM are like VLIW (which has been a resounding success for DSP workloads). I am not defending VLIW.
Transactions are extremely powerful conceptually, but being composable without causing flapping requires a higher level of coordination.
It kind of came and went in Clojure, not much code uses it today.
* https://clojure.org/reference/refs
* https://sw1nn.com/blog/2012/04/11/clojure-stm-what-why-how/
In your second link, the linked followup post is also interesting as it captures some misunderstandings one might have when starting out with STM code.
Hardware support is only very recent. Intel added support at various times and then had to remove it due to security and correctness issues, but I think it is now finally available:
> Intel's Transactional Synchronization Extensions (TSX) is available in some of the Skylake processors. It was earlier implemented in Haswell and Broadwell processors as well, but the implementations turned out both times to be defective and support for TSX was disabled. The TSX specification describes the transactional memory API for use by software developers, but withholds details on technical implementation.[11] ARM architecture has a similar extension.[12]
https://en.wikipedia.org/wiki/Transactional_memory
I believe that even where it's available, OSes turn it off because of all the other Intel vulnerabilities, since it makes them even more expensive to work around.
I thought the big win for Transactional Memory over locking was composability. Locks don’t compose. Whenever accessing data, you have to think about all the locks or else you end up with dead(live)lock.
The promise of transactional memory was that it composes and you only have to think about your data access and not have to worry about running into deadlocks.
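In Haskell's STM, for instance, that promise is concrete: two independently written operations can be glued into one atomic transaction with no lock-ordering discipline. (The `Account`/`transfer` names here are my own illustration, not from the thread.)

```haskell
import Control.Concurrent.STM

type Account = TVar Int

-- Blocks (retries) rather than overdrawing; no locks acquired.
transfer :: Account -> Account -> Int -> STM ()
transfer from to amount = do
  balance <- readTVar from
  if balance < amount
    then retry
    else do
      writeTVar from (balance - amount)
      modifyTVar' to (+ amount)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  c <- newTVarIO 0
  -- Composition: both transfers commit or neither does. With locks,
  -- this is exactly where you would start worrying about acquisition
  -- order across the three accounts.
  atomically $ do
    transfer a b 30
    transfer b c 10
  mapM_ (\v -> readTVarIO v >>= print) [a, b, c]
```

The second `transfer` sees the first one's effects within the same transaction (`b` holds 30 by then), so the combined operation never exposes an intermediate state to other threads.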
Borrow checkers feel like a bottom up approach to STM to me.
There’s a lot of static analysis in the JVM that reduces locking, speeds up garbage collection, and removes the need to declare the type of a generic variable.
I expect over time we will see inference in borrow checkers reduce the brain cycles you need to spend thinking about these things, and borrow semantics will be “good enough” for a long time.
Or who knows, maybe STM systems will be able to infer borrow semantics and benefit from aggressive lock elision.
Does this all go back to a derivative of the halting problem? With an abstraction, the computer must decide what you might ever want to do; while you may never intend for some set of conditions to occur, the computer cannot 'know' that. And the abstraction doesn't offer a way for you to specify your conditions in a more optimal manner.
Quarkus on the JVM has an interesting implementation of TX Memory, and it's integrated with the JTA global XA manager, which really gives it some cool capabilities.
Memory is allocated in pages, which are split into subpages, etc., and then you throw heaps, queues, linked lists and so on at it.
None of this structure exists in the memory itself. So you need transactions at the page level, which is too crude, or at the byte level, which is too granular.
Hardware is simply the wrong level to solve this problem at. Unless the hardware starts understanding how we use memory way, way, way, way better than it does right now.
You have misunderstood what transactional memory is. Think of it as a database transaction applied to memory operations. Just as a database transaction makes a collection of reads, writes, and computations based on them appear atomic with respect to other transactions, transactional memory generally aims to give a section of instructions transactional semantics for all of the loads, stores, and other instructions in the transaction. Most transactional memory proposals are not page-level granularity (though obviously most proposals have certain limits), and "byte level being too granular" doesn't even make sense.
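A minimal demonstration of those semantics, using Haskell's STM as a stand-in for any transactional-memory implementation: concurrent read-modify-write transactions conflict on individual `TVar`s rather than pages, and no update is ever lost.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forM_, replicateM_, when)

main :: IO ()
main = do
  counter <- newTVarIO (0 :: Int)
  done    <- newTVarIO (0 :: Int)
  forM_ [1 .. 4 :: Int] $ \_ -> forkIO $ do
    -- Each increment is a transaction: read, compute, write, commit;
    -- it is retried automatically if another thread committed a
    -- conflicting write in between.
    replicateM_ 1000 $ atomically $ modifyTVar' counter (+ 1)
    atomically $ modifyTVar' done (+ 1)
  -- Block until all four workers have finished.
  atomically $ do
    d <- readTVar done
    when (d < 4) retry
  readTVarIO counter >>= print  -- always 4000, never a lost update
```

With plain non-atomic loads and stores, the same four threads could interleave and lose increments; the transactional version is race-free by construction.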
PDF available here: https://homes.cs.washington.edu/~djg/papers/analogy_oopsla07...