  • kragen 825 days ago
    The Z80 or 8080 is super interesting for bootstrapping purposes because it's roughly the smallest computer where a self-hosted development environment is practical. Smaller systems like the 8008, the PDP-8, or the AVR are just a little too small; you can run a filesystem and a compiler or an assembler on them, but you probably don't want to. By contrast, CP/M on the 8080 could host not only the assembler used to write CP/M, but also the BDS C compiler (now free software) or the Turbo Pascal IDE. And, unlike the PDP-8, you can still buy a Z80.

    Depending on how you count, the 8080 has a few dozen instructions: https://dercuano.github.io/notes/8080-opcode-map.html#addtoc...

    So a new multitasking OS for the Z80 is pretty great!

    It has a few big disadvantages. Although you can multibank, there's no way to directly address more than 64K, which is a real pain. Even the Z80 is not well suited to languages where functions are recursive by default, like C, and the 8080 is even worse at that. There's no memory protection, not even an MPU, so there's no fault isolation and no virtual memory. And the Z80 chips I've found are ridiculously power-hungry, like, 500 mA, and still run slower than an AVR.

    • codebje 825 days ago
      You can buy brand new 20MHz Z80s, but with the multiple clock cycles per instruction they'll still be slower overall than a 20MHz AVR.

      The later chips in the 8-bit family are much more power efficient; I built an eZ80 board whose total power budget is 500mA including SD card and Ethernet module. It runs at 50MHz, but performance isn't really the reason we muck around with 8-bit CPUs, right? :)

      • kragen 825 days ago
        That's pretty neat! How much does your eZ80 board use when it's not accessing the SD card?

        Performance can substantially ease the difficulties involved with mucking around with 8-bit CPUs.

        As I explained in https://news.ycombinator.com/item?id=30017810, my point of comparison is the Ambiq Apollo3, which purportedly gets about 100 Z80s of performance on about half a milliwatt.

        • codebje 824 days ago
          The board runs at 3v3; the CPU consumes around 140mA. With SD and Ethernet off, it's just memories and LEDs adding to that, so typically not much higher than 150mA sustained.

          It's likely a bit less than that over USB Vbus, but it's just running a linear regulator down to 3v3, so the difference mostly just becomes heat.

          • kragen 824 days ago
            Thanks! 140mA is pretty high; it's literally 1% of the battery life (100× the power draw) of a more modern 8-bit processor like a 20-year-old 20MHz AVR, which is a bit faster than a 50MHz Z80. And as I mentioned below the Ambiq processors are orders of magnitude faster at the same power draw, and can scale down to very low power draws. They can self-host compilers but probably not logic synthesis.
            • codebje 824 days ago
              The eZ80 is a better Z80, but it's still a Z80. It's worse in pretty much every way than a modern MPU.

              The STM32 line is my preference for performant parts - there's an STM32L0 on the board doing USB to serial. I could likely emulate the eZ80 on that faster than the real thing runs, but I couldn't build an 8-bit bus computer with it.

              • kragen 824 days ago
                I'm pretty sure the S1 MP3 and the TI-83 CPU don't use 100 mA though. I'd sure like to know if someone is making a low-power Z80 I can actually buy.

                I mean, it's a little goofy, because a lot of the Z80 design was a response to design constraints that no longer exist, right? You don't need to fit your CPU in 40 pins or 10000 transistors anymore. And even the MuP21 probably would have been a better design under those constraints.

                • codebje 823 days ago
                  The TI-84 Plus CE, at least, uses the same CPU as my board (plus or minus revisions) - but it clocks it _much_ lower, at 15MHz. The eZ80 uses around 30mA at that speed; it also has a sleep mode using under 3µA that I would expect a battery powered device to make heavy use of.

                  The Z84C00 at 6MHz will also "only" use around 30mA, down to 10µA in standby. At 15MHz the Z84C00 uses something more like 80mA, though, so the "turbo" mode in that calculator line probably chews through batteries.

                  I expect TI still uses the eZ80 CPU for the same reason other companies still use Z80-based processors - backwards compatibility. The Z80's got a very long life in industrial control systems, where you want to avoid rewriting the code if you reasonably can.

                  The eZ80 is about as modernised as you can take the Z80 core, but as you say a lot of the Z80 design is due to the constraints of the era. It was also designed to be code compatible with the Intel 8080, and probably would have just quietly fizzled on launch if it didn't have that.

      • anthk 824 days ago
        At 50MHz, Z-Machine v5 games get usable enough.
    • jhallenworld 825 days ago
      Another disadvantage is the lack of any kind of relocation hardware. Without it, you are tied to a particular memory layout unless you re-link at run time. The 6809 gets around this through position independence. The 8088 gets around it with segment registers (at least for .COM files and small-memory-model .EXE files; large-model .EXE files are re-linked during loading).

      CP/M's solution to this is to start the code near zero (at 0x100). This forces the OS, drivers, and hardware out of the way and permits binaries from one system to at least possibly run on another.

      • ddingus 825 days ago
        6809 also has great reentrancy support with stack and pointer relative indexing modes featuring auto increment / decrement.

        More interestingly, I never understood why code was at $0100 in CP/M land. I just got a Z80 card for my 6502-based Apple retro computer. I have never used CP/M, and the availability of fairly sophisticated software interests me.

        • kragen 825 days ago
          The first 64 bytes of the 8080 address space contain the interrupt vectors, which I think normally pointed into the CP/M BIOS, so you wouldn't want to overwrite them with the contents of your COM file executable. But the BIOS is loaded from disk (unlike on the IBM PC), so that needs to be RAM. CP/M needs a little bit of memory space for its own variables, and it makes sense to put it next to the interrupt vectors, since we know there's RAM mapped there, because there were no RAM chips as small as 64 bits. And we know there's more RAM than CP/M needs for enough of its own variables to permit a warm boot, because there were no RAM chips as small as 256 bits either. So it makes sense to load the user program immediately after CP/M's variables, because any extra slack space there is wasted.

          That sort of constrains the load address to be something in the range 0x41-0x200 or so. I'm not sure why they ended up at 0x100 instead of, say, 0x80 or 0xC0.
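          For reference, the standard CP/M page-zero layout (these addresses are well documented in the CP/M 2.2 manuals, though not stated in this thread) can be sketched as C constants:

```c
#include <assert.h>

/* Standard CP/M 2.2 page-zero layout on the 8080/Z80. */
enum {
    CPM_WBOOT = 0x0000,  /* JP to the warm-boot entry in the BIOS         */
    CPM_BDOS  = 0x0005,  /* JP to the BDOS system-call entry              */
    CPM_FCB   = 0x005C,  /* default file control block, filled in by CCP  */
    CPM_DMA   = 0x0080,  /* default 128-byte disk buffer / command tail   */
    CPM_TPA   = 0x0100,  /* transient program area: COM files load here   */
};
```

Note that the default DMA buffer runs right up to 0x100, so the TPA starts immediately after the last of CP/M's own reserved space.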

          This setup was a big problem for the TRS-80, which mapped ROM at the bottom of memory.

          • kragen 825 days ago
            Correction: according to https://en.wikipedia.org/wiki/Intel#Early_history Intel's first product, in 01969, was the 3101 64-bit TTL SRAM chip. So there were RAM chips as small as 64 bits. But they weren't what was normally in CP/M machines; the base MITS Altair came with 256 bytes of RAM, but needed at least 4096 bytes to run BASIC.
      • nielsbot 825 days ago
        Not sure it's exactly the same thing, but O.G. Mac OS got around this with "handles". Basically pointers to pointers. The toolbox was free to move (or even unload?) chunks of memory because all addressing was doubly-indirect.

        The toolbox on the //GS may have also used handles.

      • ecpottinger 825 days ago
        Could the Amiga executable format's use of Hunks work here? Your code sections were in Hunks, with each Hunk having a relocation section that tells where offsets need to be updated to match the position in memory the Hunk was loaded at.
        • kragen 825 days ago
          Sure, that's just a standard linker; the Unix and Windows term for "hunks" is "sections". This can vary on a couple of axes: when the linking happens and how many relocations you have.

          The extreme case of "how many relocations you have" is zero, which is position-independent code. The 8080 inconveniently uses absolute addresses rather than relative addresses in its call and jump instructions and doesn't have a PC-relative data addressing mode (or really much in the way of addressing modes at all; the Z80 is slightly better here). Lacking a PC-relative mode for one or the other means some base pointer register somewhere has to tell the module where it lives. On the 8080 or Z80 this would make your code super slow and bloated, particularly since statically allocating addresses to your data is such an important optimization there.

          An intermediate case is "the linker relocates a global offset table". Without any kind of base address register this requires each GOT entry to be at a fixed memory address, so it has to be globally allocated across all the modules you're ever going to have loaded: one entry for every subroutine or global variable. Every call and global variable reference has to indirect through the GOT. For global variables this is also super painful on the 8080 because of the lack of indexed addressing. A nice plus, though, is that it makes it easy to unload a module, as long as the module isn't on the call stack or otherwise has its addresses stored someplace. This is useful both for having more code available to run than you have memory for, and for dynamically reloading code as you modify it.
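          A minimal sketch of that GOT scheme in C (all names hypothetical), with function pointers standing in for the fixed-address table entries:

```c
#include <assert.h>

typedef int (*fn)(int);

static int twice(int x)  { return 2 * x; }
static int thrice(int x) { return 3 * x; }

/* Slot numbers are allocated globally, across every module you might load. */
enum { GOT_TWICE, GOT_THRICE, GOT_SIZE };
static fn got[GOT_SIZE] = { twice, thrice };

/* Callers never take a function's address directly, only its GOT slot,
   which is why unloading or reloading a module is just repointing slots. */
static int call(int slot, int arg) { return got[slot](arg); }
```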

          The other extreme is the usual case for LINK.COM or ld(1): you have one relocation per callsite or global variable reference, so linking takes a long time, but there's no per-variable-reference or per-callsite overhead.
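          That per-callsite fixup pass can be sketched like this (hypothetical 8080-style code bytes; a real LINK.COM reads the relocation list from the object file):

```c
#include <assert.h>

/* Two call sites, each containing a 16-bit little-endian module-relative
   address that the linker must patch. */
static unsigned char code[] = {
    0xCD, 0x10, 0x00,   /* CALL 0x0010 */
    0xC3, 0x20, 0x00,   /* JP   0x0020 */
};
/* One relocation per reference: the offsets of the address words. */
static const unsigned relocs[] = { 1, 4 };

/* Linking: add the load address to every relocated word. */
static void relocate(unsigned base) {
    for (unsigned i = 0; i < sizeof relocs / sizeof relocs[0]; i++) {
        unsigned o = relocs[i];
        unsigned v = ((code[o] | (code[o + 1] << 8)) + base) & 0xFFFF;
        code[o]     = v & 0xFF;
        code[o + 1] = (v >> 8) & 0xFF;
    }
}
```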

          There's a similar set of options for when the linking happens. You can allocate all the static addresses during compilation, which avoids the need for a separate linking stage but also makes separate compilation impossible, which is pretty much a nonstarter for self-hosted development on 8-bit systems. You can allocate the static addresses when you build an executable, which is the standard LINK.COM/ld(1) approach. You can allocate them when the program starts, which is the current ld.so approach, and requires the linker to run at program startup. (You might automatically cache the thus-relocated version of each module so that future loads at the same address could be instant, but the floppies on 8-bit micros were both too small and too slow for this to be practical.) Or you can allocate them when the first reference to them is made, by initially pointing all the GOT entries at a trampoline function that passes control to the dynamic linker to resolve just that single GOT entry, which requires the linker to run throughout the lifetime of your program. (In theory you could do this without a GOT on a per-callsite basis, but I've never heard of anyone doing that.)
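          The lazy-binding variant can be sketched in C with a resolver function standing in for the dynamic linker (again, all names hypothetical):

```c
#include <assert.h>

typedef int (*fn)(int);
static fn got[1];

static int square(int x) { return x * x; }

/* Trampoline: the first call lands here, "links" just this one GOT entry,
   then passes control on to the real function. */
static int resolve_slot0(int x) {
    got[0] = square;
    return got[0](x);
}

/* Program-startup state: every slot points at its trampoline. */
static void init_got(void) { got[0] = resolve_slot0; }
```

After the first call through the slot, the trampoline is out of the picture and subsequent calls pay only the one indirection.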

          A scheme more common than runtime linking on 8-bit computers (and on pre-IBM-360 mainframes) was "overlays," which allowed you to have more code in your program than would fit in RAM, without slowing it down. You partition your program into a central core, which was always loaded, and two or more overlays, only one of which was loaded at a time. The overlays couldn't call each other, but they could call within themselves and to the central core with no overhead. When the main program wanted to call an overlay, it would have to first make sure that the overlay it was calling was the one that was currently loaded, and if not, load it from disk, which typically took several seconds. And overlay schemes could get more elaborate, with the possibility of having multiple overlays loaded at once, direct calls between overlays, and so forth. If you had multibanked memory you could load different overlays into different memory banks, and switching between banks was a lot faster than loading from floppy, but it wasn't nearly as convenient as just having a bigger memory space, even in the shitty 8086 segment:offset fashion.
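          A toy overlay manager in C, with data arrays standing in for overlay code loaded from floppy (names hypothetical):

```c
#include <assert.h>
#include <string.h>

#define OVL_WORDS 4
/* Two overlays "on disk"; only one fits in the overlay region at a time. */
static const int on_disk[2][OVL_WORDS] = { {1, 2, 3, 4}, {5, 6, 7, 8} };
static int overlay_region[OVL_WORDS];
static int resident = -1;   /* which overlay is currently loaded, if any */

/* On a real 8-bit micro this memcpy is a multi-second floppy read. */
static void ensure_loaded(int n) {
    if (resident != n) {
        memcpy(overlay_region, on_disk[n], sizeof overlay_region);
        resident = n;
    }
}

/* Every call from the core into an overlay goes through the check. */
static int read_overlay(int n, int i) {
    ensure_loaded(n);
    return overlay_region[i];
}
```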

          I've been thinking a lot about reviving some of these recondite techniques for low-power computing. The Ambiq Apollo3 ARM Cortex-M4F chip costs US$5 and has 1MB Flash and 384K SRAM, and uses 33 pJ/instruction plus 68 μA overhead at 3.3 V. So you could run a 10-DMIPS "workstation", a hundred times as fast as a Z80 and comparable to a SPARCStation 2 or a 386/40, on 0.6 milliwatts, which is about the same as a solar calculator, and you can burst to much higher speeds (48 MHz, about 60 DMIPS) or run there all the time when you have direct sunlight. Another 0.05-0.2 milliwatts pays for a 400×240 SHARP LS027B7DH01A memory LCD. But 384K of RAM is pretty limiting, so you need some kind of virtual memory or overlay scheme. It doesn't have an MMU, but I think its MPU avoids the need for a trusted compiler scheme like HotSpot to prevent different tasks from corrupting each other. And of course ARM is good at indexed addressing, has lots of registers, and has position-independent jumps and calls by default.
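          Sanity-checking those milliwatt figures from the numbers in the paragraph (assuming 10 DMIPS means roughly 10 million instructions per second):

```c
#include <assert.h>

/* Back-of-envelope power model from the claimed Apollo3 figures:
   33 pJ/instruction plus 68 uA of overhead at 3.3 V. */
static double apollo3_milliwatts(double insns_per_sec) {
    double joules_per_insn = 33e-12;
    double overhead_watts  = 68e-6 * 3.3;
    return (joules_per_insn * insns_per_sec + overhead_watts) * 1e3;
}
```

At 10 MIPS this comes to about 0.55 mW, consistent with the 0.6 mW figure; at 48 MHz and 1.25 DMIPS/MHz (60 MIPS) it comes to about 2.2 mW.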

          • jhallenworld 825 days ago
            Ambiq Apollo: There is a benchmark for these low power chips:

            https://www.eembc.org/ulpmark/ulp-cp/scores.php

            (click on the headers to sort..)

            I think Apollo is the most general purpose of these chips, even though it no longer gets the top score. I assume the RSL10 is using its DSP, and the R7F0E01182CFP is a Cortex-M0+ (no floating point).

            I used it for a project: the peripherals are really nice. For example, the I2C slave uses a shared memory design instead of waking up the CPU to receive anything.

            • kragen 825 days ago
              Thank you! How much power does it really use in practice? I haven't had the chance to test one myself. What does ULPMark-CPX of 705 mean? I assume it must be a measurement of some kind of computation per joule, since it's higher at lower voltages, but is the actual power consumption computable from it? Or is that only how much power it uses in deep sleep? Are those other chips also using subthreshold logic?

              Maybe https://www.eembc.org/ulpmark/ulp-cm/scores.php is the actual power consumption used for computation (or rather vice versa: "number of CoreMark iterations a device can execute per milli-Joule", according to the page), which doesn't seem to include the Ambiq chips yet. I have the impression that a CoreMark is typically about 2 Dhrystone MIPS, which is not dimensionally compatible with the above.

              • jhallenworld 825 days ago
                How much: it was really low, and easier to get it in practice compared with ST's STM32L chips. I wonder if ST's new U series is using some kind of sub-threshold logic, but I assume Ambiq has the patents..
                • kragen 825 days ago
                  There's a lot of subthreshold research done over the last 20 years, a lot of it not by Ambiq, and I imagine that Ambiq has avoided patenting a lot of the stuff they've discovered in hopes of being able to protect it for longer as a trade secret. And I think subthreshold isn't really relevant to sleep-mode power consumption except in the sense that it might lower the consumption of your RTC or PWM or whatever.

                  There have been a couple of experimental processors on which professors have published papers claiming an order of magnitude lower power consumption than Ambiq claims for their chips. Assuming that's true, I don't know how much of that is a question of process variability (e.g., maybe 5% of Ambiq's chips also get such low power consumption) and how much is a question of other engineering compromises (e.g., maybe ARM is a particularly power-hungry architecture, as the lower power consumption of the GD32VF chips suggests, or maybe the experimental chips only worked between 47° and 47.5° and even then they crashed once an hour).

                  By "really low" do you mean active power more like 1 microwatt, 10 microwatts, 100 microwatts, or a milliwatt?

                  • jhallenworld 825 days ago
                    So this was Apollo 1 at 3.3v: I think it was less than 3 milliwatts at full power, but easily below 1 microwatt when sleeping. For our design, I kept it mostly asleep. I also used Apollo 2, not quite as low power, but faster, better UART and it had more RAM and flash. For the UART: they added a timer to give you an interrupt when no more bytes were received (otherwise there was no interrupt because the Rx FIFO was not full).

                    What I liked is that all peripherals were "low power"- all of them could work when the CPU was off and all could wake the CPU (well it woke on any pending interrupt). So it means it's more likely that the CPU is asleep. With STM32L, only peripherals designated low power would work while the CPU was off.

                    • kragen 825 days ago
                      Hmm, my first thought was, "Wow, 3000 microwatts is pretty disappointing; that's in the same ballpark as STM32L or even AVR8, though presumably the Apollo 1 was several times faster than the STM32L. I wonder if their newer chips are better; the datasheets sure claim they are."

                      But actually that number is totally in keeping with what I'd gotten from the Apollo3 datasheet: 33 pJ per Dhrystone "instruction" at 48 MHz and 1.25 DMIPS/MHz is 1.98 mW, and then there's 0.22 mW overhead, so we'd expect about 2.2 mW. 3 is close enough, especially for an earlier-generation chip. You might get an STM32L or an ATMega328 down to a milliwatt or two in run mode, but what (theoretically!) distinguishes the Ambiq chips is not their maximum power draw, but their efficiency per instruction and their deep-power-down draw, which is what determines their power draw at a given computational load.

                      Limited low-power peripherals can be a huge headache, yeah.

                      Thank you very much for sharing your knowledge!

        • jhallenworld 825 days ago
          It sounds like the same situation as large model on 8088. I'm wondering if any 8-bit systems did anything like this, but I don't think so.
          • codebje 825 days ago
            Turbo Pascal used to support building your program as multiple executables with a shared data space, so you could chain from one part of your application to another.
            • fentonc 825 days ago
              Turbo Pascal also supports overlays, which allows you to chunk up your code and swap it into memory. I genuinely enjoy using the Turbo Pascal IDE on either my 1984 Kaypro or the monster 16-core “Zedripper” Z80 laptop I built. It’s great!
              • kragen 825 days ago
                I really liked your Zedripper page!
                • fentonc 824 days ago
                  Thanks - I'm looking forward to going back to commuting someday, so I can actually use it!
      • anthk 824 days ago
        DOS too I think.

             org 100h # ;)
        • kragen 824 days ago
          Yeah, MS-DOS COM files were designed so that you could reassemble your CP/M programs and get a working MS-DOS program. That's why they loaded at 0x100.
    • userbinator 825 days ago
      Low-power SoCs with Z80 cores are rather common; in fact, a large fraction of the population probably has a device with one already.

      https://en.wikipedia.org/wiki/TI-83_series

      https://en.wikipedia.org/wiki/S1_MP3

      • kragen 825 days ago
        I have a couple of S1 MP3s but I haven't seen a new one in 15 years, and the old S1 MP3 hacking community has died. Moreover I don't know where to source the chips. Similarly, I have no idea where to source the TI-83 calculator chips; does TI even sell them to third parties at all? How much power do they use?
    • anthk 825 days ago
      Check CollapseOS.
  • elvis70 825 days ago
    The initial announcement: https://groups.google.com/g/rc2014-z80/c/is43UD_1vFA

    > My name is Ladislau Szilagyi, I'm 67 and I'm a big fan of Z80 retro computers.

    > I just published on GitHub ( https://github.com/Laci1953/RTM-Z80 ) my RTM/Z80 project.

    > RTM/Z80 is a multitasking kernel, built for Z80 based computers, written in Z80 assembly language, providing its users with an Application Programming Interface (API) accessible from programs written in the C language and the Z80 assembly language.

    From the repo:

    The user manual: https://raw.githubusercontent.com/Laci1953/RTM-Z80/main/RTM-...

    > The RTM/Z80 project is intended to offer to the retro-computer hobbyists and to anyone willing to learn about multitasking systems the necessary resources needed to understand and learn the basics of this interesting but difficult area of software engineering.

    The demo video: https://raw.githubusercontent.com/Laci1953/RTM-Z80/main/RTMD...