Guix Reduces Bootstrap Seed by Half


284 points | by stargrave 137 days ago


  • reacweb 137 days ago

    I do not understand. When I started on Linux, 25 floppies were enough for a full installation (including LaTeX and X) on my big HD (120MB). How could a stripped-down version of bash, coreutils & co. and guile require 120MB?

    • sideeffffect 136 days ago

      I suspect that the big part of that 120MB is bootstrap-guile (Guile is a Scheme implementation)

      if you want to go really minimalistic, see stage0, it's like 500 Bytes

      details are in the article

      • nrclark 136 days ago

        Out of curiosity, what makes Guile take up so much space?

        I considered using Guile as the scripting/extension language in some of our tools, but I couldn't get over the large size and the number of additional dependencies it introduced.

        There's a certain irony to the idea that Guile was designed to be a TCL killer, and wound up being larger and harder to deploy.

        • pera 136 days ago

          That's likely the case: at least in Arch Linux, Guile+deps is about 50MB.

          • bArray 136 days ago

            Can you elaborate on what stage0 does? I'm not sure I understand what it means by "bootstrap".

            • markjenkinswpg 136 days ago

              Stage0's binary seed is a monitor, a program that lets you twiddle data in hex into memory addresses (in hex).

              Code injected into the monitor can do the next best thing, a similar program that loads text files with hex into binaries and skips ends of lines after a comment marker starts.

              As such, the initial stuff in the stage0 project is further programs that are still in my view blobs, but instead of binary blobs they're plain text hex blobs with comments documenting the assembler equivalent and what's going on.
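
A minimal sketch of the kind of loader described above: it reads text containing hex digits, drops everything after a comment marker up to the end of the line, and emits the raw bytes. The function name `hex_to_bytes` and the `#` and `;` comment markers are assumptions made for this sketch, not details taken from the thread, and the real stage0 tools are written in hex and assembly rather than Python.

```python
def hex_to_bytes(text, comment_markers="#;"):
    """Convert commented hex text into raw bytes (illustrative only)."""
    digits = []
    for line in text.splitlines():
        for ch in line:
            if ch in comment_markers:
                break  # ignore the rest of this line
            if ch in "0123456789abcdefABCDEF":
                digits.append(ch)
            # any other character (spaces, tabs) is simply skipped
    if len(digits) % 2:
        raise ValueError("odd number of hex digits")
    return bytes.fromhex("".join(digits))


# Hypothetical input: two lines of hex, each with a trailing comment.
source = """\
48 65 6C 6C  # first four letters
6F 0A        ; last letter and a newline
"""
print(hex_to_bytes(source))  # b'Hello\n'
```

The point of the text form is that a human can audit it: the comments carry the assembler equivalent, while the loader itself stays small enough to check by hand.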

              From there, there are iterations of hex files (hex1 and hex2) with increasingly complex symbol tables, so that references to addresses and relative jumps can be made by symbol.

              From there the stage0 project makes the leap to "stage1", and there are things like basic editors, file concatenation tools, a macro-based assembler and so on. It all ends with a C-subset (M2-planet) compiler written in assembler.

              Work is in progress to rewrite the scheme interpreter mes in M2-planet instead of normal C. The c compiler mescc (written in scheme) can build tcc and onward to gcc.

              There's bootstraps along these lines for a fantasy machine called knight and there's x86 versions and maybe some other archs in the works.

              My stalled side project is interpreting the knight stuff in python:

              I'm excited by the idea that I could use really old GNU/Linux distro CDs that I trust with python2.3+ as a bootstrap path and eventually with some other work even use old power macs with MacOS X that included python2.3+ as another cool place to bootstrap the free world.

          • mhd 136 days ago

            | Bootstrapping in this context refers to how the distribution gets built from nothing.

            The GNU build system was huge since about forever.

            • swiley 136 days ago

              If it bothers you, try alpine. I think the minimal rootfs on their site is 3MB or so.

              • MayeulC 136 days ago

                Though the kernel itself might sit at ~60 MB (depending on compile-time options), and you might need to add another 120MB for the firmware.

                Of course, this can be stripped down a lot if you decide to tailor the kernel to your system (less drivers, etc).

                To answer GP, a full texlive installation takes around 2GB on my system. That's mostly fonts, but also packages, tools and documentation. I take the bloat for convenience, because I can afford a few extra GBs nowadays. If you want to go for minimalism, you still can (and even reach smaller footprints than back then), but that's no longer the mainstream option. Not when a GB is a few cents.

                • djsumdog 136 days ago

                  Does Guix use the Linux kernel or Hurd? In any case, I can understand kernel growth. There are a lot of new drivers, new and old architectures (386 machines were supported in Linux until a few years back; and there's still a fork that does), all the new file systems, all the new networking and cgroup isolation. You can build a minimal kernel without a lot of that stuff, and it will still run on older embedded hardware. But just having a large kernel source and large default kernels/initrds isn't bloat so much as supporting a lot of hardware possibilities.

                  • Sean1708 136 days ago

                    From here:

                    > Our goal is to provide a practical 100% free software distribution of Linux-based and other variants of GNU, with a focus on the promotion and tight integration of GNU components, and an emphasis on programs and tools that help users exert that freedom.

                    • brirec 136 days ago

                      Not trying to sound snarky here: does _anything_ use Hurd?

                      • MayeulC 136 days ago

                        Well, there's that (former?) Debian project, an Arch variant as well, and I think there is a GUIX variant as well[1]. But they still share the same limitations: 32 bits, no USB, etc.

                        Hurd itself has interesting concepts, like translators. And it might be useful in some specific cases where robustness is needed. But you might as well try MINIX 3.

                        Hurd is on my list of things to try, just like Plan 9.

                        [1] (edit):

                        • danudey 136 days ago

                          It's "supported" by some people, but it's not ready for prime time, and never will be.

                          • brirec 136 days ago

                            I can't help but laugh at the irony that one of the most popular and most proprietary operating systems in the world uses a forked Mach kernel called XNU, for "X's Not Unix."

                            Anyone at Apple have a beef with the FSF back when they named that?

                      • MuffinFlavored 136 days ago

                        > Of course, this can be stripped down a lot if you decide to tailor the kernel to your system (less drivers, etc).

                        Is there not a tool that does this for you automatically? That'd be kind of neat! Like a webservice that takes your lspci/lsusb output, then gives you back a kernel compiled just for you?

                        • jcelerier 136 days ago

                          the kernel itself is able to do it with `make localmodconfig`. you'll need a rebuild if you buy a new usb toy though...

                      • davexunit 136 days ago

                        There's a significant difference between bootstrap binaries and the actual distro binaries, and this thread seems to be conflating the two. Out of curiosity, does anyone know how big Alpine's set of bootstrap binaries is? This article estimated Debian's to be ~550MB. It would be interesting to know which distro takes the least amount of stuff for granted.

                        • swiley 136 days ago

                          Unless you enable module loading then you can probably just build the module.

                      • asutekku 137 days ago

                        There’s a bunch more stuff available on it nowadays.

                        • giancarlostoro 136 days ago

                          And lots of hardware drivers, for lots of potential hardware, I assume. I bet Gentoo could be set up to be entirely minimal by removing unneeded things from the distro; not sure if it's known for that.

                        • secraetomani 136 days ago

                          And a typical install of Windows 95 was 55 MB. Which included a bunch of images, sounds, ...

                          • oblio 136 days ago

                            > images

                            800x600 16 bit backgrounds and 16x16 16 bit icons...

                            > sounds

                            MIDI sounds...

                            We're not living in 1995, anymore.

                            • Fnoord 136 days ago

                              Weren't (the) BMPs large in size?

                              • oblio 136 days ago

                                I'm pretty sure that even at that time compressed formats were used :-)

                                JPG was released in 1992, GIF in 1987.

                            • secraetomani 136 days ago

                              What, your C compiler includes high-definition image and audio files? What times to be alive...

                              • oblio 136 days ago

                                He was comparing Win 95 to modern software.

                                Also, as far as I know even stuff like compilers (if they want to support Unicode fully) have to pull in dependencies like ICU, which because of its symbol DB weighs about 30MB.

                                That's another problem Win 95-era software didn't have to care about...

                                • AnIdiotOnTheNet 136 days ago

                                  ICU is a problem of our own making. Computers worked with other languages all the time before Unicode. You could say the same for pretty much any modern dependency.

                            • yjftsjthsd-h 136 days ago

                              But crucially, it didn't ship any compiler chain capable of building itself; I know compilers were smaller then, but apples-apples...

                              • folkrav 136 days ago

                                I'd be curious about comparing the resolution, color depth and compression levels of those images, and the bitrate/compression levels/codec of those sounds. Let's take one wallpaper. Win95 must have included a what, max 1024x768 wallpaper size? Assuming identical compression and color depths, that's still 786432 pixels total, compared to the 4K wallpapers we get today with 8294400 pixels - a larger than 10x increase. That's vastly oversimplifying, as a 10x pixel count doesn't directly translate to a 10x file size, but still, that's just for one image.
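
The pixel arithmetic above is easy to check. This is only a sketch: it assumes "4K" means 3840x2160 (which matches the 8294400 figure), and the variable names are made up for illustration.

```python
# Pixel counts for the two wallpaper sizes mentioned above.
win95_pixels = 1024 * 768   # assumed largest Win95-era wallpaper
uhd_pixels = 3840 * 2160    # assumed meaning of "4K"

ratio = uhd_pixels / win95_pixels
print(win95_pixels, uhd_pixels, round(ratio, 2))  # 786432 8294400 10.55
```

So the raw pixel count grows by a bit more than 10x; how that maps to file size still depends on color depth and compression.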

                                Pretty sure sounds in Win95 were MIDI, which aren't even actual audio files.

                                • secraetomani 136 days ago

                                  We are comparing here a full graphical OS with kernel & drivers versus a bunch of command line tools.

                                  And no, Win95 did include real audio files (if small).

                                  • nitrogen 136 days ago

                                    Don't make the mistake of assuming a GUI is automatically more complex or more advanced than command line tools.

                            • yourapostasy 137 days ago

                              Dayum...just how far do the turtles go? Even when they reach full source bootstrap, are they ruminating over concerns about the firmware/BIOS? If those concerns are addressed with an equivalent bootstrap-seeded coreboot, then are there concerns with the silicon? I never even thought someone was taking this level of security seriously enough to actually put the effort into it, but I'm extremely glad to see they are. I can easily see high-security DevOps builds of secrets management stores driven by such a bootstrapped Guix to nearly indefinitely satisfy the provenance-type questions from the regulatory compliance teams I work with.

                              • mikelevins 136 days ago

                                There are researchers considering things all the way down to silicon.

                                I worked on a contract for about a year and a half on a subproject of the research program described in this New York Times article:

                                The project I consulted on was an attempt at a comprehensive system design that replaced everything all the way down to and including the hardware. System software was written in Haskell and in a newly-designed language called Breeze that offered both static and dynamic typing. The dynamic typing was supported by tagged hardware with specialized instructions (that effort included Tom Knight, one of the original designers of Lisp Machines).

                                The organizing principle was that every piece of code and data was owned by a well-defined system entity with well-defined permissions that were proven correct both statically and dynamically. For example, sending data to an endpoint that did not have the required authorization to receive it was a type error.

                                The thinking was that the state of the art is so broken (from the point of view of data integrity and security) that it's necessary to start from scratch. The overarching project name, appropriately enough, was "Clean Slate".

                                I don't know what they're up to now.

                                • orblivion 136 days ago

                                  If you're worried enough, I think there's some trick where you use multiple different C compilers made by different organizations to build (let's say) gcc. Then you use all those gccs to rebuild gcc in a deterministic way. If all of those rebuilt gccs are byte-for-byte the same, then either it's safe, or all of the initial compilers you used were in cahoots. Depending on who you pick to start, that's arguably very unlikely.

                                  • thfuran 136 days ago

                                    >If all of those rebuilt gccs are byte-for-byte the same

                                    Isn't that pretty unlikely even if all the compilers are uncompromised?

                                    • delroth 136 days ago

                                      No, that's what reproducible builds are all about:

                                      Debian is currently sitting at around 93% of packages being reproducibly built. NixOS is close to 99% of the minimal install image set (

                                      • tjoff 136 days ago

                                        Reproducible builds don't mean compiling with different compilers, or even different versions of the same compiler, to result in the same binary.

                                        • comex 136 days ago

                                          That's why you don't compare the GCC binaries built by different compilers. Instead, you use each of those GCC binaries (which may use different machine code but should implement the exact same functionality) to build another GCC, and you compare those binaries. That should result in identical binaries as long as GCC:

                                          - is deterministic,

                                          - does not invoke undefined behavior, and

                                          - was correctly compiled by the original compilers.
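
The final comparison step can be sketched roughly as follows. This only covers the "are the second-generation binaries identical?" check, not the builds themselves. The names `digests` and `all_identical` and the byte strings are placeholders invented for this example; in practice the inputs would be the contents of the GCC binaries produced by each first-generation GCC.

```python
import hashlib


def digests(binaries):
    """Map each build label to the SHA-256 hex digest of its binary."""
    return {name: hashlib.sha256(blob).hexdigest()
            for name, blob in binaries.items()}


def all_identical(binaries):
    """True only if every binary hashes to the same digest."""
    return len(set(digests(binaries).values())) == 1


# Placeholder contents standing in for the rebuilt GCC binaries.
second_generation = {
    "gcc-built-via-compiler-a": b"stand-in for rebuilt gcc",
    "gcc-built-via-compiler-b": b"stand-in for rebuilt gcc",
    "gcc-built-via-compiler-c": b"stand-in for rebuilt gcc",
}

if all_identical(second_generation):
    print("second-generation binaries match byte for byte")
else:
    for name, digest in digests(second_generation).items():
        print(name, digest)
```

A mismatch does not say which starting compiler is at fault, only that at least one of the three conditions listed above failed somewhere.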

                                  • z29LiTp5qUC30n 137 days ago

                                    well if you look at #bootstrappable's logs it looks like they are planning on building custom hardware out of TTL chips to eliminate all software/bios/firmware from the bootstrap; which when combined with libresilicon means no place for any attacks to hide.

                                    • everlastingfan 137 days ago

                                      Well there is when you design signals that play on the physical properties of the silicon and the silicon design is known, think Rowhammer. The next level of this is making permanent modifications to the silicon, e.g. using focused RF signals to change the properties of individual transistors.

                                      Relying on security through obscurity is bad, but you need some obscurity.

                                      Passwords and keyfiles are ultimately a form of security through obscurity.

                                      • kragen 136 days ago

                                        “Planning on” is an exaggeration. It would be good if someone did that, but those 120MB are still five orders of magnitude bigger than the MES seed, and that's a much bigger problem right now.

                                        • nwah1 137 days ago

                                          Unless Dennis Ritchie or Ken Thompson placed a "trusting trust" attack in the original C compiler, since virtually all modern code was compiled by something that was once compiled by it.

                                          (Ken Thompson coined the term and knows how to do it.)

                                          • pdw 137 days ago

                                            To counter that, they're working on a bootstrap chain that starts with a tiny 500 byte hex editor and ends with a compiler that can build the GNU toolchain.

                                            • newnewpdro 137 days ago

                                              That's the whole point of these projects - they're preventing that type of attack.

                                              If you implement your bootstrap C compiler entirely in machine code for instance, it doesn't matter if the "original C compiler" is compromised, since it's not being compiled.

                                            • jtl999 136 days ago

                                              >building custom hardware out of TTL chips to eliminate all software/bios/firmware from the bootstrap

                                              I'd be curious to learn more about this.

                                            • everlastingfan 137 days ago

                                              It's turtles all the way down.

                                              I'm ruminating on the potential for "regular" electric signals to carry quantum information, that's the next aspect of security to consider with quantum around the corner.

                                              So don't worry, there are and will be bigger things to worry about.

                                              • chungy 136 days ago

                                                Malicious firmware concerns might be alleviated with cross-compilation. There might be an assumption that firmware inserting untrusted code only actually inserts trusted code of its native architecture, though even this is no guarantee.

                                                • secraetomani 137 days ago

                                                  > secrets management stores

                                                  Maybe starting from something like an Arduino is a better route for that.

                                                  • everlastingfan 137 days ago

                                                    Is Arduino firmware open? Are the PCB and chip designs open?

                                                    • kragen 136 days ago

                                                      Yes, yes, and no. Also the compilation toolchain depends on GCC. Open silicon designs aren't enough; you need to be able to verify that the design correctly describes the physical hardware you are actually using.

                                                      • secraetomani 136 days ago

                                                        Even if not, way less stuff hiding in there than in your average computer. Remember that these days, Intel CPUs have web-servers embedded in them.

                                                  • gglitch 136 days ago

                                                    Neat - MES Scheme is apparently named after Alan Kay's description of Lisp as the Maxwell's equations of software.


                                                  • archi42 136 days ago

                                                    So the only trust anchors remaining are the kernel and the hardware. It seems an attacker has to build a kernel module that detects the bootstrapping process and injects the (self-replicating!) bad code while building the final gcc.

                                                    I like the work, but I still don't think the kind of attack mitigated here is practical. OTOH it's nice to have the option (if I was to build/publish my own distribution I would use this as my trust anchor, plus some ancient hardware and Linux 2.4 CDs to build my own bootstrap environment; though as a random guy on the internet I am probably less trustable than e.g. the Debian people).

                                                    • antoineMoPa 136 days ago

                                                      Anyone using Guix in production? (Anyone using Guix?)

                                                      • uncletaco 136 days ago

                                                        Not in "production" but I use it on my personal laptop and desktop. I wanted to learn a scheme and I figured what better way than to dedicate all of my home computing time to configuring a system and crying myself to sleep trying to build firefox on it.

                                                        • robto 136 days ago

                                                          If you get firefox working, please share your definition on the nonguix[0] repo. Lack of Firefox is what is currently holding me back from installing Guix on hardware. I don't know anything about package building, but I'd be happy to help collaborate if you need it.


                                                          • uncletaco 136 days ago

                                                            So two things:

                                                            1. I believe icecat is going to update pretty soon to version 68 to track with the latest esr, so perhaps in a week or so please check the main channel for an icecat version that should work with most, if not all, up to date extensions.

                                                            2. If I do get Firefox built and packaged then I’ll be more than happy to add it to the nonguix channel. Though I’m strongly considering just creating an unofficial icecat that keeps up with the latest version of Firefox like icecat on windows does. We’ll see what happens after these tears dry.

                                                            Icecat is really just a set of scripts that strip the branding out of Firefox and package it with gnu shit, though the gnuzilla project provides the esr version for convenience.

                                                            • dannyobrien 136 days ago

                                                              I use x11docker to run a recent firefox on Guix for now.

                                                          • garmaine 136 days ago

                                                            Bitcoin. All bitcoin binary releases going forward are compiled with Guix for reproducibility. (The Bitcoin Core team previously used an in-house guix-like thing that they developed themselves. They switched to guix this year.)

                                                          • rendaw 136 days ago

                                                            I'm using it on my fileserver, swimmingly. I build the image on my desktop and then swap the boot drive and restart. It replaced some Arch linux hack scripts I was using to do the same, and with much less external tooling.

                                                            I wrote up a general guide here along the way:

                                                          • xvilka 137 days ago

                                                            I wonder if they target RISC-V platform too. Or OpenPOWER (the case of Raptor Engineering).

                                                            • chungy 136 days ago

                                                              Seems to be only i686 and x86_64. It's a pretty small project all things considered, but I'm sure a porting effort would be appreciated.

                                                              • tremon 136 days ago

                                                                from the article (default blurb added to the end of every blog post):

                                                                it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.

                                                                • Fnoord 136 days ago


                                                                  > The Guix development branch we just merged introduces a reduced binary seed bootstrap for x86_64 and i686

                                                              • equalunique 135 days ago

                                                                The first step would be getting Mes[0] built for RISC-V / OpenPOWER. Want to give it a shot and let Guix devs know how it went?


                                                                • kragen 136 days ago

                                                                  It isn't necessary for the original bootstrap seed to run on many platforms. Just at least one trustworthy one. From there we can cross-compile everything else.

                                                                  • xvilka 135 days ago

                                                                    And x86 is the least trustworthy one. With all these hidden modes of execution from Intel and AMD...

                                                                • atian 137 days ago

                                                                  Yeah startups are gonna need another way of financing fast if seed amounts keep going down. If anything the startup age is over on the west coast.

                                                                  • moomin 137 days ago

                                                                    When you only read the headline.

                                                                  • m4r35n357 137 days ago

                                                                    Fascinating stuff!