I like simple archives, but could it be something other than tarballs? For the kinds of application described in this article, tarballs are pretty bad:
Either you extract it from scratch every time you run the app, paying a large time penalty...
... or you extract once to a cache, and assume that nothing changes the cache. This is bad from both an operational and a security perspective:
- backups have to walk through tens of thousands of files, thus becoming much slower
- a damaged disk or a malicious actor can change one file in the cache, causing damage that is very hard to detect.
There are plenty of mountable container formats -- ISO, squashfs, even zip files -- which all provide much faster initial access, and much better security/reliability guarantees, especially with things like dm-verity.
Yes, most tarballs do not support random access (there are some metadata extensions that allow this). This makes large tarballs annoying to use on systems with slow disk I/O (even a hard disk can be slow enough to be annoying). This is by far my biggest gripe with the format. Certainly, smaller tarballs are a very handy format as long as you stay inside the Unixy world of computing, and as long as you keep looking out for the various incompatibilities between the different tar implementations.
"... there are some metadata extensions that allow this)."
Where to find these extensions? Are they portable between Linux and BSD?
The 1998 dict project included a utility called "dictzip" for random access to the contents of gzip compressed files.
Dumb question: Is it possible to create a utility or even a hack that performs "random access" into tar archives?
Example use case: the user only wants to untar a small number of selected files from a large tarball such as a source tree.
The user has tried both the "-T filelist" option and using memory file systems instead of hard disk drives.
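For an uncompressed tarball this is actually straightforward to bolt on, since every tar header records the member's size: one sequential scan builds an offset index, and after that single members can be read with a seek. A sketch in Python (the helper names are mine; compressed .tar.gz streams are the hard case and this deliberately sidesteps them):

```python
# Index an *uncompressed* tar once, then read single members via seek,
# without extracting anything else.
import tarfile


def make_index(tar_path):
    """One sequential pass over the headers: name -> (offset, size)."""
    index = {}
    with tarfile.open(tar_path, "r:") as tf:
        for member in tf:          # reads headers, skips over file data
            index[member.name] = (member.offset_data, member.size)
    return index


def read_member(tar_path, index, name):
    """Random access: seek straight to the member's data."""
    offset, size = index[name]
    with open(tar_path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

The index could be saved alongside the tarball, which is essentially what the metadata-extension schemes mentioned above do.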
A zip file is a concatenation of gzipped files. A .tar.gz is a gzip stream of concatenated files. Anything that could do random access into the contents of a zip file entry could do similar things with a tarball.
With a transparent random-access overlay the difference mostly disappears; it reduces to whether the stream has to be scanned or is already indexed, and even that is orthogonal, since the zip directory at the end is redundant (it could be rebuilt by scanning).
So you mean at each "random access", you actually have to scan the whole .tar.gz file to find the location? For large tarballs, that will definitely hinder performance a lot. The difference does not disappear at all.
AFAIK a compressor like zip builds a dynamic running table of frequent byte sequences; the resulting archive is written in such a way that when you decompress it, you re-build the table in the process.
So if you concatenate files A, B, and C and then compress the result, then by the time the compressor starts compressing the data of C, it will have that table built from A and B. To extract C, you'll need to re-build the same table and thus you'll need first to decompress A and B.
In a zip file each entry is compressed individually; this gives random access, but worse compression rate, because the table is not re-used between files.
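A quick way to see both effects at once, sketched with Python's zlib (the same DEFLATE used by gzip and zip); the sample data here is made up:

```python
# Three files that share content compress much better as one stream,
# because later data can back-reference earlier data; the cost is that
# the stream only decompresses from the beginning.
import zlib

a = b"the quick brown fox jumps over the lazy dog\n" * 50
b = b"the quick brown fox naps under the lazy dog\n" * 50
c = b"the quick brown fox jumps over the busy dog\n" * 50

# One DEFLATE stream over the concatenation (tar.gz-style):
whole = zlib.compress(a + b + c)

# Each file compressed independently (zip-style):
parts = [zlib.compress(x) for x in (a, b, c)]

# The single stream is smaller, since c's data can back-reference a and
# b; but to extract c you must inflate a and b first, while each
# zip-style part decompresses on its own:
assert zlib.decompress(parts[2]) == c
```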
Tape drives don't really support random access, though, which is reflected in the design of the tar format and its offspring. That is, in fact, the problem here, and why formats designed for random access instead of sequential access are far better for storing file systems for containers and VMs.
I'm pretty sure the article implies this is for user-facing applications where the user would manually extract it once to a place of their choosing then run it from there. I think you're missing the point of the whole article.
But why would you want to extract if you can mount the file directly? For simple archives, extracting is fine. But for larger archives (like a compiler -- 1000 files or more), loop-mounting is much better than extracting:
- Does not slow down your backup by adding thousands of files
- No need to wait for initial file extraction
- You can quickly and easily verify integrity of the whole archive
And if you are using fuse, it does not require any special privileges either!
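The property doing the work here is random access. A zip file, for instance, carries a central directory, so a single file can be pulled out without touching the thousands of other entries; a minimal Python sketch (the function name is mine):

```python
# Read one member from a zip without extracting the rest: the central
# directory at the end of the archive says where the entry lives.
import zipfile


def read_one(zip_path, name):
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(name)   # seeks via the central directory
```

A FUSE mount is essentially this, exposed through the filesystem API.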
Do you not still pay a significant performance penalty by reverifying the container upon each application load? Especially considering that, if the container is signed, you need to verify the signature itself before trusting the container, and full signature verification - including checking whether the signature has been revoked - involves expensive network calls?
If your operational and security model really frowns on trusting your extraction cache, then perhaps a different workflow is more appropriate - download the container, verify the container, extract, bake the OS plus extracted apps into an image, sign the image, verify the image upon each boot and mount apps read-only. Then you don't need to re-verify anything upon each launch, instead trusting that your image creation process is routinely updating and re-verifying the software in your current images.
Verification of a single file is much faster than walking an entire tree, especially when there are lots of small files, for example when there is a compiler or a large Python project inside.
A simple example: my /usr/include contains 33,037 files, 356M uncompressed. On an SSD with a cold cache, it takes 6.7 sec to read every file individually, versus 0.7 sec to checksum a single 356M archive, a ~10x difference.
The difference in backup time is even more dramatic -- the backup program has to call stat() either 33K times or just once, a ~33,000x reduction! The other filesystem tools (What takes all the space? What has changed in the last X hours? Please sync this directory elsewhere.) see similarly large speedups.
So if I had a choice, I would love for my dev environment to come in mountable form. Similarly, I don't understand why container runtimes (like Docker) don't use loop mounts more -- there seem to be many advantages and very few disadvantages.
As for signature verification -- I don't care about 3rd-party signatures and revocation, I just want to ensure that I am running the same code every time. There are many ways one can damage the extraction cache, especially if it is owned by the same user as the application (as the top-level post described) -- sysadmin errors (`sudo find / -name app-old -delete`), application errors (the app creating cache files in its own bin dir), disk errors (silent corruption), transfer errors (one file did not get transferred to a new computer). Loop mounting makes disk errors easier to detect, and eliminates the other classes of error entirely.
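A minimal sketch of the "verify one file" workflow (hashing only; dm-verity adds block-level verification on top, which this does not show):

```python
# Hash the whole archive in one streaming pass. Any single flipped byte
# anywhere inside changes the digest, which is the damage-detection
# property argued for above.
import hashlib


def archive_digest(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            h.update(block)
    return h.hexdigest()
```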
Interesting I didn't know this existed. Is there a way to layer sqlar like docker images? (Besides just tarring them up I guess.)
I wonder if this could be implemented with the WAL/journal system. Make each layer immutably append to the previous layers to make restarting at any layer trivial. I'm not sure if there's such a way to hook into the journal directly like that though.
I really love the work the Guix folks are doing. I'd love to run GuixSD on my laptop if it were easy and supported to run plain upstream Linux instead of linux-libre. From the little time I've spent playing with it, it seems like such a lovely, easy-to-use project; it's actually a small shame they're part of the "unsexy" GNU project and subject to GNU politics.
That's due to regulation, combined with hardware manufacturers correctly choosing to have the driver/host load the firmware instead of keeping it in on-board permanent storage.
Note that there are reasonably performant 802.11n cards with non-reverse-engineered open-source firmware. IIRC they use the ath9k driver, and are the result of the manufacturer opening them up under licenses compatible with both the Linux and BSD kernels. They are great for hacking, and there are some with 5GHz support. You have to keep in mind that hacking the firmware might violate RF spectrum laws, which is relevant in the same way GDPR compliance is: if someone can exert legal pressure on you, and do more than send angry letters and call you in the middle of the night, you have to consider whether that jurisdiction's laws forbid your doings.
TL;DR: they exist, they are not expensive, they can't do 802.11ac or 802.11ad, and hacking the kernel-license-compatible source might violate FCC or similar regulations, which could well be punished harshly if someone complains about what you do and your behavior is provably non-spec-conformant.
Be careful, and choose your hardware wisely to not use binary blobs. Also I assume you use an old CPU, if you wanna go the linux-libre route.
I have a system where I'm not sure yet which OS it will get, but I already (with help, and soldering) removed the Intel ME from the firmware, and might even physically remove the processor that would have executed it, or take the soft route and just cut its power or something.
I would really like to hear from more people who've used NixOS in anger. We used the Nix package manager (for packaging our application and managing dependencies) in our organization for a while, and it seemed to create a lot of pain, so I'm wondering if we were using it poorly or if the Nix ecosystem just needs to mature.
Its refusal to package firmware binaries, for one, even if that firmware is required to have a useful machine. I'm looking at AMD specifically here, where recent graphics cards (including APUs) don't even do text mode without the firmware.
(edit: I understand the why of it, and even agree on principle, but it still prevents me from running linux-libre on most of my systems)
While Linux-libre is the default for Guix there are no limitations in place that would keep you from using vanilla Linux. In fact, Guix makes it extremely easy to build custom packages, and that includes custom kernel packages.
You can augment the package collection that comes with Guix with a simple environment variable, so the insistence on software libre on the side of the project should not represent a technical hurdle.
Next time you build or choose a system, consider one that can run free software.
I did, and it makes most things quite a bit easier.
Edit: I did, after struggling for a couple of years with hardware requiring nonfree blobs of different shapes and sizes. I was recently lucky to get my hands on a system that I can run using linux-libre, and the only "extra" component I have is a USB wifi card.
> Next time you build or choose a system, consider one that can run free software.
The only workstation that boots with entirely free software is like, the Talos II PowerPC, with a minimum cost of $5000.
Everyone else requires a binary blob somewhere. Either a UEFI blob, BIOS blob, some kind of driver somewhere, or whatnot. Raspberry Pi, AMD, Intel, everybody.
And before the Talos II, I don't think an "Open PC" devoid of proprietary binary blobs even existed. At least, something that is reasonably modern (ie: 64-bit, decent security, decent support with modern OSes)
What about pre-ME ThinkPads, after replacing the wifi card with an ath9k/open-source-firmware one? Do the Intel chipset graphics require a blob for simple framebuffer/text-mode operation? Because I can't remember including any blobs in the libreboot I use there, and IIRC I get output before a Linux kernel is able to load device firmware.
It is 64-bit, and runs pretty much anything from Windows 10 (from what I can tell, though I'm not sure, due to CMPXCHG16B) through FreeBSD to Android. Probably even something like QNX.
Yes, you might not call this reasonably modern, but it ticks every box on the hard qualifiers you listed for being reasonably modern.
I don't remember whether the video BIOS was extracted from the old binary or if it is the open-source replacement, but I'd tend towards the latter as I don't remember searching for the backup/dump of the original firmware.
And yes, it's running coreboot, and at least CLI/Linux-framebuffer Arch Linux works. I didn't yet get around to setting up the rest of the system, but considering I bought it specifically for high-security operation, as the ME can be physically removed without losing more than the built-in Ethernet port, I'm not pressed to do it anytime soon.
Edit: I'm pretty sure I followed , which leads me to the new conclusion that I did use libreboot, a more strict version of coreboot (think coreboot=Archlinux, libreboot=GNU Guix), and had to fiddle with the question whether the open-source video bios would work. This confuses me a little, as I remembered buying an X61s, not an X60s, but from the fact that it booted after flashing, I deduce it had to be an X60.
GNU stands for a philosophy of freedom, so GuixSD won't provide official repositories for installing proprietary software. Some users don't like that, even though they might be interested in the technological approach of the system.
GNU utilities are not only unsexy, they are bloated and messy, and prone to failure; the GNU implementations (coreutils: grep, cat, tail, etc) of standard UNIX tools are not done with simplicity in mind.
But hey, after all, GNU's Not Unix. Those of us who really appreciate the UNIX philosophy still have OpenBSD, which is, in my opinion, the only light in a world of chaos.
> GNU utilities are not only unsexy, they are bloated and messy, and prone to failure; the GNU implementations (coreutils: grep, cat, tail, etc) of standard UNIX tools are not done with simplicity in mind.
I've heard people say how GNU code is bloated and messy many times before, but never that they're prone to failure. I've never had any failure myself with any GNU code. Can you give some examples of failures you've experienced?
Also, I'm looking at the coreutils source right now, and it's not as messy as I was expecting. true.c is only a pageful, 80 lines, many of which exist simply because of the license comment and the usage() function for --help. cat.c and tail.c also seem reasonably understandable. The biggest complaint I can make is that there are cases where spaces and tabs are mixed in the indentation, but I've long resigned myself to expecting that in projects with more than one major contributor.
I do, however, think that glibc and gcc are pretty messy. I tried looking for the definition of fopen() in openbsd's libc and found it in less than 30 seconds by grepping. I still haven't found glibc's. gcc seems to rely heavily on its own extensions, because I don't understand what's going on here:
That looks like a function prototype in a function definition, but it seems to mean an assignment going by the next line. Then in toplev.c, we have:
toplev::main (int argc, char **argv)
That looks like C++, but the file extension is ".c"...
You know what? Never mind. Comparing the code for true.c and cat.c between GNU coreutils and OpenBSD, I do rather like how clear OpenBSD is in its code. Damn. Sexy is a good word. Now I understand why people speak so well of it. I don't even need grep, the source file hierarchy is so clear. Looking back at GNU's true.c, I don't even understand half of what's going on in those 80 lines, and it turns out that true.c is also the source for false.c; it just does `#include "true.c"`.
TL;DR I agree that GNU utilities are messy. I'm not sure of the bloated aspect, because I do like that utilities have internationalized documentation built-in, but that seems to be bloat by openbsd's standards. And I wouldn't know of them being prone to failure, because I never had one with them.
EDIT: Huh. I wanted to reply to Hello71, but there's no reply link under his post. Anyone know why? Anyway, yeah, I saw a comment in the file mentioning that over a line that referred to stdout. Can't check now because I'm away from the computer. I didn't really understand the reason though.
It is C++. The file is .c, but whatever. They use a lot of C++.
I agree with you, however. Having worked with the code, GNU relies a lot on macros and a lot of auto-generated code. The code is a big mess, impossible to tackle unless you spend a huge amount of time on it.
A lot of symbols are generated through #defines and pasting (X macros), so for one thing you can't grep for anything.
> I've heard people say how GNU code is bloated and messy many times before, but never that they're prone to failure.
Just have a look at the changelog for coreutils. Sure, it's very long, especially if you're not following its releases, and sure, it's full of weird edge cases that you might never have encountered (I'm certainly way too lazy to dig up the rare bugs I stumbled upon years and years ago, but there definitely were some), but this, IMO, is a great illustration of how GNU (or, rather, GNU coreutils) code is "prone to failure": mainly because it sometimes tries to do way too much.
Speaking of `true --help`: did you know that GNU true can exit non-zero? The exact way is left as an exercise to the reader :)
(if you're actually trying it at home, remember that "true" is virtually always a builtin. AFAIK there is no legitimate way to have shell builtin true return non-zero. (overwriting the command doesn't count :P))
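A spoiler for the impatient, sketched via subprocess so the external binary (not the builtin) runs; this assumes GNU coreutils and Linux's /dev/full:

```python
# GNU true checks for write errors on stdout at exit. --help makes it
# write, and /dev/full makes that write fail, so it exits non-zero.
import subprocess

with open("/dev/full", "wb") as full:
    result = subprocess.run(
        ["true", "--help"],          # external true from PATH
        stdout=full,                 # every write here fails with ENOSPC
        stderr=subprocess.DEVNULL,
    )
```

On non-coreutils systems (e.g. busybox), `true` may ignore `--help` entirely and still exit 0.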
It's because he's a dick, and not in the good way.
It's not about his opinions, it's about his ineffective and misguided leadership. Why is GNU still fighting the same battles from thirty years ago when new ones have emerged that they're not even paying attention to?
GNU is becoming the PETA of software, and it's not a good look.
> GNU is becoming the PETA of software, and it's not a good look.
As a GNU hacker (and co-maintainer of GNU Guix), I find statements like this saddening. It is very unfortunate that Richard Stallman's personality is casting a shadow on the GNU project, which was started by him but is really a loose connection of projects that share the ideas outlined in the GNU Manifesto.
I see GNU Guix in the tradition of other GNU software like Emacs or the Hurd that aim to give users more power and to remove arbitrary limitations. Emacs is probably the epitome of a hackable system that lets the user shape the software according to their own needs to an extent that is extreme and rarely found in any other system.
The Hurd aims to allow regular users to do things that in traditional Unices require super-user privileges. It aims to remove arbitrary obstacles and to free users from the unhealthy power dynamics of the user/admin division.
Guix gives users powerful tools to manage their software environments without having to beg admins, and to easily package software variants without having to depend on professional distributors. At the same time no user can harm another user on shared systems. Guix gives users the ability to take advantage of software freedom, by making it really easy to hack on software in a user-controlled reliable system.
When seen from this perspective, the GNU system that individual software projects are contributing to is a collection of tools that liberate users from helplessness due to unnecessary restrictions. This common goal defines the modern GNU project these days, and I think it is very unfortunate to overlook this because of Richard Stallman and his quirks, his sometimes dictatorial style, or his harmful attitudes towards important social aspects of free software.
I appreciate Richard's past work immensely, but I do not consider him representative of the GNU project that I work on, nor do I think his leadership style is benefiting the project.
Give GNU a chance based on the project's merits and its goals. Long live Free Software --- copyleft and non-copyleft alike!
I've got some simple advice: Get rid of RMS. Get rid of him now.
The longer he's the figurehead of GNU, the longer he has any say in your projects, the longer he'll poison the well. This "joke" fiasco touched off a firestorm of commentary from people that are quite clear that he's been highly problematic for decades now.
You don't want someone toxic running GNU. Microsoft managed to shed their sweaty gorilla and look what's happened to them. They're not fully redeemed, but they stopped fighting and destroying.
Just as the early FSF cared not for tradition, for history, for the investment of time and energy on the part of others, they should not care today if they want to be a radical force for change. Keep that spirit. Tear down anything worth destroying because it gets in the way of what's right.
The important question, the only question, for an organization that promotes actual change is what can he do to improve things tomorrow.
Sadly we've lost Aaron Swartz, but that's the caliber of person you need today. Fearless, energetic, passionate, and fighting the right fights from the front lines. Aaron will be missed, but the FSF and GNU should be looking for, encouraging, motivating the next Aarons no matter what their background is.
They're fighting the same battle, because it's still on and they haven't won.
I'm still wishing for a world where all electronics hardware and software is open source. I can't really visualize an industry like that being economically functional, but I hope someone can. My hope is with GNU.
Imagine if we were still fighting battles from the 19th century, that Prussia was still exchanging musket fire with France.
That's what GNU is doing today with their stubborn fights about licensing when there's far bigger problems emerging.
How about a right to privacy? How about a right to timely patches for their Linux-based phones? How about a right to repair hardware running GPL software? How about a right to know if your device has security faults?
I can make software that mines the personal emails of dissidents, runs facial recognition on hacked webcams, and ruins lives, and that's all fine as far as GNU's concerned so long as I give out the source code to anyone who asks.
So your position is that GNU should both get rid of Richard Stallman and start addressing this stuff. Clearly, you are not basing this upon Richard Stallman addressing these very things for quite a few years now via the GNU WWW site.
I know that RMS is not GNU, but the man is a raging egomaniac, and in the way he talks he takes credit for basically everything he's come into contact with. Unless I'm underestimating the bounds of possibility for one person's contributions, he uses 'I' in a lot of places where it would be fair to say 'we'.
(Note, I came to this conclusion after reading about a bunch of his technical accomplishments, which I can see are awesome, even if the obvious megalomania evidently occasionally dampens their effects.
I think his work is fantastic, his politics are largely reasonable - but I think his self-obsession is often the driver behind a large amount of damaging and counterproductive behaviour.
Politics is the art of compromise - not convincing everybody you're a saint while alienating your natural allies.)
I'm not sure. Isn't Android proof of a sort that Linux is still worth something without GNU?
I wouldn't have any problem with the 'GNU/Linux' idea if it wasn't so obviously part of a greater pattern - when he talks about it, he talks about GNU being the primary contributor - but he typically uses the singular, even when the plural would refer to GNU, and the singular refers to himself.
I also think the world is better for the FSF, but I can't help but wonder: what would the world be like if the FSF were headed by somebody who felt it more natural to think in terms of 'we' rather than 'I'? Even somebody not nearly as technically accomplished, charismatic, and intelligent? I think ultimately it's the ideas, of knowledge as the common wealth of humankind, rather than the curious personality of RMS, that gave the GNU project its power; and ultimately, it's the limitations of RMS that hold it back.
While they were harassing Linksys about GPL the whole IoT thing happened and now we're living in a world full of trashy Linux-based devices that are a hazard to society. Sure, you can get the source code to your internet-based webcam, but because it can't be easily patched, it can also be hijacked by a couple of high-school kids in Alaska so they can sabotage their Minecraft server hosting competitors.
So good job.
As long as RMS is such a prominent figure the GNU/FSF organization there's no separation.
And if the software weren't locked down, anyone (users, communities, other vendors) could step in to provide such updates. That's not some hypothetical, either -- compare the rates of OS updates in projects like LineageOS to the distributions of Android shipped with most phones. If vendors couldn't TiVo-ize, there would absolutely be communities and downstream vendors stepping in to provide devices with regular updates. Because the devices are locked down, that can't happen.
And what do you expect the FSF to do? Out-lobby consumer electronics manufacturers to pass laws requiring some kind of security update guarantee? Even if they succeeded, could we call the result empowerment? Getting out from under the thumb of the manufacturer and actually _owning_ the things you own is the point, not the theoretical promise of recourse if the party which practically retains all of their power over you can be proven in court to have misbehaved, only after the abuse has taken place.
This is absolutely the same fight, and if anything the approach you're arguing for is more conciliatory, not more ‘relevant’.
Theoretically being able to update your device and actually being able to update your device are two different things.
There's going to be a billion variants on every little IoT device in the future and all the best intentions and enthusiasm on the part of the free software community will not be enough to provide patches to all of them.
This is something that's the responsibility of the vendor, and the GNU software license could make that a requirement for using the software.
It's not about laws, it's about licensing. If they don't like the license they're free to use someone else's software.
Having inexpensive operating system software you can dump on a cheap device without license fees is both a great thing, and also what got us into this IoT hot mess.
That's because you erroneously think that the conflict was about abortion. It was actually about whether user reference manuals should properly contain jokes about such highly politically charged topics.
It wouldn't have taken much more time for you to back your point with examples so we'd have some idea of what you're talking about. Please also explain the ideology of how proprietary software is not worth fighting with a practical implementation and ethical discussion.
Most of the time when people object to GNU or rms, they fail to convey that they understand what software freedom is or how relevant it remains today. I'd bet that the majority of threads on these (overwhelmingly corporate) repeater sites are easily handled by stressing how important a user's software freedom is. Every DRM scheme or piece of proprietary software (Windows ignoring user settings, this new device from $VENDOR spying on its users, etc.) is easily dismissed by getting into the same discussion about how software freedom would allow the user to alter the software, protect their privacy, treat their friends and neighbors better by sharing improved versions of the software, inspect and modify the software (or have someone they trust do it for them), and run the programs when they want (instead of losing access when a proprietor feels like ending "support"). Snowden readily credits free software for his success in leaking sensitive NSA documents to us all (docs which still make media stories years later). Three cheers for software freedom, rms, and Snowden!
Posts like the parent post tell me sites like these are the thing losing relevance by showing how ineffective public moderation is and how unacceptable it is to dare to say something not echoed in corporate tech media.
"Gnu's Not Unix": A recursive acronym used as a pun about an operating system from the 1970s, existing solely as a reflection of an aging neckbearded hippie hacker's personal philosophy about software, that is pronounced "GUH-NEW".
I don't think it's only his philosophy. In fact, before, I would have thought that personal philosophy to be common sense, but it then turns out it isn't. It still bewilders me how it's the status quo that when you buy an expensive piece of electronics, it's never really yours to use as you please. It's more like the companies are lending it to you for a one-time payment. They keep full control. If they want to remove features or brick the product you bought from them or place arbitrary restrictions on features that require no work from them and then charge extra for lifting the restrictions, it's totally ok. How does that make sense? Yet it's the dystopia the industry has been turning into day by day, and it's all made possible because of closed source software.
 - One example of this is Amazon's ridiculous rental of digital books, which can only work by downloading the file to your device and then charging you more to keep your device from deleting it. Another example is YouTube Red: paying to download videos the app already downloads for free in order to stream them, and to keep videos playing when the Android app is moved to the background.
I love that they took the NixOS idea and converted it from brackets to S-expressions, but I do wish that they’d used Common Lisp instead of Scheme. Had they gone with the former, I think that we’d be one step closer to computing’s ultimate goal of a Lisp machine on every desk …
That article made me warm up to guix and its practical side. Are guix app bundles just bare tar archives with /usr/local prefix semantics or do they need special metadata files? How are compiled binaries with hardcoded and/or autoconf'd prefixes handled for relocation (I guess using Linux namespaces somehow)?
In Guix every package ends up in its own directory, which may have references to other packages in /gnu/store. An application bundle is really just a package closure, i.e. the directory for the package and all directories it references, recursively. One way to bundle up things is with `tar` (the default of `guix pack`), but Guix also supports other bundling targets, such as Docker. No special metadata files are required.
Relocation currently requires a little C wrapper, which uses Linux namespaces, as the blog post indicates.
If you want something more advanced, such as a bundle that includes an init and services, it's best to use `guix system`, which builds VM images among others.
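The "package closure" notion above is just transitive reachability over recorded references; a toy sketch (the dependency data here is invented for illustration, not real Guix store output):

```python
# Collect a package plus everything it references, recursively. In real
# Guix the references are scanned out of /gnu/store items; here they
# are a hand-written stand-in.
EXAMPLE_DEPS = {
    "hello": ["glibc", "gcc-libs"],
    "gcc-libs": ["glibc"],
    "glibc": [],
}


def closure(package, deps):
    seen = set()
    stack = [package]
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(deps[p])
    return seen
```

The bundle is then just that set of directories, matching the description above: the package's directory plus everything it references, recursively.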
The packages that Exodus produces are actually quite similar to those introduced in this announcement. Both tools generate simple tarballs that can be extracted anywhere to relocate programs along with their dependencies, and both tools bootstrap the program execution using small statically compiled launchers written in C. They contrast guix pack against Snap, Flatpak, and Docker, but Exodus would probably make a more apt comparison in many ways.
This is remarkably off-beat for the GNU project. Tar files are far from ideal for container images because they are sequential archives, so extraction cannot be parallelized (without adding an index and being on a seekable medium; see the rest of this comment). I should really write a blog post about this.
Another problem is that there is no way to just get the latest entry in a multi-layered image without scanning every layer sequentially (this can be made faster with a top-level index but I don't think anyone has implemented this yet -- I am working on it for umoci but nobody else will probably use it even if I implement it). This means you have to extract all of the archives.
Yet another problem: if you have a layer that includes just a metadata change (like the mode of a file), you have to include a full copy of the file in the archive (the same goes for a single-bit change in the file contents, even if the file is 10GB). This balloons the archive size needlessly due to restrictions in the tar format (there's no standards-compliant way to represent a metadata-only entry), and it amplifies the previous problem I mentioned.
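To make that concrete, a sketch with Python's tarfile (the helper name is mine): a layer recording nothing but a chmod still has to carry the file's entire contents, because a tar entry for a regular file always includes its data.

```python
# Build a "layer" tarball representing only a mode change of one file.
# The tar format has no metadata-only entry type for regular files, so
# the full file data comes along for the ride.
import io
import tarfile


def mode_change_layer(path, new_mode):
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        info = tf.gettarinfo(path)
        info.mode = new_mode
        with open(path, "rb") as f:
            tf.addfile(info, f)   # data is mandatory, even for a chmod
    return buf.getvalue()
```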
And all of the above ignores the fact that tar archives are not actually standardised (you have at least 3 "extension" formats -- GNU, PAX, and libarchive), and different implementations produce vastly different archive outputs and structures (causing problems with making them content-addressable). To be fair, this is a fairly solved problem at this point (though sparse archives are sort of unsolved) but it requires storing the metadata of the archive structure in addition to the archive.
Despite all of this, Docker and OCI (and AppC) all use tar archives, so this isn't really a revolutionary blog post (it's sort of what everyone does, but nobody is really happy about it). In the OCI we are working on switching to a format that solves the above problems by keeping a history for each file (so the layering is implemented in the archiving layer rather than on top of it) and by having an index of all the files in the content-addressable storage layer. I believe we will also implement content-defined chunking for deduplication, so that minor changes in files don't blow up image sizes. These are things tar archives fundamentally cannot do.
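To illustrate what content-defined chunking buys you, here's a deliberately crude sketch (my own toy rolling hash, nothing like what the OCI will actually ship): boundaries are chosen from local content rather than absolute position, so an insertion near the front of a file leaves most later chunks byte-identical and therefore deduplicable.

```python
import random

def cdc(data, mask=0xFF):
    """Toy content-defined chunking: cut wherever the low bits of a rolling
    hash match `mask`. The low byte of the hash depends only on the last
    few input bytes, so chunk boundaries are a function of local content.
    (Real systems use Rabin fingerprints or similar.)"""
    out, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF  # old bytes shift out of range
        if (h & mask) == mask:
            out.append(data[start:i + 1])
            start = i + 1
    out.append(data[start:])
    return [c for c in out if c]

random.seed(0)
base = bytes(random.randrange(256) for _ in range(4096))
edited = b"tiny insertion" + base  # a small change near the front

a, b = cdc(base), cdc(edited)
assert b"".join(a) == base and b"".join(b) == edited
# Boundaries re-synchronise just past the edit, so most chunks are shared
# and a chunk store only pays for the region around the change.
assert len(set(a) & set(b)) > 0
```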
I appreciate that tar is a very good tool (and we shouldn't reinvent good tools), but not wanting to improve the state-of-the-art over literal tape archives seems a bit too nostalgic to me. Especially when there are clear problems with the current format, with obvious ways of improving them.
As far as I can tell the only thing ZOO has over tar archives is having a history of each file (using the VMS concepts of file versions) -- meaning that it probably still has some of the problems I outlined above. While that is useful, it is still not as good as it could be. Also, you don't really want file versions with container images, you want to have conceptual "layers" (which would be sort of like having versioned files but it's more like snapshot IDs -- or like ZFS's birth-times).
One needs to give it more than a superficial glance. ZOO was designed to be randomly accessible, with the directory headers forming a linked list. It actually has an uncompressed index and can take advantage of seekable files. It also supports both long and short filenames; CRCs of the metadata structures (cf. the recent kerfuffle about xz); and an extensible, versioned header mechanism that not only could be extended but actually already was extended once, to add long filename support amongst other things.
Is there an actual paper or some high-level summary of the format -- not to mention a modern implementation? The only summary I could find was the one on Wikipedia. I also found the source code of "unzoo" but it's a bit difficult to understand the benefits of a file format if I first have to understand its implementation.
I didn't take a superficial glance out of laziness; it's that I couldn't find any more information about it. But I think you also missed that I mentioned that the style of versioning implemented in ZOO (as far as I can tell from the Wikipedia page) is not the right style for snapshot-like versioning.
You're right that general-purpose filesystems have solved quite a few of the indexing problems already, unfortunately there are a few things stopping general filesystems on a loopback device from being practical (or safe, or the best idea):
* The container (file) for the filesystem must necessarily be larger than the metadata+data of the filesystem, because filesystems really don't like almost-full disks. And unless I'm mistaken, sparse files are not usable for loopback devices (so you can't hack your way out of it).
* Most filesystems don't have a snapshot-style history so you would have to pick a specific filesystem from that list (otherwise you'd be forced to make CoW duplicates of the filesystem to create snapshots -- which is interestingly how Docker does layered storage with devicemapper) which has slightly similar problems to layered tar archives.
* The kernel's filesystem parsers are not really considered to be safe against an adversary, from what I've been told by filesystem engineers. So mounting random loopback files with filesystems on them might end badly.
* There is no way of looking at the archive using a userspace tool (without mounting), unless you re-implement the kernel parser for the filesystem. To be fair, this is true for any format, but filesystems are far more complicated and harder-to-parse than most other formats.
* Having a single blob as your entire image history and so on will mean that you can no longer have content-addressable storage for your images without adding something like content-defined chunking on top (which is then another layer of storage on top of your underlying storage).
* Using a Linux filesystem would mean you couldn't use the filesystem on different operating systems very easily. Even if it was compatible on whatever other filesystem you are using, userspace has no way of being sure there isn't a bug in either side's parser -- and what happens if one side changes the on-disk format. If the protocol is in userspace then it can be handled there.
* Most filesystems don't let you remap users, so if you wanted to run a container in a user namespace you would need to either rewrite the filesystem structure or mount the filesystem and copy it to another filesystem. To be fair, tar archives require you to do the mapping on extraction which is a similar problem, but far less complicated.
* Everyone would be opinionated about what filesystem to use, which means that you'd have to deal with every filesystem people throw at you, making it harder to be interoperable and adding choices where they aren't necessary. It should be up to the user what filesystem they use for storage, not the image distributor.
Now, this hasn't stopped people from trying. Singularity's internal format is a loopback file with a filesystem inside, and they have privileged suid binaries that mount it. It does have genuine performance benefits, and if you don't need things like content-addressability it can work for some use cases.
I realize the title is just a hook for the (very cool!) work in the article, but a couple things that tarballs don't/can't specify that Docker containers can:
- environment variables like locales. If your software expects to run with English sorting rules and UTF-8 character decoding, it shouldn't run with ASCII-value sorting and reject input bytes over 127.
- Entrypoints. If your application expects all commands to run within a wrapper, you can't enforce that from a tarball.
You can make conventions for both of these like "if /etc/default/locales exists, parse it for environment variables" and "if /entrypoint is executable, prepend it to all command lines", but then you have a convention on top of tarballs. (Which, to be fair, might be easier than OCI—I have no particular love for the OCI format—but the problem is harder than just "here are a bunch of files.")
It's not necessarily a good thing for the container to be able to specify locale. Locale should be picked up from the surrounding system; it's just that unfortunately the surrounding system is usually not configured correctly.
And entrypoints/wrappers are definitely possible from a tarball. Just wrap the executables in bin/, replacing them with shell script (or whatever) wrappers pointing to the real executables. That's what Nix/Guix do for languages like Python which require dependencies to be provided by environment variables (as they don't have a way to "close over" the locations of their dependencies).
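As a sketch of that trick (in Python for brevity; Guix does this on the build side, e.g. with its `wrap-program` helper, and the function and file names below are made up for illustration):

```python
import os
import subprocess
import tempfile

def wrap_executable(path, env):
    """Replace `path` with a tiny shell wrapper that exports `env` and then
    execs the renamed original -- the Nix/Guix wrapping scheme described
    above, reduced to its essentials."""
    real = os.path.join(os.path.dirname(path),
                        "." + os.path.basename(path) + "-real")
    os.rename(path, real)
    exports = "".join('export %s="%s"\n' % (k, v) for k, v in env.items())
    with open(path, "w") as f:
        f.write('#!/bin/sh\n%sexec "%s" "$@"\n' % (exports, real))
    os.chmod(path, 0o755)

# Demo: a fake "binary" that just prints one of its environment variables.
d = tempfile.mkdtemp()
prog = os.path.join(d, "app")
with open(prog, "w") as f:
    f.write('#!/bin/sh\necho "$GUILE_LOAD_PATH"\n')
os.chmod(prog, 0o755)

wrap_executable(prog, {"GUILE_LOAD_PATH": "/gnu/store/deps"})
out = subprocess.run([prog], capture_output=True, text=True).stdout.strip()
assert out == "/gnu/store/deps"
```

The same trick gives you an entrypoint: the wrapper is the only thing on `PATH`, so every invocation goes through it.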
A tar is a linked list of file paths and contents; it cannot be indexed to jump straight to a particular file. A compressed tar has to be decompressed first, and then the chain of headers traversed, so accessing a file in a compressed tar is O(n) in the file's position within the stream.
It isn't that it's impossible, it's that it's horribly inefficient.
Zips, on the other hand, unify storage and compression such that you have random access to a particular file, which is why most modern file formats are zips with XML or JSON inside.
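The difference is easy to see with Python's stdlib (in-memory archives, purely illustrative):

```python
import io
import tarfile
import zipfile

files = {"f%d.txt" % i: bytes([i]) * 10 for i in range(100)}

zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, "w") as zf:
    for name, data in files.items():
        zf.writestr(name, data)

# The zip central directory (stored at the end of the file) lets us jump
# straight to one entry without touching the other 99.
with zipfile.ZipFile(zbuf) as zf:
    assert zf.read("f99.txt") == bytes([99]) * 10

tbuf = io.BytesIO()
with tarfile.open(fileobj=tbuf, mode="w") as tf:
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

# tarfile has to walk header-by-header from the start to find the same entry.
tbuf.seek(0)
with tarfile.open(fileobj=tbuf, mode="r:") as tf:
    assert tf.extractfile("f99.txt").read() == bytes([99]) * 10
```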
Does anyone know how this would apply, for example, to sharing a Guile 2.2 application with Debian/Red Hat based distributions? I want to use Guile 2.2 for development, but I am worried because it only recently was released for major distros (with Ubuntu, I know it shipped in 18.04) and it doesn't seem to support the creation of executables.
See this older discussion on statically linking Guile: you should be able to bake your source into a C program that statically links Guile 2.2 to create a self-contained executable. If that is too cumbersome, I would use a container.
Or one that can list/extract files without reading the entire archive, or one that can use binary diffs, or one that supports encryption, or one that supports long file names, or one that isn't hamstrung by different implementations of different standards on different platforms, or one that doesn't use 512 byte blocks, or one that is actually usable on modern operating systems, ....
> This program (named "sqlar") operates much like "zip", except that the compressed archive it builds is stored in an SQLite database
> The motivation for this is to see how much larger an SQLite database file is compared to a ZIP archive containing the same content. The answer depends on the filenames, but 2% seems to be a reasonable guess. In other words, storing files as compressed blobs in an SQLite database file results in a file that is only about 2% larger than storing those same files in a ZIP archive using the same compression.
Uh.... Yeah, I don't need a complicated, incompatible version of Zip that is 2% larger. I'll just use Zip.
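For the curious, the idea itself is tiny. A sketch using Python's `sqlite3` and `zlib` against what I understand sqlar's schema to be (treat the details as an approximation of the real tool):

```python
import sqlite3
import zlib

# sqlar's table: name, mode, mtime, sz (uncompressed size), data.
# By convention, data is stored uncompressed when compression doesn't
# help, which the reader detects by comparing sz to len(data).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sqlar(name TEXT PRIMARY KEY, mode INT,"
           " mtime INT, sz INT, data BLOB)")

payload = b"hello from the archive\n" * 50
db.execute("INSERT INTO sqlar VALUES(?,?,?,?,?)",
           ("greeting.txt", 0o644, 0, len(payload), zlib.compress(payload)))

# Random access is just an indexed lookup; no sequential scan needed.
sz, blob = db.execute("SELECT sz, data FROM sqlar WHERE name=?",
                      ("greeting.txt",)).fetchone()
restored = zlib.decompress(blob) if sz != len(blob) else blob
assert restored == payload
```

What you buy over plain zip for that 2% is transactional updates and SQL queries over the archive, which may or may not matter for your use case.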
Sure. `guix pack` is a neat hack and it isn't tied to any particular archive format.
When using plain Guix you won't need to use any archive format at all; packages simply end up each in their own unique directory and can be used just like that. You can easily spawn a container environment where only the relevant directories under `/gnu/store` are mounted.
It's on my list to add more target formats for `guix pack`, but generally I'd recommend using Guix directly to reap all benefits. `guix pack` is only really useful for cases where you cannot use Guix on the target system.
Are you complaining about the complexity of file format itself? My understanding is it's pretty simple: a linked list of headers with the contents of each file after each header. Or are you complaining that it doesn't do compression itself like ZIPs do?
Articles like this are pointless. I get that guix and nix are neat, and I think that every single time something about one of them is posted, but I don't have the slightest clue how to use either one of them.
Do you want to convince people that something like guix is better than docker? Then take something that is currently distributed using docker and actually show how the guix approach is simpler.
i.e. I have a random app I recently worked on where the dockerfile was something like
ADD requirements.txt /app
RUN pip install -r requirements.txt
ADD . /app
RUN groupadd -r notifier && useradd --no-log-init -r -g notifier notifier
How do I actually take a random application like that and build a guix package of it?
Another project I work on is built on top of zeromq, and it would be great to use something like guix to define all the libsodium+zeromq+czmq+zyre dependencies and be able to spit out an 'ultimate container image' of all of that, but all this post shows me is how to install an existing guile package.
With Guix you get full introspection of your entire package dependency graph; you can check and manipulate every aspect, and it is still simple and easy to work with. With GuixSD you get the same introspection and overview, but of your entire system. Creating a container, vm or even a docker image is a simple '$ guix system <container|vm> config.scm' away. And your config.scm is as complex as you like it to be.
The simplest way would be to package the app for guix; then you could just run '$ guix environment <name-of-package>' and be dropped into an environment with all your dependencies and whatever else the application requires in your path, ready for hacking: get your sources and editor and start working.
If you need a vm or similar though I'd translate your example above into a system config where:
- packages include python-2.7 and whatever is in requirements.txt (this may mean you have to package a few things, but again this is usually super easy)
- users and groups are added to the config, as they always are, no extra step necessary.
- exposing ports and networking is available as options for qemu script guix produces to launch the vm.
- CMD ./notify.py: create a "simple" service that can be autostarted by the system on boot.
- filesystem access is also handled by arguments to the qemu script.
As always though there are several paths to Rome, and these are just two of them.
Zeromq and libsodium are already packaged on guix; czmq and zyre look like they would be simple to package. Guix is really quite simple to work with, which I think is the reason so many of the users and devs are running it as our daily driver, even though it is strictly beta (0.14.0, I think, is the latest release).
And pointless, come on - what does that even mean? Does it mean you don't value them? I was quite happy to read about a neat new thing I can use my favorite tool for.
> With Guix you get full introspection of your entire package dependency graph
Yes, I know all that. It's neat. I would like to learn more about it.
> The simplest way would be to package the app for guix
I was asking how to package the app for guix, and your response is the simplest way would be to package the app for guix...
> If you need a vm or similar though I'd translate your example above into a system config where: - packages include python-2.7 and whatever is in requirements.txt (this may mean you have to package a few things, but again this is usually super easy) - users and groups are added to the config, as they always are, no extra step necessary. - exposing ports and networking is available as options for qemu script guix produces to launch the vm. - CMD ./notify.py: create a "simple" service that can be autostarted by the system on boot. - filesystem access is also handled by arguments to the qemu script.
Yes, I'm sure it is super easy. How do I do it?
Do you know how to use the dockerfile I posted above? You run
docker build -t myapp .
docker run myapp
that's super easy. 9 lines and 2 commands. You can now add docker expert to your resume.
> Zeromq and libsodium are already packaged on guix, czmq and zyre looks like they would be simple to package,
Well, I was working on a fork of things, so I would have needed to install my forks.
> guix is really quite simple to work with
I'm sure it is!
> And pointless, come on - what does that even mean? Does it mean you don't value them? I was quite happy to read about a neat new thing I can use my favorite tool for.
You are correct, I don't really value posts saying how cool and easy something is and how much better it is than other solutions, when they don't actually present a complete solution someone can actually use.
I get that it is not other peoples job to teach me how to use something like guix, but do people not understand why things like Docker won?
Right, your dockerfile contains a requirements.txt of unknown complexity and number of packages, your app has no name, and there are no links to its code.
I'd be happy to provide some examples. Say you want your fork of libsodium:
(package
  (inherit libsodium)  ; anything not overridden here is inherited from libsodium
  (source (origin
            (method url-fetch)
            ;; add a (uri ...) pointing at your fork's tarball
            (sha256 (base32 "hash")))))  ; and fill in its real hash
;; Add whatever other fields your fork needs.
Sure it's slightly more verbose. That's a bit of the cost of having something you can actually rely on, with that degree of hackability.
If you actually want help packaging these things, ask on our mailing list or IRC; we're happy to help with specifics. But you're basically complaining that I didn't give you a concrete solution to a problem with several important missing details. Docker would not be able to instantiate your Python project either if it did not know the contents of your requirements.txt.
The thing is, Docker is huge and bloated; it is far from secure, and will probably stay that way for the foreseeable future; it has a more or less complete lack of introspection; and it is not strictly reproducible (sure, it gets quite far along the way, but it really is not).
Guix on the other hand is rather lightweight, and you have a fair amount of control over how lightweight it should be; it builds from source and has a sort of hot-patching system for security fixes; it has introspection; and it is quite close to bit-reproducible.
Sure, docker is _easy_, as long as it works. And I'd argue that because of its complexity and obscurity it is not practically free software.
Regarding your concerns about Docker, I agree with that (even though I've been working on Docker and in the wider container community for almost 5 years now). However, there are plenty of tools that are compatible with Docker but provide similar benefits.
For instance, (from the openSUSE community which I'm a part of) we have KIWI, which provides builds with full introspection on a package level (similar to what you're doing with Guix). If you build the image inside OBS (our build system) and a dependency of your image is updated, your image will be rebuilt automatically and published in OBS (where it can be further pushed to any Docker/OCI registry you like). The packages are signed, and the image is also "signed" (though it currently signs the image artifact and doesn't use image signing since that is still not standardised). And most packages in openSUSE are bit-reproducible (we build everything in OBS).
The above is far and away better than the current standard in the "official" world of Docker, but unfortunately, because OBS has a UI from the early 2000s (which is when it was written), it doesn't get enough attention outside of the communities that use it (and enjoy using it a lot). Everyone wants Dockerfiles even though they cannot provide these features (and you cannot get a package manifest of your image without running a package manager inside it, which means you cannot derive vulnerability information from the manifest).
[ Though I'm mostly talking about openSUSE here, I also happen to work for SUSE on the containers team. ]
> However, there are plenty of tools that are compatible with Docker but provide similar benefits.
And Guix is one of them, remember? From the article:
> Add -f docker [to your `guix pack` command] and, instead of a tarball, you get an image in the Docker format that you can pass to docker load on any machine where Docker is installed.
> The above is far and away better than the current standard in the "official" world of Docker, but unfortunately because OBS has a UI from the early 2000s (which is when it was written) it doesn't get enough attention outside of the communities that use it (and enjoy using it a lot).
This is so true! I've mostly moved on from traditional, imperative package managers and associated distros in favor of the functional package management paradigm exemplified by Guix, but I still recommend openSUSE to my friends who prefer a more traditional/mainstream distro because of the love I have for the Open Build Service and Zypper.
The web interface for OBS does feel clunky these days, but it's a wonderful tool not just for improving the reliability and quality of software packages, but distributing them. Zypper is hands-down the most powerful and complete high-level package management tool I've ever used as part of a binary-based GNU+Linux distro. I love that openSUSE provides an instance of OBS that anyone can use for free to build packages for not just openSUSE but a TON of different distros.
I wish more people would explore, take advantage of, and celebrate OBS just like I wish they'd do the same with Nix and Guix!
I think you've reinforced the point they were making. It's pitched as easier, but clear examples of common usage aren't provided. You've provided a response longer than the 9 line Dockerfile, and we still don't know how to replicate it with guix.
I thought the concrete command-line invocations I gave were rather clear and precise.
I use 'guix environment <somepackage>' and 'guix system vm config.scm' every day. I don't need more, because those two solve most of the problems described earlier.
What is it I can provide that would be clearer, more common usage, than the examples I use almost literally as they are here?
And that 9-line Dockerfile references at least one other unknown file, and is part of a bigger program. Docker itself could not reproduce it from the information given in that post. How do you expect me to reproduce something with at least two huge unknowns?
That is why you got a more generic answer for the implementation; but once you have an implementation, you only need the command lines I provided.
That Dockerfile simply runs the commands listed therein in a glorified chroot, and then packages the result. The commands could easily be wget tar.ball && tar xf tar.ball && ./configure --prefix=/bla/bla/docker/ && make -j4 && make install
So, the question is, how to package something with guix, and how to run it.
With docker you run something as docker run [--interactive] [--tty] [--entrypoint=...] <image> [[command] args]
Your libsodium fork example is nice, but we still don't know how to package a simple program.
> it would be great to use something like guix to define all the libsodium+zeromq+czmq+zyre dependencies and be able to spit out an 'ultimate container image'
You define a package for your own project that depends on libsodium/zeromq/etc from GuixSD. Then you export your own package with 'guix pack'. For an example of what a package definition looks like, take a look in gnu/packages in the GuixSD repository, for instance libsodium or Vim.
I did something similar recently to build an Nginx "application bundle". It uses Nix (previously Guix, but Nix worked better for me in the end) to build a squashfs image. You can then run the binary on that filesystem with systemd-nspawn, or as a regular service by setting RootImage=. Some advantages over the Docker approach are that you can easily customise the build (e.g. changing the ./configure flags for Nginx without having to manually perform all the other build steps), and bit-for-bit reproducibility (if you build the same commit six months from now, on a different machine, you will still get the same image out).
> Do you want to convince people that something like guix is better than docker
No, we show that Guix is a tool that gives you a way to work with software environments at a higher level; but at the same time you don't have to give up on application bundles like Docker. You can simply generate Docker images or other forms of applications bundles from that higher-level representation.
You are welcome to take a look at this paper that I co-authored where we explain why we use Guix for a reproducible bioinformatics pipeline, and the rigorous, declarative functional package management approach instead of the imperative approach of Docker files:
It's hard to give you any specific recommendations with so little context, but I will try. For starters, I should point out that you can't really compare Guix directly to Docker. Guix is a package manager, Docker isn't. The article talks about 'guix pack', which makes it possible for Guix to interoperate with non-Guix systems, and one supported system is Docker. You can deploy software with just Guix, too, either on GuixSD or a foreign distro with Guix installed.
Anyway, in your Dockerfile I see that your application uses Python and you do some package management and service management stuff that is mixed together. In Guix, these things are separated. So the first step would be to define a package for your software, and then you would deploy that package. For a real world example of a Python application, here is what the AWS CLI package looks like:
(package
  (name "awscli")
  ;; version, source hash, build system, inputs, etc. elided here
  (source (origin (method url-fetch)
                  (uri (pypi-uri name version))))
  (synopsis "Command line client for AWS")
  (description "AWS CLI provides a unified command line interface to the
Amazon Web Services (AWS) API."))
The package recipe contains all the metadata, build instructions, and dependencies. Now that you have a package, it can be built with Guix and then deployed in a variety of ways. Judging from the Dockerfile, your software is some daemon that listens on port 8080, so:
* You can install the software directly using 'guix package -i your-package-name' and run the notify.py program. Good for trying things out.
* If you are deploying to the Guix system distribution, you could write a service definition so that you can manage the daemon via the init system. The service would take care of creating the notifier user and group, starting the service on boot, etc.
* You could use 'guix pack --format=docker' to export an image suitable for running with 'docker load'
* You could use a different 'guix pack' format (and maybe make it relocatable) for running on some other non-Guix system
I should also add that I don't think the work is fully done yet on handling the entirety of Docker use-cases. It's a work in progress. I can think of a number of things that I want to add to Guix to make this workflow better that I haven't had a chance to hack on yet.
The package here uses the `python-build-system`, which defaults to the latest version of Python, but you can override that by specifying `(arguments '(#:python ,my-python))`, where `my-python` is a variable bound to a package value of the Python variant that you want to use.
You can easily install more than one version of a package as long as you have a package definition for it. You can install different variants (not just different versions) into separate profiles.
Guix is a Scheme library providing lots of variables that are bound to package values. These package values may have links to other packages (that's done with quasiquotation). Together they form a big graph of packages with zero degrees of freedom. Every version of Guix provides a slightly different variant of this package graph. When installing any package you instantiate a subset of this particular graph. Updating or modifying Guix gives you a different graph.
In order to keep things manageable we try to keep the number of variants of any particular package in Guix to a minimum, but you can install older variants by using an older version of Guix; or you can add new variables that are bound to package variants or different versions and install those.
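A toy model of that idea (a hypothetical hashing scheme, not Guix's actual derivation format): a package's store path is derived from its complete definition, so two variants simply occupy different directories instead of conflicting.

```python
import hashlib
import json

def store_path(name, version, deps):
    """Toy functional-package-management model: the output path is a pure
    function of the whole definition (name, version, dependency graph),
    so distinct variants get distinct paths and can coexist."""
    key = json.dumps({"name": name, "version": version,
                      "deps": sorted(deps)}, sort_keys=True)
    digest = hashlib.sha256(key.encode()).hexdigest()[:12]
    return "/gnu/store/%s-%s-%s" % (digest, name, version)

a = store_path("guile", "2.2.3", [])
b = store_path("guile", "2.2.3", ["libgc"])  # different inputs, different graph
assert a != b                                # the two variants coexist
```

Changing anything anywhere in the graph changes the hash, which is why "zero degrees of freedom" holds: a given Guix revision pins exactly one variant of every package.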
it's not really application specific, just stuff like
the actual packages generally aren't important.
The cases where that would become interesting are the ones that require some C library dependencies first, like libpq-dev. In those cases something like guix/nix would be nice because it could pull in the specific external dependencies as well.
It's a feature: you must run tar as root (or equivalent) to restore uids/gids other than the effective process uid. Otherwise you could happily overwrite any host system file, including parts of the OS. It's a restriction shared by all archivers.
How can a normal user create files owned by another user? If tar allowed that, you could write any file with any permission and any ownership anywhere by first crafting a tar file of those files and then extracting them. It'd render the file permissions and ownership system completely moot.
EDIT: To get the effect you want, run tar as root. That's required to ensure you have the permission to override the DAC system, first.
> I've had files that had 666 user:group permissions/owner that I tar into a backup file, then untar, only to find that the file is now 664 with me:me ownership.
It was PEBKAC, not tar's fault (GNU tar, anyway). Tar does store the original owner and permissions. But the ownership of the unpacked files -- do you really expect your process to set ownership of the files to another user? The permissions would also be restored to 666 if you ran tar as root; there are several options whose defaults depend on whether EUID is 0 or not.
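It's easy to check that the metadata really is in the archive, e.g. with Python's `tarfile`:

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    info = tarfile.TarInfo("f.txt")
    info.size = 0
    info.mode = 0o666   # permissions recorded in the header...
    info.uid = 1234     # ...along with numeric owner and group
    info.gid = 5678
    tf.addfile(info)

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:") as tf:
    m = tf.getmember("f.txt")

# The archive faithfully carries mode and owner; whether extraction
# *restores* them is up to the extracting process (and its EUID).
assert (m.mode, m.uid, m.gid) == (0o666, 1234, 5678)
```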
That's a detail of the extraction tool. In umoci (which extracts tar archives as part of an OCI image) you can remap the users or even extract as yourself and then add an xattr which represents the original owner in the archive (which is then read back when creating a new tar archive from the delta of the rootfs).
The tooling already existed because it's part of a stack that goes from build tool to package manager to operating system configuration manager, with all kinds of features for developers floating around along the periphery. It handles all of these things uniformly, reliably, reproducibly, and in a way that deduplicates shared dependencies.
This article is just showcasing a relatively small bit of tooling on top all that which makes it possible to reuse that work to produce containers out of the very same stuff, in a whole range of formats.
`guix pack` and `nix-bundle` are illustrations of how a novel solution (functional package management) to the very problem to which app bundling constitutes utter capitulation (dependency management) can not only retain the virtues the app bundle approach throws away in the hopes of making deployment simple, but even match it in ease of deployment when _none_ of the infrastructure of the package management system is expected to be present on the deployment target.
From where I stand, that's damn impressive.
All of this was achieved without the kind of ‘standardization from above’ that Apple gets to do on its platform. It's true that app bundling could have been a lot simpler if the Linux community lived in a locked box at the mercy of a Vampire King bearing the power to upgrade users' kernels in the dead of night without bothering to ask them, who preempted any diversity or choice in operating system components with a uniform common runtime, and gleefully ripped unseemly APIs out from under developers with every OS release. But instead— thank God!— we have such a wide range of environments under the name ‘Linux’ that I'm ready to agree with you and call it insane. Yet here we see that hackers made it work anyway, without bossing anyone around or compromising on the strengths of proper package management. And that's fucking awesome.
Boy, you sure make fragmentation, constant wheel reinventing, and the necessity of complex tooling to perform simple tasks almost sound like a good thing. I suppose it must be for the small percentage of people who value those things over actually being able to do stuff.
Given the near-complete lack of non-oss software support Linux has, it seems like both developers and users rather prefer uniform common runtimes and a lack of diversity in their operating system components. It's almost like a whole lot of things get much easier if there's some kind of standardization.
> Boy, you sure make fragmentation, constant wheel reinventing, and the necessity of complex tooling to perform simple tasks almost sound like a good thing.
Why, thank you!
Redundancy of efforts in F/OSS is of course a bad thing. It's perhaps even more tragic in free software than in proprietary software, because in free software, developers have fewer formal barriers to drawing upon the work of others. But it's something free software projects can't simply disable by exerting brute control over their users and contributors. The point is that with tech like this, the hackers behind projects like Guix have triumphed in a tougher struggle than NeXT or Apple ever picked. And they've built technology that copes with a wider range of environments, not via ugly hacks on edge cases, but through a thoughtfully designed build system which renders the whole dependency tree of every program it builds transparent, reproducible, and portable. That they had to build a vehicle for such wild and varied terrain is not what I'm celebrating, the cool thing is that they _did_.
> Given the near-complete lack of non-oss software support Linux has, it seems like both developers and users rather prefer uniform common runtimes and a lack of diversity in their operating system components.
Alternatively, when you refuse to distribute source code, compatibility for you involves greater demands on your platform, because you can't leave downstream distributors to recompile and you refuse to allow your more capable users to fix your software's incompatibilities. It's almost like a whole lot of things get easier when you distribute source code with your application.
Regardless, I think there are a lot of factors that together explain the predominance of free software on free operating systems. Proprietary software companies aiming to hit as large a market as possible with a single codebase turning away from perceived fragmentation in the ‘Linux market’ is certainly one of those many factors.
> Alternatively, when you refuse to distribute source code, compatibility for you involves greater demands on your platform, because you can't leave downstream distributors to recompile and you refuse to allow your more capable users to fix your software's incompatibilities.
And yet Windows still manages to run software written for a decade+ old version of it, and users often make compatibility patches for now-unsupported software, all without the source or recompilation. I think a big misstep by the OSS community has been its reliance on the crutch of "you have the source, do it yourself", and that includes making their software even work on a system in the first place. It leads to thinking like "it's ok if we break backwards and forwards compatibility, everyone can just recompile!".