I followed the tenets laid out in this article to provision a home router on NixOS... only difference is I used BTRFS snapshots instead of ZFS (which would have been pretty heavyweight for a packet filtering appliance).
It's a pretty great system for such an application. All the details of the specialized, rarely-touched, hard-to-remember moving parts (nftables syntax, dual-stack DNS resolution, RS-232 serial connection parameters, etc.) are neatly collected under /etc/nixos for future me to puzzle out... and under version control, backed up offsite. It would be pretty easy for me to swap out failed hardware or upgrade it.
I wouldn't mind getting more infrastructure set up on these lines, and then maybe figure out a good setup for NixOps.
It wasn't too bad learning a little Nix to keep the configs DRY, modularized and parametrized. I find the results clean & readable, even though I'm hardly a grade-A FP propellerhead.
My main complaint was that nftables rules need to be expressed as dead strings instead of proper objects in the Nix language, which limits their composability. This would be a nice thing for the wish list.
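For context, the stock NixOS option takes the whole ruleset as one multiline string, along these lines (a minimal sketch; the rules themselves are illustrative, not a recommended firewall):

```nix
# networking.nftables.ruleset is a plain string, so rules can only be
# composed by concatenating text, not by manipulating Nix values.
{
  networking.nftables = {
    enable = true;
    ruleset = ''
      table inet filter {
        chain input {
          type filter hook input priority 0;
          ct state established,related accept
          iifname "lo" accept
          drop
        }
      }
    '';
  };
}
```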
Isn't a lot of this wisdom encapsulated in the (old, uncool) configuration management stuff like Chef and Puppet? The main difference being that you still need to rebuild your systems regularly to keep yourself honest (no lazy one-off changes, everything goes into the CM codebase).
I mean, I get that NixOS can pin versions of everything, and we've all been bitten by a new server build on identical CM code failing because of an upstream version change. But it's an eternal problem: pinning all the versions means that you are now micromanaging a zillion different versions of shit that you don't want to care about.
The problem with Chef and Puppet is that you tell it what to _do_ to a machine, but what you care about is what the machine _is_ or _has_. If you install a package on a machine using Chef, and then delete the line to install the package, it won't go ahead and uninstall the package. Even if you want to keep all your machines up-to-date with your configuration, and tell Chef to do an uninstall, uninstalls don't necessarily restore the state of the machine to what it was before the install. Data frequently gets left behind.
Only what has been packaged, but Nix lets you roll your own package if you like, letting you overlay on top of existing expressions to produce your own. Want to rebuild a specific OpenSSL with a specific toolchain and compile-time flags? Easily expressed in Nix, if you don't mind the time to build it all.
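As a rough sketch, an overlay doing that could look like this (the flag and compiler choice are illustrative, not recommendations):

```nix
# Hypothetical overlay: rebuild OpenSSL with a different stdenv and an
# extra configure flag. Everything downstream that depends on openssl
# gets rebuilt against this variant automatically.
self: super: {
  openssl = (super.openssl.override { stdenv = super.clangStdenv; })
    .overrideAttrs (old: {
      configureFlags = (old.configureFlags or [ ]) ++ [ "enable-ktls" ];
    });
}
```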
I'm more familiar with Puppet, but I'd venture to say that it really depends on how well you write your code.
In my previous job where I used it heavily, every module had an enable parameter. If it was true, the module would follow one logic branch (install package, write config, perform action) but if it was false then it would follow another (remove package, delete config, perform action). This was a basic standard, and a new module would be rejected if it wasn't possible to toggle it on/off and configure it from Hiera (hierarchical yaml data). It wasn't perfect, but once the complex stuff was dockerized the puppet codebase shrank a lot. Most of the differences between our VMs were defined in Hiera (edit a yaml, toggle something on, override some default).
> In my previous job where I used it heavily, every module had an enable parameter. If it was true, the module would follow one logic branch (install package, write config, perform action) but if it was false then it would follow another (remove package, delete config, perform action).
This is one area where NixOS shines: The ability to remove things from your configuration with no mental overhead. Even if you take great care to do all the cleanup work in your runbook, something is bound to get missed and you still haven't accounted for the state of the system before the runbook was run the first time.
For NixOS it's easy: "Not in config" = "It's not there". All system configurations are themselves normal Nix expressions that get evaluated into derivations which are then built into immutable paths under /nix/store. Most files in /etc are simply symlinks to files in the Nix store, and the activation script takes care of creating and deleting those symlinks when a new configuration is activated.
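A small sketch of what that looks like in practice (`services.openssh.enable` is a real NixOS option; the service choice is arbitrary):

```nix
# Enabling a service declaratively. Delete this line (or set it to
# false) and rebuild, and the package, the generated config symlinks,
# and the systemd unit all disappear from the active system profile.
{ config, pkgs, ... }:
{
  services.openssh.enable = true;
}
```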
You can be fairly good about it, but there's always state with the potential to be left behind or become inconsistent: transitive dependencies of packages sticking around, packages with pre/post scripts, services writing out data files to unexpected locations on disk you forget to cleanup, a failure somewhere between a resource and a "notify" firing causing your service to not restart after a config file update and the next run not rerunning the notify, etc.
While all of that is correct, I feel like you're leaving the most important part unspoken: Nix (and, by extension, NixOS) prevents this by construction, so that you are no longer dependent on somebody using the tool perfectly in every single place to not have this issue.
It's the difference between "you can use this tool safely" and "the tool makes sure that you will use it safely", basically.
> But it's an eternal problem: pinning all the versions means that you are now micromanaging a zillion different versions of shit that you don't want to care about.
This is one common question people ask when I introduce them to Nix and its ability to pin packages. They are often surprised to hear that you only need to pin one thing, which is the version of the Nixpkgs package collection.
Nixpkgs is the central repo that contains all the Nix expressions for the "official" packages. It contains everything Nix needs to know to build a package. When you rebuild your system configuration, Nix will resolve all the packages using your local checkout of Nixpkgs into store derivations (.drv).
A .drv file uniquely identifies a specific version of a package. When any input to the expression changes (source URLs, dependencies, etc.), the result is a different derivation that will be installed under a different path in /nix/store. In other words: different inputs -> different outputs.
This is where binary distribution comes into play. Nix builds packages in sandboxes with tight isolation (no access to the network or the outside fs). Because the resulting store paths depend on the inputs being the same, for the same store path you will pretty much get the exact same thing no matter where it's built. So Nix will look up the binary cache for a matching object with the same store path. If it fails to find a match (e.g., because you modified the expression), Nix will simply build it. The binary cache can thus be seen as a transparent optimization that gets you binaries that are the same and behave the same as if you had built them locally.
To summarize, when you use NixOS, you already have the source (Nixpkgs) to be able to pin everything. In fact, you cannot rebuild the system without it. If the expressions don't change, the results won't. The only thing you need to do is to actually pin it to a specific commit and keep track of it in your VCS.
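One common pattern for that pin looks roughly like this (the revision in the URL is a placeholder, and the sha256 is deliberately omitted; Nix reports the expected hash on first fetch):

```nix
# default.nix sketch: pin Nixpkgs to one exact commit so every rebuild
# resolves packages against the same set of expressions.
let
  pkgs = import (builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/0000000000000000000000000000000000000000.tar.gz";
  }) { };
in
pkgs.hello
```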
I've configured my own and work systems for a few years using Puppet and Ansible, and I have to say Nix is miles ahead of them both.
- Actually reproducible builds, as opposed to "only if all the external factors involved, like time and third party packages, are identical between builds". One thing this enables is building your system packages in your automated pipeline and just downloading the resulting cache.
- Much more compact than either. For an example, the last state of my own Puppet configuration was five directories with hundreds of files. The current state is 13 files total, only one of which contains the Nix code actually needed to set up dozens of packages and services for a full desktop OS. That file clocks in at 346 lines, a fraction of the size and complexity of the less featureful Puppet setup.
- Nix shell, where you can build a set of packages for use within a single project without having to worry about it borking other projects (like adding things to $PATH).
- Very few shell commands are ever needed.
- Built-in features I haven't seen in either of the other systems: allowing unfree packages either globally or individually, setting up UEFI, LUKS and LVM with a handful of lines, and stopping the compilation immediately if there's a problem rather than continuing by default.
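To illustrate the per-project shell point above, a minimal `shell.nix` sketch (the package choices are arbitrary):

```nix
# shell.nix: running `nix-shell` in this directory puts these tools on
# $PATH for this project only, without touching the system profile or
# other projects.
{ pkgs ? import <nixpkgs> { } }:
pkgs.mkShell {
  packages = [ pkgs.nodejs pkgs.python3 ];
}
```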
The only caveat is the Nix language itself. The whole thing might be more complicated than the Puppet language, but you can see for yourself that the current configuration isn't exactly rocket science.
Can you ever see it used by a large organisation to manage fleets of machines? Can you continually apply a NixOS codebase across thousands of machines, securely? Is the language capable of handling roles, environments etc? Does it support an external database where you can manage variables (bonus if the lookups are environment-aware, so a 'dev' machine in Europe doesn't get 'test' config from the US).
If so, it could genuinely replace any existing CM system. But displacing the existing tech, rewriting the whole CM codebase and getting the team over the learning curve is a monumental undertaking.
I get that Puppet isn't 100% reproducible, but my point (in another part of this thread) is that time marches on and there are only so many things you need to (or want to) 'pin'. In the past, our practice was to leave software versions all 'latest' except for known issues or requirements. Otherwise you're in a constant fight with the security folks over old software versions. If we hit an issue on a Puppet run in our test environment, we put the brakes on the Puppet rollout until it's fixed. Once it gets through a few days in test, we promote it to the next environment, and eventually do a staged release through the production environments.
I don't do a whole lot in this space, but when I do I create base docker images to handle cross-cutting concerns like this. I end up with a little stack of base images depending on which ones need some large or complex install (python, nginx, node, etc), and occasionally you end up with a couple vestigial things to avoid the worst combinatorics, but it mostly seems to work.
You have to get past runbooks to actual scripts to do this, but that's not a bad goal to have. Change history in wikis leaves a lot to be desired, and when you're trying to go from beginner to mastery, the change history can help a lot.
It's better to think of these kinds of systems as Disposable infrastructure instead of Immutable infrastructure. The difference is that you can occasionally, temporarily accrue some state on a machine (e.g., debugging, a hotfix), but you are accustomed to the idea that you can indiscriminately dispose of any system without worrying about state present on it. So after you're done debugging, you just throw away the system and have a replacement built automatically (by autoscaling or similar systems).
Managing NixOS systems is an organizational pain in the ass. Virtually everyone knows Ubuntu. Virtually all vendors provide some sort of standard supported deployment like Docker or Ansible. Now anyone who's affected by a NixOS deployment either has to spend months learning Nix or escalate to a handful of gurus.
I don't think there's anything that stops you from doing that? You can edit the running system the usual way AFAIK. Even better, afterwards you can just reset the box and all your changes are automatically cleaned up.
I'm unclear on the benefit of NixOS vs a container-optimized OS (e.g., CoreOS/Flatcar). These systems have read-only root partitions, no package manager, minimal services. The package manager is simply Docker.