I'm talking specifically about CLI tools, webservers, and other tools distributed as static binaries or via package managers.
What's the reasoning for so many package maintainers optimizing for <5mb binaries at the expense of usability?
It seems like when >90% of hosts are running on 2008+ hardware with SSDs or even moderately fast HDDs, loading time and storage space are not major issues below the 30~50mb mark.
A recent example from HN: https://raphlinus.github.io/rust/2019/08/21/rust-bloat.html
Developers who are specifically targeting systems with limited memory will try to produce small binaries. If you're talking about a distribution like Ubuntu, though, it's simply not a concern. At all. Applications are built to do whatever it is that they need to do, and whatever size they end up being is how big they are, almost without exception.
The reason CLI binaries are small is that ALL binaries are small. They are compiled, and any resources they need are stored externally in other files. They use shared libraries, making the code even smaller through re-use.
I have 1785 programs in /usr/bin on my Ubuntu server, and all but 10 of them are under 5M. The ones that are larger are only big for unusual reasons (e.g. mysql_embedded).
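If you want to reproduce that survey on your own machine, a couple of shell one-liners will do it (GNU findutils/coreutils assumed):

    # Total number of entries in /usr/bin
    ls /usr/bin | wc -l
    # List anything over 5 MB, largest first
    find /usr/bin -maxdepth 1 -type f -size +5M -exec du -h {} + | sort -rh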
I'm not sure what you're referring to when you talk about usability. Are you saying that my 1.1MB nginx lacks some utility? And that it lacks it because someone was worried about the size of the binary? That's simply false to the point of being nonsensical.
One of the biggest binaries I use is the Postgres server, at a hefty 6.5MB. Is Postgres missing features that affect its usability?
I was the product manager, and we did have complaints about the size. So it was removed:
https://mysqlserverteam.com/mysql-8-0-retiring-support-for-l...
I don't believe there have been any regrets.
Postfix lacks OpenDMARC and OpenDKIM out of the box, which are recommended for almost all non-satellite mail server installs these days. Granted they're not native Postfix modules, but they're so important that I wish it had them out-of-the-box.
Same is true for caddy, certbot, and several other tools I use often.
It is often necessary to build custom packages if you want functionality which isn't core to the original software (perl and lua aren't necessary to run nginx core). If you work with the package maintainer, you may find a way to have the distro build the extra functionality as optional packages.
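To make "build custom packages" concrete, here's a rough sketch of enabling the perl module when compiling nginx from source (the version number is just an example; --with-http_perl_module is a real configure flag):

    wget https://nginx.org/download/nginx-1.16.1.tar.gz
    tar xzf nginx-1.16.1.tar.gz && cd nginx-1.16.1
    # Build the perl module as a loadable .so instead of baking it in
    ./configure --with-http_perl_module=dynamic
    make && sudo make install
    # Then enable it in nginx.conf with:
    #   load_module modules/ngx_http_perl_module.so;

Of course, the moment you do this, apt no longer knows about your nginx, which is exactly the maintenance problem people complain about.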
But that's the crux of my argument: most of the time I don't want just the core, I want to be able to do almost everything, and I'm willing to pay the binary-size price to have an "apt install nginx-everything" available.
Users with limited system resources could always "apt install nginx-minimal".
(If you can show me a distro that includes all the modules for nginx, caddy, postfix, certbot, etc. by default, I'll switch in a heartbeat!)
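(The closest thing I've found is Debian/Ubuntu's tiered nginx packaging, though nothing similar exists for the others as far as I know; package names from memory, so double-check:

    apt install nginx-light    # minimal module set
    apt install nginx-full     # the common modules
    apt install nginx-extras   # nearly everything, including perl and lua

)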
(1) I'm not sure that it is at the expense of usability, so I want to question your premise.
(2) Size on disk is a proxy for many other kinds of bloat. That article specifically mentions one in the opening: compile time. Surface area of the codebase is another good example, where people worry that the amount of library code they're pulling in means a higher probability of library bugs and a bigger maintenance burden.
(3) Bandwidth still has a cost.
I'd argue that lacking a way to do that is a significant hit to usability, and it requires recompiling with the nginx_http_perl_module or lua module to get that feature.
E.g. if you use protobufs for C++, you probably don't want to pull in all of GHC just so protoc can support Haskell. You'd rather use that space for ccache or something.
Another concern is embedded platforms. If Rust can't produce meaningful code that's under a megabyte, then it can't be used to write elevator controllers or microwave oven displays or to program pacemakers -- and as regular computers get more RAM to try to make up for the end of Moore's law, a larger and larger share of the cases where a compiled systems language makes sense over a JITed scripting language will be embedded ones. To some people, if a compiler can't produce 1k binaries, it's worthless, because it forces them to spec a beefier machine.
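For reference, here's roughly what chasing small binaries looks like with today's Rust toolchain; these are standard cargo/rustc options, and exact sizes vary a lot by platform:

    cargo new hello && cd hello
    cat >> Cargo.toml <<'EOF'
    [profile.release]
    opt-level = "z"   # optimize for size, not speed
    lto = true
    panic = "abort"   # drop the unwinding machinery
    EOF
    cargo build --release
    strip target/release/hello
    ls -lh target/release/hello   # a few hundred KB with std; no_std can go far smaller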
There's also general performance and security concerns: any unnecessary code adds to both vulnerability surface & footprint. If you work your system really hard & you're always a meg or two short of running out of RAM, then saving a meg or two of code size matters a whole lot. (This applies less to desktops & more to applications where you might process lots of data on off-the-shelf hardware: if you're clever & take advantage of parallelism, you can process many TB of data on a machine with less than a gig of ram in a relatively short time, and you can cut it down further if you can run more copies of your application, which you can do better if your memory footprint is smaller.)
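As a sketch of that pipeline shape in shell, with ./process_chunk standing in for whatever your actual worker is: each worker streams one file at a time, so resident memory stays bounded by chunk size rather than total data size.

    # Fan files out to one worker per core
    find /data -type f -name '*.log' -print0 \
      | xargs -0 -n1 -P "$(nproc)" ./process_chunk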
Finally, I don't think it's safe to assume that most machines are less than eleven years old. Ever since Moore's Law ended, the case for upgrading hardware has been a lot weaker, and even when it was running strong, folks often went a decade without doing so.
Bloat doesn't necessarily translate into usability, and usability doesn't necessarily transfer between users. There are a lot of folks for whom a 5mb app is, necessarily, unusable.
I wholeheartedly agree with this. If there is unnecessary code, there's no reason to keep it.
I'm more directing my gripes towards packages that have highly desirable, well-audited add-ons / modules that package maintainers choose not to include in the default distributions (under the reasoning that "users want small binaries").
Take for example nginx, caddy, certbot, or postfix (there are many others too).
All of these require recompiling from source with build flags to enable the inclusion of even their most common add-on modules, e.g. nginx_http_lua_module, nginx_http_perl_module, caddy:http.cache, certbot dns plugins, etc.
Recompiling from source breaks the ability to use a package manager for install and automated updates, which drastically reduces usability for the majority of users. There are ways around this of course, but for the average user, having to compile a package from source is a major hurdle.
Instead, why not distribute the binary with no add-ons as "apt install packagename-minimal" for the users with bad internet / low-resource requirements, and make the default "apt install packagename" distribution the "batteries included" version?
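Certbot shows the split is already expressible with ordinary package metadata: on Ubuntu the plugins ship as separate opt-in packages (names from memory, so double-check):

    apt install certbot                          # core only
    apt install python3-certbot-nginx            # nginx integration
    apt install python3-certbot-dns-cloudflare   # one of the DNS plugins

All it would take is flipping the default, so the plugins come along unless you explicitly ask for minimal.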
(If you're interested, here's an old blog post of mine that goes into detail on why I think package manager distributions are worth the effort to maintain in general, even for static binaries or packages with dylibs: https://docs.sweeting.me/s/against-curl-sh)
What about caddy, which doesn't offer module support at all in the package manager version?
Certbot and postfix also don't include their most commonly used modules out of the box.
That has no effect on how long it takes to download, but it helps with:
- Fewer dependencies to follow (see the ldd one-liner after this list)
- Maybe even a benefit from the instruction cache
- Shorter compile times
- People don't laugh at you when you need half a gigabyte of binaries to start an RDS instance and create a user in it. (wtf hashicorp?)
- Smaller attack surface
- Some projects now target WASM as a secondary target, and then suddenly it's not just native code, it's the web too
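On the dependency point, it's easy to see how deep the chain already goes for any binary on your system:

    # Count the shared libraries a binary pulls in at load time
    ldd /usr/sbin/nginx | wc -l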
Folks are still stuck at 24.4k, over noisy radio links or unmaintained copper lines.
My original post was borne out of frustration with nginx not having nginx_http_perl_module out of the box. If the module is never loaded in the config file, it never has to be loaded into memory at runtime, so including it wouldn't cost anything for users who don't want it.
I've personally run gentoo on a 4.2GHz i3-530 as well as a 4.5GHz FX-4350. From kernel to firefox, the only performance I ever gained came from going from -O2 to -Ofast. I don't expect considerable cache misses on a modern CPU.
https://news.ycombinator.com/item?id=20761449
Not that it matters with the low number involved.
Then refer people to "apt install packagename-minimal" if they only want the core.
It's made me so much better at bash, and I've never had an issue where I wished it had features that it doesn't (which is what my original post was mostly talking about), thank you so much for all your hard work on it!
https://github.com/jwilm/alacritty/releases/download/v0.3.3/...
If accepting a feature that causes a 3x compile-time regression were easy, then that 29s build is just 5 easy decisions away from a 2-hour build. And now nobody cares about adding more time to the builds, since they run nightly, and developers commit hoping the build doesn't fail. This is sadly all too familiar.
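The arithmetic checks out, for what it's worth: five compounding 3x regressions on a 29-second build is

    echo $((29 * 3**5))   # 7047 seconds, just under two hours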
> Good programmers write good code. Great programmers write no code. Zen programmers delete code.
https://www.quora.com/What-are-some-things-that-only-someone-...
If you ever have an automated process built around your tool, eventually something will run it lots of times. The more it's downloaded, copied, and run, the more the "bigness" affects performance. It's best to choose smaller whenever it doesn't take away features that you need.
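If you'd rather measure that per-invocation cost than guess at it, a benchmarking tool like hyperfine makes the comparison easy (the binary names here are placeholders):

    # Compare startup cost of two equivalent tools
    hyperfine --warmup 3 './small-tool --version' './big-tool --version'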
Here's an analogy using your question:
Why do aircraft builders optimize for weight?
Just to be clear, I'm not talking about cargo planes or helicopters, those are separate conversations.
I'm talking specifically about passenger airliners, fighter jets, and other aircraft that are built in large quantities.
What's the reasoning for so many aircraft builders optimizing for weights <50,000 kg at the expense of usability?
How does a smaller binary make things more efficient?
Even if you have ample disk space and memory, smaller binaries can be a performance advantage. A smaller binary with fewer instructions fits more easily into the CPU caches. Binary size can be an indicator of performance.
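If you want to test that claim on a real workload rather than argue about it, perf can show instruction and cache behavior directly (event names vary by CPU; these generic ones are widely available):

    # Instructions retired and cache misses for one run of ./app
    perf stat -e instructions,cache-misses,L1-icache-load-misses ./app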
There are several reasons that I like small packages, both when making them and when consuming them:
- Easier to maintain. One package does a single thing, so it's easy to reason about and compose with other packages. This is especially true for utility packages. It's easier to document a single thing, to debug a single piece of functionality, etc.
- Faster installs. Under some circumstances it takes several minutes for me to install larger packages/projects. Not everyone using tech tools lives in a world with fast internet. Sure, this amounts to ~30 min/week max, but I'd prefer to spend my time differently.
- You say 30-50mb, but my typical React project is 80-100 MB in node_modules. As an example, my current laptop is ~4 months old, and its "projects" folder already has 670k+ files and weighs 6+ GB (see the one-liners after this list).
- Copying projects. While you talk about binaries, in JS it's usually either a single minified file of decent size, or hundreds/thousands of dependencies. And while the size itself doesn't matter so much, the number of files matters for things like backups, searching in files, etc.
- Signaling. People who care about this normally won't throw a lot of dependencies on top of a package if it can be easily avoided. So you know there normally aren't many surprises, or Guy Fieri images: https://medium.com/s/silicon-satire/i-peeked-into-my-node-mo...
- Marketing. Many people care about it for these or other reasons. Everything else being equal, a smaller package is better, so there's no disadvantage if you can easily shed some of the library size. I don't care that much, especially because of what you say, but some people seem to.
- Tooling is easy! Rollup, webpack, uglify, etc. There are many easy (okay, not Webpack) tools to bundle and minify a project, which front-end JS needs anyway.
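As promised above, checking this on your own project takes two commands:

    du -sh node_modules                # total size on disk
    find node_modules -type f | wc -l  # file count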
Now, I wouldn't optimize at the expense of usability or dev time. For instance, the two features I removed from `drive-db` were removed because they were half-baked. File cache ~> replaced by an in-memory cache, which also allows for browser and Cloudflare Worker usage (the main point of the refactor). MongoDB-like queries? ~> JS today is good enough not to need those for a small spreadsheet. They didn't work the same as MongoDB, and not even I, the library author, ever used them.
I think I should've added more detail in my original post explaining that I'm very much pro-removing code, my gripes were directed at packages that have optional modules that aren't included by default because maintainers claim "users want small binaries".
I'm of the opinion that most packages should include their optional add-ons in their default distributions, and offer a separate "minimal" version without any add-ons for the users with low-resource requirements.
(See my response to the other comment above too)
But what about things where it is purely an option? There's a slippery slope there, IMHO. Database connection? For a server library, no way. Key-value store? Also no way... no, wait, those are used for sessions, so now you have to either ship no sessions by default, ship in-memory sessions, which are super tricky (because they "work" until there are tricky production issues), or add a Redis or similar connection. Same for rendering templates: I added the 3-4 most common engines but not the others, and it feels really "meh". It's also A LOT of work to add even the top 3-4 options, and that's before documentation.
I think Django supports db-backed/cache-backed/file-backed sessions out of the box, while Flask only ships signed-cookie sessions natively; server-side sessions need additional libraries like Flask-Session. But I could be wrong.
> Imagine if the apple you were eating for breakfast had 291 ingredients, or if the car you drove to work had 291 parts. You’d be worried, wouldn’t you?
This is naive at best, or just flat-out wrong, and it leads me to question every other proposal in the article. For context, a quick Google search says a regular car has roughly 30,000 parts. I won't guess how many distinct biological components an apple has, but I'd hazard it's way up there as well.