Text files are king! I store every single byte I can in text files. Examples:
- Tabular data : TSV (almost all Un*x/GNU tools handle this out of the box)
- Simple "records" : GNU Recutils format (https://www.gnu.org/software/recutils/)
- Formatted texts : Markdown, LaTeX, ...
If I need some hierarchical kind of information, I use a folder structure to handle this.
I know that not everything can be stored as text. But I try to use open, well documented and future proof formats. Examples:
- Images : PNG
- Music : FLAC, Ogg, ...
- If I really need to preserve orignal format/design of a web page: PDF
Nothing's perfect but stay away from any closed/obscure/proprietary formats.
The PNG spec is an interesting read. There are some overly fancy things in there and some dead-practical ones, like the magic number at the front of the file contains an 8 bit character ("8 bit clean" was still a phrase you could utter back then, today it's assumed), the section headers are plain text (you can run 'strings' on it), and there's even a bit that determines whether the section is mandatory to render the file properly. You can add arbitrary metadata and any other reader can still display the image.
Agreed. I don't know for sure what operating system or device I'll be using in 20 years, but I know it'll be able to read and edit text files. I once used MS OneNote and it's great, but once you leave Windows you basically have to throw it all away, and so in the long run I just wasn't comfortable raising the cost of switching more with every note I created.
And of course interacting with those files using the vast ecosystem of countless simple commandline tools and using the same efficient text editor to edit almost all of my documents makes the whole thing a much better experience than all alternatives - at least when text is viable; imho diagrams etc. are still too cumbersome compared to a quick free-hand sketch.
The irony here is you both talking about "Microsoft and Apple" as if the famous difference in their text file structures, newline conventions, was not there, and that we lived in an alternative reality where there were "portable raw text files" enabled by Microsoft and Apple.
Wow thanks for this recommendation. I've got a few things lying around that I've been using awk/bash for and where even sqlite is overkill but it looks like this solves the same issues in a much better and more concise way. I might try converting these this afternoon. Can't wait to give the csv conversion a try too.
Yes, these things can work. I think Plain TeX would be more likely to work better in fifty years than LaTeX; I use Plain TeX myself. I think TSV is also good.
I think PDF is complicated, though. (However, there are simpler subsets defined which omit some complicated stuff.) (If you really need to store the contents of a page, PNG might do.)
The SQLite version 3 database format is also unlikely to change I think and it is documented. (SQLite is also in the public domain, which also helps. You can avoid WAL and that other stuff if you want to ensure working in future, I suppose.) (If it does change a lot, probably it won't be called version 3, any more, I think.)
Yes, it is probably true. There are less complicated subsets, but using PNG to save the picture of the page might help better (although PNG doesn't used with CMYK or with extra separations). There is also DVI, which TeX uses as output, and it is simple so a program can easily be written to rasterize it or to convert to whatever other format the printer uses.
The thing that people forget about "almost all Unix/GNU tools" is that there is not just one character-separated variable-length-record flat file text table format. There are at least three. And that's just on the Unices and Linux, and not counting ASCII.
Agree to text files. I would throw in Yaml to the mix. I use it preferably for all data files meanwhile that one would otherwise use json for. It‘s slower to parse than json, but it‘s incredibly simple and human readable (and writable), and will stand the test of time.
Nah. Yaml is a total mess with far too many special cases (time handling, many different ways to write booleans). I think it doesn't have any kind of formal spec? It makes sense where you want to optimize for human editing in the short term, but it is by no means a format for the long term.
Does anyone need to keep system logs for fifty years?
Practically speaking, I'm fine with a constraint where the systemd database format + tools need to be roughly kept in sync. I can't think of a realistic example where this wouldn't be the case. Most of the logs in this database are supposed to be ephemeral.
If you're in banking or medicine or something and are required to keep certain logs for a decade+, you should figure out what you actually want to keep and put it in a format that would be reasonable to access on that kind of timeline.
No idea what systemd's implementation is meant to accomplish, but, in general, on a memory-constrained system, a binary log can theoretically speed up system performance by taking up less memory, which leads to more available pages for other purposes.
Does anyone actually look at the systemd log? I ignore it. Rsyslogd handles all of our logging needs in a sensible textual format that does not require special commands to view or useless make-work to pipe into a database.
To answer your question, I thought and still think it was a bad choice. The only time I would ever be interested in the contents are during early-boot failures, exactly the time when the toolset is limited and most folks aren't familiar with what's available - exactly when simple text is easiest to work with without finding another machine to stare at the `journalctl` man page.
The rationalization about detecting record corruption makes very little sense to me. (Now, there is a valid concern about potential log forgery, enabled by poorly written apps that directly log user input without sanitation. But that's better mitigated in the buggy app, which almost certainly is doing other unsafe things with user input. And if that were actually the concern, they had other choices that would have been far less annoying.)
One could better ask how people feel about all of the binary databases that exist in Unix, from the Berkeley DB databases used for the likes of termcap and the system account database on the BSDs, to the binary login database that has been around since the 1970s.
Not parent commenter, just sharing my solution. I use Zim because it already saves a folder hierarchy of markdown TXT files. The base folder is synced to Dropbox, and the markdown is readable/editable enough if you open it on a mobile device.
I'd love to use CherryTree because it supports encryption and is more functional, but it stores everything in a single XML/SQLite file. Neither Zim or CherryTree have mobile apps.
I really tried to use Joplin, which saves markdown too and has mobile apps, but the desktop app is huge. I prefer to use those resources for Keybase.
I still use utilities I wrote in the 1980s. My text editor dates back to 1985 or so. The dmd D compiler back end dates back to 1982 (undergoing continuous improvements since then).
I've had my daily driver for about 30 years now. My stereo is 40 years old (I use it all day every day).
Modern cars are all run by a computer. I bet this will be really hard on anyone in the future who wants to restore one - there's just too much highly specialized technology in them. Most parts for my '72 Dodge can be made by a competent machinist or metalworker if necessary.
I see this in airplanes, too. People resurrect or replicate airplanes right up to the jets. But the jets? Sigh. You can't make a jet engine in a machine shop. So any that survive are static museum pieces, while the WW2 "warbirds" buzz around outside.
I have a bunch of utilities I wrote from the late 80s/early 90s and use them regularly too; many of them are Win32 binaries which haven't been changed since they were first created and used in Win95, and still run on Win10.
Incidentally my car is also nearly 50 years old and doesn't need a computer to run, although it's received some powertrain and suspension upgrades over the years as well as computers in the form of GPS, cameras, and proximity sensors.
I recently finished restoring a refrigerator from the late 1930s. With new insulation and seals it uses less power than a lot of the "smart" fridges today, and doesn't need a computer to function either. It would probably last another 80 years.
No, they're not original equipment but an add-on. Search "parking aid radar distance sensor DIY" at the usual online stores. Really helps with getting a full-sized car squeezed into tight parking spots, and general maneuvering in cramped spaces.
Well, it's also almost impossible to buy/sell a new small personal propeller-driven plane, so one needs to buy used, then retrofit. But you're damn right: if you've got a de Havilland Canada Beaver, you could rebuild the engine 60 miles north of the arctic circle, if you needed to, in a small airplane hanger. But the damn thing hasn't been produced in ~50 years.
> But the jets? Sigh. You can't make a jet engine in a machine shop.
Is this really because of the digitization of things, or because modern jet engines are just that complex on all accounts?
I mean, the act of making the turbine blades only uses some of the most advanced metallurgy techniques known to man. It's not just putting things together the same way you can install a transmission to a chassis or a hard drive to a computer.
As a counterpoint, though - I bet you could make a jet engine in a machine shop. It would be terribly inefficient, loud, unreliable, but there is really nothing stopping you from making something that can produce the thrust necessary for flight (up to an extent). After all, the people in the '40s were blessed with even less resources and information than we have now, and they got it to work.
> because modern jet engines are just that complex on all accounts?
It's indeed not about computers. Jet engines in the 1950s were designed without computers.
I do know of a couple from-scratch replicas of Me-262s that were built. The only differences were:
1. no machine guns
2. modern instruments
3. modern helicopter turbine engines were used
Nobody wanted to fly with the jet engines of the war years. I recall those engines had a life expectancy of 20 hours. (Or maybe it was 2 hours, not sure.) It was mainly the metallurgy that did them in.
I think they strengthened the nose gear, too, as it had an ugly tendency to collapse on landing.
Correlation between age and presence of global variables?
As to cars, I'm pretty sure you're right - although in the future it will be a lot easier to get custom electronics made (PCB assembly is now very cheap compared to even just a few years ago), the software would require reverse engineering the rest of the car i.e. thousands and thousands of man-hours of design. IIRC the first car to use a CAN bus was around 1991
I also worry about computer-controlled cars being too easy to fix - in the sense that if a mechanical part fails the driver is in theory capable of adapting to that to pull over safely but what happens when your DIY electronic power steering algorithm fails?
I started on the D front end in 1999. It was attached to the NorthwestC/Datalight/Zortech/Symantec/DigitalMars backend (for DMD), to gcc (for GDC) and to LLVM (for LDC). The backend was originally written for my C compiler, then my C++ compiler. The source code can be found here:
I don't care for IDEs because they only work on one machine (I develop on several). ME works identically on all of them, and is easy to port.
I keep thinking of transitioning to vim, but never get around to it. The text editor is not the gating factor to my productivity, anyway. Most of my coding sessions consist of simply staring at the code. What helps most is lots of windows open on a BFM (Big Frackin Monitor). I want a wall sized monitor with a retina display.
D itself isn't that old but the backend in one the D compilers (dmd) has evolved from Digital Mars C++ which itself is/was various C++ compilers over time (Written by Walter). This is the source of the (resolved a few years ago) technically-not-open-source issue with DMD, where the main compiler was Boost licensed but the backend copyright was still owned by Symantec even if the source was available. This is no longer the case.
I wrote a simple cgi-bin food ordering app in ~2000. Python and MySQL. The program continues to run, 20 years later (I have moved on to a new job, but I talk to the old team). The python version was updated, the MySQL server was updated, and they did a data change when they moved to a new food caterer.
I think the main attributes of the program that contributed to the longevity were:
- no ORM or other complex library to interact with SQL. Just the basic SQL client.
I work in a company with over 40 years old and is incredible to see some very old software still functioning :), lots of orphaned projects that keep rocking forever.
Some engineers who have been in the company for decades usually say how a lot of our legacy codebase was cutting edge at one point and how a lot of their peers would have written it differently if they knew the software would still be in use decades to come. Safer, proven, simpler constructs with minimum dependencies seems to be the way.
How would you write your code today if you knew it would of been your last commit and still in use in 30 years ?
Just keep bootable OS drive image for every project (set of projects) which builds offline. Make sure to also download platform docs, dependency sources, manuals - git clone, javadocs, ruby ri/rdoc so on. Even keep IDE set up there.
I keep that habit currently by separating work with virtual machines. Storage is cheap and I can come back to my project tomorrow or in 2050 with amd64 emulator. It is also easy to backup or archive it - just rsync the image to NAS or burn it on DVD.
> Just keep bootable OS drive image for every project (set of projects) which builds offline.
Those images don't stay bootable, for different reasons:
* Media changes - Try booting from your tape, or your floppy disk set.
* Unsupported new hardware - Remember how your Linux boot partition needed to be at the beginning of the HDD? Or how Windows setup would not recognize SATA drives unless you added drivers from a special floppy disk?
* It boots, then gets stuck or goes blank on you - display driver issues being a common cause of this.
As an alternative to archiving the build chain, consider documenting a reproducible build from widely used components.
The risk is that the components of the reproducible build may no longer be available 50 years from now. But rolling your own archival is not bulletproof either, for the same reasons that untested backups aren't bulletproof.
> How would you write your code today if you knew it would of been your last commit and still in use in 30 years ?
Generally: minimize dependencies. External library or API dependencies? Versions can drift, the system can change out from under you in incompatible ways. That goes for the OS, too, of course. Data dependency that you aren't 100% in control of? Same. All are forms of state, really. Code of the form "take thing, do thing, return thing, halt" (functional, if you like—describes an awful lot of your standard unixy command line tools) is practically eternal if statically compiled, as long as you can execute the binary. Longer, if the code and compiler are available.
This. It doesn't mean go overboard with NIH, but you have to evaluate and select your dependencies judiciously. It's not about developer productivity with these types of products.
Also, make as much of your program configurable as possible so you can tweak things out in the field. For example, if you have a correlation timeout. Make that configurable. But don't go overboard with that either. :)
Another aspect of this is pick dependencies that are well encapsulated (so if you need to change them or update them it's generally easy to).
Of course, this is just a good choice regardless. Still, it shocks me how often people will choose libraries and frameworks that require very opinionated structure on large swathes of code, rather than having well defined minimal touchpoints.
I definitely relate to this. Just recently I had need of a printer and I was thankful I had stashed an old Canon inkjet printer in the attic. It technically is new enough to have wifi, but it was one of the first of that generation so it's a real hassle to get it connected. I was very happy that it has basically no other "smart" features so it can't phone home or report on my activity or anything like that.
I foresee that in the coming years I too will long for "old" technology and devices that just do what they are advertised to do, unencumbered by data harvesting and loads of hidden "features" that are not really there for my benefit. I can already relate to the farmers wanting old school tractors that just do the job.
I like old cars because of their limited software and extreme modularity. By "old", I mean early 2000s. I currently drive a 2003 VW Jetta. The car has a computer (obviously), but it's been running the same software since it drove off the factory floor, and its reach is limited. It controls the engine timing, power locks and windows, and other basic features, but that's it. No fancy features that break or are obsolete in three years (my car is almost 17 years old, and the only obsolete thing on it is the tape player, which I still use—I doubt the same will be true of today's cars in 17 years).
This model works well, and if something breaks it's trivial to fix. Also, it means non-modal physical controls for absolutely everything. I've driven newer cars, and there's nothing in them that I miss when I go back to my own car.
 A lot of functions are controlled by separate hardware modules. The turn signal relays and clicker, for example, are contained inside the emergency flasher button, and swapping it out takes less than a minute. Same for the windshield wipers (relay box under the steering wheel), radio (standard double-DIN), rear-window heater (relay and timer inside the dashboard button), window and lock controller (part of each window motor), and the headlight switch (switch on the dash that physically switches the light circuits).
 Not even Apple CarPlay, which I see as largely useless, because my windshield-mounted Garmin GPS unit and a magnetic smartphone mount with a Bluetooth-enabled stock radio does everything CarPlay does, but better.
One of my summer intern jobs during college was testing old DOS programs to make sure they would run on Windows. This was for a factory that could no longer deploy DOS so they had to go to Windows for support. Most were fine but some used platforms/languagues like Clipper which had a timing loop calculation on startup and of course, as you might be guessing, that loop ended up being a divide by zero. Enough old DOS games had that problem that people had come up with workarounds like having a batch file that kicked off a program that consumed as much CPU as possible, delaying a short while and then starting the problematic program. Due to the other CPU hogging program slowing things down, the timing calculation on startup would take longer and no more divide by zero. I don't know if they actually shipped our solution in the end... The Clipper programs were all in house but apparently they had lost the source code for many of them.
Very good post but unfortunately, sometimes bad SW lasts a long time too. I was recently called down to our proto line to help out with a 20 year old machine vision app. All the HW was obsolete and the SW was unmaintainable. I know that for sure because I wrote it. The whole manufacturing line was shipped overseas a couple times and returned to us a little while ago but the engineers just kept it running as-is. In this case the cameras, framegrabbers and light sources were all still working, which is impressive.
However, SW is closed source and the licenses can't be renewed, in the end that's what's making it unmaintainable.
which is underlying the LaTeX set of macros and stuff that is still in use for academic papers.
 Also TeX is a good example to bring up as an early example of a complex system that was intended to have freely available source code (before the term "open source" and before GNU).
When he created TeX, Knuth said there was a lack of example source code that could be freely examined by students and others in the field, so he released the code and in fact published it in book form along with the source for Metafont and the code for the Computer Modern fonts.
IBM ACP (Airline Control Program) was originally a S/360 OS optimized for transaction processing that underlies things like the PARS airline reservation system and the card authorization systems at AmEx. It has been under continuous development and evolution since, I believe, 1969. Along the way it became ACP/TPF (Transaction Processing Facility) and then just TPF. It's now called z/TPF and runs on the latest zSeries mainframes.
I suspect, due to it's widespread use in embedded systems of all sorts, there will be things still running MS-DOS 50 years from now.
According to the man page for bsdtar that ships with Ubuntu
A tar command appeared in Seventh Edition Unix, which was released in January, 1979. There have been numerous other implementations, many of which extended the file format. John Gilmore's pdtar public-domain implementation (circa November, 1987) was quite influential, and formed the basis of GNU tar. GNU tar was included as the standard system tar in FreeBSD beginning with FreeBSD 1.0. This is a complete re-implementation based on the libarchive(3) library. It was first released with FreeBSD 5.4 in May, 2005.
Not to mention the dependencies. Most of the "dependencies" of bash scripts are widely used unix tools or other scripts which are generally fairly stable. The python I've been exposed to (I don't know how typical this is) uses random pip modules that are already deprecated and will never be upgraded to python 3, if pip itself even exists and is compatible with packages made today.
To know if something will work in 50 years you have to look at the weakest links and that's not necessarily the language.
That's missing the point. No matter if it's Python 2, 3, 4 or 5. It's so easy to read Python, that you will understand the script without effort. And even if there's no interpreter around you can just port it in a few minutes to whatever language you're using now.
If in the meantime Bash has fallen out of favor, have fun identifying what all these cryptic commands do, without having an interpreter available to check your assumption of what it's doing.
Yes, granted, Bash is more likely to stay, since it has stayed for a long time now. But given a future where there's neither Bash nor Python anymore, I'd prefer porting the Python script.
The people that have yet to learn Python outnumber those that have at least 10 to 1, and hopefully even more. Those people should get to learn a more perfect language. They shouldn't need to know any history about its development for things to make sense and be predictable. It's a little selfish to complain just because you aren't the one reaping most of the benefits.
We are in the very infancy of computer programming. Computers will likely be around for centuries, maybe even thousands of years. The best time to remove historical baggage and get everyone to switch was 10 years ago, the second best is now. IPv6 shows that if you "treat users nicely" you might not live to see the results.
And obviously we benefit too because we get to use a better language.
There are huge number of python programs which will shortly got very difficult to run. Perl still supports v5, in most cases it is easy to run old Java, C, C++ and Haskell code (in my experience). Python is unique is working extremely hard to kill off an older version, going as far as threatening to sue anyone who tries to keep it running.
I don't believe it would take that much work to keep python 2 in maingence mode, and I also think if the PSF asked for a company to officially take over they would easily find one.
Author here. When I made the correction, I took a look at what `ldd /usr/local/bin/hugo` gave me, because `hugo`, the static website generator I use, was packaged as a binary. `ldd` returned `not a dynamic executable`. I was originally under the impression that `grep` was similar to `hugo` in that sense, and `ldd` changed my perception.
Not quite sure what the take-away is here, and whom he is talking to.
Good software is good. Well, uh, yeah. Anything else?
Write software that people actually want?
Well gosh darn, why didn't I think of that!
Also, GNU grep is not antifragile. I learned that when I upgraded my PCRE to a new major version and suddenly grep failed. grep is used by configure scripts, so without a working grep you can't compile a new grep. To me that is the very opposite of antifragility.
> Also, GNU grep is not antifragile. I learned that when I upgraded my PCRE to a new major version and suddenly grep failed. grep is used by configure scripts, so without a working grep you can't compile a new grep. To me that is the very opposite of antifragility.
Bootstrapping is a very common problem with basic tools.
Author here. I’m really curious about your upgrade path for PCRE. Do you have a gist or a runnable available? I don’t think I’ve ever encountered or heard of that happening, would be great to understand intuitively!
A broken or missing keyfob likely won't even impact a vehicle's trade-in value. The vehicle will go to auction, and it will be sold by an independent dealer with one set of keys, or they'll pick up a used keyfob on a secondary market for $50 and re-pair the set.
Software doesn't have to last 50 years, but if you want it to, first build hardware that'll last 50 years, and make it upgradeable. Otherwise you'll be trying to get an emulator to work to fake out the software to think it's still running the same platform. And that's to say nothing of networking and application standards changing, or, say, a lack of mirrors, and tons of runtime dependencies.
Linux-packaged software will survive, because it's based on a system of lots of mirrors. But most new software isn't packaged, it's just available as source modules on one or two sites, or in GitHub releases. Who's mirroring all that?
>I want to build something once, and ideally use it for the rest of my life (or maybe 20-50 years to start off) without having to worry about having to update something or other or risk losing something important down the line.
I can't be willing to build personal stuffs on unstable or proprietary grounds either.
To me it's like building your house on an iceberg or in a prison.
For server side software we already have some relatively safe grounds in that regard,
but for GUI I only know of frameworks with no clean API to abstract away their implementations,
and which come with much dependencies and idiosyncrasies that can be hard to implement efficiently
on top of other libraries. OpenGL is an API, but it's complex (designed for efficient 3D)
and doesn't cover everything (windowing, mouse/keyboard events, basic primitives, text, images, etc.).
Tk is great indeed. I actually started programming with it, and loved it (I don't remember encountering any glitch with it, things just worked as you say). But it has its limits.
I want something more powerful (in a typed programming language capable of efficient data manipulations, rather than a text-based scripting language) and flexible (a basic API with the least constraints, not a toolkit with fixed design and implementation choices), etc.
I've recently started adding TCL/Tk to my toolkit with Ashok Nadkarni's fantastic book. It truly has a ton of functionality across multiple platforms with excellent support for Linux and Windows. SQLite is awesome too. I really like building commands too.
Maybe. I can absolutely the protocol still living ... but perhaps the core code has been rewritten ... and even if it took 30 years and painful Python3 like evolution ... all clients running the version of core rewritten in "Rust '32" or something.
> One thing I know I can't do for my personal projects is constantly dedicate time to working on it after I've “shipped”.
Very appealing, but long-lasting software --- grep, SQL, java --- is maintained long after "shipped".
The thing that makes it easy for you to create better software, also makes it easy for someone else to do the same to you.
They can take your idea, with a slightly different perspective, that makes it better for more people (even if "worse" in terms of what you aimed at). Which is probably what you did to someone else in the first place.
Great article. It explains a lot of the psychological appeal of old school, Unix-y software. That sense that no one can take functionality from you, or from your own ability to extend what already exists.
My oldest code has been in use for about twenty years. Not because it's any good but because it does a job and the organization cannot be bothered to spend the minimal amount of money it would take to replace it with something better. I cringe every time I hear about this code.