Ask HN: Which Linux filesystem “produces” less wear and tear on SSD NAND?

Linux filesystems usually do a lot of reading and writing on storage, e.g. HDDs and SSDs. On the other hand, SSDs are getting cheaper but offer less TBW per capacity. So which Linux FS does the fewest writes (i.e. has the least write amplification) on an SSD? EXT3, EXT4, XFS, BTRFS, F2FS ...? The Linux OS is for desktop use!

28 points | by programatico 1455 days ago

8 comments

  • rwha 1454 days ago
    I have two laptops with SSDs running only Linux, and they both have mostly been powered on for two years or more (XFS and BTRFS). Both are still operating normally and smartctl shows minimal wear.

    I would focus on mount options that limit writing (e.g., relatime/noatime) or putting ~/.cache on tmpfs.

    In my experience ~/.cache gets the most frequent writes during normal desktop usage. A lot of applications ignore the XDG standards and create their own snowflake folder directly in $HOME. You might want to watch for the ones that write a lot and replace their folders with symlinks to where they belong. (This quickly became a frustrating battle that I lost.)
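
    A rough sketch of those two suggestions, assuming an ext4 root and a 2G cache size (both are placeholders; make the changes permanent in /etc/fstab once you're happy with them):

      # avoid a metadata write on every file read (noatime implies nodiratime)
      sudo mount -o remount,noatime /
      # keep ~/.cache in RAM - contents are lost on reboot
      sudo mount -t tmpfs -o size=2G,mode=700,uid=$(id -u),gid=$(id -g) tmpfs ~/.cache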

    • aruggirello 1454 days ago
      Totally agree about using the relatime/noatime mount options and using tmpfs wherever it's sensible. But if you want to take control of your $HOME, it probably makes more sense to mount it separately. Putting ~/.cache on tmpfs means losing thousands of useful cache files on every reboot (hurting performance and also causing more reads, e.g. of the larger original files), and it can consume gigabytes of memory; IMHO it should be avoided. Getting paranoid about disk wear on a two-year-old laptop is perhaps a bit exaggerated.

      Talking about desktop Linux: I, my family, and my workplace colleagues have all run Kubuntu systems on SSDs continuously since 2012 - that's 8 years of continuous disk wear and counting. We're talking about 3 laptops and 5 desktops, several of them mdadm RAID0 (on 2-3 disks), all Ext4, and NO swap partitions (warning: this may lead to occasional crashes due to OOM situations, though that will hopefully improve soon). Three of these systems are heavy-usage workstations, 5 of them run one or more VMs, and all of them have been backed up to external USB2/USB3 disks via Timeshift [0] since 2012 (VM disk images are backed up separately). A few critical directories are shared via the cloud, which thus also acts as a backup tool. All disks are periodically health-checked via smartmontools.

      In my experience this maximizes performance (thanks to Ext4 and RAID0) while keeping stuff safe (thanks to backups). IMHO desktop systems don't need the kind of online redundancy provided by other RAID levels, and restoring a full system (including grub, on an mdadm system) from a Timeshift snapshot - something I've already done multiple times - has always been a breeze.

      Among the ~10 SSD corruption events I witnessed in this decade, I could track ~half down to a failing PSU or a bad SATA cable, though I had a couple of disks abruptly die too (one NVMe M.2 drive probably had thermal issues). Still, IMHO none was caused by wear.

      relevant smartctl output for one of my desktop SSDs:

        9 Power_On_Hours_and_Msec 0x0032   100   100   000    Old_age   Always       -       12726h
        241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       623690
        249 NAND_Writes_1GiB        0x0032   100   100   000    Old_age   Always       -       23897
      
      So, ~24 TB written so far...
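
      (For reference, you can pull the same attributes on your own drive with something like the following - assuming smartmontools is installed and the disk is /dev/sda; attribute names vary by vendor.)

        # print SMART attributes and keep the write/wear related ones
        sudo smartctl -A /dev/sda | grep -Ei 'power_on_hours|writes|wear'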

      [0] https://teejeetech.in/

      • Piskvorrr 1454 days ago
        If they're always-on, reboots are rare... I keep a similar workstation on ext4 over RAID-1 (once bitten by bad disks, twice shy; I seem to have been hit by an abnormal number of submarine drives back in the day).
    • znpy 1454 days ago
      Funny story: almost three years ago I got a second-hand ThinkPad T440. It came with a 240GB SSD. A second-hand laptop with a used SSD inside.

      At the time I didn't have much money, so I just kept the drive and decided to deal with the issue if and when it came up.

      I still use it, of course (it's my personal laptop).

      A few days ago I switched to root and noticed a dead.letter file in root's home directory.

      I examined the file and it was an email dropped into the home directory by smartd, alerting me that my hard drive was near failure.

      The funny thing is that the email was from February 2018. More than two years after that email, the drive is still working.

      • interfixus 1454 days ago
        Well, well, I guess we are a movement then. I got a second-hand T440p three years ago, with a 240GB SSD. This machine is indestructible - it had already seen heavy use in an engineering company, and I really haven't spared it either. Running various permutations of Arch and Xfce all the way through. Last month, I finally did upgrade the SSD to a brand new Samsung EVO 1TB. The old unit (also a Samsung EVO, as it turned out) is doing just fine, though, now running in a non-critical convenience setup.

        I can't really sing the praises of this ancient ThinkPad enough, now also fitted with a new, very cheap, ridiculously superb display and stuffed full of RAM. I have a newer, sleeker, faster Lenovo Something sitting around, but somehow the 440 is what I always end up using.

        • copperx 1453 days ago
          What is the resolution of the display?
          • interfixus 1452 days ago
            1920x1080, as high as the T440p will go. At around USD 70 on eBay, I am blown backwards by the quality.
      • Piskvorrr 1454 days ago
        In my experience, SMART alerts you that "the drive you can't access has probably failed"; rarely do you get an actual early warning.
  • tyingq 1454 days ago
    I think you would get more mileage out of tracking where all the writes are, and making whatever changes are needed to reduce them.

    Auditd can, for example, track every write. Track writes over a good sample period of typical use, then make whatever changes are needed. That might be database tuning, moving specific files to tmpfs, changing the way you do backups, reducing writes to syslog, changing fs mount options, etc.

    Auditd is a little complex, but it's fairly easy to find write-ups on how to monitor writes and generate usage reports.
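
    A rough sketch of that approach (the watched path and key name are just examples, and auditd has to be installed and running):

      # tag every write under /home with a searchable key
      sudo auditctl -w /home -p w -k home-writes
      # after a representative sample period, summarize the offending files
      sudo ausearch -k home-writes --raw | aureport -f --summary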

    • zzzcpan 1454 days ago
      You can just use the find command for tracking writes, for example:

        find / -type f -printf '%TY-%Tm-%Td %TT %p\n' | sort
      
      (you probably want to add -mount in there too, and run find separately for each mount point - see the sketch below)
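
      Something along these lines, for example (the mount points listed are placeholders):

        # -mount keeps find from crossing into other filesystems
        for m in / /home /var; do
          find "$m" -mount -type f -printf '%TY-%Tm-%Td %TT %p\n'
        done | sort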
      • tyingq 1454 days ago
        Auditd would show frequency, the process name and uid that did the write, and when, etc. Last modified time isn't always enough.
  • moviuro 1455 days ago
    Rule of thumb: your preferred FS will be OK. Limiting writes is not a goal in and of itself.

    See https://wiki.archlinux.org/index.php/Improving_performance#R...

    • Piskvorrr 1454 days ago
      True for SSDs; if you're using COTS SD cards (think RPi), it might still be a concern.

      I managed to get plenty of useful life from an old, failing SD card by only loading (R/O) the bootloader, kernel and initramfs from it, and never touching it beyond the initramfs (a USB drive is mounted as /). Similarly for an ancient PATA (!) SSD in a netbook: grub+kernel+initramfs, and nothing else; that saves me from having a USB stick hanging out the side.
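
      Roughly this kind of layout, sketched below (device names and bootloader details are assumptions):

        # SD card / old SSD holds only /boot, mounted read-only:
        #   /etc/fstab:      /dev/mmcblk0p1  /boot  vfat  ro,noatime  0  2
        # root lives on the USB drive, chosen on the kernel command line:
        #   root=/dev/sda1 rootwait
        # remount read-write only when updating the kernel/initramfs:
        sudo mount -o remount,rw /boot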

  • loser777 1455 days ago
    Is there likely to be any meaningful difference? With most worthwhile SSDs incorporating a sizable DRAM cache and OS file system caching on top, would day-to-day journaling and other overhead be expected to make a dent in SSD longevity?
  • cmurf 1454 days ago
    Ext3, ext4, and XFS have a journal that's constantly being overwritten. On Btrfs, the file system is the journal. It does have a bit of a wandering trees problem [1], whereas F2FS expressly aims to reduce the wandering tree problem [2]. There are also different approaches that don't involve filesystems at all [3].

    But I think you have to assess the crash resistance and repairability of filesystems, not just worry about write amplification. I think too much is made of SSD wear. The exception is the consumer class of SD card and USB flash: those are junk to depend on for persistent usage, best suited for occasional use, and all eventually fail. If you're using such flash, e.g. in an embedded device, you probably want to go with industrial-quality flash to substantially improve reliability.

    Consider putting swap on zram [4] or using zswap [5]. I've used both, typically with a small pool of less than 1/2 of RAM. I have no metric for clearly deciding a winner; either is an improvement over conventional swap. Hypothetically zswap should perhaps be better, because it's explicitly designed for this use case, whereas zram is a compressed RAM disk on which you could put anything, including swap. But in practice, I can't tell a difference performance-wise.
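
    For reference, a rough sketch of the two options (the parameter values are only examples, and the zram side is easiest to set up with zram-generator [4]):

      # zswap: compressed cache in front of a normal swap device, via kernel parameters
      #   zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=20
      grep -r . /sys/module/zswap/parameters/   # confirm the running zswap settings
      # zram: a compressed RAM block device that can itself be used as swap
      zramctl                                   # list zram devices, sizes, compression
      swapon --show                             # see which swap backends are active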

    [1] https://arxiv.org/abs/1707.08514

    [2] https://lwn.net/Articles/520829/

    [3] https://www.usenix.org/conference/fast13/technical-sessions/...

    [4] https://www.kernel.org/doc/Documentation/blockdev/zram.txt https://github.com/systemd/zram-generator

    [5] https://www.kernel.org/doc/Documentation/vm/zswap.txt

  • kasabali 1444 days ago
    Have a look at this paper [0]

    While there are big differences in write amplification for metadata writes, on macro benchmarks all filesystems show similar results.

    btrfs has the biggest WAF (write amplification factor), but you can enable compression globally, and I suspect that difference alone will make it come out ahead of the others.
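
    For example (the filesystem and compression level are placeholders; zstd compression needs a reasonably recent kernel):

      # quick test on a mounted btrfs filesystem - affects newly written data only
      sudo mount -o remount,compress=zstd:3 /
      # make it permanent in /etc/fstab, e.g.:
      #   UUID=...  /  btrfs  defaults,noatime,compress=zstd:3  0  0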

    [0] Analyzing IO Amplification in Linux File Systems https://arxiv.org/abs/1707.08514
