The File Filesystem (2021)

(mgree.github.io)

346 points | by wegwerff 21 days ago

23 comments

  • PMunch 20 days ago
    Oh this is cool! I recently wrapped libfuse in Nim, and after porting the 'hello' filesystem example I made one which is more or less exactly this. However, in my version you pipe data in and have to provide a mountpoint; then, when it's done, it writes the result to stdout. That means you can inline it in a pipe chain, but also that you have to make sure to grab the output.

    At the moment I'm exploring other stuff which could be made into file systems. I've got a statusbar thing for the Nimdow window manager which allows you to write contents to individual files and it creates a bar with blocks on them as the output. It makes it super easy to swap out what is on your bar which is pretty neat.

    Another tool I've made is a music player. It uses libvlc, and when given a folder it reads all the media with ID3 tags and sets up folders like 'by-artist', 'by-album', etc. Each file is named '<track number> - <song title>' and contains the full path to the actual file. To play a song you cat one of these files into 'control/current' and write the word play to 'control/command'. There's a bit more to it than that, like a playlist feature and some more commands, but that's the basic idea. The goal is to have a super-scriptable music player.
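The layout described above can be sketched as a pure mapping from virtual paths to file contents. This is a Python sketch, not the Nim/libfuse code: the `TRACKS` records are hypothetical stand-ins for parsed ID3 tags, and a plain dict stands in for what FUSE's readdir/read callbacks would serve.

```python
# Hypothetical metadata; a real player would read these fields from ID3 tags.
TRACKS = [
    {"path": "/music/a/01.mp3", "artist": "Artist A", "album": "First", "track": 1, "title": "Intro"},
    {"path": "/music/a/02.mp3", "artist": "Artist A", "album": "First", "track": 2, "title": "Song"},
    {"path": "/music/b/01.mp3", "artist": "Artist B", "album": "Other", "track": 1, "title": "Opening"},
]


def build_views(tracks):
    """Map virtual paths such as 'by-artist/<artist>/<NN> - <title>' to contents.

    Each virtual file's content is the full path of the real media file.
    """
    tree = {}
    for t in tracks:
        name = f"{t['track']:02d} - {t['title']}"
        tree[f"by-artist/{t['artist']}/{name}"] = t["path"]
        tree[f"by-album/{t['album']}/{name}"] = t["path"]
    return tree


def listdir(tree, directory):
    """List the immediate children of a virtual directory."""
    prefix = directory.rstrip("/") + "/"
    return sorted({p[len(prefix):].split("/", 1)[0] for p in tree if p.startswith(prefix)})


print(listdir(build_views(TRACKS), "by-artist"))  # ['Artist A', 'Artist B']
```

Name collisions (two albums by one artist with the same track number and title) and titles containing '/' are not handled in this sketch.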

    • pedrovhb 20 days ago
      Here's an idea: recursively mount code files/projects. Use something like tree-sitter to extract class and function definitions and make each into a "file" within the directory representing the actual file. Need to get an idea for how a codebase is structured? Just `tree` it :)

      Getting deeper into the rabbit hole, maybe imports could be resolved into symlinks and such. Plenty of interesting possibilities!
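A rough sketch of the outline step, assuming Python's built-in `ast` module as a substitute for tree-sitter (so it only parses Python). `outline` and `render` are hypothetical helpers; a FUSE layer would turn each nested dict into a directory.

```python
import ast


def outline(source):
    """Return {name: nested outline} for classes and functions in Python source.

    Only definitions that are direct children of a module, class, or function
    body are found; defs nested inside if/try/with blocks are skipped.
    """
    def walk(node):
        found = {}
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                found[child.name] = walk(child)
        return found

    return walk(ast.parse(source))


def render(tree, depth=0):
    """Flatten an outline into indented lines, similar to `tree` output."""
    lines = []
    for name, children in tree.items():
        lines.append("    " * depth + name)
        lines.extend(render(children, depth + 1))
    return lines


src = "class Player:\n    def play(self):\n        pass\n\ndef main():\n    pass\n"
print("\n".join(render(outline(src))))
```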

    • fishyjoe 20 days ago
      Would you mind sharing the Nim code? I've been interested in working with FUSE for a while, and use Nim for a few projects.

      No worries if not, I'm just curious!

      • JNRowe 19 days ago
        Not PMunch, but bindings¹ and statusbar².

        Nimble has a couple of fuse projects and wrappers registered too.

        ¹ https://github.com/PMunch/libfuse-nim

        ² https://github.com/PMunch/statusbar

        • PMunch 19 days ago
          The audio player is unfortunately not on GitHub yet, I've still got a few kinks to work out before it's in a shareable state. The statusbar project was also shared mostly so the other Nimdow users could play around with it, so the code quality is quite sub-par.
      • PMunch 19 days ago
        JNRowe beat me to it! Feel free to browse the rest of my GitHub, it's mostly Nim code. And if you want to talk Nim or FUSE you can join the Nim Discord server (or the Matrix or IRC bridge) or post on the Nim forum.
    • lambdaxyzw 20 days ago
      This makes me think: it would be nice if there was an easy built-in way to expose information about a process using the filesystem. Something like "cat /proc/$pid/fs/current_track" to get the name of the current song from a music player, or "ls /proc/$pid/fs/tabs" to list open tabs in my browser (and maybe use this to grab the HTML or embedded images).

      I mean right now it's possible to do this using FUSE, but that's convoluted and nobody does it.

      • PMunch 19 days ago
        Can you actually write to the /proc directory? Working with libfuse I've had much of the same ideas, basically allow programs to expose information as files. The beauty of the fuse system is also that you only have to respond to requests when they happen. So you don't have to actually create all these files until someone asks for them. Another idea I've had is to expose the configuration of a program through a filesystem. Instead of having a config file and a refresh command or a complicated IPC you could simply write to files to change the config.
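The "only respond when asked" point can be sketched without FUSE at all: a table of per-path read and write callbacks, where contents are computed at read time. `LazyTree`, `expose`, and the config example are hypothetical names; in a real libfuse filesystem these would sit behind the read and write handlers.

```python
from typing import Callable, Dict, Optional


class LazyTree:
    """Virtual files whose contents are produced only when read."""

    def __init__(self) -> None:
        self._readers: Dict[str, Callable[[], str]] = {}
        self._writers: Dict[str, Callable[[str], None]] = {}

    def expose(self, path: str, reader: Callable[[], str],
               writer: Optional[Callable[[str], None]] = None) -> None:
        self._readers[path] = reader
        if writer is not None:
            self._writers[path] = writer

    def read(self, path: str) -> str:
        # Nothing is generated until this call.
        return self._readers[path]()

    def write(self, path: str, data: str) -> None:
        # Raises KeyError for read-only paths (no writer registered).
        self._writers[path](data)


# Configuration exposed as a file: writing to it changes the live setting.
config = {"volume": "50"}
tree = LazyTree()
tree.expose("config/volume",
            lambda: config["volume"],
            lambda text: config.update(volume=text.strip()))
tree.write("config/volume", "80\n")
print(tree.read("config/volume"))  # 80
```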
  • RetroTechie 21 days ago
    Useful enough that it should be an OS-level standard feature, imho.

    Unix-like OSes allow mounting disk images to explore their contents. But there are many more file formats where exploring files-inside-files is useful. Compressed archives, for one. Some file managers support those, but (imho) the application level is not the optimal layer to put this functionality.

    Could be implemented with a kind of driver-per-filetype.

    • duped 21 days ago
      Really what you'd like to see is a way to write the mount command for each file type (do one thing well) and another command to detect the file type and dispatch accordingly (probably similar to the `file` command), all in user space.

      The only thing standing in the way of this today is that MacOS doesn't expose a user space file system API. You can do this on Linux, Windows, and BSDs today.

      (No, file provider extensions don't cut it, Apple devs who read this, please give us a FUSE equivalent, we know it exists).
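The "detect the type, then dispatch" half could look roughly like this. The magic numbers are the standard ones (ZIP local header, gzip, zstd frame, POSIX tar's `ustar` at offset 257, ISO 9660's `CD001` at offset 0x8001); the table mapping each type to fuse-zip, ratarmount, or archivemount is a hypothetical choice, and the function only builds a `<helper> <archive> <mountpoint>` argument list rather than running it.

```python
# (offset, magic bytes, type name): standard signatures, checked in order.
SIGNATURES = [
    (0, b"PK\x03\x04", "zip"),
    (0, b"\x1f\x8b", "gzip"),
    (0, b"\x28\xb5\x2f\xfd", "zstd"),
    (257, b"ustar", "tar"),
    (0x8001, b"CD001", "iso9660"),
]

# Hypothetical dispatch table: which FUSE helper to use for each type.
MOUNTERS = {
    "zip": "fuse-zip",
    "gzip": "ratarmount",
    "zstd": "ratarmount",
    "tar": "ratarmount",
    "iso9660": "archivemount",
}


def detect(path):
    """Return the first matching type name, or None if nothing matches."""
    with open(path, "rb") as f:
        for offset, magic, kind in SIGNATURES:
            f.seek(offset)
            if f.read(len(magic)) == magic:
                return kind
    return None


def mount_command(path, mountpoint):
    """Build (but do not run) the argv for mounting `path` at `mountpoint`."""
    kind = detect(path)
    if kind is None:
        raise ValueError(f"no mount helper known for {path!r}")
    return [MOUNTERS[kind], path, mountpoint]
```

The result could then be handed to `subprocess.run(..., check=True)`; keeping detection and dispatch separate is what lets each per-format mounter "do one thing well".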

      • stuaxo 20 days ago
        I want stuff to work like ZipMagic did in the early 90s/ early 2000s.

        You could cd into zip files, they would act as directories and files at the same time.

        I seem to remember Linus saying a file could act like a directory in Linux a long time ago too.

        Though I don't think Linux has filters for the filesystem like Windows does so implementation might be more tricky.

      • Groxx 21 days ago
        Does https://osxfuse.github.io/ cover this? Or is there some fundamental issue? (beyond "it's not built in")
        • skissane 21 days ago
          Recent macOS versions do have a general purpose built-in API for user mode filesystems. That API is incompatible with FUSE. The big problem is it is undocumented and you need an entitlement from Apple to use it, and Apple won’t give you that.

          Apple do have a publicly available API for cloud file systems (Dropbox-style products), but it makes a lot of assumptions which makes it effectively unusable for other use cases.

          Then there are third party solutions like osxfuse. These have the problem that they rely on kernel extensions and Apple keeps on making those harder and harder, and is aiming to get rid of them; plus, they are all now proprietary licensed, albeit often with a free license for open source use.

          One approach that does work without any kernel extensions or private APIs is to make your user filesystem an NFS server and then mount that. One competitor to osxfuse does that, but it also is proprietary

          • ranger_danger 20 days ago
            According to the MacFUSE author, their specific approach is not actually undocumented:

            > Apple has put it in an umbrella called "unsupported" (in the kernel interfaces section) ... either Apple will not take this interface away, and if they do, it will be to provide a better interface

            http://preserve.mactech.com/articles/mactech/Vol.23/23.03/Ma...

            • skissane 20 days ago
              Yes, that’s not what I’m talking about though

              MacFUSE uses the kernel mode VFS API

              I’m talking about the undocumented user mode filesystem LiveFS/UserFS/com.apple.filesystems.lifs API which was added in Monterey (macOS 12), and since Ventura (macOS 13) is used to implement the OOTB FAT and exFAT filesystem support. Using that requires private entitlements (e.g. com.apple.private.LiveFS.connection) which Apple (thus far) won’t give to anyone else

              • ranger_danger 20 days ago
                What about the File Provider API?
                • skissane 20 days ago
                  It is designed for the cloud storage use case (Dropbox, Google Drive, etc.): it creates local copies of files and syncs them with remote ones. Not what you want to do in the general case.
                  • skissane 20 days ago
                    I take it from the downvotes people think what I said is factually wrong?

                    If that's the case, I wish someone would point out where I've got it wrong instead of just silently downvoting

          • w10-1 20 days ago
            > it makes a lot of assumptions which makes it effectively unusable for other use cases

            What use-cases are foreclosed and why?

        • duped 21 days ago
          Well that requires a kext so it's a nonstarter, and fuse-t uses NFS which is extremely janky and unreliable on MacOS.

          The fundamental issue is that macOS doesn't provide an API for this natively.

          • skissane 20 days ago
            > fuse-t uses NFS which is extremely janky and unreliable on MacOS

            I was wondering what issues you were talking about, and then I found this - https://github.com/macos-fuse-t/fuse-t/issues/45 - data corruption

            > The fundamental issue is that macOS doesn't provide an API for this natively.

            The API is there, Apple just doesn’t want to give anyone outside of Apple the entitlement that lets them use it. I don’t understand why Apple won’t.

            Well, I understand that would require them to document it, ship public headers, and support it for external developers - but why not?

            • duped 20 days ago
              > The API is there, Apple just doesn’t want to give anyone outside of Apple the entitlement that lets them use it.

              If no one can call it, it's not an API, it's an implementation detail. And I don't even think it's exposed by headers, just alluded to by people who claim APFS is implemented in user space.

              > I was wondering what issues you were talking about, and then I found this

              Worse than this, it's possible to DoS a Mac with an NFS server just by refusing to reply to a request. That's unacceptable for a user space file system (although FUSE is only kinda better, in that it can force processes that read from the FS into uninterruptible sleep that prevents them from being killed).

              > Well, I understand that would require them to document it, ship public headers, and support it for external developers - but why not?

              Because Apple doesn't give a fuck about developers. Every developer will eventually learn this, but for those that haven't - Apple doesn't want you writing software for their platform, unless you're an Apple employee and on an Apple team paid to do it. It's why their docs suck, it's why to learn anything you need to watch ADC videos instead of read manpages, and it's why all the cool stuff is behind protected entitlements that you can't get or will be limited in using.

              • mike_hearn 20 days ago
                No, it's almost certainly not because they don't give a fuck about developers. They definitely do.

                It's much more likely that they want to:

                a. Dogfood the API using internal use cases first when they can still make changes to the API without breaking anything. Note that the latest MacOS releases moved some filesystems into userspace using this new API. They probably learned some stuff by doing that.

                b. Work out how to protect system stability from crappy userland filesystems. As you point out, bugs in FUSE providers can hang apps.

                c. Work out how such an API interacts with their sandboxing system and how to avoid FUSE-style filesystems being used to subvert the sandbox. This is a common source of exploits in FUSE-style systems and is one of the key learnings from GNU/Hurd: UNIX software is written on the assumption that filing systems aren't malicious and invalidating that assumption creates new bug classes.

                d. Work out what the most important use cases are and try to ensure those use cases will have a good or at least uniform UX first.

                Providing a FUSE-like API is presumably also just not a high priority. By far the most common use case in terms of number of users is the Dropbox use case. FUSE is mostly used for toys and experiments beyond that (like filefs). Those matter and I'm sure there are friendly geeks on the Darwin team who'd like to enable those, but Linux also works for exploration. Certainly Apple management would not be happy about an engineer who decided to enable nerd experimentation but undermined the security system whilst doing so.

                And it's worth remembering that you can have root on macOS. It means disabling SIP and adding a kernel boot arg, but that only takes a few minutes and then you can grant apps any entitlements you like:

                https://github.com/osy/AMFIExemption

                That's no good for people who aren't developers, but most FUSE filesystems are designed for developers anyway.

              • skissane 20 days ago
                > Worse than this, it's possible to DoS a Mac with an NFS server just by refusing to reply to a request.

                I wonder if their SMB/CIFS client implementation has these kinds of issues? It probably gets used more heavily

                > And I don't even think its exposed by headers

                Apple (accidentally?) released some of the private headers for this feature in one of their open source releases: https://github.com/apple-oss-distributions/msdosfs/blob/rel/...

                • duped 20 days ago
                  Maybe? It's kind of hard to tell. It's not exactly easy to write any of these servers from scratch to find out. But I wouldn't be surprised - they want app developers to be using the file provider extension API, which is unsuitable for everyone who isn't making a Dropbox clone.

                  That link is very interesting. It doesn't smell like any other Apple API as they're exposing a vtable with good documentation comments. It would be interesting to hack with this with SIP disabled to see how it works. I'm especially curious about how mount/unmount work and how the plugin registers itself with the OS, or what application is the client/host.

            • nine_k 20 days ago
              > why don't they

              It would make macOS more of a general-purpose OS, would increase the amount of functionality from which third parties would benefit, but Apple themselves would likely not. That would increase the number and variety of tech support requests, ever so slightly but still, and would introduce a few new attack surfaces.

              Instead, Apple's strategy is to tighten the macOS more and more, and turn it into a specialist OS completely controlled by Apple, with a few companies like Adobe and Ableton licensing access to its internals.

              • samatman 20 days ago
                I've been using OSX since 2003, and developing on it for more than ten years. At no point have I seen anything that it's reasonable to call "tightening macOS", let alone the absurd claim of complete control except for an inner circle of elite companies.

                The closest thing would be adding the attestation system, so that unsigned binaries have to be explicitly given permission to run... once. That's a security feature which trades a bit of convenience for a lot of protection, especially for the average user. I have no problem with that sort of thing.

                I see this sort of sentiment very frequently from non-users of the operating system, but never from those of us who actually use it. Go figure.

              • skissane 20 days ago
                Apple used to be a lot more developer-friendly company. It is part of what got them where they are now - the fact that so many developers use Macs, which in turn encourages business software vendors to support Macs

                Stuff like this is of little interest to ordinary users (at least not directly), but appeals to developers

                By de-emphasising the developer experience, they are undermining one of the factors that got them to where they are today.

    • Sophira 20 days ago
      This already exists - avfs[0] does this as a FUSE filesystem. It's not the most intuitive to use, but it works, and is extensible.

      [0] https://avf.sourceforge.net/

    • lmm 20 days ago
      This was a core design feature of reiserfsv4, but Linux ultimately refused to merge it, probably not helped by the whole murdering-his-wife thing.
      • skissane 20 days ago
        > This was a core design feature of reiserfsv4, but Linux ultimately refused to merge it

        IIRC, because it contained these strange beasts which functioned as both files and directories - i.e. cat would return data, but then you could cd into them and run ls. Linus (among others) didn’t want to permit those violations of the file-directory dichotomy into the Linux kernel.

        • stuaxo 20 days ago
          Oh, that's funny. I remember a much, much earlier Linus mentioning how this would be possible in Linux; I didn't know anyone actually did it.

          I think you really should be able to "cd" into any kind of structured data.

    • frizlab 21 days ago
      Honest question: How is this useful? I don’t see any use-case where this would come in handy.
      • RetroTechie 20 days ago
        It allows you to use any tool available for regular files, on the files-in-files as well.

        As opposed to extract contents and then work on that (requiring extra steps + disk space). Or be limited to what specialized utilities support.

      • crabbone 20 days ago
        We used this in Gitlab CI. Unfortunately, the only way they deal with artifacts is by putting them in Zip files. Cache between builds would thus be stored as a Zip file. However, fully extracting it before each build would sometimes take as much, if not more time than to just build fresh. Mounting a Zip file as a filesystem allows extracting entries on-demand, at the time a file access would've been made. This was a notable speedup in our compilation process.
        • pizzafeelsright 19 days ago
          tar is what you're looking for, no?
          • crabbone 18 days ago
            It was a while ago, and I haven't used Gitlab in a few years. Maybe they've added TAR as an option since, but Zip was the only option at that time.
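The speedup described above comes from ZIP's central directory: a reader can look up one entry and decompress just that entry. A minimal Python sketch with the standard `zipfile` module (`read_member` is a hypothetical helper name):

```python
import io
import zipfile


def read_member(zip_source, name):
    """Decompress one archive member on demand.

    `zip_source` is a path or a seekable binary file object. zipfile reads the
    central directory at the end of the archive and then seeks straight to the
    requested entry, so the other members are never decompressed.
    """
    with zipfile.ZipFile(zip_source) as zf:
        with zf.open(name) as member:
            return member.read()


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("cache/a.o", b"A" * 1000)
    zf.writestr("cache/b.o", b"B" * 1000)
print(read_member(buf, "cache/b.o")[:4])  # b'BBBB'
```

A mounted ZIP does the equivalent inside its read handler, at the moment a build tool opens a particular file.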
      • w10-1 20 days ago
        It reduces the code required to convert from N producers to M consumers from N x M to N + M, because everything reads from and writes to a well-understood common form.
      • TimeBearingDown 21 days ago
        You could seed compressed archives of massive text files or similar via BitTorrent while making the contents available to your apps in read-only mode.
      • russfink 20 days ago
        Exporting data to some format would be easy.
    • crabbone 20 days ago
      I thought archivemount already did that. Am I missing something?

      Anyway, even if that's not what you are looking for, FUSE is a more general mechanism that will allow you to do what you want (well, it seems like, at least) and much more.

    • jraph 21 days ago
      It exists :-)

      For zip archives, there are fuse-zip and mount-zip, which are both FUSE filesystems.

      As an intermediate between OS level and application level, there is desktop-environment-level support: gvfs for GNOME and KIO for KDE, but each is compatible only within its own ecosystem.

      • lambdaxyzw 20 days ago
        Would be nice to have something that integrates with 7z - it supports a lot of weird archive types, including "weird" ones I care about (for example PE files, better known as ".exe files").
        • russfink 20 days ago
          Or zstd. I have some dd blobs of partitions, the blobs are zstandard-compressed, would like to mount them.
          • mxmlnkn 18 days ago
            Ratarmount also works for that. However, it (and any other tool I know of) only works well if the file was compressed with pzstd, because of a limitation of the zstd format: fast seeking needs separate zstd frames.
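The frame limitation can be illustrated with a stand-in: the sketch below uses zlib from Python's standard library instead of zstd, but the principle is the same. Data is compressed as independent frames with an offset index, so a read at an arbitrary position only decompresses the frames it overlaps. The 4 KiB chunk size and the in-memory index are assumptions for illustration.

```python
import zlib

CHUNK = 4096  # uncompressed bytes per frame; an arbitrary choice for the sketch


def compress_framed(data, chunk=CHUNK):
    """Compress data as independent zlib streams; return (blob, offsets).

    offsets[i] is where frame i starts in blob; the last entry is len(blob).
    """
    frames, offsets, pos = [], [], 0
    for start in range(0, len(data), chunk):
        frame = zlib.compress(data[start:start + chunk])
        offsets.append(pos)
        frames.append(frame)
        pos += len(frame)
    offsets.append(pos)
    return b"".join(frames), offsets


def read_at(blob, offsets, pos, size, chunk=CHUNK):
    """Return data[pos:pos+size], decompressing only the frames that overlap it."""
    if size <= 0:
        return b""
    first = pos // chunk
    last = min((pos + size - 1) // chunk, len(offsets) - 2)
    out = bytearray()
    for i in range(first, last + 1):
        out += zlib.decompress(blob[offsets[i]:offsets[i + 1]])
    skip = pos - first * chunk
    return bytes(out[skip:skip + size])
```

With a single monolithic stream, the same read would have to decompress everything before `pos`.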
      • ramses0 20 days ago
        ratarmount for tar files.
    • xk3 20 days ago
      > Compressed archives, for one

      You can look inside of archives pretty easily with `lsar` (part of the unar package). It works with disk images like ISO 9660 files too

      But yes, especially for nested archives, having deeper OS support would be nice.

    • kybernetikos 21 days ago
      It's not exactly the same, but nushell provides ways of exploring inside files.
  • jasonpeacock 21 days ago
    • amiga386 21 days ago
      All you need now is a giant pile of rules for which revisions to select and you have the unholy demon that is Rational ClearCase
      • skissane 20 days ago
        I always thought Oracle ADE was a cooler demon. Shame the internal talk about productising it never went anywhere.
      • TheGlav 21 days ago
        What!? you didn't add a versioned database layer on a server with code stored in clearcase that stored those ClearCase config specs to manage the configuration of your config specs to manage the configuration of your version control system that had your application configuration in it?! How did you even operate? /s
  • paulgb 21 days ago
    This is really neat, but when I saw the headline I got excited that it was something I have been looking for / considering writing, and I figure the comments here would be a good place to ask if something like this exists:

    Is there a FUSE filesystem that runs in-memory (like tmpfs) while mounted, and then when dismounted it serializes to a single file on disk? The closest I can find are FUSE drivers that mount archive files, but then you don't get things like symlinks.

    • speps 21 days ago
      Closest I found: https://github.com/guardianproject/libsqlfs

      > The libsqlfs library implements a POSIX style file system on top of an SQLite database. It allows applications to have access to a full read/write file system in a single file, complete with its own file hierarchy and name space. This is useful for applications which needs structured storage, such as embedding documents within documents, or management of configuration data or preferences. Libsqlfs can be used as an shared library, or it can be built as a FUSE (Linux File System in User Space) module to allow a libsqlfs database to be accessed via OS level file system interfaces by normal applications.

    • Scaevolus 21 days ago
      Not purely in-memory, but something like https://github.com/jrwwallis/qcow2fuse maybe? It's clunky compared to OSX's DMGs, but if you squint it achieves similar ends.

      Otherwise you could achieve this with a tmpfs wrapped to serialize to a tarball (preserving symlinks) when unmounted.

      • ranger_danger 21 days ago
        Oh nice, I didn't even know that existed. I've been using qemu-nbd and parted by hand and it gets cumbersome, so this might help a lot. Thanks!
    • hnlmorg 20 days ago
      Why does it have to be in memory?

      I’m sure you’re already aware of this, but there are all kinds of very real scenarios that could lead to corrupted data if you’re only flushing the buffer upon unmounting.

      Sounds like you’ve got an interesting problem you’re trying to solve though.
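A sketch of the tmpfs-plus-tarball suggestion above, assuming the working tree lives in a tmpfs directory and `snapshot` is called on unmount (or periodically). Python's `tarfile` keeps symlinks as links by default, and writing to a temporary file followed by `os.replace` means an interrupted snapshot leaves the previous archive intact. It does not address the larger point about corruption: anything written since the last snapshot is still lost on a crash. `snapshot` and `restore` are hypothetical names.

```python
import os
import tarfile
import tempfile


def snapshot(src_dir, archive_path):
    """Serialize the contents of src_dir into one tar file, replacing it atomically."""
    target_dir = os.path.dirname(os.path.abspath(archive_path))
    fd, tmp = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as raw:
            with tarfile.open(fileobj=raw, mode="w") as tar:
                for name in sorted(os.listdir(src_dir)):
                    # add() recurses into directories and stores symlinks as links.
                    tar.add(os.path.join(src_dir, name), arcname=name)
            raw.flush()
            os.fsync(raw.fileno())
        # Atomic on POSIX within one filesystem; a fully durable version
        # would also fsync the containing directory afterwards.
        os.replace(tmp, archive_path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def restore(archive_path, dest_dir):
    """Unpack a snapshot into dest_dir (e.g. a fresh tmpfs) at mount time."""
    # The "data" filter rejects absolute paths and links escaping dest_dir;
    # it is only passed on Python versions that support extraction filters.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest_dir, **kwargs)
```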

    • khc 21 days ago
      does it have to be fuse? can't you mount a disk image with loopback?
    • ranger_danger 21 days ago
      I can't think of anything _exactly_ like that, but I think you can get close by just copying some type of image file to /tmp and then moving it to disk when you're done after unmounting.
      • AgentME 20 days ago
        /tmp isn't stored in memory; it's usually a normal on-disk filesystem that's cleared regularly. You want /dev/shm instead, which is a purely in-memory filesystem on normal Linux systems.
        • codetrotter 20 days ago
          > /tmp isn't stored in memory

          It is if your system uses tmpfs for /tmp

          https://en.wikipedia.org/wiki/Tmpfs

          • throwway120385 20 days ago
            The point they were trying to make is that it doesn't have to be, and it isn't in several of the Linux systems I've used over the years. Assuming that it is is a bad idea.
            • arjvik 20 days ago
              /dev/shm always is though
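Rather than assuming, one can check which filesystem backs a path. A Linux-only sketch that parses text in the `/proc/mounts` format (`fs_type` is a hypothetical helper; from a shell, `findmnt --target /tmp` answers the same question):

```python
def fs_type(path, mounts_text):
    """Return the filesystem type backing `path`, given /proc/mounts-style text.

    Each line is 'device mount_point fstype options ...'. The longest mount
    point containing `path` wins; later lines win ties, since a later mount
    on the same directory hides the earlier one. Mount points with spaces are
    octal-escaped in /proc/mounts and are not decoded here. Pass an absolute,
    symlink-free path (e.g. from os.path.realpath).
    """
    best, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        inside = path == mount_point or path.startswith(mount_point.rstrip("/") + "/")
        if inside and len(mount_point) >= len(best):
            best, best_type = mount_point, fstype
    return best_type
```

On a live system: `with open("/proc/mounts") as f: print(fs_type(os.path.realpath("/tmp"), f.read()))`.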
  • compressedgas 21 days ago
  • alephaleph 21 days ago
    Reminds me of Omar Rizwan's TabFS <https://omar.website/tabfs/>
  • sambeau 21 days ago
    Ha. I did this back in 2003. It's surprisingly fast, and makes it simple to do granular locking.

    I used it as a per-user database for a web-templating language for a giant web-site building tool.

  • purple-leafy 20 days ago
    When I saw the title I thought it was a meme.

    But wow, what a clever idea. Not sure I'd ever need to reach for it personally, as I do most data processing in a higher-level language, but I can imagine people finding use cases.

    Nice out of the box thinking

  • freeney 21 days ago
    This looks awesome, I need to give it a try asap. I can very well see myself using this to navigate or search inside JSON files
  • Edmond 21 days ago
    If you're intrigued by this then Solvent-Configr might be of interest: https://aws.amazon.com/marketplace/pp/prodview-i3ym46leenag4

    It uses file system mechanics to model objects, meaning you can design object based solutions that support file system style navigation.

    Demo: https://youtu.be/XgTgubZQPHw

  • qazxcvbnm 20 days ago
    What happens if your JSON key has a slash?
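One possible answer, sketched in Python: escape the characters that cannot appear in a single path component. This is a hypothetical scheme, not necessarily what ffs does: '%', '/' and NUL are percent-encoded, '.' and '..' are encoded so they do not alias the directory entries, and the empty key gets a reserved name.

```python
from urllib.parse import unquote


def key_to_filename(key):
    """Encode a JSON object key as one path component (hypothetical scheme)."""
    if key == "":
        return "%"  # a lone '%' can never come from encoding a non-empty key
    # Escape '%' first so that the escapes added next stay unambiguous.
    name = key.replace("%", "%25").replace("/", "%2F").replace("\0", "%00")
    if name in (".", ".."):
        name = name.replace(".", "%2E")
    return name


def filename_to_key(name):
    """Invert key_to_filename."""
    if name == "%":
        return ""
    return unquote(name)


print(key_to_filename("a/b"))  # a%2Fb
```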
  • timrobinson333 20 days ago
    It's an interesting idea but I think the usefulness would be greatly enhanced if it could handle json arrays; most needed json structures contain array elements in my experience
  • planede 20 days ago
    Hmm, this opens the possibility to also commit these files as directory structures. I wonder how this would affect merges and conflicts.
  • chuckadams 21 days ago
    Neat. Now how about a filesystem that takes a directory of files and exposes it as a single json file? You could call it the Filesystem File, and mount it in the File Filesystem if you wanted...
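The reverse direction is easy to sketch for the simple case: directories become JSON objects and regular files become strings. `dir_to_json` is a hypothetical helper; it skips symlinks and assumes UTF-8 file contents, and because every leaf comes back as a string it cannot tell `1` from `"1"`, which is exactly the type information a plain directory tree lacks.

```python
import os


def dir_to_json(path):
    """Serialize a directory tree as a JSON-compatible value.

    Directories become dicts keyed by entry name (sorted); regular files become
    their UTF-8 text. Symlinks inside the tree are skipped.
    """
    if os.path.isdir(path):
        return {
            name: dir_to_json(os.path.join(path, name))
            for name in sorted(os.listdir(path))
            if not os.path.islink(os.path.join(path, name))
        }
    with open(path, encoding="utf-8") as f:
        return f.read()
```

`json.dumps(dir_to_json(some_dir), indent=2)` then gives the single file.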
  • sigmonsays 19 days ago
    this is cool but it's fuse.. which is not so cool.

    these days I reach for jq. I've recently become interested in duckdb too.

    Using a tool that is specialized for the format is usually more ideal than a generic one treating everything as files.

    There is a lot in JSON that can't be represented in a flexible way as files and directories. For instance: what if a key has "/" in it? What happens to lists and their order when you re-serialize? How are hashes represented? How can you tell if a parent is an object or a list? Inserting an item into a list is a ton of error-prone renames... the list goes on.

    (edited for formatting)
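The list problems above can be made concrete with a sketch. Here arrays become directories with zero-padded index names (so a sorted listing keeps their order), and a separate `types` map records whether each directory is an object or an array; a real filesystem might keep that in an extended attribute. It is a hypothetical layout, and it shows the costs raised above: object key order is lost, and inserting at index 0 would mean renaming every later entry. Keys are assumed not to contain '/'.

```python
import json


def flatten(value, path="", out=None, types=None):
    """Return ({leaf_path: json_text}, {dir_path: 'object' | 'array'})."""
    if out is None:
        out, types = {}, {}
    if isinstance(value, dict):
        types[path] = "object"
        for key, child in value.items():
            flatten(child, f"{path}/{key}", out, types)
    elif isinstance(value, list):
        types[path] = "array"
        width = len(str(max(len(value) - 1, 0)))  # zero-pad so names sort numerically
        for i, child in enumerate(value):
            flatten(child, f"{path}/{i:0{width}d}", out, types)
    else:
        out[path] = json.dumps(value)
    return out, types


def unflatten(out, types):
    """Rebuild the JSON value; needs `types` to tell arrays from objects."""
    def build(path):
        kind = types.get(path)
        if kind is None:
            return json.loads(out[path])
        prefix = path + "/"
        names = sorted({p[len(prefix):].split("/", 1)[0]
                        for p in list(out) + list(types) if p.startswith(prefix)})
        if kind == "array":
            return [build(prefix + n) for n in names]
        return {n: build(prefix + n) for n in names}  # original key order is lost
    return build("")
```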

  • willlma 19 days ago
    This reminds me of [pry](https://pry.github.io/) for Ruby that allows you to cd into other scopes
  • isoprophlex 21 days ago
    For fuck's sake! Not everything needs to be a file!

    (This is a joke. I love the idea and execution.)

    • vidarh 21 days ago
      Daniel Stenberg, of cURL fame, co-wrote an Amiga editor called FrexxEd where the open buffers were exposed as files in the filesystem.

      Meaning you could write any shell script to manipulate an open buffer (not that important, as it also exposed all editor functionality both via IPC through ARexx and via FPL, a C-like scripting language), and that you could e.g. compile without saving. That was very helpful on a system where a lot of people might only have a single floppy drive: being able to keep the compiler in the drive and compile straight from the in-memory version in RAM meant you didn't have to keep swapping floppies. (Just remember to save before actually trying to run the program; there was no memory protection...)

      • salgernon 21 days ago
        Classic MacOS in the 80s had MPW (Macintosh Programmer's Workshop), which treated open text windows as files and selections within the windows as files. So it wasn't uncommon for a documentation file to contain a "click here and hit enter" section, which would use the selected text as stdin for some semi-ported Unix tool. (No memory protection or multitasking, so true pipelines with backpressure didn't work.)
        • bombcar 20 days ago
          BBEdit has something akin - you can select text and run a Unix command on the selection via temp files. Very useful
      • fsckboy 20 days ago
        I want a web browser that does that, one that lets me cd into each tab as a directory from the shell
    • mypalmike 21 days ago
      I see what you did there.
  • secwang 20 days ago
    reminds me of djb's envdir
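For reference, envdir treats a directory as a set of environment variables. A rough Python approximation of the rules as documented for daemontools (`envdir_apply` is a hypothetical name, and details such as error handling for invalid names differ from the real tool): an empty file unsets the variable; otherwise the value is the file's first line with trailing spaces and tabs removed and NUL bytes turned into newlines.

```python
import os


def envdir_apply(directory, env):
    """Approximate daemontools envdir: update `env` from the files in `directory`.

    An empty (0-byte) file removes the variable; otherwise the value is the
    file's first line, with trailing spaces/tabs removed and NULs turned
    into newlines. Names containing '=' are skipped here.
    """
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if "=" in name or not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            data = f.read()
        env.pop(name, None)
        if data:
            first_line = data.split(b"\n", 1)[0].rstrip(b" \t")
            env[name] = first_line.replace(b"\0", b"\n").decode()
    return env
```

A caller would then pass the result along, e.g. `subprocess.run(cmd, env=envdir_apply("./env", dict(os.environ)))`.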
  • agumonkey 21 days ago
    All I see is a generic tree walk mechanism, here implemented in folders/files, but in plan9 it's .. plumber ?
  • ThinkBeat 21 days ago
    No XML, Excel, PDF or CSV support yet.
    • IlliOnato 20 days ago
      I guess support for XML would be tricky, because XML is just way more complex format than the ones already supported. It is still essentially a tree, but with additional structure.

      Representing elements and their contents is easy enough. But attributes, comments, processing instructions, entities... And remember, an XML document can include a DTD (it does not have to be in a separate file).

      To present it as a file system in a useful, non-convoluted way? I will be very, very interested if it's possible, but not holding my breath.

      • eyelidlessness 20 days ago
        On the one hand, I can’t help but point out you forgot to mention the other big inherent complexity that would make XML-as-FS a uniquely complex beast: namespaces.

        On the other hand, I can’t help but point out that a related technology comes very close to demonstrating how you might map XML to a file system: XPath. Probably the biggest issue would be syntax, and again largely due to namespaces.
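A hypothetical layout in that spirit, borrowing XPath's 1-based positional predicates for repeated elements: elements become directories named `tag[n]`, attributes become `@name` files, and element text becomes a `#text` file. The sketch uses Python's `xml.etree.ElementTree` and deliberately leaves out the hard parts mentioned above: the default parser drops comments and processing instructions, namespaced tags come through as `{uri}local`, and tail text in mixed content is ignored.

```python
import xml.etree.ElementTree as ET


def xml_paths(xml_text):
    """Map an XML document to {file_path: contents} under a hypothetical layout."""
    files = {}

    def walk(elem, path):
        for name, value in elem.attrib.items():
            files[f"{path}/@{name}"] = value
        if elem.text and elem.text.strip():
            files[f"{path}/#text"] = elem.text.strip()
        counts = {}
        for child in elem:
            # Number repeated siblings like XPath: book[1], book[2], ...
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")

    root = ET.fromstring(xml_text)
    walk(root, f"/{root.tag}[1]")
    return files
```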

  • waldrews 20 days ago
    Could this be used in Windows by exposing a Samba file share from WSL or Docker?
  • yjftsjthsd-h 21 days ago
    I would gently suggest naming it filefs or something; ffs already means https://man.freebsd.org/cgi/man.cgi?ffs(7)

    That said - good idea/approach; seems like an excellent way to cleanly extend the unix approach to structured file formats:)