Openrsync imported into the tree

(undeadly.org)

138 points | by protomyth 71 days ago

10 comments

  • geofft 71 days ago

    > The actual work of porting, however, is matching the security features provided by OpenBSD's pledge(2) and unveil(2). These are critical elements to the functionality of the system. Without them, your system accepts arbitrary data from the public network. ... rsync has specific running modes for the super-user. It also pumps arbitrary data from the network onto your file-system. Do you want that running without specific mitigation in place?

    This is a confusing claim. What exactly does "accepts arbitrary data from the public network" mean? (Most servers do that; they just choose not to process the data without additional validation.) And in what way is it critical to the functionality of the system?

    Is the claim that, after calling pledge() and unveil(), the openrsync process is happy to satisfy arbitrary read/write requests from the other side of the connection, and so without them it is insecure?

    Does openrsync view peer-induced memory corruption after pledge() or unveil() as a vulnerability? Or is the idea that the attacker can already "pump arbitrary data from the network onto your filesystem" and that the attacker gaining control flow is not a meaningful escalation of privileges?

    My impression is that pledge() and unveil() are hardening tools, intended to limit the damage from a process that has already gotten out of control (in the same way that e.g. running Apache as non-root does not mean that you're actively fine letting attackers run code as www-data). Is that impression wrong? Is openrsync using them for the basic functionality of making sure that a file is only being rsynced to the filename given on the command line?

    • anjbe 71 days ago

      I’m trying to understand your question.

      Typically, a process once compromised can do all sorts of things: touch files, access the network, execute programs, and so on. Among other things, OpenBSD’s security culture focuses on mitigating the damage done by compromised code through development practices such as privilege separation.

      Traditionally this was done by splitting functionality into multiple processes, each serving a specific purpose such as doing network communication or parsing configuration, and dropping privileges in any way possible such as chrooting and switching to a dedicated user. Thus the attack surface is reduced, and the potential damage done by a compromised (sub‐)process is reduced as well.

      pledge() and unveil() are the latest evolution in OpenBSD’s technique. pledge() whitelists groups of syscalls, and unveil() whitelists the files that can be accessed.

      So your process reads this arbitrary data from the public network. You validate it through some function and pass the data on to the next stage of your program. But what if there’s a bug in your validator, and your process gets compromised?

      If your process hasn’t had its capabilities reduced, the attacker can do practically anything, especially if the process has superuser privileges.

      But if the program uses a multi‐process privilege‐separated architecture, your validation process can’t access the filesystem or the network and isn’t running as root. If it tries, the kernel will kill it for violating its pledge. All the compromised process can do is pass malicious data through whatever interface you’ve provided between your validator and filesystem processes, hopefully an interface that is simple, well‐defined, and well‐audited.

      What if your filesystem process gets compromised? With pledge() it can’t access the network or execute external code. With unveil(), even its file accesses are limited to the files whitelisted earlier in the program. It can’t read your SSH keys or delete your photos.
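
To make the allowlist idea above concrete, here is a toy user-space model in Python. It is only an illustration under stated assumptions: real pledge(2) and unveil(2) are enforced by the OpenBSD kernel, and `ToySandbox`, `PledgeViolation`, `check`, and `can_access` are hypothetical names invented for this sketch. The promise strings (stdio, rpath, wpath, cpath, inet) are real pledge promise names, but the matching rule used here (any unveiled prefix that grants the permission) is a simplification of what the kernel does.

```python
import posixpath


class PledgeViolation(Exception):
    """Stands in for the kernel killing a process that breaks its pledge."""


class ToySandbox:
    """Toy model of pledge/unveil allowlists (hypothetical, user-space only)."""

    def __init__(self):
        self.promises = None  # None means "not yet pledged": everything allowed
        self.unveiled = None  # None means "not yet unveiled": every path visible

    def unveil(self, path, perms):
        # Record an allowed path prefix with its permissions, e.g. "rwc".
        if self.unveiled is None:
            self.unveiled = {}
        self.unveiled[posixpath.normpath(path)] = set(perms)

    def pledge(self, promises):
        # A process may only ever narrow its promises, never widen them.
        requested = set(promises.split())
        if self.promises is not None and not requested <= self.promises:
            raise PermissionError("cannot add promises after pledging")
        self.promises = requested

    def check(self, promise):
        # Call before an operation class, e.g. "inet" for network access.
        if self.promises is not None and promise not in self.promises:
            raise PledgeViolation(promise)

    def can_access(self, path, perm):
        # Simplified rule: allowed if any unveiled prefix grants the permission.
        if self.unveiled is None:
            return True
        target = posixpath.normpath(path)
        for root, perms in self.unveiled.items():
            prefix = root if root.endswith("/") else root + "/"
            if (target == root or target.startswith(prefix)) and perm in perms:
                return True
        return False


sandbox = ToySandbox()
sandbox.unveil("/srv/backup", "rwc")       # only the destination tree
sandbox.pledge("stdio rpath wpath cpath")  # no "inet", no "exec"

print(sandbox.can_access("/srv/backup/photos/a.jpg", "w"))
print(sandbox.can_access("/home/user/.ssh/id_rsa", "r"))
```

The first call prints True and the second prints False: in this model a compromised filesystem process can still write under /srv/backup, but the SSH key is simply not reachable, and `sandbox.check("inet")` would raise `PledgeViolation`.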

      Certainly, if the process can be compromised that’s a bug that needs to be fixed. But we see new bugs constantly in the software we use every day. It’s a safe bet to say we will encounter more. By using a secure architecture, the damage these bugs can cause is drastically reduced.

      There’s a really good description and demonstration of privilege separation in another project by Kristaps, acme-client (a Let’s Encrypt/certbot alternative): https://kristaps.bsd.lv/acme-client/

      Another such project is Google Chrome, which uses pledge() and unveil() on OpenBSD.

      • geofft 71 days ago

        My question is that the README implies that pledge() and unveil() are required for functionality, to the point that porting to an OS without support for that is an inherently questionable idea. That certainly isn't true of Chrome (I run it on non-OpenBSD and there are no functionality / security issues in doing so). That isn't true of OpenSSH, which also supports pledge() on OpenBSD and still is pretty secure on other OSes.

        I do expect this is structured as you describe - that it has a validator, and that it uses these kernel features as additional hardening if the validator has a bug. But I would not describe that as requiring pledge() / unveil() and certainly not requiring it for functionality. So I don't know what the author means.

        And I am worried, in particular, that the author means that the validator is not very strong, and the bulk of the validation is that it unveils the filesystem to the files it's supposed to write to and then blindly trusts the input, on the grounds that the worst the remote side could do is corrupt files but it could just have sent different contents for the files in the first place. This seems unlikely to me, but I'm having trouble figuring out an alternate interpretation.

        • anjbe 71 days ago

          > My question is that the README implies that pledge() and unveil() are required for functionality, to the point that porting to an OS without support for that is an inherently questionable idea.

          Knowing Kristaps, he probably considers strong privsep and privdrop basic functionality. That is after all why he developed acme-client in the first place; he acknowledged at the time the plethora of “lightweight” certbot alternatives but was more concerned with security architecture.

          > That certainly isn't true of Chrome (I run it on non-OpenBSD and there are no functionality / security issues in doing so). That isn't true of OpenSSH, which also supports pledge() on OpenBSD and still is pretty secure on other OSes.

          Chrome uses different techniques depending on the platform. On OpenBSD it uses pledge() and unveil(), while on Linux it uses seccomp. Kristaps isn’t a fan of seccomp’s complexity, as he mentions in the readme: “Linux's security facilities are a mess, and will take an expert hand to properly secure.” He’s not suggesting it can’t be done, and the Google Chrome team in particular has the kind of expertise he’s talking about.

          For projects of less‐than‐Chrome scale, though, Kristaps feels that seccomp is too difficult: https://github.com/kristapsdz/acme-client-portable/blob/mast...

          > And I am worried, in particular, that the author means that the validator is not very strong, and the bulk of the validation is that it unveils the filesystem to the files it's supposed to write to and then blindly trusts the input, on the grounds that the remote side could just have sent different files.

          I don’t understand this interpretation. It’s not what I got from the readme at all. What kind of validation do you expect Kristaps to be overlooking?

      • aidenn0 71 days ago

        It is possible to read the readme[1] to imply that unveil is the only protection from escaping the root (e.g. with a ".." directory). The only way to know for sure is to dig through the code though.

        [1] https://github.com/kristapsdz/openrsync
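
For illustration, a validator-side check against that kind of ".." escape might look like the following Python sketch. It is not taken from openrsync, whose actual checks can only be confirmed by reading its source; `resolve_inside` is a hypothetical helper, and /srv/backup is an assumed destination directory.

```python
import posixpath


def resolve_inside(root, untrusted):
    """Join a peer-supplied relative path onto root, refusing any escape.

    Purely lexical: it does not follow symlinks, so a real tool would need
    more than this (or kernel-side limits such as unveil) to be safe.
    """
    if untrusted.startswith("/"):
        raise ValueError("absolute path from peer: " + untrusted)
    root = posixpath.normpath(root)
    candidate = posixpath.normpath(posixpath.join(root, untrusted))
    prefix = root if root.endswith("/") else root + "/"
    if candidate != root and not candidate.startswith(prefix):
        raise ValueError("path escapes destination: " + untrusted)
    return candidate


print(resolve_inside("/srv/backup", "photos/a.jpg"))
for bad in ("../../etc/passwd", "photos/../../x", "/etc/passwd"):
    try:
        resolve_inside("/srv/backup", bad)
    except ValueError as err:
        print("rejected:", err)
```

The point of the thread's question is which layer does this job: a check like this in the program itself, the kernel-side unveil() limit, or (ideally) both.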

        • admax88q 71 days ago

          You're not wrong, but the point being made is: wouldn't you want a tool that writes data from the network to disk to have those mitigations enabled?

        • hawski 71 days ago

          From a comment on the site: "(...) its (original rsync's) compressed manual page is almost as big as the compressed openrsync sources (...)"

          Its license (ISC, of course) and size make it a great resource for studying rsync. I would like to have Dropbox on my phone as the legendary combination of rsync and cron. It may be nice to have a port to Java so it would work without JNI, but maybe that's only my fetish.

          • ComputerGuru 71 days ago

            I just want to point out that rsync is, in fact, no longer ISC licensed but rather GPL (v3, at that), which is likely a big part of the reason this new implementation even exists.

            • meruru 71 days ago

              rsync was never ISC licensed afaik. The parent is referring to openrsync's license.

              • tinus_hn 70 days ago

                Rsync was developed by the Samba people, it is under the same license (GPL).

          • accrual 71 days ago

            Very cool news. rsync(1) is one of the first things I install on a new OpenBSD instance.

            Tangentially related, I've been using a Time Machine-like wrapper [0] around rsync(1) for a few years. It's very helpful for maintaining snapshots of my home directory.

            [0] https://blog.interlinked.org/tutorials/rsync_time_machine.ht...

          • amaccuish 71 days ago

            For those wondering what this is, see https://github.com/kristapsdz/openrsync

            • benatkin 71 days ago

              I'll try explaining it. It's a new implementation of rsync, written from scratch (clean room), which will become the new rsync in OpenBSD. The tree that it's been imported into is the OpenBSD CVS tree, which contains OpenBSD, OpenSSH, OpenCVS, and other major projects.

            • CaliforniaKarl 71 days ago

              I would not be surprised if, in a few years, this becomes one of the CLI tools installed on macOS, either as part of the default install or as part of the Xcode CLI tools.

              • gpvos 71 days ago

                Does macOS have any security features similar to pledge/unveil or any of the Linux hardening packages?

                • Fnoord 71 days ago

                  It has a port of PF.

                  • gpvos 71 days ago

                    I'm more interested to know about system call and filesystem access restrictors. I think pf is only a packet filter.

                    • Fnoord 70 days ago

                      There's SIP and Keychain, but they do not prevent, say, Safari from accessing Mail or user memory in general. If macOS becomes an iOS port (instead of iOS being the derivative work of the barely used UNIX system called macOS) perhaps we'd see some of the iOS-specific hardening. AFAIK that kind of sandboxing does not exist in macOS. How difficult would it be to port something like pledge or unveil to macOS?

                • riffraff 71 days ago

                  Why? macOS has a bunch of GPL stuff, such as bash, IIRC.

              • AdmiralAsshat 71 days ago

                Interesting. This is the first project I can think of where a clean-room implementation was done so that a project could use a less free license ("free" as defined by the FSF).

                Does anyone else know of instances where a company did a clean-room implementation of a previously FOSS tool so that they could make a paid/proprietary version? Usually it goes the other way.

                • joshklein 71 days ago

                  ISC is a more free license for its users. GPL protects theoretical future users of theoretical derivative software by restricting freedom for its users.

                  It's important to remember that GNU is Not Unix, but OpenBSD userland is much more so. There isn't much reason to protect future forks if you expect that future software should start from first principles instead of extending software until it becomes a monolith that must be protected from its own developers.

                  • m463 71 days ago

                    That is not precisely accurate.

                    The GPL does not place any restrictions on how software is used, so the (literal) users are not restricted.

                    It restricts how it is redistributed.

                    • joshklein 71 days ago

                      Apologies, I intended "user" in my comment to mean "a developer using the license". Thank you for clarifying.

                      • e12e 71 days ago

                        This is the core difference between gnu and bsd - guaranteeing freedom for all current and future users VS all current and future distributors (in particular, the bsd guarantees the right to fork and close - often seen as essential for commercial use in a new software or software+hardware appliance; while gnu attempts to guarantee that any downstream user will always have the four freedoms).

                        • TeMPOraL 71 days ago

                          It's so easy to forget that at the very end, there are people who are using software. Developers are middlemen for most code in the products they build (think dependencies). GPL cuts through that, and always has the end-user in mind.

                    • demoray 71 days ago

                      WSL. The implementation, lxcore.sys, is a clean room implementation of the Linux kernel ABI.

                      • protomyth 71 days ago

                        How is the ISC (version of BSD license used by OpenBSD) less free than the GPL3? This is very far from a "paid/proprietary" version.

                        • meruru 71 days ago

                          Using "less free" or "more free" in this context just leads to pointless semantic debates. What happened is that someone made a clean-room implementation of a copyleft program in order to have it available under a copyfree license. Both licenses are Free.

                          http://copyfree.org/policy/copyleft

                          • TeMPOraL 71 days ago

                            First time I've seen this, thanks. The website isn't explicit about this point, but from what I gather, "copyfree" isn't viral in the way GPL is. It seems to provide "Free as in Freedom", but unlike GPL, doesn't protect that freedom from being immediately taken away.

                          • jimktrains2 71 days ago

                            I think gp means that this code is allowed to be used in products that choose to limit the end users' freedoms.

                            (I don't mean that as a plus or negative, but as just a statement on one of the largest philosophical differences between the bsd-style and gpl licenses: Whose freedoms are being protected? Those of the final end user or those of the developer?)

                            • mouldysammich 71 days ago

                              It's much closer to a proprietary version than a GPL version would or could be, however.

                              • kevin_thibedeau 71 days ago

                                Proprietary-friendly is not less-free.

                                • StavrosK 71 days ago

                                  How not?

                                  • anjbe 71 days ago

                                    I want my code to be usable by anyone developing free software of their own. I want them to be able to integrate it, modify it, redistribute their modified copies, and more.

                                    The GPL, being long and complicated (over 5000 words, and that’s just the GPLv3!), and with the ideological restrictions built in, is incompatible with many widely used free licenses, not least previous versions of itself. In any situation where social or legal barriers prevent the target audience from switching to the specific version of the GPL in question, any code I release under it is unusable to them.

                                    Releasing my software under a simple, understandable, and permissive free license prevents this from ever happening.

                                    I dislike proprietary software. I don’t use it or create it, and advocate against it wherever I can.

                                    But given the choice between letting some Chinese featurephone developer use my code without “giving back,” and preventing swaths of the free software community I care about from using and improving my code for themselves, I will favor permissiveness every time.

                                    • GalacticDomin8r 71 days ago

                                      Because you are actually free to do what you want with it, instead of free to do what someone else wants you to do with it.

                                      • StavrosK 71 days ago

                                        Yes, you are free to close it up and sell it, but everyone else then isn't free to use your changes. "More friendly to proprietary purposes" is "less free".

                                        It's kind of like arguing that a country where anyone can steal from anyone else with impunity is more free. Not when you consider the rights of the person being stolen from.

                                        • derefr 71 days ago

                                          Just because some forks of a project aren’t free, doesn’t mean that the project itself isn’t free. Especially if the project, like much of the e.g. MIT-licensed code in the world, is “done” for all intents and purposes and there is no reason at all to fork, proprietary or otherwise. (This comes up in the context of pure algorithms code a lot.)

                                          Also, even if a project is copylefted, people can still just do... exactly what they did here. Which, while different in the weak sense of “avoiding copyright” or maybe “avoiding patents”, in the context of systems code like this almost always results in the same code on both sides anyway. If the choice is between either giving the proprietary developers your code to use, or making them re-implement exactly what you wrote without your copy for reference—with no option for “they don’t implement it at all”—then exactly what is the point of choosing the latter over the former?

                                          • StavrosK 71 days ago

                                            > Just because some forks of a project aren’t free, doesn’t mean that the project itself isn’t free

                                            We aren't talking about non-free, we're talking about less-free.

                                            > Just because some forks of a project aren’t free, doesn’t mean that the project itself isn’t free

                                            No. The fact that there can be forks of the project that aren't open is what means that the project itself is less free than a project where all forks must be open.

                                            > Especially if the project, like much of the e.g. MIT-licensed code in the world, is “done”

                                            I don't consider this relevant to the argument at hand.

                                            > then exactly what is the point of choosing the latter over the former?

                                            Are you asking me what the point is of making something you don't want people to do hard for them vs making it easy?

                                          • adamrt 71 days ago

                                            That's not a good analogy. In this case, people aren't being stolen from, they are freely giving it away for someone else to do as they wish.

                                            Additionally, "theft" as you put it, in this case, doesn't affect the original property owner.

                                            • TeMPOraL 71 days ago

                                              Digital analogies to theft rarely are good, but this one is passable. The main point is that this license grants software freedoms, but then doesn't do anything to protect them - thus enabling middlemen (like most of us devs are, for most software we write!) to immediately strip those freedoms away.

                                              Proclamations of rights aren't really useful if there is no means of ensuring that those rights are not taken away.

                                              • StavrosK 70 days ago

                                                I wasn't using it as an example of being deprived of property, but as an example of how infringing on other people's freedoms leads a system to be less free than one that doesn't.

                                          • chriscappuccio 71 days ago

                                            You have the freedom to create your own proprietary derivative, a freedom you lose with the GPL version of rsync.

                                    • int_19h 71 days ago

                                      On a very high level, LLVM/Clang happened because Apple needed a clean-room implementation of GCC.

                                      • bsder 71 days ago

                                        And because the gcc code was an impenetrable mess--intentionally so in order to prevent people from making a non-GPL alternative.

                                        • yjftsjthsd-h 71 days ago

                                          Of gcc, or "a C compiler (with extensions as seen in the wild)"?

                                          • int_19h 71 days ago

                                            Well, Clang implemented gcc extensions long before it went for MSVC ones...

                                        • meruru 71 days ago

                                          The BSDs have a strong preference for copyfree licenses. They tolerate copyleft programs, but try to switch to copyfree when possible. See for instance GCC -> Clang/LLVM.

                                          • wmf 71 days ago

                                            Both the ASF and FSF have a variety of NIHed projects that appear to exist purely for license ideology reasons. The most famous that comes to mind is Apache Geronimo, a clone of JBoss that few people used but was bought by IBM for ~$120M IIRC.

                                            • mindslight 71 days ago

                                              https://en.wikipedia.org/wiki/Bionic_(software)

                                              (You know, since we're tossing grenades)

                                            • meruru 71 days ago

                                              I hope this ends up being a lot simpler and easier to understand than the original rsync. The rsync manpage is way too long.

                                              • gmueckl 71 days ago

                                                Rsync solved a complex problem that comes in many nuanced variants. It may seem trivial at the outset, but it is actually not. So I don't think that rsync has many features that are somehow unnecessary or bloat.

                                              • m0nty 71 days ago

                                                > The rsync manpage is way too long

                                                I see its thoroughness as a feature, not a bug. It's very well written and I can just ignore the bits I'm not interested in. I wish more man pages were "too long" like this one is.

                                              • joppy 71 days ago

                                                What does a "clean-room implementation" mean?

                                                • Tor3 71 days ago

                                                  The first (well-known) 'clean-room' implementation was when Phoenix implemented an IBM PC-compatible BIOS by having one team study the IBM source (which was available) and write up a specification for how it worked, then hand that specification over to somebody else (they were Phoenix' legal team, IIRC), who in turn handed the specs over to another team that had never seen the IBM source. They sat down in their "clean room" (b/c it wasn't tainted by actual IBM source) and implemented a BIOS from specs only. In that way Phoenix was protected from any claims of copyright infringement: Nothing was copied, and the people writing the code had never seen the original source.

                                                  In that particular case the specs were reverse-engineered from actual source, but that's not a necessary part of the process. It's more common to have one team study the protocol, data going over the wire, disassembling, etc, then use the knowledge gained to write specs, and then another team implements the equivalent functionality from specifications only.

                                                  • anjbe 71 days ago

                                                    Not derived from the existing code. The reason it’s mentioned is to assert that openrsync is not subject to the original rsync’s GPL.

                                                  • gerdesj 71 days ago

                                                    Is it any better than rsync?

                                                    • yjftsjthsd-h 71 days ago

                                                      It is better in some ways, and worse in some, both largely subjective. It has a different license, is smaller, less battle tested, from different developers, designed with different goals in mind.

                                                      It Depends™ on how you judge.

                                                      • rstuart4133 70 days ago

                                                        All openrsync implements is the equivalent of a fast "cp -a" across the network, plus it can also remove files if they don't exist. rsync does much more and over the years I've used most of it, so there is no way I would use openrsync. The upside is the manpage of openrsync isn't that much more complex than cp, which is a definite bonus if that's all you are doing.

                                                        The only thing I would change about rsync is its default, which IMO should be to copy all meta data supported by both sides. Ie, the default should be to make the destination as similar to the source as possible. Its default is to only copy the data, and you must add options to say what else you want copied. To make matters worse, you can't just add every option, because if you say you want to copy something not supported by one side or the other it errors out. I may have missed it as I am reading the man page source, but openrsync didn't seem to change that.

                                                        • kristapsdz 69 days ago

                                                          No. openrsync implements the rsync protocol. It doesn't have all of its options, but the protocol is what it is. Do you have any idea what you're talking about?

                                                      • theamk 71 days ago

                                                        It is interesting that the "open" part of openrsync refers to the license -- BSD, vs. the original rsync's GPL.

                                                        It's not often I see "open" to mean "non-GPL" in software :)

                                                        • anjbe 71 days ago

                                                          Fun bit of history: the “Open” here comes from OpenBSD; but the “Open” in OpenBSD came from the development process, not the license.

                                                          Before Git and SVN, we had CVS, and to check out code from a CVS repository you needed to have an account on the CVS server. If you wanted to contribute but didn’t already have a developer account, you were limited to writing patches against release tarballs or whatever alternative method upstream supplied.

                                                          One of OpenBSD’s major projects in the mid 90s was creating anonymous CVS, where anyone could check out code without any account. This came from Theo’s experience after losing his NetBSD account, where he found himself unable to make meaningful contributions anymore without the ability to cvs checkout, cvs diff, etc. So when he started OpenBSD, he had in mind to open up the development process to everyone, account or not.

                                                          This is described in the commentary for the OpenBSD 6.1 release song: https://www.openbsd.org/lyrics.html#61

                                                          • zdw 71 days ago

                                                            rsync went GPLv3 a while ago, and many businesses don't trust some of the newer clauses that were added.

                                                            Similarly, the stricter BSD crowd has issues with the Apache2 license clauses regarding revocation - see here: https://www.openbsd.org/policy.html

                                                            • enneff 71 days ago

                                                              Stallman himself makes a big fuss about "Open Source" not being equivalent to "Free Software". See: https://www.gnu.org/philosophy/open-source-misses-the-point....

                                                              • protomyth 71 days ago

                                                                I get the feeling the "open" part was because they were hoping to get it included in OpenBSD like OpenSMTPD, etc.

                                                                • meruru 71 days ago

                                                                  Yeah, that was my assumption too. It's coming from the OpenBSD community, so openrsync it is.