> The actual work of porting, however, is matching the security features provided by OpenBSD's pledge(2) and unveil(2). These are critical elements to the functionality of the system. Without them, your system accepts arbitrary data from the public network. ... rsync has specific running modes for the super-user. It also pumps arbitrary data from the network onto your file-system. Do you want that running without specific mitigation in place?
This is a confusing claim. What exactly does "accepts arbitrary data from the public network" mean? (Most servers do that, they just choose not to process the data without additional validation.) And in what way is it critical to the functionality of the system?
Is the claim that, after calling pledge() and unveil(), the openrsync process is happy to satisfy arbitrary read/write requests from the other side of the connection, and so without them it is insecure?
Does openrsync view peer-induced memory corruption after pledge() or unveil() as a vulnerability? Or is the idea that the attacker can already "pump arbitrary data from the network onto your filesystem" and that the attacker gaining control flow is not a meaningful escalation of privileges?
My impression is that pledge() and unveil() are hardening tools, intended to limit the damage from a process that has already gotten out of control (in the same way that e.g. running Apache as non-root does not mean that you're actively fine letting attackers run code as www-data). Is that impression wrong? Is openrsync using them for the basic functionality of making sure that a file is only being rsynced to the filename given on the command line?
Typically, a process once compromised can do all sorts of things: touch files, access the network, execute programs, and so on. Among other things, OpenBSD’s security culture focuses on mitigating the damage done by compromised code through development practices such as privilege separation.
Traditionally this was done by splitting functionality into multiple processes, each serving a specific purpose such as doing network communication or parsing configuration, and dropping privileges in any way possible such as chrooting and switching to a dedicated user. Thus the attack surface is reduced, and the potential damage done by a compromised (sub‐)process is reduced as well.
pledge() and unveil() are the latest evolution of OpenBSD’s techniques. pledge() whitelists groups of syscalls (called “promises”), and unveil() whitelists the filesystem paths that can be accessed.
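For anyone who hasn’t seen the two calls, here is a minimal sketch of what applying them looks like. It is written in Python via ctypes purely for illustration and is not openrsync’s code (which is C). The function name restrict_to_directory, the promise string, and the "rwc" permissions are assumptions chosen for the example. On platforms without pledge(2)/unveil(2) it just reports that nothing was applied.

```python
import ctypes
import os
import sys


def restrict_to_directory(path: str) -> bool:
    """Limit this process to stdio plus file access under `path`.

    Returns True if the restrictions were applied, False on platforms
    without pledge(2)/unveil(2). Raises OSError if the kernel refuses.
    (Hypothetical helper for illustration only.)
    """
    if not sys.platform.startswith("openbsd"):
        return False  # nothing to call here; the caller decides what to do

    # dlopen(NULL): look up pledge/unveil in the already-loaded libc.
    libc = ctypes.CDLL(None, use_errno=True)

    def check(ret: int, what: str) -> None:
        if ret != 0:
            err = ctypes.get_errno()
            raise OSError(err, f"{what}: {os.strerror(err)}")

    # unveil(path, permissions): expose only `path`, for read/write/create.
    check(libc.unveil(os.fsencode(path), b"rwc"), "unveil")
    # unveil(NULL, NULL): lock the list so nothing more can be exposed.
    check(libc.unveil(None, None), "unveil lock")
    # pledge(promises, execpromises): keep stdio and path-based file access,
    # give up everything else (sockets, exec, and so on).
    check(libc.pledge(b"stdio rpath wpath cpath", None), "pledge")
    return True
```

A real program would call something like this once, after it has opened or loaded everything it needs and before it touches untrusted input. In an interpreter such as Python that includes any modules imported later, since they would no longer be readable.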
So your process reads this arbitrary data from the public network. You validate it through some function and pass the data on to the next stage of your program. But what if there’s a bug in your validator, and your process gets compromised?
If your process hasn’t had its capabilities reduced, the attacker can do practically anything, especially if the process has superuser privileges.
But if the program uses a multi‐process privilege‐separated architecture, your validation process can’t access the filesystem or the network and isn’t running as root. If it tries, the kernel will kill it for violating its pledge. All the compromised process can do is pass malicious data through whatever interface you’ve provided between your validator and filesystem processes, hopefully an interface that is simple, well‐defined, and well‐audited.
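A rough sketch of that shape, assuming a POSIX system with fork(): the child is the only code that parses untrusted bytes, and the parent only ever sees what comes back over a pipe. The names validate_in_child and is_safe_relative_path, and the newline-separated "protocol", are made up for this example and have nothing to do with the real rsync wire format.

```python
import os


def is_safe_relative_path(name: str) -> bool:
    """Accept only non-empty relative paths with no '..' components."""
    if not name or name.startswith("/") or "\0" in name:
        return False
    return ".." not in name.split("/")


def validate_in_child(raw: bytes) -> list[str]:
    """Parse untrusted bytes in a forked child; return only accepted names."""
    to_child_r, to_child_w = os.pipe()
    from_child_r, from_child_w = os.pipe()

    pid = os.fork()
    if pid == 0:  # child: the only code that parses untrusted input
        status = 1
        try:
            os.close(to_child_w)
            os.close(from_child_r)
            # On OpenBSD, this is where the child would switch to an
            # unprivileged user and call pledge("stdio").
            with os.fdopen(to_child_r, "rb") as src:
                data = src.read()
            names = data.decode("utf-8", errors="replace").split("\n")
            good = [n for n in names if is_safe_relative_path(n)]
            with os.fdopen(from_child_w, "wb") as dst:
                dst.write("\n".join(good).encode("utf-8"))
            status = 0
        finally:
            os._exit(status)  # never return into the parent's code

    # parent: sends the raw bytes, receives only the child's reply
    os.close(to_child_r)
    os.close(from_child_w)
    with os.fdopen(to_child_w, "wb") as dst:
        dst.write(raw)
    with os.fdopen(from_child_r, "rb") as src:
        reply = src.read()
    _, wait_status = os.waitpid(pid, 0)
    if wait_status != 0:
        raise RuntimeError("validator process failed")
    return [n for n in reply.decode("utf-8").split("\n") if n]
```

For example, validate_in_child(b"a.txt\n../etc/passwd") hands back only "a.txt". The parent should still treat the reply as untrusted, since a compromised child can write anything it likes into the pipe. The gain is that the pipe is the only thing it can write to.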
What if your filesystem process gets compromised? With pledge() it can’t access the network or execute external code. With unveil(), even its file accesses are limited to the files whitelisted earlier in the program. It can’t read your SSH keys or delete your photos.
Certainly, if the process can be compromised that’s a bug that needs to be fixed. But we see new bugs constantly in the software we use every day. It’s a safe bet to say we will encounter more. By using a secure architecture, the damage these bugs can cause is drastically reduced.
There’s a really good description and demonstration of privilege separation in another project by Kristaps, acme-client (a Let’s Encrypt/certbot alternative): https://kristaps.bsd.lv/acme-client/
Another such project is Google Chrome, which uses pledge() and unveil() on OpenBSD.
My question is that the README implies that pledge() and unveil() are required for functionality, to the point that porting to an OS without support for that is an inherently questionable idea. That certainly isn't true of Chrome (I run it on non-OpenBSD and there are no functionality / security issues in doing so). That isn't true of OpenSSH, which also supports pledge() on OpenBSD and still is pretty secure on other OSes.
I do expect this is structured as you describe - that it has a validator, and that it uses these kernel features as additional hardening if the validator has a bug. But I would not describe that as requiring pledge() / unveil() and certainly not requiring it for functionality. So I don't know what the author means.
And I am worried, in particular, that the author means that the validator is not very strong, and the bulk of the validation is that it unveils the filesystem to the files it's supposed to write to and then blindly trusts the input, on the grounds that the worst the remote side could do is corrupt files but it could just have sent different contents for the files in the first place. This seems unlikely to me, but I'm having trouble figuring out an alternate interpretation.
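As a concrete illustration of the kind of userland check under discussion, here is a minimal sketch of confining a peer-supplied name to a destination directory without any kernel help. The helper name resolve_inside is hypothetical, and nothing here is taken from openrsync's source.

```python
import os


def resolve_inside(root: str, untrusted_name: str) -> str:
    """Return the real path of `untrusted_name` under `root`.

    Raises ValueError if the name is absolute, uses '..', or follows a
    symlink that would land outside `root`. (Illustrative helper only.)
    """
    root_real = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root_real, untrusted_name))
    # commonpath compares whole path components, so "/srv/data-evil"
    # is not mistaken for something inside "/srv/data".
    if os.path.commonpath([root_real, candidate]) != root_real:
        raise ValueError(f"path escapes destination: {untrusted_name!r}")
    return candidate
```

A check like this is inherently racy: a symlink can be swapped in between the check and the later open(). That is one argument for also having the kernel enforce the boundary, but it is a different thing from relying on the kernel alone.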
> My question is that the README implies that pledge() and unveil() are required for functionality, to the point that porting to an OS without support for that is an inherently questionable idea.
Knowing Kristaps, he probably considers strong privsep and privdrop basic functionality. That is after all why he developed acme-client in the first place; he acknowledged at the time the plethora of “lightweight” certbot alternatives but was more concerned with security architecture.
> That certainly isn't true of Chrome (I run it on non-OpenBSD and there are no functionality / security issues in doing so). That isn't true of OpenSSH, which also supports pledge() on OpenBSD and still is pretty secure on other OSes.
Chrome uses different techniques depending on the platform. On OpenBSD it uses pledge() and unveil(), while on Linux it uses seccomp. Kristaps isn’t a fan of seccomp’s complexity, as he mentions in the readme: “Linux's security facilities are a mess, and will take an expert hand to properly secure.” He’s not suggesting it can’t be done, and the Google Chrome team in particular has the kind of expertise he’s talking about.
> And I am worried, in particular, that the author means that the validator is not very strong, and the bulk of the validation is that it unveils the filesystem to the files it's supposed to write to and then blindly trusts the input, on the grounds that the remote side could just have sent different files.
I don’t understand this interpretation. It’s not what I got from the readme at all. What kind of validation do you expect Kristaps to be overlooking?
From a comment on the site: "(...) its (original rsync's) compressed manual page is almost as big as the compressed openrsync sources (...)"
Its license (ISC, of course) and size make it a great resource for studying rsync. I would like to have Dropbox on my phone as the legendary combination of rsync and cron. It might be nice to have a port to Java so it would work without JNI, but maybe that's only my fetish.
I'll try explaining it. It's a new implementation, from scratch (clean room), of rsync, which will become the new rsync in OpenBSD. The tree that it's been imported into is the OpenBSD CVS tree that contains OpenBSD, OpenSSH, OpenCVS, and other major projects.
There's SIP and Keychain, but they do not prevent, say, Safari from accessing Mail or user memory in general. If macOS becomes an iOS port (instead of iOS being the derivative work of the barely used UNIX system called macOS) perhaps we'd see some of the iOS-specific hardening. AFAIK that kind of sandboxing does not exist in macOS. How difficult would it be to port something like pledge or unveil to macOS?
ISC is a more free license for its users. GPL protects theoretical future users of theoretical derivative software by restricting freedom for its users.
It's important to remember that GNU is Not Unix, but OpenBSD userland is much more so. There isn't much reason to protect future forks if you expect that future software should start from first principles instead of extending software until it becomes a monolith that must be protected from its own developers.
This is the core difference between GNU and BSD: guaranteeing freedom for all current and future users vs. all current and future distributors (in particular, BSD guarantees the right to fork and close, often seen as essential for commercial use in a new software or software+hardware appliance, while GNU attempts to guarantee that any downstream user will always have the four freedoms).
It's so easy to forget that at the very end, there are people who are using software. Developers are middlemen for most code in the products they build (think dependencies). GPL cuts through that, and always has the end-user in mind.
Using "less free" or "more free" in this context just leads to pointless semantic debates. What happened is that someone made a clean-room implementation of a copyleft program in order to have it available under a copyfree license. Both licenses are Free.
First time I see this, thanks. The website isn't explicit about this point, but from what I gather, "copyfree" isn't viral in the way GPL is. It seems to provide "Free as in Freedom", but unlike GPL, doesn't protect that freedom from being immediately taken away.
I think gp means that this code is allowed to be used in products that choose to limit the end users' freedoms.
(I don't mean that as a plus or negative, but as just a statement on one of the largest philosophical differences between the bsd-style and gpl licenses: Whose freedoms are being protected? Those of the final end user or those of the developer?)
I want my code to be usable by anyone developing free software of their own. I want them to be able to integrate it, modify it, redistribute their modified copies, and more.
The GPL, being long and complicated (over 5000 words, and that’s just the GPLv3!), and with the ideological restrictions built in, is incompatible with many widely used free licenses, not least previous versions of itself. In any situation where social or legal barriers prevent the target audience from switching to the specific version of the GPL in question, any code I release under it is unusable to them.
Releasing my software under a simple, understandable, and permissive free license prevents this from ever happening.
I dislike proprietary software. I don’t use it or create it, and advocate against it wherever I can.
But given the choice between letting some Chinese featurephone developer use my code without “giving back,” and preventing swaths of the free software community I care about from using and improving my code for themselves, I will favor permissiveness every time.
Just because some forks of a project aren’t free, doesn’t mean that the project itself isn’t free. Especially if the project, like much of the e.g. MIT-licensed code in the world, is “done” for all intents and purposes and there is no reason at all to fork, proprietary or otherwise. (This comes up in the context of pure algorithms code a lot.)
Also, even if a project is copylefted, people can still just do... exactly what they did here. Which, while different in the weak sense of “avoiding copyright” or maybe “avoiding patents”, in the context of systems code like this almost always results in the same code on both sides anyway. If the choice is between either giving the proprietary developers your code to use, or making them re-implement exactly what you wrote without your copy for reference—with no option for “they don’t implement it at all”—then exactly what is the point of choosing the latter over the former?
Digital analogies to theft rarely are good, but this one is passable. The main point is that this license grants software freedoms, but then doesn't do anything to protect it - thus enabling middlemen (like most of us devs are, for most software we write!) to immediately strip those freedoms away.
Proclamations of rights aren't really useful if they don't come with a means of ensuring those rights are not taken away.
Both the ASF and FSF have a variety of NIHed projects that appear to exist purely for license ideology reasons. The most famous that comes to mind is Apache Geronimo, a clone of JBoss that few people used but was bought by IBM for ~$120M IIRC.
Rsync solved a complex problem that comes in many nuanced variants. It may seem trivial at the outset, but it is actually not. So I don't think that rsync has many features that are somehow unnecessary or bloat.
The first (well-known) 'clean-room' implementation was when Phoenix implemented an IBM PC-compatible BIOS by having one team study the IBM source (which was available), then write up a specification for how it worked and hand that specification over to somebody else (they were Phoenix's legal team, IIRC), who then handed the specs over to another team that had never seen the IBM source. They sat down in their "clean room" (b/c it wasn't tainted by actual IBM source) and implemented a BIOS from specs only. In that way Phoenix was protected from any claims of copyright infringement: Nothing was copied, and the people writing the code had never seen the original source.
In that particular case the specs were reverse-engineered from actual source, but that's not a necessary part of the process. It's more common to have one team study the protocol, data going over the wire, disassembling, etc, then use the knowledge gained to write specs, and then another team implements the equivalent functionality from specifications only.
All openrsync implements is the equivalent of a fast "cp -a" across the network, plus it can also remove files if they don't exist. rsync does much more and over the years I've used most of it, so there is no way I would use openrsync. The upside is the manpage of openrsync isn't that much more complex than cp, which is a definite bonus if that's all you are doing.
The only thing I would change about rsync is its default, which IMO should be to copy all metadata supported by both sides. I.e., the default should be to make the destination as similar to the source as possible. Its default is to only copy the data, and you must add options to say what else you want copied. To make matters worse, you can't just add every option, because if you say you want to copy something not supported by one side or the other it errors out. I may have missed it as I am reading the man page source, but openrsync didn't seem to change that.
Fun bit of history: the “Open” here comes from OpenBSD; but the “Open” in OpenBSD came from the development process, not the license.
Before Git and SVN, we had CVS, and to check out code from a CVS repository you needed to have an account on the CVS server. If you wanted to contribute but didn’t already have a developer account, you were limited to writing patches against release tarballs or whatever alternative method upstream supplied.
One of OpenBSD’s major projects in the mid 90s was creating anonymous CVS, where anyone could check out code without any account. This came from Theo’s experience after losing his NetBSD account, where he found himself unable to make meaningful contributions anymore without the ability to cvs checkout, cvs diff, etc. So when he started OpenBSD, he had in mind to open up the development process to everyone, account or not.