Container image distribution seems to be one of the primary problems this tackles:
"DevOps ... brings a lot of challenges: the efficiency of image distribution, especially when you have a lot of applications and require image distribution at the same time. Dragonfly works extremely well with both Docker and Pouch, and actually we are compatible with any other container technologies without any modifications of container engine."
FWIW, this was a similar problem that I tackled for Golang gopher gala hackathon 2015 - a custom bittorrent based docker image registry POC.
Interestingly my problem statement was somewhat similar:
"Large scale deploys are going to choke your docker registry. Imagine pulling/deploying a 800mb base image (for example the official perl image) across 200 machines in one go. That's 800*200 = 160GB [EDIT: Correction thanks to kingbirdy] of data that's going to be deployed and it'll definitely choke your private docker registry (and will take a while pulling in from the public image)."
Devs should be aware that BitTorrent has had, for some time now, a DHT extension that provides "mutable" slots, each tied to a crypto public-key address.
So if you own the private key, you can write the payload, and you just share your public-key address with the people you want to share the payload with. The payload can contain a traditional immutable torrent manifest, for instance, which gives you, in essence, a public-key-crypto-based update system.
For a lot of cases I think it's a better approach than what IPFS and Dat provide, because you don't care about having a global address. All you want is to share with a group of people, in a more p2p social/organic way.
I was playing with it once, using the libtorrent library and the main BitTorrent DHT, and it was a very nice experience: it finds the payload pretty fast considering it's a DHT, and you are working in pure p2p fashion.
The only single point of failure here is the DHT bootstrap peer.
I'm planning to use this feature to distribute binary images to clients that have my public key.
Very interesting. Forgive my ignorance, but you mean it's kind of like mutable torrents that only the uploader can modify?
Does a client just search the DHT for the public key? I thought torrent clients searched for the hash of the files.
If it's searching for the public key then how does one person upload multiple different torrents, or do they create a new public key for each torrent? How does a client know which is the latest version if it has been updated multiple times?
@namibj has made a more illuminating comment with links to the libtorrent library, as well as the torrent BEPs that describe in detail how the DHT is supposed to work.
> but you mean it's kind of like mutable torrents that only the uploader can modify?
Yes, your public key (hashed) is the DHT key that identifies the payload, and only the private-key owner can modify the content in that particular slot.
That's why it's cool: you can have a p2p system that relies on trust between the parties, unlike the traditional torrent system.
Also, I'm not so sure, but lately using centralized trackers is discouraged, and I guess magnet links must use something like the DHT to work the way they do.
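The "your public key is the DHT key" idea above can be sketched in a few lines. In BEP 44 (the mutable-items extension this thread is describing), the DHT target ID for a mutable item is the SHA-1 of the ed25519 public key, optionally concatenated with a salt. The key bytes below are placeholder zeros, not a real key pair:

```python
# Sketch of how BEP 44 derives the DHT lookup key for a mutable item:
# target = SHA-1(public key + optional salt). The public key here is
# 32 placeholder bytes, not a real ed25519 key.
import hashlib

def mutable_target(pubkey: bytes, salt: bytes = b"") -> str:
    """Return the 160-bit DHT target (hex) for a mutable item."""
    return hashlib.sha1(pubkey + salt).hexdigest()

pubkey = bytes(32)  # stand-in for a real ed25519 public key
print(mutable_target(pubkey))             # the "default" slot for this key
print(mutable_target(pubkey, b"docker"))  # a different slot, same key
```

The salt is what lets one key pair own many independent slots, which answers the "multiple torrents per key" question above.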
> I thought torrent clients searched for the hash of the files.
You are correct, but the DHT is a BEP and is something more "in the corner"; still, it's there, and at least when I tried it, it was working great.
> If it's searching for the public key then how does one person upload multiple different torrents, or do they create a new public key for each torrent? How does a client know which is the latest version if it has been updated multiple times?
The rules are:
You can create any slot you want; it's just a matter of generating the public/private key pair you want to use. (This would even allow you to use a forward-secrecy scheme if you need one.)
The byte payload/value must be small, so you should use it to hold a manifest or to point to something else. But let's not forget that Git starts from the HEAD record with only a hash to go on from there. Just point to something else that can be an immutable resource, like a traditional torrent.
So you can point to a torrent, download it, and have a small list of anything you like, working as a catalog, and go from there. If you play the indirection game right, there's no limit to what you can do.
If you want the payload to stay available, you need to keep writing to it from time to time (the same value is fine), or it will expire and other peers will no longer be able to locate it.
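The rules above can be sketched as a toy slot. This is illustrative pseudologic, not libtorrent's API: BEP 44 caps the stored value at 1000 bytes, and a new write is only accepted when its sequence number is strictly higher than the stored one (signature checking is elided here):

```python
# Toy model of a mutable DHT slot's update rules (not libtorrent's API):
# small value, monotonically increasing sequence number.
import json

MAX_VALUE_BYTES = 1000  # BEP 44 limit on the stored value

class MutableSlot:
    def __init__(self):
        self.seq = -1
        self.value = None

    def put(self, seq: int, value: bytes) -> bool:
        if len(value) > MAX_VALUE_BYTES:
            return False          # payload too big for the DHT
        if seq <= self.seq:
            return False          # stale or replayed update
        self.seq, self.value = seq, value
        return True

slot = MutableSlot()
manifest = json.dumps({"magnet": "magnet:?xt=urn:btih:..."}).encode()
print(slot.put(0, manifest))   # True: first write
print(slot.put(0, manifest))   # False: seq must strictly increase
print(slot.put(1, manifest))   # True: the periodic re-announce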
What would I do? I would use the payload to point to something else, like some torrent on the classic BitTorrent network (a magnet link is enough), or embed the whole torrent header. You can also point to some HTTP resource or whatever.
Need something more? How about pointing to a torrent that downloads a bootstrap program that starts an RPC service over TCP, or over a simpler HTTP interface, and doing something else from there.
I was thinking about how I could use this to create updates via diff and patch, given that a public-key scheme creates a trust relationship between the parties, and patch the binary with something coming from the 'mothership'.
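A minimal sketch of that diff-and-patch idea: the signed manifest in the mutable slot carries the SHA-256 of the expected post-patch binary, so a client can fetch a diff from any untrusted peer, apply it, and still verify the result is what the key holder published. The "patch" below is a trivial stand-in (append bytes), not a real binary diff format like bsdiff:

```python
# Hypothetical update check: the mutable-slot manifest ships the hash
# of the patched binary; apply_patch is a placeholder, not bsdiff.
import hashlib

def apply_patch(old: bytes, patch: bytes) -> bytes:
    return old + patch  # stand-in for a real binary-diff algorithm

old_binary = b"app-v1"
patch = b"+fix"
# Published (signed) by the mothership alongside the patch:
expected = hashlib.sha256(apply_patch(old_binary, patch)).hexdigest()

# On the client:
new_binary = apply_patch(old_binary, patch)
ok = hashlib.sha256(new_binary).hexdigest() == expected
print(ok)  # True: the patched result matches the manifest hash
```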
> Are there any example projects using this?
I don't know of any, but in my case I was playing with libtorrent's implementation of the DHT. And as far as I know, it's this kind of property in the DHT that allows projects like IPFS to exist.
The cool thing about using the torrent implementation is that the main DHT already has a lot of nodes, so you can find things pretty fast.
libtorrent has support for it, as far as I know. The "research paper" you probably mean is a BEP, the torrent equivalent of an RFC. AFAIK at least Python has something very similar.
There is apparently a Node.js implementation of something that can publish a given torrent to a mutable address, and also retrieve a mutable torrent from a given address. I don't know how well this is implemented in the usual clients, but if you want it in your Docker setup, you might want to talk to libtorrent directly; implementing BEP 46 yourself should not be hard with what the library has to offer. A benefit could be that, depending on how you handle it, you might be able to store the tarballs Docker images seem to be in their unpacked form, and just keep some metadata about the tarball header(s) along with some file offsets. That way you would be relieved of the unnecessary storage burden, and possibly able to use many more of your servers to seed at least part of each image, e.g. only the parts that are not mutated while the software is running: download once, unpack, and only offer to seed those files/pieces that did not get modified in the meantime, without trying to re-download the "broken" data.
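The unpack-but-still-seed idea above is feasible because a tar archive's layout is fully determined by its headers. Python's `tarfile` exposes, per member, where the header starts (`offset`) and where the file data lives (`offset_data`), so a host keeping only the unpacked files plus this metadata knows which archive byte ranges, and hence which torrent pieces, it can still serve. A small self-contained demonstration with a fake two-file "layer":

```python
# Record tar member offsets so unpacked files can be mapped back to
# archive byte ranges (and therefore to torrent pieces).
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, data in [("app/bin", b"x" * 600), ("app/etc", b"y" * 100)]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    for m in tar.getmembers():
        # offset = start of the 512-byte header; offset_data = file bytes
        print(m.name, m.offset, m.offset_data, m.size)
```

Headers and data are 512-byte aligned, so the first member's data starts at byte 512 and the second header lands at 1536; those are exactly the ranges an unpacked-but-seeding host could reconstruct.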
>At Alibaba, the system transfers 2 billion times and distributes 3.4PB of data every month, it has become one of the most important piece of infrastructure at Alibaba. The reliability is up to 99.9999%.
I took a look at their repo, and it turns out there is surprisingly a lot of good stuff in it that never gets much spotlight or attention.
Except that reliability compounds down as a product across components; it doesn't add up like integers such as a beer count.
If I write a shell script with an scp line right now, I can tell you it has 100% reliability. "Up to X% reliability" means "in a very nice and controlled environment, with the devs themselves attacking every production problem, trying for 100% got us up to X%".
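The multiplicative point above, numerically: per-component reliabilities in series multiply, so a chain of six "five nines" components ends up worse than any single one of them. The figures here are illustrative, not Alibaba's:

```python
# Reliability of independent components in series is a product, not a sum.
reliabilities = [0.99999] * 6  # six components at "five nines" each
total = 1.0
for r in reliabilities:
    total *= r
print(total)  # roughly 0.99994: worse than any single component
```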
Docker themselves have discussed making the official registry extensible enough to support BitTorrent pulls, but I don't know if anything ever happened there.
Facebook has been using BitTorrent for deploys for something like 9 years now. They configured the tracker to prefer sharing peers with longer matching subnet prefixes, to keep bandwidth off the backbone as much as possible.
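The subnet-preference policy described above amounts to ranking candidate peers by how long an IP prefix they share with the announcing peer. A minimal sketch with made-up addresses (real trackers would also cap how many same-rack peers they return, which this omits):

```python
# Rank peers by shared address-prefix length with the requester, so a
# tracker can hand back "nearby" peers first and keep traffic off the
# backbone. IPv4 only, illustrative addresses.
import ipaddress

def common_prefix_len(a: str, b: str) -> int:
    xa = int(ipaddress.ip_address(a))
    xb = int(ipaddress.ip_address(b))
    diff = xa ^ xb
    return 32 - diff.bit_length()  # leading bits the two addresses share

def rank_peers(requester: str, peers: list[str]) -> list[str]:
    return sorted(peers, key=lambda p: common_prefix_len(requester, p),
                  reverse=True)

peers = ["10.0.9.7", "10.2.0.5", "10.0.8.20"]
print(rank_peers("10.0.8.55", peers))
# → ['10.0.8.20', '10.0.9.7', '10.2.0.5']: same /26 first, same /14 last
```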
Justin linked the blog post which is probably the best written description. The short of it is that when you upload layers to Quay, it stream calculates the BitTorrent pieces. Private layers are given unique swarms isolated by namespace and peer discovery is protected by a tracker that contains middleware validating JWTs passed in the announce URL of the torrent. A custom client can be used to simplify downloading and importing of images into the local docker CAS.
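The stream-side piece calculation mentioned above is straightforward to sketch: as layer bytes arrive, buffer them into fixed-size pieces and SHA-1 each one, which is exactly what a .torrent's `pieces` field needs. The piece size and chunked input below are toy values; real piece sizes are typically 256 KiB to a few MiB:

```python
# Compute BitTorrent piece hashes while streaming an upload, without
# buffering the whole layer. PIECE_SIZE is tiny here for illustration.
import hashlib

PIECE_SIZE = 4

def piece_hashes(stream, piece_size=PIECE_SIZE):
    buf = b""
    for chunk in stream:            # chunks as they arrive off the wire
        buf += chunk
        while len(buf) >= piece_size:
            piece, buf = buf[:piece_size], buf[piece_size:]
            yield hashlib.sha1(piece).hexdigest()
    if buf:                         # final short piece
        yield hashlib.sha1(buf).hexdigest()

hashes = list(piece_hashes([b"lay", b"er-d", b"ata"]))
print(len(hashes))  # 3 pieces for the 10-byte "layer-data"
```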
Honestly, most organizations don't have sophisticated enough networks for the benefits to outweigh the complexity of p2p orchestration. That's why it's popular at Alibaba, Facebook, and Twitter, while most people are still just using the OCI distribution protocol.
Feel free to contact me (Keybase is in my profile) if interested. I'd love to get more people on the path to p2p, but it's often a solution looking for a problem.
There is a plan to refactor this project into one language (Go).
The project's directory structure has now been restructured to match Go project conventions, and the entire CI process has been built.
The next step is to migrate the 'getter' & 'supernode' from '/src' to the root and rewrite them in Go.
If you only need such a tool, other comments on this submission linked ways to use BitTorrent with Docker, and µTP seems to be reasonably good at punching through the good old style of NAT (where ports are sequential and on the same IP), given something accessible to both sides that can coordinate. It also enables gentle use of your bandwidth, in the sense of playing reasonably well even if you don't have fq_codel or similar in use on the router. With somewhat nice networks it can be pretty gentle on other users of the network, without wasting any part of it. Do consider QoS though: it is preferable to send other, important traffic first, as µTP is good at backing off in these cases. The latency in backing off is just a little too high to be stealthy towards concurrent TCP connections. Packet loss is rare, but lag spikes are still a nuisance.
These are literally two things with the name Dragonfly. I don’t particularly care about whether the shared name is confusing here but it is disingenuous to pretend that identical names are merely similar.
To be fair, I have to admit that DragonFly BSD doesn't show up on the first results page when you search for "dragonfly". What does show up is the malware of the same name.
Side note: this also shows how simplistic Google search really is. No way to search for "Dragonfly /computers/" opposed to "Dragonfly /nature/" with the terms in slashes denoting a concept or domain instead of a syntactic element.
I used to think that too, but at the rate JS developers are creating packages, it wouldn't take very long before all the decent, easy-to-remember names are used up. Surely you don't want Alibaba to name a product in Chinese-sounding English that no one could spell or remember, right?
(I still can't spell many Chinese companies' English names.)
Google manipulates its results in a variety of ways. The end result is that if you and your networked peers aren't already searching for things related to Dragonfly BSD, then you're less likely to find it, because Google will be 'helpfully' biasing your search towards other stuff.