Speeding up Linux disk encryption

(blog.cloudflare.com)

491 points | by jgrahamc 11 days ago

21 comments

  • nullc 11 days ago

    > otherwise, we just forward the encryption request to the slower, generic C-based xts(ecb(aes-generic)) implementation

    This seems like at least something of a bad idea, because that implementation (if my search-fu is correct) is:

    https://github.com/torvalds/linux/blob/master/crypto/aes_gen...

    Which is obviously not constant time, and will leak information through cache/timing sidechannels.

    AES lends itself to a table based implementation which is simple, fairly fast, and-- unfortunately-- not secure if sidechannels matter. Fortunately, AES-NI eliminated most of the motivation for using such implementations on a vast collection of popular desktop hardware which has had AES-NI for quite a few years now.

    For the sake of also being constructive, here is a constant time implementation in naive C for both AES encryption and decryption (the latter being somewhat hard to find, because stream modes only use the former):

    https://github.com/bitcoin-core/ctaes

    (sadly, being single-block-at-a-time and constant time without hardware acceleration has a significant performance cost! ... better could be done for XTS mode, as the above algorithm could run SIMD using SSE2-- it isn't implemented in that implementation because the intended use was CBC mode which can't be parallelized like that)
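
    To make the cache-sidechannel point concrete, here is a rough illustrative sketch (not the ctaes code; real bitsliced implementations compute the S-box as a boolean circuit) of why a table lookup leaks and what a constant-time lookup looks like:

        #include <stdint.h>
        #include <stddef.h>

        /* Leaky: which cache line gets loaded depends on the secret byte x,
           so a co-resident attacker can recover x from cache timing. */
        uint8_t sbox_lookup_leaky(const uint8_t sbox[256], uint8_t x) {
            return sbox[x];
        }

        /* Constant time: touch every table entry and select the wanted one
           arithmetically, so the memory access pattern is independent of x. */
        uint8_t sbox_lookup_ct(const uint8_t sbox[256], uint8_t x) {
            uint8_t r = 0;
            for (size_t i = 0; i < 256; i++) {
                uint32_t diff = (uint32_t)i ^ x;
                uint8_t mask = (uint8_t)((diff - 1) >> 8); /* 0xFF iff i == x */
                r |= mask & sbox[i];
            }
            return r;
        }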

    Can't the kernel AES-NI just be set up to save the FPU registers itself on the stack, if necessary?

    • harikb 11 days ago

      Curious why CF needs to worry about side-channel attacks when all the software running on those machines belongs to / is written by them. They do have a “workers” product with 3rd-party code, but they can easily keep storage servers out of that pool. Typically storage encryption is all about what happens when a machine is physically stolen, a hard disk is discarded after a failure, or other such events beyond network security. Please correct me if I am wrong.

      • fanf2 11 days ago

        I believe you are wrong because of https://workers.cloudflare.com/

        • homero 11 days ago

          Last time I checked, they use many contractors and third parties to deploy mini data centers inside other data centers. They ship them servers to install. Cloudflare doesn't have many private data centers.

          • dependenttypes 11 days ago

            You could measure the timing over the internet.

            • starfallg 11 days ago

              Users aren't interacting directly with the storage layer, so any timing attack via the network is going to be once or twice removed. Can attackers really glean useful information and mount a successful attack in this type of setup?

              • oconnor663 11 days ago

                This is almost certainly true in practice, but it's a big risk, compared to the risk tolerance that we usually engineer into crypto. For comparison, suppose someone was suggesting: "Why not use 80-bit keys instead of 128-bit keys? No one in the real world can brute force 80 bits, and we'll save on storage." Yes, that's true, but it's taking a relatively large risk for relatively little benefit. Hardware will get faster over time, and an extremely high value target might justify an extremely expensive attack, etc etc. We prefer 128-bit keys because then we don't even have to consider those questions. I think timing attacks are similar: Yes they're very difficult in practice, but they raise questions that we'd rather not have to think about. (And which, realistically, no one will ever revisit in the future, as hardware evolves and new APIs are exposed.)

                • necovek 11 days ago

                  I always imagined key size to relate to computation cost and not storage — what algorithm are you referring to?

                  • joshuaissac 11 days ago

                    The point is that you need more bits to store a longer key, but the storage space saved is very little in this case compared to how much easier it is to crack.

                    • necovek 10 days ago

                      Sure, but a difference of 0.1% of storage to go from an 80-bit key to a 1024-bit key for 1 Megabit of data (that's 118 bytes out of 128 KiB), or 0.0001% for 1 Gbit (128 MiB), seems not worth raising as a concern.

                      (I've chosen example numbers just to make calculation trivial)

                      So I can't ever imagine storage size being the driver for choosing the key size, though from the other threads, it seems that there are algorithms that do have a storage overhead that might be related to key sizes.

                • eximius 11 days ago

                  YES.

                  It requires statistical techniques to remove the noise, making the attack harder, but not necessarily infeasible.
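
                  As a toy model of that averaging (illustrative C with hypothetical numbers): with n samples the noise shrinks by ~1/sqrt(n), so even a 1 us secret-dependent difference can be pulled out of jitter two orders of magnitude larger:

                      #include <stdio.h>
                      #include <stdlib.h>

                      /* Toy model: a secret-dependent operation costs 1 us extra,
                         buried in 0-200 us of simulated network jitter. */
                      static double sample_us(double secret_extra_us) {
                          double jitter = 200.0 * rand() / (double)RAND_MAX;
                          return 1000.0 + secret_extra_us + jitter;
                      }

                      int main(void) {
                          const int n = 1000000;
                          double fast = 0.0, slow = 0.0;
                          for (int i = 0; i < n; i++) {
                              fast += sample_us(0.0);
                              slow += sample_us(1.0);   /* 1 us secret-dependent delay */
                          }
                          /* prints roughly 1.0 despite noise ~200x the signal */
                          printf("estimated delay: %.2f us\n", (slow - fast) / n);
                          return 0;
                      }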

                  • mbreese 11 days ago

                    Is that really the case though, when the differences in computation would be measured in microseconds but the network noise would be on the order of milliseconds?

                    • nullc 11 days ago

                      • mbreese 11 days ago

                        I don’t know about that... in the paper the client and server are on the same network. It would be very interesting to repeat this study using faster processors (which will make this signal smaller) and over the public internet (making the noise bigger).

                      • eximius 10 days ago

                        This is why constant time functions are used in cryptographic implementations, even over the network.

                        These are called timing attacks, and they're less common now because professional cryptographers know how to deal with them. But this is very much a textbook example of one.

                • nullc 11 days ago

                  Maybe not relevant for CF-- more likely relevant for wider use of the approach!

                • nshepperd 11 days ago

                  > Which is obviously not constant time, and will leak information through cache/timing sidechannels.

                  This confuses me. Why is it in the kernel if it's not constant time? Isn't that a security risk? (Is there any context where it would be safe to invoke this?)

                  • cbsmith 11 days ago

                    Sure. There's lots of cases where you are controlling for timing attacks elsewhere, or where a timing attack isn't a concern. This can particularly be true for a case where you are writing data to block storage with the idea that a potential attacker won't be accessing it until much later... at which point all timing information would be gone.

                    • nullc 11 days ago

                      Unfortunately cache sidechannels defeat a lot of measures that would otherwise destroy timing data.

                      I agree that there can be some cases where it doesn't matter but it's extremely expensive to be sure that it doesn't matter-- making it usually cheaper, when you consider the total costs, to deploy code that doesn't have the sidechannels.

                      • cbsmith 9 days ago

                        It's pretty cheap if the use case is, "the computer will be turned off before I worry about an attacker".

                  • nemo1618 11 days ago

                    I wish the world could move on from AES. We have ciphers that are nearly as fast without requiring specialized hardware, just generic SIMD. Imagine how fast a ChaCha ASIC could run!

                    There are other options for non-AES FDE too: most infamously Speck (suspected to be compromised by the NSA), but also Adiantum, which is now in Linux 5.0.

                    • dependenttypes 11 days ago

                      > Imagine how fast a ChaCha ASIC could run

                      Not as fast. ChaCha20 uses 32-bit additions, which are fast in software but comparatively expensive and slow in hardware. In addition, protecting ChaCha20 against power-analysis attacks is more difficult than it is for AES.
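
                      For reference, the quarter round at ChaCha20's core - four 32-bit additions per call, each a full carry chain in silicon (sketch in C):

                          #include <stdint.h>

                          #define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

                          /* ChaCha20 quarter round: four 32-bit additions per invocation.
                             Single-cycle on a CPU, but each is a 32-bit carry chain on
                             the critical path of an ASIC. */
                          static void quarter_round(uint32_t *a, uint32_t *b,
                                                    uint32_t *c, uint32_t *d)
                          {
                              *a += *b; *d ^= *a; *d = ROTL32(*d, 16);
                              *c += *d; *b ^= *c; *b = ROTL32(*b, 12);
                              *a += *b; *d ^= *a; *d = ROTL32(*d, 8);
                              *c += *d; *b ^= *c; *b = ROTL32(*b, 7);
                          }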

                      > just generic SIMD

                      Constant-time AES with SSE2 is actually faster than the naive variable-time AES. See https://www.bearssl.org/constanttime.html#aes

                      In addition Chacha20 is not nearly as fast as AES when using the AVX-512 Vector AES instructions.

                      > but also Adiantum

                      Which uses AES (once per sector).

                      • karim 11 days ago

                        Genuinely curious, would you mind explaining why some operation can be fast in software but slow in hardware?

                        • kop316 11 days ago

                          I think the parent comment is saying it is fast in software on a modern CPU, but making that into an ASIC would be either a) slow or b) expensive, due to the 32-bit additions.

                          IIRC (I can't find it right now), when NIST held the contest for AES, candidates had to run on the low-power hardware of the late 90s/early 2000s. This required things like being fast even on an 8-bit microcontroller.

                          • dependenttypes 11 days ago

                            To implement 32-bit + in hardware you need 31 full adders and one half adder, each of which uses multiple gates and depends on the result of the previous adder.

                            Meanwhile, + and bitwise AND tend to take the same number of cycles to execute, and each cycle takes the same amount of time; see https://gmplib.org/~tege/x86-timing.pdf

                            ChaCha20 in hardware would not be any slower than ChaCha20 in software, but it would be slower than other algorithms that do not use 32-bit +.
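
                            A gate-level model of that ripple-carry view (illustrative C; each loop iteration is one full adder, and the carry dependency is what sets the critical path):

                                #include <stdint.h>

                                /* Bit-serial model of a 32-bit ripple-carry adder: every
                                   full adder waits on the carry from the previous one, so
                                   the critical path is 32 carry delays deep. */
                                uint32_t ripple_add32(uint32_t a, uint32_t b)
                                {
                                    uint32_t sum = 0, carry = 0;
                                    for (int i = 0; i < 32; i++) {
                                        uint32_t ai = (a >> i) & 1, bi = (b >> i) & 1;
                                        sum |= (ai ^ bi ^ carry) << i;            /* sum bit */
                                        carry = (ai & bi) | (carry & (ai ^ bi));  /* carry out */
                                    }
                                    return sum;
                                }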

                            • JoshTriplett 11 days ago

                              > To implement 32-bit + in hardware you need 31 full adders and one half adder, each of which uses multiple gates and depends on the result of the previous adder.

                              This is not how CPUs typically implement addition, or other ALU operations. Carry-lookahead adders have existed since the 1950s: https://en.wikipedia.org/wiki/Carry-lookahead_adder
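
                              For contrast, a sketch of the lookahead idea on a 4-bit slice: every carry is a flat two-level AND/OR function of "generate" and "propagate" signals, so no carry waits on the previous full adder:

                                  #include <stdint.h>

                                  /* 4-bit carry-lookahead slice. g = "bit generates a carry",
                                     p = "bit propagates one". Each carry below expands to a
                                     flat expression of the inputs: parallel gates in hardware. */
                                  uint8_t carry_lookahead4(uint8_t a, uint8_t b, uint8_t c0)
                                  {
                                      uint8_t g = a & b, p = a ^ b;
                                      uint8_t g0 = g & 1, g1 = (g >> 1) & 1, g2 = (g >> 2) & 1, g3 = (g >> 3) & 1;
                                      uint8_t p0 = p & 1, p1 = (p >> 1) & 1, p2 = (p >> 2) & 1, p3 = (p >> 3) & 1;
                                      uint8_t c1 = g0 | (p0 & c0);
                                      uint8_t c2 = g1 | (p1 & g0) | (p1 & p0 & c0);
                                      uint8_t c3 = g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & c0);
                                      uint8_t c4 = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
                                                 | (p3 & p2 & p1 & p0 & c0);
                                      return (uint8_t)(c1 | (c2 << 1) | (c3 << 2) | (c4 << 3));
                                  }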

                              • sls 11 days ago

                                Thank you, I love this citation so much.

                                > Charles Babbage recognized the performance penalty imposed by ripple-carry and developed mechanisms for anticipating carriage in his computing engines.

                          • Twirrim 11 days ago

                            > In addition Chacha20 is not nearly as fast as AES when using the AVX-512 Vector AES instructions.

                            Note that Cloudflare opted for Xeon Silver chips, which aren't good at AVX-512 unless running pure AVX-512 work (mixed workloads pay a heavy frequency penalty).

                            • ZeroCool2u 11 days ago

                              And their 10th-gen prod servers switched to AMD, which, as far as I know, has SIMD support but not AVX-512 support specifically.

                              • dr_zoidberg 11 days ago

                                That is correct, Zen 2 doesn't support AVX512 (no AMD chip does).

                          • nullc 11 days ago

                            AES in an ASIC is pretty efficient, I'd expect the difference to flatten if both had good hardware implementations. Not that I wouldn't be happy to see faster chacha20 on systems.

                          • gruez 11 days ago

                            >Which is obviously not constant time, and will leak information through cache/timing sidechannels.

                            What's the threat model here? I can't think of a plausible scenario where side channel attacks can be used to gain unauthorized access to FDE contents.

                          • convivialdingo 11 days ago

                            Did this commercially for 15 years. Always the same problems.

                              We ended up with several solutions, but all of them generally work the same conceptually.

                            First off, separation of I/O layers. System calls into the FS stack should be reading and writing only to memory cache.

                              Middle layer to schedule, synchronize and prioritize process IO. This layer fills the file system cache with cleartext and schedules writes back to disk using queues or journals.

                            You also need a way to convert data without downtime. A simple block or file kernel thread to lock, encrypt, mark and writeback works well.

                              Another beneficial technique is to increase block sizes on disk. User processes usually work in 4K blocks, but writing back blocks at small sizes is expensive. Better to schedule those writebacks later at 64K blocks, so that hopefully the application is done with that particular stretch of data (rough sketch below).

                            Anyway, my 2 pennies.
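
                              A minimal sketch of that 4K-to-64K writeback batching idea (hypothetical names and layout, not code from any real implementation):

                                  #include <stdbool.h>
                                  #include <stdint.h>

                                  #define PAGE_SZ      4096u
                                  #define WB_SZ        65536u             /* 64K writeback extent */
                                  #define PAGES_PER_WB (WB_SZ / PAGE_SZ)  /* 16 pages */

                                  /* Track dirty 4K pages per 64K extent; flush (encrypt +
                                     write) only when the whole extent is dirty, or on
                                     timeout/fsync. */
                                  struct wb_extent {
                                      uint64_t first_page;  /* index of the extent's first page */
                                      uint16_t dirty;       /* one bit per 4K page */
                                  };

                                  static void mark_dirty(struct wb_extent *e, uint64_t page)
                                  {
                                      e->dirty |= (uint16_t)(1u << (page % PAGES_PER_WB));
                                  }

                                  static bool ready_for_writeback(const struct wb_extent *e)
                                  {
                                      return e->dirty == (uint16_t)((1u << PAGES_PER_WB) - 1);
                                  }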

                            • tyingq 11 days ago

                              The blog post reads like this all happened recently, but their linked post to the dm-crypt mailing list is from September 2017[1]. I'm curious if they've interacted with the dm-crypt people more recently.

                                [1] https://www.spinics.net/lists/dm-crypt/msg07516.html

                            • unixhero 11 days ago

                                  Did they reach out to the Linux kernel mailing list, or just the dm-crypt team? I found the answer they received rather arrogant and useless, to be honest.

                              • jlgaddis 11 days ago

                                I'm a huge "fan" of F/OSS but, unfortunately, such condescending answers are all too common in this "community".

                              • beagle3 11 days ago

                                    Ages ago I benchmarked TrueCrypt overhead on my machine at the time (2006, I think?) and it was about 3%; I assumed that was a reasonable and still-applicable number for dm-crypt and modern VeraCrypt too. Guess I got gradually more wrong through those years, according to the git archaeology...

                                • singlow 11 days ago

                                  Also, disk speed in 2006 was probably much slower. Disks have gotten faster at a greater pace than processors during the last 10 years.

                                • ggregoire 11 days ago

                                  > Many companies, however, don't encrypt their disks, because they fear the potential performance penalty caused by encryption overhead.

                                      There is also the overhead of automatically unlocking a remote server during an unattended reboot. Reading the encryption password from a USB stick or fetching it over the internet is a no from me. I think there are solutions involving storing the password in RAM or in an unencrypted partition, but that's the overhead I'm talking about. I wonder how companies deal with that.

                                  • jlgaddis 11 days ago

                                    Red Hat's solution to this problem is NBDE.

                                    > The Network-Bound Disk Encryption (NBDE) allows the user to encrypt root volumes of hard drives on physical and virtual machines without requiring to manually enter a password when systems are restarted. [0]

                                    [0]: https://access.redhat.com/documentation/en-US/Red_Hat_Enterp...

                                    • gruez 11 days ago

                                          Isn't this what TPMs are designed for? I think both Intel and AMD platforms have them built in, using the security processor in the CPU.

                                      • mercora 11 days ago

                                            I use kexec for reboots and store the keys for the disks inside an initramfs, which itself is stored on an encrypted boot partition. On a cold boot these systems boot into a recovery-like OS so I can fix stuff when needed, but mainly so I can do a kexec there (it's not perfect, but what is). If it's possible to avoid this (i.e., I have physical access), I can decrypt the initramfs directly from GRUB using a passphrase entered locally.

                                            A warm reboot using kexec does not need any intervention from my side and boots directly into the already-decrypted initramfs with the key present, and is thus able to mount the encrypted volumes, including the root volume.

                                        • r1ch 11 days ago

                                              Debian offers a dropbear shell in the initramfs which you can use to SSH in and provide keys. I only have a handful of servers, so currently I do this manually on reboot, but it would not be difficult to automate, for example with SSH keys unlocking key material. The downside is that your initramfs and kernel are on an unencrypted disk, so a physical attacker could feasibly backdoor them. I'm sure there's some Secure Boot UEFI / TPM solution here.

                                          • zzzcpan 10 days ago

                                                You are missing an integrity-checking step. You can do it by sending some sort of ephemeral binary over SSH that does integrity checking and requests a key using the resulting hash of the check; don't blindly trust an sshd running from an unencrypted partition. But at the end of the day it's all obscurity and obfuscation; you can't make it provably secure. You can go far: make that binary one-time and randomly generated, obfuscate it, bound its running time, use a TPM and whatnot, but it probably won't matter for pretty much any realistic threat model.

                                        • est31 11 days ago

                                            Wow, those speed improvements are very neat, and an awesome blog post accompanying them. Prior to reading this, I considered Linux disk encryption to add negligible latency, because no HDD/SSD can be fast enough to outpace a CPU equipped with AES-NI, but that view has changed. Two questions: 1. are there any efforts to upstream them? 2. Invoking non-hw-accelerated AES routines sounds quite expensive. Has anyone tried saving the FPU registers only when decryption actually needs them?

                                          • andyjpb 11 days ago

                                              The existing Linux implementation is fine for hardware that does less than 200MB/s, so you should be OK with HDDs.

                                            Cloudflare is optimising for SSDs.

                                            They don't talk about latency: all their crypto benchmarks measure throughput. Near the end they hint at response time for their overall cache system but there's no detailed discussion of latency issues.

                                            The takeaway for me is that I'm OK with what's currently in Linux for the HDDs I use for my backups but I'd probably lose out if I encrypted my main SSD with LUKS.

                                            At the end of the article they say that they're not going to upstream the patches as they are because they've only tested them with this one workload.

                                              I'd also be interested to see a benchmark comparing SW AES against FPU-saving + HW AES. Unfortunately their post does not include stats on how often their proxy falls into the HW or SW implementation. Whatever those numbers are, I'd expect FPU-saving + HW AES to land somewhere in the middle.

                                            • necovek 11 days ago

                                                You can easily achieve more than 200 MB/s with HDDs in RAID, but then the bottleneck might be somewhere else altogether - I think it is an important distinction.

                                                While I applaud their wins, they basically profiled the wrong thing: they established the full overhead with disk speed/latency essentially removed, and only went to the actual production workload at the very end. In the worst case their improvements could have been for naught. They were "lucky" (not really: they were smart), but the profiles did not really guide them - they just optimised the heck out of the system, and they could have gained nothing if the bottleneck had been in a place unaffected by their code analysis.

                                                It's great that Cloudflare allows this kind of engineering to happen (investigative, explorative, and not necessarily RoI-focused), but it's rare to find a company that does.

                                              • jlgaddis 11 days ago

                                                > The takeaway for me is that I'm OK with what's currently in Linux for the HDDs I use for my backups but I'd probably lose out if I encrypted my main SSD with LUKS.

                                                  Yep, when building my latest workstation, I went with a pair of ("regular") SSDs (RAID1) for my data. Later, I decided to add an NVMe drive for the OS, for the additional speed.

                                                  I then went and encrypted all of the drives (via LUKS), however, which basically killed any additional performance I would've gotten from the NVMe drive. I would have been just as well off with only the SSDs and without the NVMe drive.

                                                • pmontra 11 days ago

                                                    I'm using LUKS on my SSDs. I never benchmarked them, but they are fast enough that I don't care. I'm working with VMs right now, creating and destroying them with VirtualBox (automated) - kind of a local EC2. The disks are two Samsung EVOs (a 950 and a 960), 1 TB each. They're in a laptop from 2014 with SATA III at 6 Gb/s, so I guess I'm already capped by the interface and the encryption overhead doesn't matter.

                                                  • sweettea 11 days ago

                                                      They talk about throughput, but in practice their testing regimen is actually measuring latency. dm-crypt's performance ceiling is pretty high if you consider throughput rather than latency, and I would expect the tradeoffs made to decrease latency to reduce maximum throughput at least slightly (although I have not tested their patch).

                                                  • knorker 11 days ago

                                                    At least your first question is answered in the article: Yes

                                                  • vletal 11 days ago

                                                    Has anyone already tried to compile the kernel with these patches for their desktop/laptop with encrypted drive? https://github.com/cloudflare/linux/tree/master/patches

                                                    • asymptotically2 11 days ago

                                                      Yes, I'm running them on kernel 5.5.13 (which came out today)

                                                      • 3r8riacz 10 days ago

                                                        Wow, I'd be so happy if you could share the steps you took to achieve this. Let's say I have a Debian machine, how could I try it out?

                                                  • pmorici 11 days ago

                                                      Interesting. One other thing they don't mention, which I found interesting when doing my own digging into dm-crypt speeds a while back, is that the 'cryptsetup benchmark' command only shows the single-core performance of each of those encryption algorithms. You can verify this by watching the processor load as it performs the benchmark. That led me to find that if you have Linux software RAID, you can get much better performance by having one dm-crypt volume per disk and then software-RAIDing the dm devices, instead of putting a single dm-crypt volume on top of the software RAID. Curious whether that would stack, performance-wise, with what they found here, or whether it just happened to help with the queuing issue they identified.

                                                    • mercora 11 days ago

                                                        I remember that efforts to parallelize the work of dm-crypt where applicable were merged somewhat recently. However, I guess having multiple separate encryption parameters and states (read: disks) leaves more opportunity for parallelization, especially if disk access patterns are not spread wide enough.

                                                    • thereyougo 11 days ago

                                                      >Being desperate we decided to seek support from the Internet and posted our findings to the dm-crypt mailing list

                                                        When I see a company such as Cloudflare being so transparent about their difficulties, and trying to find an answer together with community members, it makes me love them even more.

                                                        No ego, pure professionalism.

                                                      • sneak 11 days ago

                                                        Correspondingly, the response they received reflects just as strongly on the community itself.

                                                        • hyper_reality 11 days ago

                                                          Yep, the response they received was incredibly condescending. The follow-up from Cloudflare remained polite and added a lot more data, and was ignored.

                                                          It's a shame because I've seen this condescending attitude quite frequently in the crypto open source community, and am not really sure how it arises. At least in this case it seems to have had the good outcome of motivating Cloudflare to dig in deeper and solve the problem by themselves.

                                                          • dr_zoidberg 11 days ago

                                                            I'd say it comes from sitting in an ivory tower and giving 0 ducks about who you're talking to. In this case, the person/team asking was capable enough to go off on their own: dig, test, change, and find a fix. It would probably have been easier for them if they'd been given proper directions.

                                                            OTOH, perhaps those who answered from atop the tower had little idea of the mechanisms the author(s?) dug out and changed. So double shame on them, for being condescending and not knowing.

                                                            And it also falls on ourselves to be mindful of this behaviour, which can creep up on us without our noticing. We sometimes think our time is too valuable to spend on some "newbie question" or on this guy who doesn't understand. The past year I've been mentoring grad students in the lab I work at, and I found myself once or twice going down this route. I luckily caught it early, took a deep breath, and gave them the time and explanations they needed. In the end I got a few nice surprises out of two amazing students, who were seeing a bit beyond what was evident.

                                                            • belorn 11 days ago

                                                              I see many large projects addressing this issue by not having a public list where you can talk directly to the developers. Instead there are user lists, maintained by public-relations people, where the expected result for the original mail would be either no response or a polite, nicely written one suggesting the configuration options the user had already tested. That way the developer need not respond unless the user has shown enough proof of work to demonstrate that the issue should be forwarded to a developer - in which case the developer's answer would be written under the assumption that the person/team asking is capable enough to use the directions to dig, test, change, and create a patch, which might later be added to the project.

                                                            • megous 11 days ago

                                                              Full response: https://www.spinics.net/lists/dm-crypt/msg07517.html

                                                              From the PoV of the person who responded, they didn't provide any relevant information that would indicate what platform they run, what speed they expected, or why 800MiB/s seemed slow to them. On many platforms that would be a pretty good result. At first look, it appears they expected the speed of unencrypted storage, because that's what they compared against.

                                                              So the response seems reasonable at first glance to me. They got the answer to their main question (which they omitted from their blog article).

                                                              • gravitas 11 days ago

                                                                I disagree with your assessment.

                                                                > If the numbers disturb you, then this is from lack of understanding on your side.

                                                                This is arrogance on the part of the person replying, who hand-waved away their problem with "you just don't understand" when in fact they (Cloudflare) do/did understand. They then went on to prove that it was due to queuing within the kernel, not the hardware, as this person's flippant reply claimed.

                                                                • megous 11 days ago

                                                                  Cloudflare did not understand at the time. Anyway, I'm not disputing that the reply wasn't very helpful; I just don't see it as unreasonable. I liked the technical parts of the CF writeup overall.

                                                                  • marcinzm 11 days ago

                                                                    >Cloudflare did not understand at the time

                                                                    And neither did the people who responded to them it seems.

                                                                    • why_only_15 10 days ago

                                                                      The next line after that is "You are probably unaware that encryption is a heavy-weight operation". If you're posting to the dm-crypt mailing list about encryption performance, you're probably very aware that encryption is a heavy-weight operation.

                                                                  • tw04 11 days ago

                                                                    I'm not sure why you'd try defending such behavior, unless you spent as little time reading the original email as the person who responded in the thread. They clearly state their in-memory results at the bottom - while they don't give EXACT hardware, it's more than enough to determine there is a major bottleneck in the encryption path. To claim "encryption is heavy" is also a poor response: either the poster has no concept of the overhead of encryption with CPU offload, or they were just too lazy to put together a helpful response. Either way, no response would've been better than that.

                                                                    • megous 11 days ago

                                                                      So how do you get from 4.5GiB/s without encryption vs 850MiB/s with encryption, with no other information, to an understanding of whether there's a bottleneck in some unspecified encryption engine (with unknown throughput and setup latency)?

                                                                      • tw04 11 days ago

                                                                        I dunno - if one of my coworkers had put in the work that OP did, showed me the results, and asked my thoughts, and I didn't have enough information, I'd ask for more. If you're telling me there isn't enough information to assume some basics about the setup (I'd look at that and assume a modern Intel or AMD CPU just based on the throughput), then how did the responder have enough information to dismiss the findings as "from lack of understanding on your side"?

                                                                        You can't have your cake and eat it too.

                                                                  • dependenttypes 11 days ago

                                                                    > or what speed they expect

                                                                    "Without LUKS we are getting 450MB/s write, with LUKS we are twice as low at 225MB.s"

                                                                  • _jal 11 days ago

                                                                    Honestly, read the original message and consider how you would have replied.

                                                                    They showed work in a vacuum: they demonstrated that dm-crypt has costs over raw device access (I would hope so!) on some unknown hardware, and then asked "does this look right to you?"

                                                                    Well, yeah, that looks like what you'd see elsewhere, and by the way, there's a built-in command that also would have told you this.

                                                                    People whine about technical mailing lists, I think because they don't get the context. Think of them as sort of like water coolers at an office that specializes in whatever the list is about. You get a short slice of expert attention in between the experts doing whatever they actually have to get done.

                                                                    Throwing a bunch of data on the floor and saying "hey, is this expected?" is not going to work well. Seriously, what were they expecting?

                                                                    • rrss 11 days ago

                                                                      > Well, yeah, that looks like what you'd see elsewhere, and by the way, there's a built-in command that also would have told you this.

                                                                      It's entirely possible to say both these things in a much more constructive and less condescending tone than was used.

                                                                      • _jal 11 days ago

                                                                        People are so very weirdly sensitive to these things when it is a big company that comes calling. Wonder why that is.

                                                                        Context matters. If you don't take the time to understand the context you're walking into and don't follow local rules, don't be surprised if people are rude back. Not that I even think what they said was all that rude.

                                                                        Do you also think you can slide into a gam3r chat and expect business etiquette?

                                                                        • manigandham 11 days ago

                                                                          I don't think a crypto mailing list is the same as a "gam3r" chat, but it wasn't that rude overall.

                                                                          It was the tone that they "don't understand", when in fact Cloudflare understands crypto and performance very well, and went so far as to dive into kernel code and submit patches that fixed a problem others didn't even realize existed. Even so, I agree this isn't worth such a big discussion.

                                                                          • saagarjha 11 days ago

                                                                            > People are so very weirdly sensitive to these things when it is a big company that comes calling.

                                                                            They're not sensitive when it's a "big company", they're sensitive when they're trying to get work done and they receive a flippant response.

                                                                    • pengaru 11 days ago

                                                                      It's a public mailing list, and I see zero upstream kernel commits from arno@*, so it doesn't appear the response came from someone who actually knows and works on the dm-crypt code.

                                                                      I'm on a number of public mailing lists and there's often a participant who tends to be both available/communicative and callous in their communication style. My assumption is there's a filter effect going on here where some folks who have very poor social abilities wind up at their computer alone all the time and public mailing lists become part of their few remaining human interactions.

                                                                      What I'd take away from this particular dm-crypt interaction isn't that the community is assholes, but that the community is small and the mailing list poorly attended/inactive.

                                                                      In the past I've reported my own dm-crypt problems upstream and it took years to get a bisected regression reverted. Just getting relevant people to pay attention was a challenge.

                                                                      • ezoe 11 days ago

                                                                        And according to Cloudflare, the current dm-crypt implementation has suffered horrible bit rot, having gone unreviewed for 15 years.

                                                                        • crest 11 days ago

                                                                          Especially because my old Haswell-E workstation running FreeBSD has no problem maxing out four encrypted SATA SSDs (>500MB/s each) at the same time with AES-NI. There is no excuse for slow cipher implementations, and the queuing sounds insane: saving and restoring the SSE registers can't be expensive enough to justify all those context switches between kernel threads.

                                                                        • peterwwillis 11 days ago

                                                                          You won't know about their ego or professionalism until you work with them. Posting on a mailing list and making a blog post about it is not proof of either of these, it's brand marketing. They're trumpeting their engineering talent to build good will/nerd rep so people will love their company, spend money there, and apply for jobs. (But what it does show is that they're good at marketing, because it's working)

                                                                          • the_duke 11 days ago

                                                                            Writing informative blog posts and publishing patches to the Linux kernel is the rare kind of marketing I can 100% support.

                                                                            • darkwater 11 days ago

                                                                              It is also marketing for sure, but it is well done marketing.

                                                                              It's like having a high PageRank in Google because you actually write meaningful, useful, well-written blog posts which Google happens (or happened) to value, versus link-factory blog posts.

                                                                              • brnt 11 days ago

                                                                                Sounds like they're also pretty good at making disk encryption faster.

                                                                            • herpderperator 11 days ago

                                                                              > Unlike file system level encryption it encrypts all data on the disk including file metadata and even free space.

                                                                              Anyone have a source on how full-disk (i.e. block-level) encryption encrypts free space? The only way I can imagine this working is by initially overwriting the entire disk with random data, so that you can't distinguish between encrypted data and true "free space", as on a brand-new clean disk. Then, when a file (which was encrypted when written) is deleted (which, by any conventional meaning of 'deleted', means the encrypted data is still present but unallocated, and thus indistinguishable from the random data in step 1), does it get overwritten again with random data?

                                                                              I would argue that overwriting an encrypted file with random data isn't really encrypting free space, but rather just overwriting the data, which already appeared random/encrypted. It is hardly any different from having a cleartext disk and overwriting deleted files with zeros, making them indistinguishable from actual free space.

                                                                              • koala_man 11 days ago

                                                                                The point of encrypting free space is just so you can't say how full the drive is.

                                                                                This way, an attacker can't focus cracking on the fullest disk, match stolen backup disks to hosts based on non-sensitive health metrics, etc.

                                                                                >The only way I can imagine this could happen is by overwriting the entire disk initially with random data

                                                                                Traditionally, for speed, you'd write all zeroes to the encrypted volume (causing the physical volume to appear random), but yes

                                                                                >Then, when a file (which, when written, would have been encrypted) is deleted

                                                                                You'd just leave it. Crucially, you don't TRIM it.

                                                                                >I would argue that overwriting an encrypted file with random data isn't really encrypting free space

                                                                                Yup, that's why it's not done

                                                                                • dlgeek 11 days ago

                                                                                  Debian does the first thing you discussed if you create an encrypted partition in the installer - it writes 0s through the crypto layer to fill the entire disk with encrypted data.

                                                                                • floatboth 11 days ago

                                                                                  > Data encryption at rest is a must-have for any modern Internet company

                                                                                  What is it protecting against — data recovery from discarded old disks? Very stupid criminals breaking into the datacenter, powering servers off and stealing disks?

                                                                                  A breach in some web app would give the attacker access to a live system that has the encrypted disks already mounted…

                                                                                  • eastdakota 11 days ago

                                                                                    As we push further and further to the edge — closer and closer to every Internet user — the risk of a machine just walking away becomes higher and higher. As a result, we aim to build our servers with a similar mindset to how Apple builds iPhones — how can we ensure that secrets remain safe even if someone has physical access to the machines themselves. Ignat's work here is critical to us continuing to build our network to the furthest corners of the Internet. Stay tuned for more posts on how we use Trusted and Secure Boot, TPMs, signed packages, and much more to give us confidence to continue to expand Cloudflare's network.

                                                                                    • netcoyote 11 days ago

                                                                                      > criminals breaking into the datacenter, powering servers off and stealing disks?

                                                                                  Yes, exactly. A company I worked for had a hard drive pulled from a running server in a (third-party) data center that contained their game server binaries. Shortly afterwards a pirate company set up a business running "gray shards", with - no surprise - lower prices.

                                                                                      • mercora 11 days ago

                                                                                    Being able to confidently purge old disks in a secure manner is an upside huge enough to make this statement true, in my opinion. There have been numerous incidents, even involving companies specializing in securely purging disks. If your data is encrypted there is basically nothing to do; you could even outright sell the disks from your DC or something. Just delete the keys/headers from the disk and you are safe.

                                                                                    It's also not possible to get data injected offline into your filesystem without having the keys. Without encryption, someone could just get the disk of the targeted server running somewhere and set their implants or what have you. When the server sees the disk come back up, it looks just like a hiccup or something.

                                                                                        • brobinson 11 days ago

                                                                                      > It's also not possible to get data injected offline into your filesystem without having the keys.

                                                                                      This is, in theory, possible against volumes encrypted using AES-XTS (which seems to be how the majority of FDE systems work), as the ciphertext is indeed malleable.

                                                                                          • mercora 11 days ago

                                                                                        I am no expert on this, but I was thinking it is only possible to inject noise, which would likely corrupt the filesystem in the process. Copying/moving valid blocks should be prevented by XTS as far as I understand it (which might not be that much). I guess using a filesystem with integrity checks helps a bit, although it's still not authenticated or anything.

                                                                                      • zzzcpan 11 days ago

                                                                                    On top of what others have said, it protects you, for example, from the governments of all the countries you have servers in and their law enforcement coming in and taking the servers, extracting keys for MITM, installing malware and backdoors, placing child porn on the servers, etc.; from staff of the various companies in various countries that maintain and deploy the infrastructure, or just have access to it, doing similarly nasty things; and so on.

                                                                                        • enitihas 11 days ago

                                                                                    I think mostly against a breach in datacenter security. Most competent companies already have policies for dealing with discarded old disks. The ones that don't might not be competent enough to use encryption at rest either.

                                                                                    It's all about layers of defense.

                                                                                          • toolslive 11 days ago

                                                                                    Encrypted data at rest allows you to do an instant erase of the device.

                                                                                            • derefr 11 days ago

                                                                                              Yes, the former. You can’t just put SSDs through a degausser!

                                                                                            • dependenttypes 11 days ago

                                                                                              > one can only encrypt the whole disk with a single key

                                                                                              You can still use partitions.

                                                                                              > not all cryptographic algorithms can be used as the block layer doesn't have a high-level overview of the data anymore

                                                                                              I do not really understand this. Which cryptographic algorithms can't be used?

                                                                                              > Most common algorithms require some sort of block chaining to be secure

                                                                                              Nowadays I would say that, of these, only CTR is common, and it does not require chaining.

                                                                                              > Application and file system level encryption are usually the preferred choice for client systems because of the flexibility

                                                                                              One big issue with "Application and file system level encryption" is that you often end up leaking metadata (such as the date edited, file name, file size, etc).

                                                                                              Regardless I think that this is a really nice article. I can't wait to try their patches on my laptop.

                                                                                              • nemo1618 11 days ago

                                                                                                > Which cryptographic algorithms can't be used?

                                                                                                You can't use any algorithm that requires O(n) IVs (e.g. a separate IV per disk sector), because there's nowhere to store the IVs. (Another consequence of this is that you can't store checksums anywhere, so you can't provide integrity checks.)

                                                                                                You can't use CTR mode either, because you'll end up reusing counter values. What do you do when you need to overwrite a block with new data?

                                                                                                XTS mode solves this, at least partially. It's an ECB-like construction with an extra "tweak", derived from the sector number and each block's position within it, that is folded into the encryption. So the same plaintext stored at two different locations on the disk produces different ciphertext.

                                                                                                This isn't perfect, though, because it's still deterministic. If an attacker can see multiple states of the disk, they can tell when you revert a block to a previous state. But it's much better than other modes, especially since the main threat you want to protect against is your laptop getting stolen (in which case the attacker only sees a single state).
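
                                                                                                For the curious, the XTS structure in a nutshell - a sketch assuming a hypothetical 16-byte block_encrypt() primitive (real XTS also defines ciphertext stealing for partial blocks, omitted here):

                                                                                                    #include <stdint.h>
                                                                                                    #include <string.h>

                                                                                                    /* Hypothetical primitive: encrypt one 16-byte block under key k. */
                                                                                                    void block_encrypt(const uint8_t k[16], const uint8_t in[16], uint8_t out[16]);

                                                                                                    /* Multiply the tweak by x in GF(2^128) (little-endian convention),
                                                                                                       as XTS does when stepping to the next 16-byte block in a sector. */
                                                                                                    static void gf_double(uint8_t t[16])
                                                                                                    {
                                                                                                        uint8_t carry = 0;
                                                                                                        for (int i = 0; i < 16; i++) {
                                                                                                            uint8_t next = t[i] >> 7;
                                                                                                            t[i] = (uint8_t)((t[i] << 1) | carry);
                                                                                                            carry = next;
                                                                                                        }
                                                                                                        if (carry)
                                                                                                            t[0] ^= 0x87;  /* reduction polynomial */
                                                                                                    }

                                                                                                    /* One 512-byte sector: the tweak depends only on the sector
                                                                                                       number, which is why the construction is deterministic. */
                                                                                                    void xts_encrypt_sector(const uint8_t k1[16], const uint8_t k2[16],
                                                                                                                            uint64_t sector, uint8_t data[512])
                                                                                                    {
                                                                                                        uint8_t t[16] = {0};
                                                                                                        memcpy(t, &sector, 8);    /* sector number seeds the tweak */
                                                                                                        block_encrypt(k2, t, t);  /* T = E_k2(sector) */
                                                                                                        for (int off = 0; off < 512; off += 16) {
                                                                                                            for (int i = 0; i < 16; i++) data[off + i] ^= t[i];  /* P ^ T */
                                                                                                            block_encrypt(k1, data + off, data + off);
                                                                                                            for (int i = 0; i < 16; i++) data[off + i] ^= t[i];  /* C ^ T */
                                                                                                            gf_double(t);         /* next block: T *= x */
                                                                                                        }
                                                                                                    }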

                                                                                                • dependenttypes 11 days ago

                                                                                                  > You can't use any algorithm that requires O(n) IVs (e.g. a separate IV per disk sector), because there's nowhere to store the IVs

                                                                                                  Certainly you can. You just have to reduce the effective sector size that the file system can use.

                                                                                                  > What do you do when you need to overwrite a block with new data?

                                                                                                  You generate a new random nonce (as per XChacha) and you store it in the sector.

                                                                                                  • Hello71 11 days ago

                                                                                                    > Certainly you can. You just have to reduce the effective sector size that the file system can use.

                                                                                                    get back to me when you find a high-performance (FAT doesn't count) Linux filesystem that supports sector sizes of 496.

                                                                                                    • dependenttypes 11 days ago

                                                                                                      Modern disks use much bigger sectors. See https://en.wikipedia.org/wiki/Advanced_Format

                                                                                                      • jdsully 11 days ago

                                                                                                        The issue is non-power-of-2 sector sizes. The kernel computes sector numbers with shifts, not division (which would be slow).

                                                                                                        • dependenttypes 11 days ago

                                                                                                          I do not see why you would need to use division in that case.

                                                                                                          But even if you did, you could just present the OS with 7 sectors of 512 bytes each rather than a single sector of 4032 bytes (or, if that were not possible, just take the hit).

                                                                                                          • jdsully 11 days ago

                                                                                                            You need division to go from a file offset in bytes to a sector number, hence the need for power-of-2 sizes to make this fast (illustration below). The kernel assumes in multiple places that sectors are a power of 2 for this reason - it doesn't rely on the compiler to optimize the division away (which may not even be possible for some of the compilers it supports).
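
                                                                                                            Concretely (illustrative only):

                                                                                                                #include <stdint.h>

                                                                                                                /* Power-of-2 sector: offset -> sector number is one shift. */
                                                                                                                static inline uint64_t sector_512(uint64_t off) { return off >> 9; }

                                                                                                                /* 496-byte sectors would need a genuine 64-bit division. */
                                                                                                                static inline uint64_t sector_496(uint64_t off) { return off / 496; }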

                                                                                                            If you are talking about using reserved sectors for bookkeeping at the end of the disk, that is possible and commonly done.

                                                                                                • richardwhiuk 11 days ago

                                                                                                  > I do not really understand this. Which cryptographic algorithms can't be used?

                                                                                                  CBC, which is one of the most common stream cipher algorithms.

                                                                                                  It's not clear to me whether GCM would work or not.

                                                                                                  • brandmeyer 11 days ago

                                                                                                    GCM requires somewhere to put the nonces and authentication tags. In principle, you could use a layer of indirection not entirely unlike a page table to store that information. For example, a 64-bit nonce, 64-bit block pointer, and 128-bit authentication tag could pack together in a radix tree for the job, retiring 7 bits of the virtual-to-physical mapping per level for 4 kB blocks.

                                                                                                    Of course, the downside is that now the block layer must tackle all of the write ordering issues that a filesystem does when updating the tree. The block layer would find itself greenspunning up a filesystem inside itself, even if it was a filesystem of only one file.
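
                                                                                                    The arithmetic: each entry packs 8 + 8 + 16 = 32 bytes, so a 4 kB node holds 4096 / 32 = 128 = 2^7 entries, one radix level per 7 bits. A rough sketch of such a node (names invented for illustration):

                                                                                                      /* One per-block metadata entry: 8 + 8 + 16 = 32 bytes. */
                                                                                                      struct gcm_meta {
                                                                                                          unsigned long long nonce;   /* 64-bit nonce         */
                                                                                                          unsigned long long phys;    /* 64-bit block pointer */
                                                                                                          unsigned char      tag[16]; /* 128-bit GCM auth tag */
                                                                                                      };

                                                                                                      /* A 4 kB radix-tree node: 4096 / 32 = 128 = 2^7 entries,
                                                                                                         so each tree level retires 7 bits of the block number. */
                                                                                                      struct meta_node {
                                                                                                          struct gcm_meta entry[128];
                                                                                                      };

                                                                                                      static inline unsigned level_index(unsigned long long vblock, int level)
                                                                                                      {
                                                                                                          return (unsigned)(vblock >> (7 * level)) & 0x7f;
                                                                                                      }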

                                                                                                    • wahern 11 days ago

                                                                                                      The 128-bit tag length, which offers less than 128-bit strength depending on the nonce size, makes GCM and similar AEAD constructions poorly suited for archival storage. If you want to store more data without rekeying you need to reduce the authentication security. GCM makes perfect sense for ephemeral, message-based network traffic. Traditional, separate, keyed MACs still seem preferable for archival storage, especially with tree-based modes--native as with BLAKE3 or KangarooTwelve, or constructed like SHA-3-based ParallelHash.

                                                                                                      • brandmeyer 11 days ago

                                                                                                        The tag's strength doesn't depend on the nonce size in cases where you can use sequential nonces. Longer nonce sizes are valuable only when using randomly allocated nonces and you need to avoid the birthday paradox. A 64-bit counter comfortably outlasts the total write lifetime of modern disks: even with a nonce per 512-byte block, you'd need nearly ten zettabytes of writes (2^64 sectors × 512 bytes ≈ 9.4 × 10^21 bytes) to roll the counter over.

                                                                                                        The profile that authenticated encryption defends against is an attacker who is attempting to feed the victim specially crafted bad blocks. 128-bit tags are good enough that the disk will be completely trashed long before the victim executes something of the attacker's choosing.
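
                                                                                                        A quick back-of-the-envelope check of the counter math above:

                                                                                                          #include <stdio.h>

                                                                                                          int main(void)
                                                                                                          {
                                                                                                              /* Bytes written before a 64-bit per-sector nonce counter wraps. */
                                                                                                              double sectors = 18446744073709551616.0; /* 2^64 */
                                                                                                              double bytes   = sectors * 512.0;        /* 512-byte sectors */
                                                                                                              printf("%.3g bytes (~%.1f zettabytes)\n", bytes, bytes / 1e21);
                                                                                                              return 0;                                /* ~9.44e+21 bytes */
                                                                                                          }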

                                                                                                    • dependenttypes 11 days ago

                                                                                                      Apparently CBC is the cryptsetup default, see https://linux.die.net/man/8/cryptsetup (newer cryptsetup releases default to XTS instead).

                                                                                                      It might not be ideal, but it can still be used. That said, I would not call CBC common at all: pretty much everyone has switched to CTR or some variant of it (such as GCM).

                                                                                                      Also, CBC is not a stream cipher algorithm.

                                                                                                  • steerablesafe 11 days ago

                                                                                                    > One big issue with "Application and file system level encryption" is that you often end up leaking metadata (such as the date edited, file name, file size, etc).

                                                                                                    I wonder how cryfs stacks up in this regard.

                                                                                                    https://www.cryfs.org

                                                                                                    • gok 11 days ago

                                                                                                      That response from the dm-crypt mailing list is unreal.

                                                                                                      • vbezhenar 11 days ago

                                                                                                        Offtopic, but why am I getting two scrollbars on this website? This is weird.

                                                                                                        • tyingq 11 days ago

                                                                                                          There is a scrollable div, the one that leads with:

                                                                                                          grep -A 11 'xts(aes)' /proc/crypto

                                                                                                          Is that what you mean?

                                                                                                          • zzzcpan 11 days ago

                                                                                                            I can confirm: they broke scrolling with the CSS overflow-x setting on #main-body, which for me also produces two scrollbars.

                                                                                                            • vbezhenar 11 days ago

                                                                                                              No, I'm literally getting two scrollbars for the entire page. The first scrollbar works; the second is disabled. The scrollable div is a third scrollbar, but that's OK. It looks like this: https://i.imgur.com/Rs8a7m5.png

                                                                                                            • zackbloom 11 days ago

                                                                                                              Hi, I work on the Cloudflare Blog, we're working on deploying a fix now.

                                                                                                            • gautamcgoel 11 days ago

                                                                                                              Does anyone know what the picture is like on FreeBSD? Is it faster?

                                                                                                              • LinuxBender 11 days ago

                                                                                                                Does CloudFlare plan to get their kernel patches merged upstream?

                                                                                                                • yalooze 11 days ago

                                                                                                                  Second to last paragraph:

                                                                                                                  > We are going to submit this work for inclusion in the main kernel source tree, but most likely not in its current form. Although the results look encouraging we have to remember that Linux is a highly portable operating system: it runs on powerful servers as well as small resource constrained IoT devices and on many other CPU architectures as well. The current version of the patches just optimises disk encryption for a particular workload on a particular architecture, but Linux needs a solution which runs smoothly everywhere.

                                                                                                              • tbrock 11 days ago

                                                                                                                Any chance of this patch making it to the mainline kernel?

                                                                                                                • saagarjha 11 days ago

                                                                                                                  Not this one, specifically, but they've mentioned that they're working on upstreaming some derivative patches.

                                                                                                                • justlexi93 11 days ago

                                                                                                                   Neat. Poorly optimized queues can have a significant impact on performance; doubling disk-encryption throughput with some queue tweaks is pretty significant.

                                                                                                                  • lidHanteyk 11 days ago

                                                                                                                    As usual, Cloudflare wants to pretend that they are community players, but they aren't. If they weren't hypocrites, then they'd submit their patches like [0] for upstream review, but they haven't. (I searched LKML.) I understand the underlying desire, which is to avoid using a queue for CPU-bound work that could be done immediately, but there doesn't appear to have been any serious effort to coordinate with other Linux contributors to figure out a solution to the problem.

                                                                                                                    [0] https://github.com/cloudflare/linux/blob/master/patches/0024...

                                                                                                                    • Majromax 11 days ago

                                                                                                                      The article discusses this in the conclusion:

                                                                                                                      > We are going to submit this work for inclusion in the main kernel source tree, but most likely not in its current form. Although the results look encouraging we have to remember that Linux is a highly portable operating system: it runs on powerful servers as well as small resource constrained IoT devices and on many other CPU architectures as well. The current version of the patches just optimises disk encryption for a particular workload on a particular architecture, but Linux needs a solution which runs smoothly everywhere.

                                                                                                                      That is, they think their current patch is too specialized for their own use-case to warrant inclusion in the mainline kernel without significant adaptation.

                                                                                                                      • marcinzm 11 days ago

                                                                                                                        >but there doesn't appear to have been any serious effort to coordinate with other Linux contributors to figure out a solution to the problem.

                                                                                                                         Well, when they reached out to the community, they were told they were idiots and should f* off, in only somewhat nicer language. Then they were simply ignored.

                                                                                                                         When your community is toxic, don't complain that people don't want to be part of it.

                                                                                                                        • manigandham 11 days ago

                                                                                                                           They are submitting their work, after putting in even more work to make it universally applicable to all Linux users. They did try to engage with the community, which basically told them that they didn't know how fast crypto should be.

                                                                                                                        • thedance 11 days ago

                                                                                                                          All this seems to me a series of very strong arguments for doing the crypto in your application.

                                                                                                                          • saagarjha 11 days ago

                                                                                                                            That would be even slower and more complex.

                                                                                                                            • thedance 11 days ago

                                                                                                                               Why? The slowness in this article comes from architectural brain damage inside the kernel. Doing the encryption and I/O on your own threads, when and where you choose, is the solution. As your performance requirements increase, you become less and less likely to want kernel features.
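
                                                                                                                               A sketch of what that can look like in userspace (a hedged example using OpenSSL's EVP interface with AES-256-XTS; error handling omitted, and the key/file-descriptor setup is assumed):

                                                                                                                                 #include <openssl/evp.h>
                                                                                                                                 #include <string.h>
                                                                                                                                 #include <unistd.h>

                                                                                                                                 /* Encrypt one 4 kB block with AES-256-XTS (64-byte key) and write it
                                                                                                                                    at its block-aligned offset, all on the caller's own thread. */
                                                                                                                                 int write_encrypted(int fd, const unsigned char key[64],
                                                                                                                                                     unsigned long long block, const unsigned char in[4096])
                                                                                                                                 {
                                                                                                                                     unsigned char iv[16] = {0};          /* XTS tweak = block number */
                                                                                                                                     memcpy(iv, &block, sizeof(block));   /* little-endian tweak      */

                                                                                                                                     unsigned char out[4096];
                                                                                                                                     int len = 0;
                                                                                                                                     EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
                                                                                                                                     EVP_EncryptInit_ex(ctx, EVP_aes_256_xts(), NULL, key, iv);
                                                                                                                                     EVP_EncryptUpdate(ctx, out, &len, in, sizeof(out));
                                                                                                                                     EVP_CIPHER_CTX_free(ctx);

                                                                                                                                     return (int)pwrite(fd, out, sizeof(out), (off_t)(block * 4096ULL));
                                                                                                                                 }

                                                                                                                               Each worker thread can run this independently, so parallelism is decided by the application rather than by kernel queueing.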