Speeding up Linux disk encryption

(blog.cloudflare.com)

491 points | by jgrahamc 179 days ago

3 comments

  • nullc 179 days ago

    > otherwise, we just forward the encryption request to the slower, generic C-based xts(ecb(aes-generic)) implementation

    This seems like at least something of a bad idea, because that implementation (if my search-fu is correct) is:

    https://github.com/torvalds/linux/blob/master/crypto/aes_gen...

    Which is obviously not constant time, and will leak information through cache/timing sidechannels.

    AES lends itself to a table-based implementation which is simple, fairly fast, and, unfortunately, not secure if sidechannels matter. Fortunately, AES-NI eliminated most of the motivation for using such implementations on the vast range of popular desktop hardware that has had AES-NI for quite a few years now.
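
    To make the leak concrete, here is a toy model of a first-round table lookup. It is not the kernel code: it assumes 64-byte cache lines and 4-byte table entries (so 16 entries share a line), and the function names are made up for illustration. An observer who learns which cache line was touched, and who knows the plaintext byte, gets the top four bits of the key byte.

```python
# Assumption: 64-byte cache lines and 4-byte table entries, so 16 entries per line.
ENTRIES_PER_LINE = 16


def touched_line(plaintext_byte: int, key_byte: int) -> int:
    """Cache line hit by a first-round lookup at index (plaintext XOR key)."""
    index = plaintext_byte ^ key_byte  # secret-dependent table index
    return index // ENTRIES_PER_LINE


def recover_high_nibble(plaintext_byte: int, observed_line: int) -> int:
    """What an observer learns about the key byte from one observed line."""
    return (plaintext_byte >> 4) ^ observed_line


if __name__ == "__main__":
    key_byte = 0xA7  # hypothetical secret
    line = touched_line(0x3C, key_byte)
    print(hex(recover_high_nibble(0x3C, line)))
```

    A constant-time implementation avoids this by never using secret data as a memory index or a branch condition.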

    For the sake of also being constructive, here is a constant time implementation in naive C for both AES encryption and decryption (the latter being somewhat hard to find, because stream modes only use the former):

    https://github.com/bitcoin-core/ctaes

    (Sadly, being single-block-at-a-time and constant time without hardware acceleration has a significant performance cost! Better could be done for XTS mode, as the above algorithm could run in SIMD using SSE2. That isn't implemented there because the intended use was CBC mode, which can't be parallelized like that.)
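
    A rough sketch of why XTS parallelizes and CBC encryption does not. The block cipher is passed in as a callable, so nothing here is real AES; in actual XTS the first tweak comes from encrypting the sector number with a second key, and ciphertext stealing for partial blocks is left out. The function names are mine, not from ctaes or the kernel.

```python
from typing import Callable, List

BlockCipher = Callable[[bytes], bytes]  # stand-in for a keyed 16-byte block cipher


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def next_tweak(tweak: bytes) -> bytes:
    """Multiply a 16-byte XTS tweak by x in GF(2^128), little-endian byte order."""
    value = int.from_bytes(tweak, "little") << 1
    if value >> 128:  # carry out of the top bit: reduce by the XTS polynomial
        value = (value & ((1 << 128) - 1)) ^ 0x87
    return value.to_bytes(16, "little")


def xts_blocks(encrypt_block: BlockCipher, first_tweak: bytes,
               blocks: List[bytes]) -> List[bytes]:
    # Every tweak is known before any block is encrypted.
    tweaks = [first_tweak]
    for _ in blocks[1:]:
        tweaks.append(next_tweak(tweaks[-1]))
    # Each item depends only on its own plaintext and tweak, so the
    # iterations are independent and could run in SIMD lanes or threads.
    return [xor(encrypt_block(xor(p, t)), t) for p, t in zip(blocks, tweaks)]


def cbc_blocks(encrypt_block: BlockCipher, iv: bytes,
               blocks: List[bytes]) -> List[bytes]:
    out = []
    prev = iv
    for p in blocks:
        # The input to block i needs the ciphertext of block i-1: a serial chain.
        prev = encrypt_block(xor(p, prev))
        out.append(prev)
    return out
```

    Changing the first plaintext block changes every later CBC ciphertext block, while in the XTS sketch only the first output moves.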

    Can't the kernel AES-NI code just be set up to save the FPU registers itself on the stack, if necessary?

    • convivialdingo 178 days ago

      Did this commercially for 15 years. Always the same problems.

      We ended up with several solutions, but conceptually they all work the same way.

      First off, separation of I/O layers. System calls into the FS stack should be reading and writing only to memory cache.

      Middle layer to schedule, synchronize and prioritize process IO. This layer fills the file system cache with cleartext and schedules writes back to disk using queues or journals.

      You also need a way to convert data without downtime. A simple block or file kernel thread to lock, encrypt, mark and writeback works well.
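
      As a rough illustration of the lock, encrypt, mark, writeback loop, here is a toy in-memory model. The class and method names are hypothetical, the "disk" is just a Python list, and crash safety (what happens if power fails between the write and the mark) is ignored.

```python
import threading
from typing import Callable, List


class OnlineConverter:
    """Toy model: one lock and one 'converted' flag per block."""

    def __init__(self, blocks: List[bytes], encrypt: Callable[[bytes], bytes]):
        self.blocks = list(blocks)          # stand-in for on-disk blocks
        self.encrypt = encrypt              # stand-in for the real cipher
        self.converted = [False] * len(self.blocks)
        self.locks = [threading.Lock() for _ in self.blocks]

    def convert_block(self, index: int) -> bool:
        """Convert one block; return False if it was already converted."""
        with self.locks[index]:                            # lock
            if self.converted[index]:
                return False
            ciphertext = self.encrypt(self.blocks[index])  # encrypt
            self.converted[index] = True                   # mark
            self.blocks[index] = ciphertext                # write back
            return True

    def run(self) -> int:
        """Background pass over every block; returns how many were converted."""
        return sum(self.convert_block(i) for i in range(len(self.blocks)))
```

      Because each block is locked only while it is being converted, normal IO to other blocks can continue during the pass.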

      Another beneficial technique is to increase block sizes on disk. User processes usually work in 4K blocks, but writing back blocks at small sizes is expensive. Better to schedule those writebacks later in 64K blocks, so that hopefully the application is done with that particular stretch of data.
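
      A minimal sketch of that coalescing step, assuming 4K pages and 64K-aligned extents. The function name and the return shape are illustrative only; a real implementation would also decide when to flush.

```python
from typing import Dict, Iterable, List, Set, Tuple

PAGE = 4096
EXTENT = 64 * 1024  # assumption: writeback unit is a 64K-aligned extent


def coalesce(dirty_offsets: Iterable[int]) -> List[Tuple[int, List[int]]]:
    """Group dirty byte offsets into 64K extents.

    Returns (extent_start, sorted dirty page starts) pairs, ordered by extent.
    """
    extents: Dict[int, Set[int]] = {}
    for offset in dirty_offsets:
        page_start = offset - offset % PAGE
        extent_start = offset - offset % EXTENT
        # Repeated writes to the same page collapse into one entry.
        extents.setdefault(extent_start, set()).add(page_start)
    return [(start, sorted(pages)) for start, pages in sorted(extents.items())]


if __name__ == "__main__":
    # Five 4K writes become two extent-sized writebacks.
    print(coalesce([0, 4096, 8192, 65536, 4096]))
```

      Deferring the flush gives repeated writes to the same page a chance to merge, which is the "hopefully the application is done" part.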

      Anyway, my 2 pennies.

      • tyingq 179 days ago

        The blog post reads like this all happened recently, but their linked post to the dm-crypt mailing list is from September 2017[1]. I'm curious if they've interacted with the dm-crypt people more recently.

        [1] https://www.spinics.net/lists/dm-crypt/msg07516.html