
Speeding up Linux disk encryption (Cloudflare)

The Cloudflare blog has an article on the company's work to improve the performance of Linux disk encryption. "As we can see the default Linux disk encryption implementation has a significant impact on our cache latency in worst case scenarios, whereas the patched implementation is indistinguishable from not using encryption at all. In other words the improved encryption implementation does not have any impact at all on our cache response speed, so we basically get it for free!" Patches are available, but they are apparently not in any form to go upstream.


Speeding up Linux disk encryption (Cloudflare)

Posted Mar 26, 2020 7:00 UTC (Thu) by jezuch (subscriber, #52988) [Link] (5 responses)

That was fun! Always love a good optimisation story :) The response from the dm-crypt maintainers was not fun at all, though:

> If the numbers disturb you, then this is from lack of understanding on your side. You are probably unaware that encryption is a heavy-weight operation...

Am I being a snowflake for reading this as just a little condescending?

Speeding up Linux disk encryption (Cloudflare)

Posted Mar 26, 2020 8:41 UTC (Thu) by Karellen (subscriber, #67644) [Link] (4 responses)

No, that's how I read it too.

I also thought it was pretty classy of Cloudflare to not mention the responder's name in the blog post, or link directly to the reply. I probably would not have been so kind.

Speeding up Linux disk encryption (Cloudflare)

Posted Mar 26, 2020 9:59 UTC (Thu) by epa (subscriber, #39769) [Link] (3 responses)

On reading that quotation I assumed it was from some peanut-gallery forum participant like myself, rather than from one of the maintainers.

Speeding up Linux disk encryption (Cloudflare)

Posted Mar 26, 2020 11:22 UTC (Thu) by geuder (subscriber, #62854) [Link] (2 responses)

It was said that the person replying has zero commits. (I'm typing on my phone, so I did not run git log --author to verify.) Which brings us to the next problem: when you post as a newcomer to some list that something is not as it should be, you don't get qualified replies, even if you have done a lot of homework before posting. That just happened to me on another list recently. Of course, I know nobody is paid for replying on mailing lists.

Speeding up Linux disk encryption (Cloudflare)

Posted Mar 27, 2020 15:03 UTC (Fri) by hailfinger (subscriber, #76962) [Link] (1 responses)

Agreed.
However, the person replying is the author/maintainer of the cryptsetup FAQ.

Speeding up Linux disk encryption (Cloudflare)

Posted Mar 27, 2020 17:00 UTC (Fri) by geuder (subscriber, #62854) [Link]

Ah, that changes things.

Bufferbloat

Posted Mar 26, 2020 9:33 UTC (Thu) by zdzichu (subscriber, #17118) [Link] (1 responses)

It's interesting to see that most of the problems were caused by various queues in the I/O path. Buffers made sense for HDDs, but are impediments for contemporary storage. The biggest part of the fix was ripping out the queues.
Once again bufferbloat is to blame, and once again storage layers are emulating what networking did earlier.

Bufferbloat

Posted Mar 29, 2020 22:40 UTC (Sun) by bored (subscriber, #125572) [Link]

That is only partially true. In the very early 2000s I had problems with disk arrays we purchased, which cost barely more than a midrange workstation: the Linux block layer/buffering would peg the CPUs in our machines at 100% while delivering only a small fraction of the disks' bandwidth.

Speeding up Linux disk encryption (Cloudflare)

Posted Mar 26, 2020 10:02 UTC (Thu) by epa (subscriber, #39769) [Link]

For benchmarking and debugging, it's probably worth having 'none' as one of the available ciphers, as long as nobody will get confused enough to pick it in production. Nowadays hardware-accelerated crypto is very fast, as the article notes, but it's still nice to eliminate that variable altogether when hunting a slowdown.

Speeding up Linux disk encryption (Cloudflare)

Posted Mar 26, 2020 11:52 UTC (Thu) by geuder (subscriber, #62854) [Link] (5 responses)

That's a great blog post to read. My main thoughts were:

That's what you get with one size fits all. The kernel is supposed to support everything from spinning disks and tiny 32-bit systems up to NVMe and hundreds of gigs of RAM. It would be a miracle if all scenarios performed ideally. Still, it performs well enough in many cases.

There is certainly a lot of bit rot happening in the kernel. The resources I have had at work have always been orders of magnitude smaller than what Cloudflare seems to have. Still, we have identified similar problems, where 10+ year old code just doesn't work very well. With small resources, all you can do is move away (not from Linux, but from a certain fs, for example) or make a really dirty hack that you don't dare to show anybody else, even if it happens to work in your system.

Anyway, with Linux all these options exist, at Cloudflare scale and for the 0.3-person kernel teams. If you don't like it, go out, get/buy/write a kernel and report back when you are happier :) I'm willing to listen, but I won't hold my breath until then.

Speeding up Linux disk encryption (Cloudflare)

Posted Mar 27, 2020 17:12 UTC (Fri) by geuder (subscriber, #62854) [Link] (4 responses)

> you don't dare to show anybody else

Sorry, sloppy wording. I did not mean to suggest violating the GPL here. I just meant writing blog posts or posting it to a kernel list. In your tarball there is always hope that nobody ever looks at it :) Although I, as a developer, prefer complete git history over tarballs...

Speeding up Linux disk encryption (Cloudflare)

Posted Mar 29, 2020 17:05 UTC (Sun) by rillian (subscriber, #11344) [Link]

Don't worry. There's plenty of hope no one will look at your git repo either. :)

Speeding up Linux disk encryption (Cloudflare)

Posted Apr 3, 2020 15:27 UTC (Fri) by paulj (subscriber, #341) [Link] (2 responses)

I think it's pretty clear that for any project kept in git, the full git repo is the preferred form of source for that project. And generally, for any project kept in an SCM where checkouts imply the full history is distributed, a checkout with the full history will be the preferred form of source access for its developers.

Seems pretty obvious, except to those invested in it not being obvious.

Speeding up Linux disk encryption (Cloudflare)

Posted Apr 4, 2020 2:07 UTC (Sat) by pabs (subscriber, #43278) [Link] (1 responses)

That depends on your available bandwidth and the size of the repo. Some people in some places, for some repos, are likely to prefer `git clone --depth=1` plus remote history interactions rather than a full copy of the history.

Speeding up Linux disk encryption (Cloudflare)

Posted Apr 4, 2020 5:23 UTC (Sat) by mathstuf (subscriber, #69389) [Link]

Not to mention reproducible build and offline build folks might like reproducible inputs. Yeah, you can use the commit id, but then you have to clone the whole thing (`git clone --depth 1 $repo $sha` is quite unreliable; you need to clone a refname, but then you need to guess at your `--depth` for how far back you want).
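One commonly used alternative to guessing a `--depth` is to fetch the commit id directly into an empty repository. The following is only a sketch under assumptions: the function names, the URL, and the directory are made up for illustration, and the fetch step succeeds only if the server allows requesting a commit by id (for example via `uploadpack.allowReachableSHA1InWant` or `uploadpack.allowAnySHA1InWant`), which not every host does.

```python
import subprocess


def shallow_fetch_commands(repo_url, sha, workdir):
    """Build the git commands that fetch one commit without its history.

    Hypothetical helper: the names are illustrative, not from any tool.
    """
    return [
        # Start from an empty repository instead of cloning a refname.
        ["git", "init", workdir],
        ["git", "-C", workdir, "remote", "add", "origin", repo_url],
        # Ask the server for exactly this commit, with no parents.
        # Fails if the server refuses to serve commits by id.
        ["git", "-C", workdir, "fetch", "--depth", "1", "origin", sha],
        # Check out what was just fetched, as a detached HEAD.
        ["git", "-C", workdir, "checkout", "--detach", "FETCH_HEAD"],
    ]


def shallow_fetch(repo_url, sha, workdir):
    """Run the commands in order, stopping at the first failure."""
    for cmd in shallow_fetch_commands(repo_url, sha, workdir):
        subprocess.run(cmd, check=True)
```

Calling `shallow_fetch(url, full_sha, "src")` with a full 40-character commit id would leave `src` checked out at that commit, provided the server cooperates. Whether that counts as a reproducible input still depends on the server keeping the commit reachable.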

cpufreq

Posted Mar 26, 2020 14:55 UTC (Thu) by abatters (✭ supporter ✭, #6932) [Link] (1 responses)

Years ago, doing an hours-long copy operation on a dm-crypt spinning disk, I found that setting the cpufreq governor to 'performance' instead of 'ondemand' sped it up significantly (on a CPU without AES-NI). I think that part of the time the CPU was doing encryption and part of the time the CPU was waiting on I/O, and the average CPU load was too low for the ondemand governor to use a higher frequency, even though overall throughput benefited significantly from a higher CPU frequency.
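For anyone who wants to check or switch the governor while testing this, here is a minimal sketch. It assumes the usual cpufreq sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`); the function names and the `sysfs_root` parameter are invented for illustration, and writing the file on a real system needs root and a governor the driver actually offers.

```python
from pathlib import Path

# Assumed location of the per-CPU cpufreq files on a typical Linux system.
DEFAULT_SYSFS_ROOT = "/sys/devices/system/cpu"
GOVERNOR_GLOB = "cpu[0-9]*/cpufreq/scaling_governor"


def read_governors(sysfs_root=DEFAULT_SYSFS_ROOT):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq."""
    governors = {}
    for gov_file in sorted(Path(sysfs_root).glob(GOVERNOR_GLOB)):
        # .../cpu3/cpufreq/scaling_governor -> "cpu3"
        cpu_name = gov_file.parent.parent.name
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def set_governor(governor, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Write the governor name to every CPU's scaling_governor file."""
    for gov_file in Path(sysfs_root).glob(GOVERNOR_GLOB):
        gov_file.write_text(governor + "\n")


if __name__ == "__main__":
    for cpu, gov in read_governors().items():
        print(cpu, gov)
```

Running `set_governor("performance")` before a long copy and comparing throughput against 'ondemand' would be one way to repeat the experiment. The `sysfs_root` argument exists only so the functions can be pointed at a scratch directory instead of the live system.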

cpufreq

Posted Mar 27, 2020 19:20 UTC (Fri) by flussence (guest, #85566) [Link]

The ondemand governor is… bad, to put it mildly. It was slightly less bad with BFS, but with the advent of schedutil there's no contest. I'm not sure why it's kept around; in my experience even the vendor-specific tweaks don't improve it enough for it to be worth using.

Speeding up Linux disk encryption (Cloudflare)

Posted Mar 29, 2020 19:29 UTC (Sun) by floppus (guest, #137245) [Link]

Cloudflare says they've been running this in production, but would it be safe for a random sysadmin to use? Assuming they have a good backup policy in place? :)

The patches look remarkably simple, to my untrained eye. I realize they're a bit kludgy and not a good long-term solution, but can anybody familiar with dm-crypt comment on whether the patches as-is are likely to have any stability / data corruption issues?


Copyright © 2020, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds