LWN: Comments on "The folio pull-request pushback"

The folio pull-request pushback

scientes — Sun, 10 Oct 2021 09:00:48 +0000

Maybe the TLB could be put *after* the cache, by using globally addressable memory and ranged memory permissions for such large slabs of memory, instead of paged memory being the only option (you can still defrag by putting a TLB and MMU *behind* the cache, which, like an IOMMU, has minimal overhead). We have 64 bits of address space now; there is no shortage of address space.

The folio pull-request pushback

immibis — Tue, 28 Sep 2021 16:13:54 +0000

Okay, what if I have 32GB of RAM and I want to grep a bigger project that was using 30GB before? You can't really justify a 300% overhead(!!!) by saying it's okay on one particular workload.

The folio pull-request pushback

SomeOtherGuy — Thu, 23 Sep 2021 08:22:53 +0000

There's a lot of focus on the name here. The reference is often overused, but I think you guys are having a bit of a "bikeshed" moment: the name is the most trivial part of this.

This is quite a common situation, so much so that the object-orientated analogy is common (but slightly altered):

We have a Bird class, later Birds gain the ability to fly, our Penguin subclass can't fly - we have to handle that.

At some point the sane action is to create a FlyingBird subclass and stick your flying birds there and leave Penguin under the now-implied-flightless Bird base class, or (which you can do here as structs don't inherit) have a FlightlessBird name we put Penguins under.

You have a third option: both, you now have FlightlessBird and FlyingBird and the implied-property-of-Bird (whichever it was, flightless or not) is now attached to the name.

That's it, those are your options up to name isomorphism - bickering about FlyingBird vs flightful_bird doesn't change what's going on.
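
The three naming options above can be sketched in code. This is only an illustration of the analogy, written in Python (which, unlike C structs, does have inheritance). The class names come from the comment; the method names are made up for the example.

```python
class Bird:
    """Base type: makes no promise about flight."""

    def __init__(self, name):
        self.name = name


class FlyingBird(Bird):
    """Option 1: birds that fly get their own, explicitly named type."""

    def fly(self):
        return f"{self.name} takes off"


class FlightlessBird(Bird):
    """Option 2: birds that cannot fly get the explicitly named type."""

    def walk(self):
        return f"{self.name} waddles"


# Option 3 uses both names, so plain Bird no longer implies either property.
sparrow = FlyingBird("sparrow")
penguin = FlightlessBird("penguin")
```

Whichever spelling is chosen for the class names, the structure stays the same, which is the point being made.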

The folio pull-request pushback

jwarnica — Mon, 20 Sep 2021 00:37:50 +0000

Kernel will cache everything, but discard LRU when needed.

Once a config file has been read, parsed, stored in working memory, and closed, keeping it cached is a "waste" of cache memory either way.

It's only painful if caching that new file displaces an older cached file which will be read again in some human meaningful time.
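
A toy model may make the displacement point concrete. The sketch below is a strict least-recently-used cache counted in whole files; the kernel's real reclaim logic is more elaborate, and the class and method names here are hypothetical.

```python
from collections import OrderedDict


class LRUFileCache:
    """Toy model of a cache that evicts the least recently used file."""

    def __init__(self, capacity):
        self.capacity = capacity    # maximum number of cached files
        self.files = OrderedDict()  # oldest entry first

    def read(self, path):
        """Return True on a cache hit, False on a miss (the file is then cached)."""
        if path in self.files:
            self.files.move_to_end(path)  # mark as most recently used
            return True
        self.files[path] = True
        if len(self.files) > self.capacity:
            self.files.popitem(last=False)  # evict the least recently used file
        return False
```

With a capacity of two, reading "a", "b", and then a one-off config file evicts "a", so the next read of "a" misses. That miss is the only real cost of having cached the config file.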

The folio pull-request pushback

arnd — Fri, 17 Sep 2021 09:31:04 +0000

I did some measurements a while ago, using linux-5.4 at the time, to see the effect of the different page sizes on memory usage. I tried running linux-5.14-rc1+folio as well, and put the results in a graph:

https://docs.google.com/spreadsheets/d/1Y-eeXEHr8Tud2ul4i...

I did this on a 16-core Arm machine that supports 4KB, 16KB and 64KB pages, giving 4GB to a virtual machine and pinning down part of that memory before building a fixed kernel source tree.

Since the compiler uses mostly contiguous anonymous memory, the effect of the page size is not as strong as it would be for the page cache alone, which wastes more memory. Still, you can definitely see that the 64KB kernel needs around double the RAM compared to a 4KB kernel, and it also suffers more when it does start paging. The 4KB kernel seems to work much better when it's already deep into swap, while the 64KB kernel gets unusable pretty much instantly as soon as it runs out of free pages.

The 16KB page kernel works better than expected -- not only is it almost as fast as the 64KB version when it has enough RAM available, it also copes with out-of-memory conditions almost as well as the 4KB version.

The folio-enabled kernel also seems to have a problem with running into swap, but I don't know if that's a result of something different in the folio patches, or a difference between the old 5.4 kernel and the new 5.14-rc1 version. If I find the time to run another test with 5.14-rc1 without the folio patches, I'll add the data to the graph.

The folio pull-request pushback

marcH — Wed, 15 Sep 2021 18:32:36 +0000

> Yes, folios may be imperfect and not-abstracted enough, but they *are* a step in the right direction and further improvements can be built on top of them.
> I don't see an argument that folios block any of the other work that somebody else might be doing (but, or so it seems, currently isn't).

This is the key point IMHO. If some short-term code change makes some hypothetical longer-term plans more difficult, then the long-term vision should be detailed enough to at least demonstrate how it conflicts with the short-term change. Otherwise it's far beyond vaporware.

The folio pull-request pushback

cpitrat — Tue, 14 Sep 2021 11:05:26 +0000

A computer is made for general-purpose use. Dismissing any use case that needs to open a large number of small files is a weird way to answer the concern. The kernel files are representative of what most devs would work with: small files. Is it representative of all users and all workloads? Certainly not, but it represents an existing scenario, and quadrupling the memory consumed is problematic.

Many files smaller than 64KiB are opened by the system at some point and stored in cache (all configuration files, log files ...). In /etc alone I have 2500 files smaller than 8KiB, and I'm pretty sure most of them have been opened at one point and cached by the kernel. The same goes for files under $HOME/.config and other dotted directories.
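
A count like the one for /etc can be reproduced with a short script. This is a minimal sketch; the function name and the default 8KiB limit are choices made for the example, and it says nothing about which of those files are actually in the page cache.

```python
import os
import stat


def count_small_files(root, limit=8 * 1024):
    """Count regular files under root whose size is below limit bytes."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # unreadable or vanished file; skip it
            if stat.S_ISREG(st.st_mode) and st.st_size < limit:
                count += 1
    return count
```

Calling count_small_files("/etc") gives the number for a particular system; directories that the user cannot read are silently skipped by os.walk.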

The folio pull-request pushback

Wol — Tue, 14 Sep 2021 05:28:22 +0000

> How many small files are actually stored in cache? Conf files are generally read and abandoned. If it is a gain, then major server operators would be quick to code systems to store data in larger data structures; even now, tossing a bunch of small files into a database could save space and time.

Hasn't the kernel just added a "read and abandon" facility? User space may read and abandon, but the kernel caches EVERYTHING by default aiui. The cost of that is measurable, and big.

Cheers,
Wol
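
For reference, user space already has one way to hint that it is done with a file's cached pages: posix_fadvise() with POSIX_FADV_DONTNEED. That is not necessarily the facility meant above, and the hint is advisory only. A minimal sketch, with a hypothetical helper name:

```python
import os


def read_and_drop(path, chunk=1 << 20):
    """Read a whole file, then hint that its cached pages are no longer needed."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = bytearray()
        while True:
            block = os.read(fd, chunk)
            if not block:
                break
            data += block
        if hasattr(os, "posix_fadvise"):
            # Offset 0 with length 0 covers the whole file; the kernel may ignore it.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return bytes(data)
    finally:
        os.close(fd)
```

A program has to opt in to this per file, which fits the point that the default is to cache everything.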

The folio pull-request pushback

dvdeug — Mon, 13 Sep 2021 23:32:51 +0000

How many small files are actually stored in cache? Conf files are generally read and abandoned. If it is a gain, then major server operators would be quick to code systems to store data in larger data structures; even now, tossing a bunch of small files into a database could save space and time.

I'm sure increasing the page size to 64KB would be an improvement in some cases, and hurt others. I just think that storing every single kernel file in cache is a bad comparison. (If nothing else, if grepping the entire kernel is something you're actually doing frequently, you're doing it wrong; efficient searching via precomputed indexes has been around for 50 years, or if we ignore the electronic computer part, for centuries.) There should be some sort of measurement on real situations.

The folio pull-request pushback

dvdeug — Mon, 13 Sep 2021 22:37:06 +0000

If your system does not have a separate header cache, there's a problem you might like to fix. When opening the mailbox up cold, you're going to be paying that cost anyway. Once you've opened up the mailbox, your mail program certainly can and should store the headers in memory (much more reliable than the kernel cache, and maybe even noticeably faster), and possibly should store the messages in memory, to avoid all this cache mess to begin with. (It'll cost less memory than storing them in kernel cache, even with 4KB pages.)
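
A rough sketch of the in-process header cache described here, using Python's standard mailbox module. The function name and the choice of headers are assumptions for illustration; a real mail client would also persist the cache and handle new mail.

```python
import mailbox
from email.parser import BytesHeaderParser


def build_header_cache(maildir_path, wanted=("From", "Subject", "Date")):
    """Map each Maildir key to a dict of selected headers, held in process memory."""
    box = mailbox.Maildir(maildir_path, create=False)
    parser = BytesHeaderParser()
    cache = {}
    for key in box.iterkeys():
        raw = box.get_bytes(key)       # one read per message, paid once
        msg = parser.parsebytes(raw)   # only the header block is parsed
        cache[key] = {name: msg.get(name) for name in wanted}
    return cache
```

Building the cache still reads every message once, which is the cold-open cost mentioned above; after that, listing the mailbox no longer depends on the files staying in the page cache.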

> it's no less handwavy than the original assertion that you'll surely have 5 GB of free RAM.

Kernel programmers are well-paid professionals. They don't have used Dell Optiplexes as their main PC; their programming boxes are almost certainly high-end hardware. $1000 will buy you a computer with 16GB. There's a lot of websites that tell you 8 GB is fine for programming, but generally kernel programmers aren't going to need or want to skimp out on their hardware.

The folio pull-request pushback

Paf — Mon, 13 Sep 2021 21:50:13 +0000

Ok, but think about all kinds of server use cases with many small files. They don’t necessarily have “spare” RAM lying around. They’re specced to the system. This is a truly huge overhead in a lot of real world cases which would cost a lot of real world money.

The folio pull-request pushback

Sesse — Mon, 13 Sep 2021 09:32:47 +0000

Opening old emails? Remember, I'm talking about opening a _mailbox_, reading every single email to check its headers. (No, not all systems can maintain a separate header cache.)

Also, please note that when you dismiss others' (real!) use cases as “handwaving based off contrived situations”, it does not come across as the most friendly way to make your case. At the very least, it's no less handwavy than the original assertion that you'll surely have 5 GB of free RAM.

The folio pull-request pushback

dvdeug — Sun, 12 Sep 2021 22:37:19 +0000

> What if I want to read my email, which is on Maildir, and would like the mailbox to be in cache so that I can open it quickly? Is it reasonable to waste gigabytes of RAM (which I would prefer to use on opening a few extra tabs in my browser…) on 64 kB pages for each email?

Catting a small, cold file to Konsole with time -v yielded "Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.01". Dumping a 60k cold file to Konsole with time -v yielded "Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.26". In both cases, this was off hard drive. I'm going to say that demanding your entire mailbox with thousands of emails to be in cache so you can save hundredths of a second in opening old emails is unreasonable, and demanding the entire kernel for everyone to be optimized for that occurrence is even more unreasonable.

What if the reduced kernel memory and reduced data handling by the kernel for 64KB pages speeds up opening new tabs in your browser and switching between them? I'd like hard numbers on a variety of real-life situations, not handwaves based off contrived situations.

The folio pull-request pushback

Sesse — Sun, 12 Sep 2021 11:40:40 +0000

> if you're someone running grep over the kernel source repeatedly, you've likely got 5GB to spare

I fail to find the logic in this?

But to take another case: What if I want to read my email, which is on Maildir, and would like the mailbox to be in cache so that I can open it quickly? Is it reasonable to waste gigabytes of RAM (which I would prefer to use on opening a few extra tabs in my browser…) on 64 kB pages for each email?

The folio pull-request pushback

dvdeug — Sat, 11 Sep 2021 22:38:01 +0000

> In a previous folio discussion, Al Viro did a quick calculation showing just how much more memory it would take to keep the kernel source in memory with a larger page size. A 64KB size would quadruple the memory used, for example; it is not a small cost.

Is that a realistic cost, though? 64KB would be 4832MB(? *); if you're someone running grep over the kernel source repeatedly, you've likely got 5GB to spare. A kernel compilation is going to be a lot messier, with a lot more temporary files and executables fighting for file cache. I've got about 3000 files open on this system according to lsof, which is far less than the 71000 kernel source files. That's an additional 192MB if each of them adds 64KB, which wouldn't be noticed.

There's a lot more to be studied, but I'm not sure that quick calculation reflects anything that really matters.

* Al Viro wrote "64Kb 4832Mb"; since he starts at 4Kb and everyone else says the base page is 4KB, not 4Kb, I assume that's careless capitalization.
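
The arithmetic behind figures like these is just rounding each file up to whole pages. A minimal sketch, assuming every file is cached in full and ignoring any other per-page overhead (the function name is made up for the example):

```python
def cached_bytes(file_sizes, page_size):
    """Bytes of page cache used if every file is cached in whole pages."""
    total = 0
    for size in file_sizes:
        pages = -(-size // page_size)  # ceiling division
        total += pages * page_size
    return total
```

For 3000 files that each fit in one 64KB page, cached_bytes([1] * 3000, 64 * 1024) gives 3000 * 65536 bytes, i.e. 192000 KiB, which is the roughly 192MB mentioned above. Running the same function over the real file sizes of a source tree at 4KB and 64KB shows how much the larger page size costs for that tree.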

The folio pull-request pushback

smurf — Sat, 11 Sep 2021 19:51:53 +0000

Ugh. Whatever happened to the idea that the proof is in the code?

Yes, folios may be imperfect and not-abstracted enough, but they *are* a step in the right direction and further improvements can be built on top of them.

I don't see an argument that folios block any of the other work that somebody else might be doing (but, or so it seems, currently isn't).

The folio pull-request pushback

flussence — Sat, 11 Sep 2021 18:29:37 +0000

This seems like it's becoming the next BKL... a huge amount of churn, and even though the improvements are measurable and significant, it's going to take some time to convince everyone. Hopefully not as long as that did, though!

The folio pull-request pushback

koverstreet — Sat, 11 Sep 2021 04:23:11 +0000

My recap:

https://lore.kernel.org/linux-fsdevel/20210911012324.6vb7...