The future of the page cache

Posted Jan 26, 2017 19:12 UTC (Thu) by willy (subscriber, #9762)
In reply to: The future of the page cache by neilbrown
Parent article: The future of the page cache

> So the buffer cache doesn't exist, but the API does and it gives access to part of the page cache. I'm sure that is what Matthew meant, but I didn't think it came across very clearly.

I thought I did say that! I haven't rewatched my talk yet, but it's certainly what I intended to say.

What I didn't mention is that there are actually two potential places to look for data in the page cache; one for the file which contains this data block, and one for the block device's address space. So there's the potential for aliasing in the page cache in a way that there wasn't in the block cache. It doesn't usually cause any problems because metadata is in different blocks from file data. It's only problematic if someone's expecting to modify file blocks and disc blocks and have them be coherent with each other's changes.

I've also been thinking about my statement that it's a lot of little page caches. That's correct from the point of view of "doing a lookup", but from the point of view of maintaining the active/inactive list, it's a global cache.

The future of the page cache

Posted Jan 27, 2017 7:05 UTC (Fri) by geuder (subscriber, #62854)

>>> Now, the buffer cache still exists, but its entries point into the page cache....
>> So the buffer cache doesn't exist, but the API does and it gives access to part of the page cache. I'm sure that is what Matthew meant, but I didn't think it came across very clearly.

> I thought I did say that! I haven't rewatched my talk yet, but it's certainly what I intended to say.

Well, if kernel hackers have difficulty finding the right words to explain the subsystems they are working on...

Of course the implementation can be tricky to get optimal in all cases, but userspace should remain unaffected. However, I haven't found it easy to use correctly either. I'm not working with DAX here but with quite the opposite speed-wise: USB2 memory sticks and eMMC...

To "flash" an embedded system I do something like

curl .../img.xz | xz -d | dd of=/dev/sda obs=1M

So when is the writing "done"?

What people typically recommend is calling sync(1). However, sync is documented to work on filesystems, and /dev/sda is not a filesystem. So sync(1) did not sound like the right tool here. Instead I changed my code to

blockdev --flashbufs /dev/sda

It seems to work in practice, but is it the right thing to do?

Interestingly enough, when running the code on a fairly recent Core i7, the dd completes quickly and flashbufs takes 5-7 seconds. Fair enough: buffering happened, and it takes time to get the data out over USB to not-so-great flash memory. But if I run the same code on a modest ARM, the dd seems to hang before completion and then the flashbufs completes in no time. So it looks like on Intel the dd doesn't wait for buffers to be flushed, but on ARM it does. Surely the difference cannot be Intel vs. ARM as such, but maybe some sysctl knob? I haven't compared whether both use the same I/O scheduler.

A third option would be to run dd with oflag=direct. A simple "watch free -w" shows that this does not use buffer entries.
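To compare the two machines, one could dump the writeback-related vm sysctls and the I/O scheduler side by side. The following is only a sketch: the function name show_writeback_settings is made up for illustration, the default device name sda is taken from the example above, and nothing here establishes that any of these knobs actually explains the difference.

```shell
#!/bin/sh
# show_writeback_settings [DEV]  (hypothetical helper, not an existing tool)
# Prints the vm.dirty_* sysctls, which bound how much dirty data may pile up
# before writers are throttled, plus the I/O scheduler of block device DEV.
show_writeback_settings() {
    dev=${1:-sda}
    for knob in dirty_ratio dirty_background_ratio dirty_bytes \
                dirty_background_bytes dirty_expire_centisecs \
                dirty_writeback_centisecs; do
        f=/proc/sys/vm/$knob
        if [ -r "$f" ]; then
            printf 'vm.%s = %s\n' "$knob" "$(cat "$f")"
        else
            printf 'vm.%s = n/a\n' "$knob"
        fi
    done
    sched=/sys/block/$dev/queue/scheduler
    if [ -r "$sched" ]; then
        # The active scheduler is the one shown in [brackets].
        printf 'scheduler(%s) = %s\n' "$dev" "$(cat "$sched")"
    else
        printf 'scheduler(%s) = n/a\n' "$dev"
    fi
}
```

Running show_writeback_settings sda on both the Intel and the ARM box and diffing the output would at least show whether the settings differ. Note that the ratio knobs are percentages of memory, so identical values can still mean very different absolute limits on machines with different amounts of RAM.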

Writing being "done" could mean different things. In the most common case the user wants to remove the USB stick.

I am looking at a different case right now

... | dd of=/dev/sda obs=1M
fsck.ext4 /dev/sda1
resize2fs /dev/sda1

So I don't necessarily need the data to be written to the media (yet), but I need the two block devices /dev/sda and /dev/sda1 to be "cache coherent". Linux does not seem to guarantee this: I see the fsck fail rather frequently on the Intel machine (I have never seen it fail on the ARM machine). In practice, running blockdev --flashbufs /dev/sda before fsck /dev/sda1 seems to do it. But I'm still a bit puzzled why it takes 5-7 seconds on one system and 0 on another.
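The sequence described above could be wrapped up roughly as follows. This is only a sketch: the function name flash_and_check and the RUN dry-run switch are invented for illustration, the blockdev option is spelled --flushbufs (its actual name), and whether the flush is sufficient to make the two devices coherent is exactly the open question here.

```shell
#!/bin/sh
# Usage: ... | flash_and_check /dev/sdX /dev/sdX1
# Hypothetical helper. Dry run by default: each command is only printed.
# Set RUN to the empty string (RUN= flash_and_check ...) to really execute.
flash_and_check() {
    disk=$1
    part=$2
    run=${RUN-echo}
    # 1. Write the image from stdin to the whole-disk device.
    # 2. Flush the buffers of the whole-disk device.
    # 3. Only then touch the partition device.
    # The chain stops at the first non-zero exit status; fsck.ext4 also
    # exits non-zero when it had to correct errors.
    $run dd of="$disk" obs=1M &&
    $run blockdev --flushbufs "$disk" &&
    $run fsck.ext4 "$part" &&
    $run resize2fs "$part"
}
```

In dry-run mode, flash_and_check /dev/sdX /dev/sdX1 just lists the four commands in order, which makes it easy to check the device names before writing to anything.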

The future of the page cache

Posted Jan 27, 2017 7:53 UTC (Fri) by geuder (subscriber, #62854)

--flashbufs should of course have been --flushbufs. Sorry, too early in the morning, after hacking late...

The future of the page cache

Posted Feb 8, 2017 11:36 UTC (Wed) by abelloni (subscriber, #89904)

I think you want to pass oflag=sync to dd.

The future of the page cache

Posted Feb 8, 2017 11:54 UTC (Wed) by geuder (subscriber, #62854)

Yes, I have seen oflag=sync.

However, it seems like a big waste to handle the whole x GiB synchronously, just to know when the last block has been written.

I have not measured though. And I have not tested whether it solves the /dev/sdX -- /dev/sdX1 coherence issue.
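A possible middle ground, assuming GNU dd: conv=fsync makes dd call fsync() on the output once before it exits, instead of making every write synchronous as oflag=sync does. A minimal sketch follows; the function name write_image is hypothetical, and this has not been measured or checked against the /dev/sdX -- /dev/sdX1 coherence issue.

```shell
#!/bin/sh
# Usage: ... | write_image DEST
# Hypothetical helper: copies stdin to DEST through the page cache and
# returns only after dd has fsync()ed DEST once at the end of the copy.
# Assumes GNU dd (conv=fsync and the 1M size suffix).
write_image() {
    dest=$1
    dd of="$dest" obs=1M conv=fsync
}
```

When write_image returns successfully, the data has been pushed out of the page cache for that output file or device, so the dd exit itself becomes the "done" signal.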

For the more common case of USB removal (not my problem here) I have found

udisksctl power-off

in the meantime. At least for everybody using systemd.