Two new ways to read a file quickly

Posted Mar 7, 2020 10:37 UTC (Sat) by adobriyan (subscriber, #30858)
In reply to: Two new ways to read a file quickly by walters
Parent article: Two new ways to read a file quickly

> Isn't the cost of that mostly locking and rendering them in the kernel, not the system calls?

Yes. Naming /proc and /sys as an example is quite funny.

On my system the numbers are:
a) calling a non-existent system call -- 600 cycles (as measured by rdtsc)
b) calling umask(0) -- 670 cycles (a system call which does something)
c) open, read, close /proc/version -- ~6500 cycles (static /proc file which goes through seq_file interface)
d) open, read, close /proc/loadavg -- ~7580 cycles (dynamic /proc file)

Sysfs generally generates deeper hierarchies and (correct me if I'm wrong) revalidates dentries on each lookup.
But sysfs files have simple contents.

I feel that readfile is not important. Stracing all those stat-collecting top-like utilities shows that they are living in the stone age.

5516 openat(AT_FDCWD, "/proc/uptime", O_RDONLY) = 5
5516 lseek(5, 0, SEEK_SET) = 0
5516 read(5, "4082.55 63567.25\n", 8191) = 17

and then it seeks back to offset 0 again.

5516 openat(AT_FDCWD, "/proc", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 6
5516 fstat(6, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
5516 getdents64(6, /* 273 entries */, 32768) = 6856
5516 openat(AT_FDCWD___WHAT___, "/proc/1/stat", O_RDONLY) = 7

Reading a file into a Vec<u8> in Rust does multiple system calls by default, because it doubles the buffer for the vector contents and starts with a small value like 16(?).

Why even help userspace developers?


Two new ways to read a file quickly

Posted Mar 7, 2020 11:34 UTC (Sat) by mpr22 (subscriber, #60784)

> Why even help userspace developers?

"Some userspace developers are gormless" is not an argument against providing better tools for userspace developers who are not gormless.

(Whether any particular tool is actually a better tool is a separate conversation.)

Two new ways to read a file quickly

Posted Mar 7, 2020 12:05 UTC (Sat) by adobriyan (subscriber, #30858)

It is not, but it can be very demoralizing.

If top(1) started using pread() on /proc/uptime, it would do 1 system call, just like with readfile().
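The pread() approach could look roughly like the sketch below. It is an illustration only: the helper name `sample` and the 4096-byte buffer are assumptions, and it relies on `FileExt::read_at` from the Rust standard library, which wraps pread(2) on Unix. The descriptor is opened once and each later sample costs a single system call, with no lseek() and no reopen.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt; // read_at() is the std wrapper around pread(2)

/// Re-read a small file from offset 0 through an already open descriptor.
/// Returns the number of bytes placed in `buf`. (`sample` is a made-up name.)
fn sample(file: &File, buf: &mut [u8]) -> io::Result<usize> {
    // One pread(2) per sample: the file offset is never touched.
    file.read_at(buf, 0)
}

fn main() -> io::Result<()> {
    // Assumes a Linux system with /proc mounted.
    let file = File::open("/proc/uptime")?;
    let mut buf = [0u8; 4096];
    for _ in 0..3 {
        let n = sample(&file, &mut buf)?;
        print!("{}", String::from_utf8_lossy(&buf[..n]));
    }
    Ok(())
}
```

Running this under strace should show one openat() followed by one pread64() per loop iteration.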

The best way to speed up reading lots of /proc and /sys files by a factor of 5 is to deliver statistics to userspace without VFS involvement, but this battle is probably lost.

Two new ways to read a file quickly

Posted Mar 7, 2020 14:38 UTC (Sat) by burntsushi (guest, #110124)

> Reading a file into a Vec<u8> in Rust does multiple system calls by default, because it doubles the buffer for the vector contents and starts with a small value like 16(?).

No it doesn't: https://doc.rust-lang.org/src/std/fs.rs.html#266-274

$ cat src/main.rs
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = std::fs::read("/tmp/some-big-file")?;
    println!("{}", data.len());
    Ok(())
}

$ cargo build --release

$ strace ./target/release/rustfile
openat(AT_FDCWD, "/tmp/some-big-file", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
statx(0, NULL, AT_STATX_SYNC_AS_STAT, STATX_ALL, NULL) = -1 EFAULT (Bad address)
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=941088098, ...}) = 0
mmap(NULL, 941088768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9f65d43000
read(3, "Presented by IM Pictures\nProduce"..., 941088099) = 941088098
read(3, "", 1) = 0
close(3)

Two new ways to read a file quickly

Posted Mar 7, 2020 16:44 UTC (Sat) by adobriyan (subscriber, #30858)

>No it doesn't:

Most files in /proc report st_size=0.

openat(AT_FDCWD, "/proc/stat", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
statx(0, NULL, AT_STATX_SYNC_AS_STAT, STATX_ALL, NULL) = -1 EFAULT (Bad address)
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS, stx_attributes=0, stx_mode=S_IFREG|0444, stx_size=0, ...}) = 0
read(3, "cpu 2591925 76 66642 2680980 29", 32) = 32
read(3, "58 0 925 0 0 0\ncpu0 161817 6 407", 32) = 32
read(3, "8 167469 97 0 429 0 0 0\ncpu1 158"..., 64) = 64
read(3, "cpu2 158993 7 4186 170648 115 0 "..., 128) = 128
read(3, "60993 10 3957 168784 202 0 7 0 0"..., 256) = 256
read(3, "9 163063 143 0 60 0 0 0\ncpu12 16"..., 512) = 512
read(3, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 1024) = 1024
read(3, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 2048) = 821
read(3, "", 1227) = 0
close(3) = 0
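The growth pattern in this trace can be modelled with a small counting sketch. This is not the standard library's implementation, only a simplified model of "start small and double when full"; `CountingReader` and `read_doubling` are hypothetical names, and an in-memory slice stands in for the /proc file. With 2869 bytes of input (the total returned by the reads above), a 32-byte starting buffer needs nine read() calls, while a 4096-byte starting buffer needs two.

```rust
use std::io::{self, Read};

/// Wraps a reader and counts how many times read() is called on it.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Read to EOF starting with `initial` bytes of buffer, doubling the buffer
/// whenever it is full. Returns the data and the number of read() calls.
fn read_doubling<R: Read>(inner: R, initial: usize) -> io::Result<(Vec<u8>, usize)> {
    assert!(initial > 0, "initial buffer size must be non-zero");
    let mut reader = CountingReader { inner, calls: 0 };
    let mut buf = vec![0u8; initial];
    let mut len = 0;
    loop {
        if len == buf.len() {
            let doubled = buf.len() * 2;
            buf.resize(doubled, 0);
        }
        // Read into whatever spare room the buffer currently has.
        let n = reader.read(&mut buf[len..])?;
        if n == 0 {
            break; // EOF
        }
        len += n;
    }
    buf.truncate(len);
    Ok((buf, reader.calls))
}

fn main() -> io::Result<()> {
    // 2869 bytes: the sum of the non-empty reads in the /proc/stat trace.
    let data = vec![b'x'; 2869];
    let (_, small) = read_doubling(&data[..], 32)?;
    let (_, page) = read_doubling(&data[..], 4096)?;
    println!("initial 32: {} reads, initial 4096: {} reads", small, page);
    Ok(())
}
```

A real /proc file may return short reads before EOF, so actual counts can differ from this model.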

Two new ways to read a file quickly

Posted Mar 7, 2020 23:02 UTC (Sat) by josh (subscriber, #17465)

It doesn't seem reasonable to put all the blame on userspace when the kernel gives it misleading information.

I wonder if we could enhance statx to have a STATX_SIZE_HINT flag? With that flag, statx could return a new attribute indicating that the file has an unspecified size and should be read in a single read call, along with a hint for a buffer size that's *probably* big enough. That would substantially reduce the number of read calls.

(Also, for future reference, the first statx call is Rust probing to see if the kernel supports statx, and it only happens for the first statx in the program. Likewise, the fcntl checks if the kernel respects O_CLOEXEC, and that only happens on the first open.)

Two new ways to read a file quickly

Posted Mar 9, 2020 14:10 UTC (Mon) by walters (subscriber, #7396)

Maybe a simpler change would be for the kernel to cache the *last* size of a file in /proc and report it? Though it might trigger bugs in userspace apps that wouldn't be prepared for the file growing between stat() and read().

Two new ways to read a file quickly

Posted Mar 9, 2020 15:29 UTC (Mon) by mathstuf (subscriber, #69389)

I think any app or library expecting stat to have trustworthy information at read time without some safety net is already a disaster waiting to happen (especially if they're in the /proc hierarchy; the file could fail to *open* after stat returns because the process exited in the meantime).

Two new ways to read a file quickly

Posted Mar 11, 2020 11:51 UTC (Wed) by adobriyan (subscriber, #30858)

The best way to read /proc is to read at least PAGE_SIZE bytes at once and to interpret a short read as EOF for small files like /proc/uptime or /proc/*/statm, which are bounded in size. Reads of unbounded files (/proc/*/maps) should be sized as (PAGE_SIZE << n), which is how seq_file grows its own buffer:

m->buf = seq_buf_alloc(m->size <<= 1);

Most sysfs files are 4KB at most, but binary attributes can be arbitrarily sized.
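A userspace reader following this advice might look like the sketch below. The function names and the hard-coded 4096-byte PAGE_SIZE are assumptions (real code would query the page size), and treating a short read as EOF is only appropriate for the small bounded files described above.

```rust
use std::fs::File;
use std::io::{self, Read};

// Assumption: 4 KiB pages. Real code would ask sysconf(_SC_PAGESIZE).
const PAGE_SIZE: usize = 4096;

/// Bounded files (/proc/uptime, /proc/*/statm): one read() of PAGE_SIZE
/// bytes, with a short read interpreted as EOF.
fn read_bounded<R: Read>(mut src: R) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; PAGE_SIZE];
    let n = src.read(&mut buf)?;
    buf.truncate(n);
    Ok(buf)
}

/// Unbounded files (/proc/*/maps): read sizes of PAGE_SIZE << n, doubling
/// after every read that fills its chunk, until read() returns 0.
fn read_unbounded<R: Read>(mut src: R) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut chunk = PAGE_SIZE;
    loop {
        let start = out.len();
        out.resize(start + chunk, 0); // make room for the next chunk
        let n = src.read(&mut out[start..])?;
        out.truncate(start + n); // drop the unused tail
        if n == 0 {
            return Ok(out); // EOF
        }
        if n == chunk {
            chunk <<= 1; // chunk was filled: ask for twice as much next time
        }
    }
}

fn main() -> io::Result<()> {
    // Assumes a Linux system with /proc mounted.
    let uptime = read_bounded(File::open("/proc/uptime")?)?;
    let maps = read_unbounded(File::open("/proc/self/maps")?)?;
    println!("uptime: {} bytes, maps: {} bytes", uptime.len(), maps.len());
    Ok(())
}
```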

