Two new ways to read a file quickly
Posted Mar 7, 2020 10:37 UTC (Sat) by adobriyan (subscriber, #30858)
In reply to: Two new ways to read a file quickly by walters
Parent article: Two new ways to read a file quickly
Yes. Naming /proc and /sys as an example is quite funny.
On my system the numbers are:
a) calling a non-existent system call -- 600 cycles (as measured by rdtsc)
b) calling umask(0) -- 670 cycles (a system call which does something)
c) open, read, close /proc/version -- ~6500 cycles (a static /proc file which goes through the seq_file interface)
d) open, read, close /proc/loadavg -- ~7580 cycles (a dynamic /proc file)
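A rough way to repeat measurements (c) and (d) is sketched below. It is only an approximation of the method above: it reports average wall-clock time through std::time::Instant rather than rdtsc cycles, and the helper name time_open_read_close and the iteration count are made up for illustration.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::time::{Duration, Instant};

/// Average wall-clock cost of one open + read + close cycle on `path`.
/// Returns the byte count of the last read and the average duration.
/// Hypothetical helper: measures time, not rdtsc cycles.
fn time_open_read_close(path: &str, iters: u32) -> io::Result<(usize, Duration)> {
    let iters = iters.max(1); // avoid dividing by zero below
    let mut buf = [0u8; 4096];
    let mut last = 0;
    let start = Instant::now();
    for _ in 0..iters {
        let mut f = File::open(path)?;
        last = f.read(&mut buf)?;
        // `f` is dropped at the end of the iteration, which closes the descriptor.
    }
    Ok((last, start.elapsed() / iters))
}

fn main() {
    for path in ["/proc/version", "/proc/loadavg"].iter() {
        match time_open_read_close(path, 10_000) {
            Ok((n, avg)) => println!("{}: {} bytes, {:?} per iteration", path, n, avg),
            Err(e) => eprintln!("{}: {}", path, e),
        }
    }
}
```

The absolute numbers depend on the machine and kernel, so only the ratio between the two files is worth comparing.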
Sysfs generally generates deeper hierarchies and (correct me if I'm wrong) revalidates dentries on each lookup.
But sysfs files have simple contents.
I feel that readfile is not important. Stracing all those stat-collecting top-like utilities shows that they are living in the stone age.
5516 openat(AT_FDCWD, "/proc/uptime", O_RDONLY) = 5
5516 lseek(5, 0, SEEK_SET) = 0
5516 read(5, "4082.55 63567.25\n", 8191) = 17
and then it reseeks to offset 0 again.
5516 openat(AT_FDCWD, "/proc", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 6
5516 fstat(6, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
5516 getdents64(6, /* 273 entries */, 32768) = 6856
5516 openat(AT_FDCWD___WHAT___, "/proc/1/stat", O_RDONLY) = 7
Reading a file into a Vec<u8> by default in Rust does multiple system calls because it doubles the buffer for the vector contents and starts with a small value like 16(?).
Why even help userspace developers?
Posted Mar 7, 2020 11:34 UTC (Sat) by mpr22 (subscriber, #60784)
"Some userspace developers are gormless" is not an argument against providing better tools for userspace developers who are not gormless.
(Whether any particular tool is actually a better tool is a separate conversation.)
Posted Mar 7, 2020 12:05 UTC (Sat) by adobriyan (subscriber, #30858)
If top(1) started using pread() on /proc/uptime, it would make one system call per sample, just like with readfile().
The best way to speed up reading lots of /proc and /sys files by a factor of 5 is to upload statistics without VFS involvement.
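The pread approach for /proc/uptime could look roughly like the sketch below. It assumes a Unix target, where std's FileExt::read_at is a positioned read (pread), and it assumes the file contents fit in the buffer in one read. The Sampler type is a name invented for this example, not something top(1) contains.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt; // read_at() is a positioned read on Unix

/// Hypothetical helper: keeps one descriptor open and re-reads the file
/// from offset 0 on every sample.
struct Sampler {
    file: File,
    buf: Vec<u8>,
}

impl Sampler {
    fn open(path: &str, cap: usize) -> io::Result<Self> {
        Ok(Sampler { file: File::open(path)?, buf: vec![0u8; cap.max(1)] })
    }

    /// One positioned read per sample: no lseek and no reopen.
    fn sample(&mut self) -> io::Result<&[u8]> {
        let n = self.file.read_at(&mut self.buf, 0)?;
        Ok(&self.buf[..n])
    }
}

fn main() {
    match Sampler::open("/proc/uptime", 8191) {
        Ok(mut s) => {
            for _ in 0..3 {
                match s.sample() {
                    Ok(bytes) => print!("{}", String::from_utf8_lossy(bytes)),
                    Err(e) => eprintln!("read failed: {}", e),
                }
            }
        }
        Err(e) => eprintln!("open failed: {}", e),
    }
}
```

Compared with the openat/lseek/read sequence in the strace output earlier in the thread, each sample here costs a single system call after the initial open.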
Posted Mar 7, 2020 14:38 UTC (Sat) by burntsushi (guest, #110124)
No it doesn't: https://doc.rust-lang.org/src/std/fs.rs.html#266-274
$ cat src/main.rs
$ cargo build --release
$ strace ./target/release/rustfile
Posted Mar 7, 2020 16:44 UTC (Sat) by adobriyan (subscriber, #30858)
Most files in /proc report st_size=0.
openat(AT_FDCWD, "/proc/stat", O_RDONLY|O_CLOEXEC) = 3
Posted Mar 7, 2020 23:02 UTC (Sat) by josh (subscriber, #17465)
I wonder if we could enhance statx to have a STATX_SIZE_HINT flag? With that flag, statx could return a new attribute indicating that the file has an unspecified size and should be read in a single read call, along with a hint for a buffer size that's *probably* big enough. That would substantially reduce the number of read calls.
(Also, for future reference, the first statx call is Rust probing to see if the kernel supports statx, and it only happens for the first statx in the program. Likewise, the fcntl checks if the kernel respects O_CLOEXEC, and that only happens on the first open.)
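STATX_SIZE_HINT is only a proposal in this comment, so nothing can call it today. As a userspace approximation of the same idea, the sketch below lets the caller supply the size hint: it allocates the buffer up front, reads until EOF, and falls back to doubling only when the hint turns out to be too small. The function name read_with_hint and the 8192-byte hint in main are assumptions made for illustration.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical helper: read a whole file whose st_size may be 0, starting
/// from a caller-supplied size hint. Returns the contents and the number of
/// read calls issued (File is unbuffered, so each is one system call).
fn read_with_hint(path: &Path, hint: usize) -> io::Result<(Vec<u8>, usize)> {
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; hint.max(1)];
    let mut len = 0;
    let mut calls = 0;
    loop {
        if len == buf.len() {
            // The hint was too small: fall back to doubling.
            let new_len = buf.len() * 2;
            buf.resize(new_len, 0);
        }
        let n = match file.read(&mut buf[len..]) {
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        calls += 1;
        if n == 0 {
            break; // EOF
        }
        len += n;
    }
    buf.truncate(len);
    Ok((buf, calls))
}

fn main() {
    match read_with_hint(Path::new("/proc/stat"), 8192) {
        Ok((data, calls)) => println!("{} bytes in {} read calls", data.len(), calls),
        Err(e) => eprintln!("read failed: {}", e),
    }
}
```

With a hint that is large enough, the loop needs one read for the data and one more to observe EOF; a kernel-provided hint would simply replace the guessed constant.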
Posted Mar 9, 2020 14:10 UTC (Mon) by walters (subscriber, #7396)
Posted Mar 9, 2020 15:29 UTC (Mon) by mathstuf (subscriber, #69389)
Posted Mar 11, 2020 11:51 UTC (Wed) by adobriyan (subscriber, #30858)
m->buf = seq_buf_alloc(m->size <<= 1);
Most sysfs files are 4KB at most, but binary attributes can be arbitrarily sized.
but this battle is probably lost.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = std::fs::read("/tmp/some-big-file")?;
    println!("{}", data.len());
    Ok(())
}
openat(AT_FDCWD, "/tmp/some-big-file", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
statx(0, NULL, AT_STATX_SYNC_AS_STAT, STATX_ALL, NULL) = -1 EFAULT (Bad address)
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=941088098, ...}) = 0
mmap(NULL, 941088768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9f65d43000
read(3, "Presented by IM Pictures\nProduce"..., 941088099) = 941088098
read(3, "", 1) = 0
close(3)
fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
statx(0, NULL, AT_STATX_SYNC_AS_STAT, STATX_ALL, NULL) = -1 EFAULT (Bad address)
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS, stx_attributes=0, stx_mode=S_IFREG|0444, stx_size=0, ...}) = 0
read(3, "cpu 2591925 76 66642 2680980 29", 32) = 32
read(3, "58 0 925 0 0 0\ncpu0 161817 6 407", 32) = 32
read(3, "8 167469 97 0 429 0 0 0\ncpu1 158"..., 64) = 64
read(3, "cpu2 158993 7 4186 170648 115 0 "..., 128) = 128
read(3, "60993 10 3957 168784 202 0 7 0 0"..., 256) = 256
read(3, "9 163063 143 0 60 0 0 0\ncpu12 16"..., 512) = 512
read(3, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 1024) = 1024
read(3, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 2048) = 821
read(3, "", 1227) = 0
close(3) = 0
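The read sizes in the trace above (32, 32, 64, 128, ... and a final zero-length read) are what a buffer that starts at 32 bytes and doubles whenever it fills up would produce. The sketch below models that pattern to count the calls; it is a simplified model inferred from the trace, not the standard library's actual code, and it assumes every read returns as many bytes as requested while data remains. The name count_reads is invented for this example.

```rust
/// Hypothetical model: count the read calls needed to drain `total` bytes
/// when the buffer starts at `initial` bytes and doubles each time it fills.
/// Assumes no short reads while data remains.
fn count_reads(total: usize, initial: usize) -> usize {
    let mut cap = initial.max(1);
    let mut len = 0;
    let mut calls = 0;
    loop {
        if len == cap {
            cap *= 2; // buffer is full: double it before the next read
        }
        let got = (cap - len).min(total - len);
        calls += 1;
        if got == 0 {
            break; // the zero-length read that signals EOF
        }
        len += got;
    }
    calls
}

fn main() {
    // The data reads in the trace add up to 2869 bytes.
    println!("start at 32:   {} reads", count_reads(2869, 32));
    println!("start at 4096: {} reads", count_reads(2869, 4096));
}
```

For the 2869 bytes in the trace the model gives 9 reads starting from 32 bytes, matching the nine read lines above, and 2 reads starting from a 4096-byte buffer.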