LWN: Comments on "Storage testing" https://lwn.net/Articles/789538/ This is a special feed containing comments posted to the individual LWN article titled "Storage testing". en-us Sun, 14 Sep 2025 00:14:26 +0000 Sun, 14 Sep 2025 00:14:26 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Storage testing https://lwn.net/Articles/789771/ https://lwn.net/Articles/789771/ tytso <div class="FormattedComment"> <font class="QuotedText">&gt; Given that the kernel's upstream testing is totally inadequate currently, there's an opportunity here :-).</font><br> <p> I assume you're talking about kselftests, the self-test infrastructure included as part of the kernel sources? It has a very different purpose compared to other test suites. One of its goals is to keep the total run time of all of the tests to 20 (twenty) minutes. That's not a lot of time, even if a single file system were to hog all of it.<br> <p> Before I send a pull request to Linus, I run about 20 VM-hours' worth of regression tests for ext4. They are sharded across multiple VMs which get launched in parallel, but that kind of testing is simply not going to be accepted into kselftests. Which is fine; it has a very different goal, which is to serve as a quick "smoke test" for the kernel. You'd have to ask the kselftest maintainer if they were interested in taking it in a broader direction, and adding some of the support that would be needed to allow tests to be sharded across multiple VMs.
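Conceptually, the sharding described here just partitions the test list across a pool of VMs that run in parallel. A minimal round-robin sketch (hypothetical illustration only, not the actual kvm-xfstests logic; test names and VM count are made up):

```python
# Hypothetical sketch of sharding a test list across parallel VMs.
# Illustration only; not the actual kvm-xfstests implementation.

def shard_tests(tests, num_shards):
    """Partition a list of test names round-robin across num_shards VMs."""
    shards = [[] for _ in range(num_shards)]
    for i, test in enumerate(tests):
        shards[i % num_shards].append(test)
    return shards

tests = ["generic/001", "generic/002", "ext4/023", "ext4/301", "generic/402"]
for n, shard in enumerate(shard_tests(tests, 3)):
    # Each shard would be handed to a separate VM, e.g. on its kernel
    # command line, with all the VMs launched in parallel.
    print(f"vm{n}: {' '.join(shard)}")
```

The hard part, of course, is not the partitioning but launching the VMs and collecting the results afterwards.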
One of the things that xfstests has, but which kselftests does not, is the option of writing the test results in XML, using the JUnit format:<br> <p> &lt;testcase classname="xfstests.global" name="generic/402" time="1"&gt; <br> &lt;skipped message="no kernel support for y2038 sysfs switch"/&gt; <br> &lt;/testcase&gt;<br> <p> This allows me to reuse some JUnit Python libraries to coalesce multiple XML report files and generate statistics like this:<br> <p> ext4/4k: 464 tests, 43 skipped, 4307 seconds<br> ext4/1k: 473 tests, 1 failures, 55 skipped, 4820 seconds<br> Failures: generic/383<br> ext4/ext3: 525 tests, 1 failures, 108 skipped, 6619 seconds<br> Failures: ext4/023<br> ext4/encrypt: 533 tests, 125 skipped, 2612 seconds<br> ext4/nojournal: 522 tests, 2 failures, 104 skipped, 3814 seconds<br> Failures: ext4/301 generic/530<br> ext4/ext3conv: 463 tests, 1 failures, 43 skipped, 4045 seconds<br> Failures: generic/347<br> ext4/adv: 469 tests, 3 failures, 50 skipped, 4055 seconds<br> Failures: ext4/032 generic/399 generic/477<br> ext4/dioread_nolock: 463 tests, 43 skipped, 4234 seconds<br> ext4/data_journal: 510 tests, 4 failures, 92 skipped, 4688 seconds<br> Failures: generic/051 generic/371 generic/475 generic/537<br> ext4/bigalloc: 445 tests, 50 skipped, 4824 seconds<br> ext4/bigalloc_1k: 458 tests, 1 failures, 64 skipped, 3753 seconds<br> Failures: generic/383<br> Totals: 4548 tests, 777 skipped, 13 failures, 0 errors, 47529s<br> <p> This is an example of something that one test infrastructure has and that other test harnesses don't. So while it would be "nice" to have one test framework that rules them all, that can work on multiple different cloud hosting services, there are lots of things that are "nice". I'd like to have enough money to fly around in a Private Jet so I didn't have to deal with the TSA; and then I'd like to be rich enough to buy carbon offsets so I wouldn't feel guilty flying around all over the place in a Private Jet.
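The coalescing step described above, tallying JUnit XML reports into per-configuration totals, can be sketched with just the Python standard library (a hypothetical illustration; the real scripts reuse existing JUnit libraries, and the sample report below is made up to match the format quoted above):

```python
# Hypothetical sketch: tally one xfstests JUnit XML report.
# Illustration only; the real tooling reuses existing JUnit Python libraries.
import xml.etree.ElementTree as ET

def summarize(report_xml):
    """Return (tests, failures, skipped, seconds) for one JUnit report."""
    root = ET.fromstring(report_xml)
    cases = root.findall(".//testcase")
    tests = len(cases)
    failures = sum(1 for c in cases if c.find("failure") is not None)
    skipped = sum(1 for c in cases if c.find("skipped") is not None)
    # The xfstests reports above use whole seconds in the time attribute.
    seconds = sum(int(c.get("time", "0")) for c in cases)
    return tests, failures, skipped, seconds

# A made-up two-case report in the format shown above.
report = """<testsuite name="xfstests">
  <testcase classname="xfstests.global" name="generic/402" time="1">
    <skipped message="no kernel support for y2038 sysfs switch"/>
  </testcase>
  <testcase classname="xfstests.global" name="generic/383" time="7">
    <failure message="output mismatch"/>
  </testcase>
</testsuite>"""

print(summarize(report))  # (2, 1, 1, 8)
```

Producing the grand totals across configurations is then just a matter of summing these tuples over all the report files.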
Unfortunately, I don't have the resources to do that any time in the foreseeable future. :-)<br> <p> The question is who is going to fund that effort, and does it really make sense to ask developers to stop writing tests until this magical unicorn test harness exists? And then we have to ask which test infrastructure we use as the base, and whether the maintainers of that test infrastructure are interested in adding all of the hairy support for all of these features that we might "want" to have.<br> <p> </div> Wed, 29 May 2019 23:03:57 +0000 Storage testing https://lwn.net/Articles/789679/ https://lwn.net/Articles/789679/ unixbhaskar <div class="FormattedComment"> Right, Ted. I was wondering how that Docker fellow came into the picture for this kind of "low level" stuff, which needs lots of low-level access and tweaking. <br> <p> Any container mechanism is certainly not built with this kind of stuff in mind, nor does it help greatly with the purpose. <br> <p> </div> Wed, 29 May 2019 10:37:08 +0000 Storage testing https://lwn.net/Articles/789665/ https://lwn.net/Articles/789665/ roc <div class="FormattedComment"> <font class="QuotedText">&gt; The intel 915 tests fundamentally require direct access to hardware --- it's not something you can emulate, and in fact you need a hardware library of different 915 video cards / chipsets in order to really do a good job testing the device driver.</font><br> <p> Sure, but the software and services infrastructure for writing tests, running tests, processing test results, and reporting those results could be shared with lots of other kinds of tests.<br> <p> <font class="QuotedText">&gt; And networking tests often require a pair of machines with different types of networks between the two.</font><br> <p> Ditto.
(And presumably networking tests for everything above OSI level 2 can be virtualized to run on a single machine, even a single kernel.)<br> <p> <font class="QuotedText">&gt; Good luck trying to unify it all.</font><br> <p> Unifying things after they're up and running is hard. Sharing stuff that already exists instead of creating new infrastructure is easier. Given that the kernel's upstream testing is totally inadequate currently, there's an opportunity here :-).<br> <p> <font class="QuotedText">&gt; Finally, note that there are different types of testing infrastructure. There is the test suite itself, and how you run the test suite in a turnkey environment.</font><br> <p> Yes, I can see that you want drivers for spawning test kernels on different clouds. They can exist in a world where other testing infrastructure is shared.<br> <p> Surely you want a world where someone can run all the different kernel test suites (that don't require special hardware) against some chosen kernel version, on the cloud of their choice. That would demand a shared "spawn test kernel" interface that the different suites all use, wouldn't it?<br> </div> Wed, 29 May 2019 03:52:18 +0000 Storage testing https://lwn.net/Articles/789630/ https://lwn.net/Articles/789630/ tytso <div class="FormattedComment"> Docker adds no real value, and in fact, to the extent that it tries to insulate the container from the real hardware, it gets in the way. Yes, you can run in privileged mode, but at that point, docker is no more than a fancy tar.gz plus a chroot.<br> <p> Setting up all of the qemu configuration to run the storage testing is where the real value lies.
For example, this is what "kvm-xfstests smoke" runs:<br> <p> ionice -n 5 /usr/bin/kvm -boot order=c -net none -machine type=pc,accel=kvm:tcg -cpu host -drive file=/usr/projects/xfstests-bld/build-64/kvm-xfstests/test-appliance/root_fs.img,if=virtio,snapshot=on -drive file=/dev/lambda/test-4k,cache=none,if=virtio,format=raw,aio=native -drive file=/dev/lambda/scratch,cache=none,if=virtio,format=raw,aio=native -drive file=/dev/lambda/test-1k,cache=none,if=virtio,format=raw,aio=native -drive file=/dev/lambda/scratch2,cache=none,if=virtio,format=raw,aio=native -drive file=/dev/lambda/scratch3,cache=none,if=virtio,format=raw,aio=native -drive file=/dev/lambda/results,cache=none,if=virtio,format=raw,aio=native -drive file=/tmp/xfstests-cli.VpexZxAo/kvm-vdh,if=virtio,format=raw -vga none -nographic -smp 2 -m 2048 -fsdev local,id=v_tmp,path=/tmp/kvm-xfstests-tytso,security_model=none -device virtio-9p-pci,fsdev=v_tmp,mount_tag=v_tmp -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -serial mon:stdio -monitor telnet:localhost:7498,server,nowait -serial telnet:localhost:7500,server,nowait -serial telnet:localhost:7501,server,nowait -serial telnet:localhost:7502,server,nowait -gdb tcp:localhost:7499 --kernel /build/ext4-64/arch/x86/boot/bzImage --append quiet loglevel=0 root=/dev/vda console=ttyS0,115200 fstestcfg=4k fstestset=-g,quick fstestopt=aex fstesttz=America/New_York fstesttyp=ext4 fstestapi=1.5 <br> <p> ... 
and where the root_fs.img can be downloaded here[1], or built from scratch using the directions here[2].<br> <p> [1] <a href="https://www.kernel.org/pub/linux/kernel/people/tytso/kvm-xfstests/root_fs.img.amd64">https://www.kernel.org/pub/linux/kernel/people/tytso/kvm-...</a><br> [2] <a href="https://github.com/tytso/xfstests-bld/blob/master/Documentation/building-xfstests.md">https://github.com/tytso/xfstests-bld/blob/master/Documen...</a><br> <p> Changing kernel versions is just a matter of pointing qemu at the kernel in the build tree: --kernel /build/ext4-64/arch/x86/boot/bzImage<br> <p> And why bother with a docker image when you can just use a qemu image file: -drive file=/usr/projects/xfstests-bld/build-64/kvm-xfstests/test-appliance/root_fs.img,if=virtio,snapshot=on<br> <p> Docker doesn't help you with any of the rest, which includes setting up storage devices that should be used for testing. So why use Docker?<br> </div> Tue, 28 May 2019 21:37:51 +0000 Storage testing https://lwn.net/Articles/789626/ https://lwn.net/Articles/789626/ tytso <div class="FormattedComment"> This is why I've extended kvm-xfstests and gce-xfstests to run blktests as well as xfstests. :-)<br> <p> Seriously, while it would be nice if there were One True kernel testing system, it's just not going to happen. And that's because there is a huge amount of special infrastructure which is needed. File system testing requires using block devices which you can reformat; it also requires being able to run the same set of tests against different file systems and different configurations (options to mkfs, mount options, etc.)
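To make that concrete: driving bare xfstests by hand means supplying those reformattable devices yourself through its local.config file. A minimal sketch (the device paths and mount points here are placeholders, not what kvm-xfstests actually generates):

```shell
# Sketch of a minimal xfstests local.config; paths are placeholders.
# kvm-xfstests sets up the equivalent of this, plus the devices
# themselves, automatically for each configuration it tests.
FSTYP=ext4                  # file system type under test
TEST_DEV=/dev/vdb           # formatted once, reused across tests
TEST_DIR=/mnt/test
SCRATCH_DEV=/dev/vdc        # freely reformatted by individual tests
SCRATCH_MNT=/mnt/scratch
# then run, from the xfstests directory:  ./check -g quick
```

Each file system or configuration to be tested needs its own set of such devices and mkfs/mount options, which is exactly the multiplication of setups being described here.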
<br> <p> The intel 915 tests fundamentally require direct access to hardware --- it's not something you can emulate, and in fact you need a hardware library of different 915 video cards / chipsets in order to really do a good job testing the device driver.<br> <p> And networking tests often require a pair of machines with different types of networks between the two.<br> <p> Good luck trying to unify it all.<br> <p> Finally, note that there are different types of testing infrastructure. There is the test suite itself, and there is how you run the test suite in a turnkey environment. That test runner tends to be highly specific to the test environment. For example, gce-xfstests will pluck a kernel out of the developer's build tree and upload it to Google Cloud Storage. It will then start a VM and pass a URL to the kernel in the VM metadata. The VM will then kexec to the kernel under test, start the tests, and when they are complete, e-mail the results to the developer. From the developer's perspective, it's dead simple: "make; gce-xfstests smoke". Done. <br> <p> And if you're using a set of test hardware shared across 100 software engineers, with a custom hardware reservation system (both IBM and Google had such a setup, and naturally they were completely different), you'll need a different way of running tests. And that is always going to be very specific to the software team's test environment as set up by their test engineers, which is why there will always be a large number of test harnesses.<br> <p> </div> Tue, 28 May 2019 21:24:28 +0000 Storage testing https://lwn.net/Articles/789625/ https://lwn.net/Articles/789625/ jhoblitt <div class="FormattedComment"> I would think that a runc/docker image would solve all of the environmental issues except for which kernel modules need to be available. Docker in a qemu vagrant box should be a complete solution. That makes it easy to change kernel versions.
<br> </div> Tue, 28 May 2019 21:08:00 +0000 Storage testing https://lwn.net/Articles/789624/ https://lwn.net/Articles/789624/ roc <div class="FormattedComment"> This stuff sounds good, but there is clearly a need to unify test harnesses and configuration across the kernel. If every kernel component has its own way of writing and running tests, that's going to be a disaster. Of course some components need special infrastructure, but modularity and extension points work for test harnesses just like other software.<br> </div> Tue, 28 May 2019 21:04:52 +0000