Coverage deduplication for KCOV
From: Alexander Potapenko <glider-AT-google.com>
To: glider-AT-google.com
Subject: [PATCH v4 00/10] Coverage deduplication for KCOV
Date: Thu, 31 Jul 2025 13:51:29 +0200
Message-ID: <20250731115139.3035888-1-glider@google.com>
Cc: quic_jiangenj-AT-quicinc.com, linux-kernel-AT-vger.kernel.org, kasan-dev-AT-googlegroups.com, Aleksandr Nogikh <nogikh-AT-google.com>, Andrey Konovalov <andreyknvl-AT-gmail.com>, Borislav Petkov <bp-AT-alien8.de>, Dave Hansen <dave.hansen-AT-linux.intel.com>, Dmitry Vyukov <dvyukov-AT-google.com>, Ingo Molnar <mingo-AT-redhat.com>, Josh Poimboeuf <jpoimboe-AT-kernel.org>, Marco Elver <elver-AT-google.com>, Peter Zijlstra <peterz-AT-infradead.org>, Thomas Gleixner <tglx-AT-linutronix.de>
As mentioned by Joey Jiao in [1], the current kcov implementation may suffer from certain syscalls overflowing the userspace coverage buffer. According to our measurements, among 24 syzkaller instances running upstream Linux, 5 had a coverage overflow in at least 50% of executed programs. The median percentage of programs with overflows across those 24 instances was 8.8%.

One way to mitigate this problem is to increase the size of the kcov buffer in the userspace application using kcov. However, syzkaller already uses 4 MB for each of up to 32 threads to store the coverage, and increasing it further would reduce the number of executors that fit on a single machine. Replaying the same program with an increased buffer size in the case of overflow would also reduce the number of possible executions.

When executing a single system call, excessive coverage usually stems from loops, which write the same PCs into the output buffer repeatedly. Although collecting precise traces may give us some insights into e.g. the number of loop iterations and the branches being taken, the fuzzing engine does not take advantage of these signals, and recording only unique PCs should be just as practical.

In [1] Joey Jiao suggested using a hash table to deduplicate the coverage signal on the kernel side. While being universally applicable to all types of data collected by kcov, this approach adds another layer of complexity, requiring dynamically growing the map. Another problem is potential hash collisions, which can also lead to lost coverage. Hash maps are also unavoidably sparse, which potentially requires more memory.

The approach proposed in this patch series is to assign a unique (and almost sequential) ID to each of the coverage callbacks in the kernel.
Then we carve out a fixed-size bitmap from the userspace trace buffer, and on every callback invocation we:

- obtain the callback_ID;
- if bitmap[callback_ID] is not set, append the PC to the trace buffer;
- set bitmap[callback_ID] to true.

LLVM's -fsanitize-coverage=trace-pc-guard [2] replaces every coverage callback in the kernel with a call to __sanitizer_cov_trace_pc_guard(&guard_variable), where guard_variable is a 4-byte global that is unique for the callsite. This allows us to lazily allocate sequential numbers just for the callbacks that have actually been executed, using a lock-free algorithm.

This patch series implements a new config, CONFIG_KCOV_UNIQUE, which utilizes the mentioned LLVM flag for coverage instrumentation. In addition to the existing coverage collection modes, it introduces ioctl(KCOV_UNIQUE_ENABLE), which splits the existing kcov buffer into the bitmap and the trace part for a particular fuzzing session, and collects only unique coverage in the trace buffer.

To reset the coverage between runs, it is now necessary to set trace[0] to 0 AND clear the entire bitmap. This is still considered feasible, based on the experimental results below. Alternatively, users can call ioctl(KCOV_RESET_TRACE) to reset the coverage. This makes it possible to make the coverage buffer read-only, so that it is harder to corrupt.

The current design does not address the deduplication of KCOV_TRACE_CMP comparisons; however, the number of kcov overflows during the hints collection process is insignificant compared to the overflows of KCOV_TRACE_PC.

In addition to the mentioned changes, this patch series implements a selftest in tools/testing/selftests/kcov/kcov_test. This helps exercise the different coverage collection modes.

Experimental results.

We've conducted an experiment running syz-testbed [3] on 10 syzkaller instances for 24 hours.
Out of those 10 instances, 5 enabled the kcov_deduplicate flag from [4], which makes use of the KCOV_UNIQUE_ENABLE ioctl, reserving 4096 words (262144 bits) for the bitmap and leaving 520192 words for the trace collection.

Below are the average stats from the runs.

kcov_deduplicate=false:
  corpus: 52176
  coverage: 302658
  cover overflows: 225288
  comps overflows: 491
  exec total: 1417829
  max signal: 318894

kcov_deduplicate=true:
  corpus: 52581
  coverage: 304344
  cover overflows: 986
  comps overflows: 626
  exec total: 1484841
  max signal: 322455

[1] https://lore.kernel.org/linux-arm-kernel/20250114-kcov-v1...
[2] https://clang.llvm.org/docs/SanitizerCoverage.html
[3] https://github.com/google/syzkaller/tree/master/tools/syz...
[4] https://github.com/ramosian-glider/syzkaller/tree/kcov_de...

v4:
 - fix a compilation error detected by the kernel test robot <lkp@intel.com>
 - add CONFIG_KCOV_UNIQUE=y as a prerequisite for kcov_test
 - Reviewed-by: tags

v3:
 - drop "kcov: apply clang-format to kcov code"
 - address reviewers' comments
 - merge __sancov_guards into .bss
 - proper testing of unique coverage in kcov_test
 - fix a warning detected by the kernel test robot <lkp@intel.com>
 - better comments

v2:
 - assorted cleanups (enum kcov_mode, docs)
 - address reviewers' comments
 - drop R_X86_64_REX_GOTPCRELX support
 - implement ioctl(KCOV_RESET_TRACE)
 - add a userspace selftest

Alexander Potapenko (10):
  x86: kcov: disable instrumentation of arch/x86/kernel/tsc.c
  kcov: elaborate on using the shared buffer
  kcov: factor out struct kcov_state
  mm/kasan: define __asan_before_dynamic_init, __asan_after_dynamic_init
  kcov: x86: introduce CONFIG_KCOV_UNIQUE
  kcov: add trace and trace_size to struct kcov_state
  kcov: add ioctl(KCOV_UNIQUE_ENABLE)
  kcov: add ioctl(KCOV_RESET_TRACE)
  kcov: selftests: add kcov_test
  kcov: use enum kcov_mode in kcov_mode_enabled()

 Documentation/dev-tools/kcov.rst         | 124 +++++++
 MAINTAINERS                              |   3 +
 arch/x86/Kconfig                         |   1 +
 arch/x86/kernel/Makefile                 |   2 +
 arch/x86/kernel/vmlinux.lds.S            |   1 +
 include/asm-generic/vmlinux.lds.h        |  13 +-
 include/linux/kcov.h                     |   6 +-
 include/linux/kcov_types.h               |  37 +++
 include/linux/sched.h                    |  13 +-
 include/uapi/linux/kcov.h                |   2 +
 kernel/kcov.c                            | 368 ++++++++++++++-------
 lib/Kconfig.debug                        |  26 ++
 mm/kasan/generic.c                       |  24 ++
 mm/kasan/kasan.h                         |   2 +
 scripts/Makefile.kcov                    |   7 +
 scripts/module.lds.S                     |  35 ++
 tools/objtool/check.c                    |   3 +-
 tools/testing/selftests/kcov/Makefile    |   6 +
 tools/testing/selftests/kcov/config      |   2 +
 tools/testing/selftests/kcov/kcov_test.c | 401 +++++++++++++++++++++++
 20 files changed, 949 insertions(+), 127 deletions(-)
 create mode 100644 include/linux/kcov_types.h
 create mode 100644 tools/testing/selftests/kcov/Makefile
 create mode 100644 tools/testing/selftests/kcov/config
 create mode 100644 tools/testing/selftests/kcov/kcov_test.c

-- 
2.50.1.552.g942d659e1b-goog
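
For illustration only (this is not code from the patches): a minimal userspace sketch in C11 of the deduplication scheme described in the cover letter above. It assumes a single buffer of 64-bit words split into a bitmap followed by a trace area, with trace[0] holding the number of recorded PCs, and a 4-byte guard per callsite where 0 means "no ID assigned yet". All names here (struct dedup_state, guard_id(), dedup_trace_pc(), dedup_reset(), next_guard_id) are hypothetical; the real implementation lives in kernel/kcov.c and may well differ, e.g. in how it handles IDs that do not fit the bitmap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BITS_PER_WORD 64u

/* Hypothetical per-session view of one buffer split into bitmap + trace. */
struct dedup_state {
	uint64_t *bitmap;	/* first bitmap_words words of the buffer */
	size_t bitmap_bits;	/* number of callback IDs the bitmap can hold */
	uint64_t *trace;	/* trace[0] = PC count, trace[1..] = unique PCs */
	size_t trace_words;
};

/* Global ID counter; IDs start at 1 because 0 marks an unassigned guard. */
static _Atomic uint32_t next_guard_id = 1;

void dedup_init(struct dedup_state *s, uint64_t *buf, size_t buf_words,
		size_t bitmap_words)
{
	assert(bitmap_words < buf_words);
	memset(buf, 0, buf_words * sizeof(*buf));
	s->bitmap = buf;
	s->bitmap_bits = bitmap_words * BITS_PER_WORD;
	s->trace = buf + bitmap_words;
	s->trace_words = buf_words - bitmap_words;
}

/* Lazily assign an ID to a guard the first time its callback runs. */
uint32_t guard_id(_Atomic uint32_t *guard)
{
	uint32_t id = atomic_load(guard);
	if (id)
		return id;
	uint32_t fresh = atomic_fetch_add(&next_guard_id, 1);
	uint32_t expected = 0;
	if (atomic_compare_exchange_strong(guard, &expected, fresh))
		return fresh;
	/* Lost the race: use the winner's ID. The fresh ID is wasted, which
	 * is one way IDs can end up only "almost" sequential. */
	return expected;
}

/* Returns 1 if pc was appended, 0 if it was a duplicate or was dropped. */
int dedup_trace_pc(struct dedup_state *s, _Atomic uint32_t *guard, uint64_t pc)
{
	uint32_t id = guard_id(guard);
	if (id >= s->bitmap_bits)
		return 0;	/* ID does not fit the bitmap (dropped here) */
	uint64_t *word = &s->bitmap[id / BITS_PER_WORD];
	uint64_t mask = UINT64_C(1) << (id % BITS_PER_WORD);
	if (*word & mask)
		return 0;	/* this callback was already recorded */
	uint64_t pos = s->trace[0] + 1;
	if (pos >= s->trace_words)
		return 0;	/* trace area is full */
	s->trace[pos] = pc;
	s->trace[0] = pos;
	*word |= mask;
	return 1;
}

/* Reset between runs: clear the whole bitmap AND set trace[0] to 0. */
void dedup_reset(struct dedup_state *s)
{
	memset(s->bitmap, 0, s->bitmap_bits / 8);
	s->trace[0] = 0;
}
```

With this sketch, calling dedup_trace_pc() a hundred times in a loop with the same guard appends a single PC, so a loop-heavy syscall consumes one trace slot per distinct callsite rather than one per iteration. The cost is the extra bitmap clear in dedup_reset(), which corresponds to the "set trace[0] to 0 AND clear the entire bitmap" requirement mentioned in the cover letter.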