Troubles with triaging syzbot reports

By Jake Edge
December 14, 2022

A report from the syzbot kernel fuzz-testing robot does not usually spawn a vitriolic mailing-list thread, but that is just what happened recently. While the invective is regrettable, the underlying issue is important. The dispute revolves around how best to report bugs to affected subsystems and, ultimately, how not to waste maintainers' time.

Al Viro was apparently fed up with syzbot reports that involved the ntfs3 filesystem but that were not copied (CCed) to the maintainers of ntfs3. The syzbot message was sent to the kernel mailing list, but Viro shouted his reply that "ANY BUG REPORTS INVOLVING NTFS3 IN REPRODUCER NEED TO BE CCED TO MAINTAINERS OF NTFS3". That complaint had been relayed several times in the past, he indicated, without the problem getting fixed, so he was planning to stop looking at the reports. In fact, they will be "getting triaged straight to /dev/null here".

After an ... impenetrable reply from Hillf Danton, Viro followed up with more details of the problems he sees. He pointed to a post from September where he made a similar request and said that others had also reported these kinds of problems to the maintainers of syzbot. The issue is that the mail sent by syzbot does not contain enough useful information for someone to quickly determine if it pertains to their area of interest:

It's really a matter of triage; as it is, syzkaller folks are expecting that any mail from the bot will be looked into by everyone on fsdevel, on the off-chance that it's relevant for them. What's more, it's not just "read the mail" - information in the mail body is next to useless in such situations. [...]
What really pisses me off is that on the sending side the required check is trivial - if you are going to fuzz a filesystem, put a note into report, preferably in subject. Sure, it's your code, you get to decide what to spend your time upon (you == syzkaller maintainers). But please keep in mind that for [recipients] it's a lot of recurring work, worthless for the majority of those who end up bothering with it. Every time they receive a mail from that source.
Ignore polite suggestions enough times, earn a mix of impolite ones and .procmailrc recipes, it's that simple...
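The ".procmailrc recipes" Viro mentions amount to a filter on the receiving side. As a rough illustration only, here is what such a filter might look like, written in Python rather than procmail syntax. The `triage` function, the assumption that bot mail can be recognized by "syzbot" in the sender address, and the keyword match on the subject are all hypothetical; nothing here is taken from anyone's actual mail setup.

```python
from email import message_from_string
from email.utils import getaddresses


def triage(raw_message: str, my_address: str, my_keywords: set[str]) -> str:
    """Return "inbox" or "discard" for one raw RFC 822 message (illustrative only)."""
    msg = message_from_string(raw_message)

    # Assumption: bot reports can be recognized by "syzbot" in the From header.
    if "syzbot" not in msg.get("From", "").lower():
        return "inbox"

    # Keep the report if it was addressed to me directly.
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = {addr.lower() for _, addr in getaddresses(headers)}
    if my_address.lower() in recipients:
        return "inbox"

    # Keep the report if the subject names something I maintain.
    subject = msg.get("Subject", "").lower()
    if any(keyword.lower() in subject for keyword in my_keywords):
        return "inbox"

    # Otherwise the report goes the way of /dev/null.
    return "discard"
```

A filter like this can only be as good as the headers it is given, which is the crux of the complaint: if neither the subject nor the CC list identifies the filesystem, the recipient's only cheap option is to discard everything.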

Danton misunderstood what Viro was complaining about, but Matthew Wilcox tried to explain. The complaint is not that the linux-fsdevel list is being copied on the mail, but that the ntfs3 maintainers are not. Wilcox said: "So this is just noise. And enough noise means that signal is lost."

Viro agreed and painstakingly described exactly how he (and any other interested recipient of a syzbot report) would triage it, which eventually ends up at the syzkaller dashboard entry for the bug and its syzkaller reproducer. That file, which resembles "line noise", as Viro noted, does contain enough information to see that it was an ntfs3 filesystem that was being fuzzed. But that information is not in the email (or, better still, email subject), nor is it used to direct the report to the right people to look at it. The underlying problem is that the syzkaller/syzbot maintainers are not providing the relevant data, which should be easily obtained:

From what I've seen in various discussions, the assumption of syzkaller folks seems to be that most of the relevant information is in stack trace and that's sufficient for practical purposes - anything beyond that is seen as unwarranted special-casing. [...]
Face it, the underlying assumption is broken - for a large class of reports the stack trace does not contain the relevant information. It needs to be augmented by the data that should be very easy to get for the bot. Sure, your code, your priorities, but reports are only useful when they are not ignored and training people to ignore those is a bad idea...
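To give a sense of how small the sending-side check could be, here is a minimal sketch that scans the text of a reproducer for image-mounting calls and reports which filesystems they name. It assumes that the reproducer mounts its image through a call spelled `syz_mount_image$<filesystem>(...)`; the sample text is heavily abbreviated and made up for illustration, and the function name `fuzzed_filesystems` is hypothetical rather than anything in syzkaller.

```python
import re

# Assumption: the filesystem being fuzzed appears after "$" in the mount call.
MOUNT_CALL = re.compile(r"syz_mount_image\$(\w+)\(")


def fuzzed_filesystems(reproducer: str) -> list[str]:
    """Return the filesystem names mentioned in mount calls, in first-seen order."""
    seen: list[str] = []
    for name in MOUNT_CALL.findall(reproducer):
        if name not in seen:
            seen.append(name)
    return seen


# Hypothetical, heavily abbreviated reproducer text (not a real report).
SAMPLE = """\
syz_mount_image$ntfs3(&(0x7f0000000000), &(0x7f0000000100)='./file0\\x00', 0x0)
openat(0xffffffffffffff9c, &(0x7f0000000040)='./file0\\x00', 0x0, 0x0)
"""

print(fuzzed_filesystems(SAMPLE))  # prints ['ntfs3']
```

The stack trace plays no part in this check; the information comes entirely from what the fuzzer itself asked the kernel to mount.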

Ted Ts'o agreed, noting that he has been asking for improvements of this sort for several years. Syzbot "is not doing things that really could be done automatically --- and cloud VM time is cheap, and upstream maintainer time is expensive". In effect, the syzbot developers are not being respectful of upstream maintainers' time, he said. Things have been improving, but not in this particular area:

Now, to be fair to the Syzbot team, the Syzbot console has gotten much better. You can now download the syzbot trace, and download the mounted file system, when before, you had to do a lot more work to extract the file system (which is stored in separate constant C array's as compressed data) from the C reproducer. So have things have gotten better.

Marco Elver reported that the problem is being worked on by the syzbot project. He pointed to a bug report comment from syzkaller (and syzbot) creator Dmitry Vyukov that was posted at the end of November. It linked to yet another message from Viro complaining about the problem. Looking further at the bug comment thread makes it clear that progress is being made on identifying what to search for and on adding tags to email subject lines to identify which filesystem is being fuzzed.
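The two requests, a filesystem tag in the subject and a copy to the right maintainers, could be sketched as follows. The bracketed tag format, the `route_report` function, and the maintainer table are all hypothetical; a real implementation would presumably derive addresses from the kernel's MAINTAINERS file rather than from a hard-coded dictionary.

```python
# Hypothetical lookup table; the address is a placeholder, not a real maintainer.
FS_MAINTAINERS = {
    "ntfs3": ["ntfs3-maintainer@example.org"],
}


def route_report(subject, cc, filesystems, maintainers=FS_MAINTAINERS):
    """Return (tagged_subject, cc_list) for a report that fuzzed `filesystems`."""
    # Prefix one tag per filesystem, e.g. "[ntfs3] KASAN: ...".
    tags = "".join(f"[{fs}] " for fs in filesystems)

    # Extend the CC list without duplicating anyone already on it.
    new_cc = list(cc)
    for fs in filesystems:
        for addr in maintainers.get(fs, []):
            if addr not in new_cc:
                new_cc.append(addr)

    return tags + subject, new_cc


subject, cc = route_report(
    "KASAN: use-after-free Read in foo",
    ["linux-fsdevel@vger.kernel.org"],
    ["ntfs3"],
)
print(subject)  # prints: [ntfs3] KASAN: use-after-free Read in foo
print(cc)
```

With a tag like this in place, a recipient can decide from the subject line alone whether a report deserves a closer look.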

The thread eventually went completely off the rails, including a message that seems likely to draw a response from the kernel code of conduct committee. The overall tone of the thread was unfortunate, at least in spots, but both Ts'o and Viro (especially the latter) spent a fair amount of time patiently reiterating the problems that have been raised multiple times along the way, albeit at a lower volume. Those requests did not go far, so, as Ts'o put it, "maybe something a bit more.... assertive by Al [Viro] is something that will inspire them to prioritize this feature request".

Fuzz testing generates a huge number of reports; in order for the testing to be effective, and thus useful, those reports have to be acted upon. Since that is the goal, it obviously makes sense to create reports that can be quickly routed to the right people. This is not the first time we have seen complaints about fuzzing reports, and in a filesystem context, but hopefully we are on track to see improvements soon.

Index entries for this article
Kernel	Development model/Bug reporting
Kernel	Filesystems/Fuzzing

Posted Dec 14, 2022 19:29 UTC (Wed) by warrax (subscriber, #103205)

Jeebus, some of the comments posted by the non-kernel-devs read as if they might have been written by a bot with no understanding at all of anything related to how humans and dev time work. (But, hey, maybe they're just "testing" an AI at the expense of kernel devs.)

Incredibly disrespectful.

Posted Dec 14, 2022 21:38 UTC (Wed) by Kamiccolo (subscriber, #95159)

I believe a PR for syzbot just got drafted an hour ago or so... :}

Posted Dec 14, 2022 23:48 UTC (Wed) by patrick_thomson (guest, #152863)

Though the level of invective in the thread isn’t appropriate, if I were working with someone who felt it was helpful to respond to a justifiably-frustrated maintainer with “Calm downnnnnn Sir” and multiple messages full of grating pseudo-puns like “beatles” for “bugs,” I too would have trouble keeping a lid on my temper. Even accounting for possible cultural differences and ESL factors, it’s childish at best and outright disrespectful at worst.

Automated bug reports are only as good as the routing-to-humans procedure they undergo. Maintainers live and die by the signal/noise ratio in various project fora, and reducing that ratio irritates people, justifiably. While it’s not always great praxis to say “why don’t you just fix $BEHAVIOR with $STRATEGY,” it’s perfectly valid for maintainers to outline what kind of strategy would be useful for their purposes without being expected to fix the fuzzers themselves.

Posted Dec 15, 2022 0:02 UTC (Thu) by linuxrocks123 (subscriber, #34648)

Perhaps Hillf Danton is not a native English speaker as his email address is with a Chinese hosting provider. The "impenetrable" reply from him is:

-------
Calm downnnnnn Sir even if this is not the east ender style.

Frankly no interest here at all wasting any network bandwidth just to get you
interrupted if it would take less than 72 hours to discover one of the beatles
you created. And actually more than double check is needed to ensure who
did that.
-------

Seems likely to mean something like:

-------
Please calm down: aren't British people supposed to be polite?

I'd have no interest in bothering you if it would take less than 3 days to trace one of your bugs to you, but that's not the case, and it would take even more effort to ensure we'd actually traced the bug to the right person if we tried to do that analysis ourselves.
-------

Even "translated" it doesn't show the best attitude, but it makes a little more sense and it seems clear that Hillf didn't really understand what was being asked of him.

Posted Dec 15, 2022 0:35 UTC (Thu) by NYKevin (subscriber, #129325)

As a Google engineer (who has nothing to do with syzbot or syzkaller), I tend to suspect this is exacerbated by a cultural difference. At Google, frequent context switching between the browser, the MUA (which is just Gmail, so a tab in the browser), the console, etc., is considered entirely normal and unremarkable. You have to switch between them frequently, or else you'll never get anything done. Meanwhile, on LKML, this is considered horrifying and grossly inefficient (at least by some people, anyway).

(I have no idea whether Hillf is actually a Google engineer. syzkaller appears to be a Google-owned GitHub repo, so I'm guessing that at least some of the people who work on it are probably Googlers?)

Posted Dec 15, 2022 0:49 UTC (Thu) by space (subscriber, #157761)

Hillf Danton is neither at Google nor working on syzkaller.

Posted Dec 16, 2022 0:49 UTC (Fri) by WolfWings (subscriber, #56790)

...I'm now in a mixture of "And nothing of value was lost..." and feelings similar to that kid that snuck onto the VGA awards stage just to drop a shitpost.

Absolute bafflement that they're not part of any of the components in question and still acting like that to kernel devs.

Posted Dec 15, 2022 1:24 UTC (Thu) by viro (subscriber, #7872)

FWIW, he's not in the syzkaller git logs. And for the record, I don't think that anyone with that attitude ("the Most Holy Tool is an unparalleled blessing upon humanity; only a vile ingrate could possibly find any of its aspects deficient!!!") could be involved in the development of the tool in question. Cheerleading? Sure. Making the actual developers cringe? You bet. Contributing? Not likely.

Marco Elver is one of the syzkaller developers, so's Alexander Potapenko. AFAICS, nobody else in that thread is.

Hillf sounds like a Team OS/2 refugee or an Amiga fanboy on a bad flashback, TBH...

Posted Dec 15, 2022 5:48 UTC (Thu) by NYKevin (subscriber, #129325)

> And for the record, I don't think that anyone with that attitude ("the Most Holy Tool is an unparalleled blessing upon humanity; only a vile ingrate could possibly find any of its aspects deficient!!!") could be involved in the development of the tool in question.

You'd be surprised. I've encountered quite a few (secondhand) horror stories of upstreams behaving in exactly that fashion.

(My usual attitude towards these kinds of disputes tends to look pro-upstream, but it's really more pro-the-people-who-do-the-work-call-the-shots; if upstream doesn't want to make a change, they don't have to, but neither does downstream have to use/package/triage/etc. upstream's work. It would obviously be preferable if everyone got along.)

Posted Dec 15, 2022 13:39 UTC (Thu) by Wol (subscriber, #4433)

> As a Google engineer (who has nothing to do with syzbot or syzkaller), I tend to suspect this is exacerbated by a cultural difference. At Google, frequent context switching between the browser, the MUA (which is just Gmail, so a tab in the browser), the console, etc., is considered entirely normal and unremarkable. You have to switch between them frequently, or else you'll never get anything done. Meanwhile, on LKML, this is considered horrifying and grossly inefficient (at least by some people, anyway).

But you're imposing your workflow on other people ...

I make no judgement as to whether the tools are "good" or "bad" - different people have different definitions, and personally I'd dump both the browser and gmail in the "bad" category, but just because you are quite happy with your tools, doesn't mean that other people can function efficiently with them.

For grey-beards, it's likely that the console and MUA are the same tool (emacs).

I have to work with gmail, slack, and Excel/VBA. When I'm doing "real work" it's almost invariably in Excel's VBA window, and even the context switch to Excel proper is a productivity-damaging one. gmail and slack pretty much get ignored.

I don't think it really matters what your tools are, what matters is that (if possible) you are working with a favourite tool, and that you can minimise the number of times you switch between tools.

And another thing the young bucks don't realise, is that the reason greybeards stick to their favourite tools is that OLDER PEOPLE DON'T LEARN SO FAST. As a greybeard myself, I don't give a monkeys what tools other people prefer (so long as it doesn't impact on me), but I can do the job with MY tools of choice in a fraction of the time it would take if *I* used *SOMEONE*ELSE'S* tools of choice. I like to think I could do it faster than them with their choice, but that's arrogance :-)

(And all the evidence says that people "who can multi-task" are actually much less productive than people who can sit and concentrate without interruption. Case in point, we had something kick off at work yesterday morning, by yesterday evening I had digested the problem, worked out a fix, and rolled it out for testing. My bosses were shocked how fast it was done. But I was allowed to *concentrate*!)

Cheers,
Wol

Posted Dec 15, 2022 17:58 UTC (Thu) by SLi (subscriber, #53131)

Maybe, but I don't think you can take this (the right to not demand everybody adapt to your way of working) arbitrarily far. For example, I remember well the golden times when all email was plaintext, but at the point where you are one in a million wanting that because html mail disrupts your workflow, then I'd say it's already your problem, and if you cannot perform properly because of your choices, then it's you who needs to adapt or at least find a workaround.

That's not to say that I, too, wouldn't find the problem complained of here very real and the response completely impossible to understand or accept.

Posted Dec 15, 2022 19:14 UTC (Thu) by viro (subscriber, #7872)

You do realize that anyone e.g. sending patches in HTML mail gets told to resend, don't you? With pointer to Documentation/process/submitting-patches.rst or Documentation/process/email-clients.rst if whoever responds feels like being helpful. Which, I might add, contains the following bit:
-----------------------------------
Gmail (Web GUI)
***************

Does not work for sending patches.

Gmail web client converts tabs to spaces automatically.

At the same time it wraps lines every 78 chars with CRLF style line breaks
although tab2space problem can be solved with external editor.

Another problem is that Gmail will base64-encode any message that has a
non-ASCII character. That includes things like European names.
-----------------------------------

Might or might not be accurate these days, but that pretty much means "non-starter for kernel development, use their IMAP interface with a real MUA if you are forced to use a gmail account". Has nothing to do with the age, etc. - see the mentioned files for the real reasons. If gmail web client has solved these problems nowadays, a patch to Documentation/process/email-clients.rst along the lines of "here's how to set it up so it would do the right thing" would be welcome...

Posted Dec 15, 2022 21:33 UTC (Thu) by SLi (subscriber, #53131)

Yes, this happens in the kernel development context, and I'm not saying there's not a good reason for it (although I'm not saying the opposite either!).

Look, I'm no fan of all the new ways of doing things either. Rather, I just want to point out that "it disrupts my/our way of working" only goes so far. It goes a lot further, of course, in communities where lots of people want to do it like that (or there's people with sufficient power to just say this is how we do it), but even that has its limits...

Posted Dec 16, 2022 4:27 UTC (Fri) by neilbrown (subscriber, #359)

> Rather, I just want to point out that "it disrupts my/our way of working" only goes so far.

True. But when "you" are asking "me" to respond to your bug report, "you" need to make that worth my while. So either a REALLY important bug, or a report that is REALLY easy to work with (or lots of dollars).

Posted Dec 15, 2022 23:25 UTC (Thu) by linuxrocks123 (subscriber, #34648)

My email client does this to HTML email: https://github.com/linuxrocks123/MailTask/blob/bb6bfb8828...

It uses Aaron Swartz's html2text library to convert it to plaintext. It actually works pretty well.

Posted Dec 16, 2022 2:03 UTC (Fri) by rgmoore (✭ supporter ✭, #75)

My reading says the problem with the different tools is only the tip of the iceberg. Yes, the need to switch tools is annoying, but the underlying problem is that relevant information isn't being included up-front. The fuzz testers know which filesystem they were testing, and the information is present in the dashboard entry that has the more detailed information. But instead of including that very basic information in the title of the email, they make maintainers dig through the dashboard entry to figure it out. It's just massively inefficient and makes people much less inclined to pay attention. If they would just do the absolute most basic thing, like saying in the email title they generated the bug while testing fuzzed NTFS images, it would make life much easier.

Posted Dec 16, 2022 2:34 UTC (Fri) by NYKevin (subscriber, #129325)

As a Googler, I have to put up with this sort of thing (i.e. "here is an email/stderr message/log entry/whatnot, it contains a link, the link points to the actual information") so often that I'm basically immune to caring at this point. It's simply The Way Things Are Done over there.

(This is not meant to excuse it, merely to commiserate with other people being subjected to it.)

Posted Dec 16, 2022 17:44 UTC (Fri) by rgmoore (✭ supporter ✭, #75)

An email that points you to the information might be OK if you can at least have some confidence it's about your project. It's far worse when it's an email to a whole mailing list, most of whom aren't involved. Expecting people to dig through a convoluted process just to figure out if the email is even relevant to them is just ridiculous.

Posted Dec 16, 2022 19:19 UTC (Fri) by Wol (subscriber, #4433)

My reaction to that is to dump it in /dev/null. I have enough things to do anyway, it's easy to not find time to go searching down a rabbit warren.

And if it's work, I just dash off a reply saying "please provide the following extra info ..." - if I don't get a response it just disappears down the priorities :-) And if I do get a response, well, once you've actually got buy-in from the other end, things usually end up well :-)

Cheers,
Wol

Posted Dec 15, 2022 3:25 UTC (Thu) by flussence (guest, #85566)

I completely empathise with Viro here :-(

I've got my RSS reader pointed at my distro's bugzilla because that turns out to be a good way of staying ahead of upcoming problems, and on a good day it's very low volume. On a *bad* day... a flood of hundreds of automated, mostly identical, and dubious-usefulness QA bugs from two different users' buildbots (and an enormous amount of them end up as wontfix landfill after developers waste their time triaging them).

Posted Dec 20, 2022 8:56 UTC (Tue) by thoeme (subscriber, #2871)

Same here: Being repeatedly on the receiving end of system problem reports consisting of nothing but a screenshot in a PowerPoint, I tend to get "invective" towards the sender as well. So yes, you may have found an issue, but please try to make it as easy as possible for the developer to follow up, otherwise you will end up at the back of the queue pronto.