
No safeguards?

Posted Apr 24, 2025 18:53 UTC (Thu) by koverstreet (✭ supporter ✭, #4296)
In reply to: No safeguards? by estansvik
Parent article: Some __nonstring__ turbulence

Automated testing/CI? No.

There's automated /build/ testing that would've caught this, which Linus skipped. But beyond that, it's "every subsystem for itself", which means filesystem folks are regularly stuck doing QA for the rest of the kernel.

I've had to triage, debug, and chase people down for bugs in mm, sched, 9p, block, dm - and that's really just counting the major ones that blew up my CI, never mind the stuff that comes across my desk in the course of supporting my users.

6.14 was the first clean rc1 in ages, and I was hoping that things were improving - but in 6.15 I've lost multiple days due to bugs in other subsystems, again. And this is basic stuff that we have automated tests for, but people don't want to fund testing, let alone be bothered with looking at a dashboard.

I have fully automated test infrastructure, with a dashboard, that I could start running on other subsystem trees today, and wrapping other test suites is trivial (right now it mainly runs fstests and my own test suite for bcachefs).

People just don't care, they're happy to do things the same old way they've always done as long as they still get to act like cowboys.



No safeguards?

Posted Apr 25, 2025 5:37 UTC (Fri) by marcH (subscriber, #57642) [Link] (8 responses)

> I have fully automated test infrastructure, with a dashboard, that I could start running on other subsystem trees today, and wrapping other test suites is trivial.

If you have already paid for enough resources, just go and do it. At least for all the suites and coverage that don't melt down your infrastructure.

> People just don't care, they're happy to do things the same old way they've always done as long as they still get to act like cowboys.

_Some_ people don't care. But, there are these wonderful things called "name and shame", "peer pressure", etc. It's not nice so people don't say it out loud but the massive success of CI is largely based on those. There are some memes, search "you broke the build" for instance. Don't get me wrong: these are neither an exact science nor a silver bullet. So it may make a large difference in some areas and very little in others. But for sure it _will_ make some difference and be worth it.

If you can do it for a reasonable effort, then stop thinking and talking about it; just go and do it.

No safeguards?

Posted Apr 25, 2025 6:00 UTC (Fri) by koverstreet (✭ supporter ✭, #4296) [Link] (4 responses)

> If you have already paid for enough resources, just go and do it. At least for all the suites and coverage that don't melt down your infrastructure.

I've got enough hardware for my own resources, but it's not cheap, and I run those machines hard, so I'm not terribly inclined to subsidize the big tech companies that don't want to pay for testing or support the community. Already been down that road.

And I don't want to get suckered into being the guy who watches everyone else's test dashboards, either :)

It really does need to be a community effort, or at least a few people helping out a bit with porting more tests, and maintainers have to want to make use of it. I've got zero time left over right now for that sort of thing, since I'm trying to lift the experimental label on bcachefs by the end of the year.

No safeguards?

Posted Apr 25, 2025 18:47 UTC (Fri) by marcH (subscriber, #57642) [Link] (3 responses)

> I have fully automated test infrastructure, with a dashboard, that I could start running on other subsystem trees today, and wrapping other test suites is trivial

2 comments later:

> ... but it's not cheap, and I run those machines hard, so I'm not terribly inclined to subsidize the big tech companies that don't want to pay for testing or support the community. Already been down that road.

Understood but please don't give everyone false hopes again :-)

A last, somewhat desperate attempt to change your mind: please don't underestimate the awesome power of "role-modeling"[*]. For instance, you could pick a test suite that regularly finds regressions in only the couple of worst subsystems, and run a small test subset to keep usage to a minimum? Based on what you wrote, this could be enough to continuously highlight regressions in those subsystems and continuously crank up the pressure on them to take over what you continuously demonstrate. If you keep the test workload small, this should cost you nothing but the initial time to set it up, which you wrote would be small. Who knows, other subsystems might even fear you'll come after them next? :-) Sorry, I meant: be continuously impressed by that demo and desire to copy it?

BTW wouldn't that also help your _own_ workload, at least a bit? I mean you have to keep debugging this incoming flow of regressions anyway, don't you?

[*] A personal, very recent example: I finally figured out the security model of https://github.com/actions/cache , combined it with ccache and cut down kernel compilation in paltry GitHub runners from 10 minutes down to 30 seconds. If I had a "role-model" like you said you could be, I would have done this months earlier!

No safeguards?

Posted Apr 25, 2025 20:46 UTC (Fri) by koverstreet (✭ supporter ✭, #4296) [Link] (2 responses)

> Understood but please don't give everyone false hopes again :-)

I'm trying to motivate other people to step up and help out, either by getting it funded or contributing tests, by talking about what's already there and what we do have.

I am _not_ going to get this done on my own. But I'll certainly help out and lead the effort if other people are interested.

No safeguards?

Posted Apr 26, 2025 0:41 UTC (Sat) by marcH (subscriber, #57642) [Link] (1 responses)

From a "motivational" perspective, seeing some random test suite in CI is "nice". But it's nothing like seeing (someone else) demo your _own_ test suite automated. The latter is mind-blowing; it's really night and day. Bonus points when you see it catching some of your own regressions.

There are some obviously subjective elements (familiarity, ...) but there is also a more objective "gap" because the devil really is in the details: replicating with a different test suite is very rarely "trivial".

> > > and wrapping other test suites is trivial.

That "trivial" was likely optimistic :-)

No safeguards?

Posted Apr 26, 2025 0:59 UTC (Sat) by koverstreet (✭ supporter ✭, #4296) [Link]

> That "trivial" was likely optimistic :-)

Here's the generic wrapper for fstests:

https://evilpiepirate.org/git/ktest.git/tree/tests/fs/fst...

And the bcachefs wrapper on top of that:

https://evilpiepirate.org/git/ktest.git/tree/tests/fs/bca...

bcachefs has more wrappers for testing nocow, largebs, and then a bunch more wrappers for kasan/kmsan/lockdep/etc., but that's the basics.

There's a tiny patch to fstests to have it emit ktest style test start/end markers, but overall there's really not much, and other test suites (blktests, mmtests) are similar in style to fstests - so it's really not bad.

Again though, I'm not going to volunteer my time for work other subsystems should be doing themselves when I already have a filesystem to write.

People need to stop acting so helpless. I've provided the tools to make this easy, the rest is up to the community.

No safeguards?

Posted Apr 25, 2025 7:42 UTC (Fri) by josh (subscriber, #17465) [Link] (2 responses)

> But, there are these wonderful things called "name and shame", "peer pressure", etc. It's not nice so people don't say it out loud but the massive success of CI is largely based on those.

No, one of the many massive successes of CI is that it gets *rid* of those. The right answer to "you broke the build" is not "shame on you", it's "shame on our lack of tooling, that that wasn't caught before it was merged".

No safeguards?

Posted Apr 25, 2025 14:20 UTC (Fri) by marcH (subscriber, #57642) [Link]

There are developers who have the discipline and desire to try to break their own code before sharing it. There are others who do not and prefer to wing it in order to "save time" and effort. They don't push all the test buttons they have and wait for bug reports instead. The world is not that binary and the same people can be either at different times but you get the idea. Whether the latter people actually save time themselves is debatable (bug fixing can be extremely time-consuming) but for sure they massively disrupt and waste the time of the former people and of the project as a whole.

The "name and blame" comes from version control and "git blame"; CI does not change that. But automation acts as a dispassionate referee by removing most of the personal and subjective elements:
- Debates about workspace-specific configurations go away: the automated configurations are "gold".
- The first messenger is a robot.
- Choices of test coverage and priorities: you still need to discuss what gets tested, how often, etc., but those discussions happen when configuring CI, _before_ regressions and tensions happen.

It's not a silver bullet and you still have projects that ignore their own CI results, don't have enough test coverage, have enraged CI debates,... but in every case it exposes core issues at the very least which is still major progress.

No safeguards?

Posted Apr 25, 2025 18:13 UTC (Fri) by marcH (subscriber, #57642) [Link]

> ... lack of tooling, that that wasn't caught before it was merged

_Pre-merge_ checks are critical and they indeed make a massive difference with respect to avoiding disrupting others and reducing tensions, good point.

Automation is not just "pre-merge" though. Longer, post-merge daily/weekly test runs are still required in the many projects that limit pre-merge checks to under ~1h for prompt feedback; those projects will still have some regressions merged. Far fewer and narrower regressions, but still some.

There is also the funny issue of A and B passing separately but not together. Rare, but it happens. This is solved by GitLab "merge trains", GitHub "merge queues" and similar, but these require more infrastructure.

Last but not least: the issue of flaky tests, infra failures and other false positives that degrade the SNR and confuse "aggressive" maintainers into merging regressions. And who'd want to fix the flaky tests or the infra? That work gets little credit and few promotions. As often, the main issue is not technical; it's cultural and/or a business decision.

No safeguards?

Posted Apr 25, 2025 6:53 UTC (Fri) by estansvik (guest, #127963) [Link] (1 responses)

Okay, I feel your pain Kent.

I'm not doing kernel stuff, but thought that patches were at least gated by some build testing. Pretty amazing for such a high-profile project to not at least have that. So it's all done on scout's honor?

No safeguards?

Posted Apr 29, 2025 0:53 UTC (Tue) by Paf (subscriber, #91811) [Link]

The merge process is “Linus hits a button on a git command”, how could it be gated on anything without a more complex infra?

No safeguards?

Posted Apr 26, 2025 17:48 UTC (Sat) by marcH (subscriber, #57642) [Link] (30 responses)

I saw your email about Unicode testing at https://lore.kernel.org/lkml/l7pfaexlj6hs56znw754bwl2spco...

> It is _not_ enough to simply rely on the automated tests.
> You have to have eyes on what your code is doing.

I wanted to reply there but that thread turned into a shouting match that was again leveraged to make some clicks and I don't want my name in the middle of that. So I'm going to reply here instead. It's relevant here too.

The simplest and most underrated technique is: manually _test the tests_ by temporarily breaking the product code. I keep being amazed at how few developers do or even know that technique.

For a start, temporarily revert your changes and run the tests. If they still pass, then you know for sure there is no coverage. Another great technique I use all the time: deliberately insert one-line bugs. This works really well. If there is coverage, it's actually a faster way to find where it is[1]. You will also be amazed to discover how many tests do provide coverage but don't notice failures and have never failed in their entire life! This is especially true of test code in C and bash, which ignore errors by default (errexit goes a long way but not all the way).
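A minimal sketch of the idea in Python (the function and its test are made up for illustration): a deliberately weak test suite passes against correct code, and keeps passing after a one-line bug is inserted, which proves the mutated branch has no coverage.

```python
def parse_size(s, broken=False):
    """Toy 'product' code: parse strings like '4K' into byte counts.
    (Hypothetical example, not from any real project.)"""
    units = {"K": 1024, "M": 1024 ** 2}
    if s and s[-1] in units:
        n = int(s[:-1]) * units[s[-1]]
        # broken=True simulates deliberately inserting a one-line bug
        return n // 2 if broken else n
    return int(s)

def run_tests(impl):
    """A weak test suite: it only covers the plain-integer path."""
    assert impl("512") == 512
    return "PASS"

print(run_tests(lambda s: parse_size(s)))               # PASS
# The same suite also passes against the broken code: the unit-suffix
# branch has zero coverage, and only breaking it on purpose reveals that.
print(run_tests(lambda s: parse_size(s, broken=True)))  # PASS - red flag!
```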

But too many developers either never heard of these basic testing techniques (code must go "forward" always!) or don't want to discover that they have to fix or even - the horror - add new tests. That's double the work when you thought you were done, _and_ customers and users don't run the tests, so the work is generally less valued in the industry[2]. On the other hand, letting others find your bugs does not always have consequences; I mean, what's a couple of angry emails and reverts? There's a good chance it won't affect your career at all. Just keep winging it; it works after a few tries.

There's been fantastic progress here and there, but when you look at the bigger picture this is still a generally young and immature industry, with management generally clueless about quality. Every manager knows who writes the most code; few know who breaks the most stuff and costs the most time. To be a bit fairer to managers: when there is poor test coverage, _no one_ really knows who breaks the most stuff!

[1] deliberate breakage is also the only way to make sense of obscure, systemd-like build systems or the C preprocessor. I digress.
[2] one of the telltales is the "scripts/" directory found in most projects. "scripts/" does not mean anything besides "less important".

No safeguards?

Posted Apr 26, 2025 18:03 UTC (Sat) by koverstreet (✭ supporter ✭, #4296) [Link] (2 responses)

> The simplest and most underrated technique is: manually _test the tests_ by temporarily breaking the product code. I keep being amazed at how few developers do or even know that technique

That's a brilliant one. I use that regularly, but I never would've thought to document it - it's great for illustrating the right mindset, too. I might add it to bcachefs's "Submitting patches" documentation.

No safeguards?

Posted Apr 29, 2025 14:30 UTC (Tue) by jezuch (subscriber, #52988) [Link] (1 responses)

I think it's called mutation testing?

No safeguards?

Posted Apr 29, 2025 14:32 UTC (Tue) by jezuch (subscriber, #52988) [Link]

Bah, should've read the entire thread before answering :)

No safeguards?

Posted Apr 27, 2025 0:03 UTC (Sun) by gmaxwell (guest, #30048) [Link] (2 responses)

I have a set of shell scripts that will go through source code and make various mutations one at a time: change + to -, replace && with ||, replace 0 with ~0, change 1 to 2 or 2 to 1, swap <, >, and =, insert negations, blank a line entirely, etc. Then it attempts to compile the code with optimizations. If it compiles, the script checks the sha256sum of the resulting binary against all the hashes it's seen before, and if it's a new hash it runs the tests. If the tests pass, the source is saved off for my manual inspection later.

The single point changes tend to not make errors which self cancel out, and usually if an error does cancel out or the change is in code that doesn't do anything the binary will not change. In code where tests have good condition/decision branch coverage most things this procedure catches are test omissions or bugs.

This approach is super slow and kludgy. I've been repeatedly surprised and frustrated that no one has made a C-syntax-aware tool to do similar testing without wasting tons of time on stuff that won't compile or won't make a difference (e.g. mutating comments... though sometimes I've addressed this by first running the code through something that removes all the comments).

But it's worked well enough for me and parsing C syntax is far enough away from the kind of programming I enjoy that I haven't bothered trying to close this gap myself.

No safeguards?

Posted Apr 27, 2025 1:51 UTC (Sun) by roc (subscriber, #30627) [Link] (1 responses)

This is called mutation testing. There are a lot of existing tools for it, some of which are C-syntax-aware. Also there are mutation-testing tools that work by patching binary code.

No safeguards?

Posted Apr 28, 2025 14:49 UTC (Mon) by daroc (editor, #160859) [Link]

LWN covered one such tool for Rust code in October. I've tried it in some of my personal projects since then and found it somewhat useful for expanding my test suites.

No safeguards?

Posted Apr 27, 2025 13:55 UTC (Sun) by tialaramex (subscriber, #21167) [Link] (1 responses)

I like all of what's mentioned about testing, but I think it's worth mentioning one radical option that people need to have in the back of their heads when thinking about tests: Exhaustive testing.

256 seems like lots to us, and so we instinctively don't want to try all 256 possible inputs to a function which takes a single byte. But 256 is nothing to a machine, so exhaustive tests are an effective choice here and might catch bugs in weird cases you hadn't considered.

Obviously you can't always do this, for a variety of reasons, and when you can it may be too slow and need to run overnight or something - but it's worth having the idea in your mind because when you can just try everything that's it, you're done, all inputs were tested.
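As a sketch in Python (both functions are invented for illustration): exhaustively checking a byte-sized input domain costs next to nothing and finds edge cases that a hand-picked test list can miss.

```python
def to_lower_ref(b):
    """Reference: lowercase one ASCII byte (A-Z), pass everything else through."""
    return b + 32 if 65 <= b <= 90 else b

def to_lower_buggy(b):
    """A plausible reimplementation with an off-by-one at the range edge."""
    return b + 32 if 65 <= b < 90 else b   # bug: 'Z' (90) is excluded

# Hand-picked inputs ('a', 'A', '0') all agree - but trying all 256
# inputs finds the one disagreement immediately:
failures = [b for b in range(256) if to_lower_ref(b) != to_lower_buggy(b)]
print(failures)   # [90], i.e. 'Z'
```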

I was testing my impl TryFrom<f32> for realistic::Real and impl From<Real> for f32 to check that they round-trip when non-NaN and finite. I quickly discovered a one-epsilon problem for some values - not a big deal given the relatively low precision of 32-bit floats, but good to know and worth fixing. However, a million inputs into the exhaustive testing, it found huge deviations, because my previous "I know what to test" testing hadn't hit upon some important cases and the exhaustive testing had stumbled onto them. We're talking an order-of-magnitude error: oops, 0.07 isn't 0.0067.

Orders of magnitude and exhaustive testing

Posted Apr 27, 2025 18:54 UTC (Sun) by farnz (subscriber, #17727) [Link]

The combination of property testing and order of magnitude thinking can help you find your way here, if you weren't already thinking about it.

Property testing randomly generates test cases and confirms that a property holds (like your "must round trip from f32 to realistic::Real to f32 correctly"); there are libraries that can help with this, but it's also reasonable to write your own property-based tester if the problem space doesn't benefit from shrinking test inputs. Once you have a property tester, it's not hard to use it to find out how much of the input domain you can test in one second - you adjust the number of random inputs you generate until your test case takes about a second to run.

From there, you need to know that an hour is about three times 10^3 seconds, and a day is about 80% of 10^5 seconds, while your input space size is roughly 10^(number of bits * 3 / 10); a 32 bit space is thus about 10^9.6 items (which you can round to 10^10 - approximation is the name of the game here), and a 64 bit space is about 10^19.2 (which you can round to 10^20).

You then put the two together to work out how long your test would take if you stopped randomly generating test cases, and instead just went exhaustive; if your property test can test around 10^6 items in a second, then a day's run will cover 10^11 possibilities. This is more than the number of possibilities in 32 bits, so you can exhaustively test the 32 bit space in a day. Similarly, if you can get your random tester to test around 10^13 possibilities in a second on the available hardware, you know that a day will cover about 10^18 tests, and so you need 10^2 (or 100) days to exhaustively test a 64 bit space (which, while slow, is fast enough that you might leave it running and pick up all the cases it finds for manual testing into the future, at least on significant releases).

And, of course, you can short-circuit this if, in your judgement, a single data item can be tested in few enough clock cycles; you know that 1 GHz is 10^9 clock cycles per second, so if you judge that your test takes under 1,000 clock cycles per item, that's 10^6 tests per second, which can exhaust a 32 bit space in a day.
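The arithmetic above can be sketched as a few lines of Python (a rough feasibility calculator using the same round-to-powers-of-ten approximations; `tests_per_second` is something you would measure):

```python
import math

def log10_days_to_exhaust(bits, tests_per_second):
    """Order-of-magnitude estimate: log10 of the days needed to
    exhaustively test a `bits`-wide input space."""
    log_space = math.ceil(bits * 3 / 10)    # 2^10 ~ 10^3, rounded up to be safe
    log_per_day = math.floor(math.log10(tests_per_second)) + 5  # a day ~ 10^5 s
    return log_space - log_per_day

# 10^6 tests/s vs a 32-bit space: 10^-1 days, i.e. well under a day.
print(log10_days_to_exhaust(32, 10**6))    # -1
# 10^13 tests/s vs a 64-bit space: 10^2, i.e. about 100 days.
print(log10_days_to_exhaust(64, 10**13))   # 2
```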

No safeguards?

Posted Apr 28, 2025 16:34 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (21 responses)

> The simplest and most underrated technique is: manually _test the tests_ by temporarily breaking the product code. I keep being amazed at how few developers do or even know that technique.

As stated in another reply, this is mutation testing. One of my ideas is to use code and mutation testing to discover "this code affects this test" relations so that one could take a diff and run just the tests that "care" about it. This would help with test turnaround time during patch review (while still doing full runs prior to merging). One of our bottlenecks in our regular CI is that testing always runs everything even for "obviously cannot be affected by the diff" changes. It'd be much nicer to cycle just the relevant tests to green before moving onto the full test suite.

No safeguards?

Posted Apr 28, 2025 19:16 UTC (Mon) by marcH (subscriber, #57642) [Link] (19 responses)

> > The simplest and most underrated technique is: manually _test the tests_ by temporarily breaking the product code. I keep being amazed at how few developers do or even know that technique.

> As stated in another reply, this is mutation testing.

Yes and no.

If you ask a developer who is not interested in finding problems "Could you please perform some mutation testing?" then guess what will happen: nothing at all (Mutation what?)

On the other hand, if you tell them "Did you realize breaking this or that line passes the tests anyway?" after they proudly claimed to have tested their changes, then there is a small chance it will make some difference.

Who knows; the younger ones who don't think they know everything yet might even feel like they just tasted something useful and "upgrade" to more advanced and extensive mutation testing in the longer term.

Baby steps!

No safeguards?

Posted Apr 28, 2025 19:55 UTC (Mon) by pizza (subscriber, #46) [Link] (14 responses)

> On the other hand, if you tell them "Did you realize breaking this or that line passes the tests anyway?" after they proudly claimed to have tested their changes, then there is a small chance it will make some difference.

...or respond with "patches with improved tests welcome!"

No safeguards?

Posted Apr 29, 2025 0:02 UTC (Tue) by marcH (subscriber, #57642) [Link] (13 responses)

If someone submitting changes is seriously asking someone ELSE to fix their _own_ lack of test coverage, right after being caught red-handed lying about said coverage, well, now there is very simple, clear and compelling evidence that this person cannot be trusted with testing.

That's pretty far from not wanting to perform "mutation testing" or some other fancy word that most people won't even bother googling.

No safeguards?

Posted Apr 29, 2025 0:34 UTC (Tue) by pizza (subscriber, #46) [Link] (12 responses)

> If someone submitting changes is seriously asking someone ELSE to fix their _own_ lack of test coverage, right after being caught red-handed lying about said coverage, well now there is very simple, clear and compelling evidence that this person cannot be trusted with testing.

That's a nice sequence of what-ifs.

But even if it's true, so what? That person is not your supplier [1].

[1] https://www.softwaremaxims.com/blog/not-a-supplier

No safeguards?

Posted Apr 29, 2025 1:17 UTC (Tue) by marcH (subscriber, #57642) [Link] (5 responses)

> That's a nice sequence of what-ifs.

This is indeed a pretty specific and hypothetical path that we followed... _together_. Until now?

> But even if it's true, so what?

Then there are two possibilities:

1. The maintainer of that subsystem does not care and merges code anyway.
2. He cares and does not merge.

In _either_ case, everyone can draw very clear, evidence-based and useful conclusions about the quality of that subsystem.

The most important thing in quality is not the quality level itself. That level does matter, of course, but what is even more important is not being ignorant: having at least some idea of where quality stands.

Take a look at this very short section
https://en.wikipedia.org/wiki/ISO_9000_family#ISO_9000_se...
It's all about processes, evidence and transparency. It's not concerned with defining what's "good" or "bad" quality; it's more about having some metrics in the first place - which unfortunately cannot be taken for granted.

When testing is "underrated" and mostly ignored in code reviews, that quality information is not even available, no one knows! Maybe the engineer who submits the code has been following some strict but private company QA process? Or maybe he just winged the whole thing due to unreasonable deadlines. Who knows - anyone with a bit of experience in this industry has already seen both. So, even a very basic "did you test this?" discussion already goes a long way.

Quality information is critical and actionable: it lets a company that sells some actual Linux-based product decide whether they should rewrite the Bluetooth subsystem or implement their own sound daemon versus stress-testing the existing one and participating upstream. Just some random examples; this sort of decisions happens all the time because open-source is "not a supplier".

Note that an evidence-based testing discussion is also (and in many cases, has been) very useful for reducing maintainer overload. FAIL | UNTESTED -> NACK. Done! Next (assuming that subsystem is interested in landing in some products).

No safeguards?

Posted Apr 29, 2025 3:02 UTC (Tue) by pizza (subscriber, #46) [Link] (4 responses)

> Take a look at this very short section of [ISO9000]

Seriously?

If you want ISO9000 compliance from me, you had damn well better be paying me.

If you're not, I repeat: I AM NOT YOUR SUPPLIER.

No safeguards?

Posted Apr 29, 2025 4:31 UTC (Tue) by marcH (subscriber, #57642) [Link] (3 responses)

> Seriously?

Clearly not:

> If you want ISO9000 compliance from me...

No safeguards?

Posted Apr 29, 2025 14:36 UTC (Tue) by pizza (subscriber, #46) [Link] (2 responses)

> Clearly not:

I've been part of the team dragging an organization through ISO9000 certification.
I've also worked for organizations in highly regulated spaces where ISO9000 was just the first of many, many steps.

But hey, if you're all about that process, guess what? You can report the test coverage bug through official channels, it will be triaged and prioritized based on the documented process, and if it meets the actionable threshold (i.e. it's the supplier's responsibility to fix, as opposed to "new work", which may itself require further negotiation) it will be added to the development backlog. Eventually it gets handed to a developer, has to be QA'd and signed off on by whatever else the process requires, and _eventually_ will land in some future release.

So if you want that level of assurance and process from your suppliers? It's going to cost you... a _lot_. Not just in money, but time as well.

What's that, you have no formal contract that specifies the deliverables, compensation, and processes for reporting problems, plus the SLA for responses? Then they're not your supplier, and you have precisely *zero* legal (or moral) right to demand anything from them. Enjoy the "AS IS, NO WARRANTY WHATSOEVER" terms of the software you didn't pay for.

No safeguards?

Posted Apr 29, 2025 16:11 UTC (Tue) by marcH (subscriber, #57642) [Link] (1 responses)

I'm really sorry that you have been hurt by ISO 9000 in the past and that this tangent of mine acted as a "trigger" for you. But it was just a tangent; I know nothing about ISO 9000 specifically and there is not much I can do to help with your healing process. On the contrary, my key message was: baby steps; don't even use fancy words like "mutation", to avoid scaring people.

I bet you're not the only one in such a difficult situation, maybe there are other people who can help?

No safeguards?

Posted Apr 29, 2025 20:06 UTC (Tue) by pizza (subscriber, #46) [Link]

> I bet you're not the only one in such a difficult situation, maybe there are other people who can help?

I'm afraid the difference between "can help" and "will help" is effectively insurmountable -- at least when there is no money involved.

No safeguards?

Posted Apr 29, 2025 8:29 UTC (Tue) by Wol (subscriber, #4433) [Link] (5 responses)

> But even if it's true, so what? That person is not your supplier [1].

I think you're missing the point that that person cannot be trusted. If you've got any sense, you're going to blacklist him as a supplier ... especially if you're paying him in credibility!

"Honesty is the best policy" - I do my best to test my code, but I still regularly get bitten by quirks of the language, things I've forgotten, etc etc. And I try and make sure that my "customers" know what is tested and what isn't. The fact that half the time they don't listen, and the other half they don't understand, isn't my problem. Well it is, I have to fix the mess, but that's another incentive for me to try and get it right.

Cheers,
Wol

No safeguards?

Posted Apr 29, 2025 14:20 UTC (Tue) by pizza (subscriber, #46) [Link] (1 responses)

> I think you're missing the point that that person cannot be trusted. If you've got any sense, you're going to blacklist him as a supplier ... especially if you're paying him in credibility!

How many times do I have to point out that this person is *NOT* your supplier?

You're going to get better results when starting with "hey, your test coverage is missing something, here's a patch that fills the gap" versus "I created a situation that resulted in a failure but the existing tests didn't catch it, and if you don't fix this coverage gap immediately you're clearly lying about everything and can't be trusted blablabla"

No safeguards?

Posted Apr 29, 2025 16:01 UTC (Tue) by marcH (subscriber, #57642) [Link]

> versus "I created a situation that resulted in a failure but the existing tests didn't catch it, and if you don't fix this coverage gap immediately you're clearly lying about everything and can't be trusted blablabla"

Well beyond a strawman: it's elevated to an art form! :-)

Open source upstreams aren't suppliers

Posted Apr 29, 2025 14:50 UTC (Tue) by farnz (subscriber, #17727) [Link] (2 responses)

The question then becomes where you're going to go for open source software if you blacklist all upstreams that refuse to act as suppliers. You can, of course, pay Red Hat, SUSE, Canonical, CIQ or others to act as your supplier based on top of open source upstreams, but then you're not going straight to the person who writes the code.

In the end, an upstream is not a supplier; there's even a different word to describe the relationship, since you have supplier/customer relationships in business, and upstream/downstream relationships in open source. Expecting an upstream to act as a supplier is opening yourself up to a world of pain every time the upstream's priorities and yours don't coincide.

Open source upstreams aren't suppliers

Posted Apr 29, 2025 15:12 UTC (Tue) by Wol (subscriber, #4433) [Link] (1 responses)

> In the end, an upstream is not a supplier; there's even a different word to describe the relationship, since you have supplier/customer relationships in business, and upstream/downstream relationships in open source. Expecting an upstream to act as a supplier is opening yourself up to a world of pain every time the upstream's priorities and yours don't coincide.

I thought we were talking about a DOWNSTREAM SUPPLYING faulty patches.

So we're blacklisting people who can't be arsed to supply properly working, tested code.

I have to be careful here, as I've done exactly that, but I've done my best to provide stuff that works (in a language, Scheme, that I find very difficult to work with), and I just have to accept that if nobody else sees value in what I've done they won't take it on. But if I behaved "entitled" and expected somebody to finish the work for me, then they have every right not to want to do business with me.

It all boils down, once again, to the "entitled" mentality a lot of people seem to have about other people doing work for "free" ...

Cheers,
Wol

Open source upstreams aren't suppliers

Posted Apr 29, 2025 15:31 UTC (Tue) by farnz (subscriber, #17727) [Link]

They're either an upstream offering you a project, or a downstream offering you a patch. They are not a supplier in either case, and expecting them to behave like a supplier is going to lead to problems and misunderstandings, precisely because that's the wrong sort of relationship to imagine.

Upstream and downstream also gets interesting because, unlike supplier and customer, it's one where the same entities dealing with the same project can change roles; Linus Torvalds is upstream of me in the Linux kernel fork he runs (and that we generally accept as the mainline), but if I chose to run my own Linux fork, Linus could choose to be downstream of me and submitting patches to me, or pulling patches from my upstream project into his downstream project. And, for added fun, Linus and I can swap roles - I can treat Linus as my upstream when I pull in 6.16, but then treat him as my downstream if he notices that I have a useful change that he'd like in his kernel.

No safeguards?

Posted May 2, 2025 7:53 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (3 responses)

It's not that easy. Saying "mutation testing" to a developer who has never heard of it is probably not the best way to evangelize mutation testing... but it *is* the best search term to find tooling that enables you to actually do it (without having to reinvent everything from scratch). So we're stuck using fancy ten dollar words for these things at least some of the time.

No safeguards?

Posted May 2, 2025 14:44 UTC (Fri) by marcH (subscriber, #57642) [Link] (2 responses)

Like all people in a "privileged" work environment, I'm not sure you realize how ridiculously little testing some code gets before being submitted. Discussing "mutation testing" with the corresponding submitters is like trying to discuss literature with children learning to read. Please wait until they've reached middle school? In the meantime, "manually" point at a couple of lines in their submission and ask them whether any test fails when they break them. You might rarely interact with such people, so you may not notice, but I can promise you there are some who never bother to try anything like that. Learning to do that does not require _any_ search, which is already too much for people absolutely uninterested in spending any time finding bugs in their code[*]. The very first ("baby") step is fixing that _mindset_, and that's already difficult enough. Don't scare these people with textbooks, at least not at first.

I've used an education analogy, which makes me wonder: does any programming education teach you anything about test coverage and trying to break your own code? Or version control, or code reviews, or CI, or any quality topic,... I don't remember any at all, but it was a while ago. I learned it all on the job. But these were full-time software jobs. Now think about all the people who do not do software full time and think: How hard could software be? If it were hard, it wouldn't be called "soft"ware :-)

[*] that's the job of the "validation team". Their precious time should be spent writing new bugs^H code.

No safeguards?

Posted May 2, 2025 14:56 UTC (Fri) by marcH (subscriber, #57642) [Link]

> does any programming education teach you anything about test coverage and trying to break your own code? Or version control, or code reviews, or CI, or any quality topic,...

How could I forget the "ugliest" child of them all: build systems :-D

No safeguards?

Posted May 2, 2025 15:34 UTC (Fri) by Wol (subscriber, #4433) [Link]

> Now think about all the people who do not software full time and think: How hard could software be? If it were hard, it wouldn't be called "soft"ware :-)

Job security? If software is hard, you have to leave it to the professionals?

I've only once worked in a pure software environment - it drove me almost suicidal. Pretty much every job I've had has been a small DP team supporting end users. There's no reason why software should be hard. If you have a mixed team of professional end users who can program, professional programmers who can end-user, AND EASY-TO-USE SOFTWARE, then doing things "right" isn't hard. That's why I'm a Pickie!!!

(And I don't call Excel, SQL, BQ/Oracle/etc easy to use.)

Cheers,
Wol

No safeguards?

Posted Apr 29, 2025 15:15 UTC (Tue) by kleptog (subscriber, #1183) [Link]

We solve this by having a "make quick-test" target which looks at the last commit and, based on the directories of the modified files, runs a specific subset of the tests. So if you're modifying the Makefile or some shared repo it's still going to take a while. But for the majority of patches you end up only running the tests for a single subsystem, which reduces the turnaround time significantly.

Sure, it occasionally happens that a patch does actually break a test in another subsystem that you didn't expect, but that's pretty uncommon.

Like you, I wanted to automate this dependency detection, but a hand-maintained list gave almost all the bang for very little buck.
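(For illustration, the hand-maintained mapping described above can be sketched as a small shell helper. The directory names and test-target names here — `net/`, `fs/`, `test-net`, `test-fs` — are invented for the example; the actual project's layout and targets would differ.)

```shell
#!/bin/sh
# Hypothetical sketch of a "quick-test" selector: map the files touched
# by the last commit to per-subsystem test targets via a hand-maintained
# case table. Anything outside a known subsystem (shared Makefiles,
# build infra, unknown paths) falls back to running the full suite.
map_targets() {
    run_all=false
    targets=""
    for f in "$@"; do
        case "$f" in
            net/*) targets="$targets test-net" ;;
            fs/*)  targets="$targets test-fs" ;;
            *)     run_all=true ;;   # shared or unknown file: be safe
        esac
    done
    if [ "$run_all" = true ]; then
        echo "test"                  # full test suite
    else
        # de-duplicate: several files in one subsystem -> one target
        printf '%s\n' $targets | sort -u
    fi
}
```

A `quick-test` Makefile target would then invoke something like `map_targets $(git diff --name-only HEAD~1 HEAD)` and pass the result back to make.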


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds