
Arias: Human proof for FOSS contributions

Rodrigo Arias Mallo, maintainer of the Dillo web browser, has written a blog post with a proposal on one way to ensure that a contribution is written by a human and not AI; he suggests asking new contributors to record their programming session using asciinema.

In the same way that LLMs generate patches, they can also generate the asciinema recordings themselves. Then, the contributors can lie to the reviewers pretending to have made the edits. Perhaps surprisingly, this is not an easy task for LLMs, at least from my observations. The corpus of recordings of developers making mistakes and thinking the whole process of editing a file is not as large as the corpus of FOSS programs and patches in which to train an LLM. During my very simple tests I haven't been able to generate an asciinema session that remotely resembles what I would expect from a human, and even less so from a human with a nice editor theme and editing an existing Dillo source file.

The Dillo project is not yet requiring asciinema recordings, but he said that he would like to test the theory further. LWN covered asciinema in January 2026.




This seems unwieldy

Posted May 26, 2026 18:03 UTC (Tue) by dskoll (subscriber, #1630) [Link] (1 responses)

Reviewing an editing session sounds like a very painful expenditure of time. And also, I suspect many would-be contributors would simply refuse to do it. Imagine doing this for every single patch.

I think continuing to have contributors attest that they have not used AI is the best we can do. After all, we already just trust that contributors haven't ripped off proprietary code they happened to have seen, without asking for a recording of their programming session.

This seems unwieldy

Posted May 26, 2026 18:20 UTC (Tue) by tlamp (subscriber, #108540) [Link]

Also, I'd bet it'd be trivial to fake such a session too.

IOW, this just makes it again harder for humans but not really for bots, given that the latter has no problem with wasting time and nerves on such things.

This isn't just anti-ai, it's also anti-gui

Posted May 26, 2026 18:43 UTC (Tue) by jbills (subscriber, #161176) [Link] (5 responses)

I suppose this guy also only wants contributions written in emacs/vim/other terminal text editor. This idea is very silly. If you wanted to be realistic about it, require a video recording, not asciinema. Maybe one day all programmers are forced to become vtubers to contribute to open source. That seems like a good idea.

Speedruns do require this

Posted May 26, 2026 19:19 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

Yeah, that's potentially interesting. Speed running is done this way in some categories. Nobody cares that you have some inputs which, if they were made in real time, would constitute the fastest SMB Any%, because just that is called TAS (Tool-assisted speed run, typically assembled over an extended period, perhaps by a large group co-operating) and already exists separately from "manual" speed running. If you want to claim an actual speed run, they need video showing you and your hands making inputs, as well as the video game itself.

Since you're on video constantly while making attempts you might as well be on Twitch where you might even recoup a little bit of money from revenue share. Most of these communities are tiny, maybe you have fifty people peak watching your attempt at Blue Prince no-major-glitches Bequest speed running but you need to record video anyway or your attempts are just hearsay so eh, might as well. And hey, maybe you attract a following, the "hot girl bonus" factor applies on Twitch like most of life, but you might be a more entertaining watch than other players for all kinds of other reasons.

This isn't just anti-ai, it's also anti-gui

Posted May 26, 2026 22:34 UTC (Tue) by nix (subscriber, #2304) [Link] (3 responses)

Not just that -- it would have to be *in a terminal*. I have used X for my Emacs for as long as it's been possible to do so, because I like my fruit salad and my different fonts and my right-click context menus and my keybindings that terminals can't pass on and all that stuff Emacs users are supposed to revile and abhor. Am I supposed to switch to writing code in the straitjacket of a terminal emulator just in order to prove that I'm not slopping? You trust me that little, I don't contribute, sorry.

This isn't just anti-ai, it's also anti-gui

Posted May 27, 2026 12:21 UTC (Wed) by lbt (subscriber, #29672) [Link] (1 responses)

C-x (
?
:)

This isn't just anti-ai, it's also anti-gui

Posted May 27, 2026 16:05 UTC (Wed) by nix (subscriber, #2304) [Link]

A dump of an expanded view-lossage buffer might do, too -- while still being a horrifying intrusion.

This isn't just anti-ai, it's also anti-gui

Posted May 28, 2026 9:50 UTC (Thu) by pbonzini (subscriber, #60935) [Link]

Even for the dinosaurs living in a terminal emulator this makes no sense. I use Tilix and will fire a new emulator tab or pane every few minutes to grep, to look at past commits, to compile and so on.

Nonsense

Posted May 26, 2026 19:05 UTC (Tue) by gmprice (subscriber, #167884) [Link] (20 responses)

I understand the general anti-AI sentiment for things like art, but code is code.

There's little difference between well designed code generated by an LLM and well designed code written by hand - except that one took tokens and the other took coffee.

Nonsense

Posted May 26, 2026 19:11 UTC (Tue) by pizza (subscriber, #46) [Link]

> There's little difference between well designed code generated by an LLM and well designed code written by hand - except that one took tokens and the other took coffee.

....that should be "tokens in addition to coffee"

Nonsense

Posted May 26, 2026 19:56 UTC (Tue) by bertschingert (subscriber, #160729) [Link] (18 responses)

> There's little difference between well designed code generated by an LLM and well designed code written by hand

I'd argue that according to the "Programming as Theory Building" viewpoint, there is a potentially significant difference: the presence of a complete mental model of the code in the mind of the human(s) who wrote it.

Whether that actually matters now that LLMs can quickly ingest and "understand" a new codebase and answer questions about it... isn't yet clear to me. I also don't know to what extent prompting with no or minimal coding adequately builds the mental model of how the program works.

But I don't yet find it obvious that the end product - clean, hopefully correct code - is the only value produced and that the craft of coding by hand no longer matters.

Nonsense

Posted May 26, 2026 20:10 UTC (Tue) by gmprice (subscriber, #167884) [Link] (4 responses)

All of what you have described is captured by the words: "Well designed"

Nonsense

Posted May 27, 2026 10:26 UTC (Wed) by nix (subscriber, #2304) [Link] (3 responses)

Well, in that case well-designed code cannot be implemented by an LLM, by construction, because they cannot remember anything they learn outside fantastically expensive and rare training runs: in particular, they will not remember their mental model of the code they wrote between sessions (and their memory capacity / context window is extremely finite even if they did).

Nonsense

Posted May 27, 2026 14:24 UTC (Wed) by gmprice (subscriber, #167884) [Link]

Who said anything about the LLM designing the code?

There is a difference between design and implementation, and what you stated here is absolutely true of humans as well.

There exists no one that can explain the entire design of mm/ to you.

Nonsense

Posted May 27, 2026 19:10 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

Modern models can have 1 million token context, which is quite a bit. You can "fake" permanent memory by using a part of this context to describe the "permanent" facts.

You can also fine-tune existing models; it's not terribly expensive. But it doesn't work all that well right now. It's definitely something that's being actively researched, though.

Nonsense

Posted May 28, 2026 7:10 UTC (Thu) by kleptog (subscriber, #1183) [Link]

They can remember state just fine within a single session. You just need to repeat the instructions every single time. You can even ask it to generate instructions for itself for the next run.

Additionally, if you want to do a lot with a single code base, you can make an optimised version for that code base for far cheaper than training again.

Nonsense

Posted May 26, 2026 22:55 UTC (Tue) by jmalcolm (subscriber, #8876) [Link] (12 responses)

I have noticed that, if a code base is new to me, I can come to a much better overall mental model of the code much faster using AI.

And once you do have a good mental model of the code, you are probably giving the AI such focused guidance that its output is adhering pretty well to that mental model as well. Or your own mental model is adapting.

I am not sure yet what I think about all this stuff. But I am beginning to see clearly that a good developer can use AI to create good code. We may not have figured out how all this is going to work yet but it is looking like a significant force-multiplier. It is hard to imagine a code base being developed long term without it.

An interesting thing to consider is how programmers thought about compilers when they were new. Compilers also "write code" in many ways and free the programmer from the burden of many low-level details with the overall impact being far greater productivity but less low-level control. How many of us write machine code these days? How many even think about it if the tests are passing? If compilers were being introduced today, would we have GitHub policies forbidding the contribution of compiler generated code?

In my own mental model, I had been pushing back on this analogy with the thought that compilers are still deterministic while AI is not. More recently, I have shifted my thinking on this. Modern compilers are a lot less deterministic than I have been giving them credit for. With many optimizations, compiled code can look quite different from the original source (auto-vectorization, different branching behaviour, whole function calls and variables eliminated, and so on). And one compiler can produce quite different results from another (including between versions). We do not manage the resulting machine code (or assembly) in our version control systems. In the future, maybe we will not manage what we currently think of as source code either but rather something higher-level. Who knows?

Nonsense

Posted May 27, 2026 4:02 UTC (Wed) by ebiederm (subscriber, #35028) [Link] (5 responses)

We don't manage the resulting assembly in source control. We just have golden build machines, checked in compilers, buildroots and containers all to make the binaries reproducible and to allow making small bug fixes.

When there have been so many solutions to the same problem it clearly has been worth solving.

Nonsense

Posted May 27, 2026 4:10 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

This actually is one of the approaches that people are trying now: extract instructions to LLMs into a document and use it for versioning.

The problem with it is purely practical, "compiling" a thorough spec for a large project can be expensive, and the result might not be great. If anything, this reminds me of the earlier days of computing. One similar example: the original Macintosh OS was initially written in a Pascal dialect, and then the resulting assembly was modified/optimized by hand because the compiler was not good enough.

Nonsense

Posted May 27, 2026 10:03 UTC (Wed) by callegar (guest, #16148) [Link] (2 responses)

Out of curiosity, how much can storing the "instructions" in a versioned fashion actually help in practice? What level of reproducibility in the instructions -> result process can be expected when using the "affordable" tools?

Nonsense

Posted May 27, 2026 13:14 UTC (Wed) by NAR (subscriber, #1313) [Link]

People can learn by reading other people's code - I think people can also learn by reading other people's prompts as well. It did help me when I got tips from colleagues and read other prompts to get the hang of this prompting thing (or at least start to get the hang of it).

Nonsense

Posted May 27, 2026 19:07 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

The idea is that you store a detailed spec, then iterate using interactive prompts, and then extract the changes back into the spec. It's a bit too unreliable for me personally, but I saw other people making it work.

Nonsense

Posted May 29, 2026 12:34 UTC (Fri) by anton (subscriber, #25547) [Link]

> We don't manage the resulting assembly in source control.

Exactly.

> We just have golden build machines, checked in compilers, buildroots and containers all to make the binaries reproducible

Only if reproducible builds are something you care about.

What most people care about, however, is reproducible behaviour. And by and large, we are getting that when we are compiling source code with a compiler. Is the same true when we store a prompt? If so, we could treat the prompt as source code, and the input to the compiler as some intermediate representation that we do not need to manage as source code.

But I don't think that LLMs are there yet, and I don't think they ever will. We have tried to specify what a program should do for many decades, and found that a full formal specification is a lot of work, and can contain bugs. A prompt tends to be much less formal, and leaves it to the LLM to fill in those things that you have not specified by guessing what you likely want (based on having digested lots of existing programs). But the next time you ask it, it might guess differently.

Nonsense

Posted May 27, 2026 15:48 UTC (Wed) by Wol (subscriber, #4433) [Link] (5 responses)

> In the future, maybe we will not manage what we currently think of as source code either but rather something higher-level. Who knows?

The problem is that the link between source-code, to executable, to program output, is (supposedly) deterministic.

The input to an AI is ambiguous - the job of the AI (or the programmer) is to enforce a deterministic rigidity onto the spec. It's a vague line whether the spec is deterministic or ambiguous, but you can't really put anything on the ambiguous side of the line into an SCCS.

That is the mental problem people have with writing code using AIs - different AIs will create turing-different code, and experience tells us that those differences are usually hard-to-spot bugs. Which is why I'm picking up from a variety of sources that the best way to write code with an AI, is to get it to do a first draft which you then rewrite from scratch. The one time I've done anything like that, that was both the obvious, and a very successful, approach.

Cheers,
Wol

Nonsense

Posted May 27, 2026 16:23 UTC (Wed) by mb (subscriber, #50428) [Link] (4 responses)

> That is the mental problem people have with writing code using AIs - different AIs will create turing-different code, and experience tells us that those differences are usually hard-to-spot bugs. Which is why I'm picking up from a variety of sources that the best way to write code with an AI, is to get it to do a first draft which you then rewrite from scratch. The one time I've done anything like that, that was both the obvious, and a very successful, approach.

True. Prototyping code, learning from it and then rewriting it produces a better overall result than going one-shot.

This is true, but it is independent of whether an AI was involved or not.
Having an AI just massively speeds up the process.

What matters is the development process. Not the tools used.

Nonsense

Posted May 27, 2026 16:28 UTC (Wed) by pizza (subscriber, #46) [Link] (1 responses)

> What matters is the development process. Not the tools used.

Um, the tools are an intrinsic part of the process.

Nonsense

Posted May 27, 2026 17:21 UTC (Wed) by mb (subscriber, #50428) [Link]

No, not really. Unless the process is crap and demands the use of a specific tool.

The process of iterating prototypes to a solid design can be described without mentioning any specific tool or whether a part has to be done by AI or human.

Yes, it's easier to demand a specific tool and we all do it because we are lazy. That's why process descriptions often suck.

Nonsense

Posted May 27, 2026 17:08 UTC (Wed) by rgmoore (✭ supporter ✭, #75) [Link] (1 responses)

> True. Prototyping code, learning from it and then rewriting it produces a better overall result than going one-shot.

"Build one to throw away; you will anyhow." I think this really gets to the core of the problem with AI slop: as of today, a LLM is really only fit for producing prototypes. That's fine if the user understands the LLM's limitations and can either convert its prototype into a finished product or at least pass it to someone else who's capable of finishing it. AI slop happens when the LLM prototype gets treated like a finished product.

Nonsense

Posted May 28, 2026 7:46 UTC (Thu) by taladar (subscriber, #68407) [Link]

It depends a bit on what you actually need the code to do. Claude Code is e.g. pretty decent now writing the kind of code that has been highly repetitive with little innovation before. I used it recently to write a web version of a small CLI tool and it got most of the interface surprisingly correct even initially (though I did need to iterate over that version for a few days to get exactly what I wanted). But that is really the key, is your task essentially just wiring up some standard components that have been done in literally thousands of code bases before or is it something more creative? It also helped a lot that my project was in Rust and had strict clippy lints and other pre-commit hooks.

Loss of words

Posted May 26, 2026 19:05 UTC (Tue) by zdzichu (subscriber, #17118) [Link]

This is ridiculous.

Privacy

Posted May 26, 2026 19:36 UTC (Tue) by npws (subscriber, #168248) [Link]

To me, coding is something private. I'll happily share my results, but that doesn't mean I will permit someone to watch over my shoulder the entire time.

Editing session recording as a throttling mechanism

Posted May 26, 2026 19:53 UTC (Tue) by hailfinger (subscriber, #76962) [Link] (10 responses)

The Dillo project seems to desire only human contributions for philosophical reasons. That is an entirely valid stance.

However, there are also other reasons to reject AI-assisted or AI-implemented contributions.

If the amount (number, size) of contributions to a project is increasing and the reviewer bandwidth is already the bottleneck, it does make sense to filter out the AI slop.

It is possible to increase the reviewer bandwidth by
- adding more reviewers
- asking people capable of review to spend more time on reviewing instead of writing code
- using tools (including forms of AI) to speed up reviews and/or to handle the easily automatable parts of review

However, for small FOSS projects, there often is exactly one person doing the merges and reviews, and that person at the same time tries to add features, fix bugs and maybe also have a job and a personal life. For such projects
- adding more reviewers needs people willing to do that as well as a significant time investment by the maintainer in training the new reviewers
- asking people capable of review (the maintainer) to spend more time reviewing would suck out the remaining fun parts of doing the project, possibly killing the project altogether
- using tools would need a significant time (and money) investment yielding possible results only much later, possibly after said new tool has already been deprecated by the entity offering it.

Requiring coding videos/asciinema sessions is currently a working method to enforce throttling. That may change in the following months.

That said, if there is even a slight chance of revealing secrets during a coding session (API keys shown while searching bash history, typing passwords into the wrong window), this sounds like a security nightmare.
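To make the leak risk concrete, here is a rough sketch of what a pre-submission check on a recording could look like. It assumes the asciicast v2 layout (a JSON header line followed by one JSON array per event, `[time, "o", data]` for terminal output); the function name `scan_cast`, the patterns and the sample recording are all made up for illustration and are nowhere near a complete secret scanner.

```python
import json
import re

# Illustrative patterns only (assumptions); real scanners use much larger rule sets.
SECRET_PATTERNS = {
    "AWS access key id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "key=value credential": re.compile(
        r"\b(api[_-]?key|token|password)\s*[=:]\s*\S+", re.IGNORECASE
    ),
}


def scan_cast(cast_text):
    """Return (timestamp, label) pairs for output events that look like secrets."""
    findings = []
    # The first line is the header object; every later line is one event.
    for raw in cast_text.splitlines()[1:]:
        if not raw.strip():
            continue
        event = json.loads(raw)
        timestamp, kind, data = event[0], event[1], event[2]
        if kind != "o":  # only look at what was printed to the terminal
            continue
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(data):
                findings.append((timestamp, label))
    return findings


# Hypothetical recording, built inline so the sketch is self-contained.
sample = "\n".join([
    json.dumps({"version": 2, "width": 80, "height": 24}),
    json.dumps([0.5, "o", "$ history | grep curl\r\n"]),
    json.dumps([1.2, "o", "curl 'https://example.invalid/?api_key=abc123'\r\n"]),
    json.dumps([2.0, "o", "export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\r\n"]),
])

for timestamp, label in scan_cast(sample):
    print(f"{timestamp:>6.1f}s  possible {label}")
```

A check like this only sees secrets that arrive within a single output event, and a password typed with echo disabled never reaches the output stream at all, so a clean result would not prove the recording is safe to publish.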

Editing session recording as a throttling mechanism

Posted May 27, 2026 5:07 UTC (Wed) by eduperez (guest, #11232) [Link] (3 responses)

> The Dillo project seems to desire only human contributions for philosophical reasons.

Yes, I have the same feeling: "When receiving patches from first-time contributors it is sometimes hard to determine if the person has used an LLM to write the patch, looking at the code alone.". If the LLM-written code is so good you cannot distinguish it from human code, then where is the issue?

> However, there are also other reasons to reject AI-assisted or AI-implemented contributions.

Yes, but then the focus should be put on the "other reasons" (code quality, unneeded code, ...), not on who wrote the code.

Editing session recording as a throttling mechanism

Posted May 28, 2026 13:05 UTC (Thu) by Wol (subscriber, #4433) [Link] (2 responses)

> If the LLM-written code is so good you cannot distinguish it from human code, then where is the issue?

Because appearances are deceptive? AI-produced code *looks* good. But if the AI doesn't *understand* what the code is supposed to do, how can you be sure the code is doing the right thing, rather than confidently doing the wrong thing? You all know people who confidently charge ahead, without any clue as to what they're actually supposed to be doing - there's your typical AI.

Cheers,
Wol

Editing session recording as a throttling mechanism

Posted May 28, 2026 16:14 UTC (Thu) by kleptog (subscriber, #1183) [Link] (1 responses)

> Because appearances are deceptive? AI-produced code *looks* good. But if the AI doesn't *understand* what the code is supposed to do, how can you be sure the code is doing the right thing, rather than confidently doing the wrong thing?

The proof is in the pudding, right? If it comes with test cases, then validating the test cases are reasonable is often much easier than validating the logic directly. Even the simplest test cases can flush out most subtle potential bugs.

ISTM that part of the problem is that in the Linux Kernel and C programs in general test cases are quite difficult to do. Mocking objects is tedious. The interfaces between the various components of the system are not even visible at source level. You can't easily just swap out a component with another for testing. And of course, testing hardware drivers is another kettle of fish entirely.

The result is that for these kinds of programs code review places a much higher burden on reviewers. As noted elsewhere, if you get a patch for a Rust program that still compiles afterwards, you have a lot more confidence it's not doing anything subtly wrong.

Things to watch for in LLM-authored changes

Posted May 28, 2026 18:54 UTC (Thu) by farnz (subscriber, #17727) [Link]

One nasty thing I've caught reviewing LLM-authored changes (in a Rust codebase at work) is the LLM adding #[ignore] to existing tests (and obfuscating it if you tell it not to do that, via things like #[cfg_attr], comment markers or if false), or even deleting them outright, while adding a decent number of new tests that make sense.

If you don't spot what it's done to your test suite, you get surprised because the patched code compiles and passes all tests, and yet some of the things you "know" you test don't work.
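As a rough illustration of how a reviewer might catch this mechanically, the sketch below scans a unified diff for added lines that disable tests and removed lines that carried a test attribute. The function name `flag_test_tampering`, the regular expressions and the sample diff are assumptions made up for this example; they cover only the tricks listed above and would miss anything more creative.

```python
import re

# Added lines matching any of these are suspicious (illustrative, not exhaustive).
SUSPICIOUS_ADDED = [
    re.compile(r"#\[\s*ignore\b"),                    # #[ignore] or #[ignore = "..."]
    re.compile(r"#\[\s*cfg_attr\s*\(.*\bignore\b"),   # #[cfg_attr(<cond>, ignore)]
    re.compile(r"\bif\s+false\b"),                    # test body guarded by `if false`
]
# Removed lines matching this lost a test attribute (deleted or commented out).
REMOVED_TEST = re.compile(r"#\[\s*(\w+::)*test\b")    # #[test], #[tokio::test], ...


def flag_test_tampering(diff_text):
    """Return (diff line number, 'added' | 'removed', text) for suspicious lines."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        if line.startswith(("+++", "---")):
            continue  # file header lines, not content
        body = line[1:]
        if line.startswith("+") and any(p.search(body) for p in SUSPICIOUS_ADDED):
            findings.append((number, "added", body.strip()))
        elif line.startswith("-") and REMOVED_TEST.search(body):
            findings.append((number, "removed", body.strip()))
    return findings


# Hypothetical diff used only to exercise the sketch.
sample_diff = """\
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -1,5 +1,6 @@
 #[test]
+#[ignore]
 fn parses_header() {}
-#[test]
-fn rejects_bad_input() {}
+#[cfg_attr(not(feature = "never"), ignore)]
+fn helper() {}
"""

for number, kind, text in flag_test_tampering(sample_diff):
    print(f"diff line {number}: {kind}: {text}")
```

Something like this could run as a pre-commit hook or CI step, but it is only a tripwire: a hit means a human should look at why a test stopped running, and a clean result does not mean the test suite is intact.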

I'd expect this to be even worse for the kernel, where tests are often wrong if they exist at all, and so an LLM producing a plausible comment explaining why the test is bad will result in someone accepting the removal at face value.

Editing session recording as a throttling mechanism

Posted May 27, 2026 22:52 UTC (Wed) by zblaxell (subscriber, #26385) [Link] (5 responses)

In the past few months I've noticed LLMs putting a different kind of pressure on small project maintainers.

These days, anyone can give an LLM a prompt, like "download the source for PackageXYZ, find the bug that causes a crash when -A is used with -B on utf-8 input, and build a binary package I can install on Ubuntu," and the LLM does exactly that. Maybe the solution isn't perfect, but it's good enough that the user never interacts with upstream: no search to find the upstream source, no exposure to the support forums, no bug report, no opinion from the maintainer about whether the fix is correct or not. The only product is a private fork, and the LLM can maintain it indefinitely across future upgrades. Upstream never knows that this occurs.

There are two losers in this antipattern. The first is the user, who doesn't get the benefit of upstream's experience and judgement on the change (maybe it was awful), or exposure to other users who have encountered similar problems or found diverse alternative solutions (maybe the user was holding it wrong).

The second loser is the upstream project, which loses vital information about which parts of the code are irritating users enough that they seek help with redesign or fixes. In the worst case, the LLM finds a security issue, quietly fixes it in a user's private fork, and nobody gets notified. The upstream project never gets small fixes or quality-of-life improvements from casual contributors, because they're all going directly into LLM-managed workspaces. The only users visible to the project are those who have issues too severe for LLMs to resolve, and the shrinking portion of users who don't use LLMs themselves.

This effect seems to be almost invisible to project maintainers. I know it's happening because people I know personally do it, but they tend not to tell anyone else about it, for various reasons. If a user forked the project on github, the upstream maintainer might get a stray notification when two users discuss an upstream bug on one of the downstream forks. In other cases, the upstream maintainer simply stops getting contributions from new users, with no way to tell whether their users have lost interest in the project, or silently replaced the project's maintainer with a robot.

Any barrier to contribution to a project, no matter how small, will accelerate this effect. The DCO, language barriers, legal risks, internet trolls, cultural norms, and social anxiety all filter out some subset of potential project contributors who could otherwise publish their work. Now we can add competitive maintainer service from LLMs to that list.

The solution for this antipattern (assuming that attempts to eliminate all LLMs from existence are not successful) is to find incentives for the LLM users to engage with the projects at some level. Nobody wants AI slop, but many users all asking LLMs to generate the same patch to the same function is a signal: that function, or something connected to it, needs maintainer attention, because users keep tripping over it. The trick is to find a way for a small project to collect that signal in the first place, then extract it from the firehose of noise as the project gains momentum.

When a project maintainer demands invasive and irrelevant performance art as a prerequisite to making a contribution, I think most prospective contributors are not going to open up a terminal session so they can record themselves retyping perfectly adequate code by hand--regardless of whether an LLM was used. The rational response for many people is to close the browser tab, ghost the project, and warn others not to approach.

The request is not just ingratitude to a stranger that might have done a lot of human volunteer work to produce and validate a submission (not to mention salvaging the socially useful products of its electricity and water costs if LLM was involved), and is willing to support the resulting code, only to have it prejudicially refused. It's a clear signal that the project maintainer values their ideology over technical quality, and that the maintainer (and even the project's code) should be avoided for that reason alone.

The best case would be that the contributor makes a public fork with a more inclusive contribution policy, but volunteer contributors are rare enough--volunteer fork maintainers are unicorns.

Editing session recording as a throttling mechanism

Posted May 28, 2026 7:49 UTC (Thu) by taladar (subscriber, #68407) [Link] (1 responses)

On the other hand, a lot of that work might not have ever happened without the LLM in the first place, so I am not sure framing it as a lost contributor makes sense; a lost bug report is more the scope I would expect.

Editing session recording as a throttling mechanism

Posted May 28, 2026 16:12 UTC (Thu) by zblaxell (subscriber, #26385) [Link]

First impressions matter. The project's response to a new user's first bug report will affect whether it gets a second bug report--or anything else--from that user.

For open source projects, the project's biggest competitor is often the project's own previous release (now co-maintained by the user's LLM). In small, new projects, there might be an alternative project out there that the user can comfortably switch to. The B project might not be as good, but if it's able to attract project A's contributors as well as its own, the B project will catch up to and overtake the A project. The user might be evaluating A and B at the same time, send a bug report or feature enhancement to both, and focus their future work on the project that is more cooperative.

A large project isn't going to starve for new contributors. The Linux kernel, QEMU, GCC, Rust, Python etc. won't be suffering from LLMs absorbing their new users any time soon. If anything, projects on this scale welcome diversion of new users to local support channels. The network effect is still a strong deterrent against maintaining a private fork. In those cases, yes, an indiscriminate ban on any subset of users will probably only delay bug reports. Eventually, a second user outside of the banned subset will report the same problem.

Editing session recording as a throttling mechanism

Posted May 28, 2026 8:05 UTC (Thu) by NAR (subscriber, #1313) [Link]

This seems to be a question of maintenance of a private fork vs working with upstream. Both require effort and resources, but of different kinds.

I wouldn't take this recording requirement too seriously - I think it's just "loud thinking" on part of a maintainer of a small open source project.

Editing session recording as a throttling mechanism

Posted May 28, 2026 8:11 UTC (Thu) by chris_se (subscriber, #99706) [Link] (1 responses)

> Nobody wants AI slop, but many users all asking LLMs to generate the same patch to the same function is a signal: that function, or something connected to it, needs maintainer attention, because users keep tripping over it.

Not necessarily though - sometimes you might _want_ users to trip over things if they're doing something wrong because of their own misconceptions.

Think for example the Rust borrow checker: a lot of newer users trip over it because they don't understand lifetimes properly, but it is there for a good reason, and you want it to find these kinds of issues. Granted, this is an extreme example, since most people won't modify rustc. But I hope this does get the point across I'm trying to make.

If the misconception is in the user's mind, and they prompt emphatically enough, any current LLM will eventually follow what the user wants, and then they'll generate a patch that might "work" for their use case, but run contrary to the design decisions the project made for good reason.

Editing session recording as a throttling mechanism

Posted May 28, 2026 18:53 UTC (Thu) by zblaxell (subscriber, #26385) [Link]

The signal indicates the existence of a problem, not necessarily its location.

Even if it's a feature, if users keep stumbling over it, there's some kind of issue. The code itself could be perfect, but documentation is failing to set appropriate user expectations, or the error message isn't informative enough. In cases like the borrow checker, it should be front and center in tutorials (and it is).

More likely, the LLM would work around the problem by making the affected variables explicitly shared, or by refactoring the code so that it does borrow correctly. If it's a deprecated Rust feature, the LLM might apply migration advice from the Rust project itself. Patching Rust would be rare unless the prompt explicitly framed the issue as a Rust compiler bug, and the user overrode the LLM's training, which probably includes some variation of "never assume a compiler bug when it could be user error."

A better example is Python code that blows up when it hits a malformed UTF-8 string. A user's LLM might fix the symptom by putting an exception trap around the string use, or by changing the encoding to cp437. An experienced human might look at the data flow in the application to find how the string got malformed in the first place, and fix the problem there instead, or make the docs and runtime diagnostic messages clearer that malformed UTF-8 is explicitly forbidden.

None of that can happen in the project if its maintainer is not aware that the problem exists because the LLM intercepts the signal from the user to the project.
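
The malformed UTF-8 case above can be sketched roughly as follows. This is only an illustration under assumptions: the function names, the sample bytes, and the `source` label are invented here and do not come from any real project. It contrasts the two symptom-level fixes with rejecting bad input at the point where it enters the program.

```python
# Hypothetical sketch; every name below is invented for illustration.

# Latin-1 encoded bytes that some upstream component wrongly labelled UTF-8.
MALFORMED = b"caf\xe9 menu"


def symptom_fix_trap(raw: bytes) -> str:
    """Swallow the error: the crash goes away, and so does the data."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return ""


def symptom_fix_cp437(raw: bytes) -> str:
    """Switch codecs: cp437 decodes these bytes without complaint,
    but the text it yields is not what the sender meant."""
    return raw.decode("cp437")


def validate_at_boundary(raw: bytes, source: str) -> str:
    """Reject malformed input where it enters, and say where it came from."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"{source}: malformed UTF-8 at byte offset {exc.start}; "
            "input must be valid UTF-8"
        ) from exc


if __name__ == "__main__":
    print(repr(symptom_fix_trap(MALFORMED)))
    print(repr(symptom_fix_cp437(MALFORMED)))
    try:
        validate_at_boundary(MALFORMED, "upload.txt")
    except ValueError as err:
        print(err)
```

Both symptom fixes make the traceback disappear, which is exactly why the maintainer never hears about the bad input. The boundary check still fails, but with a message that points at the source of the data rather than at the line that happened to touch it.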

This is bananas.

Posted May 27, 2026 0:28 UTC (Wed) by masterleep (subscriber, #121749) [Link] (1 responses)

On the internet, nobody knows you're a clanker.

This is bananas.

Posted May 27, 2026 14:28 UTC (Wed) by karim (subscriber, #114) [Link]

This is priceless.

Categorically no.

Posted May 27, 2026 5:35 UTC (Wed) by rrolls (subscriber, #151126) [Link]

If I was required to record a video (for the sake of argument I'm being generic and counting asciinema recordings as "videos") of how I came up with a code change from start to finish, I'd simply not do it.

1. I'll be all over the place while programming. It's very frequent to be working on one thing, and encounter another, and go take care of that while you're at it; or, find you're barking up the wrong tree and start from scratch; or, need to do something else in between; or, ... So, I am not going to be recording all that and then painstakingly editing it to show just the relevant bits, and I'm also not going to be recording all that and then submitting something of which the majority is not only irrelevant but quite possibly a privacy violation as well.

2. Being watched affects how you work. If I was recording while trying to do programming, I wouldn't be working as effectively. It's also one of many reasons why doing (individual) programming in an open plan office is a pain. Having someone or something watching over your shoulder hampers your ability to focus. Heck, this isn't limited to programming either. If you're a builder you'd much prefer your customer let you get on with your job rather than standing there watching you.

3. What if you lose the recording? Are you going to argue with a maintainer about why you have no recording? Are you going to put everything on a second screen and retype everything from it just to generate a recording (if so, you could be copying from AI...)? Are you going to attempt to reproduce everything you just did a second time? Most likely you're going to do the same as you would if you had any number of classic catastrophic technical issues that delete your work, perhaps swear for a while about what a silly policy this is, and then try to forget everything that happened and move on. Or maybe you could just fake the recording; see 4.

4. The claim is made that AI cannot fake an asciinema recording (or, generalising, a video) of such a programming session. That is almost certainly because neither of these has been the case yet: (a) people making a habit of recording such things, and (b) AI companies having a reason to train their models to make them. Suppose (God forbid) overnight, every project in the world adopted this policy of requiring these recordings. Suddenly, (a) and (b) are fulfilled; then it won't be long before AI gets good at faking this particular format like it's already good at faking just about everything else. People then start AI-generating their "recordings" to tick the silly box, and you're worse off than when you started.

5. Why is this even necessary? If the project has a policy of not using AI (and states this front and center in their README or whatever, like many do), the vast majority of potential AI-using contributors will do the right thing and simply move on; few will dare to submit an AI contribution anyway. Some will be unlucky enough to miss the notice somehow, but will still state that they used AI, and their contribution can be quickly and politely declined. Very few will both miss the notice and fail to disclose AI use - and if they were doing it maliciously, well, people can _already_ submit malicious "contributions" for many other reasons, and maintainers have to be on the lookout for these anyway. Maintainers will get used to their long-term respectful contributors, and they will be able to trust that those people won't have used AI just as they trust that people aren't submitting malware - and for first-timers, maintainers will just have to apply the same vetting process that they hopefully already do.

Sure, of the first two (and perhaps three) points, you might turn round and say I'm just bad at this (though I do reckon I'm far from the only one on each count), but the last two certainly cannot be refuted in that way.

Power trip?

Posted May 27, 2026 7:25 UTC (Wed) by egb (subscriber, #163244) [Link] (8 responses)

Sounds like someone wants to exploit the power dynamic between maintainer and contributor to compel the latter to compromise their privacy more than one could reasonably demand if that dynamic didn't exist, much like the dynamic between, e.g., an exam proctor and a student, or an employer and a candidate.

It might be quite funny to do a Paint Drying-style prank [1], where you upload 10 hours of creating and backspacing various files, compelling the reviewer to watch all of it...

[1] Paint Drying - Wikipedia

Power trip?

Posted May 28, 2026 21:25 UTC (Thu) by zblaxell (subscriber, #26385) [Link] (7 responses)

My longest known editing session spans three Debian stable releases. I opened some files shortly after the machine booted under bullseye, and closed the last one for a storage upgrade under trixie, 1108 days later. For most of that time, the editor would have been running from the orphan inode of its binary after the dist-upgrade from bullseye to bookworm.

I left the editor untouched with its files open for weeks or months at a time, so the asciinema recording would make Paint Drying look like a TV commercial break.

Power trip?

Posted May 28, 2026 21:54 UTC (Thu) by dskoll (subscriber, #1630) [Link] (5 responses)

You didn't reboot after the upgrades? Pretty sure bullseye to bookworm to trixie would have updated the kernel. Unless you had it pinned for some reason...

Power trip?

Posted May 28, 2026 21:55 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

You can just not reboot and keep running with the old kernel. It will likely mostly work. Even device hotplug will work, because the old modules will still be around.

Power trip?

Posted May 28, 2026 22:58 UTC (Thu) by dskoll (subscriber, #1630) [Link] (3 responses)

Yes, sure, but if (for example) a newer glibc takes advantage of new or improved system calls, the system could behave a bit weirdly when those calls are missing or lack the improvements.

Power trip?

Posted May 29, 2026 1:12 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

glibc is pretty good about avoiding the missing syscalls. I had an SBC with an essentially hard-coded kernel and I could keep it running on that old kernel for about 7 years, until the userspace drifted too far away. I think something related to the auxv vector was the last straw that caused it to fail to boot.

Power trip?

Posted May 29, 2026 2:26 UTC (Fri) by dskoll (subscriber, #1630) [Link] (1 responses)

Sure, I have no doubt it can be done, but I don't really see the advantage in avoiding a reboot of a desktop system running a mainstream distro on a mainstream CPU architecture. Other than to have uptime bragging rights, I guess.

Power trip?

Posted May 29, 2026 18:57 UTC (Fri) by zblaxell (subscriber, #26385) [Link]

I edit files on a lot of machines. Statistically, I can expect that one of those editors will find itself running on a machine that had no requirement to reboot, and was lucky enough to avoid crashes and power failures for a while. I made a list of high uptime machines where I've edited files, and picked the first one in that list where I could confirm the editor session time from other data sources, and it turned out to be a large enough number for some lighthearted reductio ad absurdum.

I can't reliably state what my #2 longest editing session was. Some automation controller or legacy app VM which has quietly run up a 3- or 4-digit uptime day count has probably also hosted my second longest editing session--but I have no way to know which one that was. Sometimes editors crash mid-project, or there's an issue with tmux/mosh/screen/vncserv/rdesktop/etc that kills the editor, or I don't start editing a file until the machine has already been running for a few months, or I close an editor for reasons other than leaving a project. Too many variables to interpolate editor session time from machine uptime, and I don't track editor session time directly.

The second verifiable entry on my list is 61 days (started March 30, still running now) but in reality I'm pretty sure that number wouldn't be in my top 10.

Power trip?

Posted May 28, 2026 22:05 UTC (Thu) by bluca (subscriber, #118303) [Link]

I hope you switched to an editor that can save files to disk after that experience!

What if you just use AI to explore?

Posted May 27, 2026 11:00 UTC (Wed) by epa (subscriber, #39769) [Link] (1 responses)

I sometimes ask AI to make a change, but then discard its work and implement the same feature myself. The AI helped by finding the areas of files I needed to edit, and often by spotting some dependency I didn't think about, but its code is too messy. Also it doesn't follow my preferred style of making lots of tiny commits, with each one being either a pure refactoring (no change to behaviour) or a clearly defined "flip to the new behaviour". Would that be considered cheating here?

What if you just use AI to explore?

Posted May 28, 2026 20:22 UTC (Thu) by MaZe (subscriber, #53908) [Link]

Yeah, I do that all the time too.
Sometimes I'll get 3 or 4 different AI sessions/implementations, and then mix and match between them - either by hand or by providing much more focused prompts.

How about a `git reflog`?

Posted May 28, 2026 13:42 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

Would a `git reflog` of the work, perhaps, be more suitable? It captures just the repository rather than everything being performed on the terminal/desktop session. History can also be forged, but without a proctor for all coding sessions…

Failed before it starts

Posted Jun 1, 2026 0:19 UTC (Mon) by azumanga (subscriber, #90158) [Link]

I just asked Claude to do this for a short PR I made, to go at human speed and make mistakes. I can’t tell it was made by Claude.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds