Preferred form of modification

Posted Mar 10, 2026 14:55 UTC (Tue) by kleptog (subscriber, #1183)
Parent article: Debian decides not to decide on AI-generated contributions

> "what is the preferred form of modification for code written by issuing chat prompts?"
> Nussbaum answered that would be "the input to the tool, not the generated source code".

I thought: that can't be right. Then I looked at what he actually said:

> Assuming we are making the hypothesis of an LLM that is packaged in Debian and used as part of the build process of the package (so it's a build-dependency, and does not require internet access during build), how is that different from the 'bc' source package's use of flex/bison to generate C source files[0], or the 'swiglpk' source package's use of swig?

There is absolutely no way LLMs are useful in such a scenario. That's like sending a task to Mechanical Turk on every build and using the response blindly. LLMs are by their nature non-deterministic (though I guess you could turn down the temperature). That makes them not comparable to flex/bison.


Preferred form of modification

Posted Mar 10, 2026 15:42 UTC (Tue) by geofft (subscriber, #59789) (8 responses)

In the sense of bit-for-bit reproducibility, yes. But in the sense of human understanding, I think it is actually like flex/bison: if you want to understand what a parser is doing, you're better off looking at the highest-level inputs than at the generated C code. Or take autoconf, which I'm more familiar with: it's always better to edit configure.ac and rerun autoconf than to edit ./configure directly. That remains true even if you're on a different version of autoconf and your regenerated ./configure has a whole bunch of uninteresting changes, because the intent of the two generated ./configure files is the same.

The term "preferred form of modification" comes from the GPL and is intended to protect the four software freedoms, specifically the freedom to study and improve the software, and I think it should be interpreted in that context. The word "modification" implies that nothing has to be regenerated exactly. I think it's a natural extension of reproducible builds to want a small change to the sources to produce a correspondingly small change in the binary, but that is not a requirement for the kind of reproducibility you want for automated builds, and it's already quite common (especially with compiler optimizations) for this not to hold.

For the goal of bit-for-bit reproducibility, I wonder if you could do something like checking in both the input and the output of the LLM, as well as a proof that the output was generated from the given neural network and the given inputs. That proof would probably just take the form of the RNG bitstream and the specific order of evaluation (even if you use a DRBG to deal with the randomness, my understanding is that running on differently shaped hardware with different parallelism is going to trigger some chaos theory in the outputs of a neural network). Apparently it is also cheaper to verify a matrix multiplication than to actually perform it (Freivalds' algorithm). This might be both too much data and too much computation to be practical at the moment, but maybe it's what we'll do many years in the future.
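Freivalds' algorithm checks a claimed product C = A·B by multiplying both sides by a random 0/1 vector r and comparing A(Br) with Cr. Each round costs a few matrix-vector products (O(n²)) instead of recomputing the full product (O(n³)). What follows is a minimal Python sketch, assuming small integer matrices so the comparison is exact; real LLM inference runs in floating point, where a check like this would need tolerances, and that adaptation is not shown. The helper names are illustrative, not from any library.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]


def matmul(A, B):
    """Naive O(n^3) product, used here only to build a correct C for the demo."""
    cols = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(cols)]
            for i in range(len(A))]


def freivalds(A, B, C, rounds=20, rng=None):
    """Probabilistically check whether A @ B == C.

    Returns False as soon as a mismatch is detected. If C is wrong, each round
    misses it with probability at most 1/2, so a wrong C survives all rounds
    with probability at most 2**-rounds. A correct C always passes.
    """
    if rng is None:
        rng = random.Random()
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(len(C[0]))]
        # Three matrix-vector products in total, no matrix-matrix product.
        if matvec(A, matvec(B, r)) != matvec(C, r):
            return False
    return True


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = matmul(A, B)            # [[19, 22], [43, 50]]
C_bad = [[19, 22], [43, 51]]     # one entry off by one

print(freivalds(A, B, C_good, rng=random.Random(0)))  # True
print(freivalds(A, B, C_bad, rng=random.Random(0)))   # almost certainly False
```

The check is one-sided: a mismatch is a definite "no", while passing every round is only a high-probability "yes".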

Preferred form of modification

Posted Mar 10, 2026 16:30 UTC (Tue) by neggles (subscriber, #153254) (3 responses)

Running the same LLM with the same prompts and the same RNG seed on the same device type will always produce the same output, so there's that.

Preferred form of modification

Posted Mar 10, 2026 16:35 UTC (Tue) by koverstreet (✭ supporter ✭, #4296) (2 responses)

No, it won't. You can run an LLM that way - temperature = 0 - but you generally don't want to. Like in many other algorithms, introducing some stochastic noise often produces better results.

Preferred form of modification

Posted Mar 10, 2026 16:54 UTC (Tue) by geofft (subscriber, #59789)

There's a difference between setting the temperature to zero, i.e. deterministically taking the most likely token (basically replacing sampling from the softmax with a plain argmax), and using a PRNG with a deterministic seed at a non-zero temperature. The latter will sometimes take less likely tokens, but it will make those choices the same way on every re-execution of the same network (with the same hardware, resources, etc.) with the same seed. I agree that setting the temperature to zero is probably not what you want.
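The difference between greedy decoding and seeded sampling can be sketched in a few lines of Python. This is a toy, assuming a fixed list of hypothetical next-token scores (logits) that, unlike a real model, do not depend on earlier picks: temperature 0 is treated as argmax, and any positive temperature divides the logits before a weighted draw from a seeded PRNG.

```python
import math
import random


def probabilities(logits, temperature):
    """Softmax of logits / temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)                     # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, temperature, rng):
    if temperature == 0:
        # Greedy decoding: always the single most likely token; the PRNG is unused.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Non-zero temperature: less likely tokens can win, but the choice is
    # driven entirely by the seeded PRNG, so the same seed repeats it.
    probs = probabilities(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


def run(seed, temperature, steps=10):
    logits = [2.0, 1.5, 0.3, -1.0]        # hypothetical scores for a 4-token vocabulary
    rng = random.Random(seed)
    return [pick_token(logits, temperature, rng) for _ in range(steps)]


print(run(seed=12345, temperature=0))     # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(run(seed=12345, temperature=1.0) == run(seed=12345, temperature=1.0))  # True
```

With a non-zero temperature the picks vary from token to token, yet two runs with the same seed agree exactly; only changing the seed (or the arithmetic producing the logits) changes the sequence.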

Preferred form of modification

Posted Mar 10, 2026 17:34 UTC (Tue) by phm (subscriber, #168918)

> No, it won't. You can run an LLM that way - temperature = 0 - but you generally don't want to. Like in many other algorithms, introducing some stochastic noise often produces better results.
Given the same settings (temperature, model, RNG seed), an LLM will produce the same results. Here are some example sessions run with llama.cpp on a ThinkPad (the LLM is mdradermacher's i1 quantization of Apertus 8B, abliterated):

t420:~/llama.cpp/build/bin$ ./llama-cli --temp 20 --seed 12345 -m $LLM

> Hello!

Hello! It's great to connect with you. If you have a question or a [^C]

# Running the same command again:

t420:~/llama.cpp/build/bin$ ./llama-cli --temp 20 --seed 12345 -m $LLM

> Hello!

Hello! It's great to connect with you [^C]

./llama-cli --temp 20000000 --seed 12345 -m $LLM

> Hello!

Hello! Welcome to the SwissAI assistant service. What can I help [^C]

Preferred form of modification

Posted Mar 10, 2026 19:50 UTC (Tue) by ptime (subscriber, #168171) (3 responses)

Flex/bison involve a deterministic mapping between the high level abstraction and generated code. LLMs do not.

Preferred form of modification

Posted Mar 10, 2026 20:17 UTC (Tue) by geofft (subscriber, #59789) (2 responses)

I don't think this comment responds to any of the things I said about determinism, nor does it take into account any of the existing comments about how deterministic execution of LLMs is entirely possible.

Preferred form of modification

Posted Mar 11, 2026 0:15 UTC (Wed) by ptime (subscriber, #168171) [Link]

It might be possible, just like it might be possible to take the tires off a bike and put furniture casters on instead, but the nondeterminism is why people want to use the LLMs in the first place.

Preferred form of modification

Posted Mar 12, 2026 3:46 UTC (Thu) by gf2p8affineqb (subscriber, #124723) [Link]

But determinism isn't the only point. The point is that other tools have formal semantics, and changes have a predictable, local effect. Compare that to an LLM, where the semantics are "whatever it outputs" and no one can predict how the output changes in response to a change in the input.

Preferred form of modification

Posted Mar 10, 2026 16:56 UTC (Tue) by excors (subscriber, #95769) (6 responses)

> LLMs are by their nature non-deterministic (though I guess you could turn down the temperature).

I think that's not really true: the LLM basically does a deterministic computation of next-token probabilities, biases the probabilities based on the temperature parameter (low temperature exaggerates the high-probability tokens), then uses a PRNG to make a weighted selection of a single token to add to the prompt, and repeats. Some APIs (though not all) let you select the PRNG seed, in which case the output is usually reproducible with the same initial prompt and seed and other parameters. It seems they often have optimisations that can break the deterministic part, but that's just a quality-of-implementation issue, it's not part of the nature of LLMs.
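The loop described above (deterministic next-token scores, temperature scaling, a seeded weighted pick, append, repeat) can be sketched as follows. This assumes a hypothetical `toy_logits` function standing in for the network, derived from a hash of the context so that it is deterministic; nothing here reflects a real model or API. With the same prompt, temperature and seed, `generate` should return the same tokens on every run.

```python
import hashlib
import math
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "."]   # hypothetical tiny vocabulary


def toy_logits(context):
    # Hypothetical stand-in for the network: a deterministic function from the
    # tokens so far to one score per vocabulary entry (here, bytes of a hash).
    digest = hashlib.sha256(" ".join(context).encode()).digest()
    return [digest[i] / 32.0 for i in range(len(VOCAB))]


def generate(prompt, steps, temperature, seed):
    if temperature <= 0:
        raise ValueError("this sketch only covers temperature > 0")
    rng = random.Random(seed)                     # the only source of randomness
    context = list(prompt)
    for _ in range(steps):
        logits = toy_logits(context)              # deterministic step
        scaled = [x / temperature for x in logits]
        top = max(scaled)
        weights = [math.exp(x - top) for x in scaled]   # unnormalized softmax
        # Seeded weighted pick, fed back into the context for the next step.
        context.append(rng.choices(VOCAB, weights=weights, k=1)[0])
    return context[len(prompt):]


print(generate(["the", "cat"], steps=8, temperature=0.7, seed=42))
```

Because every step reads only the context and the seeded PRNG, anything that perturbs the scores (a different BLAS, a different summation order) can still change the output even with the same seed.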

But (as I understand it) coding assistants are not just an LLM with an input string and an output string. They mix multiple LLM sessions (including LLMs to generate prompts for other LLMs) with external tools (web search, filesystem access, etc) and with user input in a complex feedback loop. "The input to the tool" is not meaningful or useful for modification - that'd be like distributing an image as a list of Photoshop commands and brush strokes applied to a blank canvas.

Preferred form of modification

Posted Mar 10, 2026 19:31 UTC (Tue) by NYKevin (subscriber, #129325) (1 response)

Unfortunately, while this is *mostly* true, the probabilities are subject to floating-point reproducibility issues. Each probability falls out of a lengthy sequence of matrix multiplications and other operations, and the rounding at each step depends on exactly how that arithmetic is carried out. Any of the following changes may invalidate existing seeds:

* Changing whether or not -ffast-math is enabled.
* Compiling with -ffast-math under a different compiler (or different version of the same compiler).
* Linking against a different BLAS library (or different version of the same BLAS).
* Compiling on any platform that fails to uphold IEEE 754 (other than as a result of -ffast-math). Cursory Googling suggests that "modern" GPUs "generally" uphold IEEE 754.
* Changing whether or not the hardware can do fused multiply-add (FMA), and/or whether or not the BLAS is smart enough to take advantage of it.

There are probably others as well; these are just the obvious ones.

TL;DR: Seeds are reproducible assuming we're talking about a specific binary running on specific hardware. In all other cases, you have to audit a lot of miscellaneous stuff to ensure reproducibility.
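To make the FMA item concrete: a fused multiply-add rounds a*b + c once, while a separate multiply and add round twice, so identical inputs can give different results. A small Python sketch, emulating the fused version with exact rational arithmetic from the standard `fractions` module (on Python 3.13 or later, `math.fma` should give the same answer); the helper names are illustrative.

```python
from fractions import Fraction


def multiply_then_add(a, b, c):
    # Two roundings: a * b is rounded to the nearest double, then the sum is rounded.
    return a * b + c


def fused_multiply_add(a, b, c):
    # One rounding: compute a * b + c exactly as a rational, round once at the end.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1 + 2**-30
b = 1 - 2**-30
c = -1.0
# The exact value of a * b + c is -2**-60.
print(multiply_then_add(a, b, c))   # 0.0, because a * b rounds to exactly 1.0
print(fused_multiply_add(a, b, c))  # -2**-60, roughly -8.67e-19
```

A dot product accumulated with FMA versus without it can therefore differ in the last bits, which is enough to flip a sampled token when two candidates have nearly equal probability.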

Preferred form of modification

Posted Mar 10, 2026 21:53 UTC (Tue) by excors (subscriber, #95769)

That seems no different to any other reproducible builds - you have to use largely the same compiler and compiler flags and hardware architecture etc. (And don't use -ffast-math in any case, because it's explicitly documented as producing incorrect output.)

I think the tricky part of non-determinism in LLMs is there's insufficient synchronisation in their parallel GPU code, so the order of some non-associative FP arithmetic depends on their dynamic load balancing and the GPU's non-deterministic thread scheduling. That's a deliberate tradeoff of performance against reproducibility, and they could choose to implement it the other way if they cared. (https://thinkingmachines.ai/blog/defeating-nondeterminism... has some plausible-sounding discussion of the changes needed to make it deterministic.)
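The non-associativity point can be shown with three numbers: the same values summed in a different order give a different double. This is only an analogy for a GPU reduction that combines partial sums in a scheduling-dependent order; the `ordered_sum` helper is illustrative. It deliberately avoids Python's built-in `sum()`, which since Python 3.12 uses compensated summation for floats and would mask the effect. A fixed reduction order, or an exactly rounded sum such as `math.fsum`, gives the same answer every time.

```python
import functools
import math
import operator


def ordered_sum(values):
    # Left-to-right accumulation: each addition is rounded before the next one.
    return functools.reduce(operator.add, values, 0.0)


vals = [0.1, 0.2, 0.3]
forward = ordered_sum(vals)             # (0.1 + 0.2) + 0.3
backward = ordered_sum(reversed(vals))  # (0.3 + 0.2) + 0.1

print(forward)              # 0.6000000000000001
print(backward)             # 0.6
print(forward == backward)  # False

# math.fsum is exactly rounded, so its result does not depend on the order.
print(math.fsum(vals) == math.fsum(reversed(vals)))  # True
```

In a parallel kernel the "order" is which partial sums get combined first, so pinning down the reduction tree (at some cost in load balancing) is what makes the result repeatable.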

Preferred form of modification

Posted Mar 10, 2026 21:21 UTC (Tue) by kleptog (subscriber, #1183)

> But (as I understand it) coding assistants are not just an LLM with an input string and an output string.

Right, because directly using the output of an LLM is like recording someone talking after you ask them a question: you get the requests for clarification, the side tangents, the ums and ahs, etc. You don't record someone talking; you ask them to write down their answer and let them refine it a few times. And you record that.

There are frameworks that do this, where you have the output stream of "talking" from the LLM and it also has an editor where it can write code, and rewrite it as required. But that's just making the LLM one cog in a larger machine, which makes the whole discussion about only LLMs kind of pointless.

For me LLMs feel like what happens in my mind between forming a thought and opening my mouth to produce grammatically correct sentences to convey a thought. They are a State -> Words transformer, that's all.

Preferred form of modification

Posted Mar 11, 2026 14:06 UTC (Wed) by udorsch (subscriber, #169676)

> "The input to the tool" is not meaningful or useful for modification - that'd be like distributing an image as a list of Photoshop commands and brush strokes applied to a blank canvas.

That's pretty much exactly how vector graphics formats work, and vector graphics are highly efficient, widely used, and in many respects superior to raster graphics (which would correspond to distributing the result, not the input).

Preferred form of modification

Posted Mar 12, 2026 3:48 UTC (Thu) by gf2p8affineqb (subscriber, #124723) [Link]

> list of Photoshop commands and brush strokes applied to a blank canvas.

It is not! That would be a formally defined system, like SVG. There is no formal definition of the semantics of an LLM, which makes the input impossible to reason about or change in a predictable fashion.

Preferred form of modification

Posted Mar 13, 2026 5:34 UTC (Fri) by sramkrishna (subscriber, #72628) [Link]

It's non-deterministic because the training dictates the choices. If the same LLM gets updated, its behavior can change. If you switch to a different LLM, the behavior will also change. Almost always, you will need extra prompting to add more guardrails. This is one of the weaknesses: you never know what a change will do without very extensive testing, and you're going to need a lot of expertise to do that. I reckon that is not going to be cheap.

Preferred form of modification

Posted Mar 12, 2026 6:51 UTC (Thu) by AdamW (subscriber, #48457) (1 response)

I've seen various instances of people fiddling with stuff along these lines, and I think in every case, they fairly rapidly reached the conclusion "it's much better to use an LLM to generate deterministic code to generate the output than it is to call an LLM to do each instance of generation".

Preferred form of modification

Posted Mar 13, 2026 14:10 UTC (Fri) by mathstuf (subscriber, #69389)

Yes. I was using an LLM to help categorize a trove of email backlogs, and having it write scripts to do the evaluation, which it (and I!) can then reuse, really saves on tokens.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds