
Debian dismisses AI-contributions policy

Posted May 11, 2024 20:21 UTC (Sat) by flussence (guest, #85566)
In reply to: Debian dismisses AI-contributions policy by Paf
Parent article: Debian dismisses AI-contributions policy

> If we’re just saying “humans are different”, it would be nice to understand *why* in detail and if anything non human could ever clear those hurdles.

Are you saying there's a threshold of "AI-ness" whereby, in crossing it, someone distributing a 1TB torrent of Disney DVD rips and RIAA MP3s, encrypted with a one-time pad output from a key derivation function with a trivially guessable input, and being caught doing so, would result in the torrent file itself being arrested instead? Does a training set built by stealing the work of others have legal personhood now? Do the colour of the bits and the intent of the deed no longer matter to a court if the proponent of the technology is sufficiently high on their own farts?



Debian dismisses AI-contributions policy

Posted May 12, 2024 1:28 UTC (Sun) by Paf (subscriber, #91811) [Link] (4 responses)

I don't think I understand this comment; it seems to start from the premise that computerized processes are inherently different from biological ones and just proceeds from there. I can't really engage on those terms, since there's no argument to be had.

Debian dismisses AI-contributions policy

Posted May 13, 2024 10:54 UTC (Mon) by LtWorf (subscriber, #124958) [Link] (3 responses)

A person can learn C from one book. An AI needs millions of books. Surely you can see a difference of several orders of magnitude there?

Debian dismisses AI-contributions policy

Posted May 13, 2024 15:40 UTC (Mon) by atnot (subscriber, #124910) [Link] (2 responses)

I think calling it "learning C" is being too generous. If you learn a language like C from nothing, you will have a relatively complete understanding of the language and be able to write semi-working, conceptually correct solutions to pretty arbitrary simple problems with relative ease.

LLMs don't have that; they just try to predict what the answer would be on Stack Overflow. Including, apparently, much to my delight, "closed as duplicate". If you try using them for actually writing code, it quickly becomes clear that they have no actual understanding of the language beyond stochastically regurgitating online tutorials[1]. They falter as soon as you ask for something that isn't a minor variation of a common question, or something that has been uploaded to GitHub thousands of times.

If we are to call both of these things "learning", we do have to acknowledge that they are drastically different meanings of the term.

[1] And no, answers to naive queries about how X works do not prove it "understands" X, merely that the training data contains enough instances of this question being answered for it to be memorizable. Which, for a language like C, is going to be a lot. Consider, e.g., that the overwhelming majority of universities in the world have at least one C course.

Debian dismisses AI-contributions policy

Posted May 13, 2024 15:44 UTC (Mon) by bluca (subscriber, #118303) [Link]

> LLMs don't have that; they just try to predict what the answer would be on Stack Overflow. Including, apparently, much to my delight, "closed as duplicate". If you try using them for actually writing code, it quickly becomes clear that they have no actual understanding of the language beyond stochastically regurgitating online tutorials[1]. They falter as soon as you ask for something that isn't a minor variation of a common question, or something that has been uploaded to GitHub thousands of times.

That's really not true for the normal use case, which is fancy autocomplete. It doesn't just regurgitate online tutorials or Stack Overflow; it provides autocompletion based on the body of work you are currently working on, which is why it's so useful as a tool. The process is the same stochastic parroting, mind you: of course language models don't really learn anything in the sense of gaining an "understanding" of something in the human sense.

Debian dismisses AI-contributions policy

Posted May 13, 2024 20:39 UTC (Mon) by rschroev (subscriber, #4164) [Link]

Have you tried something like CoPilot? I've been trying it out a bit over the last three weeks (somewhat grudgingly). One of the things that became clear quite soon is that it does not just get its code from Stack Overflow and GitHub and the like; it clearly tries to adapt to the body of code I'm working on (it certainly doesn't always get it right, but that's a different story).

An example, to make things more concrete. Let's say I have a struct with about a dozen members, and a list of key-value pairs, where those keys are the same as the names of the struct members, and I want to assign the values to the struct members. I'll start writing something like:

for (const auto &kv : kv_pairs) {
	if (kv.first == "name")
		mystruct.name = kv.second;
	// ...
}

It then doesn't take long before CoPilot starts autocompleting with the remaining struct members, offering me the exact code I was trying to write, even when I'm pretty sure the names I'm using are unique and not present in publicly accessible sources.
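Spelled out a bit more, a minimal compilable sketch of that pattern might look like this (the struct name, the members beyond "name", and the apply() wrapper are invented placeholders, not my real code; the later branches are the kind of thing CoPilot offers after seeing the first one):

#include <string>
#include <utility>
#include <vector>

// Hypothetical struct; the real one has about a dozen members.
struct Record {
	std::string name;
	std::string host;
	std::string user;
};

void apply(Record &mystruct,
           const std::vector<std::pair<std::string, std::string>> &kv_pairs)
{
	for (const auto &kv : kv_pairs) {
		if (kv.first == "name")        // written by hand
			mystruct.name = kv.second;
		else if (kv.first == "host")   // suggested by the tool
			mystruct.host = kv.second;
		else if (kv.first == "user")   // suggested by the tool
			mystruct.user = kv.second;
		// ... one branch per remaining member
	}
}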

I'm not commenting on the usefulness of all this; I'm just showing that what it does is not just applying StackOverflow and GitHub to my code.

We should probably keep in mind that LLMs are not all alike. It's quite possible that e.g. ChatGPT would have a worse "understanding" (for lack of a better word) of my code, and would rely much more on what it learned earlier from public sources.

