
Gentoo bans AI-created contributions

Posted Apr 19, 2024 13:41 UTC (Fri) by LtWorf (subscriber, #124958)
In reply to: Gentoo bans AI-created contributions by kleptog
Parent article: Gentoo bans AI-created contributions

> Well, if you're processing 1TB of data into a 1GB model, it's very questionable whether you can really consider it a derived work any more.

So you're saying that if I rip a music CD that is ~700 MiB of data, but then use lossy compression to shrink it down to 50 MiB, I'm actually allowed to do that?



Gentoo bans AI-created contributions

Posted Apr 19, 2024 14:36 UTC (Fri) by farnz (subscriber, #17727) [Link] (10 responses)

14:1 compression like that is well within the expected bounds of today's psychoacoustically lossless techniques. 1000:1 is not, so the argument is that if you rip a music CD and get ~700 MiB PCM data, and compress that down to 700 KiB, the result of decompressing it back to a human-listenable form is going to be so radically different to the original that this use is transformative, not derivative.
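For concreteness, the two ratios work out roughly like this (a minimal Python sketch; the 50 MiB figure comes from the parent comment, and the 700 KiB figure stands in for the hypothetical model-scale compression):

    # Back-of-envelope compression ratios for the CD example above.
    cd_rip_mib = 700        # ~700 MiB of PCM audio from a ripped CD
    lossy_mib = 50          # the 50 MiB lossy encode from the parent comment
    model_kib = 700         # a hypothetical 700 KiB "model-sized" result

    print(f"lossy encode: {cd_rip_mib / lossy_mib:.0f}:1")           # ~14:1
    print(f"model-sized:  {cd_rip_mib * 1024 / model_kib:.0f}:1")    # ~1024:1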

Gentoo bans AI-created contributions

Posted Apr 19, 2024 16:17 UTC (Fri) by samlh (subscriber, #56788) [Link] (9 responses)

If you turn the music into MIDI, that could achieve such compression, and would still be a derivative work.

The same argument may reasonably apply for LLMs given how much verbatim input can be extracted in practice.

Gentoo bans AI-created contributions

Posted Apr 20, 2024 15:36 UTC (Sat) by Paf (subscriber, #91811) [Link] (8 responses)

A lot of verbatim input can be extracted from *me* in practice, surely enough that I could violate copyright from memory.

So, uh, what about the other stuff I create? I know what good data visualization looks like because I have read many data-visualization articles over the years. Etc.

Gentoo bans AI-created contributions

Posted Apr 20, 2024 15:59 UTC (Sat) by LtWorf (subscriber, #124958) [Link] (7 responses)

Well, if you learn the whole Divina Commedia by heart and then write it down, you won't become its author :)

Humans extrapolate in a way that machines cannot. So the comparison doesn't hold.

A human can write functioning code in whatever programming language after reading that language's manual. A text generator needs terabytes' worth of examples before it can start producing something that is approximately correct.

I don't think comparing a brain with a server farm makes sense.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 9:25 UTC (Mon) by farnz (subscriber, #17727) [Link] (6 responses)

> Humans extrapolate in a way that machines cannot.

This isn't, as far as I can tell, true. The problem with AI is not that it can't extrapolate; it's that extrapolation is the only thing it can do. A human can extrapolate, but we can also switch to inductive reasoning, deduction, and cause-and-effect, and, most importantly, a human can combine multiple forms of reasoning to get results quickly and efficiently.

Note that a human writer has also had terabytes' worth of language as examples before they start producing things that are correct - we spend years in "childhood" where we're learning from examples. Dismissing AI for needing a huge number of examples, when humans need literal years between birth and writing something approximately correct, is not advancing the conversation any.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 14:44 UTC (Mon) by rgmoore (✭ supporter ✭, #75) [Link] (5 responses)

> Note that a human writer has also had terabytes' worth of language as examples before they start producing things that are correct

It's not terabytes, though. A really fast reader might be able to read a kilobyte per minute. If they read at that speed for 16 hours a day, they might be able to manage a megabyte per day. That would mean a gigabyte every 3 years of solid, fast reading, doing nothing else every day. So a truly dedicated reader could manage at most a few tens of GB over a lifetime. Most people probably manage at most a few GB. Speaking isn't a whole lot faster. That means most humans are able to learn their native languages using orders of magnitude fewer examples than LLMs do.
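The arithmetic is easy to check (a minimal Python sketch using the same assumptions: ~1 KB per minute, 16 hours a day, and, generously, 80 years of nothing but reading):

    # Rough check of the reading-volume estimate above.
    kb_per_minute = 1                                  # a very fast reader
    hours_per_day = 16
    kb_per_day = kb_per_minute * 60 * hours_per_day    # ~960 KB/day, call it ~1 MB

    years_per_gb = 1_000_000 / kb_per_day / 365        # ~2.9 years per GB
    lifetime_gb = 80 * 365 * kb_per_day / 1_000_000    # ~28 GB over 80 years

    print(f"~{kb_per_day} KB/day, ~{years_per_gb:.1f} years per GB, ~{lifetime_gb:.0f} GB in a lifetime")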

To me, this is a sign that the LLM stuff, at least the way we're doing it, is probably a side track. It's a neat way to get something that produces competent text, and because it has been trained on a huge range of texts, it will be able to interact in just about any area. But it's a very inefficient way of learning language compared to the way humans do it. If we want something more like AGI, we need to think more about the way humans learn and try to teach our AI that way, rather than just throwing more texts at the problem.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 14:50 UTC (Mon) by farnz (subscriber, #17727) [Link] (4 responses)

That carries with it the assumption that text is a complete representation of what people use to learn to speak and listen, before they move on to reading and writing. It also assumes that we have no pre-prepared pathways to assist with language acquisition.

Once you add in full-fidelity video and audio at the quality that a child can see and hear, you get to terabytes of data input before a human can read. Now, there's a good chance that a lot of that is unnecessary, but you've not shown that - merely asserted that it's false.
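Even modest assumptions get you into that range. As an illustration (the bitrates here are my own assumptions, not measurements), treat a child's visual and auditory input as roughly a compressed audio-video stream for twelve waking hours a day over the five or so years before reading starts:

    # Illustrative only: the bitrates are assumptions, not measured values.
    video_mbit_s = 5.0            # assumed: roughly DVD-quality compressed video
    audio_mbit_s = 0.25           # assumed: decent compressed stereo audio
    waking_hours_per_day = 12
    years_before_reading = 5

    bytes_per_day = (video_mbit_s + audio_mbit_s) * 1e6 / 8 * waking_hours_per_day * 3600
    total_tb = bytes_per_day * 365 * years_before_reading / 1e12
    print(f"~{total_tb:.0f} TB of audio-visual input before reading age")   # ~52 TB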

Gentoo bans AI-created contributions

Posted Apr 22, 2024 21:34 UTC (Mon) by rgmoore (✭ supporter ✭, #75) [Link] (3 responses)

> That carries with it the assumption that text is a complete representation of what people use to learn to speak and listen, before they move on to reading and writing.

To the contrary, I think the different learning environment is part of what we need to reproduce if we want more human-like AI. A huge problem with LLMs is that they are largely fed a single kind of input. It's no wonder chatbots have problems interacting with the world; they've read about it but never dealt with it firsthand. If we want an AI that can deal with the world as we do, it needs a full set of senses and probably a body, so it can do something more than chat or paint.

Gentoo bans AI-created contributions

Posted Apr 23, 2024 9:26 UTC (Tue) by farnz (subscriber, #17727) [Link] (2 responses)

Right, but you were claiming that because the input to a child can be summarised in a small amount of text, the child's neural network is clearly learning from that small amount of data, and not from the extra signals carried in the spoken word and in body language as well.

This is what makes the "training data is so big" argument unreasonable; it involves a lot of assumptions about the training data needed to make a human capable of what we do, and then says "if my assumptions are correct, AI is data-inefficient", without justifying the assumptions.

Personally, I think the next big step we need to take is to get Machine Learning to a point where training and inference happen at the same time; right now, there's a separation between training (teaching the computer) and inference (using the trained model), such that no learning can take place during inference, and no useful output can be extracted during training. And that's not the way any natural intelligence (from something very stupid like a chicken, to something very clever like a Nobel Prize winner) works; we naturally train our neural networks as we use them to make inferences, and don't have this particular mode switch.
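To make that mode switch concrete, here's a minimal sketch in PyTorch (my choice of framework for illustration; nothing here is specific to any particular project): a training phase in which the weights are updated, followed by an inference phase in which the model is frozen and learns nothing from what it sees:

    # Minimal illustration of the training/inference split in today's ML stacks.
    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Training phase: gradients flow and the weights are updated.
    model.train()
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Inference phase: the model is frozen; nothing it sees here changes it.
    model.eval()
    with torch.no_grad():
        prediction = model(torch.randn(1, 10))

Online and continual-learning research tries to blur that boundary, but the mainstream train-then-deploy workflow keeps the two phases strictly separate, which is exactly the split being described.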

Gentoo bans AI-created contributions

Posted Apr 23, 2024 10:51 UTC (Tue) by Wol (subscriber, #4433) [Link]

Plus, it's well known that our neural networks have dedicated sub-systems for, e.g., recognising faces, that can be rapidly trained.

For example, a baby learns what mum sounds like in the womb, and that is reinforced by mum hugging the new-born. My grand-daughter was premature, and while there don't appear to be any lasting effects, it's well known that separating mother and child at birth has very noticeable impacts in the short term. Not all of them repairable ...

We're spending far too much effort throwing brute force at these problems without trying to understand what's actually going on. I'm amazed at how much has been forgotten about how capable the systems of the '70s and '80s were - the Prolog "AI doctor" running on a Tandy or PET that could outperform a GP in diagnostic skill, or the robot crab, powered by a 6502, that could play in the surf zone. I'm sure there are plenty more examples where our super-duper AI, with "more power than sent a man to the moon", would find it impossible to compete with that ancient tech ...

Modern man thinks he's so clever, because he's lost touch with the achievements of the past ...

Cheers,
Wol

Gentoo bans AI-created contributions

Posted Apr 23, 2024 15:20 UTC (Tue) by Wol (subscriber, #4433) [Link]

> we naturally train our neural networks as we use them to make inferences, and don't have this particular mode switch.

We also don't feed back to our AIs "this is wrong, this is right", so they're free to spout garbage (hallucinate) with no way of correcting it.

Cheers,
Wol

