Gentoo bans AI-created contributions
Posted Apr 22, 2024 14:44 UTC (Mon) by rgmoore (✭ supporter ✭, #75)
In reply to: Gentoo bans AI-created contributions by farnz
Parent article: Gentoo bans AI-created contributions
Note that a human writer has also had terabytes worth of language as examples before they start producing things that are correct.
It's not terabytes, though. A really fast reader might be able to read a kilobyte per minute. If they read at that speed for 16 hours a day, they might be able to manage a megabyte per day. That would mean a gigabyte every 3 years of solid, fast reading doing nothing else every day. So a truly dedicated reader could manage at most a few tens of GB over a lifetime. Most people probably manage at most a few GB. Speaking isn't a whole lot faster. That means most humans are able to learn their native languages using orders of magnitude fewer examples than LLMs are.
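Spelling out the arithmetic (a quick sketch; the reading speed and hours per day are just the assumptions above):

# Back-of-envelope check of the reading estimate above; the constants are
# the assumptions from the paragraph, not measurements.
KB_PER_MINUTE = 1            # very fast reader
HOURS_PER_DAY = 16
bytes_per_day = KB_PER_MINUTE * 1024 * 60 * HOURS_PER_DAY    # ~0.98 MB/day
bytes_per_year = bytes_per_day * 365                         # ~0.33 GB/year
years_per_gb = 1024**3 / bytes_per_year                      # ~3 years per GB
lifetime_gb = 80 * bytes_per_year / 1024**3                  # ~27 GB in 80 years
print(f"{bytes_per_day / 1e6:.2f} MB/day, ~{years_per_gb:.0f} years per GB, "
      f"~{lifetime_gb:.0f} GB over 80 years of nonstop reading")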
To me, this is a sign the LLM stuff, at least the way we're doing it, is probably a side track. It's a neat way to get something that produces competent text, and because it has been trained on a huge range of texts it will be able to interact in just about any area. But it's a very inefficient way of learning language compared to the way humans do it. If we want something more like AGI, we need to think more about the way humans learn and try to teach our AI that way, rather than just throwing more texts at the problem.
Posted Apr 22, 2024 14:50 UTC (Mon) by farnz (subscriber, #17727)
That carries with it the assumption that text is a complete representation of what people use to learn to speak and listen, before they move on to reading and writing. It also assumes that we have no pre-prepared pathways to assist with language acquisition.
Once you add in full-fidelity video and audio at the quality that a child can see and hear, you get to terabytes of data input before a human can read. Now, there's a good chance that a lot of that is unnecessary, but you've not shown that - merely asserted that it's false.
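For a sense of scale (the bitrates and hours below are my own illustrative assumptions, not figures from this thread): even modestly compressed video and audio, accumulated over a child's waking hours, reaches tens of terabytes within a few years.

# Rough scale check of the "terabytes before a human can read" claim.
# All bitrates and hours are illustrative assumptions.
VIDEO_BITS_PER_SEC = 5_000_000     # roughly HD-stream-quality video
AUDIO_BITS_PER_SEC = 128_000       # compressed stereo audio
WAKING_HOURS_PER_DAY = 12
YEARS_BEFORE_READING = 5

seconds = WAKING_HOURS_PER_DAY * 3600 * 365 * YEARS_BEFORE_READING
total_bytes = (VIDEO_BITS_PER_SEC + AUDIO_BITS_PER_SEC) / 8 * seconds
print(f"~{total_bytes / 1e12:.0f} TB of audio-visual input")   # ~50 TB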
Posted Apr 22, 2024 21:34 UTC (Mon) by rgmoore (✭ supporter ✭, #75)
That carries with it the assumption that text is a complete representation of what people use to learn to speak and listen, before they move on to reading and writing.
To the contrary, I think the different learning environment is part of what we need to reproduce if we want more human-like AI. A huge problem with LLMs is that they are largely fed a single kind of input. It's no wonder chatbots have problems interacting with the world; they've read about it but never dealt with it firsthand. If we want an AI that can deal with the world as we do, it needs a full set of senses and probably a body, so it can do something more than chat or paint.
Posted Apr 23, 2024 9:26 UTC (Tue) by farnz (subscriber, #17727)
Right, but you were claiming that because the input to a child can be summarised in a small amount of text, the child's neural network is clearly learning from that small amount of data, and not from the extra signals carried in the spoken word and in body language as well.
This is what makes the "training data is so big" argument unreasonable; it involves a lot of assumptions about the training data needed to make a human capable of what we do, and then says "if my assumptions are correct, AI is data-inefficient", without justifying the assumptions.
Personally, I think the next big step we need to take is to get Machine Learning to a point where training and inference happen at the same time; right now, there's a separation between training (teaching the computer) and inference (using the trained model), such that no learning can take place during inference, and no useful output can be extracted during training. And that's not the way any natural intelligence (from something very stupid like a chicken, to something very clever like a Nobel Prize winner) works; we naturally train our neural networks as we use them to make inferences, and don't have this particular mode switch.
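As a minimal sketch of that distinction (the model and data here are made up purely for illustration): a conventional workflow trains first and then freezes the weights for inference, whereas an online learner updates its weights from every prediction it makes.

# Illustrative only: contrast train-then-infer with learning during inference.
import numpy as np

rng = np.random.default_rng(0)

def make_example():
    """Toy stream: the target is y = 3*x0 - 2*x1 plus a little noise."""
    x = rng.normal(size=2)
    y = 3 * x[0] - 2 * x[1] + rng.normal(scale=0.1)
    return x, y

# Conventional: a separate training phase, then frozen inference.
w_batch = np.zeros(2)
for _ in range(1000):                          # training phase only
    x, y = make_example()
    w_batch += 0.01 * (y - w_batch @ x) * x
# From here on w_batch is frozen; inference can no longer learn.

# Online: every inference is also a training step.
w_online = np.zeros(2)
for _ in range(1000):
    x, y_true = make_example()
    y_pred = w_online @ x                      # inference: usable output now
    w_online += 0.01 * (y_true - y_pred) * x   # training: learn from the outcome

No mode switch is needed in the second loop; the same pass that produces an answer also improves the model, which is closer to how natural learners behave.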
Posted Apr 23, 2024 10:51 UTC (Tue) by Wol (subscriber, #4433)
For example, a baby learns what mum sounds like in the womb, and that is reinforced by mum hugging the newborn. My grand-daughter was premature, and while there don't appear to be any lasting effects, it's well known that separating mother and child at birth has very noticeable impacts in the short term. Not all of them repairable ...
We're spending far too much effort throwing brute force at these problems without trying to understand what's actually going on. I'm amazed at how much has been forgotten about how capable the systems of the '70s and '80s were - the Prolog "AI Doctor" running on a Tandy or Pet that could outperform a GP in diagnostic skill. The robot crab that could play in the surf zone, powered by a 6502. I'm sure there are plenty more examples where our super-duper AI, with "more power than sent a man to the moon", would find it impossible to compete with that ancient tech ...
Modern man thinks he's so clever because he's lost touch with the achievements of the past ...
Cheers,
Wol
Posted Apr 23, 2024 15:20 UTC (Tue) by Wol (subscriber, #4433)
We also don't feed back to our AIs "this is wrong, this is right". So they're free to spout garbage (hallucinate) with no way of correcting it.
Cheers,
Wol