On (not) replacing humans
Posted Jun 27, 2025 15:30 UTC (Fri) by SLi (subscriber, #53131)
Parent article: Supporting kernel development with large language models
I think this is an interesting claim; not because I think it's right or wrong, but because it's a nontrivial prediction, and quite a common one, with no explicitly expressed rationale.
Now there's enough leeway in the language to make it hard to agree on what it might mean. For one, what does "replacing humans" mean? Did compilers replace programmers? After all, people don't need to write programs (i.e. assembly) any more; instead they write these fancy higher-level descriptions of the program, and the compiler intelligently does the programming. At least that's how I think it would have been viewed at the time. Or did the emergence of compilers lead to there being fewer programmers?
But there are also probably unstated assumptions and predictions about the capabilities of LLMs behind this kind of stance: either extrapolation from the trajectory of LLMs so far, or even deeper stances like "LLMs are fundamentally incapable of creating anything new", or beliefs, justified or not, that humans are necessarily better than any algorithm that can exist.
I don't mean to imply that these stances are ridiculous. They may well turn out to be true. And in the other direction, things that don't seem realistic sometimes work out.
I did some LLM stuff before GPT-3 was a thing. It seemed bewildering. I think it's safe to say that nobody in the field predicted the generalizing capabilities of language models.
For a long time, machine learning meant training a model on data that was pretty similar to what you wanted to predict on (like creditworthiness). That was not too mind-bending.
Then people started to talk about _few-shot learning_, meaning that you might only need to give a model five or so examples of what you want it to do, and it would understand enough, and be generalist enough, to handle that. That sounded like sci-fi.
Then there was one-shot learning: a single example. Surely *that* could not work.
And the biggest surprise? Zero-shot learning. You just tell the model what you want it to do. I really don't think even people who researched LLMs predicted that before they started to see curious patterns in slightly less capable (by today's standards) language models.
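To make the few-shot/zero-shot distinction concrete, here is a rough sketch in Python. The sentiment-labeling task and the call_llm() helper are made up for illustration; substitute whatever model API you actually use.

```python
# Hypothetical illustration: the task and call_llm() are placeholders,
# not any particular vendor's API.

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; replace with your provider's client."""
    raise NotImplementedError

# Few-shot: show the model a handful of worked examples, then the new input.
few_shot_prompt = """Label each review as positive or negative.

Review: "Great battery life, would buy again."
Label: positive

Review: "Stopped working after two days."
Label: negative

Review: "Does exactly what it says on the box."
Label:"""

# Zero-shot: just describe the task; no examples at all.
zero_shot_prompt = """Label the following review as positive or negative.

Review: "Does exactly what it says on the box."
Label:"""

# Both would be sent to the same general-purpose model, e.g.:
#   print(call_llm(few_shot_prompt))
#   print(call_llm(zero_shot_prompt))
```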
Now, a few years later? GPT-3 feels like something that has surely been around since maybe the 90s. It doesn't feel magical anymore, and the mistakes it made look ridiculous compared to today's models.