Back to basics: Are weights software at all?
Posted May 20, 2025 22:24 UTC (Tue) by jzb (editor, #7867)
In reply to: Back to basics: Are weights software at all? by NYKevin
Parent article: Debian AI General Resolution withdrawn
"This may sound like a trivial "is a hotdog a sandwich?" type of question"
That's a solved question. Everyone knows that a hot dog is a taco.
"What makes AI model weights different from every other kind of non-software asset that a Linux distro might happen to distribute?"
In part I would say AI models are different from, say, audio files or PDFs because they are used to accomplish a task as opposed to being merely content. If I'm using Speech Note to transcribe audio to text, I get vastly different output depending on which model I use. It would be possible, I believe, for a model to be trained in some way to refuse to transcribe certain words/phrases or otherwise manipulate the output. Some evildoer could, for example, train the model not to use the Oxford comma!
Likewise, AI models used for code generation might be trained to insert backdoors, or simply to generate code with specific types of vulnerabilities. (Or a kind of reverse typosquatting, where the model inserts calls to malicious Python modules whose names look like legitimate ones...)
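One cheap defence against that kind of reverse typosquatting is to compare the names a generated script imports against the packages a project actually depends on, and flag near-misses. A minimal sketch, assuming a hypothetical allowlist (the package names and the misspelled "requets" are invented for illustration):

```python
import ast
import difflib

# Hypothetical allowlist of dependencies the project actually uses.
KNOWN_PACKAGES = {"requests", "numpy", "pandas"}

def suspicious_imports(source: str) -> list[str]:
    """Return imported top-level names that are not in the allowlist
    but look confusingly similar to something that is."""
    tree = ast.parse(source)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    flagged = []
    for name in sorted(names - KNOWN_PACKAGES):
        # A close match to a known package suggests a typosquat-style name.
        if difflib.get_close_matches(name, KNOWN_PACKAGES, n=1, cutoff=0.8):
            flagged.append(name)
    return flagged

generated = "import requets\nimport numpy\n"
print(suspicious_imports(generated))  # flags "requets" as a near-miss
```

This only catches look-alike names, of course; a model that confidently imports a plausible-sounding but entirely made-up package needs a different check.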
Certain fonts might fall somewhere between content and AI models.
Anyway, the larger point that there's a fuzzy area for not-quite-software without accompanying "preferred form of modification" is taken—but I'd consider AI models different. What that difference means is open for some debate, but AI models are in a different category than PDFs, audio files, and fonts IMO.
Posted May 20, 2025 23:19 UTC (Tue) by excors (subscriber, #95769)
You don't even need a backdoored model for that: a recent study used various LLMs to generate code, and found around 20% of package references were hallucinated. The LLM just guessed a likely package name and API and hoped for the best. I expect the 'programmer' is typically going to run the code and paste the error messages back into the LLM, so they're not going to notice if an attacker has recently registered those package names. (https://arstechnica.com/security/2025/04/ai-generated-cod...)
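That failure mode can be caught before any install step ever runs: resolve each referenced module against the local environment and treat unresolvable names as suspect, rather than blindly pip-installing whatever the error message names. A minimal sketch (the name "frobnicate_utils" is invented to stand in for a hallucinated package):

```python
import importlib.util

def unresolved_modules(module_names):
    """Return names that cannot be resolved in the current environment --
    a hint that an LLM may have hallucinated them."""
    return [n for n in module_names
            if importlib.util.find_spec(n) is None]

# "json" is in the standard library; "frobnicate_utils" is made up.
print(unresolved_modules(["json", "frobnicate_utils"]))
```

An unresolved name is not proof of hallucination (it may just be uninstalled), but it is exactly the point where a human should look the package up instead of letting an attacker's freshly registered squat fill the gap.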