Zuckerberg: Open Source AI Is the Path Forward

[Posted July 23, 2024 by corbet]

Mark Zuckerberg has posted an article announcing some new releases of the Llama large language model and going on at length about why open-source models are important:

AI has more potential than any other modern technology to increase human productivity, creativity, and quality of life – and to accelerate economic growth while unlocking progress in medical and scientific research. Open source will ensure that more people around the world have access to the benefits and opportunities of AI, that power isn't concentrated in the hands of a small number of companies, and that the technology can be deployed more evenly and safely across society.
There is an ongoing debate about the safety of open source AI models, and my view is that open source AI will be safer than the alternatives. I think governments will conclude it's in their interest to support open source because it will make the world more prosperous and safer.

Of course, whether Llama is truly open source is debatable at best, but it is more open than many of the alternatives.

Zuckerberg v. small number of companies

Posted Jul 23, 2024 17:39 UTC (Tue) by intelfx (subscriber, #130118) [Link] (2 responses)

> Open source will ensure that more people around the world have access to the benefits and opportunities of AI, that power isn't concentrated in the hands of a small number of companies

It's kinda funny of Zuckerberg to be talking about "power [that] isn't concentrated in the hands of a small number of companies"...

Zuckerberg v. small number of companies

Posted Jul 23, 2024 17:49 UTC (Tue) by DanilaBerezin (guest, #168271) [Link] (1 response)

It's obvious he doesn't actually care about any of that. He just doesn't want himself and his company to be tied down to Microsoft.

Zuckerberg v. small number of companies

Posted Jul 23, 2024 17:59 UTC (Tue) by intelfx (subscriber, #130118) [Link]

Very much so, yes. It's just funny how much this fake concern is out of place.

What even does "Open Source" mean at this point?

Posted Jul 23, 2024 18:24 UTC (Tue) by atnot (subscriber, #124910) [Link] (4 responses)

Articles like this make me wonder what FOSS even is. Not as an abbreviation, but as a movement.

The core of FOSS, to me, has always been a rebellion against people telling us our way of working shouldn't work: Humans are competitive, there would be no innovation without strict intellectual property, progress can only be driven by rare geniuses with vast power and wealth, the market is the only way to distribute resources and responsibilities, etc. And yet despite that, it doesn't just work, it works better. So much better that even corporations can't ignore it.

This has all of the language of FOSS, it uses the arguments we use. But instead we have a project that's run by one of the fiercely anticompetitive monopolists, under their top down control, which only they can train and only they can even really run at scale, released merely to use the community as a cudgel against their competitors. With no ultimate purpose but to launder copyright through a big machine that overwhelms the joy of human creation with an endless slop of predictive goop.

And sure, we can say to ourselves "well this isn't _true_ FOSS". But does that change anything? Perhaps, if this is what's being said with it, the rhetoric of vague appeals to "freedom" and "openness" has outlived its usefulness.

What even does "Open Source" mean at this point?

Posted Jul 23, 2024 21:02 UTC (Tue) by dilinger (subscriber, #2867) [Link]

The oppressor will always attempt to co-opt the language of the oppressed. He's just following the standard playbook.

What even does "Open Source" mean at this point?

Posted Jul 23, 2024 21:11 UTC (Tue) by khim (subscriber, #9252) [Link]

> Articles like this make me wonder what FOSS even is. Not as an abbreviation, but as a movement.

There never was any such “FOSS movement”. The free software camp and the open source camp were always two distinct factions, even before Open Source decided to “invent a way to market free software to corporations”. But the two movements are closely related.

> The core of FOSS, to me, has always been a rebellion against people telling us our way of working shouldn't work:

Yes. That's the common part. Even when the free software and open source movements believe in different things, they both follow them faithfully.

> Humans are competitive, there would be no innovation without strict intellectual property, progress can only be driven by rare geniuses with vast power and wealth, the market is the only way to distribute resources and responsibilities, etc.

Note that this rant, while short and not uncommon, is already self-contradictory. Intellectual property is, by definition, a government-granted monopoly; thus, if the market is the only way to distribute resources and responsibilities, then “intellectual property” shouldn't exist, for the only justification for its existence is the perceived failure of the pure market to promote the Progress of Science and useful Arts.

It's an internally inconsistent position to promote the market as infallible and simultaneously claim that one of the primary tools that is supposed to fix the market failure is OK, too.

> This has all of the language of FOSS, it uses the arguments we use. But instead we have a project that's run by one of the fiercely anticompetitive monopolists, under their top down control, which only they can train and only they can even really run at scale, released merely to use the community as a cudgel against their competitors.

Well, if people may subvert the regulatory tool that is supposed to promote the Progress and use it to stop said progress instead, then why wouldn't they apply the same trick to other things?

Practically any principle, pushed to its logical conclusion, starts becoming self-defeating (think, for example, about how the FSF, in its attempt to push for more freedom, started promoting locked-down devices).

Corporations just push principles beyond the point where they become self-defeating consciously and knowingly.

What even does "Open Source" mean at this point?

Posted Jul 23, 2024 22:27 UTC (Tue) by flussence (guest, #85566) [Link] (1 response)

Corporations have learned to mimic human communication patterns to trick humans into propagating and sustaining them. Facebook here is particularly infamous for doing that at the cost of massive human suffering; Google, mutant anglerfish of a thousand "free" services that it is, does it at the cost of rendering large swathes of human knowledge inaccessible; Microsoft, as a predator of other corporations, gives us a mix of both outcomes. Apple is a bit of an outlier at this scale in that it seems to be motivated by pure greed for money, in the way one would expect of old-fashioned ideas of capitalism.

But make no mistake, "Open Source" was about exploitation from the beginning: it's about giving these ersatz emergent lifeforms inroads to colonise a space that was by and for real people, where interpersonal connections were once the main currency, to strip mine it and extract the surplus value. Without a real person on the other end of the line, when we are no longer writing and sharing software for the benefit of each other, and with the communication channel itself nowadays being interdicted and sterilised to keep people compartmentalised, that unwritten purpose for participating in FOSS has been completely undermined and perverted.

You can contribute to a billion-dollar Open Source project for years and the only thing that will so much as remember your email address is the recruiter spambot advertising fake job listings. It's a sucker's game. And becoming a militant FSF disciple won't provide relief either: it has never been as simple as libre vs proprietary. Software with an obvious master retains that power imbalance even in the presence of a "100% Vegan [A]GPLv3" sticker. Anyone who disagrees is welcome to try and get emacs and gcc to play nice together.

The core of "Free Software" can survive only if people start making an effort to reclaim it and perform some serious chemotherapy. But that is done via empathy, willpower, and clarity of purpose — with the sheer number of undesirable orbiters in FOSS who've been hypnotised to soil their pants violently upon encountering text documents asking them not to be raging bigots, others cheerleading over which alien mold spore they want to be the next master of their schemaless DB or containerization stack like a sportsball league, and a pervasive cult of "free speech" (and negative peace) that ensures the worst aspects will always grow until the rest of the room suffocates — realistically, this movement can do nothing from here on out but collapse under its own weight.

The future of free software will have to happen elsewhere, built by different people, who will be much more scrupulous who they let in.

What even does "Open Source" mean at this point?

Posted Jul 25, 2024 6:10 UTC (Thu) by oldtomas (guest, #72579) [Link]

Wow. Thanks for this furious, at the same time clear-eyed tirade.

May I quote you? As in "things I'd like to say if I had the necessary eloquence"?

"debatable"

Posted Jul 23, 2024 19:11 UTC (Tue) by josh (subscriber, #17465) [Link] (3 responses)

> whether Llama is truly open source is debatable at best

The question of "is a set of weights enough to make something open" is a source of open debate. But independent of that, the *license* of Llama is unquestionably proprietary.

"debatable"

Posted Jul 24, 2024 19:02 UTC (Wed) by Heretic_Blacksheep (guest, #169992) [Link] (2 responses)

Having skimmed the license, it reads like another "source available but don't compete with us" license, so no... neither open source as defined by the OSI nor free software as defined by the 4 freedoms. But it's also not a typical "you see nothing, you pay for the privilege of using" proprietary license. It's a BSL derivative.

So Zuck either doesn't know what open source really is, or he's being deliberately deceptive. Perhaps both. The former doesn't exclude the latter. Given Zuck's & Meta's public history, it's easy to conclude this is a deliberate obfuscation and he personally doesn't care what "open source" is, only that it's another buzzword he can use. FOSS washing?

The source is available to view, and the models can be audited, but you can't exercise all the freedoms associated with FOSS. For some, that's enough. But any businesses really should beware of those wishy-washy non-compete clauses.

"debatable"

Posted Jul 24, 2024 19:27 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (1 response)

The source code for LLMs is pretty much a sidenote. All the interesting stuff is in the weights and the training data.

You can run LLMs with about 1000 lines of pure C code without any external dependencies.

"debatable"

Posted Jul 31, 2024 7:12 UTC (Wed) by cpitrat (subscriber, #116459) [Link]

Compared to a regular program:
- the source code of the model is comparable to the source code of the assembly
- the weights are comparable to the executable/machine code
- the training data is comparable to the source code (and you may consider it generated source code, in which case the scripts scraping the web / downloading databases to generate it would be source code generators)
- the training procedure is comparable to the build system configuration (e.g. makefiles)

An appropriate open source/free license applicable to machine learning models would have a vocabulary that encodes these equivalences.

Enlightenment bestowed upon us at last

Posted Jul 24, 2024 1:59 UTC (Wed) by zev (subscriber, #88455) [Link] (1 response)

Considering how 100% of my social and economic activity these days is, as previously prophesied, done via the metaverse, I have no doubt whatsoever in Zuckerberg's wisdom and foresight.

Enlightenment bestowed upon us at last

Posted Jul 31, 2024 7:20 UTC (Wed) by cpitrat (subscriber, #116459) [Link]

That. I'm surprised how much people who became rich after some commercial success [1] (Zuckerberg, Musk, ...) are erected as prophets by some media [2] no matter the topic they're talking about, no matter how many times they are proven wrong or even obviously lying to defend their personal interests.

[1] in which an unhealthy disregard for ethics and humanity was probably at least as important as their personal talent

[2] I'm not blaming LWN here, this particular article is just a blog post highlight like there are many and is on-topic for LWN.

Beautiful

Posted Jul 24, 2024 5:45 UTC (Wed) by oldtomas (guest, #72579) [Link]

This must be one of the most beautiful examples of Conway's Law [1] I have yet to meet.

Given that LLM chatbots are bullshit machines [2], seeing the CEO of a company spewing bullshit to promote their own is... exhilarating.

Frankfurt's "On Bullshit" [3] is on my desk, in my reading queue.

[1] https://en.wikipedia.org/wiki/Conway%27s_Law
[2] https://link.springer.com/article/10.1007/s10676-024-09775-5
[3] https://en.wikipedia.org/wiki/On_Bullshit

Don't be so negative

Posted Jul 24, 2024 8:15 UTC (Wed) by dvandeun (guest, #24273) [Link]

I see a lot of negativity in the comments. If a well-known person speaks fine-sounding words, even if they might be insincere, would it not be more productive to disseminate them widely, and hold him to them?

Sooner or later we will need limits on open source AI

Posted Jul 24, 2024 9:02 UTC (Wed) by roc (subscriber, #30627) [Link] (2 responses)

Sufficiently powerful AI should not be open source, for the same reason we don't want open-source nuclear or biological toolkits that would allow anyone to easily build nuclear or biological weapons: attack is asymmetrically easier than defense, so empowering rogue individuals is too dangerous.

We probably haven't reached that "sufficiently powerful" AI yet, although when the full costs of deepfake scamming are eventually accounted for, we might conclude that threshold has already been crossed. I don't know when we'll reach that threshold, and perhaps it will take a very long time. But at some point we'll have to get off that open-source train, or we'll regret it.

Sooner or later we will need limits on open source AI

Posted Jul 24, 2024 11:57 UTC (Wed) by pizza (subscriber, #46) [Link]

> for the same reason we don't want open-source nuclear or biological toolkits that would allow anyone to easily build nuclear or biological weapons:

Um, the science of nukes and bioweapons is _very_ widely known. The difficulty is in the consistent production of various components at scale. Not unlike the LLMs of today. And not unlike semiconductor fabrication too, I might add.

Sooner or later we will need limits on open source AI

Posted Jul 28, 2024 8:25 UTC (Sun) by oldtomas (guest, #72579) [Link]

"... empowering rogue individuals is too dangerous."

How are Facebook, Google, Microsoft and the other outfits of surveillance capitalism behaving differently from "rogue individuals"?

If anything, they are more dangerous and ruthless.

There are enough reports "out there" showing how they miss their carbon targets, exploit people in Africa in "content moderation" shops to the point of PTSD and so on.

What's the open source in AI?

Posted Jul 24, 2024 14:02 UTC (Wed) by amit (subscriber, #1274) [Link]

The current AI wave is mostly just hype - all of computing has been "AI" anyway. Even my emacs eliza sessions could've ridden the wave back in the day. What we have today is a mix of three things: 1. computing power that crunches data quickly (at a significant cost to the environment); 2. lots of content generated by millions of people that companies use for free (as in beer); 3. writing programs to exploit 1 with data from 2.

So this current talk of "open source AI" only talks about the programs written in point 3 above, but completely ignores the actual content all of the AI is based on in point 2. No damn is given to copyrights. In fact, companies blatantly abuse their "big company" status to infringe on those rights and maybe then settle out of court if sued.

Comparing all this to the FOSS movement is disingenuous. The key aspect of generative AI is the source data that the AI-based data is generated from - without the content from 2, none of the generative AI stuff would work. So only calling attention to the "open source" programs is just smoke and mirrors.

Now - if we were to use ethically sourced, copyright-respecting data - even just intra-company data for that company's own internal use - with these open source models - all this talk about open source models works fine. I suspect, though, that after a few iterations within a training model, it'll be easy to lose the source material in a "trusting trust" sort of way, so that it'll be difficult to point to how ethically sourced or actually Free the model was to begin with.

Training data

Posted Jul 24, 2024 21:58 UTC (Wed) by kristian.paul0 (subscriber, #119897) [Link]

It would be nice to compare the reproducibility of FB's Llama 3 against that of DCLM-7B from Apple, since the latter's training data comes from an open dataset.