
OSI board AMA at All Things Open

By Joe Brockmeier
November 1, 2024

All Things Open

Members of the Open Source Initiative (OSI) board sat down for a 45-minute "Ask Me Anything" (AMA) session at All Things Open in Raleigh, NC on October 29. Though the floor was open to any topic the audience might want to ask of the OSI board, many of the questions were focused on the Open Source AI Definition (OSAID), which was announced the day before. The new definition has been somewhat controversial, and the board spent a lot of time addressing concerns about it during the session, as well as questions on open washing, and a need for more education about open source in general.

[OSI board members, left to right: Tracy Hinds, Sayeed Choudhury, Anne-Marie Scott]

The session was held in one of the smaller rooms at the venue, with about 30 people in attendance (not counting OSI board members or staff). The session kicked off with some ground rules from the moderator, Mer Joyce, who had also worked as a facilitator for the OSAID drafting process. The first order of business was introducing the board members in attendance: OSI vice-secretary Anne-Marie Scott, vice chair Thierry Carrez, Gaël Blondelle, Pamela Chestek, Sayeed Choudhury, and Tracy Hinds.

Deborah Bryant, a former board member who currently works with the OSI as its US policy director, got the ball rolling with a question about the most interesting challenge the board expected to face in the next year.

Scott answered that the recent effort to create the OSAID had broadened the OSI's community, which was good but also "problematic". She said that the organization had encouraged a group of people to participate in the OSAID process who "may not have always affiliated themselves with open source" but were "incredibly valuable to the work we have done". Now, the OSI needed to work on keeping them engaged and to make connections to the existing community.

Choudhury agreed, adding that it had been an active year, and that the OSI now needed to get back to core things the organization had focused on in the past with the legal and developer communities. He also drew attention to the policy work the OSI had done, particularly with regard to the EU's Cyber Resilience Act (CRA). "We've been doing that work, but I think we need to focus on it a lot more."

Bespoke licenses

The first audience question came from an attendee who asked about the trend of "bespoke" licenses, such as the CockroachDB License, that claimed to be open-source licenses but had restrictions such as "a certain number of users, or under a certain amount of revenue per year". He wanted to know how to navigate those, "because right now we have to evaluate every single one of them".

Chestek was the first to respond to the question. Bespoke licenses, she said, defeat the purpose of the Open Source Definition (OSD). The idea behind the OSD was to make adopting open-source software frictionless because users know immediately that they have all the freedoms that they need:

These [bespoke] licenses are not designed to build community, they're designed to extract value out of software by free-riding on the concept of open source. But they're not open source.

What the OSI does, she said, is to occasionally talk about those licenses and point out that there's a reason that open source works, "and that is frictionless adoption". Without that, there is no chance of building community, so what is the point of the license "other than to maybe look like you're a good citizen", without being willing to make the real commitment to an open-source development model.

Carrez said that these new licenses focus on a single benefit of open source, which is the availability of code. As an organization, he said, OSI needed to do more to educate the public that there are additional benefits in terms of innovation and sustainability. He added that developers now take for granted a lot of the benefits of open source.

I can tell you that developing today is very different than developing then. I don't want us to go back to the dark ages of the '90s where you had to basically look at [the license of] every piece of code in order to use it.

How to move forward

The next question was from Nithya Ruff, head of the open-source program office (OSPO) at AWS. She said that "not everyone agrees with the new Open Source AI Definition". There were some points of disagreement, she said, but "a lot of places where we agree". She wanted to know how the community could work together to move forward.

[OSI board members, left to right: Thierry Carrez, Pamela Chestek, Gaël Blondelle]

Scott said that the simple answer was to keep talking. Some people felt the OSAID was too open, others felt it was too closed, "and then we've got everything in the middle". That is driven, she said, by the kinds of organizations that people work for, as well as the values that they hold. She went on to say that there were more conversations to be had, some in the open and some under the Chatham House Rule, for the "next phase of engagement". Ruff was right, she said, "there's more agreement than there is difference, but on the points of difference, they are strongly held".

Carrez responded that the board didn't have a strong position one way or the other when it started the OSAID process, and that he was "very sympathetic" to the dissenting voices that were heard during the process. However, he said that he realized "the ultimate goal is really to replicate the success we've seen in open-source software to AI" and not to simply translate the OSD to AI.

Some of the tension we've seen is in people that haven't made that mind shift to the wider picture and are just trying to apply what they are very familiar with, that they're experts at, and that made it more difficult.

"Nobody disagrees about the principles [behind the OSAID], where we see disagreement is implementation", Chestek said. The OSI had gone further than others in trying to define a fully open implementation, and tried to put a stake in the ground that defined what it considered open right now. That implementation, she said, is "the piece of it that we know is going to change" but not the principles. Maybe when the industry was more stable, "then I do hope we'll really come to a unified place". She added that it was a "wild experience" to do the OSAID work at the peak of a hype cycle while an industry was being regulated.

Education

The next question came from Carson Shaar, who introduced himself as the co-founder of a company in the open-source space, Zero-True, and a recent college graduate. He said that he'd observed "quite a lack of education" around open source. Universities were doing a lot of teaching around entrepreneurship and building products, "but not a lot of work around contributing to open source and working in open source". He wanted to know what work the OSI was doing to educate and involve students.

Choudhury, who is director of the OSPO at Carnegie Mellon Libraries and director of an Alfred P. Sloan Foundation grant for coordination of university OSPOs, began by saying "I feel your pain". At least in the US, he said, education about open source is "deeply lacking" and fails to help students understand open source beyond the computer-science perspective. The Sloan Foundation is "providing support to the domestic movement" around open source to help university OSPOs give students, faculty, and technology staff a better understanding of the broader open-source ecosystem. The aim is to teach "how to navigate in that from a technical perspective, legal perspective, community perspective" as part of the actual educational experience, and not just as something "on the side". It is, however, early days. "I'm not going to pretend we solve[d] this problem."

Left behind

I asked the next, somewhat long-winded, question. After introducing myself, I noted that I had observed many comments and responses that expressed a feeling that the OSI had chased a "shiny ball nobody asked it to chase", as well as disappointment with the OSAID process and final definition. There seemed to be a loss of trust in the OSI as a result, by the community that put the OSI where it is today. What was the plan to deal with that?

Hinds said that the board recognized that community members were upset, and felt they were not heard. However, "this [definition] was something that was being, I would say, asked and even demanded. We had people saying, 'we need this yesterday'." There was, she said, an underlying assumption that there could be a "translation from OSD to open-source AI" and that the OSI was being trusted to take the process seriously and try to facilitate it.

Choudhury said that the OSI was spurred on by pending regulation. He quoted Mike Milinkovich, executive director of the Eclipse Foundation, as saying that "we just have to get used to the fact that software is about to be regulated" in the context of the CRA. There were clear signals that regulation was about to start, including the use of the term "open-source AI" without defining it. What other group, he asked, is really better positioned to define it? But that meant reaching out to new sectors involved in regulating AI, "which was always going to be messy".

Carrez replied that the OSI "may not have done a great job promoting" the work that it has done to "have the back of developers" in the face of regulation such as the CRA. Without the work done by the OSI and others, he said, the regulatory landscape would be very different, and open source would have been put at risk.

There is also the fact that the OSI had to work with a lot of stakeholders; people from the OSI's traditional constituency were among the voices heard during the process, just not the only ones. That, he said, caused some frustration.

Blondelle argued that the OSI was not chasing a shiny ball, but trying to protect the original definition. The OSI had seen vendors using the term open source for things that were clearly not open source, "so I think we had to define open source AI because otherwise we would have lost some ground" on the OSD.

Hinds replied again that she wanted to make it clear that the OSI board had "felt that letdown". It would be spending energy on "reinforcing the value we provide to legal and developer communities, because we feel the pain of them feeling let down and need to do that repair" while making sure to include newer communities to figure out how they can all work together when needed.

"Stable, but not permanent"

The next question came from an attendee who said he was acting deputy chief AI officer for the Cybersecurity and Infrastructure Security Agency (CISA). He explained that CISA leads the effort to manage risk in cyber and physical infrastructure, and that the OSD serves as "a sort of risk tolerance and risk statement" about the acquisition and security risks of software. That, he said, helps him to advocate for open-source software in government as the solution that will give the best outcomes to mitigate risk. However, risk was not one of the things mentioned in connection with the OSAID, and he wanted to know how CISA could help "sort of drive towards definitions that are equivalent risk-management postures" that would inform his security decisions and recommendations for AI systems.

Carrez said that was an interesting question, and that the OSI was staying ready to evolve the definition. One of the open questions, he said, is the best way to patch and run AI systems. Comparing pure software to AI systems is difficult because of factors such as the economic cost of generating an AI system's models. "The reality is that if only a handful of companies and a handful of governments have the resources" to rebuild models, it is not a practical goal for open-source AI. There is steady progress, he said, on ways to fine-tune or patch models at lower cost. It was important, though, to "put a stake in the ground" with the OSAID to have a starting point for those discussions.

Choudhury also noted that the OSAID was the start of a journey and "a stable definition, but it's not permanent".

Open washing

The final question was about combating open washing, and how the OSI, government, and developer communities should be trying to prevent bad actors or others from misrepresenting software or AI systems as open if they are not.

Chestek said that this was not a new thing for open-source software and had probably been going on as long as it had existed. The OSI relies a lot on the community to do communal shaming, which is "probably the most powerful" way to combat open washing. When a company misrepresents its software, the OSI usually finds out about the situation from the community. Then the OSI would say something publicly, if it was appropriate to do so. That, she said, would probably carry over to the OSAID as well and she hoped "we can all converge at least on the principles of it". For example, if a system has a commercial limitation on it, "that's just not open source, and we don't even need to get into the weeds about whether or not you provided all the information about the data".

"Can I fork it?" asked Hinds. From a practical standpoint, she said, the right to fork translates to AI, and that is what the OSI is going for. "Can I look at this model? Can I work with this? Can I do something with it? I think that's really easy to resonate with our existing communities" as well as new ones coming into the open-source space.

The board also had an opportunity to give parting thoughts to the audience, which Carrez used to thank attendees and to encourage people to join the OSI and run for the board if they were interested in helping to make open source better.

[ Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our travel to this event. ]





Missing the target

Posted Nov 1, 2024 22:15 UTC (Fri) by ballombe (subscriber, #9523)

> "The reality is that if only a handful of companies and a handful of governments have the resources" to rebuild models, it is not a practical goal for open-source AI.

This is what OpenAI et al. want you to believe, but it is not true. There exist small models that can be rebuilt on commodity hardware. These are the ones that matter for the true purpose of open source AI: to allow people to understand and demystify how models are built.

Obviously big players need the process to be kept mysterious so that they can make whatever claim about it to judges, politicians and the general public. But the OSI must not become complicit in this.

Missing the target

Posted Nov 2, 2024 0:57 UTC (Sat) by josh (subscriber, #17465)

Also, in the very early days of Open source, far fewer people had the tools to work on software as well. Computers weren't as widely available, compilers and toolchains weren't as widely available, documentation wasn't as widely available...

Things get better over time. But not if you give up and decide to weaken your values to match what you think today's limitations might be.

Do they even see themselves how utterly ridiculous they are?

Posted Nov 2, 2024 5:17 UTC (Sat) by mirabilos (subscriber, #84359)

yes, yes… the board was open… but those who actually drove the process weren’t and aren’t, actively excluding voices all over, from FOSS experts to former board members; it didn’t matter, dissenting voices are not to be taken seriously by OSI.

> incredibly valuable to the work we have done

I bet the work was even more valuable to those…

> who "may not have always affiliated themselves with open source"

And! Yes! Of course! Blame it on us poor people who don’t just…

> replicate the success we've seen in open-source software to AI" and …

…insist on…

> simply translat[ing] the OSD to AI

… because we don’t have the “mind shift” necessary. Sure. That will be it.

I’ve seen enough. I barely skimmed the rest; OSI has successfully made itself obsolete.

Do they even see themselves how utterly ridiculous they are?

Posted Nov 2, 2024 8:28 UTC (Sat) by josh (subscriber, #17465)

> OSI has successfully made itself obsolete.

They did that a while ago, when they approved the "CAL", a proprietary license with usage restrictions that nonetheless can now masquerade as Open Source because it has OSI approval.

Do they even see themselves how utterly ridiculous they are?

Posted Nov 4, 2024 4:54 UTC (Mon) by Paf (subscriber, #91811)

Can you say more about the issue with the CAL?

Do they even see themselves how utterly ridiculous they are?

Posted Nov 4, 2024 6:09 UTC (Mon) by mirabilos (subscriber, #84359)

Huh, I hadn’t seen that. It’s well hidden, and the language is obscure enough to qualify as deliberately confusing to people who are not English-language-native lawyers, too.

When they approved the “Unlicense”[sic!], an extremely badly worded combination of a (possibly not legal) PD waiver and a (definitely botched and not even remotely near working) attempt at a fallback licence that fails to actually licence anything of relevance, I was fed up enough, tbh. But this shows they lost the whole reason they exist in the first place.

Do they even see themselves how utterly ridiculous they are?

Posted Nov 4, 2024 10:27 UTC (Mon) by kleptog (subscriber, #1183)

So I thought I'd google to figure out what you mean, and ended up with an article like this one, which is totally useless. It does link to the license, but is basically an entire article full of quotes from people I don't know who disagree with each other, without even bothering to quote or reference the actual parts of the license that are a problem, so how can I make up my mind? On the other hand, the OSI has an article which explains it well enough. It seems to be targeted at a very specific use-case.

People don't even bother trying to convince other people of their position anymore. Just a lot of "I think X" as if that is somehow enough to change my mind.

I asked ChatGPT for a summary of why it's controversial, and while I can see it's unusual for a software license, I don't see anything that could label it "proprietary". Unless you take the position that any non open-source license is proprietary and there is no grey area.

Do they even see themselves how utterly ridiculous they are?

Posted Nov 4, 2024 11:08 UTC (Mon) by intelfx (subscriber, #130118)

> Unless you take the position that any non open-source license is proprietary and there is no grey area.

Uhm, yes?

Do they even see themselves how utterly ridiculous they are?

Posted Nov 4, 2024 15:33 UTC (Mon) by kleptog (subscriber, #1183)

You think there is no value in distinguishing between the CAL, which

> gives You unlimited permission to use and modify the software to which it applies (the “Work”), either as-is or in modified form, for Your private purposes, while protecting the owners and contributors to the software from liability.

and say, a Microsoft Windows license which doesn't even give you the source?

Do they even see themselves how utterly ridiculous they are?

Posted Nov 13, 2024 19:27 UTC (Wed) by ssmith32 (subscriber, #72404)

A misleading and logically flawed argument.

Stating you should not call CAL open source, does not preclude differentiating between CAL, and other licenses. It just precludes calling it open source.

Do they even see themselves how utterly ridiculous they are?

Posted Nov 4, 2024 12:48 UTC (Mon) by jkingweb (subscriber, #113039)

I skimmed the license text and couldn't even identify what makes it not-open source. I'm really not understanding why I should be worked up about it.

Do they even see themselves how utterly ridiculous they are?

Posted Nov 5, 2024 0:12 UTC (Tue) by NYKevin (subscriber, #129325)

After skimming the article and spending far more time than I should have clicking around this thread[1], the problem appears to be related to these two sections of the license:

> 3.1. Permissions Granted
>
> Conditioned on compliance with section 4, and subject to the limitations of section 3.2, Licensor grants You the world-wide, royalty-free, non-exclusive permission to:
>
> a) Take any action with the Work that would infringe the non-patent intellectual property laws of any jurisdiction to which You are subject; and
>
> b) Take any action with the Work that would infringe any patent claims that Licensor can license or becomes able to license, to the extent that those claims are embodied in the Work as distributed by Licensor.
>
> 3.2. Limitations on Permissions Granted
> The following limitations apply to the permissions granted in section 3.1:
>
> a) Licensor does not grant any patent license for claims that are only infringed due to modification of the Work as provided by Licensor, or the combination of the Work as provided by Licensor, directly or indirectly, with any other component, including other software or hardware.
>
> b) Licensor does not grant any license to the trademarks, service marks, or logos of Licensor, except to the extent necessary to comply with the attribution conditions in section 4.1 of this License.
>
> [...]
>
> 4.2. Maintain User Autonomy
> In addition to providing each Recipient the opportunity to have Access to the Source Code, You cannot use the permissions given under this License to interfere with a Recipient’s ability to fully use an independent copy of the Work generated from the Source Code You provide with the Recipient’s own User Data.
>
> “User Data” means any data that is an input to or an output from the Work, where the presence of the data is necessary for substantially identical use of the Work in an equivalent context chosen by the Recipient, and where the Recipient has an existing ownership interest, an existing right to possess, or where the data has been generated by, for, or has been assigned to the Recipient.
>
> 4.2.1. No Withholding User Data
> Throughout any period in which You exercise any of the permissions granted to You under this License, You must also provide to any Recipient to whom you provide services via the Work, a no-charge copy, provided in a commonly used electronic form, of the Recipient’s User Data in your possession, to the extent that such User Data is available to You for use in conjunction with the Work.

Note also that "Recipient" is defined in a way that is similar to the requirements of the AGPL (i.e. it includes people who interact with the software over a network).

In English:

* The CAL license is both a copyright license and a patent license. So we need to analyze it like a patent license, and not just like a copyright license.
* If you allow an end user to interact with CAL-licensed software in some way that generates data, you must allow the end user to obtain a copy of their data, in a format that can be directly used by the software, and you must not modify the software in such a way that the user can't actually run it on their own private copy of their data.

Bruce's argument, as far as I can follow it, appears to be that the people who made CAL intend to use software patents to enforce the data availability provision as applied to their particular use case. In other words, they are not merely requiring that software based on the CAL-licensed code comply with this data availability rule, but are instead attempting to impose this requirement on all software that interacts with their (decentralized?) system, regardless of where the code came from. Perens also argues that this could be much more straightforwardly accomplished by simply requiring participants in this system to sign a contract relating to user data.

I'm not thrilled with the use of software patents for this use case. But I'm also not entirely convinced that this is a problem specific to CAL. This looks a lot more like a "software patents are evil" problem than a "CAL is not OSD-compliant" problem, at least from where I sit. Other participants in the thread pointed out that most other FOSS licenses (which mention patents at all) have similar "we are only licensing the patents that would otherwise be infringed by verbatim distribution" clauses, so it is rather difficult to argue that CAL violates the OSD on that basis, without then concluding that many long-accepted licenses also violate the OSD.

[1]: https://lists.opensource.org/pipermail/license-review_lis...

Do they even see themselves how utterly ridiculous they are?

Posted Nov 5, 2024 14:04 UTC (Tue) by jkingweb (subscriber, #113039)

Thanks. I wonder how much of people's displeasure stems from OSI conferring "approval" rather than simply affirming consistency with the Definition.

Do I like the terms of the Cryptographic Autonomy License? No. I don't even like its name. Fortunately I don't have to use it, nor use any software which employs it. But it does seem to be consistent with the Open Source Definition at least as much as the AGPL, so I don't really see any inconsistency on the part of OSI.

Do they even see themselves how utterly ridiculous they are?

Posted Nov 4, 2024 17:14 UTC (Mon) by excors (subscriber, #95769)

An old LWN article quotes and summarises the relevant bits of the licence: https://lwn.net/Articles/797065/

> The intent of all this language is relatively straightforward: if you are a user of an application (perhaps hosted on the net somewhere), you have the right to extract your data from that application to use with your own modified version of the code. Control of data is not to be used to lock users into a de-facto proprietary system.

From that article plus the Register one and some mailing list posts, it sounds like Bruce Perens didn't mind that specific licence's requirements, but he did mind the general idea of Open Source licences imposing requirements on data, largely because it makes the licences much trickier to comply with ("It's a good goal but it means you now need to have a lawyer to understand the license and to respond to your users"), and Open Source ought to be made simpler instead. He thought data restrictions should be completely out of scope for Open Source, and the OSI should reject CAL on that basis.

The OSI didn't have an existing policy on that, so it sounds like his arguments about field-of-use restrictions were stretching to find a technical justification within the OSD to reject it. And then he got increasingly frustrated when people didn't agree with him about the proposed policy or about that interpretation of the OSD, until he ended up quitting the OSI and accusing the board of conspiracy.

Incidentally, since at least 2020 he's been talking about what is now Post-Open (https://postopen.org/), apparently with the intent of supplanting Open Source by having a large number of software projects under a single new zero-cost licence for individuals and small businesses, and a second licence for larger businesses which requires royalties of 1% of the business's entire revenue paid to the Post-Open organisation. That organisation will subtract operating costs then divide the rest amongst individual developers (or their employers) in proportion to how widely each project is used and the number of lines of code each developer has written in the projects' Git repositories.

The goal is to make it simple for companies to comply - they don't have to spend any effort working out exactly which projects they use and how to follow all their different licences or pay for multiple support contracts etc, they just make a single payment to one organisation and that covers all their software - while also making them fund maintenance of the projects they rely on. Which doesn't sound like a bad goal in general, but his specific approach seems, uh, questionable. And I guess it's unsurprising he fell out with the OSI when his ideas are so radically different to Open Source.

Do they even see themselves how utterly ridiculous they are?

Posted Nov 5, 2024 12:58 UTC (Tue) by Wol (subscriber, #4433)

> and a second licence for larger businesses which requires royalties of 1% of the business's entire revenue paid to the Post-Open organisation.

So a large business that has nothing to do with the software industry will end up paying far more than a "small" software house, in return for much less benefit ... and depending on the definition of "revenue" a licence may well be out of reach for companies in low-margin businesses ...

That doesn't sound a sensible business model at all. "Idealism, meet reality! Score, reality 1 idealism nil".

Cheers,
Wol

Do they even see themselves how utterly ridiculous they are?

Posted Nov 7, 2024 14:40 UTC (Thu) by Karellen (subscriber, #67644)

I'm not sure I'd give Bruce Perens' idealism a score of 'nil'. Pretty sure he's put a few points on the board over the years... er, decades.

Do they even see themselves how utterly ridiculous they are?

Posted Nov 7, 2024 14:59 UTC (Thu) by Wol (subscriber, #4433)

I'm not saying I disagree with his idealism. I just think - in this particular instance - it won't work. Businesses are as assorted as people, and while it would work for some, I don't think it would work for most.

Cheers,
Wol

Do they even see themselves how utterly ridiculous they are?

Posted Nov 7, 2024 17:32 UTC (Thu) by excors (subscriber, #95769)

He accepts it won't work for some companies:

> We also have some non-goals:
> * Don’t worry about when or if today’s giant companies will join Post Open. The answer to "When will IBM come on board" may well be "never", because for such a large company, participation would mean a new USD$60 Million dollar per year fee (1% of USD$6 Billion). Instead, attract small and new companies with a free license, and grow them into paid license customers.
(https://postopen.org/how-post-open-works/)

...but I think those numbers are wrong - IBM's annual revenue is more like $60B, so the fee would be $600M per year. ($6B is IBM's recent annual net income, or their quarterly revenue from "software" alone (excluding "consulting", "infrastructure", etc))

And IBM isn't even a very big company by revenue (219th according to https://fortune.com/ranking/global500/ - the top companies are 10x higher), and it gets a higher proportion of its value from open source software than most companies, and it has reasonably decent net profit margins (~10%, compared to e.g. Walmart's 2% which is typical for the grocery industry, meaning this fee would be half of Walmart's entire profits), so IBM is one of the better cases for this licence.

I think the non-goal is effectively excluding all companies in low-profit-margin industries, and most large-ish companies in high-profit-margin industries, and any small/medium company which hopes to either grow into a large-ish company or be acquired by one. So I find it hard to imagine _any_ company would ever agree to this. And without buy-in from companies totalling probably billions of dollars in revenue, there won't be enough money to fund the project.
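The fee arithmetic above reduces to a couple of lines. A minimal Python sketch, assuming a flat 1% royalty on total revenue as described on the Post-Open site and using only the rough revenue and margin figures quoted in this comment (the helper names are illustrative, not from any real tool):

```python
# Illustrative sketch of the Post-Open fee arithmetic discussed above.
# Assumption: a flat royalty of 1% of total revenue. The revenue and
# margin figures are the rough numbers quoted in the thread, not
# authoritative financial data.

FEE_RATE = 0.01  # 1% of entire revenue


def annual_fee(revenue: float) -> float:
    """Fee owed under a flat percentage-of-revenue royalty."""
    return revenue * FEE_RATE


def fee_share_of_profit(net_margin: float) -> float:
    """Fraction of net profit consumed by the fee.

    fee = FEE_RATE * revenue and profit = net_margin * revenue,
    so revenue cancels out and only the margin matters.
    """
    return FEE_RATE / net_margin


if __name__ == "__main__":
    print(f"Fee at $6B revenue:  ${annual_fee(6e9) / 1e6:,.0f}M")
    print(f"Fee at $60B revenue: ${annual_fee(60e9) / 1e6:,.0f}M")
    for name, margin in [("~10% margin (IBM)", 0.10), ("~2% margin (Walmart)", 0.02)]:
        print(f"{name}: fee = {fee_share_of_profit(margin):.0%} of net profit")
```

Because revenue cancels out of the ratio, the share of profit the fee consumes depends only on net margin, which is why low-margin businesses fare worst under this scheme.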

Do they even see themselves how utterly ridiculous they are?

Posted Nov 7, 2024 14:34 UTC (Thu) by Karellen (subscriber, #67644)

> And I guess it's unsurprising [Bruce Perens] fell out with the OSI when his ideas are so radically different to Open Source.

This feels like an odd take. Are you familiar with the history of the Open Source Initiative and its Open Source Definition?

Do they even see themselves how utterly ridiculous they are?

Posted Nov 7, 2024 16:16 UTC (Thu) by excors (subscriber, #95769)

Yes, but evidently his views have changed after decades of experience, since he now says Open Source has failed. I think his main arguments are: Open Source's most significant achievement has been to create wealth for proprietary software companies like Google and IBM. There are very few successful Open Source projects for regular users. Critical infrastructure is severely underfunded. Licence violation is widespread and enforcement is almost nonexistent. And there is no workable solution to those issues within the Open Source paradigm. So he's working on a new paradigm, which he has explicitly called "a radical idea". (I'm basing this on https://postopen.org/ and some interviews and talks he's given.)

I have the impression that the wider Open Source community wouldn't fundamentally disagree with those problems he highlights, but the little discussion I've been able to find about his proposed solution has a lot of criticism and almost no support. Most people seem happy to bumble along with Open Source as it is now despite its problems, with perhaps a few incremental changes, while he's trying to shake things up, so it's unsurprising when that causes friction.

Do they even see themselves how utterly ridiculous they are?

Posted Nov 7, 2024 17:27 UTC (Thu) by pizza (subscriber, #46) [Link]

> Licence violation is widespread and enforcement is almost nonexistent.

That's only a problem for copyleft "Free Software". It's nearly impossible to violate (therefore there is little need to "enforce") the "permissive" licenses that the "Open Source" movement embraced.

(Congratulations; once again, Stallman's predictions have been shown to be accurate....)

Do they even see themselves how utterly ridiculous they are?

Posted Nov 2, 2024 21:44 UTC (Sat) by Shamar (guest, #122602) [Link] (1 responses)

> the board was open…

Not much. Several people were silenced, harassed or censored during the "co-design" process.

Julia Ferraioli paid a mental toll for trying to defend open source freedom from OSI's open washing goals:
https://archive.is/paD1W

But several other people felt the same and didn't dare to speak about it in public.

As for me, I was silenced for several weeks, and my posts were deleted because they debunked the OSI narrative:
see for example my reply to Carlo Piana (OSI board member) https://archive.is/JPSRX#post_3 that was removed from the forum: https://discuss.opensource.org/t/fsf-announced-basics-of-...

And something they don't even mention is the role Meta employees played in excluding training data from the requirements: here's where they admit the trick adopted (discovered by a user who was later silenced too) https://discuss.opensource.org/t/we-heard-you-lets-focus-...

For sure, the OSI has made itself obsolete, but too few people are aware of the alternatives such as https://opensourcedefinition.org/

Do they even see themselves how utterly ridiculous they are?

Posted Nov 3, 2024 14:36 UTC (Sun) by kleptog (subscriber, #1183) [Link]

While I can't really comment on any of the specifics you give, reading some of the posts I was amazed by the idea therein that whatever the OSI comes up with is somehow relevant for the implementation of the EU AI Act and exemptions within there. That's not how this works.

The relevant definition of open-source for AI for the purposes of the Act is described within the Act itself (recitals 102-104). No, they don't require the providing of the training data. But more importantly, the exceptions for open-source AI are not available for any product/service placed on the EU market (Article 2(12)). So they are essentially unavailable to any commercial party like OpenAI or Meta. The idea that commercial parties could hijack the OSI process to secure themselves exemptions to the EU AI Act is just so far off the mark it's silly.

The exemptions also don't cover providing a summary of the training data and showing you complied with copyright restrictions, which are probably the obligations commercial companies are most interested in.

Don't get all the hate

Posted Nov 3, 2024 5:07 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (7 responses)

I don't get all the hate.

The FSF is free to come up with its own definition and push it. Nobody stops them.

And the OSI made it clear that the current definition is not the only possibility. Just like we have various copyright licenses with various levels of restrictions (from the super-restrictive AGPLv3 to the WTFPL).

Don't get all the hate

Posted Nov 3, 2024 7:22 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (2 responses)

It's the modern state of politics. There is no gray. Either you're a good guy ("someone who agrees with my political goals") or a bad guy ("someone who does not").

Don't get all the hate

Posted Nov 3, 2024 13:49 UTC (Sun) by zack (subscriber, #7062) [Link] (1 responses)

Very much that. And this way of debating legitimate policy disagreements (in this case on what "open source AI" means) is making the free software movement weaker against our actual opponents. Meta & friends, who have been calling "open source" AI systems that will never pass the OSAID bar, are eating tons of popcorn while watching this infight, and while lobbying regulators that *their* notion of "open source AI" is the real one. Good luck to us all.

Don't get all the hate

Posted Nov 4, 2024 19:02 UTC (Mon) by lmb (subscriber, #39048) [Link]

I think the observation that there are systems that are proclaimed to be "open" that don't ever meet the OSAID bar is valid.

And yes, OSAID *is* better than nothing.

I disagree with the assessment that this means we should not voice criticism of the term, or that doing so is harmful.

OSI chose to use a very comprehensive single term with zero differentiation and significantly lower standards. They *could* have done it differently. Same with publishing a definition as "1.0" that actively asks for industry endorsement. But tell me, how would you actually comply with it? Under what terms would you make all that extra info available?

They position themselves as an authority and *the* steward. Their results get evaluated according to those claims, and their actions questioned for potential ulterior motives.

Calling that "infighting" ain't great, when folks see serious possible consequences of what they're pushing out. (e.g., the impact on political regulations.)

Don't get all the hate

Posted Nov 3, 2024 13:01 UTC (Sun) by ballombe (subscriber, #9523) [Link] (3 responses)

> I don't get all the hate.
Disagreeing is not hate. Claiming otherwise is just a way to stifle debate.

Don't get all the hate

Posted Nov 3, 2024 16:44 UTC (Sun) by randomguy3 (subscriber, #71063) [Link] (2 responses)

sure, but there's a difference between saying "i don't think this was the right decision" and saying "only someone who was trying to destroy everything the open source movement stands for could have made this decision", or "there is no way anyone who made this decision could be taken seriously again". we've seen a lot of the latter ("they have lost all credibility" has popped up multiple times, even here on lwn where comments tend to be more moderate, with nothing to back up this assertion that the wider community's perception of OSI has actually changed significantly).

Don't get all the hate

Posted Nov 4, 2024 21:44 UTC (Mon) by ballombe (subscriber, #9523) [Link] (1 responses)

It is still not appropriate to qualify any of this as "hate". It is not.

Don't get all the hate

Posted Nov 5, 2024 3:27 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

I disagree. The term "hate," in this context, is shorthand for "hateful rhetoric." It is not a claim as to anyone's actual emotional state, it is merely a claim that the words used are excessively charged and negative.

Which they are. You do not need to impugn people's motivations to argue against their interpretation of "open source." That is a pure ad hominem attack which has no place in polite discourse. Dismissing it as "hate" is an entirely reasonable and proportionate response.

This is an unhelpful response from OSI

Posted Nov 3, 2024 16:47 UTC (Sun) by IanKelling (subscriber, #89418) [Link] (2 responses)

> "Nobody disagrees about the principles [behind the OSAID], where we see disagreement is implementation"

That is a friendly sounding statement which I find rather nasty: it groups all the critics together and suggests they have a fundamental deficiency that is enough to not take them very seriously. But on the other hand, any criticism without that deficiency, well, that is a disagreement on principles and you should not expect those to be resolved. The disagreements are important and this is not a good response.

> She wanted to know how the community could work together to move forward.

> Scott said that the simple answer was to keep talking

You had a q/a, and chose to impugn your critics in various ways and did zero addressing of the substance of their criticism. That kind of "moving forward" is the kind where you can call it moving forward without moving anywhere.

This is an unhelpful response from OSI

Posted Nov 3, 2024 20:01 UTC (Sun) by NYKevin (subscriber, #129325) [Link]

>> > "Nobody disagrees about the principles [behind the OSAID], where we see disagreement is implementation"
>
> That is a friendly sounding statement which I find rather nasty: it groups all the critics together and suggests they have a fundamental deficiency that is enough to not take them very seriously.

Frankly, I am not able to read that statement in the way you seem to be reading it. I do not understand "disagreement [about] implementation" to imply "deficient disagreement," nor to imply that such disagreements should not be taken seriously.

What I do understand is the background context: A large group of commercial entities (commercial-ish, in the case of OpenAI) have gone around claiming that their AIs are "open" in one way or another, despite flagrant violations of OSD#6 and various other parts of the OSD (to the point that OpenAI's product is just straight proprietary software, with no attempt to justify the use of the word "open" in their name whatsoever). What the statement you quote appears to be saying is that everyone agrees that those products should not be described as "open-source AI" - and under OSAID 1.0, they are all excluded, which is an improvement over the situation where these companies were going unchallenged in their use of this term.

This is an unhelpful response from OSI

Posted Nov 3, 2024 20:20 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

The OSI definition of "Open" is better than the status quo ante. Now it's clarified that the open models should make the weights available with clearly defined restrictions and document the training data set.

This alone will make it easier to filter out models that are not at all open (e.g. "OpenAI" or Facebook's llama).

The next step is to have the "Open Training Set" definition.

Open Source AI is open-washing by any way of looking at it

Posted Nov 4, 2024 18:53 UTC (Mon) by lmb (subscriber, #39048) [Link] (4 responses)

> Choudhury said that the OSI was spurred on by pending regulation.

I think this is the real motivation behind the industry stakeholders pushing this, and I'm not the only one to notice.

If OSI, the "authority" and "steward" of the Open Source Definition declares something to be "Open Source AI", surely it is, dear regulator? We don't need to make our sources open, it says so. See? Those more lenient obligations apply to us!

It doesn't even specify the "OSI-approved terms" under which all of the assets/components that aren't actual source code should be made available.

But it gives the industry another fancy and easy-to-comply-with marketable label to slap on their products.

Everyone wins, except the public.

We'd not call something Open Source if it came with a description of the sources and where to go and (maybe) buy access.

I don't hate it, I get why OSI does it (it serves their stakeholders), I'm just hugely disappointed.

The Software Freedom Conservancy however does seem to have their act and vision together.

Open Source AI is open-washing by any way of looking at it

Posted Nov 5, 2024 12:18 UTC (Tue) by zack (subscriber, #7062) [Link] (3 responses)

> The Software Freedom Conservancy (SFC) however does seem to have their act and vision together.

Disclosure: I co-authored SFC's aspirational statement on LLM-assisted programming. As such, I am very near to SFC's position in this general space.

But note that about data training in OSAID, what SFC actually says is "I [bkuhn] truly don't know for sure (yet) if the only way to respect user rights in an LLM-backed generative AI system is to only use training sets that are publicly available and licensed under Free Software licenses. [...] My instincts, after 25 years as a software rights philosopher, lead me to believe that it will take at least a decade for our best minds to find a reasonable answer on where the bright line is of acceptable behavior with regard to these AI systems." And he is spot on.

The point that I'd like to highlight here is that once you start looking at the details (legal, strategic, philosophical, etc.), the data issue in AI/ML is quite complicated. Trying to simplify it down to require-data=good, do-not-require-data=bad is not going to serve us well in the long term.

Open Source AI is open-washing by any way of looking at it

Posted Nov 5, 2024 12:34 UTC (Tue) by lmb (subscriber, #39048) [Link] (2 responses)

I don't want to equate whether opening the data up is good or bad directly either. I, too, have been involved in this topic for three decades, I realize the truth is rarely pure and never simple.

There can be good reasons for data not to be made public or released openly. Medical, safety, personal privacy - and yet the AI/ML systems trained on them serve vital functions for society. Those may only be visible to officially appointed & chartered inspectors, for example.

Even the FSF acknowledges this - nonfree systems can still be just.

No, open/free/public data sets are not the only way to respect user rights.

Where the OSAID falls short in my opinion is insisting that those systems are "Open Source AI" - not everything can be open, and that's fine. (Or they could, indeed, say they're Open(tm) _only_ if they fall under such exempt regulations and are indeed independently reviewable.)

The *default* for an "Open Source AI" system should, in my book, indeed be open data. (And it'd be helpful if OSAID specified terms one could comply with.)

Again, my gripe is OSI going with such an all-encompassing term and indeed claiming their definition covers all components comprehensively - from a very prominent position with a lot of power and influence.

I think their OSAID 1.0 should have been more nuanced & differentiated and mostly stick to the parts we do understand reasonably well. And deliver something we can actually implement in practice. This, to me, reeks of preempting regulatory decisions, and/or marketing reasons.

I don't want in-fighting while the absolute exploitationists rejoice, either. But the OSI started the overreach with a reductionist definition, and also claims the high ground of authority - they get judged accordingly. They don't get free cheerleading.

The fact that so many assumed-to-be-well-meaning people see this as a potential erosion and open-washing clearly shows they've not produced something that is clear enough, if that truly isn't their intent.

(I also know you can't ever produce anything that is 100% proof against misinterpretation by malicious actors, but the folks whom I've seen voice criticism don't tend to fall into that camp.)

Open Source AI is open-washing by any way of looking at it

Posted Nov 5, 2024 13:38 UTC (Tue) by paulj (subscriber, #341) [Link]

I think that for any generally queryable LLM made available to the public, you must be willing to accept the user is able to access (at least) snippets of the original training data. That is, the sensitivity of access to the prompt will be equal to the sensitivity of access to the original data. You need to treat the two - prompt and training data - as equivalent in terms of access.

In practical terms, for such LLMs, I think it will be required to anonymise any sensitive data.

It could be there are other kinds of AI models that cannot, of themselves, directly leak the input data. E.g., a model arranged and trained to classify, say, diseases. You chat to it with your symptoms, perhaps, and it spits out a disease, and only a disease. If the output layers can only produce a set of output symbols that is distinct from the training data and much more limited than the training data, you could argue it is "safe" to distribute that model.

However, the training data is still encoded into the parameters. The model state is a compressed form of the input. For sensitive input data, you would /not/ want to bet that it is safe to distribute the parameters just because they end up selecting from a predetermined and limited output layer. You would /not/ want to bet that some clever AI-hacker won't eventually figure out how to reverse-engineer some (or more) of the training data from the parameters.

Open Source AI is open-washing by any way of looking at it

Posted Nov 5, 2024 13:51 UTC (Tue) by zack (subscriber, #7062) [Link]

I completely agree with you on nuance.

I've also participated, as a volunteer, in the OSAID process. And once it became clear that OSI was opposed to mandating training data access, I've "battled" (for lack of a better term) for either a two-term definition (e.g., "open weight" vs "open source") or an additional qualifier (e.g., level 1/2/3/4), depending on whether training data were open data/public data/obtainable data/private data. I regret having lost that battle too. (But I still think that OSAID is better than nothing, given the current state of the AI industry, and that it will play a positive role in upcoming regulations.)


Copyright © 2024, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds