OSI board AMA at All Things Open
Members of the Open Source Initiative (OSI) board sat down for a 45-minute "Ask Me Anything" (AMA) session at All Things Open in Raleigh, NC on October 29. Though the floor was open to any topic the audience might want to ask of the OSI board, many of the questions focused on the Open Source AI Definition (OSAID), which had been announced the day before. The new definition has been somewhat controversial, and the board spent much of the session addressing concerns about it, as well as questions on open washing and the need for more education about open source in general.
The session was held in one of the smaller rooms at the venue, with about 30 people in attendance (not counting OSI board members or staff). The session kicked off with some ground rules from the moderator, Mer Joyce, who had also worked as a facilitator for the OSAID drafting process. The first order of business was introducing the board members in attendance: OSI vice-secretary Anne-Marie Scott, vice chair Thierry Carrez, Gaël Blondelle, Pamela Chestek, Sayeed Choudhury, and Tracy Hinds.
Deborah Bryant, a former board member who currently works with the OSI as its US policy director, got the ball rolling with a question about the most interesting challenge the board expected to face in the next year.
Scott answered that the recent effort to create the OSAID had broadened the OSI's community, which was good but also "problematic". She said that the organization had encouraged a group of people to participate in the OSAID process who "may not have always affiliated themselves with open source" but were "incredibly valuable to the work we have done". Now, the OSI needed to work on keeping them engaged and to make connections to the existing community.
Choudhury agreed, adding that it had been an active year, and that the OSI now needed to get back to the core things the organization had focused on in the past with the legal and developer communities. He also drew attention to the policy work the OSI had done, particularly in the EU with regard to the Cyber Resilience Act (CRA). "We've been doing that work, but I think we need to focus on it a lot more."
Bespoke licenses
The first audience question was about the trend of "bespoke" licenses, such as the CockroachDB License, that claimed to be open-source licenses but had restrictions such as "a certain number of users, or under a certain amount of revenue per year". The questioner wanted to know how to navigate those licenses "because right now we have to evaluate every single one of them".
Chestek was the first to respond to the question. Bespoke licenses, she said, defeat the purpose of the Open Source Definition (OSD). The idea behind the OSD was to make adopting open-source software frictionless because users know immediately that they have all the freedoms that they need:
These [bespoke] licenses are not designed to build community, they're designed to extract value out of software by free-riding on the concept of open source. But they're not open source.
What the OSI does, she said, is to occasionally talk about those licenses and point out that there is a reason that open source works, "and that is frictionless adoption". Without that, there is no chance of building community, so what is the point of the license "other than to maybe look like you're a good citizen", without being willing to make the real commitment to an open-source development model?
Carrez said that these new licenses focus on a single benefit of open source, which is the availability of code. As an organization, he said, OSI needed to do more to educate the public that there are additional benefits in terms of innovation and sustainability. He added that developers now take for granted a lot of the benefits of open source.
I can tell you that developing today is very different than developing then. I don't want us to go back to the dark ages of the '90s where you had to basically look at [the license of] every piece of code in order to use it.
How to move forward
The next question was from Nithya Ruff, head of the open-source program office (OSPO) at AWS. She said that "not everyone agrees with the new Open Source AI Definition". There were some points of disagreement, she said, but "a lot of places where we agree". She wanted to know how the community could work together to move forward.
Scott said that the simple answer was to keep talking. Some people felt the OSAID was too open, others felt it was too closed, "and then we've got everything in the middle". That is driven, she said, by the kinds of organizations that people work for, as well as the values that they hold. She went on to say that there were more conversations to be had, some in the open and some under the Chatham House Rule, for the "next phase of engagement". Ruff was right, she said, "there's more agreement than there is difference, but on the points of difference, they are strongly held".
Carrez responded that the board didn't have a strong position one way or the other when it started the OSAID process, and that he was "very sympathetic" to the dissenting voices that were heard during the process. However, he said that he realized "the ultimate goal is really to replicate the success we've seen in open-source software to AI" and not to simply translate the OSD to AI.
Some of the tension we've seen is in people that haven't made that mind shift to the wider picture and are just trying to apply what they are very familiar with, that they're experts at, and that made it more difficult.
"Nobody disagrees about the principles [behind the OSAID], where we see disagreement is implementation", Chestek said. The OSI had gone further than others in trying to define a fully open implementation, and tried to put a stake in the ground that defined what it considered open right now. That implementation, she said, is "the piece of it that we know is going to change", but not the principles. Maybe when the industry was more stable, "then I do hope we'll really come to a unified place". She added that it was a "wild experience" to do the OSAID work at the peak of a hype cycle while an industry was being regulated.
Education
The next question came from Carson Shaar, who introduced himself as the co-founder of a company in the open-source space, Zero-True, and a recent college graduate. He said that he'd observed "quite a lack of education" around open source. Universities were doing a lot of teaching around entrepreneurship and building products, "but not a lot of work around contributing to open source and working in open source". He wanted to know what work the OSI was doing to educate and involve students.
Choudhury, who is director of the OSPO at Carnegie Mellon Libraries and director of an Alfred P. Sloan Foundation grant for coordination of university OSPOs, began by saying "I feel your pain". At least in the US, he said, education about open source is "deeply lacking" and fails to help students understand open source beyond the computer-science perspective. The Sloan Foundation is "providing support to the domestic movement" around open source to help university OSPOs give students, faculty, and technology staff a better understanding of the broader open-source ecosystem; the goal is to teach "how to navigate in that from a technical perspective, legal perspective, community perspective" as part of the actual educational experience and not just something "on the side". It is, however, early days. "I'm not going to pretend we solve[d] this problem."
Left behind
I asked the next, somewhat long-winded, question. After introducing myself, I noted that I had observed many comments and responses expressing a feeling that the OSI had chased a "shiny ball nobody asked it to chase", as well as disappointment with the OSAID process and final definition. There seemed to be a loss of trust in the OSI as a result, from the community that put the OSI where it is today. What was the plan to deal with that?
Hinds said that the board recognized that community members were upset, and felt they were not heard. However, "this [definition] was something that was being, I would say, asked and even demanded. We had people saying, 'we need this yesterday'." There was, she said, an underlying assumption that there could be a "translation from OSD to open-source AI" and that the OSI was being trusted to take the process seriously and try to facilitate it.
Choudhury said that the OSI was spurred on by pending regulation. He quoted Mike Milinkovich, executive director of the Eclipse Foundation, as saying that "we just have to get used to the fact that software is about to be regulated" in the context of the CRA. There were clear signals that regulation was about to start, including the use of the term "open-source AI" without defining it. What other group, he asked, is really better positioned to define it? But that meant reaching out to new sectors involved in regulating AI, "which was always going to be messy".
Carrez replied that the OSI "may not have done a great job promoting" the work that it has done to "have the back of developers" in the face of regulation such as the CRA. The regulatory landscape, he said, would be very different without the work done by the OSI and others; without it, regulation would have put open source at risk.
There is also the fact that the OSI had to work with a lot of stakeholders; people from the OSI's traditional constituency were some of the voices heard during the process, just not the only ones. That, he said, caused some frustration.
Blondelle argued that the OSI was not chasing a shiny ball, but trying to protect the original definition. The OSI had seen vendors using the term open source for things that were clearly not open source, "so I think we had to define open source AI because otherwise we would have lost some ground" on the OSD.
Hinds replied again that she wanted to make it clear that the OSI board had "felt that letdown". It would be spending energy on "reinforcing the value we provide to legal and developer communities, because we feel the pain of them feeling let down and need to do that repair", while making sure to include newer communities to figure out how they can all work together when needed.
"Stable, but not permanent"
The next question came from an attendee who said he was acting deputy chief AI officer for the Cybersecurity and Infrastructure Security Agency (CISA). He said that CISA leads the effort to manage risk in cyber and physical infrastructure, and that the OSD serves as "a sort of risk tolerance and risk statement" about the acquisition and security risks of software. That helps him advocate for open-source software in government as the solution that will give the best outcomes to mitigate risk. However, risk was not one of the things mentioned about the OSAID, he said; he wanted to know how CISA could help "sort of drive towards definitions that are equivalent risk-management postures" that would help him with security decisions and recommendations for AI systems.
Carrez said that was an interesting question, and that the OSI was staying ready to evolve the definition. One of the open questions, he said, is the best way to patch and run AI systems. Comparing pure software to AI systems is difficult; consider, for example, the economic cost of generating an AI system's models. "The reality is that if only a handful of companies and a handful of governments have the resources" to rebuild models, rebuilding is not a practical goal for open-source AI. There is more and more evolution, he said, in the ways that models are fine-tuned or patched in a less costly way. It was important, though, to "put a stake in the ground" with the OSAID to have something to work from in those discussions.
Choudhury also noted that the OSAID was the start of a journey and "a stable definition, but it's not permanent".
Open washing
The final question was about combating open washing, and how the OSI, government, and developer communities should be trying to prevent bad actors or others from misrepresenting software or AI systems as open if they are not.
Chestek said that this was not a new problem for open-source software; it had probably been going on as long as open source had existed. The OSI relies a lot on the community to do communal shaming, which is "probably the most powerful" way to combat open washing. When a company misrepresents its software, the OSI usually finds out about the situation from the community; the OSI then says something publicly, if it is appropriate to do so. That, she said, would probably carry over to the OSAID as well, and she hoped "we can all converge at least on the principles of it". For example, if a system has a commercial limitation on it, "that's just not open source, and we don't even need to get into the weeds about whether or not you provided all the information about the data".
"Can I fork it?" asked Hinds. From a practical standpoint, she said, the right to fork translates to AI, and that is what the OSI is going for. "Can I look at this model? Can I work with this? Can I do something with it? I think that's really easy to resonate with our existing communities" as well as new ones coming into the open-source space.
The board also had an opportunity to give parting thoughts to the audience; Carrez used his to thank attendees and encourage people to join the OSI and run for the board if they were interested in helping to make open source better.
[ Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our travel to this event. ]
| Index entries for this article | |
|---|---|
| Conference | All Things Open/2024 |
Posted Nov 1, 2024 22:15 UTC (Fri)
by ballombe (subscriber, #9523)
This is what OpenAI et al. want you to believe, but this is not true. There exist small models that can be rebuilt.
Obviously big players need the process to be kept mysterious so that they can make whatever claim about it to judges, politicians and the general public. But the OSI must not become complicit in this.
Posted Nov 2, 2024 0:57 UTC (Sat)
by josh (subscriber, #17465)
Things get better over time. But not if you give up and decide to weaken your values to match what you think today's limitations might be.
Posted Nov 2, 2024 5:17 UTC (Sat)
by mirabilos (subscriber, #84359)
> incredibly valuable to the work we have done
I bet the work was even more valuable to those…
> who "may not have always affiliated themselves with open source"
And! Yes! Of course! Blame it on us poor people who don’t just…
> replicate the success we've seen in open-source software to AI" and …
…insist on…
> simply translat[ing] the OSD to AI
… because we don’t have the “mind shift” necessary. Sure. That will be it.
I’ve seen enough. I barely skimmed the rest; OSI has successfully made itself obsolete.
Posted Nov 2, 2024 8:28 UTC (Sat)
by josh (subscriber, #17465)
They did that a while ago, when they approved the "CAL", a proprietary license with usage restrictions that nonetheless can now masquerade as Open Source because it has OSI approval.
Posted Nov 4, 2024 4:54 UTC (Mon)
by Paf (subscriber, #91811)
Posted Nov 4, 2024 6:09 UTC (Mon)
by mirabilos (subscriber, #84359)
When they approved the “Unlicense”[sic!], an extremely badly worded combination of a (possibly not legal) PD waiver and a (definitely botched and not even remotely near working) attempt at a fallback licence that fails to actually licence anything of relevance, I was fed up enough, tbh. But this shows they lost the whole reason they exist in the first place.
Posted Nov 4, 2024 10:27 UTC (Mon)
by kleptog (subscriber, #1183)
People don't even bother trying to convince other people of their position anymore. Just a lot of "I think X" as if that is somehow enough to change my mind.
I asked ChatGPT for a summary of why it's controversial, and while I can see it's unusual for a software license, I don't see anything that could label it "proprietary". Unless you take the position that any non-open-source license is proprietary and there is no grey area.
Posted Nov 4, 2024 11:08 UTC (Mon)
by intelfx (subscriber, #130118)
Uhm, yes?
Posted Nov 4, 2024 15:33 UTC (Mon)
by kleptog (subscriber, #1183)
> gives You unlimited permission to use and modify the software to which it applies (the “Work”), either as-is or in modified form, for Your private purposes, while protecting the owners and contributors to the software from liability.
and, say, a Microsoft Windows license, which doesn't even give you the source?
Posted Nov 13, 2024 19:27 UTC (Wed)
by ssmith32 (subscriber, #72404)
Stating you should not call CAL open source, does not preclude differentiating between CAL, and other licenses. It just precludes calling it open source.
Posted Nov 4, 2024 12:48 UTC (Mon)
by jkingweb (subscriber, #113039)
Posted Nov 5, 2024 0:12 UTC (Tue)
by NYKevin (subscriber, #129325)
> 3.1. Permissions Granted
Note also that "Recipient" is defined in a way that is similar to the requirements of the AGPL (i.e. it includes people who interact with the software over a network).
In English:
* The CAL license is both a copyright license and a patent license. So we need to analyze it like a patent license, and not just like a copyright license.
Bruce's argument, as far as I can follow it, appears to be that the people who made CAL intend to use software patents to enforce the data availability provision as applied to their particular use case. In other words, they are not merely requiring that software based on the CAL-licensed code comply with this data availability rule, but are instead attempting to impose this requirement on all software that interacts with their (decentralized?) system, regardless of where the code came from. Perens also argues that this could be much more straightforwardly accomplished by simply requiring participants in this system to sign a contract relating to user data.
I'm not thrilled with the use of software patents for this use case. But I'm also not entirely convinced that this is a problem specific to CAL. This looks a lot more like a "software patents are evil" problem than a "CAL is not OSD-compliant" problem, at least from where I sit. Other participants in the thread pointed out that most other FOSS licenses (which mention patents at all) have similar "we are only licensing the patents that would otherwise be infringed by verbatim distribution" clauses, so it is rather difficult to argue that CAL violates the OSD on that basis, without then concluding that many long-accepted licenses also violate the OSD.
[1]: https://lists.opensource.org/pipermail/license-review_lis...
Posted Nov 5, 2024 14:04 UTC (Tue)
by jkingweb (subscriber, #113039)
Do I like the terms of the Cryptographic Autonomy License? No. I don't even like its name. Fortunately I don't have to use it, nor use any software which employs it. But it does seem to be consistent with the Open Source Definition at least as much as the AGPL, so I don't really see any inconsistency on the part of OSI.
Posted Nov 4, 2024 17:14 UTC (Mon)
by excors (subscriber, #95769)
> The intent of all this language is relatively straightforward: if you are a user of an application (perhaps hosted on the net somewhere), you have the right to extract your data from that application to use with your own modified version of the code. Control of data is not to be used to lock users into a de-facto proprietary system.
From that article plus the Register one and some mailing list posts, it sounds like Bruce Perens didn't mind that specific licence's requirements, but he did mind the general idea of Open Source licences imposing requirements on data, largely because it makes the licences much trickier to comply with ("It's a good goal but it means you now need to have a lawyer to understand the license and to respond to your users"), and Open Source ought to be made simpler instead. He thought data restrictions should be completely out of scope for Open Source, and the OSI should reject CAL on that basis.
The OSI didn't have an existing policy on that, so it sounds like his arguments about field-of-use restrictions were stretching to find a technical justification within the OSD to reject it. And then he got increasingly frustrated when people didn't agree with him about the proposed policy or about that interpretation of the OSD, until he ended up quitting the OSI and accusing the board of conspiracy.
Incidentally, since at least 2020 he's been talking about what is now Post-Open (https://postopen.org/), apparently with the intent of supplanting Open Source by having a large number of software projects under a single new zero-cost licence for individuals and small businesses, and a second licence for larger businesses which requires royalties of 1% of the business's entire revenue paid to the Post-Open organisation. That organisation will subtract operating costs then divide the rest amongst individual developers (or their employers) in proportion to how widely each project is used and the number of lines of code each developer has written in the projects' Git repositories.
The goal is to make it simple for companies to comply - they don't have to spend any effort working out exactly which projects they use and how to follow all their different licences or pay for multiple support contracts etc, they just make a single payment to one organisation and that covers all their software - while also making them fund maintenance of the projects they rely on. Which doesn't sound like a bad goal in general, but his specific approach seems, uh, questionable. And I guess it's unsurprising he fell out with the OSI when his ideas are so radically different to Open Source.
Posted Nov 5, 2024 12:58 UTC (Tue)
by Wol (subscriber, #4433)
So a large business that has nothing to do with the software industry will end up paying far more than a "small" software house, in return for much less benefit ... and depending on the definition of "revenue" a licence may well be out of reach for companies in low-margin businesses ...
That doesn't sound a sensible business model at all. "Idealism, meet reality! Score, reality 1 idealism nil".
Cheers,
Posted Nov 7, 2024 14:40 UTC (Thu)
by Karellen (subscriber, #67644)
Posted Nov 7, 2024 14:59 UTC (Thu)
by Wol (subscriber, #4433)
Cheers,
Posted Nov 7, 2024 17:32 UTC (Thu)
by excors (subscriber, #95769)
> We also have some non-goals:
...but I think those numbers are wrong - IBM's annual revenue is more like $60B, so the fee would be $600M per year. ($6B is IBM's recent annual net income, or their quarterly revenue from "software" alone (excluding "consulting", "infrastructure", etc))
And IBM isn't even a very big company by revenue (219th according to https://fortune.com/ranking/global500/ - the top companies are 10x higher), and it gets a higher proportion of its value from open source software than most companies, and it has reasonably decent net profit margins (~10%, compared to e.g Walmart's 2% which is typical for the grocery industry, meaning this fee would be half of Walmart's entire profits), so IBM is one of the better cases for this licence.
I think the non-goal is effectively excluding all companies in low-profit-margin industries, and most large-ish companies in high-profit-margin industries, and any small/medium company which hopes to either grow into a large-ish company or be acquired by one. So I find it hard to imagine _any_ company would ever agree to this. And without buy-in from companies totalling probably billions of dollars in revenue, there won't be enough money to fund the project.
Posted Nov 7, 2024 14:34 UTC (Thu)
by Karellen (subscriber, #67644)
This feels like an odd take. Are you familiar with the history of the Open Source Initiative and its Open Source Definition?
Posted Nov 7, 2024 16:16 UTC (Thu)
by excors (subscriber, #95769)
I have the impression that the wider Open Source community wouldn't fundamentally disagree with those problems he highlights, but the little discussion I've been able to find about his proposed solution has a lot of criticism and almost no support. Most people seem happy to bumble along with Open Source as it is now despite its problems, with perhaps a few incremental changes, while he's trying to shake things up, so it's unsurprising when that causes friction.
Posted Nov 7, 2024 17:27 UTC (Thu)
by pizza (subscriber, #46)
That's only a problem for copyleft "Free Software". It's nearly impossible to violate (therefore there is little need to "enforce") the "permissive" licenses that the "Open Source" movement embraced.
(Congratulations; once again, Stallman's predictions have been shown to be accurate....)
Posted Nov 2, 2024 21:44 UTC (Sat)
by Shamar (guest, #122602)
Not much. Several people were silenced, harassed or censored during the "co-design" process.
Julia Ferraioli paid a mental toll for trying to defend open source freedom from OSI's open washing goals:
But several other people felt the same and didn't dare to speak about it in public.
As for me, I was silenced for several weeks, and my posts were deleted because they debunked the OSI narrative:
And something they don't even mention is the role Meta employees had in excluding training data from the requirements: here's where they admit the trick adopted (discovered by a user who was later silenced too) https://discuss.opensource.org/t/we-heard-you-lets-focus-...
For sure, the OSI has made itself obsolete, but too few people are aware of the alternatives such as https://opensourcedefinition.org/
Posted Nov 3, 2024 14:36 UTC (Sun)
by kleptog (subscriber, #1183)
The relevant definition of open-source AI for the purposes of the Act is described within the Act itself (recitals 102-104). No, they don't require providing the training data. But more importantly, the exceptions for open-source AI are not available for any product/service placed on the EU market (Article 2(12)). So they are essentially unavailable for any commercial party like OpenAI or Meta. The idea that commercial parties could hijack the OSI process to secure themselves exemptions to the EU AI Act is just so far off the mark it's silly.
The exemptions also don't cover providing a summary of the training data and showing you complied with copyright restrictions. Which are probably the ones commercial companies are most interested in.
Posted Nov 3, 2024 5:07 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
The FSF is free to come up with its own definition and push it. Nobody stops them.
And the OSI made it clear that the current definition is not the only possibility. Just like we have various copyright licenses with various levels of restrictions (from the super-restrictive AGPLv3 to the WTFPL).
Posted Nov 3, 2024 7:22 UTC (Sun)
by NYKevin (subscriber, #129325)
Posted Nov 3, 2024 13:49 UTC (Sun)
by zack (subscriber, #7062)
Posted Nov 4, 2024 19:02 UTC (Mon)
by lmb (subscriber, #39048)
And yes, OSAID *is* better than nothing.
I disagree with the assessment that this means we should not voice criticism to the term, nor that doing so is harmful.
OSI chose to use a very comprehensive single term with zero differentiation and significantly lower standards. They *could* have done it differently. Same with publishing a definition as "1.0" that actively asks for industry endorsement. But tell me, how would you actually comply with it? Under what terms would you make all that extra info available?
They position themselves as an authority and *the* steward. Their results get evaluated according to those claims, and their actions questioned for potential ulterior motives.
Calling that "infighting" ain't great, when folks see serious possible consequences of what they're pushing out. (e.g., the impact on political regulations.)
Posted Nov 3, 2024 13:01 UTC (Sun)
by ballombe (subscriber, #9523)
Posted Nov 3, 2024 16:44 UTC (Sun)
by randomguy3 (subscriber, #71063)
Posted Nov 4, 2024 21:44 UTC (Mon)
by ballombe (subscriber, #9523)
Posted Nov 5, 2024 3:27 UTC (Tue)
by NYKevin (subscriber, #129325)
Which they are. You do not need to impugn people's motivations to argue against their interpretation of "open source." That is a pure ad hominem attack which has no place in polite discourse. Dismissing it as "hate" is an entirely reasonable and proportionate response.
Posted Nov 3, 2024 16:47 UTC (Sun)
by IanKelling (subscriber, #89418)
That is a friendly sounding statement which I find rather nasty: it groups all the critics together and suggests they have a fundamental deficiency that is enough to not take them very seriously. But on the other hand, any criticism without that deficiency, well, that is a disagreement on principles and you should not expect those to be resolved. The disagreements are important and this is not a good response.
> She wanted to know how the community could work together to move forward.
> Scott said that the simple answer was to keep talking
You had a q/a, and chose to impugn your critics in various ways and did zero addressing of the substance of their criticism. That kind of "moving forward" is the kind where you can call it moving forward without moving anywhere.
Posted Nov 3, 2024 20:01 UTC (Sun)
by NYKevin (subscriber, #129325)
Frankly, I am not able to read that statement in the way you seem to be reading it. I do not understand "disagreement [about] implementation" to imply "deficient disagreement," nor to imply that such disagreements should not be taken seriously.
What I do understand is the background context: A large group of commercial entities (commercial-ish, in the case of OpenAI) have gone around claiming that their AIs are "open" in one way or another, despite flagrant violations of OSD#6 and various other parts of the OSD (to the point that OpenAI's product is just straight proprietary software, with no attempt to justify the use of the word "open" in their name whatsoever). What the statement you quote appears to be saying is that everyone agrees that those products should not be described as "open-source AI" - and under OSAID 1.0, they are all excluded, which is an improvement over the situation where these companies were going unchallenged in their use of this term.
Posted Nov 3, 2024 20:20 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
This alone will make it easier to filter out models that are not at all open (e.g. "OpenAI" or Facebook's llama).
The next step is to have the "Open Training Set" definition.
Posted Nov 4, 2024 18:53 UTC (Mon)
by lmb (subscriber, #39048)
I think this is the real motivation behind the industry stakeholders pushing this, and I'm not the only one to notice.
If OSI, the "authority" and "steward" of the Open Source Definition declares something to be "Open Source AI", surely it is, dear regulator? We don't need to make our sources open, it says so. See? Those more lenient obligations apply to us!
It doesn't even actually specify those "OSI-approved terms" all of the assets/components that aren't actual source code should be made available under.
But it gives the industry another fancy and easy-to-comply-with marketable label to slap on their products.
Everyone wins, except the public.
We'd not call something Open Source if it came with a description of the sources and where to go and (maybe) buy access.
I don't hate it, I get why OSI does it (it serves their stakeholders), I'm just hugely disappointed.
The Software Freedom Conservancy, however, does seem to have its act and vision together.
Posted Nov 5, 2024 12:18 UTC (Tue)
by zack (subscriber, #7062)
[Link] (3 responses)
Disclosure: I co-authored SFC's aspirational statement on LLM-assisted programming. As such, I am very near to SFC's position in this general space.
But note that, regarding training data in OSAID, what SFC actually says is "I [bkuhn] truly don't know for sure (yet) if the only way to respect user rights in an LLM-backed generative AI system is to only use training sets that are publicly available and licensed under Free Software licenses. [...] My instincts, after 25 years as a software rights philosopher, lead me to believe that it will take at least a decade for our best minds to find a reasonable answer on where the bright line is of acceptable behavior with regard to these AI systems." And he is spot on.
The point that I'd like to highlight here is that once you start looking at the details (legal, strategic, philosophical, etc.), the data issue in AI/ML is quite complicated. Trying to simplify it down to require-data=good, do-not-require-data=bad is not going to serve us well in the long term.
Posted Nov 5, 2024 12:34 UTC (Tue)
by lmb (subscriber, #39048)
[Link] (2 responses)
There can be good reasons for data not to be made public or released openly. Medical, safety, personal privacy - and yet the AI/ML systems trained on them serve vital functions for society. Those may only be visible to officially appointed & chartered inspectors, for example.
Even the FSF acknowledges this - nonfree systems can still be just.
No, open/free/public data sets are not the only way to respect user rights.
Where the OSAID falls short in my opinion is insisting that those systems are "Open Source AI" - not everything can be open, and that's fine. (Or they could, indeed, say they're Open(tm) _only_ if they fall under such exempt regulations and are indeed independently reviewable.)
The *default* for an "Open Source AI" system should, in my book, indeed be open data. (And it'd be helpful if OSAID specified terms one could comply with.)
Again, my gripe is OSI going with such an all-encompassing term and indeed claiming their definition covers all components comprehensively - from a very prominent position with a lot of power and influence.
I think their OSAID 1.0 should have been more nuanced & differentiated and mostly stick to the parts we do understand reasonably well. And deliver something we can actually implement in practice. This, to me, reeks of preempting regulatory decisions, and/or marketing reasons.
I don't want in-fighting while the absolute exploitationists rejoice, either. But the OSI started the overreach with a reductionist definition, and also claims the high ground of authority - they get judged accordingly. They don't get free cheerleading.
The fact that so many assumed-to-be-well-meaning people see this as a potential erosion and open-washing clearly shows they've not produced something that is clear enough, if that truly isn't their intent.
(I also know you can't ever produce anything that is 100% proof against misinterpretation by malicious actors, but the folks whom I've seen voice criticism don't tend to fall into that camp.)
Posted Nov 5, 2024 13:38 UTC (Tue)
by paulj (subscriber, #341)
[Link]
In practical terms, for such LLMs, I think it will be required to anonymise any sensitive data.
It could be there are other kinds of AI models that can not, of themselves, directly leak the input data. E.g., a model arranged and trained to classify, say, diseases. You chat to it with your symptoms, perhaps, and it spits out a disease, and only a disease. If the output layer can only select from a set of output symbols that is distinct from, and much more limited than, the training data, you could argue it is "safe" to distribute that model.
However, still, the training data is encoded into the parameters. The model state is a compressed form of the input. For sensitive input data, you would /not/ want to bet that it is safe to distribute the parameters, just because they end up selecting from a predetermined and limited output layer. You would /not/ want to bet against some clever AI-hacker eventually figuring out how to reverse-engineer some (or more) of the training data from the parameters.
Posted Nov 5, 2024 13:51 UTC (Tue)
by zack (subscriber, #7062)
[Link]
I've also participated, as a volunteer, in the OSAID process. And once it became clear that OSI was opposed to mandating training data access, I've "battled" (for lack of a better term) for either a two-term definition (e.g., "open weight" vs "open source") or an additional qualifier (e.g., level 1/2/3/4), depending on whether training data were open data/public data/obtainable data/private data. I regret having lost that battle too. (But I still think that OSAID is better than nothing, given the current state of the AI industry, and that it will play a positive role in upcoming regulations.)
Missing the target
on commodity hardware. These are the ones that are important for the true purpose of open source AI: to allow people to understand and demystify how models are built.
Do they even see themselves how utterly ridiculous they are?
Actively excluding voices all over, from FOSS experts to former board members; it didn't matter, dissenting voices are not to be taken fully seriously by the OSI.
So I thought I'd google to figure out what you mean, and ended up with an article like this one, which is totally useless. It does link to the license, but is basically an entire article full of quotes from people I don't know who disagree with each other, without even bothering to quote or reference the actual parts of the license that are a problem, so how can I make up my mind? On the other hand, the OSI has an article which explains it well enough. It seems to be targeted at a very specific use-case.
>
> Conditioned on compliance with section 4, and subject to the limitations of section 3.2, Licensor grants You the world-wide, royalty-free, non-exclusive permission to:
>
> a) Take any action with the Work that would infringe the non-patent intellectual property laws of any jurisdiction to which You are subject; and
>
> b) Take any action with the Work that would infringe any patent claims that Licensor can license or becomes able to license, to the extent that those claims are embodied in the Work as distributed by Licensor.
>
> 3.2. Limitations on Permissions Granted
> The following limitations apply to the permissions granted in section 3.1:
>
> a) Licensor does not grant any patent license for claims that are only infringed due to modification of the Work as provided by Licensor, or the combination of the Work as provided by Licensor, directly or indirectly, with any other component, including other software or hardware.
>
> b) Licensor does not grant any license to the trademarks, service marks, or logos of Licensor, except to the extent necessary to comply with the attribution conditions in section 4.1 of this License.
>
> [...]
>
> 4.2. Maintain User Autonomy
> In addition to providing each Recipient the opportunity to have Access to the Source Code, You cannot use the permissions given under this License to interfere with a Recipient’s ability to fully use an independent copy of the Work generated from the Source Code You provide with the Recipient’s own User Data.
>
> “User Data” means any data that is an input to or an output from the Work, where the presence of the data is necessary for substantially identical use of the Work in an equivalent context chosen by the Recipient, and where the Recipient has an existing ownership interest, an existing right to possess, or where the data has been generated by, for, or has been assigned to the Recipient.
>
> 4.2.1. No Withholding User Data
> Throughout any period in which You exercise any of the permissions granted to You under this License, You must also provide to any Recipient to whom you provide services via the Work, a no-charge copy, provided in a commonly used electronic form, of the Recipient’s User Data in your possession, to the extent that such User Data is available to You for use in conjunction with the Work.
* If you allow an end user to interact with CAL-licensed software in some way that generates data, you must allow the end user to obtain a copy of their data, in a format that can be directly used by the software, and you must not modify the software in such a way that the user can't actually run it on their own private copy of their data.
Wol
I'm not sure I'd give Bruce Perens' idealism a score of 'nil'. Pretty sure he's put a few points on the board over the years (decades, even).
Wol
> * Don’t worry about when or if today’s giant companies will join Post Open. The answer to "When will IBM come on board" may well be "never", because for such a large company, participation would mean a new USD$60 Million dollar per year fee (1% of USD$6 Billion). Instead, attract small and new companies with a free license, and grow them into paid license customers.
(https://postopen.org/how-post-open-works/)
And I guess it's unsurprising [Bruce Perens] fell out with the OSI when his ideas are so radically different to Open Source.
https://archive.is/paD1W
see for example my reply to Carlo Piana (OSI board member) https://archive.is/JPSRX#post_3 that was removed from the forum: https://discuss.opensource.org/t/fsf-announced-basics-of-...
Don't get all the hate
Disagreeing is not hate. Claiming otherwise is just a way to stifle debate.
This is an unhelpful response from OSI
>
> That is a friendly sounding statement which I find rather nasty: it groups all the critics together and suggests they have a fundamental deficiency that is enough to not take them very seriously.
Open Source AI is open-washing by any way of looking at it
