GitHub is my copilot
Copilot is a machine-learning system that generates code. Given the beginning of a function or data-structure definition, it attempts to fill in the rest; it can also work from a comment describing the desired functionality. If one believes the testimonials on the Copilot site, it can do a miraculous job of figuring out the developer's intent and providing the needed code. It promises to take some of the grunge work out of development and increase developer productivity. Of course, it can happily generate security vulnerabilities; it also uploads the code you're working on and remembers if you took its suggestions, but that's the world we've built for ourselves.
Machine-learning systems, of course, must be trained on large amounts of data. Happily for GitHub, it just happens to be sitting on a massive pile of code, most of which is under free-software licenses. So the company duly used the code in the publicly available repositories it hosts to train this model; evidently private repositories were not used for this purpose. For now, the result is available as a restricted beta offering; the company plans to turn it into a commercial product going forward.
Copy-and-paste
Looked at one way, GitHub Copilot is the embodiment of a number of aspects of software development that are, perhaps, not fully covered in school:
- Much of a software developer's time is spent cranking out boilerplate code that looks much like a lot of other boilerplate code in circulation. Unsurprisingly, developers do not find being freed of this work to be a distasteful prospect.
- An awful lot of software development is actually done by copying and pasting code. It's tempting to say that this is especially true of contemporary developers, but development worked this way even before the days of Stack Overflow.
- While we like to think that the code we write is original, we are all strongly influenced by code we have seen in the past. Developers who have read a lot of code tend to have many useful patterns at their fingertips.
If much of our work really comes down to copying and pasting at varying degrees of remove, perhaps it makes sense to get the computer to do that work for us when it can.
The use of free software to train Copilot has raised some interesting questions, though. If a machine-learning model has been trained on a particular body of code, is that model a derived work of that code? If so, since GPL-licensed code was used to train the model, the result would also come under the terms of the GPL. If that were true, it would not change much, since GitHub does not appear to have any interest in distributing its model.
But what about the code that Copilot spits out? Is that code, too, a derived work of the code used to train the model? The fact that Copilot occasionally regurgitates verbatim copies of the training code (0.1% of the time, according to the Copilot FAQ) tends to support those who believe that Copilot's output should be seen as a derived work. If this is true, then any code body using Copilot output is in the same situation, which would be a bit of a mess, since it will be derived from multiple bodies of code with conflicting licenses and an endless list of attribution requirements. The derived-work interpretation would make any code developed with Copilot's help entirely undistributable.
The best outcome is unclear
Your editor is not a lawyer and certainly does not wish to play one on the net. That said, there are arguments to be made to the effect that Copilot's output should not be seen as a derived work of the code used for training. Certainly GitHub sees it that way; the Copilot FAQ states: "Training machine learning models on publicly available data is considered fair use across the machine learning community". How closely that consideration matches actual copyright law is not entirely clear, but it is an ongoing precedent and practice.
More intuitively, one can easily compare Copilot with a seasoned software developer who has seen a lot of code over a long career. The code that developer writes today will surely be influenced by what they have seen in the past, but today's code is not generally seen as being a derived work of yesterday's reading. One could argue that Copilot is doing the same thing; the only difference is that, since it's a computer, it can read vast amounts of code — even PHP code — without going insane.
Former European parliamentarian Julia Reda makes the argument that the code snippets produced by Copilot are not large or complex enough to be considered original, copyrightable works. One might well wonder how much better Copilot has to get before that line will be crossed, but she also claims that "the output of a machine simply does not qualify for copyright protection – it is in the public domain". This argument, if taken to its extreme, suggests that copyrighted work could be put into the public domain by running it through a photocopier. These arguments may hold for now, but it's not clear that they are tenable in the long term.
More interestingly, Reda, along with Matthew Garrett, argues that a derived-work interpretation is not in the interests of the free-software community in any case. Copyleft, they say, is a response to overly strong copyright protection for code, not a reason to make it stronger. As Garrett put it:
The powers that the GPL uses to enforce sharing of code are used by the authors of proprietary software to reduce that sharing. They attempt to forbid us from examining their code to determine how it works - they argue that anyone who does so is tainted, unable to contribute similar code to free software projects in case they produce a derived work of the original. Broadly speaking, the further the definition of a derived work reaches, the greater the power of proprietary software authors.
On the other hand, he continues, systems like Copilot offer the prospect of training models with proprietary code and using the result without worries of being tainted. That, he says, is likely to be a positive outcome for the free-software community.
It seems reasonable to assume that Copilot is not the only machine-learning-based code-synthesis system out there; it is also plausible that these systems will become more capable over time. The copyright issues raised by Copilot seem to be concentrated on free software for now, but they may well expand beyond that realm in the future. What happens now, though, will set precedents for that future; if the free-software community somehow shuts down Copilot over copyright issues, other interests will have a stronger argument for strengthened copyright laws applied to future systems. That power could be used to extend the reach of proprietary software or shut down machine-learning systems that are beneficial to the community. We should, thus, be careful about what we wish for, lest we actually get it.
Posted Jul 15, 2021 14:28 UTC (Thu)
by dskoll (subscriber, #1630)
Given the legal uncertainty around Copilot, as well as my gut feeling that it's simply a really terrible idea, I've removed all my GitHub repos and moved everything to a self-hosted Gitea instance. I've put stub repos up on GitHub that point to the real repos because (unfortunately) GitHub has become pretty important for discovering software projects. I'll miss certain GitHub features, especially the CI/CD framework, but I've cobbled together a workable replacement for that with self-hosted Buildbot.
Posted Jul 15, 2021 14:40 UTC (Thu)
by musicmatze (guest, #133336)
Posted Jul 15, 2021 14:42 UTC (Thu)
by rahulsundaram (subscriber, #21946)
Posted Jul 15, 2021 15:14 UTC (Thu)
by bluca (subscriber, #118303)
In the EU, text and data mining of public data is legal, and this is pretty much unambiguous, whether the data is hosted on GitHub, GitLab, your own server in your basement that definitely won't get hacked and used to mount supply-chain attacks on your users, a mailing list, or any other forge. (The copyright status of the model and of its output is not 100% clear yet, on the other hand, but Julia Reda makes excellent arguments as always.)
Posted Jul 15, 2021 16:38 UTC (Thu)
by dskoll (subscriber, #1630)
I somehow doubt GitHub would scrape repos they don't host. And if they do, I can block them.
Posted Jul 15, 2021 18:04 UTC (Thu)
by NYKevin (subscriber, #129325)
Not if it's open source (or free software) you can't. Anyone can redistribute any FOSS code to anyone else, as long as they preserve license terms. If (as GitHub argues) feeding code into an ML model does not infringe copyright, then there is no legal mechanism to prevent GitHub from acquiring whatever FOSS code they want, by whatever means they want, and feeding it into the model. The only way around this is to move to proprietary licensing with a "thou shalt not redistribute to GitHub" term. But I doubt you actually want to do that.
Posted Jul 15, 2021 20:06 UTC (Thu)
by ejr (subscriber, #51652)
Posted Jul 15, 2021 20:36 UTC (Thu)
by rahulsundaram (subscriber, #21946)
The point is that they don't have to source it from your server directly. It is open source, nothing is stopping them from using a proxy to clone it.
Posted Jul 15, 2021 20:39 UTC (Thu)
by ejr (subscriber, #51652)
That "copilot" is bringing twenty-year-old security flaws back is another aspect. ML is only as good as its training data.
Posted Jul 15, 2021 20:55 UTC (Thu)
by NYKevin (subscriber, #129325)
Is your code in Debian, or any other distro? The Internet Archive? What about the randoms on /r/datahoarders, do you think any of them made a copy?
There's no such thing as "It's public except for person X." If you put it on the internet, then anyone who wants to see it can probably get a copy by some means or another. Ordinarily, you would use copyright law to combat such unauthorized copying, but FOSS licenses are explicitly designed to allow and even encourage it.
Posted Jul 16, 2021 1:09 UTC (Fri)
by dskoll (subscriber, #1630)
That's true; I can't prevent GitHub from slurping my code into its ML system. But I can express my disagreement with it, and I can stop using GitHub to ever so slightly reduce the network effect that makes it attractive in the first place.
Posted Jul 19, 2021 21:53 UTC (Mon)
by jbicha (subscriber, #75043)
Posted Jul 16, 2021 15:19 UTC (Fri)
by marcH (subscriber, #57642)
Yes there is: version 3 of the GPL.
</troll>
Posted Jul 17, 2021 17:46 UTC (Sat)
by ju3Ceemi (subscriber, #102464)
Depending on the licence etc, that may be illegal
Posted Jul 22, 2021 11:29 UTC (Thu)
by kpfleming (subscriber, #23250)
Posted Jul 15, 2021 15:45 UTC (Thu)
by MattBBaker (subscriber, #28651)
Oh great, HAL9000 read reams of PHP code and that is what turned him insane.
Posted Jul 16, 2021 15:12 UTC (Fri)
by marcH (subscriber, #57642)
They had to filter C macros though. There are limits.
Posted Jul 17, 2021 20:12 UTC (Sat)
by klbrun (subscriber, #45083)
Posted Jul 15, 2021 15:48 UTC (Thu)
by zblaxell (subscriber, #26385)
Do we need ERB approval to experiment on robots? Asking for a friend. ;)
Posted Jul 15, 2021 15:51 UTC (Thu)
by IanKelling (subscriber, #89418)
Posted Jul 16, 2021 5:27 UTC (Fri)
by pabs (subscriber, #43278)
Posted Jul 26, 2021 13:23 UTC (Mon)
by immibis (guest, #105511)
Posted Jul 27, 2021 1:41 UTC (Tue)
by pabs (subscriber, #43278)
Posted Jul 15, 2021 16:25 UTC (Thu)
by JoeBuck (subscriber, #2330)
Microsoft owns GitHub. Will they be willing to retrain Copilot, using large portions of the proprietary Microsoft code base as training data? That should help developers produce better Windows-compatible software, right? And if it's fair use to use any GPL chunks of software that might be emitted verbatim, shouldn't it be the same for their own code? And will they then specifically disclaim any proprietary interest in the output? They could lead by example if that's what they want to do.
Posted Jul 16, 2021 9:12 UTC (Fri)
by gdt (subscriber, #6284)
Posted Jul 16, 2021 12:47 UTC (Fri)
by dskoll (subscriber, #1630)
Copyright, maybe, but I don't see how they could indemnify against patent infringement. If you infringe on a patent, it doesn't matter where the infringing code came from... it only matters that you're using it.
Posted Jul 17, 2021 19:35 UTC (Sat)
by gfernandes (subscriber, #119910)
Copilot isn't going to write the next SAP competitor for you. Or the next DNA sequence breakthrough. It's going to help you build your next Big Thing out of fundamental Lego blocks.
It's unlikely that such Lego blocks can be patented.
Posted Jul 18, 2021 13:40 UTC (Sun)
by dskoll (subscriber, #1630)
You mean like this patent?
My point is there's no way GitHub would indemnify users of Copilot code against patent infringement, because that's an expensive minefield. It would also encourage patent trolls... would you prefer to claim patent infringement damages against TinyStartup, or Microsoft?
Posted Jul 19, 2021 6:09 UTC (Mon)
by gfernandes (subscriber, #119910)
Posted Jul 19, 2021 16:23 UTC (Mon)
by dskoll (subscriber, #1630)
Again, you are missing the point. Whether or not a patent will stand, there is no way Microsoft would even entertain the potential liability of offering indemnification against patent infringement for Copilot output.
Whether or not a patent is garbage is not relevant in Microsoft's calculation. The only relevant number is the potential financial harm Microsoft could suffer by offering indemnity.
Posted Jul 19, 2021 6:11 UTC (Mon)
by gfernandes (subscriber, #119910)
And claiming patent damages against Microsoft is inviting the butcher to kill the goose that lays the golden eggs.
Take your pick.
Posted Jul 19, 2021 8:55 UTC (Mon)
by sandsmark (guest, #62172)
Our tiny startup received a letter from a patent troll a couple of years ago, which at least felt kind of validating for me. :-)
Unfortunately for them, they apparently also sent letters to Amazon (there is some overlap between Amazon's hardware stuff and ours), who swiftly crushed them in court and got the relevant patents invalidated.
So, one piece of free advice for patent trolls: don't send your "kind" snail-mail letters to huge American companies before tiny Norwegian startups. You might get crushed before the letter reaches Norway.
Posted Jul 19, 2021 9:22 UTC (Mon)
by anselm (subscriber, #2796)
The usual strategy if you're a patent troll is to go after some small companies first, because they can't afford a protracted legal battle and are likely to cave or settle quickly. You won't get a lot of money out of them but these wins give you street cred to go after the bigger fish later.
Perhaps in your case the patent troll didn't realise that snail mail from the US (I presume) to Norway takes a while to arrive, and assumed the business with you would be done and dusted before they'd call out Amazon? Just a thought.
Posted Jul 21, 2021 14:15 UTC (Wed)
by dskoll (subscriber, #1630)
Suing TinyStartup is hard work for pretty much no return. Suing Microsoft is like buying a lottery ticket. You're very, very likely to lose, but if you win, it'll be fantastic. Also, most patent infringement defendants end up settling, and MSFT's threshold for settling is higher than a smaller company's would be. Unless MSFT sees the patent or the troll as an existential threat, it'll probably make the correct business decision and just pay to make the problem go away.
Posted Jul 22, 2021 4:42 UTC (Thu)
by rgmoore (✭ supporter ✭, #75)
It's not obvious that paying off patent trolls is the right business move for a big company like Microsoft. It would probably be the right move if there were only one patent troll out there, but if there are a lot of them (and there are), they'll wind up paying again and again. It might be cheaper in the long run to spend the money on lawyers and make an example of the first few who try it. Sending the message that suing your company will just result in massive legal bills and an invalidated patent should discourage other trolls who are thinking about suing you, resulting in less spending overall.
Posted Jul 19, 2021 8:47 UTC (Mon)
by sandsmark (guest, #62172)
I'm hungover and not in the mood to read patents, but something that most people seem to miss (and that is fairly important with patents) is that the abstract is completely irrelevant.
What matters are the claims, and most importantly the independent claims (those that don't reference other claims). An independent claim only applies if you practice every one of its elements; if you don't infringe an independent claim, you can't infringe any of the dependent claims built on it either.
So while I haven't read that patent, the patents people point to as absurd usually turn out to be useless (except for generating media hype and a stock-price bump for whoever got them).
Posted Jul 19, 2021 16:24 UTC (Mon)
by dskoll (subscriber, #1630)
I did read the patent. But again, even garbage patents can be expensive to fight and invalidate. I doubt Microsoft has the appetite for taking on that potential liability by offering indemnification against patent infringement for Copilot output.
Posted Jul 18, 2021 23:38 UTC (Sun)
by khim (subscriber, #9252)
It's funny how everyone immediately started talking about copyrights and patents while forgetting the elephant in the room: NDAs. One may argue whether 3 lines of code copied verbatim constitute a copyright violation or not, and whether 10 lines of code may be complicated enough to be a patent violation. But even a single line of code accidentally copied from input to output can reveal something which Microsoft (or any other company) considers a secret. Public repositories (on GitHub or elsewhere) don't have that problem because they are, you know, public. If something is already on GitHub (and stored in The Arctic Code Vault in the hope that it would be discoverable there 500 years from now), then it's generally assumed that the cat is out of the bag, and even if some NDA was actually violated, it's too late to demand that the secret stay secret.
Posted Jul 29, 2021 19:23 UTC (Thu)
by mrugiero (guest, #153040)
Posted Jul 29, 2021 19:18 UTC (Thu)
by mrugiero (guest, #153040)
Posted Jul 15, 2021 17:16 UTC (Thu)
by bkw1a (subscriber, #4101)
Posted Jul 15, 2021 18:25 UTC (Thu)
by rgmoore (✭ supporter ✭, #75)
There are already automatic plagiarism detection programs. It would be easy enough to run one on the output of the assistant to catch any plagiarism. You could even build plagiarism detection into the assistant so it would never spit out something that was deemed to be plagiarism in the first place. The plagiarism detector would have access to the training library for the assistant, which is the logical library to use when looking for plagiarism in its output.
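A minimal sketch of the kind of check described above, under loudly stated assumptions: the helper names (`tokenize`, `build_index`, `overlap_ratio`, `looks_copied`), the regex tokenizer, the 5-token window, and the 0.8 threshold are all hypothetical choices made for illustration, not anything GitHub has described. The idea is to flag generated code when most of its contiguous token runs (k-grams) also occur somewhere in the training corpus. Real plagiarism detectors such as MOSS use more robust fingerprinting (winnowing) and normalize identifiers, so this should be read as an illustration of the idea rather than a usable filter.

```python
import re

# Hypothetical tokenizer: identifiers, integers, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


def tokenize(code: str) -> list[str]:
    """Split source code into a flat token list, ignoring whitespace."""
    return TOKEN_RE.findall(code)


def kgrams(tokens: list[str], k: int) -> set[tuple[str, ...]]:
    """Return every contiguous run of k tokens (empty if fewer than k tokens)."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def build_index(corpus: list[str], k: int = 5) -> set[tuple[str, ...]]:
    """Collect every k-gram that occurs anywhere in the training corpus."""
    index: set[tuple[str, ...]] = set()
    for doc in corpus:
        index |= kgrams(tokenize(doc), k)
    return index


def overlap_ratio(snippet: str, index: set, k: int = 5) -> float:
    """Fraction of the snippet's k-grams that also appear in the corpus index."""
    grams = kgrams(tokenize(snippet), k)
    if not grams:
        return 0.0
    return len(grams & index) / len(grams)


def looks_copied(snippet: str, index: set, k: int = 5, threshold: float = 0.8) -> bool:
    """Flag the snippet if most of its k-grams were seen in training data."""
    return overlap_ratio(snippet, index, k) >= threshold


# Toy "training corpus" with a single document.
corpus = ["def add(a, b):\n    return a + b\n"]
index = build_index(corpus)
print(looks_copied("def add(a, b):\n    return a + b", index))  # True
print(looks_copied("def mul(x, y):\n    return x * y", index))  # False
```

Note that renaming identifiers is enough to slip past this naive version, which is one reason real detectors normalize tokens before fingerprinting.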
Posted Jul 16, 2021 2:25 UTC (Fri)
by developer122 (guest, #152928)
Normally, the end state after they've converged is one where the discriminator can no longer tell the difference, because the generator creates code that is so realistic.
Here we would have to invert things, with the discriminator driving the generator away from copyrighted training code. In this case, the generator's easiest path forward is to produce varying degrees of random spew that could never be considered copyrighted code.
One thing that's missed in all of this: there's zero evidence that *any* neural network understands the high-level structure of the code (or story text) that it's reading. It's only *assumed.* Meaning, there's absolutely no guarantee that the code it generates is any good. It will only be trying to pass off some randomly mixed combination of the training material to try and extend a pattern.
Until we can mechanically produce an objective score for code quality (fuzzing?), it's impossible to guide a neural network towards it.
Posted Jul 18, 2021 0:38 UTC (Sun)
by NYKevin (subscriber, #129325)
There's zero evidence that a human understands the code they write, either. Yes, you can ask the human questions about it and judge their responses, but if we can't train a chatterbot to answer such questions now, it will likely be possible within the next few years (compare and contrast GPT-3), at which point everyone will decide that "answering simple questions about code" no longer qualifies as "evidence." Unlike, say, a news article (for which GPT-3 still struggles to distinguish between fantasy and reality), all of the relevant information is contained within the source code itself, so there's no external reality which it has to know about or understand (aside from straightforward vocabulary issues such as "what do humans call this design pattern?"). Therefore, it's much less difficult, and may already be possible with existing text generation systems.
This is the same process that chess went through, of course. You can't ask Stockfish or AlphaZero "Why is this a good/bad move?" except indirectly, by asking "What is White/Black's best response to this move, and Black/White's reply, and so on?" (at which point, it will happily show you how one side wins the other's queen in some convoluted 15+ move line that you could never have found on your own). But nobody would seriously argue that humans have a deeper understanding of chess than those engines merely because the engines are unable to verbalize their reasoning in simple terms. On the other hand, when an engine had just barely defeated Kasparov in the 90s, everyone abruptly decided that computers excelling at chess was no longer a sign of intelligence.
TL;DR: "Real" AI just means "anything that AI can't do yet."
Posted Jul 18, 2021 5:46 UTC (Sun)
by gfernandes (subscriber, #119910)
That, by the way, was Deep Blue.
I doubt anyone could describe Kasparov in quite the same way!
Posted Jul 18, 2021 6:16 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
Deep Blue used a "classic" algorithm, with simple recursive search and a fine-tuned weight function. We can understand how it works.
Modern neural-net chess programs beat classic algorithms. And it's not even close. They work exactly like the human brain, by recognizing patterns.
Posted Jul 18, 2021 23:50 UTC (Sun)
by khim (subscriber, #9252)
Actually, Stockfish only lost to Leela Chess Zero once. Otherwise it keeps rank #1 pretty robustly. Of course, the fact that a relatively simple (and resource-hungry, since a modern CPU is not well designed to support neural networks) pattern-recognizing program beats literally everything else, and that only the top engine, based on literally everything humanity has discovered about chess over hundreds of years, can hold its own, is still amazing.
Posted Jul 19, 2021 1:31 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
Though to be fair it didn't participate in regular computer chess tournaments and Stockfish got better.
Posted Jul 19, 2021 9:00 UTC (Mon)
by sandsmark (guest, #62172)
Hasn't Stockfish merged a neural net evaluator? Or was that after the tournament?
Posted Jul 19, 2021 9:37 UTC (Mon)
by khim (subscriber, #9252)
It regained the crown in the 18th season and got neural networks in the 19th. I think the “neural network revolution” is similar to the “demise of assembler” at the end of the last century. Hand-written assembler was still better than what high-level languages produced, but development time was so drastically different that it was impossible for assembler developers to deliver anything fast enough for it to be competitive. Stockfish uses ƎUИИ to deal with some fringe cases where they just don't have time to fine-tune the algorithm.
The fate of computer chess (and the world, arguably) depends on whether chip developers will be able to develop massive 3D chips (with thousands and later, maybe even millions, of layers… Moore's law turned 90 degrees, in a sense). For now this is only used for flash (but memories always adopt new technologies first because they need a lot of transistors but have a very simple structure), but if active components follow, it would mean the demise of the modern computing paradigm and the rise of neural networks. This is because of power consumption: you couldn't put a million cores into one chip while keeping them in the gigahertz range; the whole thing would consume so much power that it would be impossible to cool (even if you found a way to supply all that power). Modern programming techniques couldn't work on trillions of 1MHz cores, but neural networks can. Whether this would be enough to create strong AI or not… nobody knows.
Posted Jul 19, 2021 0:18 UTC (Mon)
by NYKevin (subscriber, #129325)
I am not sure I agree with that.
A classical chess engine is essentially made up of three parts:
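1. An opening book that describes standard lines in opening theory.
2. A search (minimax with pruning) that looks ahead through possible lines and scores the resulting positions with a heuristic evaluation function.
3. An endgame tablebase that gives the exact result of positions with only a few pieces left.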
High-level chess players will absolutely memorize the same information as is present in an opening book, although perhaps not to the same depth as the engine does. Similarly, human players do imagine future lines and evaluate their endpoints based on heuristics, using a process that is conceptually similar to minimax with aggressive pruning. Finally, the best human players spend a lot of time learning their endgames. They don't memorize an entire tablebase, of course, but they learn the patterns, and so this can be characterized as a particularly smart compression algorithm (i.e. I don't need to memorize hundreds of minor translations or rotations of the same basic mating pattern).
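The search part of a classical engine, which the comment above likens to human lookahead, is usually some variant of minimax with alpha-beta pruning. The sketch below is a generic textbook version over a toy game tree, not code from Stockfish or any other engine; the tree shape, its leaf scores, and the `evaluated` bookkeeping list are hypothetical. Inner nodes are lists of child positions, and leaves are heuristic scores from the maximizing player's point of view.

```python
import math
from typing import List, Union

# A position is either a leaf score (int) or a list of child positions.
Tree = Union[int, List["Tree"]]


def alphabeta(node: Tree, maximizing: bool,
              alpha: float = -math.inf, beta: float = math.inf,
              evaluated=None):
    """Minimax value of `node`, skipping branches that cannot change the result.

    `evaluated` (optional) records which leaves were actually scored, to make
    the pruning visible.
    """
    if evaluated is None:
        evaluated = []
    if isinstance(node, int):          # leaf: heuristic evaluation
        evaluated.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, evaluated))
            alpha = max(alpha, value)
            if alpha >= beta:          # the minimizing side would avoid this line
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, evaluated))
        beta = min(beta, value)
        if beta <= alpha:              # the maximizing side already has better
            break
    return value


# Root is the maximizing side's move; each child is the opponent's reply set.
tree = [[3, 5], [2, 9], [0, 1]]
seen = []
print(alphabeta(tree, True, evaluated=seen))  # 3
print(seen)  # [3, 5, 2, 0]: leaves 9 and 1 were never evaluated
```

Once the first reply set guarantees a score of 3, any line where the opponent can force 2 or 0 is abandoned without looking further, which is the "aggressive pruning" a strong human also applies by intuition.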
Posted Jul 19, 2021 6:14 UTC (Mon)
by gfernandes (subscriber, #119910)
Posted Jul 15, 2021 18:46 UTC (Thu)
by Lennie (subscriber, #49641)
https://towardsdatascience.com/the-most-important-supreme...
Posted Jul 15, 2021 18:56 UTC (Thu)
by floppus (guest, #137245)
Posted Jul 16, 2021 5:29 UTC (Fri)
by pabs (subscriber, #43278)
Posted Jul 15, 2021 18:56 UTC (Thu)
by Paf (subscriber, #91811)
Bring on the machines, I suppose. Can’t stop them anyway.
Posted Jul 15, 2021 22:55 UTC (Thu)
by pctammela (guest, #126687)
Until then, the model is as good as the code it consumes.
Posted Jul 16, 2021 0:09 UTC (Fri)
by excors (subscriber, #95769)
That sounds like an important point to me. In particular, I don't see how it will get any better than the code it consumes.
The article identifies the valid problem that a lot of programming is "cranking out boilerplate code" and "copying and pasting code". The old-fashioned solution is to write libraries that can hide the boilerplate and frequently-copied-and-pasted code behind a well-designed easy-to-use API that can be reused by many projects. The library developer is likely to spend a relatively large amount of time thinking about their chosen area, and they provide a central point for collaborative testing and bug reporting and patching, which all helps to improve the quality of their code. The newer solution is to copy-and-paste from Stack Overflow, which has voting and comments and the ability to edit other people's posts, and it's far from ideal or consistent but at least those features are sometimes successful at letting the community improve the quality of answers to popular questions.
Copilot seems to lose all of those mechanisms that create a bias towards higher-quality code. Some of the code it consumes and produces will be good, some will be bad, but there's no way to tell the difference unless you're already an expert and/or spend a lot of time thinking about it (which is unlikely since you're using Copilot to save you from all that effort), and if you do spot any problems then there's no way to submit corrections to save other people from the same issues. Presumably most of the code generated by Copilot will be, on average, similar quality to the average input codebases; but then the natural state of any codebase is to fall in quality unless you actively fight against it, so the Copilot-generated projects will be worse than average, and then they'll be used as inputs for the next generation of Copilot, exacerbating the problem.
The old-fashioned solution of writing libraries is evidently not a good approach either, because it's often so painful to import libraries (especially if you're on Linux and the distros don't package it, or the API has changed and they packaged an old version, or if it's using an open source license that's incompatible with your project's, or if it's using a different build system, or if it has a bug you need to fix right now and can't wait for upstream, etc) and it's not worth bothering unless the library contains a very substantial amount of functionality that you'll use. There are good reasons why copy-and-paste is such a popular way to import small chunks of third-party code. But the solution should be to make it easier to put useful high-quality code into reusable libraries, and easier for people to find and use those libraries, not to encourage a form of copy-and-paste with even fewer quality controls.
Posted Jul 16, 2021 3:39 UTC (Fri)
by Paf (subscriber, #91811)
And I would say that (and the expressivity of the model itself) will be a huge determinant of how useful it - ever - is.
Posted Jul 16, 2021 8:44 UTC (Fri)
by geert (subscriber, #98403)
Posted Jul 16, 2021 4:42 UTC (Fri)
by interalia (subscriber, #26615)
I've never gotten the problem with Stack Overflow copying/pasting to solve low-level questions of "how do I use this function" and "what does this error mean and how do I fix it?". Even changing the culture to have more bits of reusable code seems unlikely to help. If we create millions more Lego bricks, people will still ask how to use the bricks they chose to create their version of a Millennium Falcon.
Posted Jul 16, 2021 8:25 UTC (Fri)
by bluca (subscriber, #118303)
Posted Jul 17, 2021 3:16 UTC (Sat)
by Hattifnattar (subscriber, #93737)
Posted Jul 30, 2021 1:04 UTC (Fri)
by mrugiero (guest, #153040)
Posted Jul 15, 2021 23:56 UTC (Thu)
by proski (subscriber, #104)
Posted Jul 16, 2021 1:16 UTC (Fri)
by mathstuf (subscriber, #69389)
If you're doing that, I'd dare to say you've got a driver no one is interested in using because you're going to have to look at some code or documentation (that probably contains some code) at some point.
Anyways, `EXPORT_SYMBOL_GPL` isn't about "you saw the code, now you're tainted". It's about "if you use this symbol, it is our opinion that you are relying on Linux so much that you must be derivative". There's no need for anyone to have seen the Linux code for that to take effect.
Posted Jul 30, 2021 1:09 UTC (Fri)
by mrugiero (guest, #153040)
Posted Jul 16, 2021 13:41 UTC (Fri)
by HIGHGuY (subscriber, #62277)
Posted Jul 16, 2021 13:16 UTC (Fri)
by ldearquer (guest, #137451)
I think this whole argument that [reducing the notion of derived work is actually good for free software] is totally nuts.
Or maybe I misunderstood something.
Copyright is not what enables the concept of proprietary software, or not alone. For end users there are arguably more annoying and immediate aspects of proprietary software, which don't depend on copyright law: binary-only distribution, closed protocols, device lock ("tivoization"), etc.
Last week I had to fix my son's bed frame, bought about 10 years ago. I ended up screwing and bolting in ways not foreseen by the original designer, but hey, they didn't seem to think about children who actually jump and play on the bed (despite mum's disagreement).
Luckily the bed frame was not codified in binary opcodes.
Posted Jul 16, 2021 17:06 UTC (Fri)
by NYKevin (subscriber, #129325)
I would prefer to address this problem by passing comprehensive right-to-repair legislation, but I don't see that happening any time soon. The main advantage of legislation, of course, is that it is much harder to opt out of legislation than to avoid using GPLv3'd software.
Posted Jul 16, 2021 21:05 UTC (Fri)
by ldearquer (guest, #137451)
However, so far copyleft has been of much more help than legislation. If legislation is to fix this, then bring the legislation first, and render the copyleft unnecessary, but don't do it the other way around.
Note that copyright (copyleft) can't really prevent proprietary software per se. It just ensures that whatever free software is made remains free for future recipients. In theory, it should prevent someone in a position of power from using it to lock in users, but that requires the copyright + 'derived work' mechanism. Imagine MS releasing a version of Office made from LibreOffice + some closed-source importer/exporter for a proprietary docxx format. We would all effectively move to the new format, and all the effort put into having a free office suite would be used against that goal.
Even if this could sound ridiculous today, there you have Edge...
But back to the main point, I still fail to see how minimizing the 'derived work' concept would help free software. Definitely not for users.
Posted Jul 16, 2021 21:27 UTC (Fri)
by NYKevin (subscriber, #129325)
Regardless, this is all academic. The definition of "derivative work" is what it is. We can't change it now (except by legislation, but that's not happening).
* Side note: Microsoft can already do this anyway because Apache OpenOffice is under a permissive license that allows it. I'm not sure of the compatibility status of LibreOffice vs. OpenOffice, but it might not matter. You could first develop the plugin for OpenOffice and then, as a separate step, make minor compatibility adjustments as needed. This is probably fair use and might not even be subject to copyright protection in the first place because it would be purely functional (compare and contrast Oracle v. Google, Baker v. Selden, etc.).
Posted Jul 17, 2021 8:21 UTC (Sat)
by james (subscriber, #1325)
If the software industry (which very much includes Linux, these days) had coalesced around an understanding of roughly where the boundaries of "derivative work" were, the courts would not be bound by that, but would be likely to respect it in their deliberations: partly because they don't like upsetting whole industries if they don't have to, partly because they do consider existing legal thought (certainly from other jurisdictions or scholarly works), and partly because one of the functions of a judgment is to let everyone know why they decided what they did, so if they departed from existing consensus they'd be expected to explain why.
And we've worried a lot more in public about "derivative work" than the proprietary software industry (and had lawyers informing debates). Our opinions are likely to inform what the rest of the industry thinks.
I'm not going to comment about civil law jurisdictions here.
Posted Jul 17, 2021 8:46 UTC (Sat)
by james (subscriber, #1325)
Posted Jul 17, 2021 2:03 UTC (Sat)
by pabs (subscriber, #43278)
> Binary-only distribution, closed protocols, device lock ("tivoization"), etc.
The only way to eliminate those would be to essentially enshrine the four freedoms and some other things in legislation. I feel like the proprietary software vendors have the power to prevent that from happening.
Posted Jul 17, 2021 9:05 UTC (Sat)
by ldearquer (guest, #137451)
But does this require the four freedoms?
They seem to be discussed as a package, all or nothing, but it seems to me that, IIRC, the first two (having full control over what you get/pay for) are much more legit than the rights to redistribute.
Maybe I am just mixing in my personal feelings, but if I buy a car, I can understand the people who designed it telling me I shall not copy their design to build and give away similar cars, at least for a reasonable span of time.
But if they tell me that, well, the car is mine, but not really mine, kind of renting, but you pay for maintenance, and sorry you can't open the hood, or replace a spark plug or service it without their approval...
Posted Jul 17, 2021 23:58 UTC (Sat)
by pabs (subscriber, #43278)
Imagine the entertainment system in a common type of car has a bug in it that manifests after the warranty period ends and the bug is really annoying, only manifests rarely, manifests only while driving but doesn't brick the system. To be able to efficiently debug the system you need to be able to run it elsewhere under simulation so you can pretend you are driving (freedom 0), since debugging from the back seat while driving around is likely quite hard. You need to be able to review the source code too (freedom 1) to figure out what is going wrong. Since the owner doesn't know programming, they need to enlist the local mechanic-programmer, distribute (freedom 2) the source code and a log of the data streams while the bug happens to them, have them modify the code (freedom 3) to add the fix and share (freedom 2 again) the fix back to the owner and to the fork maintained by the world-wide mechanic-programmer association.
Posted Jul 18, 2021 0:49 UTC (Sun)
by NYKevin (subscriber, #129325)
If, say, we required that software copyright expire automatically as soon as the warranty expired, then either we'd get much better warranty terms out of it, or else the manufacturers would shrug and let it expire. Of course, I'm not sure where that would leave all those FOSS licenses with NO WARRANTY exclusions... Maybe we just have a default minimum of 2 years or so? Or limit it to locked-down hardware or something. I dunno, just spitballing here.
(If your software is developing at such a glacial pace that a 2-years-outdated fork is going to seriously compete with it, then I tend to wonder how placing it under copyleft was really doing all that much good in the first place.)
Posted Jul 18, 2021 0:54 UTC (Sun)
by NYKevin (subscriber, #129325)
It's already 90% of the way to being useful. The other 10% has to do with DMCA provisions, EULAs, etc. If you abrogate those and require that 17 USC 117 is *always* in effect for *any* lawfully-obtained software under *any* circumstances, even if DRM-encumbered, then that provision would no longer be a dead letter.
Posted Jul 18, 2021 0:59 UTC (Sun)
by pabs (subscriber, #43278)
Posted Jul 18, 2021 8:21 UTC (Sun)
by ldearquer (guest, #137451)
So I agree freedoms 2 and 3 are one possible option, but my line of thought is, are they the only option?
For example, the world-wide fork could be distributed in the form of a patchset to apply on specific versions of the original work. See e.g. how game modders work, where everyone has their own copy of the game. If the game were released with full source code, they would do even better. The freedom to distribute copies of the original work would be convenient, comfortable, but not strictly required.
Posted Jul 18, 2021 8:25 UTC (Sun)
by pabs (subscriber, #43278)
Posted Jul 18, 2021 9:50 UTC (Sun)
by smcv (subscriber, #53363)
I suspect most game mods have to contain enough copied or modified from the original game that the copyright holder of the original game could shut them down as copyright-infringing, if they wanted to. In my experience, game modders usually rely on exemptions that don't actually exist either in law or in the game's EULA (for example "it's non-commercial, so it's fine"), and the game's copyright holder turns a blind eye to it because they recognise the value of mods in popularizing their games.
Perhaps copyright law should behave more like game modders think it does, and less like the overreaching reality, but that seems unlikely to happen while changes to it are primarily driven by the same few entertainment cartels.
Posted Jul 19, 2021 0:03 UTC (Mon)
by NYKevin (subscriber, #129325)
This greatly depends on the game. Skyrim, for example, is explicitly designed to encourage modding, with an EULA that specifically permits it. In recent years, Bethesda has even been selling mods for real money (with the modder taking a cut).
> Perhaps copyright law should behave more like game modders think it does, and less like the overreaching reality, but that seems unlikely to happen while changes to it are primarily driven by the same few entertainment cartels.
And... that's where the analogy falls apart. In my experience, Skyrim modders are some of the most unreasonable people on the internet. Behaviors I have seen:
* Uploading a mod, free for anyone to download, and then characterizing mirrors of that mod as "piracy."
Technically, most of these things are (to some extent) supported by existing copyright law. But IMHO that's because copyright law is ridiculous, not because the modders are right. If you suffer no economic harm from someone's "infringement," then you should not have a claim against them, plain and simple. Or at most, you should have a claim to nominal damages and equitable enforcement of the license (i.e. an injunction), not statutory damages of hundreds of thousands of dollars.
Posted Jul 19, 2021 2:08 UTC (Mon)
by pabs (subscriber, #43278)
There is usually zero economic harm from GPL violations (at least easily demonstrable to the copyright holder), so I think I'm going to have to disagree with you here.
> Or at most, you should have a claim to nominal damages and equitable enforcement of the license (i.e. an injunction), not statutory damages of hundreds of thousands of dollars.
An injunction against GPL violations seems good but I don't think nominal damages are appropriate based on what I read on WikiPedia, instead it should be restitutionary/disgorgement damages (where they pay back their ill-gotten gains), or possibly punitive damages or both, or something like paying for consultancy to help them come back into GPL compliance, or paying legal costs for bringing a suit against whoever they received the violating software from.
Posted Jul 19, 2021 14:55 UTC (Mon)
by NYKevin (subscriber, #129325)
OTOH, money damages are perfect for the proprietary software crowd because they become the profit that the company was trying to make in the first place.
Side note:
> WikiPedia
They stopped using CamelCase links in 2002. Why do people still write their name like this?
Posted Jul 19, 2021 16:50 UTC (Mon)
by rgmoore (✭ supporter ✭, #75)
I strongly disagree. There should be a moral presumption against people taking others' work, even if they aren't trying to exploit it commercially. As an example, I am an amateur photographer. I like to share my photographs with friends and family and sometimes the whole world, but I have never tried to sell my photos. That shouldn't mean that anyone who sees them should be able to use them however they like without asking my permission. If copyright law is limited to economic losses, it means amateurs like me have no right to prevent others from using our work in ways we don't approve of.
Posted Jul 19, 2021 20:48 UTC (Mon)
by NYKevin (subscriber, #129325)
As it is described in the US Constitution, for example, copyright is a subsidy ("to promote the progress of science and the useful arts"). It has nothing to do with morality and is purely an economic scheme to incentivize people to create more stuff. This is why copyright originally had a relatively short term of 14 years with an optional 14 year extension. This is also why copyright explicitly does not protect ideas, concepts, etc. Countries other than the US have a separate scheme of "moral rights" which vest permanently in the author, cannot be sold, transferred, or renounced, and are more limited in scope than standard copyright (generally having to do with attribution, mutilation of the work, etc.). Perhaps the US should borrow this idea.
Corporate interests have, in recent years, found it more useful to characterize copyright as a form of property, giving us laws like the Copyright Term Extension Act of 1998. Perhaps you agree with that characterization, but it is instructive to look at the consequences which it has wrought before you assume that your characterization is the only correct one.
Posted Jul 30, 2021 1:19 UTC (Fri)
by mrugiero (guest, #153040)
Posted Aug 3, 2021 9:34 UTC (Tue)
by nim-nim (subscriber, #34454)
All those depend on copyright law one way or another; without copyright, anyone could remove the anti-features and share the result.
However, Android and the rise of cloud services showed that openness is not sufficient in itself. Unless sharing of changes is mandatory, a sufficiently rich actor can take over any mature open source codebase by adding just enough closed changes that other variants become uncompelling to work on.
Mandatory opening and sharing of changes is the only thing that enables Joe Nobody to work on a shared codebase at the same level as big corporations.
Corporate devs are not smarter or more motivated; corporations can just afford to out-spend individuals long enough for them to give up (in other economic domains, that is called dumping).
Posted Jul 17, 2021 19:54 UTC (Sat)
by scientes (guest, #83068)
Posted Jul 30, 2021 1:24 UTC (Fri)
by mrugiero (guest, #153040)
NOTE: I do not agree with the paranoia about these being copyright infringements, but thinking about it does make sense for that reason.
Posted Jul 18, 2021 22:19 UTC (Sun)
by gerdesj (subscriber, #5446)
Copilot should not be a chargeable service as-is. It should simply be part of bog standard Github. We stash our code there and GH gets to monetise their usual enterprise services and launder data etc. We all get Copilot in return. If it is trained only on commons code and lacks the ability to derive an innovation itself then surely its output is commons too.
Otherwise, I suggest that GH only trains George on "enterprise" accounts and provides only those accounts the fruits of their labours.
Perhaps there could be two Copilots: one for commons code and one for enterprise code.
Yes, my tongue is nearly poking out of my cheek.
Posted Jul 23, 2021 15:27 UTC (Fri)
by pjones (subscriber, #31722)
This will definitely go well.
Posted Jul 26, 2021 13:33 UTC (Mon)
by immibis (guest, #105511)
Posted Jul 26, 2021 15:16 UTC (Mon)
by geert (subscriber, #98403)
Posted Jul 22, 2021 20:04 UTC (Thu)
by david.a.wheeler (subscriber, #72896)
I think that's misguided. Practically *ALL* open source software licenses have attribution requirements, and other requirements, that are *also* not being met by what's being done by this service.
Is it legal? I have no idea. I think this is really untested ground. There are several ways to look at this:
* Maybe this is "de minimis" use or otherwise fair use, in which case it's fine.
* Maybe this is more like a human learning from existing code; humans can write code after reading code (indeed, how else could humans learn?).
* Maybe this is more like creating a derivative work, in which case this is dubious from a copyright perspective.
I'm not a lawyer. I predict that if this starts becoming useful, a lot of lawyers *will* look at this :-).
Posted Jul 26, 2021 6:41 UTC (Mon)
by taterbase (guest, #153426)
I have removed all my GitHub repos
The W3C is working on a robots.txt spec to standardize opt-out, which non-charity-status orgs doing scraping are bound to observe: https://www.w3.org/community/tdmrep/
So yes, you can, just like you can kill someone down the street
On the other hand, he continues, systems like Copilot offer the prospect of training models with proprietary code and using the result without worries of being tainted.
> And claiming patent damages against Microsoft is inviting the butcher to kill the goose that lays the golden eggs.
So here's some free advice for patent trolls: don't send your "kind" snail-mail letters to huge American companies before tiny Norwegian startups. You might get crushed before the letter reaches Norway.
I may not have the right, as an employee, to disclose a given fact about the company, but unless my employer made GH sign one I believe only copyright counts. I, the employer, either authorized that disclosure to GH or an employee is getting sued.
How would this work for books?
> Modern neural-net chess programs beat classic algorithms. And it's not even close.
2. A tree search (minimax) algorithm for the midgame. This also requires the use of a heuristic evaluation function to cut off searching before it gets too deep. In modern engines, this evaluation function is "smart" and considers the relative positions of the pieces and pawns as well as their material values.
3. An endgame tablebase that gives you the exact lines to play in any position where N or fewer pieces are on the board. For modern engines, N=7 is generally the limit (at least for publicly-available datasets, anyway), but in Kasparov's day, N would have been much smaller.
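Item 2's idea, searching a fixed number of plies and then trusting a heuristic evaluation, can be illustrated with a plain depth-limited minimax. This is only a toy sketch under stated assumptions: it plays a hypothetical take-1-to-3-stones game instead of chess, and the function names and the crude "score every unresolved position as 0" heuristic are illustrative inventions, not taken from any real engine.

```python
from typing import List

# Hypothetical toy game (an assumption, not chess): a pile of stones, each
# move removes 1-3 stones, and whoever takes the last stone wins.


def children(pile: int) -> List[int]:
    """All piles reachable in one move."""
    return [pile - k for k in (1, 2, 3) if k <= pile]


def is_terminal(pile: int) -> bool:
    """The game is over once the pile is empty."""
    return pile == 0


def evaluate(pile: int, maximizing: bool) -> int:
    """Score from the maximizer's point of view.

    An empty pile means the side to move has already lost. Any other
    position reached at the depth cutoff gets a crude heuristic score of 0
    ("unknown"); a chess engine would plug in a far richer evaluation here.
    """
    if pile == 0:
        return -1 if maximizing else 1
    return 0


def minimax(pile: int, depth: int, maximizing: bool) -> int:
    """Depth-limited minimax: search `depth` plies, then fall back on `evaluate`."""
    if depth == 0 or is_terminal(pile):
        return evaluate(pile, maximizing)
    scores = [minimax(child, depth - 1, not maximizing) for child in children(pile)]
    return max(scores) if maximizing else min(scores)


if __name__ == "__main__":
    print(minimax(4, depth=10, maximizing=True))  # -1: lost for the side to move
    print(minimax(5, depth=10, maximizing=True))  # 1: take one stone, leave 4
    print(minimax(5, depth=1, maximizing=True))   # 0: cutoff, heuristic only
```

With enough depth the search reaches the end of the game and finds exact results; with `depth=1` it stops early and can only report the heuristic's 0. Classic engines layer alpha-beta pruning and much stronger evaluation functions on top of this basic scheme.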
*That* would be game changer.
Is good code the norm today? I don't think so... perhaps in the future.
They will be able to write more junk code faster.
Suppose that I'm converting my proprietary code to a Linux driver using Copilot or a similar system. Suppose that Copilot generates code that refers to symbols declared with EXPORT_SYMBOL_GPL. In that case the kernel would be preventing my code from loading based on a flawed premise that my code is a derived work of the kernel and should be licensed under GPL. How can my code be a derived work if I haven't even looked at the kernel sources? Could be an interesting discussion on LKML.
Derived works
The legal issue would be yours, not the kernel’s.
>> The GPL doesn't exist because copyright is good, it exists because software being copyrightable is what enables the concept of proprietary software in the first place.
> The definition of "derivative work" is what it is. We can't change it now (except by legislation, but that's not happening).
I'm not entirely convinced about that: the precise definition of "derivative work" with respect to software in any given common law jurisdiction would need to be thrashed out in a number of court cases that (as far as I'm aware) haven't happened yet.
I'm not saying how much this will happen, just that there isn't a total vacuum.
* Characterizing attempts to fix or improve another mod as "piracy," even when the original mod is obviously broken. In one case, I believe a user threatened to contact law enforcement over this sort of thing.
* Requiring logins to download an otherwise free mod. Using this to try and prevent specific individuals from downloading specific mods (which are otherwise free for everyone).
* A modding tool that checks to see if you have installed certain applications which the author disapproves of. This check is not disclosed anywhere in the documentation, and the software refuses to run with a mysterious error message if it finds those applications.
* Characterizing direct downloads (where you don't look at their fancy web page first) as "piracy" or otherwise problematic.
* Removing mods for petty or ridiculous reasons.
* Insisting that people who are good at modding are always right, and anyone who disagrees with them, about anything, must be wrong.
* Miscellaneous internet toxicity.
> If you suffer no economic harm from someone's "infringement," then you should not have a claim against them, plain and simple.
[A] Drop proposed code.
[B] Drop existing code."
It's not just the GPL or protective licenses