GitHub is my copilot

By Jonathan Corbet
July 15, 2021

Your editor has worked in the computing field for rather longer than he cares to admit; for all of that time it has been said that a day will come when all that tedious programming work will no longer be necessary. Instead, we'll just say what we want and the computer will figure it out. Arguably, the announcement of GitHub Copilot takes us another step in that direction. On the way, though, it raises some interesting questions about copyright and free-software licensing.

Copilot is a machine-learning system that generates code. Given the beginning of a function or data-structure definition, it attempts to fill in the rest; it can also work from a comment describing the desired functionality. If one believes the testimonials on the Copilot site, it can do a miraculous job of figuring out the developer's intent and providing the needed code. It promises to take some of the grunge work out of development and increase developer productivity. Of course, it can happily generate security vulnerabilities; it also uploads the code you're working on and remembers if you took its suggestions, but that's the world we've built for ourselves.

Machine-learning systems, of course, must be trained on large amounts of data. Happily for GitHub, it just happens to be sitting on a massive pile of code, most of which is under free-software licenses. So the company duly used the code in the publicly available repositories it hosts to train this model; evidently private repositories were not used for this purpose. For now, the result is available as a restricted beta offering; the company plans to turn it into a commercial product going forward.

Copy-and-paste

Looked at one way, GitHub Copilot is the embodiment of a number of aspects of software development that are, perhaps, not fully covered in school:

Much of a software developer's time is spent cranking out boilerplate code that looks much like a lot of other boilerplate code in circulation. Unsurprisingly, developers do not find being freed of this work to be a distasteful prospect.
An awful lot of software development is actually done by copying and pasting code. It's tempting to say that this is especially true of contemporary developers, but development worked this way even before the days of Stack Overflow.
While we like to think that the code we write is original, we are all strongly influenced by code we have seen in the past. Developers who have read a lot of code tend to have many useful patterns at their fingertips.

If much of our work really comes down to copying and pasting at varying degrees of remove, perhaps it makes sense to get the computer to do that work for us when it can.

The use of free software to train Copilot has raised some interesting questions, though. If a machine-learning model has been trained on a particular body of code, is that model a derived work of that code? If so, since GPL-licensed code was used to train the model, the result would also come under the terms of the GPL. If that were true, it would not change much, since GitHub does not appear to have any interest in distributing its model.

But what about the code that Copilot spits out? Is that code, too, a derived work of the code used to train the model? The fact that Copilot occasionally regurgitates verbatim copies of the training code (0.1% of the time, according to the Copilot FAQ) tends to support those who believe that Copilot's output should be seen as a derived work. If this is true, then any code body using Copilot output is in the same situation, which would be a bit of a mess, since it will be derived from multiple bodies of code with conflicting licenses and an endless list of attribution requirements. The derived-work interpretation would make any code developed with Copilot's help entirely undistributable.

The best outcome is unclear

Your editor is not a lawyer and certainly does not wish to play one on the net. That said, there are arguments to be made to the effect that Copilot's output should not be seen as a derived work of the code used for training. Certainly GitHub sees it that way; the Copilot FAQ states: "Training machine learning models on publicly available data is considered fair use across the machine learning community". How closely that consideration matches actual copyright law is not entirely clear, but it is an ongoing precedent and practice.

More intuitively, one can easily compare Copilot with a seasoned software developer who has seen a lot of code over a long career. The code that developer writes today will surely be influenced by what they have seen in the past, but today's code is not generally seen as being a derived work of yesterday's reading. One could argue that Copilot is doing the same thing; the only difference is that, since it's a computer, it can read vast amounts of code — even PHP code — without going insane.

Former European parliamentarian Julia Reda makes the argument that the code snippets produced by Copilot are not large or complex enough to be considered original, copyrightable works. One might well wonder how much better Copilot has to get before that line will be crossed, but she also claims that "the output of a machine simply does not qualify for copyright protection – it is in the public domain". This argument, if taken to its extreme, suggests that copyrighted work could be put into the public domain by running it through a photocopier. These arguments may hold for now, but it's not clear that they are tenable in the long term.

More interestingly, Reda, along with Matthew Garrett, argues that a derived-work interpretation is not in the interests of the free-software community in any case. Copyleft, they say, is a response to overly strong copyright protection for code, not a reason to make it stronger. As Garrett put it:

The powers that the GPL uses to enforce sharing of code are used by the authors of proprietary software to reduce that sharing. They attempt to forbid us from examining their code to determine how it works - they argue that anyone who does so is tainted, unable to contribute similar code to free software projects in case they produce a derived work of the original. Broadly speaking, the further the definition of a derived work reaches, the greater the power of proprietary software authors.

On the other hand, he continues, systems like Copilot offer the prospect of training models with proprietary code and using the result without worries of being tainted. That, he says, is likely to be a positive outcome for the free-software community.

It seems reasonable to assume that Copilot is not the only machine-learning-based code-synthesis system out there; it is also plausible that these systems will become more capable over time. The copyright issues raised by Copilot seem to be concentrated on free software for now, but they may well expand beyond that realm in the future. What happens now, though, will set precedents for that future; if the free-software community somehow shuts down Copilot over copyright issues, other interests will have a stronger argument for strengthened copyright laws applied to future systems. That power could be used to extend the reach of proprietary software or shut down machine-learning systems that are beneficial to the community. We should, thus, be careful about what we wish for, lest we actually get it.

I have removed all my GitHub repos

Posted Jul 15, 2021 14:28 UTC (Thu) by dskoll (subscriber, #1630) [Link] (14 responses)

Given the legal uncertainty around Copilot, as well as my gut feeling that it's simply a really terrible idea, I've removed all my GitHub repos and moved everything to a self-hosted Gitea instance. I've put stub repos up on GitHub that point to the real repos because (unfortunately) GitHub has become pretty important for discovering software projects.

I'll miss certain GitHub features, especially the CI/CD framework, but I've cobbled together a workable replacement for that with self-hosted Buildbot.

I have removed all my GitHub repos

Posted Jul 15, 2021 14:40 UTC (Thu) by musicmatze (guest, #133336) [Link] (13 responses)

Same here. I removed all my sources and moved them to sourcehut and my own hosting. I still have a github account because I am the maintainer of sources that are not authored by me and thus need to stay on github unfortunately.

I have removed all my GitHub repos

Posted Jul 15, 2021 14:42 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link] (12 responses)

I don't see how moving code to a different website prevents GitHub from letting their copilot run loose on your repos and gathering the data. If their argument holds true, there is nothing special about hosting in GitHub.

I have removed all my GitHub repos

Posted Jul 15, 2021 15:14 UTC (Thu) by bluca (subscriber, #118303) [Link]

Precisely, it seems the same kind of knee-jerk reaction from when Microsoft bought Github and the world was about to end.

In the EU text and data mining is legal for public data, this is pretty much unambiguous (copyright status of the model/output of the model is not 100% clear on the other hand yet, but Julia Reda makes excellent arguments as always) - whether it is hosted on Github, Gitlab, your own server in your basement that definitely won't get hacked and used to mount supply-chain attacks on your users, a mailing list, or any other forge.
The W3C is working on a robots.txt spec to standardize opt-out, which non-charity-status orgs doing scraping are bound to observe: https://www.w3.org/community/tdmrep/
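To make the opt-out idea concrete, here is a minimal sketch of how a well-behaved scraper could consult a classic robots.txt file before fetching anything. It uses Python's standard `urllib.robotparser`; the crawler name `ExampleMLBot`, the file contents, and the URL are all invented for illustration, and the W3C group's eventual format may look quite different from plain robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. "ExampleMLBot" is an invented crawler
# name used only for this illustration, not a real user agent.
ROBOTS_TXT = """\
User-agent: ExampleMLBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.org/repos/project.git"  # placeholder URL
    print("ExampleMLBot allowed:", may_fetch("ExampleMLBot", url))
    print("SomeOtherBot allowed:", may_fetch("SomeOtherBot", url))
```

Note that this only works if the scraper chooses to check; robots.txt is a convention, not an access control.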

I have removed all my GitHub repos

Posted Jul 15, 2021 16:38 UTC (Thu) by dskoll (subscriber, #1630) [Link] (10 responses)

I somehow doubt GitHub would scrape repos they don't host. And if they do, I can block them.

I have removed all my GitHub repos

Posted Jul 15, 2021 18:04 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (9 responses)

> And if they do, I can block them.

Not if it's open source (or free software) you can't. Anyone can redistribute any FOSS code to anyone else, as long as they preserve license terms. If (as GitHub argues) feeding code into an ML model does not infringe copyright, then there is no legal mechanism to prevent GitHub from acquiring whatever FOSS code they want, by whatever means they want, and feeding it into the model. The only way around this is to move to proprietary licensing with a "thou shalt not redistribute to GitHub" term. But I doubt you actually want to do that.

I have removed all my GitHub repos

Posted Jul 15, 2021 20:06 UTC (Thu) by ejr (subscriber, #51652) [Link] (8 responses)

If it's your own server, then yes you can.

I have removed all my GitHub repos

Posted Jul 15, 2021 20:36 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link] (5 responses)

> If it's your own server, then yes you can.

The point is that they don't have to source it from your server directly. It is open source, nothing is stopping them from using a proxy to clone it.

I have removed all my GitHub repos

Posted Jul 15, 2021 20:39 UTC (Thu) by ejr (subscriber, #51652) [Link]

That's the choice. Release something for others to use as they please, or do not.

That "copilot" is bringing twenty-year-old security flaws back is another aspect. ML is only as good as its training data.

I have removed all my GitHub repos

Posted Jul 15, 2021 20:55 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (3 responses)

Or downloading it from any number of mirrors that you might or might not know about.

Is your code in Debian, or any other distro? The Internet Archive? What about the randoms on /r/datahoarders, do you think any of them made a copy?

There's no such thing as "It's public except for person X." If you put it on the internet, then anyone who wants to see it can probably get a copy by some means or another. Ordinarily, you would use copyright law to combat such unauthorized copying, but FOSS licenses are explicitly designed to allow and even encourage it.

I have removed all my GitHub repos

Posted Jul 16, 2021 1:09 UTC (Fri) by dskoll (subscriber, #1630) [Link] (1 responses)

That's true; I can't prevent GitHub from slurping my code into its ML system. But I can express my disagreement with it, and I can stop using GitHub to ever so slightly reduce the network effect that makes it attractive in the first place.

I have removed all my GitHub repos

Posted Jul 19, 2021 21:53 UTC (Mon) by jbicha (subscriber, #75043) [Link]

They probably already fed all your repos into copilot before you removed them.

I have removed all my GitHub repos

Posted Jul 16, 2021 15:19 UTC (Fri) by marcH (subscriber, #57642) [Link]

> There's no such thing as "It's public except for person X."

Yes there is: the version 3 of the GPL.

</troll>

I have removed all my GitHub repos

Posted Jul 17, 2021 17:46 UTC (Sat) by ju3Ceemi (subscriber, #102464) [Link] (1 responses)

"If it's your own server, then yes you can."

Depending on the licence etc., that may be illegal.
So yes, you can, just like you can kill someone down the street.

I have removed all my GitHub repos

Posted Jul 22, 2021 11:29 UTC (Thu) by kpfleming (subscriber, #23250) [Link]

How could controlling access to a server that you run be 'illegal'? There is no legal requirement to offer services to any specific person or entity from a server that you run.

GitHub is my copilot

Posted Jul 15, 2021 15:45 UTC (Thu) by MattBBaker (guest, #28651) [Link] (2 responses)

"the only difference is that, since it's a computer, it can read vast amounts of code — even PHP code — without going insane."

Oh great, HAL9000 read reams of PHP code and that is what turned him insane.

GitHub is my copilot

Posted Jul 16, 2021 15:12 UTC (Fri) by marcH (subscriber, #57642) [Link]

> "the only difference is that, since it's a computer, it can read vast amounts of code — even PHP code — without going insane."

They had to filter C macros though. There are limits.

GitHub is my copilot

Posted Jul 17, 2021 20:12 UTC (Sat) by klbrun (subscriber, #45083) [Link]

But is a program running on a computer sane (responsible) in the first place? Such a program is finite and usually unable to change its programming at run time, so it can't be held responsible in the way humans are. We have not developed Skynet yet, although the Chinese may be working on it.

GitHub is my copilot

Posted Jul 15, 2021 15:48 UTC (Thu) by zblaxell (subscriber, #26385) [Link]

I'm waiting for the inverse of this, where someone hacks up a SEO bot farm to manipulate Copilot's output. "Because we can" at first, but if it works, then maybe for other reasons later.

Do we need ERB approval to experiment on robots? Asking for a friend. ;)

GitHub is my copilot

Posted Jul 15, 2021 15:51 UTC (Thu) by IanKelling (subscriber, #89418) [Link] (3 responses)

GitHub copilot is SaaSS (Service as a Software Substitute) https://www.gnu.org/philosophy/who-does-that-server-reall...

GitHub is my copilot

Posted Jul 16, 2021 5:27 UTC (Fri) by pabs (subscriber, #43278) [Link] (2 responses)

It's more than just software though, it is also an actual service too, because even if GitHub and Copilot were fully Free Software, I still couldn't retrain the Copilot model from scratch and use it to write code, because I can't afford the disk space to store all of the input code, the servers and GPUs required to train the model, and the probably large amount of resources it takes to use the model to write code. Other large, well-resourced tech companies could replicate that and provide a second service though.

GitHub is my copilot

Posted Jul 26, 2021 13:23 UTC (Mon) by immibis (subscriber, #105511) [Link] (1 responses)

What if you had the model they trained? I don't know about Copilot, but I believe the largest GPT-3 model is in the 250GB-1TB range (depending on what level of precision they used). Sure, you'd have a hard time fitting it in RAM, but it's not *impossible* to run yourself.
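As a rough back-of-the-envelope check on that range, the sketch below multiplies a parameter count by the bytes needed per parameter. It assumes the publicly reported figure of about 175 billion parameters for the largest GPT-3 model and counts only raw weight storage; the size of Copilot's own model is not stated in this discussion, so nothing here should be read as a claim about it.

```python
# Publicly reported parameter count for the largest GPT-3 model.
# Copilot's own model size is an unknown here; this is GPT-3 only.
GPT3_PARAMS = 175_000_000_000

# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"int8": 1, "float16": 2, "float32": 4}


def model_size_gb(n_params: int, bytes_per_param: int) -> float:
    """Raw weight storage in gigabytes (10**9 bytes), ignoring file overhead."""
    return n_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, width in BYTES_PER_PARAM.items():
        print(f"{name}: {model_size_gb(GPT3_PARAMS, width):.0f} GB")
```

Under these assumptions the weights come to 175 GB, 350 GB, and 700 GB respectively, which is broadly in line with the estimate above.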

GitHub is my copilot

Posted Jul 27, 2021 1:41 UTC (Tue) by pabs (subscriber, #43278) [Link]

I definitely don't have computing resources required for that, nor the economic resources required to acquire them.

GitHub is my copilot

Posted Jul 15, 2021 16:25 UTC (Thu) by JoeBuck (subscriber, #2330) [Link] (16 responses)

On the other hand, he continues, systems like Copilot offer the prospect of training models with proprietary code and using the result without worries of being tainted.

Microsoft owns Github. Will they be willing to retrain Copilot, using large portions of the proprietary Microsoft code base as training data? That should help developers produce better Windows-compatible software, right, and if it's fair use to use any GPL chunks of software that might be emitted verbatim, shouldn't it be the same for their own code? And will they then specifically disclaim any proprietary interest in the output? They could lead by example if that's what they want to do.

GitHub is my copilot

Posted Jul 16, 2021 9:12 UTC (Fri) by gdt (subscriber, #6284) [Link] (12 responses)

Related: if GitHub believe there's not an infringement issue then they can simply indemnify Copilot's users against copyright and patent legal action.

GitHub is my copilot

Posted Jul 16, 2021 12:47 UTC (Fri) by dskoll (subscriber, #1630) [Link] (11 responses)

Copyright, maybe, but I don't see how they could indemnify against patent infringement. If you infringe on a patent, it doesn't matter where the infringing code came from... it only matters that you're using it.

GitHub is my copilot

Posted Jul 17, 2021 19:35 UTC (Sat) by gfernandes (subscriber, #119910) [Link] (10 responses)

I really doubt anyone would be able to claim a patent that would stand on a sort algorithm, or a circular buffer or something fundamentally utilitarian like that.

Co-pilot isn't going to write for you the next SAP competitor. Or the next DNA sequence breakthrough. It's going to help you build up your next Big Thing using fundamental lego blocks.

Unlikely said lego blocks can be patented.

GitHub is my copilot

Posted Jul 18, 2021 13:40 UTC (Sun) by dskoll (subscriber, #1630) [Link] (9 responses)

You mean like this patent?

My point is there's no way GitHub would indemnify users of Copilot code against patent infringement, because that's an expensive minefield. It would also encourage patent trolls... would you prefer to claim patent infringement damages against TinyStartup, or Microsoft?

GitHub is my copilot

Posted Jul 19, 2021 6:09 UTC (Mon) by gfernandes (subscriber, #119910) [Link] (1 responses)

The fact that a patent is awarded is in no way an indication that it will stand. This has been proven many times before.

GitHub is my copilot

Posted Jul 19, 2021 16:23 UTC (Mon) by dskoll (subscriber, #1630) [Link]

Again, you are missing the point. Whether or not a patent will stand, there is no way Microsoft would even entertain the potential liability of offering indemnification against patent infringement for Copilot output.

Whether or not a patent is garbage is not relevant in Microsoft's calculation. The only relevant number is the potential financial harm Microsoft could suffer by offering indemnity.

GitHub is my copilot

Posted Jul 19, 2021 6:11 UTC (Mon) by gfernandes (subscriber, #119910) [Link] (4 responses)

And actually, claiming patent damages against Tiny Startup is basically bad business. You'll get some loose change.

And claiming patent damages against Microsoft is inviting the butcher to kill the goose that lays the golden eggs.

Take your pick.

GitHub is my copilot

Posted Jul 19, 2021 8:55 UTC (Mon) by sandsmark (guest, #62172) [Link] (1 responses)

> And actually, claiming patent damages against Tiny Startup is basically bad business. You'll get some loose change.
> And claiming patent damages against Microsoft is inviting the butcher to kill the goose that lays the golden eggs.

Our tiny startup received a patent troll mail a couple of years ago, which felt kind of validating for me at least. :-)

Unfortunately for them, they apparently sent letters to Amazon as well (some overlap with Amazon's hardware stuff and ours), who swiftly crushed them in court and got the relevant patents invalidated.

So one free advice for patent trolls: don't send your "kind" snail mail letters to huge American companies before tiny Norwegian startups. You might get crushed before the letter reaches Norway.

GitHub is my copilot

Posted Jul 19, 2021 9:22 UTC (Mon) by anselm (subscriber, #2796) [Link]

So one free advice for patent trolls: don't send your "kind" snail mail letters to huge American companies before tiny Norwegian startups. You might get crushed before the letter reaches Norway.

The usual strategy if you're a patent troll is to go after some small companies first, because they can't afford a protracted legal battle and are likely to cave or settle quickly. You won't get a lot of money out of them but these wins give you street cred to go after the bigger fish later.

Perhaps in your case the patent troll didn't realise that snail mail from the US (I presume) to Norway takes a while to arrive, and assumed the business with you would be done and dusted before they'd call out Amazon? Just a thought.

GitHub is my copilot

Posted Jul 21, 2021 14:15 UTC (Wed) by dskoll (subscriber, #1630) [Link] (1 responses)

Suing TinyStartup is hard work for pretty much no return. Suing Microsoft is like buying a lottery ticket. You're very, very likely to lose, but if you win, it'll be fantastic. Also, most patent infringement defendants end up settling, and MSFT's threshold for settling is higher than a smaller company's would be. Unless MSFT sees the patent or the troll as an existential threat, it'll probably make the correct business decision and just pay to make the problem go away.

GitHub is my copilot

Posted Jul 22, 2021 4:42 UTC (Thu) by rgmoore (✭ supporter ✭, #75) [Link]

It's not obvious that paying off patent trolls is the right business move for a big company like Microsoft. It would probably be the right move if there were only one patent troll out there, but if there are a lot of them- and there are- they'll wind up paying again and again. It might be cheaper in the long run to spend the money on lawyers and make an example of the first few who try it. Sending the message that suing your company will just result in massive legal bills and an invalidated patent should discourage other trolls who are thinking about suing you, resulting in less spending overall.

GitHub is my copilot

Posted Jul 19, 2021 8:47 UTC (Mon) by sandsmark (guest, #62172) [Link] (1 responses)

> You mean like this patent?

I'm hungover and not in the mood to read patents, but something that most people seem to miss (and is fairly important with patents) is that the abstract is completely irrelevant.

What matters are the claims, and most importantly the independent claims (those that don't reference other claims). If you don't infringe on any part of an independent claim it does not apply, and it also invalidates all the dependent claims.

So while I haven't read that patent, usually when people point to what looks like absurd patents they tend to be useless (except for generating media hype and stock price for whomever got it).

GitHub is my copilot

Posted Jul 19, 2021 16:24 UTC (Mon) by dskoll (subscriber, #1630) [Link]

I did read the patent. But again, even garbage patents can be expensive to fight and invalidate. I doubt Microsoft has the appetite for taking on that potential liability by offering indemnification against patent infringement for Copilot output.

GitHub is my copilot

Posted Jul 18, 2021 23:38 UTC (Sun) by khim (subscriber, #9252) [Link] (1 responses)

It's funny how everyone immediately started talking about copyrights and patents while forgetting the elephant in the room: NDAs.

One may argue if 3 lines of code copied verbatim constitute a copyright violation or not and whether 10 lines of code may be complicated enough to be a patent violation.

But even a single line of code accidentally copied from input to output can reveal something which Microsoft (or any other company) considers a secret.

Public repositories (on GitHub or elsewhere) don't have that problem because they are, you know, public.

If something is already on GitHub (and stored in The Arctic Code Vault in the hope that it would be discoverable there 500 years from now) then it's generally assumed that the cat is out of the bag, and even if some NDA was actually violated it's too late to demand that the secret should stay secret.

GitHub is my copilot

Posted Jul 29, 2021 19:23 UTC (Thu) by mrugiero (guest, #153040) [Link]

Wouldn't NDAs only enter the scene when, you know, the actors signed one? Does MSFT sign NDAs with people using their services?
I may not have the right, as an employee, to disclose a given fact about the company, but unless my employer made GH sign one I believe only copyright counts. I, the employer, either authorized that disclosure to GH or an employee is getting sued.

GitHub is my copilot

Posted Jul 29, 2021 19:18 UTC (Thu) by mrugiero (guest, #153040) [Link]

Nobody argues that's what they intend to do, but that it's a consequence of what they did. Two very different statements that happen to share the same output.

How would this work for books?

Posted Jul 15, 2021 17:16 UTC (Thu) by bkw1a (guest, #4101) [Link] (14 responses)

What if something like copilot were trained on the Google Books database? If authors used it, and it included (sometimes) paragraphs from copyrighted books, would the resulting work be free of copyright restrictions? Would it be plagiarism? And how would the copilot-assisted author know about it (before getting a letter from a lawyer)? Maybe a paragraph would be OK, but would there be hard guarantees that larger sections would never be copied verbatim? How would a machine learning system ensure that?

How would this work for books?

Posted Jul 15, 2021 18:25 UTC (Thu) by rgmoore (✭ supporter ✭, #75) [Link] (10 responses)

There are already automatic plagiarism detection programs. It would be easy enough to run one on the output of the assistant to catch any plagiarism. You could even build plagiarism detection into the assistant so it would never spit out something that was deemed to be plagiarism in the first place. The plagiarism detector would have access to the training library for the assistant, which is the logical library to use when looking for plagiarism in its output.
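A minimal sketch of that idea, assuming a simple token n-gram overlap check rather than any particular plagiarism-detection product: index every n-gram in the training library, then measure how much of a generated snippet already appears there. The function names, the whitespace tokenizer, the value of `n`, and the sample strings are all illustrative choices, not anything Copilot is known to do.

```python
from __future__ import annotations

from typing import Iterable


def token_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """All runs of n consecutive whitespace-separated tokens in text."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(corpus: Iterable[str], n: int) -> set[tuple[str, ...]]:
    """Union of the n-grams of every document in the training library."""
    index: set[tuple[str, ...]] = set()
    for doc in corpus:
        index |= token_ngrams(doc, n)
    return index


def overlap_ratio(candidate: str, index: set[tuple[str, ...]], n: int) -> float:
    """Fraction of the candidate's n-grams that also occur in the index."""
    grams = token_ngrams(candidate, n)
    if not grams:
        return 0.0  # candidate shorter than n tokens: nothing to compare
    return len(grams & index) / len(grams)


if __name__ == "__main__":
    # Toy "training library" and two candidate outputs (made up).
    corpus = ["for (i = 0; i < n; i++) sum += a[i];"]
    index = build_index(corpus, n=4)
    print(overlap_ratio("for (i = 0; i < n; i++) sum += a[i];", index, n=4))
    print(overlap_ratio("while (x) y = f(x);", index, n=4))
```

An assistant could suppress any suggestion whose ratio exceeds some threshold; picking that threshold, and coping with renamed variables or reformatted code, is where the real difficulty lies.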

How would this work for books?

Posted Jul 16, 2021 2:25 UTC (Fri) by developer122 (guest, #152928) [Link] (9 responses)

What you have essentially described is a generative adversarial network. It would be composed of a generator, generating new code, and a discriminator trying to tell if it's the original copyrighted code.

Normally, the end state after they've converged is one where the discriminator can no longer tell the difference because the generator creates code which is so realistic.

Here we would have to invert things, with the discriminator driving the generator away from copyrighted training code. In this case, its easiest path forward is to generate varying degrees of random spew that could never be considered copyrighted code.

One thing that's missed in all of this: there's zero evidence that *any* neural network understands the high-level structure of the code (or story text) that it's reading. It's only *assumed.* Meaning, there's absolutely no guarantee that the code it generates is any good. It will only be trying to pass off some randomly mixed combination of the training material to try and extend a pattern.

Until we can mechanically produce an objective score for code quality (fuzzing?), it's impossible to guide a neural network towards it.

How would this work for books?

Posted Jul 18, 2021 0:38 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (8 responses)

> One thing that's missed in all of this: there's zero evidence that *any* neural network understands the high-level structure of the code (or story text) that it's reading. It's only *assumed.* Meaning, there's absolutely no guarantee that the code it generates is any good. It will only be trying to pass off some randomly mixed combination of the training material to try and extend a pattern.

There's zero evidence that a human understands the code they write, either. Yes, you can ask the human questions about it and judge their responses, but if we can't train a chatterbot to answer such questions now, it will likely be possible within the next few years (compare and contrast GPT-3), at which point everyone will decide that "answering simple questions about code" no longer qualifies as "evidence." Unlike, say, a news article (for which GPT-3 still struggles to distinguish between fantasy and reality), all of the relevant information is contained within the source code itself, so there's no external reality which it has to know about or understand (aside from straightforward vocabulary issues such as "what do humans call this design pattern?"). Therefore, it's much less difficult, and may already be possible with existing text generation systems.

This is the same process that chess went through, of course. You can't ask Stockfish or AlphaZero "Why is this a good/bad move?" except indirectly, by asking "What is White/Black's best response to this move, and Black/White's reply, and so on?" (at which point, it will happily show you how one side wins the other's queen in some convoluted 15+ move line that you could never have found on your own). But nobody would seriously argue that humans have a deeper understanding of chess than those engines, merely because the engines are unable to verbalize their reasoning in simple terms. On the other hand, when an engine had just barely defeated Kasparov in the 90's, everyone abruptly decided that computers excelling at chess was no longer a sign of intelligence.

TL;DR: "Real" AI just means "anything that AI can't do yet."

How would this work for books?

Posted Jul 18, 2021 5:46 UTC (Sun) by gfernandes (subscriber, #119910) [Link] (7 responses)

OK, a bit off topic now, but I'm sure you'd agree that there is a difference between human _intelligence_, and a glorified and somewhat biased search algorithm backed by a humongous database of alternatives to choose from, given a few characteristics to influence that choice.

That, by the way, was Deep Blue.

I doubt anyone could describe Kasparov in quite the same way!

How would this work for books?

Posted Jul 18, 2021 6:16 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

> OK, a bit off topic now, but I'm sure you'd agree that there is a difference between human _intelligence_, and a glorified and somewhat biased search algorithm backed by a humongous database of alternatives to choose from

Deep Blue used a "classic" algorithm, with simple recursive search and a fine-tuned weight function. We can understand how it works.

Modern neural-net chess programs beat classic algorithms. And it's not even close. They work exactly like the human brain, by recognizing patterns.
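For readers who have not seen the "classic" approach, here is a toy sketch of recursive game-tree search with a weighted evaluation function. It is only an illustration of the general technique (plain minimax over a hand-written tree); the feature names, weights, and tree are invented, and real engines add a great deal on top of this.

```python
from __future__ import annotations

# Hypothetical hand-tuned weights for two made-up position features.
WEIGHTS = {"material": 1.0, "mobility": 0.1}


def evaluate(features: dict[str, float]) -> float:
    """Weighted sum of position features, from the first player's viewpoint."""
    return sum(WEIGHTS[name] * value for name, value in features.items())


def minimax(node: list | dict, maximizing: bool) -> float:
    """Recursive search: a dict is a leaf position, a list is a set of moves."""
    if isinstance(node, dict):
        return evaluate(node)
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


if __name__ == "__main__":
    # Two candidate moves for the first player, each with two replies.
    tree = [
        [{"material": 1, "mobility": 2}, {"material": 0, "mobility": 5}],
        [{"material": 3, "mobility": 0}, {"material": -1, "mobility": 4}],
    ]
    # The opponent picks the worst reply for us in each branch,
    # and we pick the branch whose worst case is best.
    print(minimax(tree, maximizing=True))
```

Every step here can be traced by hand, which is the sense in which such an algorithm is understandable; a neural-network evaluator replaces `evaluate` with something far less transparent.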

How would this work for books?

Posted Jul 18, 2021 23:50 UTC (Sun) by khim (subscriber, #9252) [Link] (3 responses)

> Modern neural-net chess programs beat classic algorithms. And it's not even close.

Actually Stockfish only lost to Leela Chess Zero once. Otherwise it keeps rank #1 pretty robustly.

Of course, the fact that a relatively simple pattern-recognizing program (and a resource-hungry one, since modern CPUs are not well designed to support neural networks) beats literally everything else, and that only the top engine, built on everything humanity has discovered about chess over hundreds of years, can hold its own, is still amazing.

How would this work for books?

Posted Jul 19, 2021 1:31 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

I was thinking about https://en.wikipedia.org/wiki/AlphaZero at least during its unveiling.

Though to be fair it didn't participate in regular computer chess tournaments and Stockfish got better.

How would this work for books?

Posted Jul 19, 2021 9:00 UTC (Mon) by sandsmark (guest, #62172) [Link] (1 responses)

> Actually Stockfish only lost to Leela Chess Zero once. Otherwise it keeps rank #1 pretty robustly.

Hasn't Stockfish merged a neural net evaluator? Or was that after the tournament?

How would this work for books?

Posted Jul 19, 2021 9:37 UTC (Mon) by khim (subscriber, #9252) [Link]

It regained the crown in the 18th season and got neural networks in the 19th.

I think the “neural network revolution” is similar to the “demise of assembler” at the end of the last century.

Hand-written assembler was still worse than what high-level languages produced, but development time was so drastically different that it was impossible for assembler developers to deliver anything fast enough for it to be competitive.

Stockfish uses ƎUИИ to deal with some fringe cases where they just don't have time to fine-tune the algorithm.

The fate of computer chess (and the world, arguably) depends on whether chip developers will be able to develop massive 3D chips (with thousands and later, maybe even millions, of layers… Moore's law turned 90 degrees, in a sense). For now this is only used for flash (memories always adopt new technologies first because they need a lot of transistors but have a very simple structure), but if active components follow, it will mean the demise of the modern computing paradigm and the rise of neural networks.

This is because of power consumption: you couldn't put a million cores into one chip while keeping them in the gigahertz range; the whole thing would consume so much power that it would be impossible to cool (even if you found a way to supply all that power). Modern programming techniques couldn't work on trillions of 1MHz cores, but neural networks can.

Whether this would be enough to create strong AI or not… nobody knows.

How would this work for books?

Posted Jul 19, 2021 0:18 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (1 responses)

I am not sure I agree with that.

A classical chess engine is essentially made up of three parts:

1. An opening book that describes standard lines in opening theory.
2. A tree search (minimax) algorithm for the midgame. This also requires the use of a heuristic evaluation function to cut off searching before it gets too deep. In modern engines, this evaluation function is "smart" and considers the relative positions of the pieces and pawns as well as their material values.
3. An endgame tablebase that gives you the exact lines to play in any position where N or fewer pieces are on the board. For modern engines, N=7 is generally the limit (at least for publicly-available datasets, anyway), but in Kasparov's day, N would have been much smaller.

High-level chess players will absolutely memorize the same information as is present in an opening book, although perhaps not to the same depth as the engine does. Similarly, human players do imagine future lines and evaluate their endpoints based on heuristics, using a process that is conceptually similar to minimax with aggressive pruning. Finally, the best human players spend a lot of time learning their endgames. They don't memorize an entire tablebase, of course, but they learn the patterns, and so this can be characterized as a particularly smart compression algorithm (i.e. I don't need to memorize hundreds of minor translations or rotations of the same basic mating pattern).
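The tree search in item 2 of the list above can be sketched in a few lines. This is only an illustration on a toy game (take one to three stones from a pile; whoever takes the last stone wins), not how any real chess engine is written. The names `minimax`, `children`, `evaluate`, and `best_move` are made up for the example, and the "heuristic" at the depth cutoff is deliberately trivial: it simply reports the position as unknown.

```python
def minimax(state, depth, maximizing, children, evaluate):
    """Best score the maximizing side can force from `state`, searching `depth` plies."""
    kids = children(state)
    # Stop at the depth limit or at a terminal position and fall back
    # on the evaluation function instead of searching further.
    if depth == 0 or not kids:
        return evaluate(state, maximizing)
    scores = [minimax(k, depth - 1, not maximizing, children, evaluate) for k in kids]
    return max(scores) if maximizing else min(scores)


def children(stones):
    """Positions reachable by taking 1, 2, or 3 stones (toy game, an assumption)."""
    return [stones - k for k in (1, 2, 3) if stones - k >= 0]


def evaluate(stones, maximizing):
    """Score a position from the maximizing side's point of view."""
    if stones == 0:
        # The side to move faces an empty pile, so the other side
        # took the last stone and won.
        return -1.0 if maximizing else 1.0
    # Cutoff reached in a non-terminal position: a real engine would
    # apply its heuristic here; this toy just says "unknown".
    return 0.0


def best_move(stones, depth):
    """Pick the child position with the best minimax score for the side to move."""
    return max(
        children(stones),
        key=lambda s: minimax(s, depth - 1, False, children, evaluate),
    )
```

With a deep enough search, `best_move(5, 10)` leaves the opponent with four stones, which is a lost position in this game. With a shallow search such as `minimax(8, 2, True, children, evaluate)`, every line hits the cutoff and the result is only as good as the evaluation function, which is the point of making that function "smart" in a real engine.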

How would this work for books?

Posted Jul 19, 2021 6:14 UTC (Mon) by gfernandes (subscriber, #119910) [Link]

You simply unpacked the nutshell.

How would this work for books?

Posted Jul 15, 2021 18:46 UTC (Thu) by Lennie (subscriber, #49641) [Link] (1 responses)

Well... seems to me the answer was already yes, you are allowed to do that in 2019:

https://towardsdatascience.com/the-most-important-supreme...

How would this work for books?

Posted Jul 15, 2021 18:56 UTC (Thu) by floppus (guest, #137245) [Link]

That article makes it clear that there's a world of difference between what Google is doing (using a "discriminative" algorithm) and what GitHub is doing (using a "generative" algorithm).

How would this work for books?

Posted Jul 16, 2021 5:29 UTC (Fri) by pabs (subscriber, #43278) [Link]

I think I read on one of the HN threads that someone was able to reproduce the entirety of a Harry Potter chapter verbatim using one of the ML services, I think it was GPT-3.

GitHub is my copilot

Posted Jul 15, 2021 18:56 UTC (Thu) by Paf (subscriber, #91811) [Link]

This is an awfully interesting tool. I’m curious to see how things like this develop going forward. I imagine like automation in other fields it will remove much drudgery and will also eliminate some lower end work. It is a very, very long way from being able to safely/successfully do the harder stuff, but there is a lot of easier stuff out there.

Bring on the machines, I suppose. Can’t stop them anyway.

GitHub is my copilot

Posted Jul 15, 2021 22:55 UTC (Thu) by pctammela (guest, #126687) [Link] (7 responses)

I wish GitHub would repurpose this as a tool to automatically write tests based on documentation.
*That* would be a game changer.

Until then, the model is as good as the code it consumes.
Is good code the norm today? I don't think so... perhaps in the future.

GitHub is my copilot

Posted Jul 16, 2021 0:09 UTC (Fri) by excors (subscriber, #95769) [Link] (5 responses)

> Until then, the model is as good as the code it consumes.

That sounds like an important point to me. In particular, I don't see how it will get any better than the code it consumes.

The article identifies the valid problem that a lot of programming is "cranking out boilerplate code" and "copying and pasting code". The old-fashioned solution is to write libraries that can hide the boilerplate and frequently-copied-and-pasted code behind a well-designed easy-to-use API that can be reused by many projects. The library developer is likely to spend a relatively large amount of time thinking about their chosen area, and they provide a central point for collaborative testing and bug reporting and patching, which all helps to improve the quality of their code. The newer solution is to copy-and-paste from Stack Overflow, which has voting and comments and the ability to edit other people's posts, and it's far from ideal or consistent but at least those features are sometimes successful at letting the community improve the quality of answers to popular questions.

Copilot seems to lose all of those mechanisms that create a bias towards higher-quality code. Some of the code it consumes and produces will be good, some will be bad, but there's no way to tell the difference unless you're already an expert and/or spend a lot of time thinking about it (which is unlikely since you're using Copilot to save you from all that effort), and if you do spot any problems then there's no way to submit corrections to save other people from the same issues. Presumably most of the code generated by Copilot will be, on average, similar quality to the average input codebases; but then the natural state of any codebase is to fall in quality unless you actively fight against it, so the Copilot-generated projects will be worse than average, and then they'll be used as inputs for the next generation of Copilot, exacerbating the problem.

The old-fashioned solution of writing libraries is evidently not a good approach either, because it's often so painful to import libraries (especially if you're on Linux and the distros don't package it, or the API has changed and they packaged an old version, or if it's using an open source license that's incompatible with your project's, or if it's using a different build system, or if it has a bug you need to fix right now and can't wait for upstream, etc) and it's not worth bothering unless the library contains a very substantial amount of functionality that you'll use. There are good reasons why copy-and-paste is such a popular way to import small chunks of third-party code. But the solution should be to make it easier to put useful high-quality code into reusable libraries, and easier for people to find and use those libraries, not to encourage a form of copy-and-paste with even fewer quality controls.

GitHub is my copilot

Posted Jul 16, 2021 3:39 UTC (Fri) by Paf (subscriber, #91811) [Link] (1 responses)

I think they will absolutely have to figure out ways to incorporate “value” functions so it can try to learn what “good” code is. What will they be? How will it decide? Etc etc. They must already be working on this.

And I would say that (and the expressivity of the model itself) will be a huge determinant of how useful it - ever - is.

GitHub is my copilot

Posted Jul 16, 2021 8:44 UTC (Fri) by geert (subscriber, #98403) [Link]

Applying a few static checkers to the output, before presenting it to the user?
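As a rough sketch of what that could look like: run each generated candidate through some cheap static checks and drop the ones that fail. The example below uses Python's standard `ast` module. The particular checks (syntax errors, bare `except:`, calls to `eval`/`exec`) and the function names are assumptions picked for illustration; nothing here reflects what Copilot actually does.

```python
import ast
from typing import List


def static_problems(source: str) -> List[str]:
    """Return a list of problems found in `source`; empty means it passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        # Code that does not even parse is rejected outright.
        return [f"syntax error: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            problems.append("bare except")
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in {"eval", "exec"}
        ):
            problems.append(f"call to {node.func.id}()")
    return problems


def filter_suggestions(candidates: List[str]) -> List[str]:
    """Keep only the candidates that pass every static check."""
    return [c for c in candidates if not static_problems(c)]
```

Checks like these only catch shallow problems, of course; they say nothing about whether the code does what the developer wanted.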

GitHub is my copilot

Posted Jul 16, 2021 4:42 UTC (Fri) by interalia (subscriber, #26615) [Link]

Regarding using libraries as a better way to import third party code, that's casually true. But in my experience searching Stack Overflow for solutions many of the posts on Stack Overflow are about "how do I use <module> to do X? I tried the following code: Y". In other words, they're questions about how to use a library anyway, so it's not like their problem is that they're not using a library. Now of course someone could write a wrapper library which makes using the first library even easier, but someone would still ask questions on how to use the new library.

I've never gotten the problem with Stack Overflow copying/pasting to solve low-level questions of "how do I use this function" and "what does this error mean and how do I fix it?". Even changing the culture to have more bits of reusable code seems unlikely to help. If we create millions more Lego bricks, people will still ask how to use the bricks they chose to create their version of a Millennium Falcon.

GitHub is my copilot

Posted Jul 16, 2021 8:25 UTC (Fri) by bluca (subscriber, #118303) [Link]

One person's library is another person's boilerplate

GitHub is my copilot

Posted Jul 17, 2021 3:16 UTC (Sat) by Hattifnattar (subscriber, #93737) [Link]

My cynical opinion is that Copilot will be most useful for people who write junk code.
They will be able to write more junk code faster.

GitHub is my copilot

Posted Jul 30, 2021 1:04 UTC (Fri) by mrugiero (guest, #153040) [Link]

Wait, you guys have documentation?

Derived works

Posted Jul 15, 2021 23:56 UTC (Thu) by proski (subscriber, #104) [Link] (3 responses)

Suppose that I'm converting my proprietary code to a Linux driver using Copilot or a similar system. Suppose that Copilot generates code that refers to symbols declared with EXPORT_SYMBOL_GPL. In that case the kernel would be preventing my code from loading based on a flawed premise that my code is a derived work of the kernel and should be licensed under GPL. How can my code be a derived work if I haven't even looked at the kernel sources? Could be an interesting discussion in LKML.

Derived works

Posted Jul 16, 2021 1:16 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (1 responses)

> if I haven't even looked at the kernel sources

If you're doing that, I'd dare to say you've got a driver no one is interested in using because you're going to have to look at some code or documentation (that probably contains some code) at some point.

Anyways, `EXPORT_SYMBOL_GPL` isn't about "you saw the code, now you're tainted". It's about "if you use this symbol, it is our opinion that you are relying on Linux so much that you must be derivative". There's no need for anyone to have seen the Linux code for that to take effect.

Derived works

Posted Jul 30, 2021 1:09 UTC (Fri) by mrugiero (guest, #153040) [Link]

And that's why libraries tend to use LGPL, because you don't need to read the code to be using it, and nobody will use it in a non-GPL project if they can't be sure it won't count as derivative.

Derived works

Posted Jul 16, 2021 13:41 UTC (Fri) by HIGHGuY (subscriber, #62277) [Link]

IANAL, but I would dare say that if you attempt to upstream such code you might be publishing code you do not have the right to publish.
The legal issue would be yours, not the kernel’s.

GitHub is my copilot

Posted Jul 16, 2021 13:16 UTC (Fri) by ldearquer (guest, #137451) [Link] (21 responses)

From Matthew Garrett's article
>> The GPL doesn't exist because copyright is good, it exists because software being copyrightable is what enables the concept of proprietary software in the first place.

I think this whole argument that [reducing the notion of derived work is actually good for free software] is totally nuts.

Or maybe I misunderstood something.

Copyright is not what enables the concept of proprietary software, or not alone. For end users there are arguably more annoying and immediate aspects of proprietary software, which don't depend on copyright law: Binary-only distribution, closed protocols, device lock ("tivoization"), etc.

Last week I had to fix my son's bed frame, bought about 10 years ago. I ended up screwing and bolting in manners not foreseen by the original designer, but hey, they didn't seem to think about children who actually jump and play on the bed (despite mum's disagreement).

Luckily the bed frame was not codified in binary opcodes.

GitHub is my copilot

Posted Jul 16, 2021 17:06 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (19 responses)

Unfortunately, addressing those problems through copyright has been hit-or-miss at best. While some people have embraced GPLv3, there are also plenty of developers, such as Linus, who don't actually care about locked hardware and "just want to get source code back." As a result, the primary effect of GPLv3 has been to fragment the copyleft ecosystem into two camps, which IMHO was a step backwards.

I would prefer to address this problem by passing comprehensive right-to-repair legislation, but I don't see that happening any time soon. The main advantage of legislation, of course, is that it is much harder to opt out of legislation than to avoid using GPLv3'd software.

GitHub is my copilot

Posted Jul 16, 2021 21:05 UTC (Fri) by ldearquer (guest, #137451) [Link] (3 responses)

I too agree right-to-repair legislation would be much better for everyone, probably also for the planet and the local employment.

However, so far copyleft has been of much more help than legislation. If legislation is to fix this, then bring the legislation first, and render the copyleft unnecessary, but don't do it the other way around.

Note that copyright (copyleft) can't really in any way avoid proprietary software per se. It just ensures that whatever free software is made remains free for future recipients. In theory, it should prevent someone in a position of power from using it to lock in users, but that requires the copyright + 'derived work' mechanism. Imagine MS releasing a version of Office made from Libreoffice + some closed source importer/exporter to a proprietary docxx format. We would all effectively move to the new format, and all the effort put into having a free office suite would be used against that goal.

Even if this could sound ridiculous today, there you have Edge...

But back to the main point, I still fail to see how minimizing the 'derived work' concept would help free software. Definitely not for users.

GitHub is my copilot

Posted Jul 16, 2021 21:27 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (2 responses)

The flip side of Microsoft writing a proprietary LibreOffice* plugin is a Linux developer writing a FOSS driver for closed hardware. You can't make the definition of "derivative work" behave differently depending on which end of the stick you're going to get. You might be able to argue about differences between hardware and software, but it's a very blurry line IMHO.

Regardless, this is all academic. The definition of "derivative work" is what it is. We can't change it now (except by legislation, but that's not happening).

* Side note: Microsoft can already do this anyway because Apache OpenOffice is under a permissive license that allows it. I'm not sure of the compatibility status of LibreOffice vs. OpenOffice, but it might not matter. You could first develop the plugin for OpenOffice and then, as a separate step, make minor compatibility adjustments as needed. This is probably fair use and might not even be subject to copyright protection in the first place because it would be purely functional (compare and contrast Oracle v. Google, Baker v. Selden, etc.).

GitHub is my copilot

Posted Jul 17, 2021 8:21 UTC (Sat) by james (subscriber, #1325) [Link] (1 responses)

> The definition of "derivative work" is what it is. We can't change it now (except by legislation, but that's not happening).

I'm not entirely convinced about that: the precise definition of "derivative work" with respect to software in any given common law jurisdiction would need to be thrashed out in a number of court cases that (as far as I'm aware) haven't happened yet.

If the software industry (which very much includes Linux, these days) had coalesced around an understanding of roughly where the boundaries of "derivative work" were, the courts would not be bound by that, but would be likely to respect it in their deliberations: partly because they don't like upsetting whole industries if they don't have to, partly because they do consider existing legal thought (certainly from other jurisdictions or scholarly works), and partly because one of the functions of a judgment is to let everyone know why they decided what they did, so if they departed from existing consensus they'd be expected to explain why.

And we've worried a lot more in public about "derivative work" than the proprietary software industry (and had lawyers informing debates). Our opinions are likely to inform what the rest of the industry thinks.

I'm not going to comment about civil law jurisdictions here.

GitHub is my copilot

Posted Jul 17, 2021 8:46 UTC (Sat) by james (subscriber, #1325) [Link]

Not to say how much this will happen. Just there isn't a total vacuum.

GitHub is my copilot

Posted Jul 17, 2021 2:03 UTC (Sat) by pabs (subscriber, #43278) [Link] (14 responses)

Right-to-repair legislation usually concerns hardware not software so I'm not sure that it helps any of the things ldearquer mentioned:

> Binary-only distribution, closed protocols, device lock ("tivoization"), etc.

The only way to eliminate those would be to essentially enshrine the four freedoms and some other things in legislation. I feel like the proprietary software vendors have the power to prevent that from happening.

GitHub is my copilot

Posted Jul 17, 2021 9:05 UTC (Sat) by ldearquer (guest, #137451) [Link] (13 responses)

> The only way to eliminate those would be to essentially enshrine the four freedoms

But does this require the four freedoms?

They seem to be discussed as a pack, all or nothing, but it seems to me that, IIRC, the first two (having full control over what you get/pay for) are much more legit than the rights to redistribute.

Maybe I am just mixing in my personal feelings, but if I buy a car, I can understand the guys who designed the car telling me I shall not copy their design to build and give away similar cars, at least for a reasonable span of time.

But if they tell me that, well, the car is mine, but not really mine, kind of renting, but you pay for maintenance, and sorry you can't open the hood, or replace a spark plug or service it without their approval...

GitHub is my copilot

Posted Jul 17, 2021 23:58 UTC (Sat) by pabs (subscriber, #43278) [Link] (12 responses)

Basically without freedom 2 and 3, everyone has to have the time, motivation and ability to become a programmer for the languages used in whatever software is involved. That isn't realistic even for all the programmers on Earth, let alone all the people who have full time jobs, kids, friends and hobbies. Most programmers aren't going to learn RISC-V assembly any time soon and most non-programmers aren't going to learn any programming in their lifetimes.

Imagine the entertainment system in a common type of car has a bug in it that manifests after the warranty period ends and the bug is really annoying, only manifests rarely, manifests only while driving but doesn't brick the system. To be able to efficiently debug the system you need to be able to run it elsewhere under simulation so you can pretend you are driving (freedom 0), since debugging from the back seat while driving around is likely quite hard. You need to be able to review the source code too (freedom 1) to figure out what is going wrong. Since the owner doesn't know programming, they need to enlist the local mechanic-programmer, distribute (freedom 2) the source code and a log of the data streams while the bug happens to them, have them modify the code (freedom 3) to add the fix and share (freedom 2 again) the fix back to the owner and to the fork maintained by the world-wide mechanic-programmer association.

GitHub is my copilot

Posted Jul 18, 2021 0:49 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (2 responses)

IMHO this has more to do with copyright terms being abusively long than with the four freedoms.

If, say, we required that software copyright expire automatically as soon as the warranty expired, then either we'd get much better warranty terms out of it, or else the manufacturers would shrug and let it expire. Of course, I'm not sure where that would leave all those FOSS licenses with NO WARRANTY exclusions... Maybe we just have a default minimum of 2 years or so? Or limit it to locked-down hardware or something. I dunno, just spitballing here.

(If your software is developing at such a glacial pace that a 2-years-outdated fork is going to seriously compete with it, then I tend to wonder how placing it under copyleft was really doing all that much good in the first place.)

GitHub is my copilot

Posted Jul 18, 2021 0:54 UTC (Sun) by NYKevin (subscriber, #129325) [Link]

Or, simpler option: Just make 17 USC 117(c) a lot stronger: https://www.law.cornell.edu/uscode/text/17/117

It's already 90% of the way to being useful. The other 10% has to do with DMCA provisions, EULAs, etc. If you abrogate those and require that 17 USC 117 is *always* in effect for *any* lawfully-obtained software under *any* circumstances, even if DRM-encumbered, then that provision would no longer be a dead letter.

GitHub is my copilot

Posted Jul 18, 2021 0:59 UTC (Sun) by pabs (subscriber, #43278) [Link]

Expired copyright doesn't exactly help either, since I cannot get the source code for proprietary trade secret binary blob works no matter what the copyright status of the software is. Source code escrow plus short copyright periods with immediate source code disclosure on copyright expiry would work though.

GitHub is my copilot

Posted Jul 18, 2021 8:21 UTC (Sun) by ldearquer (guest, #137451) [Link] (8 responses)

I agree with your explanation. It still looks, though, like freedoms 0 and 1 are the real goal, whereas freedoms 2 and 3 are a means to an end (to make freedoms 0 and 1 effective, if I understood you correctly).

So I agree freedoms 2 and 3 are one possible option, but my line of thought is, are they the only option?

For example, the world-wide fork could be distributed in the form of a patchset to apply on specific versions of the original work. See e.g. how game modders work, where everyone has their own copy of the game. If the game was released with full source code, they would do even better. The freedom to distribute copies of the original work would be convenient, comfortable, but not strictly required.

GitHub is my copilot

Posted Jul 18, 2021 8:25 UTC (Sun) by pabs (subscriber, #43278) [Link]

You need the ability to distribute the entire codebase including dependencies, otherwise you end up in the situation where the vendor stopped distributing the source code and so you cannot apply the patches and rebuild. Or the vendor stopped distributing the old version you were using and your patches don't apply to the new version they are distributing. This is one of the reasons distros like Debian distribute the full upstream source code, rather than just pointers to it plus packaging.

GitHub is my copilot

Posted Jul 18, 2021 9:50 UTC (Sun) by smcv (subscriber, #53363) [Link] (6 responses)

> See e.g. how game modders work, where everyone has their own copy of the game

I suspect most game mods have to contain enough copied or modified from the original game that the copyright holder of the original game could shut them down as copyright-infringing, if they wanted to. In my experience, game modders usually rely on exemptions that don't actually exist either in law or in the game's EULA (for example "it's non-commercial, so it's fine"), and the game's copyright holder turns a blind eye to it because they recognise the value of mods in popularizing their games.

Perhaps copyright law should behave more like game modders think it does, and less like the overreaching reality, but that seems unlikely to happen while changes to it are primarily driven by the same few entertainment cartels.

GitHub is my copilot

Posted Jul 19, 2021 0:03 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (5 responses)

> In my experience, game modders usually rely on exemptions that don't actually exist either in law or in the game's EULA (for example "it's non-commercial, so it's fine"), and the game's copyright holder turns a blind eye to it because they recognise the value of mods in popularizing their games.

This greatly depends on the game. Skyrim, for example, is explicitly designed to encourage modding, with an EULA that specifically permits it. In recent years, Bethesda has even been selling mods for real money (with the modder taking a cut).

> Perhaps copyright law should behave more like game modders think it does, and less like the overreaching reality, but that seems unlikely to happen while changes to it are primarily driven by the same few entertainment cartels.

And... that's where the analogy falls apart. In my experience, Skyrim modders are some of the most unreasonable people on the internet. Behaviors I have seen:

* Uploading a mod, free for anyone to download, and then characterizing mirrors of that mod as "piracy."
* Characterizing attempts to fix or improve another mod as "piracy," even when the original mod is obviously broken. In one case, I believe a user threatened to contact law enforcement over this sort of thing.
* Requiring logins to download an otherwise free mod. Using this to try and prevent specific individuals from downloading specific mods (which are otherwise free for everyone).
* A modding tool that checks to see if you have installed certain applications which the author disapproves of. This check is not disclosed anywhere in the documentation, and the software refuses to run with a mysterious error message if it finds those applications.
* Characterizing direct downloads (where you don't look at their fancy web page first) as "piracy" or otherwise problematic.
* Removing mods for petty or ridiculous reasons.
* Insisting that people who are good at modding are always right, and anyone who disagrees with them, about anything, must be wrong.
* Miscellaneous internet toxicity.

Technically, most of these things are (to some extent) supported by existing copyright law. But IMHO that's because copyright law is ridiculous, not because the modders are right. If you suffer no economic harm from someone's "infringement," then you should not have a claim against them, plain and simple. Or at most, you should have a claim to nominal damages and equitable enforcement of the license (i.e. an injunction), not statutory damages of hundreds of thousands of dollars.

GitHub is my copilot

Posted Jul 19, 2021 2:08 UTC (Mon) by pabs (subscriber, #43278) [Link] (1 responses)

> If you suffer no economic harm from someone's "infringement," then you should not have a claim against them, plain and simple.

There is usually zero economic harm from GPL violations (at least easily demonstrable to the copyright holder), so I think I'm going to have to disagree with you here.

> Or at most, you should have a claim to nominal damages and equitable enforcement of the license (i.e. an injunction), not statutory damages of hundreds of thousands of dollars.

An injunction against GPL violations seems good but I don't think nominal damages are appropriate based on what I read on WikiPedia, instead it should be restitutionary/disgorgement damages (where they pay back their ill-gotten gains), or possibly punitive damages or both, or something like paying for consultancy to help them come back into GPL compliance, or paying legal costs for bringing a suit against whoever they received the violating software from.

https://en.wikipedia.org/wiki/Damages#Nominal_damages

GitHub is my copilot

Posted Jul 19, 2021 14:55 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

The thing you have to bear in mind is that courts tend to focus on money damages as the default means of making someone whole. Injunctions are dispreferred unless they are the only way to protect the plaintiff from some "irreparable harm." So under the current system, it's easy to sue for statutory damages, but hard to sue for GPL compliance. The most straightforward way to force GPL compliance is to essentially threaten the defendant with money damages and demand that they comply in a settlement agreement. But plenty of defendants will just blow you off and assume that you won't actually litigate.

OTOH, money damages are perfect for the proprietary software crowd because they become the profit that the company was trying to make in the first place.

Side note:

> WikiPedia

They stopped using CamelCase links in 2002. Why do people still write their name like this?

GitHub is my copilot

Posted Jul 19, 2021 16:50 UTC (Mon) by rgmoore (✭ supporter ✭, #75) [Link] (1 responses)

> If you suffer no economic harm from someone's "infringement," then you should not have a claim against them, plain and simple.

I strongly disagree. There should be a moral presumption against people taking others' work, even if they aren't trying to exploit it commercially. As an example, I am an amateur photographer. I like to share my photographs with friends and family and sometimes the whole world, but I have never tried to sell my photos. That shouldn't mean that anyone who sees them should be able to use them however they like without asking my permission. If copyright law is limited to economic losses, it means amateurs like me have no right to prevent others from using our work in ways we don't approve of.

GitHub is my copilot

Posted Jul 19, 2021 20:48 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

Well, it depends on how you look at copyright.

As it is described in the US Constitution, for example, copyright is a subsidy ("to promote the progress of science and the useful arts"). It has nothing to do with morality and is purely an economic scheme to incentivize people to create more stuff. This is why copyright originally had a relatively short term of 14 years with an optional 14 year extension. This is also why copyright explicitly does not protect ideas, concepts, etc. Countries other than the US have a separate scheme of "moral rights" which vest permanently in the author, cannot be sold, transferred, or renounced, and are more limited in scope than standard copyright (generally having to do with attribution, mutilation of the work, etc.). Perhaps the US should borrow this idea.

Corporate interests have, in recent years, found it more useful to characterize copyright as a form of property, giving us laws like the Copyright Term Extension Act of 1998. Perhaps you agree with that characterization, but it is instructive to look at the consequences which it has wrought before you assume that your characterization is the only correct one.

GitHub is my copilot

Posted Jul 30, 2021 1:19 UTC (Fri) by mrugiero (guest, #153040) [Link]

You can go even further back: Doom had a similar story, apart from the fact that the engine became open source much later. Final Doom is made up of mods that id commercialized, if memory serves.

GitHub is my copilot

Posted Aug 3, 2021 9:34 UTC (Tue) by nim-nim (subscriber, #34454) [Link]

> For end users there are arguably more annoying and immediate aspects of proprietary software, which don't depend on copyright law: Binary-only distribution, closed protocols, device lock ("tivoization"), etc.

All those depend on copyright law one way or another; without copyright, anyone could remove the anti-features and share the result.

However, Android and the rise of cloud services showed that openness is not sufficient in itself. Unless you make sharing of changes mandatory, a sufficiently rich actor can take over any mature open source codebase by adding just enough closed changes that other variants become uncompelling to work on.

Mandatory opening and sharing of changes is the only thing that enables Joe Nobody to work on a shared codebase at the same level as big corporations.

Corporate devs are not smarter or more motivated; corporations can just afford to out-spend individuals long enough for them to give up (in other economic domains that is called dumping).

GitHub is my copilot

Posted Jul 17, 2021 19:54 UTC (Sat) by scientes (guest, #83068) [Link] (1 responses)

I don't understand all the paranoia when you have already released libre-licensed code; however, this is an **incredibly** stupid idea.

GitHub is my copilot

Posted Jul 30, 2021 1:24 UTC (Fri) by mrugiero (guest, #153040) [Link]

The paranoia is about that code being used in an unintended way, i.e. in proprietary code when you wanted it to remain free, which is why you would use the GPL. Not all libre code is liberal code.

NOTE: I do not agree with the paranoia in terms of this being copyright infringement, but thinking about it does make sense because of that.

GitHub is my copilot

Posted Jul 18, 2021 22:19 UTC (Sun) by gerdesj (subscriber, #5446) [Link] (3 responses)

I've thought quite long and hard about this. To spare everyone my deliberations, I tender my conclusion:

Copilot should not be a chargeable service as-is. It should simply be part of bog-standard GitHub. We stash our code there and GH gets to monetise their usual enterprise services and launder data etc. We all get Copilot in return. If it is trained only on commons code and lacks the ability to derive an innovation itself then surely its output is commons too.

Otherwise, I suggest that GH only trains George on "enterprise" accounts and provides only those accounts the fruits of their labours.

Perhaps there could be two Copilots: one for commons code and one for enterprise code.

Yes, my tongue is nearly poking out of my cheek.

GitHub is my copilot

Posted Jul 23, 2021 15:27 UTC (Fri) by pjones (subscriber, #31722) [Link] (2 responses)

What they need to do is keep the training data, and make it automatically trace the data path that code took through the model, so when it adds some code in the middle of your program, it silently also adds the copyright and license terms for the code it's regurgitating at the top of the file.

This will definitely go well.

GitHub is my copilot

Posted Jul 26, 2021 13:33 UTC (Mon) by immibis (subscriber, #105511) [Link]

That's not how ML works.

GitHub is my copilot

Posted Jul 26, 2021 15:16 UTC (Mon) by geert (subscriber, #98403) [Link]

"The license of this proposed code is incompatible with the license of the existing code on lines 25-37 and 1547-1803. Please make a choice:
[A] Drop proposed code.
[B] Drop existing code."

It's not just the GPL or protective licenses

Posted Jul 22, 2021 20:04 UTC (Thu) by david.a.wheeler (subscriber, #72896) [Link]

There seems to be a lot of discussion about this being applied to software released under the GPL, and sometimes the LGPL or protective licenses.

I think that's misguided. Practically *ALL* open source software licenses have attribution requirements, and other requirements, that are *also* not being met by what's being done by this service.

Is it legal? I have no idea. I think this is really untested ground. There are several ways to look at this:

* Maybe this is "de minimis" use or otherwise fair use, in which case it's fine.

* Maybe this is more like a human learning from existing code; humans can write code after reading code (indeed, how else could humans learn?).

* Maybe this is more like creating a derivative work, in which case this is dubious from a copyright perspective.

I'm not a lawyer. I predict that if this starts becoming useful, a lot of lawyers *will* look at this :-).

GitHub is my copilot

Posted Jul 26, 2021 6:41 UTC (Mon) by taterbase (guest, #153426) [Link]

I can't help but think of Kurt Vonnegut's luddite novel Player Piano. In the first chapter the main character reminisces about a machine operator named Rudy. He's the best machinist in the plant and the two managers hook him up to a machine that records his movements. Once recorded, Rudy and the rest of the machinists are replaced by machines that essentially repeat this action over and over again. Optimization is made easy: by adding more horsepower, they're able to make production faster... but the machine is only able to replicate what it knows, never improvising. Rudy is proud of his contribution but ultimately relegated to an unfulfilling life where he's no longer needed, along with the rest of the engineers in the town.