Amazon's CodeWhisperer
Posted Jul 5, 2022 14:45 UTC (Tue) by NightMonkey (guest, #23051)
Parent article: Amazon's CodeWhisperer
So, let's say that a developer uses GitHub's Copilot or Amazon's CodeWhisperer or other similar code Mad Lib tools. They love the MIT or Apache-licensed code (maybe even some GPL2?) that they see and use lots of it. Six months later, a court finds that the 'training data' code is patented and is, therefore, no longer Free. What then for the developer? How are they alerted to this problem? Or is it only a problem for the services, not the developer? Cheers.
Posted Jul 5, 2022 16:16 UTC (Tue)
by dskoll (subscriber, #1630)
[Link] (3 responses)
Also not a lawyer, but I know a little about patents from a previous job. A patent is different from copyright. To infringe copyright, you have to distribute a work contrary to the terms of its license, or derive a work from a copyrighted work and distribute it contrary to the original work's license.
For a patent, the only thing that matters is what you do, not how you got there. So for example, when the LZW compression algorithm was patented, it wouldn't matter if you copied a reference implementation, created a brand-new implementation on your own, or used a Copilot-derived implementation... you'd still be infringing the patent.
If you do infringe on a patent, it's sometimes better not to know, because willful infringement carries a much higher penalty than inadvertent infringement.
I doubt Amazon or MSFT would be responsible for notifying users of their AI code-generating software about potential patent infringement... that risk lies entirely with the users.
Posted Jul 5, 2022 17:22 UTC (Tue)
by Wol (subscriber, #4433)
[Link] (2 responses)
So when CodeWhisperer makes a suggestion, it looks like it tells you where it got it from, and you have the information you need to do due diligence.
It seems Copilot doesn't bother ...
Cheers,
Wol
Posted Jul 6, 2022 14:59 UTC (Wed)
by khim (subscriber, #9252)
[Link] (1 responses)
If what you say is true then what it does is both more useful and safer. Sounds pretty nice in theory, at least. More like a search engine than a tool to write garbage.
Posted Jul 6, 2022 21:02 UTC (Wed)
by KJ7RRV (subscriber, #153595)
[Link]
Posted Jul 5, 2022 16:21 UTC (Tue)
by nim-nim (subscriber, #34454)
[Link] (2 responses)
That service aspect apart, it changes nothing for you as a consumer or publisher of code. The service can be sued as accessory to copyright infringement, but the infringement is still yours (unless the service promises legal insurance as part of its terms of use).
As a consumer, you’re still supposed to perform legal due diligence on the third party code you integrate.
As a publisher, you’re supposed to make sure your legal terms are clearly written and clearly notified.
Copyright is still the same dangerous hairball it was when AT&T published Unix (Lions book and all) and everyone involved ended up in court due to general carelessness.
Posted Jul 7, 2022 2:38 UTC (Thu)
by dvdeug (guest, #10998)
[Link] (1 responses)
Posted Jul 7, 2022 7:13 UTC (Thu)
by Wol (subscriber, #4433)
[Link]
They published Unix without any copyright notices, including removing any copyright notices from public code they incorporated. I think they officially relied on trade secret law.
Then, when they sued Berkeley they claimed copyright over the lot. That's why they were so desperate to keep the settlement quiet, because Berkeley's defence was "Hey, *you* removed our copyright notices, and now you're suing us for copying our own code!". At which point they were asked to *prove* which code was theirs and they were forced to respond "we don't actually have a clue".
Sadly, that's not the only case I know of where the copyright pirate has threatened action against the author.
Cheers,
Wol
Posted Jul 5, 2022 18:30 UTC (Tue)
by nickodell (subscriber, #125165)
[Link] (19 responses)
For a patent, if you invent the same thing as a previous patent, then you're infringing on that patent. It doesn't matter if you invented it independently. (However, the penalties for willful infringement are higher.)
For copyright, if you come up with the same idea, the way you came up with it matters. One interpretation is that language models are doing some form of reasoning, so a similar work appearing in the training data isn't necessarily proof that the language model is copying that previous work. Another interpretation is that a language model is just copying part of its input and changing a few things.
There are awkward effects for both possible interpretations. If you accept the first interpretation, then how do you measure whether a model is doing "enough" reasoning? If you accept the second interpretation, that implies that the output of e.g. GPT-3 is jointly owned by every person who's written anything on the internet. Practically speaking, it would become illegal to train an AI on Common Crawl data.
I don't think any court has ruled on it either way.
Posted Jul 5, 2022 18:46 UTC (Tue)
by nim-nim (subscriber, #34454)
[Link] (18 responses)
You can take all the words in a text, and arrange them in sentences meaning something else, and the result will be non-infringing.
You can take the same text, and replace every single word with a synonym, and the result will be definitely infringing. None of the words survived but the structure is still the same.
That makes models, that analyze the structure of the code being written, and suggest bits to make it closer to someone else’s structure, especially problematic.
Posted Jul 5, 2022 21:11 UTC (Tue)
by rgmoore (✭ supporter ✭, #75)
[Link] (16 responses)
The classic example with writing is that you can change the medium or genre of a work and it can still be a derivative. All those comic book movies are still derivatives of the original comics, even if they don't directly swipe story lines. Similarly, The Magnificent Seven is still a derivative of Seven Samurai even though the setting, character names, and even the language all changed.
That said, the functional nature of code makes it a more difficult case than something purely expressive like fiction or poetry. If there are few enough ways of achieving the same purpose efficiently, it's possible to argue the code is determined purely by functional constraints and therefore isn't expressive. This is especially true if the code is implementing a published algorithm, like quicksort or the sieve of Eratosthenes.
Posted Jul 6, 2022 5:48 UTC (Wed)
by nim-nim (subscriber, #34454)
[Link] (3 responses)
Posted Jul 6, 2022 11:15 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (1 responses)
Copyright is intended to protect creative expression. If there are only a limited number of ways to express something, then the resulting expression may not qualify for copyright protection - e.g. in C++11 code, there's only a few plausible ways[1] to check that a std::string is empty, and using one of those is unlikely to be protected by copyright even if it's a direct copy and paste from another code base.
[1] Two possible ways to get the size of the string natively (size() or length()) and compare it to 0. The empty() method. Using c_str() or data() and then checking whether the pointer points to a NUL character, or using strlen() to check the C string length. Comparing for equality to an empty string constant via either operator==() or compare().
And of course, there's the implausible ways that might conceivably be enough to get copyright protection depending on context - using find_first_not_of or find_last_not_of to find a character not in the empty string.
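To make the footnote concrete, here is a rough sketch of those variants side by side. The function names are made up for illustration; only the std::string and strlen calls come from the standard library. Note that the C-string variants give a different answer from the others if the string starts with an embedded NUL character.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper names; each one wraps a single way of asking
// "is this std::string empty?".

// Native size checks.
bool empty_via_size(const std::string& s)   { return s.size() == 0; }
bool empty_via_length(const std::string& s) { return s.length() == 0; }

// The dedicated method.
bool empty_via_empty(const std::string& s)  { return s.empty(); }

// C-string checks. c_str() never returns a null pointer; an empty
// string yields a pointer to a single NUL character. These two treat
// a string beginning with an embedded NUL as empty.
bool empty_via_c_str(const std::string& s)  { return *s.c_str() == '\0'; }
bool empty_via_strlen(const std::string& s) { return std::strlen(s.c_str()) == 0; }

// Comparison against an empty string constant.
bool empty_via_equals(const std::string& s)  { return s == ""; }
bool empty_via_compare(const std::string& s) { return s.compare("") == 0; }

// The "implausible" variant: look for any character that is not in
// the empty set. Only an empty string has no such character.
bool empty_via_find(const std::string& s) {
    return s.find_first_not_of("") == std::string::npos;
}
```

Every one of these is a single expression, which is the point: there is very little room for creative choice in any of them.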
Posted Jul 6, 2022 15:09 UTC (Wed)
by khim (subscriber, #9252)
[Link]
Code tends to be at odds with copyrightability: if what you are writing is convoluted enough to warrant copyright protection then very often it's convoluted enough to perform badly. There are always some exceptions like 0x5f3759df: if you are using that then you may be violating copyright, but if you use 0x5f375a86 (which is simply the best constant for that algorithm) then you are not violating. But what if you want to be compatible with some existing code? Quake, e.g.? Then you need 0x5f3759df to stay compatible! Ultimately problems like that mean that for every line of code you need a court decision… which is not practical, obviously.
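For readers who have not met those constants: they are the magic numbers in the fast inverse square root trick known from the Quake III Arena source. A minimal sketch, assuming 32-bit IEEE 754 floats, with the constant passed in as a parameter so both values from the comment can be tried (the function name and signature are invented for this illustration, not taken from Quake):

```cpp
#include <cstdint>
#include <cstring>

// Approximates 1/sqrt(x) for positive x using a bit-level initial
// guess and one Newton-Raphson step. Assumes float is a 32-bit
// IEEE 754 value. `magic` is the constant under discussion.
float fast_rsqrt(float x, std::uint32_t magic) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // reinterpret the float's bits
    bits = magic - (bits >> 1);            // initial guess for 1/sqrt(x)
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);  // one Newton-Raphson refinement
}
```

Calling fast_rsqrt(4.0f, 0x5f3759dfu) and fast_rsqrt(4.0f, 0x5f375a86u) both give a value close to 0.5, but the two results are not guaranteed to match bit for bit, which is why code that must reproduce existing behaviour exactly is stuck with the original constant.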
Posted Jul 6, 2022 17:32 UTC (Wed)
by rgmoore (✭ supporter ✭, #75)
[Link]
At least with CodeWhisperer it sounds as if you're shown the original code and the licensing terms, so you have a chance to comply with the terms if you want to. If you intend to violate the original licensing, that's on you, not on the software that shows you how others have done it.
Posted Jul 6, 2022 7:18 UTC (Wed)
by LtWorf (subscriber, #124958)
[Link] (10 responses)
Posted Jul 6, 2022 11:37 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (7 responses)
Depends when the original author died, and whether the new work is similar enough to the old work to qualify as a derived work.
For the death date side, if the original author died before 1950, then copyright term is over anyway, and no protection applies.
The "is it similar enough" side is more complex - the rules on whether a work is actually derived from another are complex and require a degree of human judgement - and this is the bit that Copilot and Hollywood both depend upon, in that something may be a copy, but not rise to the level of infringing copyright.
Posted Jul 6, 2022 17:08 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link]
This is true, but there are a rather surprisingly large number of additional "outs" in US copyright law (and generally *not* in the copyright law of any other country, because the US is weird):
* Published (only) outside the US: It's complicated, but probably not copyrighted if it was out of copyright in its home country on January 1, 1996.
Posted Jul 7, 2022 2:27 UTC (Thu)
by dvdeug (guest, #10998)
[Link] (5 responses)
Life + 70 is only true in part of the world, mainly Europe. The US copyright laws are very hairy, but anything published more than 95 years ago is in the public domain, and anything published since then may not be, with author death dates only mattering right now for works first published after 2002. Lots of the rest of the world is life+50 (e.g. China) and only a couple of nations are life+60, but that includes India, which has more people than the EU.
Posted Jul 7, 2022 7:54 UTC (Thu)
by farnz (subscriber, #17727)
[Link] (1 responses)
Note that you're talking about a whole load of ways in which, depending on where you are, something is out of copyright at life + 70 but also out of copyright before that. Hollywood tries to sell its movies globally, and thus wants to be on the upper limit of copyright - and even in the US, a story is out of copyright by that point.
Posted Jul 7, 2022 22:46 UTC (Thu)
by dvdeug (guest, #10998)
[Link]
Posted Jul 8, 2022 12:23 UTC (Fri)
by Ross (guest, #4065)
[Link] (2 responses)
This treatment in the US goes back to everything published in 1978 or later by individuals (not corporations), so it is definitely relevant and can very easily extend the term beyond 95 years. It will extend the duration of copyright for most works which would otherwise expire starting in 2048.
Posted Jul 8, 2022 13:38 UTC (Fri)
by dvdeug (guest, #10998)
[Link]
Posted Jul 8, 2022 13:47 UTC (Fri)
by dvdeug (guest, #10998)
[Link]
Posted Jul 6, 2022 11:52 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link] (1 responses)
However of course if the scriptwriter is dishonest they might not tell anybody where the original idea came from.
Realistically it's only going to be short works anyway. Condensing a novel into a movie loses almost everything except the outline plot - and even then it might take two or three movies to get it on film. There are a lot of old SF shorts with interesting ideas in them, but many would need a lot of work to sell a movie in the 21st century. It's notable how many of the original "Dangerous Visions" seem pretty tame now, or are outrageous for very different reasons. "If All Men Were Brothers, Would You Let One Marry Your Sister?" is the sort of thing you could probably do (but Hollywood wouldn't give you money for it) but it would go unnoticed, who cares? Likewise "Eutopia" in which the big deal is homosexuality. Or e.g. the mediocre Dick short "Faith of Our Fathers" which I'm sure modern people would guess is Phil Dick because of all the hallucinogenics, but is no "The Man in the High Castle".
I'd like it if the average "Sci Fi" movie I saw was as clever as "Raft of the Titanic" (what if the Titanic doesn't quite sink and many aboard survive...), never mind "Golem XIV" (instead of taking ages to discover that optimal play in Tic-tac-toe is a draw like in "Wargames", what if the machine the Americans built to plan World War III is categorically smarter than us) or "Orphanogenesis" (if we are just software, what happens if you just randomize the parameters and execute the resulting software in a virtual machine?).
Posted Jul 7, 2022 2:31 UTC (Thu)
by dvdeug (guest, #10998)
[Link]
Posted Jul 8, 2022 9:31 UTC (Fri)
by NAR (subscriber, #1313)
[Link]
Posted Jul 7, 2022 15:10 UTC (Thu)
by esemwy (guest, #83963)
[Link]
* Published (in the US) more than 95 years ago, and before 1978 (i.e. 1927 and earlier): Out of copyright per https://www.law.cornell.edu/uscode/text/17/304
* Published (in the US) before 1964, and copyright not manually renewed: Out of copyright (renewal is now automatic). Many older works fall under this exception, particularly anything that was seen as ephemeral or low-value. This includes quite a few pulp magazines, which you can now read on the Internet Archive for free.
* Published (in the US) before March 1, 1989, and no copyright notice or registration: Never copyrighted.
* Sound recording, published (in the US) before February 15, 1972: This used to be a complicated morass of state laws, but Congress fixed it in 2018, see https://www.law.cornell.edu/uscode/text/17/1401
So did that Mandalorian episode infringe on the Magnificent Seven/Seven Samurai copyrights?