
Maybe SCO had a point

Eric Raymond recently posted his own analysis of SCO's "stolen code" presentation. One unique thing that Eric did was to compare the code against the version found in a true SYSV Unix code base. Eric evidently has such a thing lying around; this is not a test that LWN was in a position to perform.

Based on his review of the various versions of the code in question and the comparison with SYSV, Eric came to the conclusion that the Linux version came from an ancient version of Unix. His reasoning goes:

Given this, there are two pieces of internal evidence that suggest 32V. One is that the function is split in two in SVr4 but single in 32V and Linux. A subtler indication is that one change between SVr4 and Linux would remove a cast (in the second ASSERT call). It is quite unlikely that a programmer casually copying code would go to the effort to remove a cast, and a guilty copier wouldn't do it when there are ways to obscure similarities that are both easier and less likely to spawn subtle bugs.

However, to many readers, a close look at Eric's posted diff suggests a different conclusion. The SYSV version contains code in common with the Linux version (such as the ASSERT() statements) that does not appear in any of the ancient Unix versions. The reorganizations Eric mentions are trivial, the sort of thing a programmer might do while making a piece of code work.

It is not that hard to conclude that the SGI engineer who produced this code took it from the closest thing he had at hand: a proprietary, SYSV-based Unix implementation. It is, among other things, the simplest explanation. SCO, perhaps, had a point. Despite all of its precursors in ancient versions of Unix, this particular bit of code appears to have been stolen from a proprietary code base.

The existence of some suspect code is not particularly surprising; LWN raised that possibility back in May. When you are dealing with as much code as the Linux community now handles, and with such a large number of contributors, the chances of something bad slipping through are actually pretty good. Linux is not alone in this; any other project, including proprietary developments, can be contaminated with bad code. By some accounts, proprietary code has a much worse record in this regard.

It looks like time for the community to face up to this fact: this is one of those times when something slipped through. Somebody (presumably) at SGI took a shortcut, and we got burned. In the 2.6 kernel, the right thing has already been done: that code is gone. It needs to come out of 2.4 as well.

This development does not really help SCO's case in any way. SCO cannot use it to go after users - they are not running the code in question, and they do not have it on their drives. And it is absolutely irrelevant to SCO's claims against IBM, which have nothing to do with copyright infringement. This code did not help Linux achieve its current capabilities in any way. Its removal does not hurt Linux. It is - if proved to be an infringement - a tiny one which resulted from carelessness and laziness, not malice. It is not responsible for SCO's problems.

But we should not gloss over the fact that the contribution of this bit of code to Linux does appear to have been a copyright infringement. The right way to deal with such problems is to acknowledge them, and to remove the bad code immediately. This will not be the last case of plagiarism that the community has to address; let's try to set a precedent for doing it right.



Maybe SCO had a point

Posted Aug 21, 2003 17:56 UTC (Thu) by rst (subscriber, #5098)

Ummm... points of law should perhaps be left to lawyers, and I am not one, but for this case to work in SCO's favor, they would have to have a copyrightable interest in the differences between the BSD-licensed 32V code and the SYS5 version. And in many contexts, a few lines of code (which is all we're talking about) are not enough to create a copyrightable interest -- de minimis non curat lex.

Maybe SCO had a point

Posted Aug 21, 2003 18:57 UTC (Thu) by Arker (guest, #14205)

I think you both have good points.

Certainly we shouldn't prejudge whether or not it was copyright infringement. I really doubt it is; the way I understand copyright law and textual analysis, it's hard to believe that there is any actual infringement here at all. But unless a judge rules on it, we can't be sure. Such code just shouldn't have been there.

I'm also disappointed with Raymond's glossing over the issue of the BSD-like license Caldera released this stuff under. It does have the advertising clause, I understand, and if so, then it's not GPL compatible. One would not know that from reading Raymond's article. It's really not a big thing - clearly this is nothing anywhere near the level of infringement that SCO's big talk has led anyone foolish enough to believe them to expect - but if we expect other people to follow our licenses, we shouldn't take a cavalier attitude toward theirs.

SCO is going to try to seize on any statements made, particularly by generally acknowledged community leaders like Raymond, and spin them to their benefit. Glossing over the difference between their BSD-ish license and the current GPL-compatible BSD license could give them just such an opportunity. And even if that wasn't a factor, we still should avoid such statements on general principles.

Maybe SCO had a point

Posted Aug 21, 2003 20:01 UTC (Thu) by adamruth (guest, #12380)

One thing that I haven't heard much about (or have skimmed over too quickly) is whether this code appeared in any version of BSD, and if so, which ones. Would this code have been part of the AT&T settlement with BSDI, so that BSDI could re-release it under any other license?

If the code appeared, properly, under a GPL-compatible BSD license, then it wouldn't really be a problem.

Maybe SCO had a point

Posted Aug 22, 2003 6:32 UTC (Fri) by Arker (guest, #14205)

It's my understanding that the answer is no. The code in question is descended from 32V or an earlier version of AT&T Unix. I did find, and the article mentions, some very similar code in 2.11BSD, but on close examination it looks like that's probably a separate modification of 32V. 4.4BSD-Lite (the first certainly unencumbered version) doesn't seem to have it. You're welcome to take a look yourself; I didn't look too closely, just ran a couple of pattern matches.

Maybe SCO had a point

Posted Aug 21, 2003 21:33 UTC (Thu) by taniwha (guest, #49)

> And even if that wasn't a factor, we still should avoid such statements on general principles.
I believe the appropriate quote is, "Anything you say can and will be used against you in a court of law."

GPL compatibility

Posted Aug 21, 2003 21:36 UTC (Thu) by BrucePerens (subscriber, #2510)

This is moot, because a version of the code was in the public domain, differing from the version used only by two uncopyrightable lines. Even if that were not the case, the advertising clause would only be a problem if a GPL copyright holder sued about it. And even then it would be hard to show that anyone could ever advertise a feature of the malloc in this driver, so the advertising clause would not activate.

The compatibility issue doesn't work the other way - the BSD license they used does not ban licenses that are incompatible with it.

The lack of attribution is a problem, but one that would create financial damage. If we were to keep the function, we'd have to remediate that.

Bruce

GPL compatibility

Posted Aug 21, 2003 23:40 UTC (Thu) by BrucePerens (subscriber, #2510)

Oops. That should read "but not one that would create financial damage."

GPL compatibility

Posted Aug 22, 2003 6:35 UTC (Fri) by Arker (guest, #14205)

Don't forget that Caldera has copyright on some significant sections of GPL code in the kernel.

GPL compatibility

Posted Aug 23, 2003 7:21 UTC (Sat) by mammothrept (guest, #14201)

Bruce,

I am not a programmer, but IAAL, and if it is correct that there is only a two-line difference in functional code between the public domain version and the SCO proprietary version, I think you are right that it is not copyrightable. The abstraction-filtration-comparison (AFC) test used in most circuits to analyze software copying will 'filter out' even identical copying of code if it is of an entirely functional nature. Even if it passed an AFC screening and were found to be copyrightable, the creative element in two lines of code is going to be so scant as to be trivial and therefore not infringing. Unless SCO can show identical copying of much larger sections of code in other files, I don't see them having much of a case.

Doug Steele

Maybe SCO had a point

Posted Aug 21, 2003 19:08 UTC (Thu) by dwalters (subscriber, #4207)

Excellent comments.

> The right way to deal with such problems is to acknowledge them, and to remove the bad code immediately. This will not be the last case of plagiarism that the community has to address; let's try to set a precedent for doing it right.

I think it was McBride who said that the world is divided into those who respect intellectual property, and those who don't. He is right. And as a community, we should show him and the world that we are among those who DO care about intellectual property, and do the right thing.

Maybe SCO had a point

Posted Aug 21, 2003 19:55 UTC (Thu) by coriordan (guest, #7544)

> The right way to deal with such problems is to acknowledge them
> and to remove the bad code immediately

For this to happen, the Linux hackers and SCO have to agree on what code must be removed. Discussion can't even begin until SCO say what code they have issues with.

> This will not be the last case of plagiarism that the community has to
> address; let's try to set a precedent for doing it right

This is an obvious truth followed by nonsensical drivel due to my first point.

If SCO allowed us to remove the offending code, they wouldn't have a strong case in court. Their probable motive is to discredit GNU/Linux and the GPL; they have no intention of playing fair. SCO hold all the cards, as will any future troublemakers.

We should "do the right thing"? What exactly is that?

Ciaran O'Riordan

Maybe SCO had a point

Posted Aug 21, 2003 21:22 UTC (Thu) by ccchips (guest, #3222)

Well... I wish people would start looking at history, especially the history of physical industries such as food (recipes), architecture (blueprints), and manufacturing (cement products, hardware, etc.), and show people who are willing to listen how openness has always won out in the long run.

And as far as copyright infringement is concerned, how the hell can anyone show copyright infringement if they won't show the infringer their copy?

As for SCO's spin on the deal, it's probably time to start writing and calling our supporters in industry. Remember, it is a benefit to hardware manufacturers if they can avoid unnecessary software costs. It should therefore benefit those people to help us correct any errors that may have been made.

It would also benefit them to help us to show that SCO is blowing trivial matters way out of proportion, instead of concentrating on their big issue (assuming they have one in the first place.)

It might also be good to ask SCO why they insist on distributing products like Samba on their proprietary platforms, while at the same time spouting their rhetoric.

Maybe SCO had a point

Posted Aug 21, 2003 21:39 UTC (Thu) by coriordan (guest, #7544)

> As for SCO's spin on the deal, it's probably time to start
> writing and calling our supporters in industry

Somebody has to do this, and you are a candidate (this applies to all of you).

We can't rely on "our supporters in industry"; they generally don't come through for us. We have to help ourselves.

And this issue is not the project most in need of work. In the EU, the patent inflation directive and the upcoming EU-DMCA are far more important. I'm sure other continents have plenty of their own problems. Of course ESR will jump on this issue, he'll jump on anything in the spotlight, others are busy doing truly useful work for less recognition. We need more of the latter. (And it would be nice if news sites gave them some recognition instead of printing yet another SCO (or ESR) article.)

Ciaran O'Riordan

Maybe SCO had a point

Posted Aug 22, 2003 5:16 UTC (Fri) by djabsolut (guest, #12799)

> Of course ESR will jump on this issue, he'll jump on anything in the spotlight, others are busy doing truly useful work for less recognition.
 
It is better to have a guy like ESR around than not - historical knowledge of UNIX and open source doesn't come from nowhere. He may be a bit biased, but aren't we all to some extent?
 

Maybe SCO had a point

Posted Aug 23, 2003 16:01 UTC (Sat) by coriordan (guest, #7544)

ESR does have good UNIX knowledge, but we have Bruce Perens, Linus Torvalds, and Ken Thompson commenting on this issue. I think they are completely competent.

This essay is possibly ESR's best work.

I don't think ESR is worthy of attention. I'd rather not have him around. I won't go into it at length now :)

Ciaran O'Riordan

Maybe SCO had a point

Posted Aug 22, 2003 3:45 UTC (Fri) by brouhaha (subscriber, #1698)

> If SCO allowed us to remove the offending code, they wouldn't have a strong case in court.
IANAL, but I believe that to be false. Suppose your neighbor doesn't keep his dog properly secured, and the dog routinely destroys your garden. There are two scenarios:

  1. You tell the neighbor he is causing you damage, but don't give details. The neighbor asks how he can fix the problem, but you refuse to tell him. The damage continues to occur, and you file suit against the neighbor. When you wind up in court, the judge asks you why you wouldn't specify the damages so that they could be corrected. The judge awards you damages for the problems that occurred before you notified your neighbor, but not for those that occurred later, because you didn't make a good faith effort to provide enough information for your neighbor to solve the problem.
  2. You tell your neighbor that the dog is getting loose. The neighbor fixes the problem. However, you still want the neighbor to pay for your flowers, and he refuses. You sue, and even though the problem has since been remedied, you present evidence that damage occurred. The judge awards you the damages.

Note that keeping the nature of the infringement secret, as SCO has been doing, does not in fact help their case at all. The judge will want to know why SCO chose not to reveal the specific Linux code in question (beyond the one trivial example they've shown) so that the Linux copyright holders could take positive action to stop infringing. SCO's claim that revealing the Linux code in question would damage their trade secrets is completely ludicrous; the Linux code is already public, so for SCO to state that lines 1287-1315 of foo.c infringe their copyrights would not damage any trade secrets. They do not have to show the System V code to do this, although it would eventually be necessary to prove infringement if they choose to litigate over the copyright.

Maybe SCO had a point

Posted Aug 22, 2003 6:06 UTC (Fri) by pto (guest, #5753)

SCO has already listed (in the recent PowerPoint presentation) that hundreds of files and hundreds of thousands of lines of code are infringing. Knowing what files these are won't help, I'm afraid.

It's like saying your neighbor's dog destroyed your Rolls-Royce, without any proof that any car ever existed. Your neighbor would be right not to pay up. And no judge will award you damages for anything, even justifiable damages, if you've lied about your case.

Maybe SCO had a point

Posted Aug 22, 2003 6:27 UTC (Fri) by sommere (guest, #14168)

But the hundreds of thousands of lines mainly include code that IBM, SGI, etc. own but MAYBE weren't allowed to contribute under contracts they signed.

If they were allowed to contribute them, then there is no problem. If they weren't, then those lines still don't support copyright claims, only contract claims against those specific companies.

The number of lines lifted directly from SCO's code is probably very small from everything I've read.

Maybe SCO had a point

Posted Aug 22, 2003 12:11 UTC (Fri) by lsweeks (guest, #14198)

>If SCO allowed us to remove the offending code, they wouldn't have a strong case in court.

Just the opposite, actually. If SCO showed the code they would be identifying the IP they claim ownership of in order to mitigate their damages as required under law. By taking actions that tend to continue or enhance the claimed damages of infringement, they are in essence destroying their claim to be acting in good faith on this issue. This is specifically barred by copyright law.

Maybe SCO had a point

Posted Aug 23, 2003 0:48 UTC (Sat) by bojan (subscriber, #14302)

> Just the opposite, actually. If SCO showed the code they would be identifying the IP they claim ownership of in order to mitigate their damages as required under law.

And that's in fact one of the key points that Prof. Moglen is making - they should absolutely go after the infringers - Linux distributors. Just imagine the impression of seriousness the judge will get when it turns out that no such thing was ever done. They are threatening users with infringement lawsuits (although the majority of users cannot be infringers anyway, given that they only used the software and did not copy or distribute it), while the real infringers continue to press thousands of CDs daily, allow downloads from Internet sites, and so on. And they infringed millions of times in the past too.

That's why their case looks flimsy. They are just not doing what anyone whose copyrighted material had been infringed would do. I have no idea what kind of legal strategy that is, but it sounds outright stupid.

And the nonsense about the code being secret is simply idiotic. How can something be secret when everyone has a copy of it?

Secrets

Posted Aug 24, 2003 19:32 UTC (Sun) by 263135 (guest, #14083)

> And the nonsense about the code being secret
> is simply idiotic. How can something be secret
> when everyone has a copy of it?

Shh! I have a secret message for you. Obtain a copy of 1984 by George Orwell and see part two, chapter six, first sentence.

Secrets

Posted Aug 28, 2003 4:55 UTC (Thu) by ekj (subscriber, #1524)

I don't have my copy in Germany. What does that sentence say ?

Secrets

Posted Sep 16, 2003 18:06 UTC (Tue) by Zakaelri (guest, #15087)

Part 2, Chapter 6, Sentence 1: "It had happened at last." (Orwell 130)
Cited: Orwell, George. 1984. 1950. New York: Signet.

Hope I did MLA correctly ;)

Maybe SCO had a point

Posted Aug 22, 2003 3:52 UTC (Fri) by daniel (subscriber, #3181)

> I think it was McBride who said that the world is divided into those who respect intellectual property, and those who don't. He is right. And as a community, we should show him and the world that we are among those who DO care about intellectual property, and do the right thing.

Of course we should. But at the same time, we should not let anyone forget that those who covet the intellectual property of others as SCO does, certainly belong to the second group.

Maybe SCO had a point

Posted Aug 21, 2003 19:11 UTC (Thu) by pbk (guest, #14280)

Well, I'm not really sure what it does for Linux's pedigree, but it sure does put the lie to that bullshit about luxury cars and bicycles!!

I know, I know, they already removed that from the amended complaint.

Maybe SCO had a point

Posted Aug 21, 2003 19:19 UTC (Thu) by ken (subscriber, #625)

It's a bit strange to see Eric come to the conclusion he does.

I would say that the two uses of ASSERT make it quite clear that this is indeed SVr4 code.

But how much is this code worth? 3 billion? Top secret as it is, and with so much research that must have gone into the creation of these lines, especially if you consider that the actual function of the code is the same as the version from '73 that is free to use.


Maybe SCO had a point

Posted Aug 21, 2003 19:40 UTC (Thu) by dwalters (subscriber, #4207)

ken, I think you're missing the point of corbet's article, here. The malloc case may well be fairly trivial, but moving forward, as a community we must not simply dismiss or gloss over these issues.

We're not just talking about SCO vs IBM here. Linux will survive whatever happens. But do we want it to survive as a dirty, underground operating system, with the reputation of being used only by those lawless "Napster types", or a beacon to the world of what can be achieved by community effort while respecting laws and people's intellectual property?

If indeed you're right ken, and this is indeed Sys V v4 code (contrary to Eric's analysis), the right thing to do is to throw our hands up and admit it, apologise, remove it from 2.4, and take measures to try and prevent this kind of thing from happening again.

Perhaps this calls for our own "pattern analysis team". Eric has already said that he and others HAVE the Sys V code (legally, although they cannot legally publish it). Couldn't someone in possession of it try to find what code might be common to both the Sys V and Linux code, and publish diffs, like Eric did, for us to peer review?

Maybe SCO had a point

Posted Aug 21, 2003 20:03 UTC (Thu) by ken (subscriber, #625)

I did not dismiss it, but SCO's reaction to this is in no way proportional to the "crime"; in fact, it is completely insane. It would have been no problem at all for them to show what code they were concerned about from the beginning, as there is absolutely no trade secret or anything like it in this code. In fact, the function of the code is common knowledge.

But it's also impossible to protect ourselves from this. How could you possibly test something against something that you do not have access to? It's always going to be the copyright holder's job to find the problem; after all, Linux is open and they are not. Had they simply posted a message saying where the problem was, this would have been resolved a long time ago.

Apologize?

Posted Aug 22, 2003 3:31 UTC (Fri) by kasperd (guest, #11842)

I don't see why I should apologize for something I didn't do. The person who actually "stole" the code (if that is even what happened) should apologize. Those of us who cannot check the code against closed sources, because we don't have access to the closed sources, are in no way responsible for what has happened. SCO is more responsible than the majority of Linux developers and users.

Apologize?

Posted Aug 31, 2003 6:54 UTC (Sun) by roncando (guest, #14643)

Well, why not?
But more important, a structural approach should be created to (at least try to) guarantee that other people's copyrighted code can no longer show up in GPL-ed software.

Apologize?

Posted Sep 7, 2003 19:53 UTC (Sun) by turpie (guest, #5219)

That sounds like a nice idea; I am looking forward to hearing how you propose this should be done. :)

How would you know whether the code someone gives you is copyrighted by someone else or not?

How do you know if I copied the above sentences from someone else?
You could paste them into Google to find out if they'd been copied from a public website, but if they weren't posted on the Internet, then how would you know?

Without hiring Harry Potter to cast some magic spell there would be no way to tell for sure.

---
Paul Turpie

Maybe SCO had a point

Posted Aug 22, 2003 7:33 UTC (Fri) by alonzo (subscriber, #2770)

But you've still got to wonder a bit about who actually put the code into the Linux kernel. Look at whose logo is at the top of the list on the Linux ia64 page:

http://www.linuxia64.org/

Maybe SCO had a point

Posted Aug 22, 2003 12:24 UTC (Fri) by lsweeks (guest, #14198)

> If indeed you're right ken, and this is indeed Sys V v4 code (contrary to Eric's analysis), the right thing to do is to throw our hands up and admit it, apologise, remove it from 2.4, and take measures to try and prevent this kind of thing from happening again.

This sounds correct as it stands. However if this alleged SysV r4 code snippet dates from the period between 1993 and 1998 (which I must admit I doubt, but am throwing up here just as a what if) it may very well be in the public domain as a result of Novell/USL's actions during that period of time when they were suing BSDi and the UC Regents. On the other hand I would be happy and proud to find out that this snippet had been replaced by something better that was innovated by OSS programmers. That may not be possible either since there are very few ways to implement this function according to other analysis I have read.

Easy to replace atealloc

Posted Aug 22, 2003 16:00 UTC (Fri) by emk (subscriber, #1128)

Just call the vastly superior and far more modern memory allocators in the Linux kernel. They're about 30 years more modern!

Maybe SCO had a point

Posted Aug 22, 2003 3:58 UTC (Fri) by daniel (subscriber, #3181)

> I would say that the two uses of ASSERT make it quite clear that this is indeed SVr4 code.

Even if it were, the SVr4 code is clearly derived, not an original work.

Maybe SCO had a point

Posted Aug 24, 2003 20:58 UTC (Sun) by glennthigpen (guest, #14417)

It is too early to go saying that the code was stolen. It is also too early to say that the code came from Sys V because of the asserts. The code was written by SGI and has their copyright notice. They evidently think they have the right to release it under the GPL.
The code itself is written along the lines of the 32V release, which has no copyright because AT&T failed to include the copyright notices, as noted in the judge's opinion in the USL vs. BSDI case.
The asserts themselves are there for debugging, according to ESR. It would be just as good an assumption that the coder took the 32V code and modernized it for Linux SMP, keeping the same structure as the 32V code.
The fact is, we do not know just how SGI derived the code. And that is a problem between SGI and SCO. This one is not as clear-cut as I would like it to be, and we cannot rule out any possible infringement at all. But it is something that would have been remedied very easily if SCO had shown it from the get-go and Linus felt that there was a problem there.
That was something that was said from the very beginning. "Show us the offending code, and if it indeed is infringing, it will be removed."
The Berkeley Packet Filter code is a completely different horse show, and there is little doubt in anyone's mind (except SCO) that Jay Schulist wrote that himself from published materials. However it seems that SCO claims that the BPF code in Sys V is theirs and it is in their codebase without the BSD copyright notice. Isn't there something about the "unclean hands" doctrine in IBM's response to SCO's lawsuit?

Glenn

Maybe SCO had a point

Posted Aug 21, 2003 19:26 UTC (Thu) by danw6144 (guest, #14336)

There exists a much simpler and more innocent explanation for the code similarities in SysVr4 and Linux: simple necessity, the mother of all invention.

The ASSERTs and other code similarities didn't exist in older Unixes because they weren't SMP. When implementing new SMP code with locking functions, it is completely natural to have the debugging ASSERTs at the points where the locking is being implemented. It simply wouldn't make sense to sprinkle ASSERTs among older fragments of the 32V code. Raymond plainly states there were obvious differences in the two implementations.

ESR states "The first ASSERT actually differs in a way that isn't trivial (the Linux version excludes a size argument of zero). And there is a simple functional reason for the locking calls; 32V didn't do SMP (Symmetric Multi-Processing), but both SysVr4 and Linux do."

If you were implementing new SMP code, where would your debug points be placed? Another hint is Raymond's steak dinner remark. It implies SysVr4 suffered from a lack of maintenance: old debugging junk was left in the code.
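danw6144's point about where debugging checks land in an SMP rewrite is easier to see in code. The sketch below is a hypothetical first-fit resource-map allocator in the style of the classic Unix malloc(mp, size) routine; it is not the SGI, SVr4 or 32V source, and the names map_alloc, struct resource_map and struct map_entry are invented for illustration. It assumes a POSIX mutex as a stand-in for a kernel lock and the standard assert() as a stand-in for ASSERT. Note where the SMP-era additions fall: the lock brackets the walk of the free list, and the zero-size check sits at function entry.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel ASSERT(): active unless NDEBUG is set. */
#define ASSERT(expr) assert(expr)

/* One free segment of the map; an entry with size == 0 ends the map. */
struct map_entry {
    size_t size;  /* number of free units in this segment */
    size_t addr;  /* first free unit of this segment */
};

/* Hypothetical map type: a lock (the SMP addition) plus a sorted free list. */
struct resource_map {
    pthread_mutex_t lock;
    struct map_entry ent[16];  /* must always contain a zero-size terminator */
};

/*
 * First-fit allocation of `size` units from `mp`.
 * Returns the base of the allocated range, or 0 if no segment is large
 * enough, so unit 0 must never be placed in the map.
 */
size_t map_alloc(struct resource_map *mp, size_t size)
{
    ASSERT(size > 0);  /* reject zero-size requests on entry */

    pthread_mutex_lock(&mp->lock);  /* serialize every walk of the free list */
    for (struct map_entry *bp = mp->ent; bp->size != 0; bp++) {
        if (bp->size < size)
            continue;  /* too small: try the next segment */

        size_t base = bp->addr;
        bp->addr += size;
        bp->size -= size;
        if (bp->size == 0) {
            /* Segment used up: slide later entries (and the terminator) down. */
            do {
                bp++;
                bp[-1] = bp[0];
            } while (bp->size != 0);
        }
        pthread_mutex_unlock(&mp->lock);
        return base;
    }
    pthread_mutex_unlock(&mp->lock);
    return 0;  /* no segment was large enough */
}
```

Freeing is left out; a matching release routine would reinsert a range and merge it with its neighbours under the same lock. Whether the code actually at issue placed its ASSERTs this way is a question for the diff Eric posted, not for this sketch.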

Maybe SCO had a point

Posted Aug 21, 2003 20:49 UTC (Thu) by ken (subscriber, #625)

It's not at all obvious to insert the ASSERT macro; at least, I would not have done it.

Maybe SCO had a point

Posted Aug 22, 2003 1:46 UTC (Fri) by Wol (guest, #4433)

You forget: didn't Alan Cox write the Linux SMP code, with assistance from Caldera? And when did SMP go into Unix?

I don't know my timelines, but it's quite possible - and reasonable - that Caldera asked Alan's permission to put his SMP code into Unix. Seeing as it was their money that helped pay for the code to be written (and helped pay for Alan's reputation) it would have been extremely churlish for him to have refused!

OK, I'm likely wrong. But these things are happening so fast, people take time to respond, and it wouldn't surprise me if Alan is only now waking up to this story (it seems to have happened during our night ...)

Cheers,
Wol

Maybe SCO had a point ?

Posted Aug 21, 2003 19:29 UTC (Thu) by mmarq (guest, #2332)

"This development does not really help SCO's case in any way"

But it can help them in every way, because their business is FUD, FUD, and not the pursuit of the truth. IMHO the algorithm was BSD licensed "first," even by Caldera, so the conclusions are obvious! And forensic work isn't really necessary for SCO, is it, especially when they are determined to "pattern" Linux/OSS as thieves, because they could always argue that they don't know very well what was stolen, whether it was really 80 lines, thousands, or millions...

Yes, there is a problem

Posted Aug 21, 2003 19:36 UTC (Thu) by cwitty (subscriber, #4600)

Bravo! This is the first pro-Linux journalism I've seen in the past couple of days that admits the likelihood that there really is a problem. Regardless of the age of the code, and the silliness of any claims based on such code that Linux could not be enterprise-ready without UNIX, it seems almost certain that someone either:

1) copied code from some file containing a copyright notice, without following the terms of the notice (even if the code was from some BSD-licensed source, the attribution requirements were not followed), or

2) copied code from 32V, or some other probably-public-domain source, without attribution.

If somebody deliberately decided that the 32V code is public domain, and copied the code based on that belief, I'd actually consider that worse than some "accidental" copying by somebody who wasn't thinking about the issues. (I don't know if the judge's comment:
> Consequently, I find that Plaintiff has failed to
> demonstrate a likelihood that it can successfully defend its
> copyright in 32V.
is binding in subsequent court cases, and I don't like the idea of somebody submitting code to Linux with even a faint chance of it being tainted; also, even if you're copying from a public domain source, it's only polite to acknowledge the source.)

I think it's bad that people are going to such efforts to explain why this is just a minor violation, that doesn't really matter (or maybe it's not a violation at all). We want to have the high moral ground. In my opinion, this means we should not use code that hasn't been explicitly released for our use. We should not rely on defenses like "Even if it was copied from SysV, that's almost like 32V, and that's probably public domain", or even "it was copied from 32V, which is public domain" (since the public-domain status of 32V is accidental, rather than deliberate). We should acknowledge a problem, and fix it.

(Of course, the previous paragraph would not apply if this actually went to court; somebody in a court case should use every defense they can, "high moral ground" or not. But this code is, as far as I know, not yet involved in a court case.)

Yes, there is a problem

Posted Aug 21, 2003 20:35 UTC (Thu) by mmarq (guest, #2332) [Link]

OK, who was the guy?

"...or maybe it's not a violation at all..."

"We" had better make up our minds quickly, because a torrent of FUD is coming this way!

I could have been (I wasn't, really) the one copying it from 32V: I evolved it for SMP, and it happens to be identical to SysV in 2 or 3 lines that weren't in the original algorithm, in places where IT COULDN'T BE HELPED...

COULDN'T USL OR NOVELL HAVE COPIED IT WHEN IT WAS DEVELOPED FOR SGI SYSTEMS?

IMHO, "WE" WILL NEVER REALLY KNOW!

In Linux the copyright notice got dropped. So what? That doesn't make it illegal copying to the point of having to erase that code. Why not just insert the proper copyright notice...

It is SCO's task to prove precedence, which they can't, because for such a trivial piece of code they would already have shown the evidence in black and white...

FOR THE SAKE OF THE TRUTH, it is the Linux/OSS community's task not to get paranoid about matters that, I believe, they cannot prove with "TOTAL EXACTNESS"!

The Linux/OSS community is letting SCO & Ma$ter win with their FUD through an excess of solicitude.

HOW GOOD THOSE SCO & Ma$ter FUD BASTARDS ARE, HUH?


Yes, there is a problem

Posted Aug 22, 2003 13:28 UTC (Fri) by ccchips (guest, #3222) [Link]

I say we simply take the high moral ground:

1. We don't steal code.
2. Code thieves will not be welcome.
3. IF SCO has been wronged, we'll fix it. *IF*.
4. This does not give SCO any excuse to: (1) cry "conspiracy," (2) disparage the Linux community, or (3) disparage the GPL.
5. (And this is most important:) make sure everyone is aware that there is a certain amount of obscurity here. Caldera was involved with Linux, as were SGI, IBM, Hewlett-Packard, and others.
6. Caldera's behavior WRT their copyrights and how they were or were not being enforced *counts.* If anyone at Caldera knew about any of this and looked the other way, it *counts.* And if Caldera staff were shuffling code back and forth between Linux and proprietary UNIX, with no regard for the GPL, thinking neither of us would care as long as they were helping Linux, that counts, too.

Yes, there is a problem

Posted Aug 22, 2003 13:05 UTC (Fri) by urulokion (guest, #14350) [Link]

"(I don't know if the judge's comment:
> Consequently, I find that Plaintiff has failed to
> demonstrate a likelihood that it can successfully defend its
> copyright in 32V.
is binding in subsequent court cases,..."

After reading a bit more about the judge's comment, it's all but binding.

The 32V source code was widely published without a copyright notice. This put its copyright protection at risk. But copyright law has a number of tests and legal remedies that could be applied to prevent the loss of copyright. Among them are a test of whether the work was published only in small amounts and/or to a closed group, and registering the copyright within 5 years of the date of first publication.

None of the legal remedies were followed, and all of the legal tests that applied failed. So if the judge had ruled on it, he most likely would have ruled against the Plaintiff.

The judge didn't have to rule on it because USL and BSD, Inc., et al. reached an out-of-court settlement.

Maybe SCO had a point

Posted Aug 21, 2003 20:33 UTC (Thu) by rfunk (subscriber, #4054) [Link]

I think Eric's diff makes things quite unclear.

You can use "patch -lR" to reconstruct the SysV code from his diff and the Linux code. Then do a "diff -wU 4" between the two, and you'll get a full and clear listing of the similarities and differences. You might also want to do the "diff -wU 4" between the 32V and Linux code.

In order for the Linux code to be derived from the SysV code, the Linux person at SGI would have had to move an ASSERT(size > 0) to before an "if (size == 0) return NULL", AND change it to ASSERT(size >= 0), so that the "if clause" is useless. This is the best case for the Linux code not having come from SysV, and it can easily be chalked up to incompetence on the part of the SGI programmer.

After doing both unified diffs (32v-Linux and SysV-Linux) I'm afraid I'm leaning toward Jon's conclusion rather than Eric's.

Maybe SCO had a point

Posted Aug 21, 2003 23:10 UTC (Thu) by ekonijn (subscriber, #6395) [Link]

In both cases, the assert protects against negative values
of size; nothing wrong there. The if statement is not quite useless,
it's the difference between an oops and a running system.
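A minimal C sketch of the point being made here, assuming a signed size parameter and using the standard assert() as a stand-in for the kernel's ASSERT(). The function checked_alloc() and its tiny static pool are hypothetical; this is not the SGI, 32V, or SysV code. With the assertion written as size >= 0, it only traps negative sizes, so the following size == 0 test is still reachable and turns a zero-length request into a NULL return instead of a crash.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocator front end with a tiny static pool; it only
 * illustrates the ASSERT/early-return ordering being discussed and does
 * no alignment handling. */
static char pool[64];
static size_t pool_used;

void *checked_alloc(long size)
{
    /* Traps only negative sizes, which indicate a caller bug. */
    assert(size >= 0);

    /* Still reachable for size == 0, so a zero-length request is
     * refused gracefully instead of bringing the system down. */
    if (size == 0)
        return NULL;

    /* Refuse requests that do not fit in the remaining pool space.
     * (If NDEBUG disables the assert, a negative size becomes huge
     * after the cast and is refused here too.) */
    if ((size_t)size > sizeof(pool) - pool_used)
        return NULL;

    void *p = &pool[pool_used];
    pool_used += (size_t)size;
    return p;
}
```

Calling checked_alloc(0) simply returns NULL, while checked_alloc(-1) aborts through assert() unless NDEBUG is defined.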

Maybe SCO had a point

Posted Aug 21, 2003 23:23 UTC (Thu) by rfunk (subscriber, #4054) [Link]

Sorry, I was looking at the ASSERT backwards.

I still agree with Jon Corbet's conclusion, though; the Linux code is closer to the SysV code than to the 32V code.

Still not a copyright infringement

Posted Aug 21, 2003 21:28 UTC (Thu) by BrucePerens (subscriber, #2510) [Link]

If this code appears in System V with the two ASSERT statements, something that I have not verified because I don't have access to System V, then it is two lines different from a version which was not copyrighted. The difference is not sufficient to be copyrightable.

But there is no denying that someone, probably at SGI, was sloppy, and that we lucked out that this wasn't much worse. Thus, it behooves all of you who have access to licensed Unix code to take a copy of the latest development kernel and cross-check the code base.

Bruce

Still not a copyright infringement

Posted Aug 21, 2003 23:28 UTC (Thu) by rjamestaylor (guest, #339) [Link]

Question: if I had access to Licensed UNIX codebase (I don't) and I "diffed" it against Linux, how would I alert the Linux development community about my findings without:
  • Breaking non-disclosure agreements by clearly demonstrating the copying from the original
  • Opening the door for unscrupulous people to "claim" such-and-such a module infringes when it may actually not
Is the best way to make such a disclosure the way ESR has done, i.e., to present a minimal diff, or perhaps a Linux-side diff only?

I know I'd want more than someone's word that code in Linux was copied from licensed UNIX before the community got in gear to remove that code. What's the standard of proof?

Still not a copyright infringement

Posted Aug 21, 2003 23:38 UTC (Thu) by BrucePerens (subscriber, #2510) [Link]

More than one person we know has access to the System V code base. Some of them are known to us and trusted enough to give us no more than a list of file names and line numbers in the Linux kernel that might be questionable. That discloses none of the SCO art. And of course that is not a signal to just remove the code. We would first check the provenance of the code, and would often find that it was something used in System V but not owned by SCO, like the BPF code I reported on.

Of course, you can even cross-check one friend against another, or ask one to verify what another has reported.

Don't you think this is already going on, quietly? I would have expected folks at a number of companies to have been on this months ago.

And by the way, if there was a significant infringement in a piece of code that people cared about - not in a prototype SGI driver for models that were never sold - someone would already have noticed.

Thanks

Bruce

Still not a copyright infringement

Posted Aug 22, 2003 7:33 UTC (Fri) by dwalters (subscriber, #4207) [Link]

if there was a significant infringement in a piece of code that people cared about - not in a prototype SGI driver for models that were never sold - someone would already have noticed.

This is a good point, and Linus has said that the open source development model effectively allows this to happen.

Don't you think this is already going on, quietly?

That depends on just how many people actually

  • legally have access to the Sys V source code
  • have the time and motivation to perform detailed "pattern analysis" between the two code bases

I don't know what tools already exist to do this, but it strikes me that a useful tool for the community to develop would be a program to statistically analyse the two code bases and come up with suspicious matches. Of course it would be better for SCO to just make public what files, version and line numbers in Linux they think are infringing, but apart from a couple of examples, they don't appear to be doing that.

Still not a copyright infringement

Posted Aug 22, 2003 8:54 UTC (Fri) by jmitchel (guest, #11611) [Link]

Don't you think this is already going on, quietly?

I believe this is more a case of plausible deniability than a wishful statement. Bruce's post sounds far more like a description of what is happening, framed in a deniable way, than a plan for action.

For that matter, if I cared enough I could probably get access to some generation of SYSV source, having worked with some decently connected people at Bell Labs. Now that I think of it, it would be tempting to try to find it. In the long run it would be far more edifying for me and dog+world to try matching Version 7 vintage code, to develop tools that others can use to do fast, efficient searches.

Still not a copyright infringement

Posted Aug 22, 2003 6:44 UTC (Fri) by sommere (guest, #14168) [Link]

There is a largely quiet field of CS research on how to find CS students who plagiarize. The researchers, for the most part, don't publicize their findings, so that students can't check whether they have obfuscated their code enough.

I did some poking around to see if I could figure out how they work, and here is what I found:

1) They typically remove all tokens (words) except the keywords (so variable names don't matter).
2) They often treat equivalent keywords as the same (for and while can be used in equivalent ways).
3) They usually use an algorithm called "Running Karp Rabin" to find strings of matching tokens in two files. This algorithm is resistant to simply reordering the functions (for example, it finds strings of 6 or more tokens that match anywhere in the file).


This is likely what SCO's pattern matching team is doing, and someone on our side with access to System V should be doing it too.

I wrote a Java program to test whether this algorithm actually finds cheaters (it did). Feel free to e-mail me (lwn at ethanet.com) for more info/source.
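A simplified C sketch of the three steps listed above, under several assumptions: only a small, hypothetical subset of C keywords is kept, while is folded onto for, and every k-token window is hashed with a Rabin-Karp rolling hash and looked up anywhere in the other file. It omits the greedy string tiling step of the full Running Karp-Rabin Greedy String Tiling algorithm, and it is not the Java program mentioned in the post. The names keyword_id(), tokenize(), window_hashes() and shared_windows() are illustrative only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 1024
#define BASE 257ULL
#define MOD 1000000007ULL

/* Hypothetical keyword subset; a real tool would list every C keyword. */
static const char *keywords[] = {
    "if", "else", "for", "while", "do", "return", "break", "continue",
    "switch", "case", "struct", "int", "long", "unsigned", "char", "void",
};
#define NKEYWORDS (sizeof keywords / sizeof keywords[0])

/* Return a 1-based keyword id, or 0 for identifiers. "while" is folded
 * onto "for" because the two loops can express the same thing. */
static int keyword_id(const char *w, size_t len)
{
    for (size_t i = 0; i < NKEYWORDS; i++)
        if (strlen(keywords[i]) == len && strncmp(keywords[i], w, len) == 0)
            return strcmp(keywords[i], "while") == 0 ? keyword_id("for", 3)
                                                     : (int)i + 1;
    return 0;
}

/* Keep only keyword tokens, so renaming variables changes nothing. */
static size_t tokenize(const char *src, int *out, size_t max)
{
    size_t n = 0;
    for (const char *p = src; *p != '\0' && n < max;) {
        if (isalpha((unsigned char)*p) || *p == '_') {
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            int id = keyword_id(start, (size_t)(p - start));
            if (id != 0)
                out[n++] = id;
        } else {
            p++; /* punctuation, digits, whitespace: skipped */
        }
    }
    return n;
}

/* Rabin-Karp rolling hash of every window of k tokens; returns the count. */
static size_t window_hashes(const int *tok, size_t n, size_t k,
                            unsigned long long *out)
{
    assert(k > 0);
    if (n < k)
        return 0;
    unsigned long long h = 0, top = 1; /* top = BASE^(k-1) mod MOD */
    for (size_t i = 0; i < k; i++) {
        h = (h * BASE + (unsigned long long)tok[i]) % MOD;
        if (i + 1 < k)
            top = top * BASE % MOD;
    }
    out[0] = h;
    for (size_t i = k; i < n; i++) {
        /* Remove the token leaving the window, then append the new one. */
        h = (h + MOD - top * (unsigned long long)tok[i - k] % MOD) % MOD;
        h = (h * BASE + (unsigned long long)tok[i]) % MOD;
        out[i - k + 1] = h;
    }
    return n - k + 1;
}

/* Count windows of a whose hash occurs anywhere in b. Equal hashes are
 * only candidates; a person still has to compare the actual code. */
size_t shared_windows(const char *a, const char *b, size_t k)
{
    int ta[MAX_TOKENS], tb[MAX_TOKENS];
    unsigned long long ha[MAX_TOKENS], hb[MAX_TOKENS];
    size_t nta = tokenize(a, ta, MAX_TOKENS);
    size_t ntb = tokenize(b, tb, MAX_TOKENS);
    size_t na = window_hashes(ta, nta, k, ha);
    size_t nb = window_hashes(tb, ntb, k, hb);
    size_t hits = 0;
    for (size_t i = 0; i < na; i++) {
        for (size_t j = 0; j < nb; j++) {
            if (ha[i] == hb[j]) {
                hits++;
                break;
            }
        }
    }
    return hits;
}
```

With k = 3, "while (x) { if (y) return 0; else break; }" and "for (;count;) { if (flag) return 1; else break; }" share all three windows despite the different loop keyword and variable names. A real run would use longer windows, such as the 6 tokens mentioned above, and every hit still needs human review.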

Still not a copyright infringement

Posted Aug 22, 2003 13:03 UTC (Fri) by dark (subscriber, #8483) [Link]

Umm, that algorithm sounds like it will find loads of false positives. Most student assignments are pretty simple, with only one or two obvious ways of deriving a correct solution.

Still not a copyright infringement

Posted Aug 22, 2003 16:17 UTC (Fri) by sommere (guest, #14168) [Link]

The programs don't expel students automatically :) Yes, it takes a human to use common sense and figure out if one was actually copied from the other. But it gives you somewhere to start.

Still not a copyright infringement

Posted Sep 16, 2003 19:10 UTC (Tue) by Zakaelri (guest, #15087) [Link]

I would think this is (approximately) how the whole process would be
done:

1) Do a unified diff on Linux vs System V
2) Do a unified diff on Linux vs 32V
3) Compare the diffs and remove any duplicate lines (this would extract the
things that are identical between System V, 32V, and Linux...
leaving only the changes.)
4) Inspect the remaining lines in the System V diff, then figure out where
they came from in the linux codebase (line numbers for each file).
5) Send the resultant line numbers/files/etc to the kernel list, so they
can investigate where they all came from, and change anything that should
be changed.
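Step 3 of the outline above amounts to a set comparison: Linux lines that also occur in System V but not in 32V are the ones worth inspecting. Here is a minimal C sketch, assuming each source has already been split into an array of lines and that whitespace should be ignored, as with diff -w. The function names are hypothetical. Because it compares lines as sets, it ignores line order, but it also loses the context a real unified diff would keep.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Compare two lines ignoring all whitespace, roughly like diff -w. */
static int same_ignoring_ws(const char *a, const char *b)
{
    for (;;) {
        while (isspace((unsigned char)*a))
            a++;
        while (isspace((unsigned char)*b))
            b++;
        if (*a != *b)
            return 0;
        if (*a == '\0')
            return 1;
        a++;
        b++;
    }
}

/* Does any line in lines[0..n) match line, ignoring whitespace? */
static int contains(const char *const *lines, size_t n, const char *line)
{
    for (size_t i = 0; i < n; i++)
        if (same_ignoring_ws(lines[i], line))
            return 1;
    return 0;
}

/* Store in out[] the indices of Linux lines that also appear in the
 * System V source but not in 32V; return how many were found. */
size_t suspicious_lines(const char *const *linux_src, size_t n_linux,
                        const char *const *sysv, size_t n_sysv,
                        const char *const *v32, size_t n_v32,
                        size_t *out)
{
    assert(out != NULL || n_linux == 0);
    size_t found = 0;
    for (size_t i = 0; i < n_linux; i++)
        if (contains(sysv, n_sysv, linux_src[i]) &&
            !contains(v32, n_v32, linux_src[i]))
            out[found++] = i;
    return found;
}
```

For example, a Linux line ASSERT(size >= 0); that appears in the SysV file (even as ASSERT( size>=0 );) but not in the 32V file is reported, while if (size == 0), present in all three, is not.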

The only problem with this technique is that the files would have to have
the same names/etc for it to work... and in some of the cases (such as the
SGI code case, already dissected by Bruce Perens and (by this point)
probably numerous others, if I recall correctly) this is not the case.

If there are files that are renamed, or code that is copyrighted but used
elsewhere in the same code, then the algorithm will probably become
NP-complete pretty quickly.

Still not a copyright infringement

Posted Sep 16, 2003 19:20 UTC (Tue) by Zakaelri (guest, #15087) [Link]

I am very sorry for the unreadability of the third and last paragraph in
the above post... Let's see if I can make sense of them:

The only problem with this technique is that the files would have to have
the same names/etc for it to work... If memory serves, there were some
cases (such as the SGI driver code, previously dissected and discussed by
Bruce Perens) where the copied code was actually in a different file than
its System V equivalent. This makes the problem of finding copyright
infringement in the current codebase unmanageable... the complexity
of the algorithm necessary to complete such a task might as well be
NP-complete.

See srcdupchk to compare source trees

Posted Aug 22, 2003 9:34 UTC (Fri) by emk (subscriber, #1128) [Link]

I wrote a utility called srcdupchk to encourage SCO to do the right thing and report suspicious code--so far, they haven't. It's available on freshmeat. If there's somebody who has the legal rights to run srcdupchk, and has a legal use for its output, please let them know about it. It uses the rolling hash technique I saw proposed in an article a few months ago.

Where did the patch come from?

Posted Aug 21, 2003 23:07 UTC (Thu) by wyszynski (subscriber, #4988) [Link]

The source of the original patch to Linux goes back even further: it can be found in the IA-64 patch for 2.4.16, linux-2.4.16-ia64-011126.diff.gz. This appears to be about 10 months (26 November 2001) before it got into 2.4.19.

Didn't this patch come from the Trillian/IA-64 Project? Caldera was an active member of that project. Weren't they checking the stuff that was released by the project?

Big patch - how does the rest compare?

Posted Aug 21, 2003 23:40 UTC (Thu) by rjamestaylor (guest, #339) [Link]

    diff -urN linux-2.4.16/arch/ia64/sn/io/ate_utils.c lia64-2.4/arch/ia64/sn/io/ate_utils.c
    --- linux-2.4.16/arch/ia64/sn/io/ate_utils.c Wed Dec 31 16:00:00 1969
    +++ lia64-2.4/arch/ia64/sn/io/ate_utils.c Mon Nov 19 23:22:50 2001
    @@ -0,0 +1,206 @@
That's a rather large, complete patch -- against nothing previous (if I read the diff correctly) -- how similar is the REST to ancient UNIXes and SVRx?

Maybe/Maybe Not.

Posted Aug 21, 2003 23:20 UTC (Thu) by namaseit (guest, #13940) [Link]

I will agree. The process of accepting code into the Linux kernel should maybe be more refined. Not that it isn't a careful process already, but it would save us from this happening again. I have no clue how to accomplish this. Maybe the people whose code is accepted should be checked on more, to make sure some employee of XYZ Inc. isn't putting in his company's code. But this would almost definitely ruin the kernel process, in which anyone can send in a patch and the best code wins. Now we would have to check on the people sending in patches and fixes, to make sure they aren't affiliated with any company, and then ask their employers whether any of the code they are submitting belongs to them. That's all I can think of. How do you check on code you can't see, and how do you refine a process that is built on people sending in code? I mean, the kernel developers write at least 50k lines of code *each* month. That is insane. How do you check all of that?

This is where I see OSS and proprietary software meeting and clashing. It's life. But we have to get beyond this and figure out as a community how we avoid these kinds of situations. This will, I fear, hinder the OSS process we all use. Besides SCO, I honestly doubt there are any other companies who would say we have infringed on their code. How many others would be nuts enough to risk their entire business on the maybe/maybe not of their code being in Linux? SCO is doing this as a last-chance tactic. And as OSS grows in the market and the world, companies like Microsoft will try much harder to destroy us and all we have accomplished and stand for, because we threaten everything *they* stand for. But what they don't understand is that they can't kill Linux, because it's an idea. They can slow its adoption, but it will just come back. They can slam it with FUD, but the community doesn't care. Linux is here to stay.

Maybe SCO had a point

Posted Aug 22, 2003 1:03 UTC (Fri) by dmantione (guest, #4640) [Link]

Whatever.

There can be only one outcome if this goes to court. Even assuming there was a violation of both the BSD license under which the code was licensed and the GPL of the code it was combined with, neither SCO nor any other party has suffered damages from it. SCO's claim will be rejected.

Copyright violation is normally a criminal act. In addition, the party who has suffered damage can claim compensation. There is no criminal prosecution going on here, and there won't be. Combining incompatible free licenses does not, in general, cause damages to anyone. There should be nothing to worry about.

Maybe SCO had a point

Posted Aug 22, 2003 13:22 UTC (Fri) by urulokion (guest, #14350) [Link]

"Copyright violation is normally a criminal act. In adition, the party who has suffered damaged can claim compensation. ..."

Not entirely correct. Until very recently, copyright infringement was a civil tort, handled entirely in civil courts.

Copyright infringement became a federal criminal offense in the late '90s with the NET Act (No Electronic Theft Act). It doesn't automatically kick in; there is a certain threshold that has to be exceeded. The act is geared toward nailing warez site operators and the like.

Maybe SCO had a point

Posted Aug 22, 2003 1:15 UTC (Fri) by josh_stern (guest, #4868) [Link]

Too many people here sound like they have half bought into
SCO's crazy legal theories. We've all heard endless blah,
blah, blah from SCO about their strict confidentiality
contracts governing their secret Unix codebase. So if
it turns out that they had such a contract with SGI, and
an SGI employee, acting as a representative of SGI,
violated that contract and submitted SYSV source to the
Linux kernel tree, by what stretch of the imagination
are the Linux developers, distributors, and users liable for
that misappropriation? It's not as if they have access to
the proprietary secret code to check it. It's not as if
SCO is cooperating in pointing out any problem area.
What is the plausible standard for due diligence on
copyright that is out of compliance here? One can't
possibly check copyright against someone else's confidential
secret.


Maybe SCO had a point

Posted Aug 22, 2003 1:57 UTC (Fri) by bojan (subscriber, #14302) [Link]

Excellent point. Now take a look at this as well (U.S. Copyright Office on copyright notices):

http://www.copyright.gov/circs/circ03.html

Search for "error in name" on that page (instance #2) which reads:

-----------------------------------------
Error in Name

When the person named in the notice is not the owner of copyright, the error may be corrected by:

1. Registering the work in the name of the true owner;
or

2. Recording a document in the Copyright Office executed by the person named in the notice that shows the correct ownership. Otherwise, anyone who innocently infringes the copyright and can prove that he or she was misled by the notice and obtained a transfer or license from the person named in the notice may have a complete defense against the infringement.
-----------------------------------------

SCO should take it up with IBM, SGI and whoever else. Otherwise, your point is an excellent example of common sense and the above confirms that.

Eric's making sense, but it's moot: atealloc() considered harmful!

Posted Aug 22, 2003 5:26 UTC (Fri) by RobDavies (guest, #9930) [Link]

Ask yourself: if you had copied and pasted the ancient code and were debugging a
driver, what ASSERTs would be useful, and where would you put them?

It's natural to add an assertion right at the beginning of a function to check
its arguments. Yes, it's sub-optimal in this case, but very often it is not
worth the time cost of using an unusual layout. If the code was copied from
Sys V, moving the assertion that way seems an odd decision.

The addition of locking is dictated by functional requirements; that
similarity is a red herring.

Now, suppose the Sys V code was copied: the malloc wrapper goes, and rmalloc
is renamed to atealloc. But then why would the register storage class
specifier be removed from the parameters?

I think the whitespace changes would be interesting, and ought not to be
excluded. Precisely because whitespace is insignificant, if the Linux
version shared idiosyncratic whitespace with the Sys V version, I would
change my view; but from what's shown, the Linux version does look like
it came from 32V.

Now there's no question this code will be removed from 2.4; it is not
necessary to write your own memory allocator! Really, all Marcelo needs
to decide now is whether to use the 2.6 fix, or play safe and pull the
whole driver as unused and potentially tainted, until SGI cares to comment
on its originality and explain where the borrowed code came from and why it
was added to the kernel.

Infoworld talks to the author of the code!

Posted Aug 22, 2003 6:49 UTC (Fri) by markhb (guest, #1003) [Link]

Infoworld has an article where they identify the author of the code as one Jay Schulist, and he maintains that he has no access to Unix source code, and wrote it in 1997 as part of a school networking project.

Infoworld talks to the author of the code!

Posted Aug 22, 2003 7:37 UTC (Fri) by corbet (editor, #1) [Link]

Jay is the author of the other bit of example code - the packet filter stuff. That's a separate issue - one where, at first blush, it's even clearer that SCO is out to lunch. But I want to look a bit closer when I get a chance.

Infoworld talks to the author of the code!

Posted Aug 22, 2003 7:45 UTC (Fri) by dwalters (subscriber, #4207) [Link]

It looks like the author of that article has slightly misinterpreted Bruce's press release. The way I read it, it makes it sound as if Bruce is claiming that HE owns the System V code (while he's actually quoting SCO's reaction to his analysis).

You're referring to the BPF code, by the way, which is the second example that SCO presented in their slide show, not the one under discussion in this thread.

Infoworld talks to the author of the code!

Posted Aug 22, 2003 7:58 UTC (Fri) by donwaugaman (subscriber, #4214) [Link]

That article is talking about the Berkeley Packet Filter code, not the atealloc() implementation that Eric Raymond's article references.

I think that Bruce Perens' first (and uncontroverted) conclusion was that SCO was completely off their rocker in going after the BPF code, as it had been a reimplementation from the ground up - even though direct use of the code under the BSD license would have been legal (I believe they had dropped the advertising clause by then).

Similarities between the SCO and Linux code in the BPF can be explained by the use of the same spec. I wonder how many other similarities between the two kernels can be explained in the same way? If SCO hasn't been diligent in figuring out which similarities are due to actual copying, and which aren't, they're going to look pretty bad in discovery, not to mention when and if an actual trial happens.

What is the point?

Posted Aug 23, 2003 4:18 UTC (Sat) by chel (guest, #11544) [Link]

Regarding the malloc function, the algorithm is old. A formal description of the algorithm is in Volume 1 of Knuth's "The Art of Computer Programming", published in 1968. I think ALGOL68 was one of the first languages with a formal description of this kind of dynamic memory usage.
A published and free version of this implementation exists.
An identical implementation of this algorithm exists in the Unix SVR4 source, with a copyright notice. Analysis of whitespace and comment formatting shows a resemblance between this source and a piece of source code that was (or is) in the Linux source tree.
This source was contributed by SGI with both an SGI copyright notice and a GNU GPL notice.
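For readers unfamiliar with the algorithm, here is a hypothetical first-fit allocator over a resource map (an array of free extents terminated by a zero-size entry), in the general style of the classic Unix map allocators. It is a sketch written for illustration, not a copy of the 32V, SysV, or Linux atealloc() code. Addresses are treated as abstract unit offsets held in size_t, and 0 doubles as the failure value; both are assumptions of this sketch. Freeing and coalescing are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* One free extent: 'size' units starting at 'addr'. size == 0 ends the map. */
struct map_entry {
    size_t size;
    size_t addr;
};

/* First fit: take 'size' units from the first extent large enough.
 * Returns the start address, or 0 if nothing fits, so address 0 must
 * never be placed in the map. */
size_t map_alloc(struct map_entry *mp, size_t size)
{
    assert(size > 0);
    for (struct map_entry *bp = mp; bp->size != 0; bp++) {
        if (bp->size >= size) {
            size_t addr = bp->addr;
            bp->addr += size;
            bp->size -= size;
            if (bp->size == 0) {
                /* Extent used up: slide the later entries down one slot,
                 * up to and including the zero-size terminator. */
                do {
                    bp[0] = bp[1];
                    bp++;
                } while (bp->size != 0);
            }
            return addr;
        }
    }
    return 0;
}
```

Starting from a map holding 10 units at 100 and 5 units at 200, map_alloc(m, 4) returns 100, a following map_alloc(m, 6) returns 104 and removes the exhausted extent, and a request for 8 units then fails with 0.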

Putting a copyright notice above freely available content does not make it copyrighted; if it did, there would be many violations. Just compare http://docsrv.caldera.com/cgi-bin/man/man?clear+1 with current Linux manpages: the SCO manpage is almost identical, but carries a copyright notice. Claiming IP on the whitespace distribution in source code is as stupid as claiming to create a new original work by modifying the whitespace distribution in existing source code.

If I consider the current cases, SCO vs IBM, RedHat vs SCO, and IBM vs SCO, I don't see anything where this whitespace distribution could be of any importance.

SCO vs IBM: IBM was not involved.
IBM vs SCO: is about patents
RedHat vs SCO: RedHat distributed code with a perfect GNU copyright notice from copyrightholder SGI, a well known manufacturer with good reputation.

Some good programming practices are:
- ordinary numerical arithmetic does not apply to pointer types, so pointers and integers should not be mixed.
- the sizes of basic C types are implementation-dependent.
Already the first line, "unsigned long malloc(mp, size)", violates these basic rules. If this piece of code appeared in court, the only reason could be to demonstrate how, through a lack of peer review, wrong code can survive for 30 years in some well-known operating systems. Claiming IP rights on this code would be a clear demonstration of a lack of competence.
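A short C sketch of the type discipline described above, assuming a C99 compiler: pointer-to-integer round trips go through uintptr_t from <stdint.h> (an optional type in C99, though provided on mainstream platforms) rather than unsigned long, and pointer differences use ptrdiff_t. On 64-bit Windows, for example, long is 32 bits while pointers are 64 bits, so storing a pointer in an unsigned long there truncates it. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a pointer to an integer only via uintptr_t, never via
 * unsigned long, whose width need not match a pointer's. */
static uintptr_t ptr_to_int(void *p)
{
    return (uintptr_t)p;
}

/* Converting the same value back yields a pointer equal to the original. */
static void *int_to_ptr(uintptr_t v)
{
    return (void *)v;
}

/* Distance between two pointers into the same array: subtract the
 * pointers themselves (result type ptrdiff_t) instead of casted integers. */
static ptrdiff_t byte_distance(const char *base, const char *p)
{
    assert(p >= base);
    return p - base;
}
```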

Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds