Linus Torvalds on SCO's latest claims
On Mon, 22 Dec 2003, John Dee wrote:
>
> I know you guys have already probably seen this.. figured I'd share with
> the class, so the big kids can tear it apart.
> http://lwn.net/Articles/64052/

I spent half an hour tearing part of it apart for some journalists. No guarantees for the full accuracy of this write-up, and in particular I don't actually have "original UNIX" code to compare against, but the files I checked (ctype.[ch]) definitely do not have any UNIX history to them.

The rest of the files are mostly errno.h/signal.h/ioctl.h (and they are apparently the 2.4.x versions, before we moved some common constants into "asm-generic/errno.h"), and while I haven't analyzed them, I know for a fact that

 - the original errno.h used different error numbers than "original UNIX"

   I know this because I cursed it later when it meant that doing things like binary emulation wasn't as trivial - you had to translate the error numbers.

 - same goes for "signal.h": while a lot of the standard signals are well documented (ie "SIGKILL is 9"), historically we had lots of confusion (ie I think "real UNIX" has SIGBUS at 10, while Linux didn't originally have any SIGBUS at all, and later put it at 7 which was originally SIGUNUSED).

So to me it looks like

 - yes, Linux obviously has the same signal names and error number names that UNIX has (so the files certainly have a lot of the same identifiers)

 - but equally clearly they weren't copied from any "real UNIX".

   (Later, non-x86 architectures have tried harder to be binary-compatible with their "real UNIX" counter-parts, and as a result we have different errno header files for different architectures - and on non-x86 architectures the numbers will usually match traditional UNIX).

For example, doing a "grep" for SIGBUS on the kernel shows that most architectures still have SIGBUS at 7 (original Linux value), while alpha, sparc, parisc and mips have it at 10 (to match "real UNIX").
What this tells me is that the original code never came from UNIX, but some architectures later were made to use the same values as UNIX for binary compatibility (I know this is true for alpha, for example: being compatible with OSF/1 was one of my very early goals in that port).

In other words, I think we can totally _demolish_ the SCO claim that these 65 files were somehow "copied". They clearly are not.

Which should come as no surprise to people. But I think it's nice to see just _how_ clearly we can show that SCO is - yet again - totally incorrect.

        Linus

----

For example, SCO lists the files "include/linux/ctype.h" and "lib/ctype.h", and some trivial digging shows that those files are actually there in the original 0.01 distribution of Linux (ie September of 1991).

And I can state

 - I wrote them (and looking at the original ones, I'm a bit ashamed: the "toupper()" and "tolower()" macros are so horribly ugly that I wouldn't admit to writing them if it wasn't because somebody else claimed to have done so ;)

 - writing them is no more than five minutes of work (you can verify that with any C programmer, so you don't have to take my word for it)

 - the details in them aren't even the same as in the BSD/UNIX files (the approach is the same, but if you look at actual implementation details you will notice that it's not just that my original "tolower/toupper" were embarrassingly ugly, a number of other details differ too).

In short: for the files where I personally checked the history, I can definitely say that those files are trivially written by me personally, with no copying from any UNIX code _ever_.

So it's definitely not a question of "all derivative branches". It's a question of the fact that I can show (and SCO should have been able to see) that the list they show clearly shows original work, not "copied".

Analysis of "lib/ctype.c" and "include/linux/ctype.h".
First, some background: the "ctype" name comes from "character type", and the whole point of "ctype.h" and "ctype.c" is to test what kind of character we're dealing with. In other words, those files implement tests for doing things like asking "is this character a digit" or "is this character an uppercase letter" etc. So you can write things like

    if (isdigit(c)) {
        .. we do something with the digit ..

and the ctype files implement that logic.

Those files exist (in very similar form) in the original Linux-0.01 release under the names "lib/ctype.c" and "include/ctype.h". That kernel was released in September of 1991, and contains no code except for mine (and Lars Wirzenius, who co-wrote "kernel/vsprintf.c").

In fact, you can look at the files today and 12 years ago, and you can see clearly that they are largely the same: the modern files have been cleaned up and fix a number of really ugly things (tolower/toupper works properly), but they are clearly incremental improvement on the original one.

And the original one does NOT look like the unix source one. It has several similarities, but they are clearly due to:

 - the "ctype" interfaces are defined by the C standard library.

 - the C standard also specifies what kinds of names a system library interface can use internally. In particular, the C standard specifies that names that start with an underscore and a capital letter are "internal" to the library. This is important, because it explains why both the Linux implementation _and_ the UNIX implementation used a particular naming scheme for the flags.

 - algorithmically, there aren't that many ways to test whether a character is a number or not. That's _especially_ true in C, where a macro must not use its argument more than once. So for example, the "obvious" implementation of "isdigit()" (which tests for whether a character is a digit or not) would be

    #define isdigit(x) ((x) >= '0' && (x) <= '9')

   but this is not actually allowed by the C standard (because 'x' is used twice).
This explains why both Linux and traditional UNIX use the "other" obvious implementation: having an array that describes what each of the possible 256 characters are, and testing the contents of that array (indexed by the character) instead. That way the macro argument is only used once.

The above things basically explain the similarities. There simply aren't that many ways to do a standard C "ctype" implementation, in other words.

Now, let's look at the _differences_ in Linux and traditional UNIX:

 - both Linux and traditional unix use a naming scheme of "underscore and a capital letter" for the flag names. There are flags for "is upper case" (_U) and "is lower case" (_L), and surprise surprise, both UNIX and Linux use the same name. But think about it - if you wanted to use a short flag name, and you were limited by the C standard naming, what names _would_ you use? Maybe you'd select "U" for "Upper case" and "L" for "Lower case"?

   Looking at the other flags, Linux uses "_D" for "Digit", while traditional UNIX instead uses "_N" for "Number". Both make sense, but they are different. I personally think that the Linux naming makes more sense (the function that tests for a digit is called "isdigit()", not "isnumber()"), but on the other hand I can certainly understand why UNIX uses "_N" - the function that checks for whether a character is "alphanumeric" is called "isalnum()", and that checks whether the character is an upper case letter, a lower-case letter _or_ a digit (aka "number").

   In short: there aren't that many ways you can choose the names, and there is lots of overlap, but it's clearly not 100%.

 - The original Linux ctype.h/ctype.c file has obvious deficiencies, which pretty much point to somebody new to C making mistakes (me) rather than any old and respected source. For example, the "toupper()/tolower()" macros are just totally broken, and nobody would write the "isascii()" and "toascii()" the way they were written in that original Linux.
And you can see that they got fixed later on in Linux development, even though you can also see that the files otherwise didn't change.

For example: remember how C macros must only use their argument once (never mind why - you really don't care, so just take it on faith, for now). So let's say that you wanted to change an upper case character into a lower case one, which is what "tolower()" does. Normal use is just a fairly obvious

    newchar = tolower(oldchar);

and the original Linux code does

    extern char _ctmp;
    #define tolower(c) (_ctmp=c,isupper(_ctmp)?_ctmp+('a'+'A'):_ctmp)

which is not very pretty, but notice how we have a "temporary character" _ctmp (remember that internal header names should start with an underscore and an upper case character - this is already slightly broken in itself). That's there so that we can use the argument "c" only once - to assign it to the new temporary - and then later on we use that temporary several times.

Now, the reason this is broken is

 - it's not thread-safe (if two different threads try to do this at once, they will stomp on each other's temporary variable)

 - the argument (c) might be a complex expression, and as such it should really be parenthesized. The above gets several valid (but unusual) expressions wrong.

Basically, the above is _exactly_ the kinds of mistakes a young programmer would make. It's classic.

And I bet it's _not_ what the UNIX code looked like, even in 1991. UNIX by then was 20 years old, and I _think_ that it uses a simple table lookup (which makes a lot more sense anyway and solves all problems). I'd be very surprised if it had those kinds of "beginner mistakes" in it, but I don't actually have access to the code, so what do I know? (I can look up some BSD code on the web, it definitely does _not_ do anything like the above).

The lack of proper parentheses exists in other places of the original Linux ctype.h file too: isascii() and toascii() are similarly broken.
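To make the parenthesization point concrete, here is a minimal sketch in C. The two macros are hypothetical and written only for illustration; they are not the original Linux 0.01 definitions. The only difference between them is whether the argument is wrapped in parentheses before the cast is applied.

```c
/* Hypothetical macros for illustration only; these are NOT the
 * original Linux definitions. */

/* Argument not parenthesized: the cast binds to only the first
 * operand of whatever expression is passed in. */
#define IS_ASCII_BAD(c)  (((unsigned) c) <= 0x7f)

/* Argument parenthesized: the cast applies to the whole expression. */
#define IS_ASCII_GOOD(c) (((unsigned)(c)) <= 0x7f)

/* Pass a conditional expression as the argument.
 * BAD expands to (((unsigned) flag ? -1 : 'A') <= 0x7f), which casts
 * only 'flag' and then compares a plain int with 0x7f. */
static int bad_says_ascii(int flag)
{
    return IS_ASCII_BAD(flag ? -1 : 'A');
}

/* GOOD casts the selected value, so -1 becomes a huge unsigned
 * number and is rejected. */
static int good_says_ascii(int flag)
{
    return IS_ASCII_GOOD(flag ? -1 : 'A');
}
```

With `flag` set to 1 the selected value is -1, which is not an ASCII code; the unparenthesized macro still reports it as ASCII, while the parenthesized one does not. With `flag` set to 0 both agree that 'A' is ASCII.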
In other words: there are _lots_ of indications that the code was not copied, but was written from scratch. Bugs and all.

Oh, another detail: try searching the web (google is your friend) for "_ctmp". It's unique enough that you'll notice that all the returned hits are all Linux-related. No UNIX hits anywhere. Doing a google for

    _ctmp -linux

shows more Linux pages (that just don't happen to have "linux" in them), except for one which is the L4 microkernel, and that one shows that they used the Linux header file (it still says "_LINUX_CTYPE_H" in it).

So there is definitely a lot of proof that my ctype.h is original work.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
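The table-lookup approach described in the message can be sketched roughly as follows. This is a minimal illustration, not the Linux or UNIX source: the `CT_` flag names, `ct_table`, `ct_init` and the `ct_is*` macros are hypothetical (the message mentions `_U`, `_L` and `_D`, but underscore-capital names are reserved for the library, so ordinary program code should avoid them). The sketch also assumes an ASCII-like character set where the letter ranges are contiguous.

```c
/* Hypothetical flag bits, standing in for the _U/_L/_D flags
 * mentioned in the message. */
#define CT_U 0x01   /* upper case */
#define CT_L 0x02   /* lower case */
#define CT_D 0x04   /* digit */

/* One entry per possible unsigned char value. */
static unsigned char ct_table[256];

/* Fill the table. Assumes an ASCII-like character set in which
 * 'A'..'Z' and 'a'..'z' are contiguous ranges. */
static void ct_init(void)
{
    int c;

    for (c = '0'; c <= '9'; c++) ct_table[c] |= CT_D;
    for (c = 'A'; c <= 'Z'; c++) ct_table[c] |= CT_U;
    for (c = 'a'; c <= 'z'; c++) ct_table[c] |= CT_L;
}

/* Each macro mentions its argument exactly once. */
#define ct_isdigit(c) ((ct_table[(unsigned char)(c)] & CT_D) != 0)
#define ct_isupper(c) ((ct_table[(unsigned char)(c)] & CT_U) != 0)
#define ct_isalnum(c) ((ct_table[(unsigned char)(c)] & (CT_U | CT_L | CT_D)) != 0)

/* The range-comparison form from the message, for contrast:
 * the argument x appears twice. */
#define isdigit_twice(x) ((x) >= '0' && (x) <= '9')

/* How far does p advance when the argument has a side effect?
 * With the table macro, *p++ is evaluated once. */
static int advance_with_table(void)
{
    const char *s = "7a", *p = s;

    ct_init();
    (void)ct_isdigit(*p++);
    return (int)(p - s);
}

/* With the range test, *p++ is evaluated twice here, because the
 * first comparison succeeds and the second one then runs too. */
static int advance_with_range_test(void)
{
    const char *s = "7a", *p = s;

    (void)isdigit_twice(*p++);
    return (int)(p - s);
}
```

In this sketch `advance_with_table()` returns 1 and `advance_with_range_test()` returns 2, which is the practical reason a macro that names its argument twice is a problem when callers write things like `isdigit(*p++)`.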
Sorry to sound so sycophantic, but... Posted Dec 22, 2003 22:34 UTC (Mon) by fLameDogg (guest, #11305) [Link] I worship at Linus' feet. Bugs and all.
Re: Sorry to sound so sycophantic, but... Posted Dec 23, 2003 8:52 UTC (Tue) by janpla (guest, #11093) [Link] So, you're implying that he has 'bugs'? On his feet?
Re: Sorry to sound so sycophantic, but... Posted Dec 23, 2003 14:04 UTC (Tue) by fLameDogg (guest, #11305) [Link] Well, microbes surely. But I meant his metaphorical programmer's feet :O)
Linus Torvalds on SCO's latest claims Posted Dec 22, 2003 22:59 UTC (Mon) by ccchips (guest, #3222) [Link] SCO better not get Linus in a courtroom, or he'll "shoot [them] with little yellow bolts of light." -Farscape, episode 1
Linus Torvalds on SCO's latest claims Posted Dec 22, 2003 23:29 UTC (Mon) by smoogen (subscriber, #97) [Link] Hmmm it has become obvious.. SCO owns the rights to C itself. Since C was really written so that Unix could occur the two can NOT be decoupled by any international body or standards...
> - the C standard also specifies what kinds of names a system library
Thus SCO really should also say it owns all implementations of C out there.. as it is the real root of their copyrighted code. [ :) for the humour impaired.. but on the other hand if SCO starts suing everyone who uses C and its derivatives of Java and C++ and C#.. I want a piece of the action.]
Linus Torvalds on SCO's latest claims Posted Dec 23, 2003 0:04 UTC (Tue) by jbh (subscriber, #494) [Link] Well if not C then at least C++: "And C++ programming languages, we own those, have licensed them out multiple times, obviously." McBride, August 2002 I'm not joking.
Linus Torvalds on SCO's latest claims Posted Dec 23, 2003 1:38 UTC (Tue) by mmarq (guest, #2332) [Link] Not even McBride... he was not joking, he was on crack (he must have been on something)! What a "piece of work" this McBride.
McBride, August 2002 Posted Dec 23, 2003 5:11 UTC (Tue) by frazier (subscriber, #3060) [Link] Sure enough, you aren't joking. There's some entertaining stuff in there:
Linus Torvalds on SCO's latest claims Posted Dec 23, 2003 0:00 UTC (Tue) by Halmonster (subscriber, #4537) [Link] As much as it pains me to contradict our Fearless Leader, I must point out the following. Just because the history of a C source file can be traced to code that SCO clearly has no claim over, that does not mean that the patches and bugfixes applied to that C source file were not tainted. It is possible that the improvements to (say) tolower() could have been borrowed from a protected source. We need to go through all the patches to these files and make sure they are free from copyright burdens as well. Hal Eisen
Linus Torvalds on SCO's latest claims Posted Dec 23, 2003 0:45 UTC (Tue) by bojan (subscriber, #14302) [Link] You are right, but you're also giving SCO too much credit, IMHO. They have released some of their own versions of Unix under a BSD-style licence. Also, the settlement between UCB and USL clearly allowed for BSD to be distributed under BSD licence. Remember when SCO folks claimed that all the "infringements" had nothing to do with BSD code? Now it turns out (as before) that they do. One would have to ask why exactly is that? My bet is this - there is no real evidence, so they have to come up with this. And that doesn't even begin to describe the very desperate position SCO is in: 1. Novell renewed their Unix copyright claims. In any event, it is fun to watch :-)
Linus Torvalds on SCO's latest claims Posted Dec 23, 2003 1:45 UTC (Tue) by mmarq (guest, #2332) [Link] "It is possible that the improvements to (say) tolower() could have been borrowed from a protected source." "IF SO", isn't that the job of SCO to prove ??
Linus Torvalds on SCO's latest claims Posted Dec 23, 2003 0:16 UTC (Tue) by karekaa (guest, #9000) [Link] Isn't it time to turn this case upside down? This is yet another example where SCO is claiming ownership to code which is evidently NOT owned by them! They are threatening individuals and companies for using code which they own just as much as they own the texts of William Shakespeare -- (or Hamlet knows, if SCO has bought those rights too... sigh). It's time for some qualified jurist to look into the possibility if all target parties could actually claim replacement for consumed time on this illusion of a court case from SCO?! Kjell Arne Rekaa
Linus Torvalds on SCO's latest claims Posted Dec 23, 2003 3:42 UTC (Tue) by link (guest, #916) [Link] This is yet another example where SCO is claiming ownership to code which is evidently NOT owned by them! They are threatening individuals and companies for using code which they own just as much as they own the texts of William Shakespeare […]
Funny you should use The Bard as an example, given there is some controversy over
whether he actually wrote the plays he's credited with; the honor usually attributed to
one of Christopher Marlowe, Francis Bacon, William Stanley, or Edward de Vere.
Of course, it may be partially appropriate given nobody seriously suggests William Shakespeare of Stratford plagiarized any of the above; rather one of the candidates used the name as a pseudonym (a nom de guerre, possibly, in this case) for publishing their own works. The Oxfordians in particular like to suggest Lord Oxford (Edward de Vere) published under the pseudonym because he as a courtier was forbidden to publish poetry. [Source: Google & The Shakespeare Oxford Society's “A Beginner's Guide to the Shakespeare Authorship Problem”]
Linus Torvalds on SCO's latest claims Posted Jan 8, 2004 19:17 UTC (Thu) by stuart (subscriber, #623) [Link] Actually it was all written by Marlowe. My mum said. Hmmm, I'm starting to sound like McBride. Stu.
Linus Torvalds on SCO's latest claims Posted Dec 23, 2003 8:18 UTC (Tue) by steven97 (guest, #2702) [Link] Isn't it time to turn this case upside down? These are the United States you are talking about. You are free to sue them, or anyone else, with ridiculous claims -- but can you afford that?
Linus Torvalds on SCO's latest claims Posted Dec 23, 2003 20:34 UTC (Tue) by iabervon (subscriber, #722) [Link] An actual lawsuit is probably not worth the time and expense, but it might be interesting for someone to set up a site which lists all of the copyrighted works and their owners which SCO has claimed to own, with the intent of having a class action lawsuit at some point if it becomes economically reasonable. It might make the press take notice if there were periodic press releases accusing SCO of more and more copyright infringement as they claim more things. This week, they seem to have violated copyrights by J H Lu (the libc that Linus took his strings and numbers from), Linus (who actually machine generated the file), and various standards groups (who actually named these things). (Note that it's perfectly legal to use, modify, and distribute these things, but you can't claim to own the copyrights on them if you don't.) I dream of SCO showing up in court without any evidence, and when asked why they don't have any, being forced to admit that all their computers and hard drives have been seized by federal marshals as evidence against them for software piracy. Having a centralized location which lists their crimes seems like a good start.
Linus Torvalds on SCO's latest claims Posted Dec 23, 2003 2:26 UTC (Tue) by jimm007 (guest, #18055) [Link]

    #define toupper(c) (char)(c & ~32)
    #define tolower(c) (char)(c | 32)
Linus Torvalds on SCO's latest claims Posted Dec 23, 2003 3:08 UTC (Tue) by tres (guest, #352) [Link] This fails to take into account the possibility of some SCO programmer doing: char new_char = toupper( '5' ); It also fails to work on non ASCII codes which Linus' might or might not depending on whether or not lower case letters have a larger numerical representation.
Linus Torvalds on SCO's latest claims Posted Dec 23, 2003 6:38 UTC (Tue) by proski (subscriber, #104) [Link] Try this with your definition:
Linus Torvalds on SCO's latest claims Posted Dec 23, 2003 9:42 UTC (Tue) by dvrabel (subscriber, #9500) [Link] You're missing brackets around the argument in the body of the macro. This is why I always use inline functions in preference to macros.
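The two objections raised in this sub-thread (no range check, and no brackets around the macro argument) can be checked with a small C sketch. The macro body is the one posted above, renamed `TOUPPER_BITS` here only to avoid clashing with the standard `toupper`; `ascii_toupper` is a hypothetical inline-function alternative that assumes ASCII, not anyone's actual implementation.

```c
/* The bit-clearing macro body as posted in the thread, under a
 * different (hypothetical) name: no range check, and the argument
 * c is not parenthesized. */
#define TOUPPER_BITS(c) (char)(c & ~32)

/* Sketch of an inline-function alternative. Assumes ASCII, where
 * 'a'..'z' is a contiguous range. The argument is evaluated exactly
 * once and operator precedence in the caller cannot leak in. */
static inline char ascii_toupper(char c)
{
    return (c >= 'a' && c <= 'z') ? (char)(c - ('a' - 'A')) : c;
}

/* Non-letter input: '5' is 0x35, and clearing bit 0x20 gives 0x15. */
static int macro_on_digit(void)
{
    return TOUPPER_BITS('5');           /* 21, a control character */
}

/* Unparenthesized argument: expands to
 * (char)(flag ? 'a' : 'b' & ~32), so the mask is applied only
 * to 'b' and the selected 'a' passes through unchanged. */
static int macro_on_conditional(int flag)
{
    return TOUPPER_BITS(flag ? 'a' : 'b');
}

static int function_on_conditional(int flag)
{
    return ascii_toupper(flag ? 'a' : 'b');
}
```

In this sketch `macro_on_digit()` yields 21 rather than `'5'`, and `macro_on_conditional(1)` yields `'a'` where `function_on_conditional(1)` yields `'A'`. The macro does behave as intended for a plain lower-case letter argument such as `'q'`.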
SCO's latest claims Posted Dec 23, 2003 3:10 UTC (Tue) by dusty (guest, #14668) [Link] SCO is a modern day patent medicine outfit that deserve a slow agonizing death. Methinks they are about to get it.
Linus is "owned by SCO" Posted Dec 23, 2003 6:22 UTC (Tue) by s52d (guest, #2199) [Link] Baby making is owned by SCO. Linus's mother never paid royalties. Also, having a name is SCO trade secret. By giving Linus a name, Best regards, Iztok (p.s.: Iztok is owned by SCO, and phrase "Best Regards" as well.)
Linus is "owned by SCO" Posted Dec 23, 2003 14:07 UTC (Tue) by fLameDogg (guest, #11305) [Link] Unfortunately, the phrase "is owned by SCO" is owned by... well, you know. Now you must pay royalties.
Linus Torvalds on SCO's latest claims Posted Dec 28, 2003 14:55 UTC (Sun) by argolnx (guest, #18251) [Link] Tomorrow i will register for copyright the "for" routine.... it could be nice ?!?!
Linus Torvalds on SCO's latest claims Posted Dec 29, 2003 10:57 UTC (Mon) by bradh (subscriber, #2274) [Link] The toupper() and tolower() routines are basically the same as provided by P.J. Plauger in his book "The Standard C Library". Which is (C) 1992.
Linus Torvalds on SCO's latest claims Posted Jan 22, 2004 0:05 UTC (Thu) by KillerBEEEE (guest, #18875) [Link] This is to inform you that the name "SCO" is copyrighted material. This thread contains 28 uses of the above mentioned name. Please send $100 per usage of the name to po. box 420 danklane stonersparadise ca. 42042 ;)
Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds