
Still not a copyright infringement

Posted Aug 22, 2003 12:44 UTC (Fri) by sommere (guest, #14168)
In reply to: Still not a copyright infringement by rjamestaylor
Parent article: Maybe SCO had a point

There is a largely quiet field of CS research on how to find CS students who plagiarize. For the most part, the researchers don't publicize their findings, so that students can't check whether they have obfuscated their code enough.

I did some poking around to see if I could figure out how these detectors work, and here is what I found:

1) They typically remove all tokens (words) except the keywords, so variable names don't matter.
2) They often treat equivalent keywords as the same token (for and while can be used in equivalent ways).
3) They usually use an algorithm called "Running Karp Rabin" to find runs of matching tokens in two files. This algorithm is resistant to simply reordering the functions: it finds runs of, for example, 6 or more tokens that match anywhere in the file.
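
To make the three steps concrete, here is a rough Python sketch. It is not the Java program mentioned in this comment, and it is only the plain Karp-Rabin rolling-hash part: it reports where runs of k keyword tokens coincide, and does not try to extend or tile the matches the way a full detector would. The keyword set, the "loop" equivalence class, and the function names are all made up for illustration.

```python
import re
import zlib

# Hypothetical keyword set and equivalence map, for illustration only.
KEYWORDS = {"int", "if", "else", "for", "while", "do", "return", "switch", "case", "break"}
EQUIVALENT = {"for": "loop", "while": "loop", "do": "loop"}


def keyword_tokens(source):
    """Steps 1 and 2: keep only keywords and merge equivalent ones."""
    words = re.findall(r"[A-Za-z_]\w*", source)
    return [EQUIVALENT.get(w, w) for w in words if w in KEYWORDS]


def window_hashes(tokens, k, base=1_000_003, mod=(1 << 61) - 1):
    """Yield (start, hash) for every window of k tokens, using a rolling hash."""
    if len(tokens) < k:
        return
    # Deterministic integer code per token.
    codes = [zlib.crc32(t.encode()) for t in tokens]
    high = pow(base, k - 1, mod)  # weight of the token that leaves the window
    h = 0
    for c in codes[:k]:
        h = (h * base + c) % mod
    yield 0, h
    for i in range(k, len(codes)):
        # Drop the oldest token, shift, and append the newest one.
        h = ((h - codes[i - k] * high) * base + codes[i]) % mod
        yield i - k + 1, h


def common_runs(tokens_a, tokens_b, k=6):
    """Step 3: return (i, j) pairs where k tokens match, anywhere in either file."""
    index = {}
    for i, h in window_hashes(tokens_a, k):
        index.setdefault(h, []).append(i)
    matches = []
    for j, h in window_hashes(tokens_b, k):
        for i in index.get(h, []):
            # Compare the actual tokens to rule out hash collisions.
            if tokens_a[i:i + k] == tokens_b[j:j + k]:
                matches.append((i, j))
    return matches
```

Calling common_runs(keyword_tokens(a), keyword_tokens(b)) on two source strings gives the starting positions of every shared run. Because the windows of the first file are looked up by hash, a copied function is still found when it has been moved elsewhere in the second file or had its variables renamed.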


This is likely what SCO's pattern matching team is doing, and someone on our side with access to System V should be doing it too.

I wrote a Java program to test whether this algorithm actually finds cheaters (it did). Feel free to e-mail me (lwn at ethanet.com) for more info/source.



Still not a copyright infringement

Posted Aug 22, 2003 19:03 UTC (Fri) by dark (✭ supporter ✭, #8483)

Umm, that algorithm sounds like it will find loads of false positives. Most student assignments are pretty simple, with only one or two obvious ways of deriving a correct solution.

Still not a copyright infringement

Posted Aug 22, 2003 22:17 UTC (Fri) by sommere (guest, #14168)

The programs don't expel students automatically :) Yes, it takes a human to use common sense and figure out whether one was actually copied from the other. But it gives you somewhere to start.

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds