15 Minutes and 150MB of RAM to Compare Unix and Linux
Posted by Eric Fri, 20 Jun 2003 00:00:00 GMT
SCO has recently made two accusations: (1) IBM has contributed IBM employees' code to Linux in violation of certain SCO/IBM contracts, and (2) some proprietary Unix code has somehow been illegally contributed to Linux. I'm not qualified to comment on whether or not IBM owns the code IBM wrote--though on behalf of software authors everywhere, I hope IBM does. However, I've written a tool which will allow SCO to find any code shared between Linux and Unix in about 15 minutes. What SCO does with this tool is up to them.
When I was young, my father once told me, "The truth will out." (I think he was quoting someone.) He felt that it was better to face the truth, get all the facts before the public, and do the right thing. You might pay a price for your honesty, but you'd pay a bigger price if you lied, because sooner or later, "The truth will out."
I don't fear the truth. And as a creator, I deeply respect other people's copyrights. I don't want other people misusing my work. Linus Torvalds is a creator, too, and he says he respects the work of others. If there's Unix code in the Linux kernel, I want it removed swiftly, and I want those responsible to be barred from future contribution. Such illegal copying would be a stain upon the honor of many good, creative people.
When I read Egan Orion's excellent article, I decided to implement the idea he described, and to make the tool publicly available to help copyright owners figure out whether their code has been copied.
How to Use It
You'll need a Linux or Unix system with a decent C++ compiler. Download srcdupchk-0.2.tar.gz (that's "source duplication checker" to people who don't speak Unix), decompress it, and type:
$ cd srcdupchk-0.2
$ ./configure
$ make
$ make install
You may need to be root to run the last command. Before continuing,
please read the README and COPYING files carefully. srcdupchk comes
with no warranties and is provided "AS IS".
Now place the programs to be compared in two different directories, and type:
$ srcdupchk my-program-src linux-2.4.20
After 15 minutes or so, you'll get output which looks like this:
linux-2.4.20/foo/bar.c:20:107
linux-2.4.20/foo/baz.c:52:57
...
This means that lines 20 to 107 of
bar.c are similar to code in
my-program-src, as are lines 52 to 57 of baz.c.
srcdupchk won't print out the corresponding lines in
my-program-src, so you don't have to reveal more than necessary
about your own program. (If you want to see the other half of the
matches, use the
For details on how
srcdupchk actually works, see the README.
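If you want to post-process a report, the path:first:last format shown in the sample output is easy to parse. The following is only a sketch: the names Match and parse_report are hypothetical, and it assumes that entries are separated by whitespace and that paths contain no spaces, neither of which is guaranteed by the tool.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    """One reported range (hypothetical helper type, not part of srcdupchk)."""
    path: str
    first_line: int
    last_line: int

    @property
    def line_count(self) -> int:
        # Ranges are inclusive, so 20..107 covers 88 lines.
        return self.last_line - self.first_line + 1


def parse_report(text: str) -> List[Match]:
    """Parse whitespace-separated path:first:last entries, skipping anything else."""
    matches = []
    for token in text.split():
        # Split from the right so a colon inside the path is left alone.
        parts = token.rsplit(":", 2)
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            matches.append(Match(parts[0], int(parts[1]), int(parts[2])))
    return matches


if __name__ == "__main__":
    sample = "linux-2.4.20/foo/bar.c:20:107\nlinux-2.4.20/foo/baz.c:52:57\n"
    for m in parse_report(sample):
        print(m.path, m.line_count)
```

Sorting the parsed matches by line_count is one way to look at the longest shared ranges first.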
srcdupchk uses some neat tricks to ignore whitespace, commenting
style, brace placement, and other irrelevant details.
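To give a feel for what ignoring those details can mean, here is a deliberately crude sketch. It is not srcdupchk's actual algorithm (the README describes that); the functions normalize and shared_lines and the min_len threshold are assumptions made up for illustration. It strips comments, braces, and whitespace from each line, then reports which lines of one text also occur in the other.

```python
import re
from typing import List


def normalize(line: str) -> str:
    """Reduce a source line to a form that ignores layout details.

    Crude on purpose: it does not understand string literals or
    comments that span several lines.
    """
    line = re.sub(r"//.*", "", line)        # drop C++-style comments
    line = re.sub(r"/\*.*?\*/", "", line)   # drop single-line C comments
    line = re.sub(r"[{}]", "", line)        # ignore brace placement
    return re.sub(r"\s+", "", line)         # ignore all whitespace


def shared_lines(ours: str, theirs: str, min_len: int = 10) -> List[int]:
    """Return 1-based line numbers in `theirs` that also occur in `ours`.

    Lines shorter than `min_len` after normalization are skipped, since
    short lines such as "return 0;" match everywhere and mean nothing.
    """
    seen = {normalize(l) for l in ours.splitlines()}
    seen = {s for s in seen if len(s) >= min_len}
    return [
        number
        for number, l in enumerate(theirs.splitlines(), 1)
        if normalize(l) in seen
    ]


if __name__ == "__main__":
    ours = "int total = a + b;   // sum\nreturn total;\n"
    theirs = "x();\nint  total=a+b; /* add */\n"
    print(shared_lines(ours, theirs))
```

A real tool also has to join consecutive matching lines into ranges and cope with a tree as large as the kernel, which this sketch makes no attempt to do.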
srcdupchk will find lots of perfectly legal code sharing. For
example, both Linux and Unix are allowed to contain BSD code, or public
domain code from various textbooks or the web. Other common sources of
duplication are license notices, and the boilerplate code generated by
popular tools. Once you've found the duplication, you need to
investigate it carefully before you know what it means.
A Personal Request
Please don't publish the results of running srcdupchk on
other people's code. It isn't polite, it almost certainly violates any
non-disclosure agreements you've sig