Checking Code Ownership

Posted by Eric Fri, 30 May 2003 00:00:00 GMT

SCO claims that hundreds of lines of their Unix SVR4 code have somehow been mixed with the millions of lines of the Linux kernel, and that SCO should therefore receive a billion dollars from IBM. (Or something like that. Novell says that SCO doesn't own SVR4, SCO says they do own SVRx, Eric Raymond says that SVR4 contains misattributed BSD Unix code, and Darl McBride of SCO says something different every week.) I'll let you in on a dirty little secret: Many large software systems contain stolen or misattributed code. This is true of both in-house software and shrinkwrap software.

Example License Violations

I once audited the licenses in a 40,000-line proprietary software package. Here's what I found:

  • 1035 lines of GPL'd code. We groveled before the author's lawyer, received forgiveness, and replaced the offending code with something under the BSD license, all within 48 hours.
  • 231 lines of unattributed wrapper code. I have no idea who wrote this, but it looked nothing like the rest of the code. I yanked it.
  • 503 lines of utility classes. These were all written by the same guy, and contained the note: "Before using this code you should read the 'License Agreement' document and agree with it." The license agreement was missing (and the author had dropped completely off the web in the 6 years since the code was written), so I yanked these files, too.
  • 294 lines of mystery code, buried deep inside another file. While digging through some disk I/O code, I discovered 5 functions with an unusual coding style and lots of tricky low-level code. After an hour of web browsing, I hit pay dirt: These routines were part of a popular open source I/O library, and they had been used without proper attribution. I moved these functions into a separate file, and restored the missing attribution.

(Just for comparison: A skilled C hacker can write a few hundred lines of good C code in a day, assuming a slightly lower-than-usual number of meetings and phone calls.)

All in all, about 5% of the code in this program had some kind of licensing problem. I don't think that this is a rare occurrence, either.
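
For anyone who wants to check the arithmetic behind that 5% figure, here is a quick sketch in Python. The line counts are copied from the list above; the dictionary labels and variable names are just made up for illustration, and 40,000 is the rounded size of the package, so treat the result as approximate.

```python
# Line counts per finding, copied from the audit list above.
# The labels are informal descriptions, not anything from the real codebase.
findings = {
    "GPL'd code": 1035,
    "unattributed wrapper code": 231,
    "utility classes with a missing license": 503,
    "unattributed I/O library functions": 294,
}
TOTAL_LINES = 40_000  # rounded size of the audited package

problem_lines = sum(findings.values())
problem_share = problem_lines / TOTAL_LINES

print(f"{problem_lines} of {TOTAL_LINES} lines ({problem_share:.1%})")
# prints "2063 of 40000 lines (5.2%)"
```

That works out to a bit over 2,000 lines, or roughly one line in twenty.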

Courts have already ruled that the Unix SVR4 code is heavily contaminated with misattributed, third-party code, in just this fashion.

My Guesses About the SCO Situation

To date, SCO has presented zero evidence for their accusations. However, I wouldn't be the least bit surprised to find a few thousand lines of SVR4 code in Linux (less than 0.1%), or--for that matter--a few thousand lines of Linux code in SCO Unix. When Novell sued UC Berkeley over SVR4 code in BSD Unix, the courts ruled that 3 files out of 18,000 violated Novell's copyrights, and that BSD Unix code had been improperly used in SVR4.

An Open Request to SCO

If any of your code has actually appeared in the Linux kernel, please tell us where it is, so we can remove it. Really. We don't want to violate your copyrights, and the Linux community could replace the offending code in a week or two (if you're only talking about "hundreds of lines").
