June 1, 2005
By Pamela Jones, Editor of Groklaw
When Black Duck Software first made available its software compliance
tool, ProtexIP, about a year ago, the typical first reaction was to view
it as a response to SCO's lawsuit.
Now there is a second such product, Palamida's IP Amplifier, and it's
clear there is a market for such products. Cisco, for one, has just signed
on with Palamida. Who really needs products like this, and why? And is
there a difference between them?
Who Needs Software Compliance Tools?
Now that Free and Open Source software has hit the mainstream of the
enterprise, businesses need to be certain that they are not taking on
legal liabilities with the code. There are many licenses, and making
sure a company is abiding by them all is complex. That's one reason you
are hearing so many voices calling for simplifying and settling on fewer
licenses. But it goes deeper than that.
"Everyone who distributes software should know what goes into it," attorney
Lawrence Rosen explains. "And almost everyone who distributes software
wants to comply with the relevant licenses. Most reputable software-based
businesses recognize that playing fast-and-loose with copyright claims
isn't worthwhile."
While most businesses today are pleased to adopt and incorporate open
source products into their products and services, they want to know what
licenses apply so that they can comply with the terms.
"That's what Black Duck and Palamida make possible," Rosen adds. "A
distributor or user can know what open source software is in its own
software and act accordingly, early in the cycle. It's now possible to
evaluate license compatibility for specific component sets and plan
appropriate combinations for use in products to be developed."
Unfortunately, developers sometimes use GPL code (or other licensed FOSS
code) without telling management, thinking it's public domain. It
isn't. And with outsourcing, developers are sometimes in other countries
that may have more relaxed views on copyright, and this can cause problems.
So when developers let things happen they shouldn't (such as making
unauthorized copies or derivative works), companies have an automated way
to catch some of that and react appropriately before much bigger problems
can develop.
Software practices are also changing. Application development today is
becoming more like an assembly line, more a matter of assembling bits
of code from open source projects and from outsourced firms and
incorporating them into proprietary products than handcrafting 100%
custom software. This isn't a bad thing, because it makes it possible to
avoid having to reinvent the wheel -- one of the advantages of Open Source
-- but it also means that checking on license terms and making sure you
are complying with them all is vital to the process.
And there is no doubt that enforcement against GPL violations is increasing,
as Fortinet learned recently when a German court banned their
U.K. subsidiary from further distribution of their firewall and antivirus
products until they complied with the GPL, which they promptly did.
Then there is the Sarbanes-Oxley Act, and its requirements for IT audits.
"The SEC's new rules on heightened corporate responsibility for public
company reporting known as Sarbanes-Oxley require public companies to
abide by internal procedures that are sufficient to provide reasonable
assurance that the financial and non-financial information required to be
disclosed in its periodic and current reports is accurate," says Karen
Copenhaver, executive vice president and general counsel for Black Duck
Software.
"Specifically, Sarbanes creates two new corporate governance requirements:
assessment of internal controls over financial reporting (required by
section 404 of the Act), and heightened corporate responsibility for
financial reports (required by section 302 of the Act). It would be hard
to overestimate the burden that compliance with these new rules has placed
on public companies in the first few years since their enactment.
"Even before Sarbanes, public companies were required to address
intellectual property matters in their current and periodic reports. A
reporting company traditionally discloses the importance of its
intellectual property assets to the company's business and any third-party
intellectual property encumbrances on the company's ability to conduct its
business. To the extent that a failure to identify or comply with third
party license obligations has an effect on the accuracy of any of this
information, public companies will be concerned about compliance with
their obligations under Sarbanes."
Obviously, Sarbanes-Oxley has upped the ante considerably. But most
businesses and developers want to do the right thing anyway, apart from
outside pressures. The tools don't set policy for a company, but they
surely make it easier to make sure policies are observed.
What Do the Tools Offer?
Before automated software compliance tools were available, due diligence
in checking software for infringing code was done by assigning the tedious
task to senior software programmers in the company, who, together with
lawyers, laboriously looked through the code. The problem with such a
system, aside from the time it required and the drudgery, is that no one
person knows all the Free and Open Source projects available by sight, let
alone all the proprietary products you are not allowed to see without
complex legal arrangements.
Automated systems are an obvious answer. What they provide is a
Google-like collection of code. They've collected it all for you. Both
tools scan for copyright infringement and can spot more than verbatim
matches. But they do more than scan.
Palamida says its IP Amplifier product automatically detects, manages and
reports on the third party, commercial and open source components that may
exist in a customer's software code base. It consists of two key modules --
the Compliance Library and the Detector. Built with an automated collection
system, the Compliance Library contains billions of source code snippets
and millions of files from the most commonly used open source projects
found in the market.
Palamida: "The Palamida IP Amplifier uses three different types of
technologies to automate detection: source code fingerprinting, file
digest matching, and, for Java files, namespace matching. This means the
software is able to conduct both source code and binary code analysis. So
for companies whose developers download whole libraries, compiled code,
XML files, icons, text files, and include those resources into their code
base, the software will still detect their usage even though their source
code is not available and even if we do not have the components listed in
our database."
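
To make the first two of those techniques concrete, here is a minimal
sketch in Python. It is not Palamida's implementation, which is not
public; the function names, the three-line window, and the whitespace
normalization are all assumptions chosen for illustration, and namespace
matching is not shown. File digest matching hashes a whole file and so
finds only byte-identical copies, while fingerprinting hashes overlapping
runs of normalized lines so that a partial or reformatted copy can still
be found.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Digest of a whole file; equal digests mean byte-identical content."""
    return hashlib.sha256(data).hexdigest()


def normalize(source: str) -> list[str]:
    """Collapse whitespace and drop blank lines so reformatting does not hide a copy."""
    lines = (" ".join(line.split()) for line in source.splitlines())
    return [line for line in lines if line]


def fingerprints(source: str, window: int = 3) -> set[str]:
    """Hash every run of `window` consecutive normalized lines (window size is an assumption)."""
    lines = normalize(source)
    return {
        hashlib.sha256("\n".join(lines[i:i + window]).encode()).hexdigest()
        for i in range(len(lines) - window + 1)
    }


def match_fraction(candidate: str, known: str, window: int = 3) -> float:
    """Fraction of the candidate's fingerprints that also occur in the known file."""
    cand = fingerprints(candidate, window)
    if not cand:
        return 0.0
    return len(cand & fingerprints(known, window)) / len(cand)
```

A file that reuses one function from a known project, with its
indentation changed, would have a different file digest but would still
share some fingerprints with the original, so match_fraction would come
out above zero.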
Next, there is a "layer of analysis that is beyond just code matching for
reduction of false positives. We call this technology CodeRank.
CodeRank looks at the code matches and evaluates the results on
multiple levels, including uniqueness, coverage and clustering. How unique
is that match to what is in the Palamida database? How much of a customer
file matches a file in Palamida's database? How dense are the matches --
do they look like a continuous cut and paste or does it look like two
engineers coded against the same API?"
After their software evaluates the code matches, Palamida assigns a
CodeRank number to the matches; the higher the CodeRank number the higher
the chances of copying. In the scan results, users will see a list of all
code that has matches and a list of all the third party products that they
most likely came from, with the most likely on top.
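
Palamida does not publish how CodeRank is computed. The sketch below is
only a hypothetical illustration of how the three signals named above
(uniqueness, coverage and clustering) could be measured and combined into
one number; the formulas, the equal weighting, the 0-100 scale, and every
name in the code are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Match:
    line: int      # line number in the customer file that matched
    db_hits: int   # number of database files containing the same snippet (>= 1)


def coverage(matches: list[Match], total_lines: int) -> float:
    """Share of the customer file's lines that matched something."""
    return len({m.line for m in matches}) / total_lines if total_lines else 0.0


def uniqueness(matches: list[Match]) -> float:
    """Average of 1/db_hits: snippets found in many projects score near 0."""
    return sum(1 / m.db_hits for m in matches) / len(matches) if matches else 0.0


def clustering(matches: list[Match]) -> float:
    """Longest run of consecutive matched lines divided by all matched lines."""
    lines = sorted({m.line for m in matches})
    if not lines:
        return 0.0
    best = run = 1
    for prev, cur in zip(lines, lines[1:]):
        run = run + 1 if cur == prev + 1 else 1
        best = max(best, run)
    return best / len(lines)


def score(matches: list[Match], total_lines: int) -> float:
    """Equal-weight average of the three signals, scaled to 0-100 (an assumption)."""
    parts = (coverage(matches, total_lines), uniqueness(matches), clustering(matches))
    return 100 * sum(parts) / 3
```

Under these assumptions, twenty consecutive matched lines that appear in
only one database file score far higher than ten scattered one-line
matches that appear in fifty files each, which is the distinction between
a cut and paste and two engineers coding against the same API.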
Reports identify every component that includes open source code and list
each one's license, license text and related license information, in
addition to the CodeRank. All of the data can be exported as XML, which
lets users build custom reports, and it is also available as HTML reports.
Black Duck too offers a great deal more than just code scanning. Black
Duck's Copenhaver: "We do more than just scan code. Our product provides
a full suite of services covering project planning, code analysis and
detection, license analysis and management, auditing and archival
capabilities for the complete life cycle of software projects.
"From an open source perspective," Copenhaver adds, "we help developers
manage the origins and obligations of code that they use so they can meet
the expectations of the industry and community. But everything we do works
for both open source and proprietary or commercial code. Users can add
code prints and licenses into the system to manage their internal
proprietary code along with open source.
"Our product helps people manage the introduction of licensed materials
into their code bases, understand the obligations associated with that
code (and combinations of components from different sources), provide an
environment for controlled remediation of issues that arise and create an
archivable record of the actions that were taken by the team along the
way. Our products are designed to bring together developers, lawyers and
business decision makers into a collaborative environment."
Black Duck offers an analysis 'engine' that processes licenses at a
detailed level and alerts users to license conflicts and obligations of
both software source and binary components and their combinations. The
ProtexIP Knowledgebase contains detailed breakdowns of 500+ software
licenses for automated comparison of license terms and notification of
collective obligations, and the data is remotely updated frequently with
new licenses as they come to market. Black Duck recently added what it
calls Custom Code Prints, which give ProtexIP support for proprietary
source code.
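
As a rough picture of what automated comparison of license terms can
mean, the Python sketch below checks a set of components against a toy
rule table. It is not Black Duck's knowledge base or engine and it is not
legal advice: the table, the obligation labels, and the "Vendor-EULA"
license are hypothetical, and real licenses carry far more terms than the
one or two shown here.

```python
from itertools import combinations

# Toy license table (an assumption for illustration, not legal advice).
# "requires" lists obligations a license places on the combined work;
# "forbids" lists things a license does not allow for its own code.
LICENSES = {
    "GPL-2.0": {"requires": {"disclose-source", "same-license"}, "forbids": set()},
    "MIT": {"requires": {"keep-notice"}, "forbids": set()},
    "Vendor-EULA": {"requires": set(), "forbids": {"disclose-source"}},  # hypothetical
}


def obligations(components: dict[str, str]) -> set[str]:
    """Union of the obligations imposed by every component's license."""
    result: set[str] = set()
    for license_id in components.values():
        result |= LICENSES[license_id]["requires"]
    return result


def conflicts(components: dict[str, str]) -> list[tuple[str, str, str]]:
    """(requiring component, forbidding component, term) for each clash found."""
    found = []
    for (a, lic_a), (b, lic_b) in combinations(sorted(components.items()), 2):
        # Check the pair in both directions: either side may be the one requiring.
        for x, lx, y, ly in ((a, lic_a, b, lic_b), (b, lic_b, a, lic_a)):
            for term in sorted(LICENSES[lx]["requires"] & LICENSES[ly]["forbids"]):
                found.append((x, y, term))
    return found
```

Given a product that maps "parser" to GPL-2.0, "logger" to MIT and
"codec" to the hypothetical Vendor-EULA, obligations returns the combined
duties and conflicts flags the parser and codec pair, because one license
requires source disclosure and the other forbids it.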
Palamida claims a database of 40,000 of the most commonly used OSS projects
and their associated licenses, monitoring more than 38 million open
source files and billions of source code snippets. The Knowledge Base also
contains all pertinent information regarding the open source projects:
name, version number, project name, licensor, licensor information (when
available), license, license text, and project URL, all using an
automated collection toolset that incorporates information on all the new
projects released on the major OSS repositories for real time updates.
The Palamida database takes up less than 10 GB of disk space, thanks to a
compression algorithm, and it's all kept on a customer's own servers,
behind their firewall. Its code is written in Java. IP Amplifier
can be configured to search daily or weekly and has a set of configuration
tools to integrate it into build systems.
Are There Any Differences?
The biggest differentiator is cost. IP Amplifier 3.0 is licensed on an
annual subscription basis, for an unlimited number of users, at prices that
begin at $50,000 and go up to $250,000 per year, depending on the
customer's development environment. There is a 30-day Free Trial offer.
Black Duck now offers two options. You can pay an annual licensing fee for
its multiuser ProtexIP product, at $25,000 per year, and then pay
additional charges based on the amount of code you have. Or, you can use
their new hosted ProtexIP/OnDemand product, an online system for a
single user, single project, 90-day sessions, for which you pay based on
the amount of code you wish to scan. It costs $3,000 for 10 MB of code and
costs scale up to $25,000 for 100 MB. A company thinking of acquiring
another might wish to use the online tool, rather than purchase the more
costly version.
Both products still require human analysis, naturally. There can be false
matches, if two independent developers happen to write software that is
very much the same, even if there has been no copying, just because there
are only so many ways of writing the same instruction. Both tools
provide not only identical matches but also flag similarities in your
source code to others' programs that are worth your further investigation
and list issues for review. It's important to realize, however, that
the tools scan and analyze copyright issues and licensing issues, not
patent infringement. That is an entirely separate ballgame.
But for what they are designed to do, unquestionably they have
simplified, organized, and improved the due diligence process.