By Jake Edge
February 3, 2010
Adding a new Certificate Authority (CA) to a browser's list of accepted CAs
is typically a quiet affair; the browser team vets the CA based on their
criteria and adds those who pass the test. For Mozilla, the criteria and
vetting process are not private, but the process generally happens behind
the scenes. Users find out that new CAs have been added by looking at the
CA store after a browser upgrade, though it is likely a very rare user that
actually looks. When Mozilla followed its policies and added the China
Internet Network
Information Center (CNNIC) CA, things took a very different path—a
firestorm of protest resulted.
CAs are the issuing authority for Secure Sockets Layer (SSL) certificates
that are used to authenticate encrypted HTTP (i.e. HTTPS) sessions. A CA
that has been accepted into a browser's "root store" can then sign SSL
certificates for domains and those certificates will be accepted as valid
by the browser. Much like self-signed certificates, SSL certificates that
are signed by a CA that is not in the root store will cause the browser to
emit scary security warnings.
As seen in the Mozilla bugzilla
entry, Liu Yan of CNNIC requested addition to the root store in
February 2009. Public discussion was opened
on October 13. There were some technical concerns discussed, which CNNIC
fixed, and the discussion closed on October 22. A bug was filed to
actually get CNNIC's root certificate added to the root store (which is in
the separate Network Security Services component). That bug was closed
in mid-December once CNNIC verified that the proper certificate was added.
That is presumably how most new CAs get added: a somewhat bureaucratic
process is followed, the certificate gets added, and everyone goes on
their merry way. For CNNIC, though, things went a little differently.
With at least some folks in the Chinese IT world, CNNIC has a terrible
reputation. Starting on January 27, they were not shy about giving their
opinion of CNNIC—and Mozilla's decision to include it—on the
original bug report and a thread
in the mozilla.dev.security.policy group.
The main complaints seem to stem from the accusation that CNNIC has been
involved in distributing malware/spyware that is used by the Chinese
government to monitor its citizens. It is also alleged to be involved with
China's "Great Firewall" that censors specific web sites when accessed from
China. In addition, Liu asserted that CNNIC is "not a Chinese
Government organization" as part of the application process, but
various commenters dispute that.
There are some 60 comments on the bug, along with more than 100 messages in
the thread, many of them very passionate and/or heated requests to remove
CNNIC. It is perfectly understandable that Chinese people are concerned
about the possibility of government action against them because of what
they might say on the internet. But, it is not clear that adding CNNIC as
a CA has any bearing on that. Certainly CNNIC (or any CA) could
abuse its position and issue SSL certificates for domains that it
shouldn't, but, if it does, that act will provide clear evidence of
wrongdoing.
In order for an SSL certificate to be accepted, it must be
sent to the browser. Anyone visiting gmail.com, for example, and
getting a certificate signed by anyone other than Thawte (the CA that
signed Gmail's certificate), has proof of malfeasance. If CNNIC is abusing
its position, it should be relatively easy to prove. As Mozilla's
Johnathan Nightingale puts it:
What I have asked for
here, and am asking for again, is specific, concrete evidence that this CA has
acted in a way that contravenes our root policy. An illegitimate certificate
would be the single, best example of such evidence.
To many of the commenters, though, there is abundant proof of CNNIC's
involvement with malware; that, along with its
"lies" about its governmental status, should be enough, in their eyes, to
remove CNNIC as a CA in Mozilla browsers. But, being affiliated with a
government is not a reason that Mozilla would reject a CA (there are
several others already in the root store for Japan, Taiwan, and others).
It also isn't clear that distributing malware, separate from its CA
activities, would be enough to remove a CA from the root store.
Other CAs have misbehaved along the way. Verisign's poorly-named Site Finder scheme redirected DNS
queries in violation of the RFC, and in ways that were roundly criticized.
But that action was separate from its CA business and there were no calls
to remove it from any browser's root store. While Site Finder is a
relatively minor transgression compared to the accusations leveled against
CNNIC,
it is difficult to punish an organization in a particular realm except based
on its behavior within that realm. Thus the calls for evidence of CA abuse.
It is quite possible that an outcry back in October, as part of the public
comment period, might have slowed or stopped the inclusion of CNNIC. But,
that didn't happen, CNNIC complied with the policy, and was added. So, the
question now is "whether
we should review" that decision, Nightingale said.
In order to do that, some evidence needs to be presented, he suggested:
It feels to me like that makes our next step clear, here. It won't help to
tally up the complainants (there will be many), and it won't help to demand
assurances from CNNIC (since the alleged governmental pressure would trump
those anyhow). It certainly won't help to cite wikipedia.
If there's truth to the allegation, here, then it should be possible to produce
a cert. It should be possible to produce a certificate, signed by CNNIC, which
impersonates a site known to have some other issuer. A live MitM attack, a
paypal cert issued by CNNIC for example.
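The kind of evidence being requested boils down to comparing a certificate's issuer against the issuer a site is known to use. As a minimal sketch (not anything Mozilla or the commenters used), the Python below inspects a dictionary laid out the way the standard library's ssl.SSLSocket.getpeercert() returns it. The function names, host name, and CA names are all hypothetical, and the sample data is hand-written rather than taken from a real certificate.

```python
def issuer_organization(cert):
    """Return the issuer's organizationName from a getpeercert()-style dict."""
    # "issuer" is a tuple of RDNs; each RDN is a tuple of (key, value) pairs.
    for rdn in cert.get("issuer", ()):
        for key, value in rdn:
            if key == "organizationName":
                return value
    return None


def issuer_matches(cert, expected_org):
    """True if the certificate names the expected organization as issuer."""
    return issuer_organization(cert) == expected_org


# Hand-written sample in the getpeercert() layout; not a real certificate.
sample_cert = {
    "subject": ((("commonName", "mail.example.com"),),),
    "issuer": (
        (("countryName", "ZA"),),
        (("organizationName", "Example Expected CA"),),
    ),
}

if not issuer_matches(sample_cert, "Example Expected CA"):
    print("Unexpected issuer:", issuer_organization(sample_cert))
```

A mismatch found this way is only a starting point; the certificate itself would still need to be saved and examined before it could count as proof.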
Mozilla's Kathleen Wilson announced
the creation of a draft policy for
changing a root certificate that has been added to the root store. This
would provide a means for handling just this kind of dispute. Eddy Nigg of
Startcom, who is part of the team that reviews root inclusion requests, has
specifically asked
Wilson to start a review of CNNIC.
In the meantime, though, there are several technical measures that users
can take to protect themselves. To start with, in "Edit -> Preferences ->
Advanced -> Encryption" in Firefox, one can remove particular CAs from
the root store. There are also two different Firefox addons that could
help. Certificate
Patrol permanently stores each SSL certificate that the browser
encounters, and alerts the user when one changes. Perspectives
instead uses "network notaries" that store certificates for particular
hosts and can help users decide whether a self-signed or other certificate
is valid.
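The idea behind Certificate Patrol, remembering what each host presented and flagging changes, can be illustrated with a few lines of Python. This is only a sketch of the concept and does not reflect how the add-on is actually implemented: the class name is made up, the store lives in memory rather than on disk, and the certificate bytes in the usage lines are placeholders.

```python
import hashlib


class CertificateStore:
    """Remember one certificate fingerprint per host and report changes."""

    def __init__(self):
        self._seen = {}  # host name -> hex SHA-256 fingerprint

    def check(self, host, der_bytes):
        """Return 'new', 'unchanged', or 'changed' for this host's certificate.

        The stored fingerprint is always updated to the latest one seen.
        """
        fingerprint = hashlib.sha256(der_bytes).hexdigest()
        previous = self._seen.get(host)
        self._seen[host] = fingerprint
        if previous is None:
            return "new"
        return "unchanged" if previous == fingerprint else "changed"


store = CertificateStore()
print(store.check("example.org", b"placeholder certificate A"))
print(store.check("example.org", b"placeholder certificate B"))
```

A "changed" result is not necessarily an attack, since sites renew certificates routinely, which is why a tool like this can only alert the user rather than decide for them.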
It is instructive to take a look at the long list of CAs that are installed
with Firefox. Many are for high-profile companies, but there are quite a
few for seemingly obscure organizations. There are certainly enough
different CAs that a government—or criminal organization—that wished to apply some
pressure could get its hands on a forged SSL certificate. In truth, the
pressure only
need be applied to an employee who has access to the signing key. That risk
exists whether or not CNNIC, or any other particular CA, is on the list.
It is certainly unfortunate that the accusations against CNNIC only
surfaced after the inclusion process had already been completed. Depending
on what evidence is compiled, Mozilla is likely to have a difficult
decision to make. But the controversy, along with other recent security concerns that may
involve the Chinese government, is likely to further raise the profile of
internet censorship. It is something that many governments like to condemn
on one hand and implement with the other—the only defense against it
is keeping it in the public eye.
February 3, 2010
This article was contributed by Nathan Willis
On January 20, YouTube publicly unveiled a video player that allows site visitors to watch videos embedded directly into each page as HTML 5 video elements, replacing the plugin-based Flash player, and second-tier video sharing site Vimeo quickly followed suit. But both sites serve up HTML 5 video files only in the patented and royalty-collecting H.264 format. By sheer coincidence, the announcement neatly overlapped with the release of Firefox 3.6, and was followed days later by Apple's press event showcasing its iPad gadget; the two products lack H.264 and Flash support, respectively. What followed was a furious multi-way debate all about Flash, licensing, web video, and H.264 versus Ogg Theora. For the open source community, there is nothing to celebrate yet, but the high profile of the argument has opened the door for discussion of the real underlying issue: patented web standards.
Rewind
The root of the entire controversy is HTML 5's video element, which allows a web developer to include video content in a web page in any file format, obviating the need to wrap such content in a Flash player useful only because of the Flash plugin's ubiquity. But it is up to the browser to include support for the formats it chooses in its built-in video player. The HTML 5 standard does not mandate that support be included for any particular format in order to qualify as compliant, however, so a public war is underway between format proponents for de-facto dominance.
On one side is the ISO Moving Picture Experts Group (MPEG), pushing for adoption of its H.264 format. The H.264 codec is part of the broader MPEG-4 family, is patented, and all parties wishing to include support for it are required to pay licensing fees to the patent holders through a consortium called the MPEG-LA — the licensing requirement applies to encoders and decoders, hardware and software, and includes both original manufacturers and downstream redistributors.
Many on the other side are supporters of the free Theora format, which requires no royalties to implement in hardware or in software, thanks to irrevocable free licenses on the original patents granted by its original creator. The reference encoder and decoder are developed by Xiph.org and are available under a BSD-style license.
Theora proponents emphasize the need for HTML 5 to include a free-to-implement format, insulating the next decade of web development from the nightmare caused by the GIF patent enforcement debacle. H.264 supporters claim that Theora's quality-per-bitrate performance is behind H.264's, and that some unknown third-party might hold secret patents on one or more techniques used in Theora, and subsequently sue implementers for patent infringement if the format is made part of the standard (the so-called "submarine" patent threat).
The major web browsers are divided on format support. Apple's Safari ships with H.264 support only, Google's Chrome supports both H.264 and Theora, and Firefox and Opera support only Theora. Microsoft's Internet Explorer does not support HTML 5 video at all. Confusing the mix slightly is the fact that both Safari and Chrome implement H.264 playback because their parent companies pay licensing fees to MPEG-LA; consequently the open source browser projects WebKit and Chromium do not support H.264, because the license fees paid do not cover these downstream derivatives.
Players
That, then, was the situation when YouTube and Vimeo announced their H.264 HTML 5 video player support. What should have been a red-letter day for open web standards instead resulted in complaints to Mozilla from users (and pundits) that Firefox 3.6 "did not support HTML 5." In fact, Firefox has supported HTML 5 video since version 3.5, but it does not include an H.264 decoder.
Video expert Silvia Pfeiffer traced the problem back to numbers. According to Statcounter's market share statistics, Firefox accounts for 22.57% of the browsers in the world, with Chrome and Safari totaling 8.53%. Thus, of all the HTML 5-capable browsers in the field, Firefox makes up nearly 73% — and that 73% could not watch any of the YouTube or Vimeo video. It should be no surprise, then, that some of those users complained.
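The 73% figure follows directly from the two market-share numbers quoted above. The short calculation below reproduces it; like the paragraph, it counts only Firefox, Chrome, and Safari as HTML 5-capable, and the variable names are just for illustration.

```python
# Market-share figures quoted from Statcounter in the text, in percent.
firefox_share = 22.57
chrome_safari_share = 8.53  # Chrome and Safari combined

# Share of all browsers that can play HTML 5 video at all.
html5_capable = firefox_share + chrome_safari_share

# Firefox's portion of that HTML 5-capable group.
firefox_fraction = firefox_share / html5_capable

print(f"HTML 5-capable browsers: {html5_capable:.2f}% of the market")
print(f"Firefox's portion of those: {firefox_fraction:.1%}")
```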
Mozilla's Christopher Blizzard responded to the news with a detailed analysis of the H.264 ownership and patent problem. The situation is precisely the same as the GIF disaster of a decade earlier, and as the MP3 situation from the early 2000s, but with considerably higher stakes. H.264 is patented, pure and simple, and the patent owners charge royalties today and will continue to do so until their patents expire. If H.264 becomes a de facto standard, the patent owners will have the freedom to hike the price of licenses, and they will no doubt do so.
Blizzard goes on to examine the terms of H.264 licensing and its effects on corporate and independent producers of web content. To include an H.264 decoder in Firefox, Mozilla would have to pay a license fee (perhaps $5 million per year), but such a move would also undermine Mozilla's founding principles of supporting and promoting free formats and standards.
Flash, we hardly knew ye
The other big news from the last week of January was Apple's iPad launch party. The iPad, like its diminutive siblings the iPhone and iPod Touch, uses a Safari-based web browser, and includes Apple's licensed H.264 decoder for HTML 5 video. But also like the smaller devices, the iPad does not include Flash support.
Coming from Apple, that decision was hailed by some in the media as a death knell for Flash. Once the preferred format for incorporating animation and interactive page elements into web content, in recent years its usage has shrunk to the point where it is used almost exclusively as a platform to deliver online video (and for irritating advertising, of course, although strictly speaking that would not be considered "content" by most).
No one seems to lament the possibility of Flash's demise. Apple has suggested that Flash is the cause of most of the Safari crashes reported through its OS X Crash Reporter utility. Mozilla said in October of 2009 that third-party plugins cause at least 30% of all Firefox crashes, a statistic supported by the popularity of Flash-blocking add-ons.
At a town hall meeting last week, Apple's Steve Jobs even went so far as to publicly call Flash too buggy for use, declaring HTML 5 the way of the future.
What's a site owner to do?
Flash may indeed have no fans remaining outside of Adobe, a fact that
magnifies the importance of the HTML 5 video codec battle. The plugin has
survived as long as it has for one reason alone: its availability on almost
every browser on almost every operating system. Long after AJAX became popular for interactive content functionality, a web developer could implement video playback in a Flash element and feel secure that it would work on virtually every browser that would encounter it.
The same cannot be said of HTML 5 video, and certainly not of HTML 5 video with H.264 content. If Theora becomes the dominant format (or is officially sanctioned in the HTML 5 specification), it will be possible again, but that is simply not true of H.264. Both encoders and decoders require licensing, a fact often overlooked in the debate about browser support, but one which Blizzard addresses in his blog entry. Anyone can set up a site delivering CSS, HTML, and even Theora using free, legal tools, and without asking or signing for permission; H.264 would change that.
The only question is whether or not the web development community will
recognize that and rally behind Theora or another free alternative. The
H.264 patent owners' attacks on Theora are not substantive; the quality
comparison is highly subjective (and, in fact, comparing video encoding
quality is inherently
subjective), and as Xiph.org points out, submarine
patents are an equal threat to free and non-free codecs alike. The
original patents on Theora technology are known and licensed freely; if a
patent owner possessed sufficient evidence to kneecap Theora with an
infringement lawsuit regarding other
patents, it surely would have happened
already.
Moreover, the HTML 5 video element includes support for multiple source files, so content providers can offer each video in multiple formats; the fight is really only about the H.264 patent holders trying to prevent a rival format from being blessed as part of the standard. Those patent holders would use the same tactics against any other video format.
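To make the multiple-source mechanism concrete, here is a small sketch that generates the markup from Python. The helper function and the file names are hypothetical; the source element and its type attribute come from the HTML 5 draft. A browser works through the source children in order and plays the first one it can decode, so listing the Theora file first gives it priority wherever both formats are supported.

```python
from html import escape


def video_element(sources):
    """Build a <video> element with one <source> child per (url, mime_type)."""
    lines = ["<video controls>"]
    for url, mime_type in sources:
        lines.append(
            f'  <source src="{escape(url, quote=True)}" '
            f'type="{escape(mime_type, quote=True)}">'
        )
    # Fallback text shown by browsers without HTML 5 video support.
    lines.append("  Your browser does not support HTML 5 video.")
    lines.append("</video>")
    return "\n".join(lines)


markup = video_element([
    ("clip.ogv", "video/ogg"),  # Theora encoding (hypothetical file name)
    ("clip.mp4", "video/mp4"),  # H.264 encoding (hypothetical file name)
])
print(markup)
```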
Some critics have suggested that another free video codec is needed, and Theora is certainly not the only option. Sun has been developing its own patent-avoiding video codec through the Open Media Commons project for several years, although the project is rather quiet. Blizzard suggests that Google may have a video patent play of its own in mind with its recent attempts to acquire On2, the company that developed the VP3 codec from which Theora descended. Rob Glidden, formerly of the Open Media Commons project, is a proponent of the MPEG-RF movement to change MPEG policy to establish a royalty-free option as a "baseline" codec for MPEG-4.
The debate is far from over. YouTube and Vimeo may have changed one
aspect of it, however — unlike in years past when the fight took place
almost entirely within World Wide Web Consortium working groups, this time
it is being fought in public. Consequently, more people are getting a look
at what HTML 5 video is in practice, and can better understand the difference between the HTML element and the video format delivered, which can only be a good thing.
In the meantime, small web developers who want to serve up HTML 5 video content still have choices. The simplest option is to include multiple video source files, but a better alternative is to use the Cortado applet from Xiph.org, a streaming media Java applet that decodes Theora. It is open source, works transparently on any platform that includes Java support, and does not require encoding multiple source files, so there is no inadvertent spreading of unnecessary H.264 content required. But no one should hold their breath waiting for YouTube to implement it, of course.
February 3, 2010
This article was contributed by Don Marti
From one point of view, Samba is open source high
drama at its finest: an early adopter of version 3
of the GNU General Public License, and the recipient
of an unprecedented release of formerly proprietary
Microsoft documentation, thanks to a high-profile
anti-trust case. Meanwhile, though, it's the
low-profile software that implements the Server Message Block (SMB)
file-sharing protocol, sometimes known as CIFS. Samba powers every inexpensive
NAS device in the computer store—without even a
mention on the box—and comes with all the common
Linux distributions and with Apple's Mac OS X Server.
Today, as Samba comes closer to implementing a
key Microsoft directory protocol, the two aspects are
being forced together.
Samba creator Andrew Tridgell,
better known as Tridge, posted
to his blog, "There has been a lot of progress
recently in the development of the directory server
capabilities of Samba4." In a half-hour screencast
video, he demonstrated a development version of Samba
acting as a Microsoft Active Directory domain controller in a mixed environment.
"We are making very rapid progress now," he added.
Active Directory (AD) is a central repository for
all the administrative information that a modern
Microsoft Windows site needs. Besides user
names and passwords, AD functions as a DNS
server, stores network configuration policy
such as firewall rules, and acts as a back-end
for applications' configuration. Microsoft
Exchange, for example, is completely dependent
on it.
AD is made up of "domains" which are data structures
that contain groups of objects, which might represent
everything from an individual printer to the entire
company sales force. Domains can then be collected up
into "forests". A company might have many AD domains
within its forest, and everything in the forest can
be managed by the same administrators. Because AD
is such a critical service, Windows sites typically
install multiple AD servers, which replicate their
data using a formerly secret protocol.
The Samba team received
Active Directory documentation, including
the server-to-server protocol, as part of an agreement
made in response to a European Commission antitrust
case in 2007. The documents have helped the project,
Tridge said:
Stefan Metzmacher had managed to
decode some very important parts of the protocol as
part of his thesis work, but we were still missing
some key parts of the puzzle. The documentation from Microsoft filled
in many of these key elements, and perhaps more
importantly, Microsoft has been very willing to
engage with us to fill in any gaps that we find,
including working directly with traces of Samba
talking to Windows domain controllers to enable us
to debug our implementation.
The documentation effort was a huge undertaking on the
Microsoft side. Tridge described it this way:
I think it is fair to say that the
WSPP/MCPP documentation effort is one of the largest
efforts in IT history to document a set of network
protocols. The sheer scale of the
effort means that there are inevitably errors and
omissions. We have been pleased at how Microsoft has
responded to our reports of these errors by providing
us with additional documentation where needed.
In the video, Tridge demonstrates provisioning an
Active Directory domain on a Samba server, running
a development version of Samba from shortly before
Samba 4 alpha 11. Once the Samba server is running,
he then starts a copy of Microsoft Windows Server
2008R2 Standard as a guest under VirtualBox, and
runs the Windows "dcpromo" command to have it join
the domain as a domain controller.
A few clicks and entries in the "Active Directory
Domain Services Installation Wizard" later, the
Windows box is ready to reboot and come up as part
of the domain originally created on Samba. It takes
about 30 seconds to synchronize key information for
the newly-created domain. This step might take hours
on a larger, longer-running domain.
Samba 4 has
a few limitations, compared to a Windows AD server.
There is only one domain per forest, and only one site
per domain, but Tridge says that removing those limitations is a
near-future priority. Windows administrators,
like sysadmins everywhere, fall all over the
"lumpers" vs. "splitters" spectrum, and anyone
but extreme lumpers with simple configurations
will need the ability to define separate domains,
for departments and roles, and separate sites, for
physical locations.
The remaining manual step is to add the
Windows domain controller to the DNS zonefile
on the DNS server. Microsoft's Active Directory handles
DNS duties itself, while Samba relies on the
system nameserver. A change to a Samba AD domain
requires a corresponding change to a zonefile on the
nameserver. "What we don't yet support in Samba 4
is the ability to create arbitrary DNS names within
a Bind9 server using Kerberos authenticated DNS
requests," he said. "Microsoft stores DNS within
Active Directory. We can't join a Windows domain
controller as a new DNS server, so have to rely
on the Unix machines to provide DNS," he added.
After recording the screencast, Tridge did write
a script to automate the needed zonefile changes,
he said.
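For a sense of what the manual zonefile step involves, the sketch below generates a few BIND-style records for a newly joined domain controller. It is not Tridge's script, which is not shown here, and it is deliberately incomplete: the function name, host name, domain, and address are hypothetical, and the two SRV records (LDAP on port 389, Kerberos on port 88) are an assumed, illustrative subset of what an Active Directory domain expects to find in DNS.

```python
def dc_zone_records(hostname, domain, ip_address):
    """Return BIND-style zonefile lines for one domain controller.

    Illustrative subset only; a real AD domain needs more records.
    """
    fqdn = f"{hostname}.{domain}."
    return [
        f"{fqdn} IN A {ip_address}",
        # SRV fields are: priority weight port target
        f"_ldap._tcp.{domain}. IN SRV 0 100 389 {fqdn}",
        f"_kerberos._tcp.{domain}. IN SRV 0 100 88 {fqdn}",
    ]


# Hypothetical Windows DC joining a hypothetical Samba-hosted domain.
records = dc_zone_records("windc1", "example.test", "192.0.2.10")
print("\n".join(records))
```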
Tridge's screencast shows the Windows box
successfully syncing with the Samba server, and a
user added on the Windows side shows up quickly in a
search of the Samba server. Samba 4 is also able to
join an existing AD domain. A tool called "vampire"
is the Samba-side equivalent of the "dcpromo" command
on Windows. Tridge demonstrated using it to add a
second Samba server to the domain, ending up with a
domain with two Samba servers and one Windows server.
This ability means that an administrator could soon
add a Samba appliance to an existing AD network,
reducing the number of actual Windows servers
needed.
Integration and the "Franky" concept
Samba 4 is an ambitious rewrite, which has been in progress
since 2003. Meanwhile, Samba 3 has been through many
releases with incremental improvements, and currently
works well as a member, but not a domain controller,
of an Active Directory domain. Samba 3 is "closer
and closer to Windows compatibility in timestamps and
Windows ACLs. It's harder and harder to tell us from
a Windows box," Samba team member Jeremy Allison said.
Thanks to extensive usage and bug reports, Samba 3
has gained the ability to handle real-world client
quirks, while Samba 4 has focused on the big AD
problem but not faced the day-to-day beatings of
production use.
Tridge said that in addition to remaining AD work,
"we also need to find out exactly how we will achieve
our stated goal of re-integrating the great file
sharing and printing work that has been done in the
Samba3 branch with all of the work on Active Directory
server support in Samba4."
Samba developers have been discussing
ideas for combining the new functionality
in Samba 4 with the existing Samba 3 code.
One design for a combined project, called "Franky,"
short for "Frankenstein," would run Samba 3, listening on the SMB ports
(139 and 445), along with Samba 4 listening on the ports required for AD
support. Another alternative would be
to run Samba3, but pass through AD-related requests
to Samba4. "Obviously this will
require quite a lot of merge work, but we believe
this may be possible to achieve in 2010," Jeremy said
on the Samba team blog.
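The first "Franky" design amounts to splitting responsibility by port. The toy lookup below illustrates that split and nothing more; it is not Samba code. Ports 139 and 445 come from the description above, while the AD-side port set (Kerberos on 88, LDAP on 389, kpasswd on 464) is an assumption chosen for illustration and is not a complete list.

```python
SMB_PORTS = {139, 445}      # from the text: served by the Samba 3 code
AD_PORTS = {88, 389, 464}   # assumption: Kerberos, LDAP, kpasswd


def component_for_port(port):
    """Name the component that would listen on a port in a 'Franky' setup."""
    if port in SMB_PORTS:
        return "samba3"
    if port in AD_PORTS:
        return "samba4"
    return None  # neither component claims this port in the sketch


for port in (445, 389, 8080):
    print(port, component_for_port(port))
```

Seen this way, the administrative downside is easy to picture: two separate daemons, each with its own configuration, have to be deployed and kept consistent.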
Tridge said:
We need to have a single common file
server component and printing component again. The
strain on the team of having two implementations of
the file serving component is too great. One way of
achieving that is via something like the 'Franky'
approach, but that has a significant downside of
making deployment and administration of Samba more
difficult. We need to put more thought into how we can
make it easy for administrators, while also offering
the best set of features from both branches.
"I'm expecting a fairly heated discussion at
SambaXP
this year," said John Terpstra, Samba team
member and chief software architect of ClearCenter,
which produces a web-administered distribution for
small and medium businesses. The SambaXP conference
is scheduled for May 3rd - 7th, 2010 in Göttingen,
Germany.
Licensing and downstream
Samba with Active Directory is still not on downstream
roadmaps. Simo Sorce, Principal Software Engineer
at Red Hat, who maintains Samba packages for Fedora,
said that the project is looking at including Samba
3.5.0 in Fedora 13, if it's ready in time. But AD
is still in the future. For future releases, "We
will wait until the solution is stable enough that
upgrades won't mean your server has a good chance of
breaking," he said.
ClearCenter's ClearOS combines a network gateway
with VPN, web and mail filtering, Samba file server,
Kolab groupware, and web-based administration tools
into a package designed for resellers to deploy at small
businesses and branch offices. Samba is a key part of
the company's product, which competes with Microsoft
Small Business Server but with a monthly subscription
bill instead of an up-front license price. ClearOS is
based on CentOS, a rebuild of Red Hat Enterprise
Linux, but includes Samba 3.4 in place of CentOS's
3.0 package. "ClearOS 6 is going to ship pretty
quickly after Samba 4 ships," John said.
Samba adopted
version 3 of the GPL in 2007. One effect of
the new license was to prohibit downstream Samba
resellers from entering into new patent license
agreements covering Samba, like the controversial Novell-Microsoft
patent deal of 2006. Samba's license change
doesn't affect Novell, whose contract predates the
GPLv3 cutoff date, but according to the Samba web
site, "Patent covenant deals done after 28 March 2007
are explicitly incompatible with the license if they
are 'discriminatory' under section 11 of the GPLv3."
No GPLv2 fork has emerged, and, Jeremy
says, the license change "has essentially
been a complete non-issue." Downstream
vendors ship Samba on everything from tiny NAS
devices that connect to a USB drive, up to IBM's Scale
Out File Services, which runs clustered Samba
on top of IBM's proprietary General Parallel File
System (GPFS). "What Samba does is it turns the
CIFS server into a commodity, allowing people to
compete on back-end scaled clustered filesystems,"
Jeremy said.
All of the Samba code is under individual copyrights,
without assignment. "It's completely impossible to
be bought out," Jeremy said. "No one can get any
advantage over anyone else in the Samba code."
As part of the agreement with Microsoft, the
company must disclose any of its patents that it
believes are necessary to implement its protocols,
and it has not added any to its list since reaching
the agreement. Microsoft has been "very cautious
about breaking compatibility," Jeremy said.
"With Windows 7, Microsoft made sure that it
would work with a Samba 3 domain controller."
Microsoft ended support for Windows NT 4, the
last of its OS products to implement the old NT
Directory Services system, at the end of 2004, and
Windows 7 does not work with an NT4 domain controller, he added.
Help wanted
As you might expect, the Samba team is looking for
help. Tridge invites new contributors: "Join the #samba-technical
IRC channel (on the FreeNode
network, irc.freenode.net), join the samba-technical
mailing list, and get involved with the development
process. Point out what the priorities are for Samba4
before you would consider deploying it, and help us
to prioritize our development to meet your needs."
Jeremy asks would-be redistributors and SMB
appliance vendors to work on functionality they
anticipate needing. "If you're planning on a
product within the next 18 months, the earlier you get
involved the more chance you get to steer it to do the
things you need to do," he said. "If you
need Samba to interface with a particular filesystem,
give us a VFS module that will let us do that,"
Jeremy said. Contributions to Samba itself have
to be licensed under the GPLv3, but the team does
want to be able to run Samba on the user's choice of
clustered filesystem.
Then, as Jeremy posted, "Once we have a
merged code-base, we'll declare victory, ship Samba4
and have the biggest darn release party since Duke
Nukem Forever shipped and revolutionized computer
gaming ! :-)." Samba 3 has served well as an
essential file server, and Samba 4 has broken new
ground in Microsoft protocol discovery, but eventually,
one way or another, there will be one Samba again.
Many sites these days depend on Google Analytics to measure traffic, but there's something to be said for keeping control of one's data. Piwik bills itself as an open source alternative to Google Analytics, but does it actually measure up? Piwik isn't quite a full-on replacement for Google Analytics, but it's mature and complete enough for many users.
Piwik is the successor to phpMyVisites. It lacks a few features that were in phpMyVisites, such as PDF export and mail reports, but also adds a plugin architecture, better API, cleaner user interface, and better performance/scalability.
We looked at the current stable release, Piwik 0.5.4. Piwik is very simple to set up for anyone used to installing Web applications. It requires MySQL 4.1 or later, PHP 5.1.3 or later, the pdo and pdo_mysql PHP extensions, and the PHP GD extension to get the "sparkline" graphs in Piwik. Part of the install process is a system check that shows the system requirements and what, if anything, is missing. On the test server running WordPress, the GD extension was the only bit that wasn't already present. Assuming the requirements are met, it's a simple process of navigating to the URL where Piwik is installed, filling in a few bits of info, and clicking "next" a few times. In all, it shouldn't take more than five to 10 minutes to install.
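The installer's system check is essentially a comparison of what is present against the list of requirements above. The sketch below shows that logic in Python; Piwik's real check is part of its PHP installer and is not reproduced here. The function names are hypothetical, and version strings are assumed to be plain dotted numbers such as "5.2.6".

```python
# Minimum versions and extensions as listed in the text for Piwik 0.5.4.
MIN_VERSIONS = {"mysql": (4, 1), "php": (5, 1, 3)}
REQUIRED_EXTENSIONS = {"pdo", "pdo_mysql", "gd"}  # gd is for the sparklines


def parse_version(text):
    """Turn '5.1.3' into (5, 1, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def missing_requirements(versions, extensions):
    """List every unmet requirement; an empty list means all is well."""
    problems = []
    for name, minimum in MIN_VERSIONS.items():
        found = versions.get(name)
        if found is None or parse_version(found) < minimum:
            problems.append(f"{name} >= {'.'.join(map(str, minimum))}")
    for ext in sorted(REQUIRED_EXTENSIONS - set(extensions)):
        problems.append(f"PHP extension {ext}")
    return problems


# A server like the test system described above: only GD is absent.
print(missing_requirements({"mysql": "5.0.51", "php": "5.2.6"},
                           ["pdo", "pdo_mysql"]))
```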
The slightly harder piece is integrating Piwik into the site. It depends on a piece of JavaScript code that runs on each page that will be counted. Some popular blogging software and content management systems have plugins to work with Piwik, so it's not necessary to insert the code into site templates manually. We used the Piwik Analytics plugin to integrate it with WordPress. Once Piwik is installed and configured, results are visible almost immediately.
Because Piwik depends on JavaScript to track visitors, it will miss at least some percentage of traffic, depending on how many users hit the site with JavaScript turned off. It won't track visitors who get site information via RSS/Atom feeds, and it will miss some file downloads as well. Piwik tracks clicks on certain URLs that end with recognized filetypes, but if someone follows a link to, say, a PDF hosted on the site without visiting a page with the Piwik tracking script, that download will be missed.
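As a rough illustration of the extension-based matching described above, the following Python sketch classifies a link as a trackable download when its path ends in a recognized file type. The extension list and the function name is_download_link are assumptions made for this example; they are not taken from Piwik's tracking code, which runs in the browser as JavaScript.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical set of "recognized" file types; Piwik's real list may differ.
DOWNLOAD_EXTENSIONS = {"pdf", "zip", "gz", "mp3", "doc", "xls"}


def is_download_link(url, extensions=DOWNLOAD_EXTENSIONS):
    """Return True if the URL's path ends in one of the given extensions."""
    # Only the path is examined, so query strings and fragments are ignored.
    path = urlparse(url).path
    _, ext = posixpath.splitext(path)
    return ext.lstrip(".").lower() in extensions
```

A check like this can only run when the click happens on a page that carries the tracking script, which is why a direct request for the same file never shows up in the statistics.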
The Piwik interface is easy to use and provides quite a bit of
flexibility. Users can customize the main dashboard by adding an assortment
of widgets that track visitor actions (like what links are clicked),
referrers, or visitor settings (resolution, browser, etc.). The widgets themselves can display data as bar charts, sparklines, pie charts, or just raw numbers. Data can also be exported from each widget as an image of the graph, or as CSV, JSON, or PHP.
Some users don't like Google Analytics because of the site's dependence
on Flash. The good news is that Piwik requires far less use of Flash than
Google Analytics, and many of the widgets have table displays that don't
require it at all. But if you want the pretty graphs, Flash is required.
While Piwik has the advantage of putting web site owners in control of their own data, it has the disadvantage of putting additional load on the server. For low traffic sites, this probably won't be an issue. The test system we tried Piwik on had no problems with the additional load from Piwik, but the site typically had fewer than 1,000 page views per day (at least according to Piwik). Note that it's not necessary to run Piwik on the same server as the tracked sites.
Comparing Piwik directly to Google Analytics is something of an apples-to-oranges exercise. Both tools give a good sense of traffic on a web site and mostly agree on traffic numbers, though as a rule Google seems to track about six or seven percent fewer visits than Piwik. By default, Piwik doesn't (yet) have an option to discard visits from the admin users, but the WordPress plugin does provide this option, so it's not clear what traffic Google is missing or discounting that Piwik does count. Both trackers show visitor breakdowns by browser, region, operating system, resolution, and more.
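For readers who want to run the same comparison on their own site, the gap between the two trackers can be expressed as a fraction of the Piwik count. The sketch below is a minimal Python illustration; the function name and the visit counts are hypothetical and are not figures from the test site.

```python
def relative_shortfall(piwik_visits, google_visits):
    """Fraction by which the Google count falls below the Piwik count."""
    if piwik_visits <= 0:
        raise ValueError("piwik_visits must be positive")
    return (piwik_visits - google_visits) / piwik_visits


# Hypothetical counts, for illustration only.
shortfall = relative_shortfall(1000, 935)
print(f"Google reports {shortfall:.1%} fewer visits than Piwik")
```

A negative result would mean that Google counted more visits than Piwik over the period being compared.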
Though Piwik provides webmasters with control of their own data,
visitors might be uneasy if they were aware of how much data Piwik harvests
about them. The visitor log report displays the visitor's IP address,
keyword used to find the site, date and time visiting, the URL referring
them to the site, duration of the visit, operating system, browser, screen
resolution, and browser plugins detected.
Piwik does a respectable job identifying keywords that lead visitors to a site, the pages that are most popular, returning visitors, time spent on site, and so forth. For amateur Webmasters who just want to see how their site performs, Piwik gives all the tools that one might want. Depending on how demanding the business needs are, Piwik should be suitable for Webmasters who need a general sense of site traffic and performance. For users who specifically need to focus on site performance as a major business goal, Piwik might not be enough.
Hands down, Google does a much better job of showing geographic data
than Piwik. Users who are curious as to the exact location of their traffic
will want to use Google Analytics. It's possible to drill down all the way
to the city level in some cases. Piwik, by contrast, shows visitors by
country and provider, and that's about it. Users who want to know whether
traffic is coming from Nuremberg or Frankfurt, or Los Angeles or New York,
need to use Google Analytics, try out one of the third-party plugins that require a fair amount of configuration, or write their own.
A full list of plugins is available on the Piwik Developer Zone page, though the list is simply a Trac search. One might find some interesting plugins, but it will take some digging.
Google Analytics also has more features for Webmasters trying to improve site traffic and compete with other sites. For instance, if one chooses to opt in to data sharing, Google will compare a site's traffic with aggregate data from other sites that share their data. Of course, Google already has the data, but this feature requires an extra step to allow it to be aggregated. This allows a Webmaster to track site performance against all aggregate traffic, or specific industry verticals. For example, it was possible to compare the test site traffic against other open source sites that are tracked by Google Analytics.
While Google may have features that Piwik doesn't (and vice-versa), Google Analytics is less friendly to the do-it-yourself approach. Piwik features a plugin architecture that allows developers to create their own features, and most of Piwik's own features are enabled via plugins. The plugin interface could do a better job of allowing users to get more information. Each plugin is listed with a short description, version number, and links to activate or deactivate the plugin, but in most cases there is no link to further information about the plugin. The "Live Visitors!" plugin, for example, is particularly unhelpful with only "Live Visitors!" as a description.
The Piwik roadmap indicates that 1.0 should be released sometime in 2010. Features planned for 1.0 include the ability to anonymize IPs stored in the Piwik database and to export widgets that display limited data rather than all Website data, along with improved performance and scaling and better documentation.
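The roadmap does not say how the anonymization will work. One common approach, shown here purely as an assumption and not as Piwik's planned implementation, is to zero the low-order bits of each address before it is stored. A minimal Python sketch, with a hypothetical function name and a default of 24 retained bits:

```python
import ipaddress


def anonymize_ip(address, keep_bits=24):
    """Zero the host bits of an IPv4 address, keeping the first keep_bits bits."""
    # strict=False lets ip_network accept an address with host bits set
    # and mask them off rather than raising an error.
    network = ipaddress.ip_network(f"{address}/{keep_bits}", strict=False)
    return str(network.network_address)


print(anonymize_ip("192.0.2.123"))      # last octet zeroed
print(anonymize_ip("192.0.2.123", 16))  # last two octets zeroed
```

Keeping fewer bits gives visitors more privacy, at the cost of coarser country and provider reports.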
But what won't be in Piwik is just as telling. The roadmap warns that the Piwik team doesn't plan to provide "advanced web analytics features found in other commercial products: custom report generator, custom segments and real time segmentation, funnel analysis, advanced ecommerce reporting, etc." Instead, the team suggests that these could be added as plugins, and that the goal of Piwik is to create an "open web analytics framework" that could be used to implement these features if the community desires.
To get the most complete picture possible, it's probably a good idea to combine Piwik with a package like AWStats that will analyze Apache logs. If data privacy and using an open tool aren't concerns, Google Analytics might be a better choice for now, because it does offer a wider selection of features. But users seeking an open source solution, and those who don't want to turn data over to Google or another third party, should look seriously at Piwik. There's no conflict in setting up each of the tools to run concurrently on a site, and having all of the packages at one's fingertips provides all the information any Webmaster could want.
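To give a sense of what server-side log analysis adds, here is a small Python sketch that counts successful GET requests per day from lines in Apache's common or combined log format. It is a toy illustration under that format assumption, not a description of how AWStats works; the regular expression and the function name hits_per_day are made up for this example.

```python
import re
from collections import Counter

# Matches the leading fields of Apache's common/combined log format:
# host ident user [day:time zone] "METHOD path protocol" status ...
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def hits_per_day(lines):
    """Count GET requests answered with status 200, grouped by day."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match["method"] == "GET" and match["status"] == "200":
            counts[match["day"]] += 1
    return counts
```

Counts produced this way include feed readers, direct file downloads, and clients with JavaScript disabled, all of which a JavaScript-based tracker misses. They also include crawlers and other automated traffic, so the two sets of numbers should not be expected to match.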
Page editor: Jonathan Corbet