Gathering web site statistics with Piwik
Many sites these days depend on Google Analytics to measure traffic, but there's something to be said for keeping control of one's data. Piwik bills itself as an open source alternative to Google Analytics, but does it actually measure up? Piwik isn't quite a full-on replacement for Google Analytics, but it's mature and complete enough for many users.
Piwik is the successor to phpMyVisites. It lacks a few features that phpMyVisites had, such as PDF export and emailed reports, but it adds a plugin architecture, a better API, a cleaner user interface, and better performance and scalability.
We looked at the current stable release, Piwik 0.5.4. Piwik is very simple to set up for anyone used to installing Web applications. It requires MySQL 4.1 or later, PHP 5.1.3 or later, and the pdo and pdo_mysql PHP extensions; the PHP GD extension is also needed for Piwik's "sparkline" graphs. Part of the install process is a system check that lists the requirements and shows what, if anything, is missing. On the test server, which was running WordPress, the GD extension was the only piece that wasn't already present. Assuming the requirements are met, installation is a simple matter of navigating to the URL where Piwik is installed, filling in a few details, and clicking "next" a few times. In all, it shouldn't take more than five to ten minutes.
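The installer's system check boils down to comparing version numbers and looking for missing extensions. As a rough illustration, here is a minimal Python sketch of such a check. It is not Piwik's installer (which is written in PHP), and the function names and sample server data are hypothetical:

```python
# Hypothetical re-creation of an installer-style requirements check.
# Assumes plain numeric versions like "5.2.6" (no distro suffixes).

def parse_version(version: str) -> tuple[int, ...]:
    """Turn "5.1.3" into (5, 1, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

# Minimums taken from the requirements listed in the article.
REQUIRED_VERSIONS = {"MySQL": "4.1", "PHP": "5.1.3"}
# "gd" is only needed for the sparkline graphs, but is checked here too.
REQUIRED_EXTENSIONS = {"pdo", "pdo_mysql", "gd"}

def check_requirements(versions: dict[str, str], extensions: set[str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, minimum in REQUIRED_VERSIONS.items():
        found = versions.get(name)
        if found is None:
            problems.append(f"{name}: not found")
        elif parse_version(found) < parse_version(minimum):
            problems.append(f"{name}: {found} is older than {minimum}")
    for ext in sorted(REQUIRED_EXTENSIONS - extensions):
        problems.append(f"missing PHP extension: {ext}")
    return problems

# Hypothetical server resembling the test machine: everything but GD present.
print(check_requirements({"MySQL": "5.0.51", "PHP": "5.2.6"}, {"pdo", "pdo_mysql"}))
# ['missing PHP extension: gd']
```

Comparing versions as integer tuples rather than as strings matters here: compared as strings, "5.10" would sort before "5.2".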
The slightly harder part is integrating Piwik with the site. Piwik depends on a piece of JavaScript code that must run on each page to be counted. Some popular blogging platforms and content management systems have plugins that work with Piwik, so users of those systems don't need to insert the code into site templates manually. We used the Piwik Analytics plugin to integrate it with WordPress. Once Piwik is installed and configured, results are visible almost immediately.
Because Piwik depends on JavaScript to track visitors, it will miss at least some percentage of traffic, depending on how many users visit the site with JavaScript turned off. It won't track visitors who read the site's content via RSS/Atom feeds, and it will miss some file downloads as well. Piwik tracks clicks on links to URLs that end with recognized file types, but if someone follows a link to, say, a PDF hosted on the site without visiting a page that carries the Piwik tracking script, that download will be missed.
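Recognizing a download by its file extension takes only a few lines. The following Python sketch illustrates the idea; it is not Piwik's tracker code (which is JavaScript), and the extension list is hypothetical, since Piwik's real list is configurable and may differ between versions:

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical list of "download" file types.
DOWNLOAD_EXTENSIONS = {"pdf", "zip", "gz", "tgz", "mp3", "doc", "xls", "exe"}

def is_download_link(url: str) -> bool:
    """Classify a link as a download if its path ends in a known extension."""
    path = urlparse(url).path          # drops any ?query or #fragment
    ext = posixpath.splitext(path)[1]  # e.g. ".PDF", or "" if there is none
    return ext.lower().lstrip(".") in DOWNLOAD_EXTENSIONS

for link in ("http://example.org/files/Report.PDF?v=2",
             "http://example.org/about.html",
             "http://example.org/src/piwik.tar.gz"):
    print(link, is_download_link(link))
```

A check like this only runs when the link is clicked on a tracked page; a PDF fetched directly never reaches it, which is exactly the gap described above.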
![Piwik dashboard](https://static.lwn.net/images/2010/PiwikDashboard-sm.png)
The Piwik interface is easy to use and provides quite a bit of flexibility. Users can customize the main dashboard by adding an assortment of widgets that track visitor actions (such as which links are clicked), referrers, or visitor settings (resolution, browser, etc.). The widgets can display data as bar charts, sparklines, pie charts, or plain numbers. Data can also be exported from each widget, either as an image of the graph or in CSV, JSON, or PHP format.
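The same data can also be requested programmatically through Piwik's HTTP API. Below is a minimal Python sketch that only builds a request URL. The host is hypothetical, the parameter names (module, method, idSite, period, date, format, token_auth) are an assumption based on Piwik's API documentation and should be checked against the installed version, and "anonymous" only works if anonymous view access has been granted:

```python
from urllib.parse import urlencode

def piwik_api_url(base_url: str, method: str, id_site: int,
                  period: str = "day", date: str = "yesterday",
                  fmt: str = "json", token_auth: str = "anonymous") -> str:
    """Build a Piwik HTTP API request URL (parameter names assumed)."""
    params = {
        "module": "API",
        "method": method,          # e.g. "VisitsSummary.get"
        "idSite": id_site,         # numeric site ID assigned by Piwik
        "period": period,          # e.g. "day", "week", "month", "year"
        "date": date,              # e.g. "today", "yesterday", "YYYY-MM-DD"
        "format": fmt,             # e.g. "json", "csv", "xml", "php"
        "token_auth": token_auth,  # per-user token; "anonymous" for public stats
    }
    return base_url.rstrip("/") + "/index.php?" + urlencode(params)

# stats.example.org is a hypothetical Piwik installation.
print(piwik_api_url("https://stats.example.org/piwik/", "VisitsSummary.get", 1))
```

Fetching the resulting URL (with `urllib.request.urlopen`, for example) should return the numbers behind the dashboard widgets in the requested format.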
Some users dislike Google Analytics because of its dependence on Flash. The good news is that Piwik relies on Flash far less than Google Analytics does, and many of the widgets have table displays that don't require it at all. But if you want the pretty graphs, Flash is required.
![Piwik countries](https://static.lwn.net/images/2010/PiwikCountry-sm.png)
While Piwik has the advantage of putting web site owners in control of their own data, it has the disadvantage of putting additional load on the server. For low-traffic sites, this probably won't be an issue. The test system we tried Piwik on had no problems with the additional load, but the site typically had fewer than 1,000 page views per day (at least according to Piwik). Note that Piwik doesn't have to run on the same server as the sites it tracks.
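For a sense of scale, here is a rough worked estimate. It assumes (as a simplification) one tracking request per page view, spread evenly over the day; real traffic is burstier, so peak load will be higher than this average:

$$\frac{1000\ \text{page views/day}}{86\,400\ \text{s/day}} \approx 0.012\ \text{requests/s}$$

That is roughly one tracking request every 86 seconds on average, which is consistent with the test system showing no problems.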
Comparing Piwik directly to Google Analytics is a bit like comparing apples to oranges. Both tools give a good sense of traffic on a web site, and they mostly agree on traffic numbers, though as a rule Google seems to record about six or seven percent fewer visits than Piwik. Piwik itself doesn't (yet) have an option to discard visits from admin users, though the WordPress plugin does provide one, so it's not clear what traffic Google is missing or discounting that Piwik counts. Both trackers show visitor breakdowns by browser, region, operating system, resolution, and more.
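Whether administrators' own page views are counted is one plausible source of such discrepancies. A minimal Python sketch of that kind of filter follows, assuming admins are recognized by IP address; the addresses are hypothetical documentation ranges, the visit data is made up, and the WordPress plugin may use a different mechanism entirely:

```python
import ipaddress

# Hypothetical admin addresses (192.0.2.0/24 is a documentation range).
ADMIN_NETWORKS = [ipaddress.ip_network("192.0.2.0/28")]

def is_admin(ip: str) -> bool:
    """True if the visitor's address falls inside an admin network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ADMIN_NETWORKS)

def count_visits(visitor_ips: list[str], exclude_admins: bool) -> int:
    """Count visits, optionally skipping those that come from admins."""
    return sum(1 for ip in visitor_ips if not (exclude_admins and is_admin(ip)))

def percent_fewer(reference: int, other: int) -> float:
    """How many percent smaller `other` is than `reference`."""
    return 100 * (reference - other) / reference

# Made-up visit log: two of the four visits come from admin addresses.
visits = ["198.51.100.1", "192.0.2.5", "203.0.113.9", "192.0.2.14"]
all_visits = count_visits(visits, exclude_admins=False)
non_admin = count_visits(visits, exclude_admins=True)
print(all_visits, non_admin, percent_fewer(all_visits, non_admin))  # 4 2 50.0
```

With these made-up numbers the filter removes half the visits; on a real site, the share of admin traffic determines how large the resulting gap is.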
![Piwik visitors](https://static.lwn.net/images/2010/PiwikVisitors-sm.png)
Though Piwik gives webmasters control of their own data, visitors might be uneasy if they knew how much data Piwik collects about them. The visitor log report displays each visitor's IP address, the keyword used to find the site, the date and time of the visit, the referring URL, the duration of the visit, and the visitor's operating system, browser, screen resolution, and detected browser plugins.
Piwik does a respectable job of identifying the keywords that lead visitors to a site, the most popular pages, returning visitors, time spent on site, and so forth. For amateur Webmasters who just want to see how their site performs, Piwik provides all the tools one might want. Depending on how demanding a business's needs are, Piwik should also suit Webmasters who need a general sense of site traffic and performance. For users who need to focus on site performance as a major business goal, though, Piwik might not be enough.
![Analytics state detail](https://static.lwn.net/images/2010/AnalyticsStateDetail-sm.png)
Hands down, Google does a much better job of showing geographic data than Piwik. Users who are curious about the exact location of their visitors will want to use Google Analytics, which in some cases can drill down all the way to the city level. Piwik, by contrast, shows visitors by country and provider, and that's about it. Users who want to know whether traffic is coming from Nuremberg or Frankfurt, or from Los Angeles or New York, need to use Google Analytics, try one of the third-party plugins (which require a fair amount of configuration), or write their own.
A full list of plugins is available on the Piwik Developer Zone page, though the list is simply a Trac search. One might find some interesting plugins, but it will take some digging.
Google Analytics also has more features for Webmasters trying to improve site traffic and compete with other sites. For instance, if one opts in to data sharing, Google will compare a site's traffic with aggregate data from other sites that share their data. Of course, Google already has the data, but this feature requires an extra step to allow it to be aggregated. The comparison lets a Webmaster measure site performance against aggregate traffic overall or against specific industry verticals. For example, it was possible to compare the test site's traffic against other open source sites that are tracked by Google Analytics.
While Google may have features that Piwik doesn't (and vice versa), Google Analytics is less friendly to the do-it-yourself approach. Piwik features a plugin architecture that allows developers to create their own features; most of Piwik's own features are provided by plugins. The plugin administration interface, however, could do a better job of telling users what each plugin does. Each plugin is listed with a short description, a version number, and links to activate or deactivate it, but in most cases there is no link to further information about the plugin. The "Live Visitors!" plugin, for example, is particularly unhelpful, with only "Live Visitors!" as its description.
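To make the complaint concrete, here is a toy plugin registry in Python whose listing carries an optional "more information" link alongside the description, version, and activation state. It is a generic sketch of a plugin architecture, not Piwik's actual (PHP) plugin API, and the plugin names, versions, and URL are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PluginInfo:
    name: str
    description: str
    version: str
    info_url: Optional[str] = None  # the "more information" link discussed above

class PluginManager:
    """Toy plugin registry: plugins are registered once, then toggled on/off."""

    def __init__(self) -> None:
        self._plugins: dict[str, PluginInfo] = {}
        self._active: set[str] = set()

    def register(self, info: PluginInfo) -> None:
        self._plugins[info.name] = info

    def activate(self, name: str) -> None:
        if name not in self._plugins:
            raise KeyError(f"unknown plugin: {name}")
        self._active.add(name)

    def deactivate(self, name: str) -> None:
        self._active.discard(name)

    def listing(self) -> list[str]:
        """One line per plugin, mimicking an admin screen."""
        rows = []
        for name in sorted(self._plugins):
            info = self._plugins[name]
            state = "active" if name in self._active else "inactive"
            more = info.info_url or "(no further information)"
            rows.append(f"{name} {info.version} [{state}] {info.description} {more}")
        return rows

# Hypothetical plugins for illustration.
manager = PluginManager()
manager.register(PluginInfo("ExampleMap", "Shows visitors on a world map", "0.1",
                            "https://example.org/plugins/examplemap"))
manager.register(PluginInfo("ExampleLive", "Live!", "0.1"))
manager.activate("ExampleLive")
for row in manager.listing():
    print(row)
```

Making the link an explicit (if optional) field is one way to nudge plugin authors toward documenting what their plugin actually does.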
The Piwik roadmap indicates that 1.0 should be released sometime in 2010. Features planned for 1.0 include the ability to anonymize IP addresses stored in the Piwik database, exportable widgets that display limited data rather than all of a site's data, improved performance and scalability, and better documentation.
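IP anonymization usually means discarding the low-order bits of each stored address. Below is a minimal Python sketch for IPv4, assuming a scheme that zeroes the last one or more octets; the roadmap doesn't say how Piwik 1.0 will do it, so this is illustrative only, and the function name is hypothetical:

```python
import ipaddress

def anonymize_ipv4(address: str, masked_octets: int = 1) -> str:
    """Zero the last `masked_octets` octets of an IPv4 address."""
    if not 0 <= masked_octets <= 4:
        raise ValueError("masked_octets must be between 0 and 4")
    ip = ipaddress.IPv4Address(address)
    # Keep the high (4 - masked_octets) octets, clear the rest.
    mask = (0xFFFFFFFF << (8 * masked_octets)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(int(ip) & mask))

print(anonymize_ipv4("203.0.113.57"))     # 203.0.113.0
print(anonymize_ipv4("203.0.113.57", 2))  # 203.0.0.0
```

Masking more octets gives visitors more privacy at the cost of making stored addresses less useful for telling visitors apart.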
But what won't be in Piwik is just as telling. The roadmap warns that the Piwik team doesn't plan to provide "advanced web analytics features found in other commercial products: custom report generator, custom segments and real time segmentation, funnel analysis, advanced ecommerce reporting, etc." Instead, the team suggests that these could be added as plugins, and that the goal of Piwik is to create an "open web analytics framework" that could be used to implement these features if the community desires.
To get the most complete picture possible, it's probably a good idea to combine Piwik with a package like AWStats that analyzes Apache logs. If data privacy and using an open tool aren't concerns, Google Analytics might be the better choice for now, because it offers a wider selection of features. But users seeking an open source solution, and those who don't want to turn data over to Google or another third party, should look seriously at Piwik. Nothing prevents running all of these tools concurrently on a site, and having all of them at one's fingertips provides all the information any Webmaster could want.
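Log analysis catches what the JavaScript tracker can't: bots, visitors with JavaScript disabled, and direct file downloads. Here is a minimal Python sketch of the kind of counting a log analyzer does, assuming Apache's standard "combined" log format. The bot test is a deliberately crude, hypothetical substring match, not how AWStats classifies robots, and the sample log lines are made up:

```python
import re
from collections import Counter

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Crude, hypothetical robot test; real analyzers use curated lists.
BOT_MARKERS = ("bot", "crawler", "spider")

def summarize(lines):
    """Count successful GET requests, split into 'human' and 'bot'."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            counts["unparsed"] += 1
            continue
        if match["method"] != "GET" or not match["status"].startswith("2"):
            continue
        agent = match["agent"].lower()
        counts["bot" if any(m in agent for m in BOT_MARKERS) else "human"] += 1
    return counts

sample = [
    '203.0.113.5 - - [04/Feb/2010:15:56:01 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '198.51.100.7 - - [04/Feb/2010:15:56:09 +0000] "GET /robots.txt HTTP/1.1" 200 42 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '203.0.113.5 - - [04/Feb/2010:15:57:30 +0000] "GET /files/report.pdf HTTP/1.1" 200 90210 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]
print(summarize(sample))
```

The direct PDF download in the sample is counted here, even though a JavaScript tracker would never see it.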
Index entries for this article | |
---|---|
GuestArticles | Brockmeier, Joe |
Gathering web site statistics with Piwik
Posted Feb 4, 2010 15:56 UTC (Thu) by MattPerry (guest, #46341)
Gathering web site statistics with Piwik
Posted Feb 4, 2010 16:12 UTC (Thu) by jake (editor, #205)
Actually, you don't ... the Javascript blob can extract things like resolution, plugins installed, and some other stuff. The logs give you most of what you need, though, and count things that don't run Javascript (bots, NoScript users, etc.).
jake
Gathering web site statistics with Piwik
Posted Feb 20, 2010 23:24 UTC (Sat) by Velmont (guest, #46433)
Is this an ad for Google Analytics?
Posted Feb 6, 2010 17:24 UTC (Sat) by bgilbert (subscriber, #4738)
Since when does LWN compare open tools to proprietary ones?
Is this an ad for Google Analytics?
Posted Feb 6, 2010 18:20 UTC (Sat) by corbet (editor, #1)
As the article was being prepared, I did try to direct its focus away from Analytics and toward Piwik itself. That said:
Proprietary tools - especially very widely used proprietary tools - can show what is possible and what people want. Piwik is likely to be of interest to sites wanting to move away from this particular proprietary tool - or which chose not to use it in the first place. People running such sites are likely to be interested in what they might gain or lose by making this change. Sorry, but I think the comparisons are appropriate, and I hope they were useful for some.
Is this an ad for Google Analytics?
Posted Feb 6, 2010 18:58 UTC (Sat) by rahulsundaram (subscriber, #21946)
Is this an ad for Google Analytics?
Posted Feb 7, 2010 1:57 UTC (Sun) by bgilbert (subscriber, #4738)
Is this an ad for Google Analytics?
Posted Feb 12, 2010 9:18 UTC (Fri) by emeraldine (guest, #63520)
Gathering web site statistics with Piwik
Posted Feb 15, 2010 20:04 UTC (Mon) by frabcus (guest, #25169)
TheyWorkForYou).
We picked it because we can guarantee better privacy for our users than with Google Analytics. We handle sensitive letters on the WriteToThem website (also open source). Also, in general we prefer to use open source software, rather than proprietary software.
Using Javascript can reduce false hits - it has been the easiest, most reliable way to exclude all bots from results for a while now. Also, the setup is more in the hands of the person managing the content - it is hard to give access to the log files themselves for the tools to run on without a Unix systems administrator, which although I have, not everyone does.
Finally, I am really happy to see this article. I am worried that more and more "open source" developers are using closed tools made by Google, such as Google Analytics.
Another good example is GMail. I know many hackers who use GMail, who would never have dreamed of using Outlook in the 1990s. Unfortunately, there is no open source web-based email client that is comparable to GMail yet. Maybe RoundCube will be one day, but it needs developers.
Ideally I'd like an integrated open source package, maybe in a virtual machine you install, that gives you (on your own domain) web based email (with GMail-like threading and keyboard shortcuts), SMTP and IMAP. And integrates it all with full text search, spam whitelisting and other features that require communication between various parts of the email system. All ready set up.
If we don't compete with proprietary on the web, then all the gains of the 90s competing with Microsoft will be wiped out.