
LWN.net Weekly Edition for October 31, 2013

Lightbeam for Firefox

By Nathan Willis
October 30, 2013

In early 2012, Mozilla released Collusion, an experimental extension for Firefox that recorded third-party web tracking and rendered its information in a graph-like visualization. Since then, the add-on has undergone further development, and Mozilla announced its first major update on October 25. In a move that will come as no shock to Mozilla watchers, the project has been renamed, but it also now offers additional ways to dig into the tracking data that the extension collects, and users can now choose to contribute their tracking statistics to a data survey Mozilla is conducting on web privacy.

The rebranded extension is now called Lightbeam, and is compatible with Firefox 18 and newer. Lightbeam works by recording third-party HTTP requests in the pages visited with the browser, noting requests that match a list of web tracking services and advertisers. The list itself originally came from privacychoice.org. Each tracker encountered is saved in a local record that notes which domain linked to the tracker, whether cookies were set by the request, whether the request was made via HTTP or HTTPS, whether the user has actually visited the tracker domain in question, and a few other details about the connection.
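Mozilla has not published a formal schema for these records here, but based on the fields described above, a single entry might look something like the following sketch (all field names are illustrative assumptions, not Lightbeam's actual storage format):

```python
# Hypothetical sketch of one Lightbeam connection record; the field
# names are assumptions for illustration, not the extension's real schema.
tracker_record = {
    "source": "news.example.com",      # site the user intentionally visited
    "target": "ads.tracker.example",   # third-party domain that was contacted
    "is_tracker": True,                # target matches the known-tracker list
    "cookie": True,                    # the request set cookies
    "secure": False,                   # False = plain HTTP, True = HTTPS
    "visited": False,                  # user has never visited the tracker directly
    "timestamp": 1383151200,           # when the connection was observed
}

print(tracker_record["target"])  # ads.tracker.example
```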

You're on candid camera

The first visualization of this data (and the only option in the Collusion-branded release) was a connected graph, with edges connecting the domains that a user has intentionally visited to all of the tracker domains that come along for the ride. Not only does that visualization immediately show which sites are the worst third-party tracking offenders, but it also reveals how many independent sites are sending data back to the same tracking service. This is an educational experience; most users are aware that large web service companies like Google and Facebook do web tracking, but seeing the full list of third-party, business-to-business web-tracking sites without household names is another thing altogether.

The graph is updated in real time, which makes for interesting (and arguably creepy) viewing in a separate window while one goes about the day's surfing. For example, over the past 48 hours, Lightbeam indicates that I have visited 94 sites, which have in turn collectively reached out to 177 third-party sites. Not all of those third-party sites are trackers, of course: those that are known to be tracking services are rendered as blue-outlined nodes, while others are rendered with white outlines. The graph shows visited sites as circles and third-party sites as triangles.

[Lightbeam]

Interested users should note, though, that Lightbeam is still on the buggy side; the project's GitHub issue tracker indicates that quite a few users are encountering compatibility problems with other extensions—particularly those that also deal with third-party web tracking or privacy protection. There also appear to be several Firefox privacy preference settings that break Lightbeam. Ironically, of course, the users most interested in Lightbeam are likely to also be the most interested in tweaking privacy settings and in installing other web-tracking countermeasures, so these unresolved issues greatly impact Lightbeam's usefulness.

When it does work, the graph visualization allows one to focus on each individual site, shuffling the graph connections around so that the site of interest is in the center. Clicking on a node also opens a side pane showing the server location and the list of third-party sites that have been contacted. While the graph is revealing from a big-picture perspective, it is not necessarily the easiest way to study the data in depth. Luckily, Lightbeam offers two other views of the same data: the "clock" provides a time-based look at the visits and third-party connections made, and there is also a straightforward list.

Users can also save the extension's current data locally, or reset it to begin a new capture session. That capability would be most useful for comparisons, such as measuring the effect that the "Do Not Track" preference has on third-party trackers; unfortunately, in my tests, enabling "Do Not Track" was one of the settings that stopped Lightbeam from working altogether.

Spies like us

Better visualization features are nice, but arguably the biggest change in the revamped extension is that Lightbeam users can now opt in to send their data to Mozilla. That might sound a tad oxymoronic, but Mozilla insists that the data collected from users will be anonymized and will only be published in aggregate. For instance, only the domain and subdomain of visited sites are recorded, not the path component of URLs, so in most cases usernames and other personally identifiable data will not be included.

[Lightbeam]

In September 2012, Mozilla's David Ascher wrote about the data-collection effort, saying that it would look at the "different kinds of uses of shared third-party HTTP requests", so that users can better understand which types of request are in their interest and which are not. He also said that Mozilla intended to use the data to work with site publishers. The blog post announcing the October Lightbeam release followed up on this idea, albeit with few specifics. The post says only that "Once the open data set has time to mature, we’ll continue to explore how publishers can benefit from additional insights into the interaction of third parties on their sites."

There may indeed be a lot about third-party tracking (both commercial tracking services and tracking that is performed surreptitiously) that site owners are in the dark about. Still, it would be nice to have more detail available about exactly what Mozilla's data-collection effort will look like before opting in to it. In the meantime, though, Lightbeam definitely does "pull back the curtain" (as the blog post puts it) on web tracking for individual users. Hopefully, as Mozilla pursues a broader effort, the results will be enlightening for users of the web in general—most of whom know that "web tracking" exists in some form, but for whom its full extent and methods remain a mystery.


The kernel maintainer gap

By Jake Edge
October 30, 2013

ELC Europe

Wolfram Sang is worried that the number of kernel maintainers is not scaling with the number of patches flowing into the mainline. He has collected some statistics to quantify the problem and he reported those findings at the 2013 Embedded Linux Conference Europe. There is not an imminent collapse in the cards, according to his data, but he does show that the problem is already present and he forecasts that it will only get worse.

[Wolfram Sang]

Sang's slides [PDF] had numerous graphs (some of which are reproduced here). The first simply showed the number of patches that went into each kernel from 3.0 to 3.10. As one would guess, the trend is an increase in the number of patches. Companies are working more with the upstream kernel than they have in the past, he said, which is great, but leads to more patches.

There are also, unsurprisingly, more contributors. Using the tags in Git commits, Sang counted the number of authors and contrasted that with the number of "reviewers" (using the "Committer" tag) in his second graph. Over the 3.0–3.10 period, the number of authors rose by around 200, which is, coincidentally, roughly the (largely static) number of reviewers. That means that the gap between authors and reviewers is getting larger over time, which is a basic outline of the scaling problem, he said. If we want to maintain Linux with the quality we have come to expect, it is a problem we need to pay attention to.
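Sang did not detail his exact method, but a similar author-versus-committer count can be approximated from `git log` output. The sketch below works on pre-collected `'%ae|%ce'` lines; using email addresses as identity, and that particular format string, are assumptions for illustration:

```python
# Approximate the author/committer gap from git log output.
# On a real kernel tree one would collect the input with:
#   git log v3.0..v3.10 --format='%ae|%ce'
# Sample lines are used here so the sketch is self-contained.
def gap(log_lines):
    """Return (distinct author count, distinct committer count)."""
    authors, committers = set(), set()
    for line in log_lines:
        author, committer = line.split("|")
        authors.add(author)
        committers.add(committer)
    return len(authors), len(committers)

sample = [
    "dev1@example.com|maint@example.com",
    "dev2@example.com|maint@example.com",
    "dev3@example.com|other@example.com",
]
n_authors, n_committers = gap(sample)
print(n_authors, n_committers)  # 3 authors, 2 committers
```

Identifying people by email address undercounts slightly (some developers use several addresses), but it is good enough to show the trend Sang described.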

His statistics are based on accepted patches; they don't include superseded patches or bogus patches that might require a lengthy explanation to the author. There is also a fair amount of education that maintainers often do for new developers. All of that takes additional time beyond what a raw number of patches will show, he said. While his graphs start at 3.0, he does not want to give the impression that the maintainer workload was in a good state at that time; "it was challenging already", and it is getting worse.

Trond Myklebust gave the best definition of a maintainer that Sang knows of. According to that definition, the job is made of five separate roles: software architect, software developer, patch reviewer, patch committer, and software maintainer. It is not easy to rip out any of those tasks to distribute them to other people. The "number one rule is to get more maintainers who like to do all of these jobs at once".

[Authors vs. others graph]

The right maintainer will enjoy all of those jobs, which makes good candidates fairly rare. Sang suggested that developers remember that maintainers fill all of those roles when interacting with them. He doesn't mean that developers should obey maintainers all the time, he said, but they should keep in mind that the maintainer may be wearing their architect hat, looking beyond the immediate problem the developer is trying to solve.

The "Reviewed-By" and "Tested-By" tags are quite helpful to him as a maintainer because they indicate that the patch is useful to others beyond just its author. That led him to look at the stats for those tags. He plotted the number of reviewers and testers who were not also committers to try to gauge that pool. That graph appears above at right, and shows that there are around 200 reviewers and 200 testers for each kernel cycle as well. The trend is much like that of maintainers, so there is still an increasing gap. The reviewers and testers "are doing great work", but more of them are needed as well.

Using a diagram of the different pieces in a typical ARM system on chip (SoC), Sang showed that there are many different subsystems that go into a kernel for a particular SoC. He wanted to look at those subsystems to see how well they are functioning in terms of how quickly patches are being merged. He also wanted to compare the i2c subsystem that he maintains to see how it measured up.

[Subsystem latency graph]

Using the "AuthorDate" and "CommitDate" tags on patches from 3.0 to 3.10, he measured the latency of patches for several different subsystems (the combined graph is shown at right). That metric can be inaccurate as a true measure of latency if there are a lot of merge commits (as the CommitDate may not reflect when the maintainer added the patch), but that was not really a problem for the subsystems he looked at, he said.
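As a rough illustration of the metric, here is a sketch of how the latency distribution could be computed from (AuthorDate, CommitDate) timestamp pairs; the sample data is made up, and on a real tree the pairs would come from something like `git log --format='%at %ct'`:

```python
# Measure patch latency as CommitDate minus AuthorDate, and report
# the fraction of patches merged within a given number of days.
DAY = 86400  # seconds

def merged_within(pairs, days):
    """pairs: (author_ts, commit_ts) tuples; fraction merged within `days`."""
    within = sum(1 for a, c in pairs if c - a <= days * DAY)
    return within / len(pairs)

# Made-up sample: three patches merged after 2, 10, and 40 days.
sample = [(0, 2 * DAY), (0, 10 * DAY), (0, 40 * DAY)]
print(merged_within(sample, 28))  # 2 of 3 patches fall within 28 days
```

As noted above, the metric has the same caveat as Sang's: CommitDate can be rewritten by merges and rebases, so it only approximates when a maintainer actually picked the patch up.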

He started with the drivers/net/ethernet subsystem, which had some 5000 patches over the kernel releases measured. It has a fairly low latency, with 85% of the patches being merged within 28 days. This is what developers want, he said, a prompt response to their patches. In fact, 70% of patches were merged within one week for that subsystem.

Looking at the mtd subsystem shows a different story. Sang was careful to point out that he was not "bashing" any subsystem or maintainer as they all do "great work" and do what they can to maintain their trees. After 28 days, mtd had merged just over 50% of its patches. Those might be more complicated patches so they take more review time, he said. That is difficult to measure. After one three-month kernel cycle, about 80% of the patches were merged.

Someone from the audience spoke up to say that many of the network driver patches get merged without much review because there is no dedicated maintainer. That makes the latency lower, but things get merged with lots of bugs. Sang said that is one way to deal with a lack of reviewers: if the network driver patch looks "halfway reasonable" and no one complains, it will often just be merged. If anyone is unhappy with that state of affairs, they should volunteer to help, another audience member suggested. Sang agreed, and said that taking on the maintenance of a single driver is a good way to learn.

His subsystem is somewhere in between net/ethernet and mtd. That is "not bad", he said, for someone doing the maintenance in their spare time. But for a vendor trying to get SoC support upstream, it may not be quick enough.

The dream, he said, would be for all subsystems to be more like the Ethernet drivers without accepting junk patches. His belief is that over time the latency in most subsystems is getting worse. In fact, his "weather forecast" is that we will see more and more problems over time, either with increased latency or questionable patches going into the trees.

So, what can be done to help out maintainers? For users (by which he means people building kernels for customers, not necessarily developers or regular users), he recommends giving more feedback to subsystem maintainers. Commenting on patches, testing them, and describing any problems found will all help; add a Tested-By tag if you have done that, as well. If there is no reaction to a patch on the mailing list and seemingly no interest, it makes his job of deciding whether the patch is worthwhile difficult. If you are using a patch that hasn't been merged, consider resending it, but check first for open issues from when the patch was posted. Sometimes there are simple style changes needed that can be easily fixed.

For developers, he recommends trying to get the patch right the first time and thus reducing the number of superseded patches. Not knowing the subsystem and making mistakes that way is reasonable, but sloppy patches are not. In addition, if you know a patch is a suboptimal solution to the problem, be honest about it. Don't try to sell him something that you know is bad. Sometimes a lesser solution is good enough, but a straight explanation should accompany it.

He also recommends that developers take part in the patch QA process by reviewing other patches. In fact, he said, you should also review your own patches as if they came from someone else—it is surprising what can be found that way. Taking part in the mailing list discussions, especially those that are about the architecture of the subsystem, is important as well. It is difficult to determine which way to go, at times, without people stating their opinions.

Maintainers should not necessarily work harder, Sang said, as most are working hard already. It is important to watch out for burnout as no one wins if that happens. SoC vendors are "constantly pressing the fast-forward button" by releasing hardware faster and faster, so you may reach a point where you simply can't keep up. That may be time to look for a co-maintainer.

Having the right tools is an important part of being a maintainer. There is no "ready-made toolbox" that is handed out to new maintainers, but if you talk to other maintainers, they may have useful tools. Keyboard shortcuts, Git hooks for doing auto-testing, tools to handle and send out email, and so on are all time savers. "Pay attention to the boring and repetitive tasks" and try to automate them.

Organizations like the Linux Foundation (LF), Linaro, SoC makers, and others have a role to play as well. If they already have developers, it is important to allow those developers to review patches and otherwise participate in kernel QA. That will improve the developers' skills, which will help the organization, and it will improve the kernel too.

It is important to educate new kernel developers internally about the basics of kernel submissions. He is much more lenient with someone who he knows is working on his own than he is with those working at companies where there are multiple folks who already know about submitting patches and could have passed the knowledge on.

Increasing the number of maintainers would help as well. It might be easier for people to take on maintainer responsibilities if it were part or all of their job to do so. Sang believes that being a maintainer should ideally be a full-time paid position, but that is often not the case. He does it on his own time, as do others, and some do it as part, but usually not all, of their job. A neutral party like the LF might be desirable as the employer of (more) maintainers, but other organizations or companies could also help out. In his mind, it is the single most important step that could be taken to improve the kernel maintainer situation.

He went back to the SoC diagram he showed early on, but this time colored the different subsystems based on whether the maintainer was being paid to do that work. Red meant that the maintainer was doing it in their spare time, and there were quite a few subsystems in that state. That is somewhat risky for SoC vendors trying to get their code upstream. Ideally, most or all of the diagram would be green (maintainer paid to do it) or yellow (part of the maintenance time is paid for). Sang ended by saying that having full-time maintainers was really something whose time had come and he is optimistic that more of that will be happening soon.

[I would like to thank the Linux Foundation for travel assistance to Edinburgh for the Embedded Linux Conference Europe.]


Automotive Linux projects getting in gear

By Nathan Willis
October 30, 2013

Automotive Linux Summit Fall

The third Automotive Linux Summit (ALS) was held in Edinburgh, Scotland, concurrently with the Embedded Linux Conference and Kernel Summit. As has been the case with the previous ALS events, there was a lot of talk from automakers about developing Linux-based in-vehicle infotainment (IVI) and embedded control systems—talk that, for the most part, dealt with efforts that are still several years away from reaching the showroom floor. Such is simply the nature of the car business; new car models require a multi-year development cycle for numerous reasons. But what was more interesting in the schedule of the Edinburgh event was tracking the progress of the many sub-projects on which a Linux-based IVI system depends. Over the past two years, the community has seen the topics move steadily from what an IVI system should include to active projects with working code.

GENIVI opening up

The leading example of this progress comes from GENIVI, the car-industry alliance working on a Linux-based IVI middleware layer. In 2012, GENIVI launched its first open source projects: an audio routing manager, a graphics layer manager, and a diagnostic logging tool. Since that time, the stable of GENIVI projects has grown to fifteen.

[Gicquel]

GENIVI's Philippe Gicquel delivered one of the keynote talks on the first day of ALS, starting off with a description of how GENIVI sees itself fitting into the expanding community of Linux-driven automotive software projects. The alliance's target, he said, is the non-differentiating components of the IVI stack: those bits on which car makers and tier-one suppliers would rather not compete head-to-head, since they are largely invisible to consumers. In particular, though, GENIVI is pursuing a specific set of domains: multimedia and graphics, connectivity with consumer electronics devices (e.g., phones), location-based services, and integrating Linux with existing automotive standards (such as diagnostics). Each domain has its own working group at GENIVI; the groups are collectively coordinated by GENIVI's System Architecture and Baseline Integration teams.

Gicquel then briefly discussed the active software projects. GENIVI's standard practice is to work upstream, he said, and it contributes to Wayland and other projects, but it does maintain its own IVI-specific branches when a full merge upstream is not possible. For example, he said, the IVI Layer Manager project (one of the first three) maintains a patch set that makes Wayland GENIVI-compliant, since its use case is not important to 95% of Wayland systems.

The newer GENIVI projects include several that build on upstream work, such as AF_BUS, which is a latency-reducing optimization of D-Bus. Node Startup Controller (NSC) and Node State Manager (NSM) are components used to manage application lifecycles in the vehicle. NSC extends systemd to handle rapid startup and shutdown of applications and system services (where often the maximum allowable time is a legal requirement). NSM provides a state machine to track applications' lifecycles.

Other projects are original. Persistence Management is a library to handle storage of data that needs to persist across reboots and, like NSC, is designed to cope with the rapid system shutdowns expected in automotive environments. IPC CommonAPI C++ is a set of C++ language bindings that abstracts several inter-process communication APIs into one (the "common API"). Currently D-Bus, SOME/IP, and Controller Area Network (CAN) bus are the IPC mechanisms targeted, but support for others may be added later. Smart Device Link (SDL) is a framework for smartphone applications to run remote user interfaces on a car's dash head unit.

There are also several developer tools in GENIVI's projects. YAMACIA is a plugin for the Eclipse IDE that supports the IPC CommonAPI and the Franca interface definition language used by the CommonAPI, while LXCBench is an analysis tool that benchmarks the performance of applications run in Linux containers.

Finally, there are several "proof of concept" (POC) projects that implement basic functionality to demonstrate APIs or features that GENIVI expects vehicle OEMs to replace. The browser POC is a Qt- and WebKit-based browser component. Similarly, the Tuner Station Manager demonstrates the use of the radio tuner API, and the Point-Of-Interest Service is a demonstration of the location-based services group's point-of-interest (POI) API. Web API Vehicle is an HTML5 application interface toolkit designed as a proof of concept to show application developers how to access various W3C web APIs in a vehicle.

Gicquel estimated that there were about 75 active contributors to the various GENIVI projects at present, more than 20 code repositories, and that they had written 500,000 lines of code so far. The alliance recently appointed longtime GENIVI developer Jeremiah Foster as the developer community manager, and is looking to open up even more of its development processes in the coming months. There are mailing lists for every project, he said, but the alliance is still in the process of "moving from silos to a community."

Automotive Grade Linux

The other major development community showing off new work at ALS was the Linux Foundation's Automotive Grade Linux (AGL) workgroup, which builds its software stack on top of the IVI flavor of Tizen.

How AGL, GENIVI, and Tizen IVI fit together (or don't) was certainly a source of confusion in years past, but as each of the projects has released more code, the picture has become clearer. Intel's Brett Branch presented a session on the first day of ALS that dealt with that very question (among others); as he explained, GENIVI provides a middleware layer, but vehicle OEMs will use GENIVI code in conjunction with a Linux base system to build a final product—and will likely include other components as well, such as AUTOSAR-compliant components for safety-critical systems.

Indeed, while GENIVI now offers two distinct baseline Linux distributions built on different platforms (an x86-based system built on Baserock and an ARM system built with Yocto), Tizen IVI rolls a single release, which includes components from GENIVI projects, code from elsewhere, and code written within the Tizen project itself.

At ALS 2013, there were a variety of talks about Tizen IVI components. Intel's Ossama Othman discussed the project's work implementing several new standards for mobile device integration. The driving force, he said, is consumers' desire to seamlessly move between their smartphone screen and their IVI unit for common applications like music playback or mapping. Othman's primary focus is implementing the display integration, where, he said, there are three main standards: SDL, MirrorLink, and Miracast.

[Othman]

MirrorLink and Miracast both work by cloning the smartphone's display to the IVI head unit, but they differ considerably otherwise. First, MirrorLink is a closed specification created by the Car Connectivity Consortium (CCC), while Miracast is an open standard created by the WiFi Alliance. Second, MirrorLink explicitly handles sending input events from the IVI head unit (e.g., gestures and touch input) back to the smartphone, but Miracast is a one-way, display-only standard. Miracast applications are thus left on their own to send data from the head unit back to the mobile device through some other means. Third, MirrorLink requires a USB connection between the device and head unit, whereas Miracast (as one might expect) is wireless—although it consumes a lot of bandwidth. Miracast also mandates use of the non-free H.264 video codec, which makes a "clean" free software implementation impossible. But MirrorLink requires every MirrorLink-compatible application to go through a certification process.

But car makers might be willing to foot the bill for H.264 licenses or MirrorLink certification, so Tizen IVI explores both. At the moment, Othman has implemented as much of Miracast as he can; many of the necessary pieces are out there already (such as support for WiFi Direct in the kernel and in wpa_supplicant), but for others he is still in the process of persuading people to release the necessary code under an open license. The Tizen IVI releases do ship with hardware support for H.264 playback, he said, and they have support for some WiFi Direct hardware.

The MirrorLink situation is far more bleak; there appear to be no open source implementations at all, but Othman said he is still looking. He also noted that it was unclear to him whether there were ways to get access to MirrorLink documentation without paying for membership in the CCC.

The third standard, SDL, is the most feasible to implement. The open source code (which is hosted at GENIVI) was a donation from Ford, but Othman said it was difficult to integrate. So far, Ford has not participated in the project since making the initial code contribution, and the existing code is "packaged weirdly," making it a pain to work with. For example, it includes hardcoded links to libraries not included in Tizen and to specific executables (like Google Chrome), it statically links all of the libraries it uses, and it generates a number of symbol conflicts and other errors when compiled. Othman said he has not been able to figure out which version of g++ Ford used to compile it internally, but that it is clearly an out-of-date one. Nevertheless, he was able to patch it, and it is integrated into the latest Tizen releases.

The good news is that SDL is the most functional of the three; it implements a full remote display (not simply cloning the device screen), handles input events, and allows application developers to implement separate interfaces for the device and IVI screens.

Patrick Ohly gave a related presentation (which Othman encouraged his audience to attend) about the work he has done in Tizen IVI to integrate the synchronization of personal information management (PIM) data, like calendars and contacts databases, between smartphones and IVI head units.

The Tizen IVI PIM stack is based on GNOME tools: Evolution Data Server, Folks, and SyncEvolution. This is a different stack from the one used in the Tizen smartphone releases, which does not address synchronization. Adapting the GNOME stack to Tizen IVI did require several changes, he said, such as storing all data in an SQLite database, and he has added several other useful tools, such as libphonenumber, an Android library for normalizing phone numbers. SyncEvolution supports a configurable set of address books on different devices; it can thus present a unified address book for all phones synced to the IVI unit, while keeping them synchronized separately. He is currently working on CardDAV and CalDAV support (including support for Google's PIM services), as well as support for the Bluetooth Phone Book Access Profile.

Down the road

To the smartphone development community, address book synchronization may sound like old news. But it and the other projects on display at ALS are significant because they reveal how much progress has been made in recent months. At the 2012 ALS, the biggest news was the availability of AGL Demonstrator, a demo IVI system that sported many UI mock-ups which, today, would be filled by real applications. Indeed, between the Tizen IVI milestone releases and GENIVI's base systems, there were three Linux-based IVI systems available this year.

There were still carmakers on stage discussing IVI systems that they had built but that were not yet available, but the tenor of the discussion has changed. In his keynote address, Jaguar Land Rover's Matt Jones commented that being in the IVI business meant that the company was expected to pony up membership fees for a wide assortment of industry alliances: AUTOSAR, CEA, DLNA, Bluetooth SIG, ERTICO, and many more. Together the membership fees add up to $800,000 or more, he said, but through the company's involvement with GENIVI and Tizen he has seen a far better return on the money it has put into open source projects. He ended that discussion by inviting anyone with an automotive open source project to come talk to him about it, since that is where he would prefer to invest his budget. It will be interesting to see just how much bigger the playing field is at the next ALS, whether from Jones's investment or elsewhere.

[The author would like to thank the Linux Foundation for travel assistance to Edinburgh for ALS.]


Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

  • Security: Living with the surveillance state; New vulnerabilities in chromium, dropbear, firefox, gnutls, roundcube, ...
  • Kernel: Vast amounts of Kernel Summit coverage
  • Distributions: Ubuntu Touch 1.0; Fedora, Musix, ...
  • Development: Developing the Opus and Daala codecs; Firefox 25; Cisco's open source H.264 codec; The first Tizen tablet; ...
  • Announcements: Linaro Connect videos, new books, events.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds