LWN.net Weekly Edition for December 1, 2011
Releasing Samba 4
Samba 4 has been a long time in coming—and it seems to still be a ways off. Samba is a free implementation of Microsoft's SMB/CIFS protocol that is used in the vast majority of consumer-targeted network storage devices, but the current 3.x versions lack many of the features that enterprise users require (Active Directory support in particular) and Samba 4 is meant to address that shortcoming while also reworking and rewriting much of the code base. The biggest hurdle to a Samba 4 release seems to be integrating the new code with the existing pieces that currently reside in Samba 3. A recent proposal to release the current code "as is"—rather than complete the integration work before a 4.0 release—is under heavy discussion on the samba-technical mailing list.
There is a fair amount of impatience, it seems, for a Samba 4.0 release.
In the proposal, Andrew Bartlett notes that
vendors would like to build their products atop a stable release, rather
than alphas, and that finishing the integration work in 4.1 might allow a
final 4.0 release "in about three
months time
". Samba 4 has been in the works since 2003, with
several "technology preview" releases starting in 2006 and the first alpha
release in 2007, but a 4.0 final release has so far proved elusive.
In his proposal, Bartlett is seeking to route around the hurdles and get
a release out there.
Part of the problem with integrating the Active Directory (AD) Domain Controller (DC) work with the existing production Samba 3 code is that there needs to be a clear migration path for users who upgrade. If the existing Samba 3 file server code (often referred to as "smbd") were still shipped as an option, existing users would not need to change anything. Only users that were interested in moving to Samba-based DCs would need to start using bin/samba (which is the name of the Samba 4 server that includes AD and DC functionality).
But, some have always envisioned Samba 4 as a single server process that can handle all of the different roles (file server and AD DC), which necessitates the integration. Others think it may well make sense to release a Samba 4 that incorporates the new functionality for those who need AD DC support, while leaving other users on the older, working, and well-tested code. Part of the problem seems to be that different sub-groups within the project are taking their own approaches, and no one has done a lot of development—or testing—of an integrated server solution.
Those who are currently testing DC setups with the Samba 4 code are using the simpler, single-process "ntvfs" file server code, rather than smbd. For that reason and others, Andrew Tridgell would like to see ntvfs still be available as an option:
But that doesn't sit well with Jeremy Allison, partly because of the maintenance burden:
That's a big sticking point for me. I though we'd decided s4 fileserver was smbd code - end of story. The details were how to do the integration.
Beyond just the embedded case, though, Tridgell sees value in keeping ntvfs alive, rather than forcing those users to switch to the smbd-based file server code. It's a matter of what testing has been done, as well as causing fewer disruptions to existing setups:
Tridgell also thinks that ntvfs has a design and structure that should eventually migrate into smbd. But, as he notes, his arguments haven't convinced Allison, so he's started working on a branch that uses the smbd server. That's only one of the problem areas, though. AD DCs handle far more than just file serving: they also perform authentication, handle DNS lookups, act as print servers, and more.
But, once again, there are two versions of some of the services floating around. For example, winbind, which handles some authentication and UID/GID lookups, has two separate flavors, neither of which currently handles everything needed for an AD DC. Tridgell and Bartlett have been looking into whether it makes sense to have a single winbind server for both worlds, seemingly coming to the conclusion that it doesn't. But Allison, Simo Sorce, and Matthieu Patou see that as another unnecessary split between existing and future functionality. Sorce is particularly unhappy with that direction, saying:
Some of the complaints about Bartlett's proposal concern positioning. Samba 4 has been envisioned as a complete drop-in replacement for Samba 3. To some, that means integrating the Samba 3 functionality into Samba 4, but for others, it could mean making the Samba 3 pieces available as part of Samba 4. Tridgell and others are in the latter camp, but Allison looks at it this way: "It isn't an integrated product yet, it's just a grab bag of non-integrated features." He goes on to say that it will put OEMs and Linux distributions "in an incredibly difficult position w.r.t. marketing and communications".
Everyone seems to agree that the ultimate goal is to integrate the two code bases, but there is enough clamor for a real release of the AD DC feature that some would like to see an interim release. One idea that seems to be gaining some traction is to do a "Samba AD" release using the Samba 4 code (possibly including some of the Samba 3 server programs). That release would be targeted at folks who want to run a Samba AD DC—as many are already doing using the alpha code—while encouraging those who don't need that functionality to stay with Samba 3. As Géza Gémes put it:
Doing something like that would remove much of the pressure that the Samba team is feeling regarding a Samba 4 release. That would allow time to work out the various technical issues with integrating the two Sambas for an eventual Samba 4 release that fulfills the goal of having a single server for handling both the AD DC world and the simpler file-and-print-server world. As Gémes and others said, it's not a perfect solution, but it may well be one that solves most of the current problems.
The underlying issue seems to be that Samba 3 and Samba 4 have been on divergent development paths for some time now. While it was widely recognized that those paths would need to converge at some point, no real plan to do so has come about until now. Meanwhile, users and OEMs have been patiently—or not so patiently—waiting for the new features. It is probably still a ways off before the "real" Samba 4 makes an appearance, but plans are coming together, which is certainly a step in the right direction. Given that some have been using the AD DC feature for some time now, it probably makes sense to find a way to get it into people's hands. Not everyone is convinced of that last part, however, so it remains to be seen which way the project will go.
YaCy: A peer-to-peer search engine
Developers in the "free network services" movement often highlight Google services like GMail, Google Maps, and Google Docs when they speak about the shortcomings of centralized software-as-a-service products, but they rarely address the software behemoth's core product: web search. That may be changing now that the decentralized search engine YaCy has made its 1.0 release. But while the code may be 1.0, the search results may not scream "release quality."
The rationale given for YaCy is that a decentralized, peer-to-peer search service precludes a central point of control and the problems that come with it: censorship, user tracking, commercial players skewing search results, and so forth. Elsewhere on the project site, the fact that a decentralized service also eliminates a central point of failure comes up, as does the potential for faster responses through load-balancing. But user control is the core issue: YaCy users determine which sites and pages get indexed, and it is possible to thoroughly explore the search index from a YaCy client.
In addition to "basic" search functionality indexing the entire web, individual administrators can point YaCy's crawler towards specific content, using it to create a search portal limited in scope to a particular topic, a specific domain, or a company intranet. On one hand, this design decision allows both custom "search appliances" using free software, and, with the same code base, puts the indexing choices directly into the hands of the search engine's users. On the other hand, the portion of the web indexed by the federated network of YaCy installations is much smaller than the existing indexes of Google and other commercial services — and perhaps more importantly, it grows more slowly as well.
The 1.0 release of YaCy is available for download in pre-built packages for Windows, Mac OS X, and Linux, as well as an Apt repository for Debian and Ubuntu. The package requires OpenJDK version 6 or later; once installed it provides a web interface at http://localhost:8090/ that includes searching, administration and configuration, and control over your local index creation. You can also register the local YaCy instance as a toolbar search engine in Firefox, although it must be running in order to function.
In a relatively new move, the project is also running a public web search portal at search.yacy.net. This portal accesses the primary "freeworld" network of public YaCy peers. There are other, independent YaCy networks that do not attempt to index the entire web, such as the scientific research index sciencenet.kit.edu. These portals make it possible to test out YaCy's coverage and results without installing the client software locally.
The architecture among peers
Broadly speaking, a network of YaCy peers (such as freeworld) maintains a single, shared reverse word index for all of the crawled pages (in other words, a database of matching URLs, keyed on the words that make up likely search terms). The difference from a centralized engine is that the index is sharded among the peers in a distributed hash table (DHT). Whenever a peer indexes a new set of pages, it writes updates to the DHT. Shards are replicated between multiple peers to bolster lookup speed and availability.
In practice, YaCy's DHT is a bit more complicated. The DHT does not store the full URL reference for every matching page — the complete entry includes not just the source URL, but metadata about it such as the last crawl time, the language, and the filetype (all of which might be important to the user performing the search). That information is stored in a local database on the peer that crawled the page, and replicated to several other peers. The peers' data is kept in a custom-written NoSQL database built on AVL trees, self-balancing search trees that give logarithmic lookup, insert, and delete operations.
Each record in the DHT's index for a particular word contains a hash of the URL and a hash of the peer where the full reference is stored. Those two hashes are computed and stored separately, so that it is simple to determine that two matching hashes are entries for the same URL. That saves time because a URL that matches multiple search terms only has to be fetched from a peer once for a particular search.
Finally, for common terms, the number of URL references for a given word can become unwieldy, so YaCy partitions the DHT not just on full word-entry boundaries, but splits each word entry across multiple peers. That complicates the search process because matches for each word need to be retrieved from multiple peers, but it balances the load.
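To make the partitioning concrete, here is a toy sketch of the index layout; the hash function, key format, and fixed 16-way split are simplified stand-ins for what YaCy (which is written in Java) actually does:

```python
import hashlib

NUM_SHARDS = 16  # freeworld currently splits each word entry 16 ways

def h(s):
    """Stand-in hash; YaCy uses its own base64-encoded hashes."""
    return hashlib.sha1(s.encode()).hexdigest()[:12]

class WordIndexDHT:
    """Toy reverse word index sharded across peers, YaCy-style."""
    def __init__(self):
        self.shards = {}  # (word hash, partition) -> [(url_hash, peer_hash)]

    def shard_for(self, word, url_hash):
        # partition within each word entry, so that common words spread
        # their (potentially huge) URL lists over several peers
        return (h(word), int(url_hash, 16) % NUM_SHARDS)

    def add(self, word, url, peer_id):
        url_hash = h(url)
        # the URL hash and peer hash are stored separately, as in YaCy
        entry = (url_hash, h(peer_id))
        self.shards.setdefault(self.shard_for(word, url_hash), []).append(entry)

    def lookup(self, word):
        # a real search must query all partitions, on remote peers
        results = []
        for part in range(NUM_SHARDS):
            results += self.shards.get((h(word), part), [])
        return results

dht = WordIndexDHT()
dht.add("linux", "https://lwn.net/", "peer-A")
dht.add("kernel", "https://lwn.net/", "peer-A")
# equal url_hash values in the two result sets show that the full record
# for a "linux kernel" query need be fetched from peer-A only once
print(dht.lookup("linux"), dht.lookup("kernel"))
```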
![[Network visualization]](https://static.lwn.net/images/2011/yacy-visualization-sm.png)
YaCy lead developer Michael Christen said that the freeworld network currently partitions word entries into 16 parts (although this is configurable, and he said the freeworld network may soon scale up). The peer network is visualized as a circle, with the 16 shards evenly "spaced" around it. Two mirrors are created for each shard, and are placed in adjacent positions to the "left" and "right" of the primary location. The freeworld network claims around 1000 active peers and about 1500 passive peers (i.e., those not contributing to the public index); together they are currently indexing just under 888 million pages. You can see a live visualization of the network on the YaCy home page, and the YaCy client allows you to explore it more thoroughly (due to the DHT's circular design, the live visualization resembles a borderline-creepy pulsating human eye; it is still not clear to me whether or not this is intentional...).
Search me...
Because a search involves several rounds of requests to multiple peers (i.e., getting the matching URL/peer hashes from the DHT, then querying peers for the URL records), the lag time for a YaCy search is potentially much higher than it is for sending a query to a single search engine datacenter. However, because each YaCy peer also stores its own portion of the full index in a local database, the system makes use of that existing data structure to speed things up.
First, for every search, a query to the local database is started concurrently with the remote search query. Second, each search's results are cached locally, so that subsequent searches will return more hits from the local query without taxing the network. This works best when two searches performed in a row are part of a single search session — such as a search for "Linux" followed quickly by a refinement to "Linux" and "kernel." It is also presumed that a YaCy user is likely to have crawled pages that are of particular interest to them, so there is a better-than-average chance that relevant results will already be stored in the local database.
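In rough outline, the query path looks like this sketch; the function bodies are hypothetical placeholders rather than YaCy internals:

```python
from concurrent.futures import ThreadPoolExecutor

local_index = {}  # word -> set of URLs known from prior crawls and caches

def search_local(word):
    # the local database can answer immediately
    return local_index.get(word, set())

def search_remote(word):
    # placeholder for the multi-peer DHT lookup described earlier
    return {"https://example.org/" + word}  # hypothetical result

def search(word):
    with ThreadPoolExecutor() as pool:
        # the local and remote queries are started concurrently
        local = pool.submit(search_local, word)
        remote = pool.submit(search_remote, word)
        results = local.result() | remote.result()
    # cache remote hits so that a refined follow-up query
    # can be answered with less help from the network
    local_index.setdefault(word, set()).update(results)
    return results

print(search("linux"))   # consults both the local index and the peers
print(search("linux"))   # now served largely from the local cache
```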
The search process in YaCy is complex not only because of the distributed storage system, but because a central server does not assemble and sort the results. Instead, the local YaCy web application must do so. It collects results from the local database (including cached results from previous queries) and from remote peers and sorts them together in a "Reverse Word Index Queue." The ranking algorithm used is similar to the PageRank used by Google, though Christen describes it as simpler. As you continue to use YaCy, however, it observes your actions and uses those to refine future search results.
YaCy next fetches the contents of each matching page. Christen admits that this is time-consuming, but it is done to prevent spamming and gaming the system — by fetching the page, the local web application can verify that the search terms actually appear in the page. The results are loaded as HTTP chunks so that they appear on screen as fast as possible.
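The verification step is simple to picture (a minimal sketch; YaCy parses the fetched page far more thoroughly than this substring test):

```python
import urllib.request

def verify_hit(url, terms):
    # fetch the candidate page and confirm that the search terms really
    # appear in it, so spammed or stale index entries can be dropped
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            page = resp.read().decode("utf-8", errors="replace").lower()
    except OSError:
        return False  # an unreachable page cannot be verified
    return all(term.lower() in page for term in terms)

print(verify_hit("https://lwn.net/", ["linux", "kernel"]))
```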
![[Search results]](https://static.lwn.net/images/2011/yacy-searchresults-sm.png)
From the user's standpoint, the YaCy search interface is a good deal more complicated than the minimalist design sported by Google and its big-name competitors. The search box is clean, but the results page displays more detail for every entry than a production search engine might: there is a link to the metadata database entry for each hit, another link to details on the indexing and parsing history of the entry, and floating buttons to promote, demote, and bookmark each link. There is also a "filetype navigator," "domain navigator," and "author navigator" for the results page as a whole, along with a tag cloud.
Harvest season
As interesting as the query and replication design may be, the entire system would be useless without a sufficiently large set of indexed pages. As mentioned earlier, index creation is a task left up to the peers as well. The YaCy client software includes several tools for producing and managing the local peer's portion of the global index.
The current release includes seven methods for adding to the index: a site crawler that limits itself to a single domain, an "expert" crawler that will follow links to any depth requested by the user, a network scanner that looks for locally-accessible servers, an RSS importer that indexes individual entries in a feed, an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) importer, and two specialized crawlers: one for MediaWiki sites, and one for phpBB3 forums.
YaCy can also be configured to index all URLs visited by the local browser. In this mode, several safeguards protect against indexing personal and protected pages: it skips pages fetched using POST or GET parameters, pages requiring HTTP password authorization, and so on. You can also investigate the local database, pulling up records by word entry or hash (if you happen to know the hash of the word), and edit or delete the stored metadata. You can also blacklist problematic sites with regular expression matching; there is a shared YaCy Blacklist Engine available to all clients, although there is no documentation on its contents.
However you add to the global index, the actions you take on the search results also contribute to the search experience: you can promote or demote any entry using the +/- buttons on the results page. There is also an advanced-configuration tool with which you can tweak the weight of about two dozen separate factors in how your search results are sorted locally, including word frequency, placement on the page, appearance in named anchors, and so on. These customizations are local, though; they do not affect which pages the peers send in response to your queries.
Options and the future
The description above outlines the default, "basic" mode of YaCy usage. The administrative interface allows you to configure the YaCy client in several other modes, including intranet indexing and serving as a single-site search portal. You can also set up user accounts for use with a password-protected search engine (as might be the case for a company intranet), load "suggestion dictionaries," tweak the filetypes that YaCy will index, and adjust the memory usage and process priority.
Another wrinkle in YaCy administration is that the peer can be configured to also load search results culled from the Scroogle and Blekko search engines. Scroogle is an engine that scrapes Google, but removes user-tracking data from the results, while Blekko is a search engine dedicated to publishing its search algorithms and optimizations for public consumption.
Both speak to the need to "bootstrap" YaCy's global index. A glance at reader comments to other YaCy stories (such as LWN's announcement and Slashdot's coverage) indicates that many people have tried YaCy and found the ordering of the results to be lacking. The topic comes up repeatedly on the YaCy discussion boards, although Christen noted in an FSCONS 2010 talk that YaCy already has more pages in its index than Google did at the time that it launched.
Nevertheless, the YaCy team has recently been promoting a new idea to boost the size of the index: interoperability with the Apache Solr search platform.
From a practical standpoint, this is probably a good move. YaCy alone is not yet indexing enough of the web to be competitive with commercial search engines. Some modest tests of my own roughly match the experiences of the LWN and Slashdot commenters: YaCy can find big and obvious pages for popular topics, but the real meat of web search is the ability to find the difficult-to-discover content. From one point of view, YaCy is like any other crowd-sourced data initiative: the more people who participate, the better it gets. However, it is drastically different from Wikipedia or OpenStreetMap in one key regard: the partial coverage available during the ramp-up phase of the project makes the system unusable for real work. You can map your own home town and the map will be useful to you on a daily basis — but if you index your own web site, that does not help you find most of your search targets. Better interoperability with Solr and other open source search engines could help, as would a concerted effort to index important un-covered areas of the web (a replacement for Google Code Search comes to mind).
Still, the developers are quick to admit that YaCy is not a production service. At this point, the team is concerned with tackling the tricky problem of distributing indexing, searching, and page ranking over a peer-to-peer network. That is an original problem, even if the current state of the index is not a major challenge to Google.
2011 Linux and free software timeline - Q1
Here is LWN's fourteenth annual timeline of significant events in the Linux and free software world for the year.
In many ways, 2011 is just like all the previous years we have covered—only the details have changed. Releases of new software and distributions continue at their normal ferocious rate, and Linux adoption (though perhaps not on the desktop) continues unabated. That said, the usual threats to our communities keep rearing their heads; in particular, patent attacks against free software continue to increase. But, overall, it was a great year for Linux and free software, just as we expect 2012 (and beyond) to be.
We will be breaking the timeline up into quarters, and this is our report on January-March 2011. Over the next month, we will be putting out timelines of the other three quarters of the year.
This is version 0.8 of the 2011 timeline. There are almost certainly some errors or omissions; if you find any, please send them to timeline@lwn.net.
LWN subscribers have paid for the development of this timeline, along with previous timelines and the weekly editions. If you like what you see here, or elsewhere on the site, please consider subscribing to LWN.
For those with a nostalgic bent, our timeline index page has links to the previous 13 timelines and some other retrospective articles going all the way back to 1998.
January
Linux 2.6.37 is released (announcement, KernelNewbies summary, Who wrote 2.6.37).
No more H.264 video codec support for the Chrome/Chromium browser as
Google focuses on WebM support (announcement,
update).
The Hudson continuous integration server project forks due to fallout from Oracle's acquisition of Sun. The new project is called Jenkins (announcement).
![[LibreOffice logo]](https://static.lwn.net/images/tl2011/libreoffice-logo.png)
LibreOffice makes its first stable release, 3.3 (announcement, LWN coverage).
OpenOffice.org also makes a 3.3 release (new features, release notes).
The FFmpeg project has a leadership coup that eventually resolves into a fork in March, producing the Libav project (LWN blurb).
Amarok 2.4 is released (announcement).
Mark Shuttleworth announces plans to include Qt and Qt-based applications on the default Ubuntu install (blog post).
Xfce 4.8 is released (announcement, LWN preview).
linux.conf.au is held in Brisbane, Australia despite the efforts of
Mother Nature to inundate it. Organizers were quick to move to a new venue after catastrophic
flooding, and
the conference came off without a hitch. (LWN coverage: Re-engineering the internet, IP
address exhaustion, Server power management,
The controversial Mark Pesce keynote, 30 years of sendmail, Rationalizing the wacom driver,
and a Wrap-up).
KDE Software Compilation 4.6 is released (announcement).
Bufferbloat.net launches as a site to work on solving networking performance problems caused by bufferbloat. (LWN blurb, web site).
February
The last IPv4 address blocks are allocated by the Internet Assigned Numbers Authority (IANA) to the Asia-Pacific Network Information Center (APNIC), which would (seemingly) make the IPv6 transition even more urgent (announcement).
FOSDEM is held February 5-6 in Brussels, Belgium (LWN coverage: Freedom Box, Distribution collaboration, and Configuration management).
Eben Moglen announces the FreedomBox Foundation as part of his
FOSDEM talk. A fundraising campaign on Kickstarter garners well over the
$60,000 goal. (LWN article).
Debian 6.0 ("Squeeze") is released (announcement, LWN pre-review).
The Ada Initiative launches to promote women in open technology and culture (announcement, LWN coverage).
Nokia drops MeeGo in favor of Windows Phone 7 (LWN blurb, Reuters
report).
GNU Guile 2.0.0 released. Guile is an implementation of the Lisp-like Scheme language (announcement).
The MPEG Licensing Authority (MPEG-LA) calls for patents essential to VP8, as it is looking to form a patent pool to potentially shake down implementers of the video codec used by WebM (announcement).
A Linux-based supercomputer is a contestant on Jeopardy. IBM's "Watson" trounces two former champions (New York Times article).
![[Python logo]](https://static.lwn.net/images/tl2011/python-logo.png)
Python 3.2 released (announcement).
FreeBSD 8.2 released (announcement, release notes).
Southern California Linux Expo (SCALE) 9x is held in Los Angeles, February 25-27 (LWN coverage: Unity, Hackerspaces, Distribution unfriendly projects, and Phoronix launches OpenBenchmarking).
Canonical unilaterally switches the Banshee default music store to Ubuntu One (original blog post, update, and Mark Shuttleworth's view).
Red Hat stops shipping broken-out kernel patches for RHEL 6, which causes an uproar in the community and charges of GPL violations. The change actually happened earlier, but came to light in February. (LWN coverage: Enterprise distributions and free software and Red Hat and the GPL; Red Hat statement).
March
The vendor-sec mailing list and its host are compromised (announcement, LWN coverage).
![[Scientific Linux logo]](https://static.lwn.net/images/tl2011/sl-logo-64.png)
Scientific Linux 6.0 is released (announcement).
The Yocto project and OpenEmbedded "align" both in terms of governance and technology, which should result in less fragmentation in the building of embedded Linux systems (announcement).
Linux 2.6.38 is released (announcement, KernelNewbies summary, and
Who wrote 2.6.38).
openSUSE 11.4 is released (announcement, LWN review).
Linus Torvalds starts loudly complaining about the ARM kernel tree, which leads to a large effort to clean it all up (linux-kernel post, LWN article).
Fraudulent SSL certificates issued by UserTrust (part of Comodo) are found in the wild (LWN blurb, article and follow-up).
Android's Bionic C library comes under fire for alleged GPL violations, though it appears to be a concerted fear, uncertainty, and doubt (FUD) campaign (LWN article).
Microsoft sues Barnes & Noble over alleged patent infringement in the Android-based Nook ebook reader (LWN blurb and article).
Firefox 4 is released, marking the beginning of Mozilla's new quarterly release schedule (announcement).
Google chooses not to release its tablet-oriented Android 3.0
("Honeycomb") source
code, because it isn't ready for both tablets and handsets (LWN article).
The Monotone distributed version control system releases its 1.0 version (announcement).
GCC 4.6.0 is released (LWN blurb, release notes).
Security
Printer vulnerabilities via firmware update
Regular readers of this page will not find it surprising to hear about attacks against hardware, typically through the firmware installed on it. The recent report about a vulnerability in HP laser printers falls into that category, but there are some twists. The researchers at Columbia University certainly picked an attention-getting example when they were able to alter the printer firmware and nearly set the paper being printed on fire, but HP's reaction to the flaw, at least so far, is eye-opening as well.
The flaw is a simple one, evidently. Print jobs sent to the printers are
scanned to see if they contain a firmware update; if so, the update is
installed. Crucially, the update is not checked for any kind of digital
signature, nor is user input requested before performing the update. In
the msnbc report, HP's
Keith Moore, chief technologist for the printer division, said that
printers since 2009 have required signed updates, but the Columbia
researchers "say they purchased one of the printers they hacked in
September at a major New York City office supply
store
". Regardless, there are certainly millions of pre-2009 HP
laser printers in service that are presumably vulnerable.
The researchers were able to rewrite the firmware so that it "would
continuously heat up the printer's fuser — which is designed to dry
the ink
once it's applied to paper — eventually causing the paper to turn brown
and smoke
". Before the paper could catch fire, though, a "thermal
breaker" shut down the printer—seemingly permanently. In a press
release, HP said that the breaker is designed to thwart just that kind
of problem. The company also said that the breaker "cannot be
overcome by a firmware change or this proposed
vulnerability
". That's certainly a nice safety feature, but disabled
printers definitely make for a painful denial-of-service attack.
There are several other interesting parts of the rather defensively worded press release. According to HP, no customers have reported suffering from these firmware-rewrite attacks, but it's unclear how those customers would know. Obviously, if their printers were emitting brown, smoking paper, there would be little question, but the researchers demonstrated other kinds of attacks that would be more difficult to detect:
As might be guessed, HP tries to minimize the extent of the problem, but it's not yet clear that the company completely understands the ramifications. From the press release:
Given the attack vector (submitted print jobs), it's a bit hard to believe that only Linux or Mac systems can trigger the problem. While that may be the case, it seems much more likely that there are ways to coerce Windows into submitting jobs with firmware upgrades as well. How else would customers running Windows do a firmware update? Even if Windows is somehow prevented from sending a corrupted print job, it's pretty uncommon today to find a corporate network with no Mac or Linux machines on it.
It's also rather disingenuous to suggest that printers behind firewalls (on networks with no malicious users) are somehow immune. Again, that could be the case, but it is far more likely that malware of various sorts could cause jobs to be sent to printers. A firewall doesn't necessarily prevent web or email-based attacks, for example, and anti-virus software is unlikely to be looking for malware exploiting printer vulnerabilities.
It doesn't take much imagination to come up with other attacks beyond those demonstrated. Printers could be used as part of a botnet, as bridgeheads to launch further attacks on a corporate network, and so on. Like many devices, printers are fairly capable general-purpose computers under the covers, even if they tend to have fewer resources (e.g. CPU horsepower, RAM) than desktops or servers.
HP has said that it will put out a firmware update to fix the problem, but it will be a challenge to get those patches installed on all of the affected devices. And, as pointed out in the msnbc report, any printers that are already infected—if attackers have previously discovered the hole—may well reject any further attempts to upgrade them. In addition, while the researchers found the problem in LaserJets, there is no reason to believe that other printers—or other networked devices, from HP and others—don't suffer from similar flaws. In many ways, embedded device security is in its infancy.
It is a difficult balancing act, however. If recent HP printers will only accept firmware updates that are signed using HP's keys, that solves the problem of this kind of attack, but leaves a different problem in its wake: lockdown by a manufacturer. As we have seen with TiVo, PlayStation 3, locked-down mobile phones, and other devices, manufacturers may be able to add anti-features, disable previously working features, and generally interfere with the owner's wishes when only they hold the keys to a device.
It is, in some ways, similar to the UEFI secure boot issues that have been in the news recently. In both cases, customers that want to actually own their devices are going to need a way to store their own key and have it be trusted by the device. That may be overkill for printers or other devices, so manufacturers could just require some manual, user-present action (e.g. press the OK button) to do a firmware upgrade. Doing it that way may be painful for corporate IT departments that need to upgrade hundreds of printers at once, but the alternative, ceding all upgradability only to the manufacturer, has some major downsides as well.
Brief items
Security quotes of the week
This kind of thing only serves to ratchet up fear, and doesn't make us any safer.
New vulnerabilities
apt: repository credential disclosure
Package(s): apt | CVE #(s): CVE-2011-3634
Created: November 28, 2011 | Updated: November 30, 2011
Description: From the Ubuntu advisory: It was discovered that APT incorrectly handled the Verify-Host configuration option. If a remote attacker were able to perform a man-in-the-middle attack, this flaw could potentially be used to steal repository credentials. This issue only affected Ubuntu 10.04 LTS and 10.10.
glibc: multiple vulnerabilities
Package(s): glibc | CVE #(s): CVE-2011-1089 CVE-2011-1659
Created: November 28, 2011 | Updated: December 7, 2011
Description: From the Mandriva advisory: The addmntent function in the GNU C Library (aka glibc or libc6) 2.13 and earlier does not report an error status for failed attempts to write to the /etc/mtab file, which makes it easier for local users to trigger corruption of this file, as demonstrated by writes from a process with a small RLIMIT_FSIZE value, a different vulnerability than CVE-2010-0296 (CVE-2011-1089). Integer overflow in posix/fnmatch.c in the GNU C Library (aka glibc or libc6) 2.13 and earlier allows context-dependent attackers to cause a denial of service (application crash) via a long UTF8 string that is used in an fnmatch call with a crafted pattern argument, a different vulnerability than CVE-2011-1071 (CVE-2011-1659).
hardlink: multiple vulnerabilities
Package(s): hardlink | CVE #(s): CVE-2011-3630 CVE-2011-3631 CVE-2011-3632
Created: November 24, 2011 | Updated: August 20, 2012
Description: From the Fedora advisory: CVE-2011-3630 hardlink: Multiple stack-based buffer overflows when run on a tree with deeply nested directories. CVE-2011-3631 hardlink: Multiple integer overflows when adding string lengths. CVE-2011-3632 hardlink: Prone to symlink attacks.
kernel: denial of service
Package(s): kernel | CVE #(s): CVE-2011-4110
Created: November 25, 2011 | Updated: December 27, 2011
Description: From the Red Hat bugzilla: A flaw was found in the way Linux kernel handled user-defined key types. An unprivileged local user could use this flaw to crash the system.
kernel: multiple vulnerabilities
Package(s): kernel | CVE #(s): CVE-2011-4326 CVE-2011-3593 CVE-2011-3359
Created: November 28, 2011 | Updated: November 30, 2011
Description: From the Oracle advisory: A flaw was found in the way the Linux kernel handled fragmented IPv6 UDP datagrams over the bridge with UDP Fragmentation Offload (UFO) functionality on. A remote attacker could use this flaw to cause a denial of service. (CVE-2011-4326, Important) A flaw was found in the way the Linux kernel handled VLAN 0 frames with the priority tag set. When using certain network drivers, an attacker on the local network could use this flaw to cause a denial of service. (CVE-2011-3593, Moderate) Allocate receive buffers big enough for max frame len + offset (Maxim Uvarov). (CVE-2011-3359)
kernel: denial of service
Package(s): kernel | CVE #(s): CVE-2011-2203
Created: November 29, 2011 | Updated: November 30, 2011
Description: From the Red Hat advisory: A NULL pointer dereference flaw was found in the Linux kernel's HFS file system implementation. A local attacker could use this flaw to cause a denial of service by mounting a disk that contains a specially-crafted HFS file system with a corrupted MDB extent record.
net6: multiple vulnerabilities
Package(s): net6 | CVE #(s): CVE-2011-4093 CVE-2011-4091
Created: November 25, 2011 | Updated: January 5, 2012
Description: From the Red Hat bugzilla: Vasiliy Kulikov reported that libnet6 did not check the basic_server::id_counter for integer overflows. This number is used to distinguish different users, so an attacker that was able to open UINT_MAX successive connections could get an identifier of an already existing connection, allowing them to hijack that user's connection. (CVE-2011-4093) Vasiliy Kulikov reported that libnet6 would check for user color collisions prior to authentication. This could allow for the disclosure of certain user information by users that were not authenticated. (CVE-2011-4091)
rest, libsocialweb: multiple vulnerabilities
Package(s): rest, libsocialweb | CVE #(s): CVE-2011-4129
Created: November 25, 2011 | Updated: November 23, 2012
Description: A connection to Twitter's servers is established by default, whether you want it or not. See the Red Hat bugzilla for details.
ReviewBoard: cross-site scripting
Package(s): ReviewBoard | CVE #(s): CVE-2011-4312
Created: November 29, 2011 | Updated: November 30, 2011
Description: From the Red Hat bugzilla: A cross-site scripting (XSS) flaw was found in the way the commenting system of ReviewBoard, a web-based code review tool, sanitized user input (new comments to be loaded). A remote attacker could provide a specially-crafted URL which, once visited by a valid ReviewBoard user, could lead to arbitrary HTML or web script execution in the 'diff viewer' or 'screenshot pages' components.
update-manager: multiple vulnerabilities
Package(s): update-manager | CVE #(s): CVE-2011-3152 CVE-2011-3154
Created: November 28, 2011 | Updated: February 16, 2012
Description: From the Ubuntu advisory: David Black discovered that Update Manager incorrectly extracted the downloaded upgrade tarball before verifying its GPG signature. If a remote attacker were able to perform a man-in-the-middle attack, this flaw could potentially be used to replace arbitrary files. (CVE-2011-3152) David Black discovered that Update Manager created a temporary directory in an insecure fashion. A local attacker could possibly use this flaw to read the XAUTHORITY file of the user performing the upgrade. (CVE-2011-3154)
Page editor: Jake Edge
Kernel development
Brief items
Kernel release status
The current development kernel is 3.2-rc3, released on November 23. "Anyway, whether you will be stuffing yourself with turkey tomorrow or not, there's a new -rc out. I'd love to say that things have been calming down, and that the number of commits just keep shrinking, but I'd be lying. -rc3 is actually bigger than -rc2, mainly due to a network update (none in -rc2) and with Greg doing his normal usb/driver-core/tty/staging thing." Linus appears to be back on the Wednesday release schedule, so expect -rc4 sometime shortly after this page is published.
Stable updates: the 2.6.32.49, 3.0.11, and 3.1.3 stable kernel updates were released on November 28; they contained a long list of fixes and (for 3.x) one bit of USB driver breakage. The 3.0.12 and 3.1.4 updates came out shortly thereafter with one patch to fix that problem.
Quotes of the week
DM-Steg
DM-Steg is a kernel module that adds steganographic encryption to the device mapper. "Steganographic" means that the encrypted data is hidden to the point that its very existence can be denied. "Steg works with substrates (devices containing ciphertext) to export plaintext-containing block devices, known as aspects, to the user. Without having the key(s), there is no way of determining how many aspects a substrate contains, or if it contains any aspects at all." The initial release of this module has just been announced: "The code has only ever been tested on my PC, but it works very nicely for me and has stopped eating my data, so I figure it's ready for public consumption!" See this document [PDF] for details.
Kernel development news
Routing Open vSwitch into the mainline
Visitors to the features page on the Open vSwitch web site may be forgiven if they do not immediately come away with a good understanding of what this package does. The feature list is full of enlightening bullet points like "LACP (IEEE 802.1AX-2008)", "802.1ag link monitoring", and "Multi-table forwarding pipeline with flow-caching engine". Behind the acronyms, Open vSwitch is a virtual switch that has already seen a lot of use in the Xen community and which is applicable to most other virtualization schemes as well. After some years as an out-of-tree project, Open vSwitch has recently made a push for inclusion into the mainline kernel.

Open vSwitch is a network switch; at its lowest level, it is concerned with routing packets between interfaces. It is aimed at virtualization users, so, naturally, it is used in the creation of virtual networks. A switch can be set up with a number of virtual network interfaces, most of which are used by virtual machines to communicate with each other and the wider world. These virtual networks can be connected across hosts and across physical networks. One of the key features of Open vSwitch appears to be the ability to easily migrate virtual machines between physical hosts and have their network configuration (addresses, firewall rules, open connections, etc.) seamlessly follow.
Needless to say, there is no shortage of features beyond making it easier to move guests around. Open vSwitch offers a long list of options for access control, quality-of-service control, network bridging, traffic monitoring, and more. The OpenFlow protocol is supported, allowing the integration of interesting protocols and controllers into the network. Open vSwitch has been shipped as part of a number of products and it shows; it has the look of a polished, finished offering.
Most of Open vSwitch is implemented in user space, but there is one kernel module that makes the whole thing work; that module was submitted for review in mid-November. Open vSwitch tries to make use of existing networking features to the greatest extent possible; the kernel module mostly implements a control interface allowing the user-space code to make routing decisions. Routing packets through user space would slow things down considerably, so the interface is set up to avoid the user-space round trip whenever possible.
When the Open vSwitch module receives a packet on one of its interfaces, it generates a "flow key" describing the packet in general terms. An example key from the submission is:
in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4), eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=17, tos=0, frag=no), tcp(src=49163, dst=80)
Most of the fields should be fairly self-explanatory; this key describes a packet that arrived on port (interface) 1, aimed at TCP port 80 on host 172.18.0.52. If Open vSwitch does not know how to process the packet, it will pass it to the user-space daemon, along with the generated flow key. The daemon can then decide what should be done; it will also, normally, pass a rule back to the kernel describing how to handle related packets in the future. These rules start with the flow key, which may be generalized somewhat, and include a set of associated actions. Possible actions include:
- Output the packet to a specific port, forwarding it on its way to its final destination.
- Send the packet to user space for further consideration. The destination process may or may not be the main Open vSwitch control daemon.
- Make changes to the packet header on its way through; network address translation could be implemented this way, for example.
- Add an 802.1Q virtual LAN header in preparation for tunneling the packet to another host; there is also an action for stripping such headers at the receiving end.
- Record attributes of the packet for statistics generation.
Once a rule for a given type of packet has been installed into the kernel, future packets can be routed quickly without the need for further user-space intervention. If the switch is working properly, most packets should never need to go through the control daemon.
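In outline, the fast path behaves like the following sketch. The key fields echo the example flow key above, while the action encoding and the daemon's decision logic are simplified stand-ins for the real module and its control daemon:

```python
flow_table = {}  # flow key -> list of actions, as in the kernel module

def userspace_daemon(flow_key, packet):
    # the control daemon decides what to do with an unknown flow...
    actions = [("output", 2)]  # e.g. forward out port 2 (illustrative)
    # ...and installs a rule so later packets skip the round trip
    flow_table[flow_key] = actions
    return actions

def receive_packet(packet):
    flow_key = (packet["in_port"], packet["eth_type"],
                packet["ipv4_dst"], packet["tcp_dst"])
    actions = flow_table.get(flow_key)
    if actions is None:
        # flow-table miss: punt the packet (and its key) to user space
        actions = userspace_daemon(flow_key, packet)
    for action, arg in actions:
        if action == "output":
            print("forwarding to port", arg)

pkt = {"in_port": 1, "eth_type": 0x0800,
       "ipv4_dst": "172.18.0.52", "tcp_dst": 80}
receive_packet(pkt)  # miss: consults the daemon and installs a rule
receive_packet(pkt)  # hit: handled entirely on the fast path
```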
Open vSwitch, by all appearances, is a useful and powerful mechanism; the networking developers seem to agree that it would be a good addition to the kernel. There is, however, some disagreement over the implementation. In particular, the patch adds a new packet classification and control mechanism, but the kernel already has a traffic control system of its own; duplicating that infrastructure is not a popular idea. As Jamal Hadi Salim put it:
Jamal suggested that Open vSwitch could add a special-purpose classifier for its own needs, but that classifier should fit into the existing traffic control subsystem.
That said, there seems to be some awareness within the networking community
that the kernel's traffic controller may not quite be up to the task. Eric
Dumazet noted that its scalability is not
what it could be and that the code reflects its age; he said: "Maybe
its time to redesign a new model, based on modern techniques.
"
Others seemed to agree with this assessment. The traffic controller, it
appears, is in need of serious improvements or replacement regardless of
what happens with Open vSwitch.
The fact that the traffic controller is not everything Open vSwitch needs will not normally be considered an adequate justification for duplicating its infrastructure, though. The obvious options available to the Open vSwitch developers will be to (1) improve the traffic controller to the point that it does work, or (2) position the Open vSwitch controller as a plausible long-term replacement. Neither task is likely to be easy. The outcome of this discussion may well be that developers who were hoping to merge their existing code will find themselves tasked with a fair amount of infrastructural work.
That can be the point where those developers take option (3): go away and continue to maintain their code out of tree. Requiring extra work from contributors can cause them to simply give up. But if the networking maintainers accept duplicated subsystems, the likely outcome is a lot of wasted work and multiple implementations of the same functionality, none of which is as good as it should be. There are solid reasons behind the maintainers' tendency to push back against that kind of contribution; without that pushback, the long-term maintainability of the kernel will suffer.
How things will be resolved in the case of Open vSwitch remains to be seen; the discussion is ongoing as of this writing. Open vSwitch is a healthy and active project; it may well have the resources and the desire to perform the necessary work to get into the mainline and ease its own long-term maintenance burden. Meanwhile, as was discussed at the 2011 Kernel Summit, code that is being shipped and used has value; sometimes it is best to get it into the mainline and focus on improving it afterward. Some developers (such as Herbert Xu) seem to think that may be the best approach to take in this case. So Open vSwitch may yet find its way into the mainline in the near future with the idea that its internals can be fixed up thereafter.
Hardware face detection
Once upon a time, a "system on chip" (SOC) was a package containing a processor and some number of I/O controllers. While SOCs still have all that, manufacturers have been busy adding hardware support for all kinds of interesting functionality. For example, OMAP4 processors have an onboard face detection module that can be used for camera focus control, "face unlock" features, and more. Naturally, there is interest in making use of such features in Linux; a recent driver submission shows that the question of just how to do that has not yet been answered, though.
The OMAP4 face detection driver was submitted by Tom Leiming, but was apparently written by Ming Lei. Upon initialization, the driver allocates a memory area which is made available to an application via mmap(). The application places an image in that area (it seems that a 320x240 grayscale PGM image is the only supported option), then uses a number of ioctl() operations to specify the area of interest and to start and stop the image recognition process. A read() on the device will, once detection is complete, yield a number of structures describing the locations of the faces in the image as rectangles.
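From user space, the sequence would look roughly like the sketch below. Note that the device node, ioctl numbers, and rectangle layout here are invented for illustration; they are not the submitted driver's actual ABI:

```python
import fcntl, mmap, os, struct

IMG_W, IMG_H = 320, 240        # the only supported input size
FD_IOC_START = 0x4600          # hypothetical ioctl numbers
FD_IOC_STOP = 0x4601

fd = os.open("/dev/omap4-fd", os.O_RDWR)   # hypothetical device node
buf = mmap.mmap(fd, IMG_W * IMG_H)         # the driver-allocated image area

with open("input.pgm", "rb") as f:         # a 320x240 grayscale PGM
    buf[:] = f.read()[-IMG_W * IMG_H:]     # crude: copy just the raw pixels

fcntl.ioctl(fd, FD_IOC_START)              # kick off detection
fcntl.ioctl(fd, FD_IOC_STOP)

# once detection completes, read() yields one rectangle per face;
# assume four 32-bit values (x, y, width, height) per structure
data = os.read(fd, 4096)
for i in range(0, len(data), 16):
    x, y, w, h = struct.unpack("IIII", data[i:i + 16])
    print("face at (%d,%d), size %dx%d" % (x, y, w, h))
```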
Face detection functionality is clearly welcome, but this particular driver has a lot of problems and will not get into the mainline in anything resembling its current state. The most significant criticism, though, came from Alan Cox, who asked that, rather than being implemented as a standalone device, face detection be integrated into the Video4Linux2 framework.
In truth, V4L2 is probably the right place for this feature. Face detection is generally meant to be used with the camera controller integrated into the same SOC and the face detection hardware may be tightly tied to that controller. The media controller subsystem was designed for just this kind of functionality; it provides a mechanism by which camera data may (or may not) be routed to the face detection module as needed. Integration into V4L2 would bring the face detection module under the same umbrella as the rest of the video processing hardware and export the necessary data routing capabilities to user space.
The design of the user-space interface for this functionality seems likely to pose challenges of its own, though. The OMAP4 hardware is relatively simple in its operation; it appears to lack the ability to work with multiple image formats, moderately high-resolution images, or color data. Future hardware will certainly not be so limited. It is also not hard to imagine a shift from detection of any face to recognition of specific faces - or, at least, the generation of metrics to ease the association of faces and the identities of their owners. The hardware could become capable of blink detection, distinguishing real faces from pictures of faces, or determining when a face belongs to a poker player who is bluffing. Designing an API that can handle this kind of functionality is going to be an interesting task.
But it does not stop there. There is a discouragingly large market out there for devices capable of reading automobile license plates, for example. There is money in meeting the needs of the contemporary surveillance state, so manufacturers will certainly compete to provide the needed capabilities. In general, the world is filled with interesting things that are not faces; it is not hard to imagine that people will be able to do useful things with devices that can pick all kinds of high-level objects out of image data.
In general, we may be seeing a shift in what kinds of peripherals are attached to our processors. There will always be plenty of devices that serve essentially (from the CPU's point of view) as channels moving chunks of data in one direction or the other. But there will be more and more devices that offload some type of processing, and that is going to present some interesting ABI challenges. Hardware-based offload engines are nothing new, of course. But, once upon a time, offload devices mostly performed tasks otherwise handled by the operating system kernel. Integrated controllers and network protocol offload functionality are a couple of obvious examples. More recently, though, hardware has provided functionality that needs to be made available to user space. And that changes the game somewhat.
If one looks for examples of this kind of functionality, one almost certainly needs to start at the GPU found in most graphics cards. Creating a workable (and stable) user-space ABI providing access to the GPU has taken many years, and it is not clear that the job is done yet. The media controller ABI controls routing of data among the numerous interesting functional units in contemporary video processors, but writing a hardware-independent application using the media controller is hard. Creating a workable interface for the wide variety of available industrial sensors has also been a multi-year project.
Trying to anticipate where this kind of hardware will go in an attempt to create the perfect ABI from the outset seems like an exercise in futility. Most likely it will have to be done the way we've always done it: come up with something that seems reasonable, learn (the hard way) what its shortcomings are, then begin the long process of replacing it with something better. It is not an ideal way to create an operating system, but it seems to be better than the alternatives. Figuring out the best way to support face detection will just be another step in this ongoing process.
Improving ext4: bigalloc, inline data, and metadata checksums
It may be tempting to see ext4 as last year's filesystem. It is solid and reliable, but it is based on an old design; all the action is to be found in next-generation filesystems like Btrfs. But it may be a while until Btrfs earns the necessary level of confidence in the wider user community; meanwhile, ext4's growing user base has not lost its appetite for improvement. A few recently-posted patch sets show that the addition of new features to ext4 has not stopped, even as that filesystem settles in for a long period of stable deployments.
Bigalloc
In the early days of Linux, disk drives were still measured in megabytes and filesystems worked with blocks of 1KB to 4KB in size. As this article is being written, terabyte disk drives are not quite as cheap as they recently were, but the fact remains: disk drives have gotten a lot larger, as have the files stored on them. But the ext4 filesystem still deals in 4KB blocks of data. As a result, there are a lot of blocks to keep track of, the associated allocation bitmaps have grown, and the overhead of managing all those blocks is significant.
Raising the filesystem block size in the kernel is a dauntingly difficult task involving major changes to memory management, the page cache, and more. It is not something anybody expects to see happen anytime soon. But there is nothing preventing filesystem implementations from using larger blocks on disk. As of the 3.2 kernel, ext4 will be capable of doing exactly that. The "bigalloc" patch set adds the concept of "block clusters" to the filesystem; rather than allocate single blocks, a filesystem using clusters will allocate them in larger groups. Mapping between these larger blocks and the 4KB blocks seen by the core kernel is handled entirely within the filesystem.
The cluster size to use is set by the system administrator at filesystem creation time (using a development version of e2fsprogs), but it must be a power of two. A 64KB cluster size may make sense in a lot of situations; for a filesystem that holds only very large files, a 1MB cluster size might be the right choice. Needless to say, selecting a large cluster size for a filesystem dominated by small files may lead to a substantial amount of wasted space.
Clustering reduces the space overhead of the block bitmaps and other management data structures. But, as Ted Ts'o documented back in July, it can also increase performance in situations where large files are in use. Block allocation times drop significantly, but file I/O performance also improves in general as the result of reduced on-disk fragmentation. Expect this feature to attract a lot of interest once the 3.2 kernel (and e2fsprogs 1.42) make their way to users.
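The bookkeeping savings are easy to quantify with a little arithmetic; this back-of-the-envelope sketch counts only the allocation bitmaps, ignoring the rest of the on-disk metadata:

```python
def bitmap_bytes(fs_bytes, alloc_unit):
    # one bit of allocation bitmap per allocation unit
    return fs_bytes // alloc_unit // 8

TB = 1 << 40
for unit in (4 << 10, 64 << 10, 1 << 20):  # 4KB blocks; 64KB, 1MB clusters
    print("%8d-byte units: %9d bytes of bitmap per terabyte"
          % (unit, bitmap_bytes(TB, unit)))
# 4KB blocks need 32MB of bitmaps per terabyte of disk; 64KB clusters
# cut that to 2MB, and 1MB clusters to a mere 128KB
```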
Inline data
An inode is a data structure describing a single file within a filesystem. For most filesystems, there are actually two types of inode: the filesystem-independent in-kernel variety (represented by struct inode), and the filesystem-specific on-disk version. As a general rule, the kernel cannot manipulate a file in any way until it has a copy of the inode, so inodes, naturally, are the focal point for a lot of block I/O.
In the ext4 filesystem, the size of on-disk inodes can be set when a filesystem is created. The default size is 256 bytes, but the on-disk structure (struct ext4_inode) only requires about half of that space. The remaining space after the ext4_inode structure is normally used to hold extended attributes. Thus, for example, SELinux labels can be found there. On systems where extended attributes are not heavily used, the space between on-disk inode structures may simply go to waste.
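For illustration (a hypothetical invocation; -I is the standard mke2fs option for setting the on-disk inode size), larger inodes can be requested at creation time:

    mke2fs -t ext4 -I 512 /dev/sdb1

That leaves more room after each inode structure for extended attributes (and, as described next, possibly for inline file data).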
Meanwhile, space for file data is allocated in units of blocks, separately from the inode. If a file is very small (and, even on current systems, there are a lot of small files), much of the block used to hold that file will be wasted. If the filesystem is using clustering, the amount of lost space will grow even further, to the point that users may start to complain.
Tao Ma's ext4 inline data patches may change that situation. The idea is quite simple: very small files can be stored directly in the space between inodes without the need to allocate a separate data block at all. On filesystems with 256-byte on-disk inodes, the entire remaining space will be given over to the storage of small files. If the filesystem is built with larger on-disk inodes, only half of the leftover space will be used in this way, leaving space for late-arriving extended attributes that would otherwise be forced out of the inode.
Tao says that, with this patch set applied, the space required to store a kernel tree drops by about 1%, and /usr gets about 3% smaller. The savings on filesystems where clustering is enabled should be somewhat larger, but those have not yet been quantified. There are a number of details to be worked out yet - including e2fsck support and the potential cost of forcing extended attributes to be stored outside of the inode - so this feature is unlikely to be ready for inclusion before 3.4 at the earliest.
Metadata checksumming
Storage devices are not always as reliable as we would like them to be; stories of data corrupted by the hardware are not uncommon. For this reason, people who care about their data make use of technologies like RAID and/or filesystems like Btrfs which can maintain checksums of data and metadata and ensure that nothing has been mangled by the drive. The ext4 filesystem, though, lacks this capability.
Darrick Wong's checksumming patch set does not address the entire problem. Indeed, it risks reinforcing the old jest that filesystem developers don't really care about the data they store as long as the filesystem metadata is correct. This patch set seeks to achieve that latter goal by attaching checksums to the various data structures found on an ext4 filesystem - superblocks, bitmaps, inodes, directory indexes, extent trees, etc. - and verifying that the checksums match the data read from the filesystem later on. A checksum failure can cause the filesystem to fail to mount or, if it happens on a mounted filesystem, remount it read-only and issue pleas for help to the system log.
Darrick makes no mention of any plans to add checksums for data as well. In a number of ways, that would be a bigger set of changes; checksums are relatively easy to add to existing metadata structures, but an entirely new data structure would have to be added to the filesystem to hold data block checksums. The performance impact of full-data checksumming would also be higher. So, while somebody might attack that problem in the future, it does not appear to be on anybody's list at the moment.
The changes to the filesystem are significant, even for metadata-only checksums, but the bulk of the work actually went into e2fsprogs. In particular, e2fsck gains the ability to check all of those checksums and, in some cases, fix things when the checksum indicates that there is a problem. Checksumming can be enabled with mke2fs and toggled with tune2fs. All told, it is a lot of work, but it should help to improve confidence in the filesystem's structure. According to Darrick, the overhead of the checksum calculation and verification is not measurable in most situations. This feature has not drawn a lot of comments this time around, and may be close to ready for inclusion, but nobody has yet said when that might happen.
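As a hedged example (the metadata_csum feature name comes from the development patches and could change before merging), enabling checksums might look like:

    mke2fs -t ext4 -O metadata_csum /dev/sdb1       # new filesystem
    tune2fs -O metadata_csum /dev/sdb1              # existing filesystem

Toggling the feature on an existing filesystem would presumably require an e2fsck pass to compute checksums for the metadata already on disk.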
Patches and updates
Kernel trees
Architecture-specific
Core kernel code
Development tools
Device drivers
Filesystems and block I/O
Memory management
Networking
Security-related
Virtualization and containers
Miscellaneous
Page editor: Jonathan Corbet
Distributions
Pushing Python3
It has been almost exactly three years since the Python 3.0 release was announced. This release deliberately broke compatibility with version 2.x of the language, leaving a lot of old baggage behind with the idea of easing future development of both Python and programs written in Python. It is fair to say that Python3 has not yet displaced its predecessor; as can be seen on the Python3 wall of shame site, there is still a lot of Python2-only code out there and more continues to be written. Before writing off Python3 as a failure, though, it is worth looking at some of the work being done to push the transition to this version of the language, much of which is happening in the context of distributions.

The most ambitious Python3 work, arguably, is happening at Ubuntu under the guidance of longtime Python hacker Barry Warsaw. The Ubuntu developers are working to port a number of desktop applications to Python3, with the idea of pushing the necessary changes upstream. If upstream is not receptive to the changes, Ubuntu will do the port regardless:
The list of specific applications has not been posted anywhere, but the long-term goal has been made quite clear:
The other distribution that has done a lot of Python3-related work is Fedora, though, as described by Toshio Kuratomi, the emphasis is a bit different. There is no big push to port specific applications or to set a deadline for pushing Python2 out of the default install. The work, instead, is more low-level:
The results of this work can be seen on the Fedora Python3 page. The Python3 interpreter itself was added in the Fedora 13 release; since then, work has gone into increasing the number of modules available for developers wanting to work on Python3 applications. Quite a bit of the necessary module support is now in place, but there is also still a long list of modules that have either not been ported to Python3, or that have not yet been packaged for Fedora.
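Much of that porting work, in any distribution, comes down to making the same code run under both interpreters. A minimal sketch of the usual idioms (not drawn from any particular package):

    # Runs unmodified under Python 2.6+ and Python 3.x.
    from __future__ import print_function, unicode_literals

    def greet(name):
        # print() is a function in Python 3; the __future__ import
        # gives Python 2 the same semantics.
        print("Hello, {0}!".format(name))

    greet("world")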
In comparison, the Debian Python3 page seems like a desultory effort. There are some reports that Python3 on Gentoo is currently a bit painful to use; Gentoo seems to be short of developers able to work in this area. OpenSUSE packages Python3, but there does not appear to be any public information about an organized push toward a transition there. These distributions, it seems, are mostly waiting to see what happens elsewhere.
Distributors can play a major role in the adoption of major new language versions. Some of us still remember the pain caused by Red Hat's slow transition to Python2 many years ago. Nobody can accuse anybody of having acted with undue haste with regard to Python3, but it does seem that some distributors have decided that it is time to make something happen in that area. As that push gains momentum, we may be hearing more about Python3 in the next year or two.
Brief items
Distribution quote of the week
...
My plan, which still stands, is to remove dpatch in 6 years time. I'm fairly sure we can come to an agreement within that timeframe.
Linux Mint 12 released
The much-hyped Linux Mint 12 release has been announced. There is plenty of new stuff in this release; see the "what's new" page, the release notes, and this LWN article for more information. "The Linux Mint 12 desktop is a mix of old and new. It's a brand new desktop but with traditional components. The new technology in Gnome 3 is exciting but the components contributed by MGSE make users feel at home. Linux Mint 12, like previous releases, and despite the fact that it's based on Gnome 3, looks and behaves like a Mint desktop."
Mageia 2 Alpha 1
Mageia 2 Alpha 1 is available for testing. "Now it's time for everyone to test, test, test and report bugs - so that Mageia 2 will be in great shape for release in May. You will find more info on our freshly migrated wiki"
Distribution News
openSUSE
Advance discontinuation notice for openSUSE 11.3
The SUSE Security Team and the SUSE sponsored openSUSE maintenance team will stop releasing updates for openSUSE 11.3 after January 16, 2012. "As a consequence, the openSUSE 11.3 distribution directory on our server download.opensuse.org will be removed from /distribution/11.3/ to free space on our mirror sites. The 11.3 directory in the update tree /update/11.3 will follow, some time after all updates have been published." It is possible that maintenance will continue through the Evergreen project, but that is not certain.
Update: Some of the dates in the original advisory were incorrect. See this update for the correct dates.
Newsletters and articles of interest
Distribution newsletters
- DistroWatch Weekly, Issue 433 (November 28)
- Maemo Weekly News (November 28)
- openSUSE Weekly News, Issue 203 (November 26)
- Ubuntu Weekly Newsletter, Issue 243 (November 27)
People behind Debian: Stefano Zacchiroli
Raphaël Hertzog interviews Debian project leader Stefano Zacchiroli. "As a project, we seem to be more appealing to packagers than to software developers. That is a pity given the amount of exciting coding tasks that are everywhere in Debian. Part of the reason we are not appealing to developers is that we are not particularly good at collecting coding tasks in a place where interested developers could easily pick them up. It also takes quite a bit of inside knowledge to spot infrastructure bugs and understand how to fix them."
CrunchBang 10 "Statler" refresh R20111125 (Linux Journal)
Linux Journal takes a look at the latest version of CrunchBang. "In brief, it's a lightweight desktop OS that uses Debian Stable 6.0 as its base. The biggest change in the latest refresh is that the developer has jettisoned the Xfce version in order to become a pure Openbox distro. I'm a fan of Xfce, but I welcome the decision of developer Philip Newborough aka corenominal. The truth is that there are other Xfce based distros to choose from such as my personal current favorite, Xubuntu. One of the biggest challenges for a smaller Linux distro is to carve a useful niche for itself, and if this helps him to hone the Openbox experience, all the better."
Page editor: Rebecca Sobol
Development
New tools for open source font development
Open font development revolves around a set of tools that is small even by open source standards. There is the GUI font editor FontForge (which is used to edit the vector outlines of characters, but which is difficult to use for other important tasks like editing spacing, kerning, or substitution tables), and there are a handful of small, standalone utilities that focus on individual tasks (like checking glyph coverage, or comparing hinting and rendering options). As a result, it is proportionally bigger news when new projects are unveiled, and three such projects have recently appeared.
Google's sfntly toolkit
The largest project is Google's sfntly toolkit, which is an Apache-licensed library for inspecting and altering font tables, but also includes some simple command-line tools built on top. The project was announced on November 18, and according to the blog announcement, it has been in use for a while by the team behind Google's Web Fonts initiative.
Sfntly code is available in both Java and C++ implementations. It can read and parse the tables in TrueType or OpenType fonts, which share the same underlying sfnt format. Under the hood, those formats consist of a collection of lookup tables that contain things like the mapping between character codes and the glyphs of the font, the glyphs themselves, horizontal and vertical spacing for each glyph, and so on. The Java version of the toolkit is self-contained; the C++ version requires some external dependencies: a testing framework from Google, and the International Components for Unicode (ICU) library.
At their simplest, the sfntly command-line tools can inspect a font file and print out the tables to stdout. The utilities include a sfntdump program that does this for a font file and table provided as arguments. Running:
java -jar sfntdump.jar SomeFont.otf

returns a list of the tables found; you can inspect any of those tables by appending its name on the command line, as shown below. A bit more interesting is sflint, which performs some validity checks on the contents of the tables. The current version checks that the full font name begins with the font-family-name, looks for glyphs which are clipped at the top or the bottom by the bounding box, tries to spot duplicate references, and checks for consistent character widths.
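For instance, dumping only the table that maps character codes to glyphs (the cmap table) would look like:

    java -jar sfntdump.jar SomeFont.otf cmap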
Another directory of utilities contains more substantial sfntly-derived programs, including some that modify font files. The fontinfo and sfnttool programs are both inspectors — fontinfo calculates and summarizes some useful values from the font tables (such as the sizes in bytes of the glyph and auxiliary tables), while sfnttool reports glyph coverage and compatibility of the font's options with the restrictive Web Open Font Format (WOFF).
Obviously, calculating the byte-size and WOFF-compatibility of a font are tasks performed frequently by the Google Web Fonts team. So too is subsetting, which takes an input font file and extracts a new font file that preserves a specific portion of the original glyphs; that task is done using the subsetter tool. One could use this to construct a Cyrillic- or Latin-only version of a large multilingual font, for example. The conversion tool will transform a TrueType or OpenType font into WOFF, at which point subsetter can extract only those characters used by a given page for delivery over HTTP.
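A rough illustration of that workflow (the option letters here are assumptions based on the tool's usage summary and may not match this release exactly): subsetting a font to a few characters while converting it to WOFF might be done with:

    java -jar sfnttool.jar -s 'hello' -w SomeFont.ttf SomeFont.woff

with -s giving the characters to keep and -w requesting WOFF output.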
In the announcement, Google claims that the speed of the code is "really, really fast", and quotes Google Web Fonts engineer (and former Ghostscript maintainer) Raph Levien as saying: "Using sfntly we can subset a large font in a millisecond. It's faster than gzip'ing the result."
While the specific set of functions (error-checking, conversion to WOFF, subsetting) implemented in the command-line tools is important to Google Web Fonts, the rest of the library may prove just as helpful to other developers. It is generic enough that writing utilities to inspect and manipulate other font tables should be straightforward. Several of the non-sfntly standalone utilities mentioned in the introduction already permit investigating specific font tables, but they lack a common code base and uniform API.
TTFautohint
By comparison, ttfautohint is quite limited in scope. It is a utility that uses the auto-hinting subsystem of the FreeType library to pre-generate hinting tables for a TrueType font, then inserts them into the font. The result is a font that looks as good on a non-FreeType-rendered operating system (e.g., Windows) as it does on those OSes where FreeType handles the rendering (e.g. Linux and the BSDs).
"Hinting" is the process of adjusting the native vector outline of a glyph so that it looks good when rasterized — primarily by ensuring that lines and features fit onto grid boundaries so as not to appear blurry when anti-aliased on screen, but also to preserve white spaces that might be accidentally lost, to ensure that characters are uniform in height, and other details.
FreeType, the font renderer used by most Linux distributions, has included a quality autohinting engine for years, which alleviates (to one degree or another, depending on the eye of the beholder) the need for TrueType fonts to include hints of their own. PostScript fonts and PostScript-flavored OpenType fonts do not need such embedded hints, because the hinting functionality is expected to be handled by the rendering engine for those formats.
The ttfautohint tool is an effort by FreeType's Werner Lemberg. The current release is version 0.5, from November 6, and binaries for Windows and OS X are available in addition to the source code. Like FreeType, it is dual-licensed under GPLv2 and the FreeType license. It can already be used to generate hinting instructions for a font; because the instructions are resolution-dependent, ttfautohint processes fonts at a range of resolutions — by default between 8 pixels-per-em (or ppem, where an em is one standard character-width) and 1000ppem.
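Basic use is a simple input-to-output invocation (file names are placeholders):

    ttfautohint SomeFont.ttf SomeFont-hinted.ttf

Command-line options can narrow the ppem range over which hints are generated; otherwise the defaults described above apply.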
Rather than generating 993 separate sets of instructions, however, it categorizes the results, looks for commonality, and outputs a minimal set of hinting "classes" for a range of commonly-used sizes. The documentation says typical results need "three to four sets (covering the whole range), but sometimes more than ten are necessary." Similarly, ttfautohint attempts to group related sets of instructions into repeatable "high-level" operations, a concept that could be used to develop ttfautohint into an interactive hinting editor — where operations like "align this stem to the grid" would be more useful than the manual methods currently used.
Such a visual editor is part of the long-term roadmap. No such hinting editor exists in open source at present, and hinting is tedious to do by hand. TrueType hints are written in a stack-based, PostScript-like language that utilizes abbreviated opcodes for every instruction and function. Again because separate instructions may be required for different output sizes, several sets of hints are typically needed, an area where a "hinting IDE" would come in handy. So, too, would the ability to test hints without exporting, re-generating, and re-loading the font file into the OS.
Lemberg is running a Kickstarter-like fundraising campaign to fund development of the ttfautohint work, including the visual editor. So far donors have pledged about 65% of the support he estimates the entire project will need.
A simple font previewer
[Fonts previewer screenshot]
Last but not least, Sune Vuorela's fonts is a tiny Qt utility that does one task and does it well: it previews user-provided text in every active font on the system, rendered in a simple, easy-to-compare window.
Vuorela announced it on November 21, saying he had created it to assist him with finding a suitable font that incorporates some specific visual features. That is a common job for graphic designers, for example when finding the right font for a logo. There are more full-featured font managers like Fontmatrix, and type classification systems like PANOSE, but neither can do the straightforward sorting that the eye sometimes requires. Think "I need to find a font where the dot over the i is a square" — no classification scheme will tell you that. Other design tools like Scribus or the GIMP will allow you to enter the text and cycle through the available fonts, but that is time-consuming, and you cannot see them all at once.
With Vuorela's utility, you type in the text, and a sample of it rendered in each font appears below, alongside the font name. In the grand scheme of things, a feature like this would make a good addition to several other applications (perhaps with the ability to alter the background and foreground colors or sizes), but too much complexity would rob it of its charm.
Last word
Individually, each of these projects constitutes an incremental improvement to the open source font development toolbox, but it is encouraging to see them at all. Development on FontForge is slow and has been for some time, while several of the other, larger applications face an uncertain future — including Fontmatrix, whose lead developer has been busy with other tasks for close to two years.
The sfntly library has the potential to replace some of Fontmatrix's inspection functions and to unify the functionality provided by several of the standalone utilities. Whether development actually takes off is anyone's guess, but considering that sfntly is an in-house tool in use by Google (as opposed to a "20% time" project), it is probably stable. Ttfautohint will draw the most immediate attention from font designers who care about Windows rendering, but the prospect of a good, visual hinting editor is bigger in the long run — and would be a significant win for open source font development.
Brief items
Quotes of the week
Large server workloads (32+ cores) are another area that is being addressed. This has been an area where we lagged behind commercial databases, and I was starting to think that to catch up with them, we were going to need to do a lot of very ugly, platform-specific hacks. Well, that turns out to be false also. Changing the layout of the per-session shared memory storage has yielded a 40% performance improvement on 32-core tests. Lock scalability for a large number of clients was also improved by 5 times!
gcc-python-plugin 0.7
gcc-python-plugin is a plugin for the GCC compiler that enables the embedding of Python modules for static analysis and other types of checking. "The usability and signal:noise ratio is greatly improved over previous releases. It can now emit HTML reports showing the path of execution through a function that triggers a particular error." There is also a new feature allowing the creation of custom attributes that can be used to annotate a C API and check its use.
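To give a flavor of the API, here is a minimal sketch based on the project's documented callback interface; treat the attribute names as assumptions rather than a tested example:

    # show_functions.py: run inside GCC via the Python plugin.
    import gcc  # only importable inside the plugin's embedded interpreter

    def on_pass_execution(p, fn):
        # Called for every pass; fn is None for whole-program (IPA) passes.
        if fn is not None:
            print('pass %s on function %s' % (p.name, fn.decl.name))

    gcc.register_callback(gcc.PLUGIN_PASS_EXECUTION, on_pass_execution)

The script would be passed to the compiler with something like gcc -fplugin=python.so -fplugin-arg-python-script=show_functions.py file.c.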
Numexpr 2.0
Numexpr is a numerical Python package aimed at fast evaluation of array operations. It makes use of multithreading and, when possible, the Intel vector math library to parallelize operations. The biggest change appears to be a shift to the new iterator operation provided in NumPy 1.6 for significantly improved performance.
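A minimal sketch of the package in action (array names and sizes are arbitrary):

    import numpy as np
    import numexpr as ne

    a = np.random.rand(1000000)
    b = np.random.rand(1000000)
    # evaluate() compiles the expression once, then runs it over the
    # arrays in chunks and in parallel, avoiding full-size temporaries.
    result = ne.evaluate("2*a + 3*b")

openBarter 0.2.0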
OpenBarter is a PostgreSQL extension aimed at implementing barter-based markets. "Archaic barter has always been the result of a crumbling financial system. However openBarter performs automatic competition of multilateral exchanges and is nearly as liquid as a regular market using money, maximizing usefulness of resource allocation. By considering value as multidimensional, openBarter proposes a regulation mean for allocation of a large diversity of scarce ecological resources." Version 0.2.0 has been released; see this paper [PDF] for details on how it works.
Web Search By The People, For The People: YaCy 1.0
The Free Software Foundation Europe covers the release of YaCy 1.0, a peer-to-peer Free Software search engine. "The YaCy search engine runs on each user's own computer. Search terms are encrypted before they leave the user and the user's computer. Different from conventional search engines, YaCy is designed to protect users' privacy. A user's computer creates its individual search indexes and rankings, so that results better match what the user is looking for over time. YaCy also makes it easy to create a customised search portal with a few clicks."
Newsletters and articles
Development newsletters from the last week
- Caml Weekly News (November 29)
- LibreOffice development summary (November 28)
- Perl Weekly (November 28)
- PostgreSQL Weekly News (November 27)
- Tahoe-LAFS Weekly News (November 29)
Cinepaint resurrected, v1.0 released (LGW)
Libre Graphics World reports that the Cinepaint project has returned to life and put out a 1.0 release. "The official announcement doesn't mention any changes apart from fixes in the build system, and a quick comparison of the source code against Kai-Uwe's private fork reveals that none of his latest changes (if any at all) have been used in the upstream Cinepaint project. No plans for further development have been announced by Robin Rowe either. Feature-wise Cinepaint 1.0 is still the old GIMP from early 2000s which means that you get high bit depth precision, flipbook and color management on top of a rather limited toolbox and aging user interface." One has to start somewhere; perhaps this is the beginning of an interesting new development effort.
Wingo: the gnu extension language
Guile maintainer Andy Wingo re-examines its suitability as the official GNU extension language, showing off a number of useful features on the way. "Delimited continuations let a user extend their code with novel, expressive control structures that compose well with existing code. I'm going to give another extended example. I realize that there is a risk of losing you in the details, dear reader, but the things to keep in mind are: (1) most other programming languages don't let you do what I'm about to show you; and (2) this has seriously interesting practical applications." (The article is a few months old, but we had not encountered it previously).
Page editor: Jonathan Corbet
Announcements
Brief items
The Global Chokepoints Project launches
The Electronic Frontier Foundation has announced the launch of the Global Chokepoints project, intended to track and document copyright-based censorship initiatives worldwide. "Global Chokepoints will document the escalating global efforts to turn Internet intermediaries into chokepoints for online free expression. Internet intermediaries all over the world--from Internet Service Providers (ISPs) to community-driven sites like Twitter and YouTube to online payment processors--are increasingly facing demands by IP rightsholders and governments to remove, filter, or block allegedly infringing or illegal content, as well as to collect and disclose their users' personal data."
Articles of interest
FSFE: Fellowship interview with Mirko Boehm
The Fellowship of the Free Software Foundation Europe has an interview with Mirko Boehm. "Enabling users to fully understand what their computers are doing and how they operate is a very central function of Free Software. Not everybody might want to, but those who do will be able to learn their way from being a user to becoming a pro-coder. So there is an education side, and when it comes to teaching students basics about their computers or later computer science, there is a very strong argument for relying on Free Software for that. So while in my personal opinion I am lenient with people using Free Software applications but not operating systems, our demand to education systems should be that we only teach Free Software. For proprietary software, a new user is a future customer. For Free Software, a new user is a future black belt."
Akademy 2012 in Tallinn, Estonia (KDE.News)
KDE.News kicks off Akademy, which will be held June 30-July 6, 2012 in Tallinn, Estonia, with an interview with Laur Mõtus, host of Akademy 2012. "Hi, I'm a 26-year-old guy from Estonia, one of the members of Estonian LUG, intrigued by Linux since 28 April 1999, non-stop KDE user since autumn 2004. My daily work is as a network administrator with a touch of project management. Currently, my biggest contribution to FLOSS is being an internship coordinator for Free Software translations, manuals, also some administrative and programming jobs. I am a representative of the Board of ALVATAL (Estonian Open Source and Free (Libre) Software Union), a young and enthusiastic team taking care of the local organization of Akademy 2012 in Tallinn. In the past, I have co-founded and contributed to two FLOSS projects - Estobuntu, an Estonian Linux distribution, and Manpremo, a Python/Django based computer remote management system. For years, I have also done translation work for various FLOSS projects, but for the last couple of years my focus has shifted more towards getting more contributors involved with FLOSS."
Upcoming Events
Events: December 1, 2011 to January 30, 2012
The following event listing is taken from the LWN.net Calendar.
Date(s) | Event | Location
---|---|---
December 2-December 4 | Debian Hildesheim Bug Squashing Party | Hildesheim, Germany
December 2-December 4 | Open Hard- and Software Workshop | Munich, Germany
December 4-December 9 | LISA 11: 25th Large Installation System Administration Conference | Boston, MA, USA
December 4-December 7 | SciPy.in 2011 | Mumbai, India
December 27-December 30 | 28th Chaos Communication Congress | Berlin, Germany
January 12-January 13 | Open Source World Conference 2012 | Granada, Spain
January 13-January 15 | Fedora User and Developer Conference, North America | Blacksburg, VA, USA
January 16-January 20 | linux.conf.au 2012 | Ballarat, Australia
January 20-January 22 | Wikipedia & MediaWiki hackathon & workshops | San Francisco, CA, USA
January 20-January 22 | SCALE 10x - Southern California Linux Expo | Los Angeles, CA, USA
January 27-January 29 | DebianMed Meeting Southport2012 | Southport, UK
If your event does not appear here, please tell us about it.
Page editor: Rebecca Sobol