July 20, 2011
This article was contributed by Josh Berkus
Thanks to advances in storage, processing power, affordability of servers,
and above all the rapidly increasing number of software applications and
monitoring tools businesses seem to have these days, everyone seems to be
drowning in data. Many system administrators, application developers, and
junior database administrators are finding themselves doing "big data"
— ready or not — and have no familiarity with the terminology,
ideas, or tools in the field of data warehousing. Given that it's been
around for 40 years as a field, there's quite a bit to know. This article
will serve as a starting point for those wanting to learn the basics of
this field.
What is data warehousing?
Data warehousing isn't quite the same thing as "big data"; rather, it's a subset. Big data also includes systems which need to process millions of small requests per second across a large number of connections, such as Amazon's Dynamo database. Data warehousing generally refers to systems which accumulate data over long periods of time, and need to process a relatively small number of very large data requests.
Data warehousing really answers one of three different questions for organizations: archiving, data mining, or analytics. Sometimes it answers a combination of the three.
Archiving
"I want to store large amounts of data for long
periods of time and might never query any of it."
Thanks to HIPAA, Sarbanes-Oxley, the 8th Company Law Directive, and other
regulations, this is an increasingly popular (or, at least, necessary) kind of data. It can also be
called "WORN" data, for Write Once Read Never. Companies accumulate large volumes of data which they don't really want, but can't throw away and theoretically need to be able to access in some reasonable amount of time. Storage sizes can range from gigabytes to terabytes.
A good example of this kind of system is the archive for European cell phone call completion data I helped build for Comptel and Sun Microsystems. With a projected size of 75 terabytes of data per city, each database was expected to answer less than one information request per week. This allowed us to build it using a very inexpensive combination of PostgreSQL, Solaris, and filesystem compression.
If you're building an archive, your only requirements are to minimize storage cost and to make sure the archive can keep up with the generation of data. Generally this means compression, and a database which works acceptably on large cheap hard drives or even tape libraries. Query speed and features are not a concern. Interestingly, this is the one type of database for which there are no well-known packaged open source solutions.
Data mining
"I'm accumulating gigabytes of data a day and I know
there's valuable information in there, but I don't know what it is."
This is probably the most common type of data warehouse; most businesses and web sites generate large amounts of data as a side effect of operation. However, most don't have any clear idea how to utilize this data; they just know that they want to utilize it somehow. More importantly, the structure and meaning of a lot of the data may not be completely known; the data may be full of documents, unknown fields, and undefined categories. Such data is often referred to as "semi-structured" data. Data sizes are generally in the terabytes to petabytes.
Web clickstream analytics is probably the classic example of this kind of
application. Generally data comes in as a mixture of structured data and
message text from the web server logs and cookies. Companies accumulate this data because they can, and then gradually build up sets of queries and reports to try to get useful trending data out of the database.
Desired qualities in a data mining solution are the ability to perform CPU and I/O intensive searches, sorts, and computations as rapidly and efficiently as possible. Parallelism, either over multiple processors or multiple servers, is highly desirable. As a secondary concern, data mining databases often have to accept data at very high rates, as much as gigabytes per minute.
Analytics
"I have large volumes of highly structured data which I want to use to produce visualizations in order to support business decisions."
Businesses also often generate data they understand very well: sales records, customer accounts, and survey data. They want to use this data to generate charts, graphs, and other pretty pictures which can be used strategically. Quite a few different terms are used for this type of data system, including analytics, business intelligence (BI), decision support (DSS), and online analytical processing (OLAP).
I've implemented a couple of these types of systems using the data from a
Point Of Sale (POS) system for retail chains. POS data consists pretty
much entirely of numeric values, inventory IDs, and category trees, so it creates nice roll-ups by category, time, and geography. Mining is not required because the data is already very well understood, and changes to the categorization scheme are infrequent.
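To make "roll-up" concrete, here is a minimal sketch in plain Python. The sales records, the field names, and the roll_up() helper are all invented for illustration; a real analytics system would do this inside the database or the middleware rather than in application code.

```python
from collections import defaultdict

# Hypothetical point-of-sale records: (region, category, month, amount).
sales = [
    ("west", "shoes", "2011-06", 120.0),
    ("west", "hats", "2011-06", 30.0),
    ("east", "shoes", "2011-06", 80.0),
    ("west", "shoes", "2011-07", 100.0),
]

# Names of the grouping dimensions, in record order (assumed layout).
FIELDS = ("region", "category", "month")


def roll_up(records, by):
    """Sum the amount column, grouped by the named dimensions."""
    indexes = [FIELDS.index(name) for name in by]
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[i] for i in indexes)
        totals[key] += record[3]
    return dict(totals)


by_category = roll_up(sales, ["category"])
by_region_month = roll_up(sales, ["region", "month"])
```

The same records answer different questions depending only on which dimensions are named, which is why well-understood, well-categorized data suits this kind of system.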
A great deal of the functionality of analytics systems resides in the analytics middleware tools. Data solutions for analytics are all about aggregation of large amounts of data. Support for various kinds of advanced analytics, such as "cubes" (explained below), is useful, as are data compression and advanced indexing. Data usually comes into these systems via a nightly batch process, so fast response times on writes are not that important.
Five types of database systems
Answering these three fundamental questions of data warehousing are, in the current market, five different major types of database systems. These systems span decades of software development. Of course, many real-life big database systems are actually hybrids of more than one of the five types below, but I will list examples by their primary categories.
Standard relational databases
If you only have a dozen to a few hundred gigabytes of data, standard
mainstream relational databases are still probably the way to go. Whether
you choose PostgreSQL, MySQL, Oracle, or SQL Server, all bring significant maturity, flexibility, and a large number of third-party and vendor tools. Perhaps more importantly, technical staff will already be familiar with them.
I helped a small retail chain deploy an analytics data warehouse for their inventory management system. Originally we were talking about doing it on a proprietary big database system, but once we did some tests we discovered that the maximum size of the data warehouse was going to be 350GB. Given that, we decided to keep them on mainstream open source PostgreSQL and save them some money and time.
Standard relational databases do not excel at any of the tasks of archiving, data mining, or analytics. However, they can do all of those tasks. So if your data warehousing problem is relatively small or not response-time-critical, then they are the way to go.
MPP databases
This is the oldest type of database designed for data warehousing, dating back some 20 years. MPP stands for "massively parallel processing", which essentially means a relational database where a single query can be executed across multiple processors on multiple machines or motherboards. Database administrators love this kind of database because, with a few limitations, you can treat it like a really big, really fast relational database server. MPP databases include Teradata, Netezza, Greenplum, and the data warehousing edition of DB2.
When I worked for Greenplum, I set up multiple "clickstream analytics"
databases, where we were processing large amounts of web-log data for marketing companies. We had no way of knowing what we were going to see in the logs, or even the page structure of the sites. We had to do a lot of CPU-intensive processing: aggregation over parsed text, running custom database functions, and building materialized views, for which a 16-node Greenplum database was quite fast.
MPP databases are good for both data mining and analytics. Some of them — particularly Greenplum — are also hybridizing other types of databases in this list. However, to date all of the production-quality MPP databases are proprietary, and generally very expensive for any really large database.
Column-store databases
Invented in 1999, column
store (or C-store) databases work by changing the basic storage model
used for relational, or "row-based", databases. In a row-based database,
data is stored in contiguous rows of attributes, and columns are related
through table metadata. A column-store database turns this model 90
degrees, storing columns of attributes together and relating the rows only
through metadata. This permits quite a few optimizations, including
various forms of compression and very fast aggregates.
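The difference between the two layouts can be sketched with plain Python lists. This is only an illustration of the idea, not how any of the products named here store data; the table contents and the run_length_encode() helper are made up.

```python
# Hypothetical sales table; the values are invented for illustration.
rows = [
    ("2011-07-01", "shoes", 40),
    ("2011-07-01", "shoes", 25),
    ("2011-07-02", "hats", 10),
    ("2011-07-02", "hats", 15),
]

# Row-based layout: each record's attributes are stored together.
row_store = list(rows)

# Column-based layout: each attribute's values are stored together;
# row i is reassembled by taking element i from every column.
column_store = {
    "day": [r[0] for r in rows],
    "category": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def run_length_encode(values):
    """Collapse runs of repeated values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


# An aggregate over one attribute reads a single column...
total_from_columns = sum(column_store["amount"])
# ...while the row layout has to visit every whole record.
total_from_rows = sum(record[2] for record in row_store)

# Repetitive columns compress well once their values sit side by side.
compressed_category = run_length_encode(column_store["category"])
```

Run-length encoding is just one simple compression scheme chosen for the sketch; it shows why a column of repeated category values shrinks so readily.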
Current column stores include Vertica, Paraccel, Infobright, LucidDB, and MonetDB. While Vertica is probably the leading column-store database, the latter three are open source. Also, several databases of other types, including Aster Data and Greenplum, have been adding column stores as an option.
One of our clients was creating top-level radial charts for a few terabytes of hospital performance data. Since all of this data was numerics, ratings, or healthcare categories, Vertica turned out to be a very good solution which returned top-level summaries in a fraction of the time of a standard relational database.
Column stores are really suitable only for analytics, because all of the data must be well understood and highly structured to be stored in compressed columns. For that matter, C-stores are far more efficient with data which can be reduced to numbers and small lists of categories. Their main drawback is that they are slow to update or import data, and single-row updates are all but impossible.
MapReduce
The next innovation in data warehousing was popularized by Google less than a decade ago: MapReduce frameworks. MapReduce is really an algorithm which, when accompanied by clustering tools, allows you to take a single request and divide it into small portions to be executed across a large array of servers. When combined with some form of clustered or hash-partitioned storage, MapReduce allows users to perform large, long-running requests across tens to hundreds of nodes. Hadoop is the overwhelmingly dominant MapReduce framework. Current MapReduce-based databases include Hadoop with HBase, Hadapt, Aster Data, and CouchDB.
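The shape of the algorithm can be sketched in a single Python process. This is not Hadoop's API; the function names and the sample log lines are assumptions for illustration, and a real framework would run the map and reduce steps on many nodes in parallel.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record: one pair per word."""
    for word in record.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(chunks):
    """Run map over every chunk, then reduce each group of values."""
    pairs = []
    # Each chunk stands in for the slice of data held by one node.
    for chunk in chunks:
        for record in chunk:
            pairs.extend(map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical request-log lines split across two "nodes".
chunks = [["GET /index", "GET /about"], ["POST /login", "GET /index"]]
counts = map_reduce(chunks)
```

Because map_phase() looks at one record at a time and reduce_phase() at one key at a time, both steps can be spread over as many servers as the data is partitioned across.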
On one project the client needed to be able to run requests across 30TB of mixed JSON and binary data. Because the search routines across the binary data were very processor-intensive, they put this on HBase and used Hadoop to run a lot of the processing routines, storing the query results in PostgreSQL for easy browsing later.
MapReduce is, in many ways, an open source, affordable alternative to MPP databases, and is primarily suitable for data mining. It also scales to larger numbers of nodes. However, MapReduce queries are a lot less efficient than MPP ones due to their generic nature — and are also a lot harder to write. This is changing thanks to tools like Hive and Pig which let users write MapReduce queries in a SQL-like syntax. Also, MapReduce databases are a lot younger than the preceding three types, making them less reliable and comparatively poorly documented.
Enterprise search
The "new kid on the block" for data warehousing is enterprise search. This is so new that it really only consists of two open source products, both of them descendants of the Apache Lucene project: Solr and Elastic Search (ES). Enterprise search consists of doing multi-server partitioned indexing across large amounts of semi-structured data, in the form of "documents." Both also support "facets," which are materialized search indexes, allowing users to rapidly count and search documents by category, values, ranges, and even complex search expressions. Enterprise search also often gives "approximate" answers, which can be a feature or a defect, depending on your design goals.
Enterprise search is useful in some surprising places. We have one client who is using it to allow their clients to produce nuanced aggregate statistics on a very large body of legal documents. Putting this in Solr allowed the client to skip a lot of the data processing they needed to do to get it into other kinds of databases, while still giving them very fast search results. In particular, Solr's precomputed counts in indexes allowed returning counts of documents much faster than in a relational database.
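The idea behind precomputed facet counts can be sketched as follows. This is not Solr's or Elastic Search's interface; the documents, the field names, and the build_facets() helper are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical documents; field names and values are invented.
documents = [
    {"id": 1, "court": "appeals", "year": 2009},
    {"id": 2, "court": "district", "year": 2010},
    {"id": 3, "court": "appeals", "year": 2010},
]


def build_facets(docs, fields):
    """Precompute, per field, how many documents carry each value."""
    facets = defaultdict(Counter)
    for doc in docs:
        for field in fields:
            facets[field][doc[field]] += 1
    return facets


# Built once at indexing time, not at query time.
facets = build_facets(documents, ["court", "year"])

# Answering "how many appeals documents?" is now a lookup, not a scan.
appeals_count = facets["court"]["appeals"]
```

The cost of counting is paid when documents are indexed, which is the trade that makes count queries fast afterward.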
Enterprise search serves a subset of both data mining and analytics, making it broadly useful. Its biggest value comes when the data to be searched is already in HTML, XML, or JSON format and thus does not need to be converted or transformed before indexing. However, it's the youngest of the database types here and both products still have a lot of reliability issues and surprising limitations. Also, database requests are still tightly bound to the "search" model, which can make it difficult to use for very different use-cases.
Other tools
As part of any data warehousing project, you will also need a variety of other tools in order to take your data from its source to a finished report or interface. While I cannot go into them in detail, here are the types of tools you need to know about:
Extract Transform Load (ETL) and Data Integration tools: these tools handle getting data from its original source into the final database format. Open source ETL tool leaders are Talend and KETTLE, and there are many proprietary tools such as Informatica. In modern infrastructures, open source queueing platforms like ActiveMQ, RabbitMQ, and custom code are replacing formal ETL tools for many applications.
Data Mining and Data Analysis tools: tools like Weka, SAS, and various programs in the R language provide advanced tools for making sense out of large volumes of unsorted data. They help you find patterns in your data through statistical analysis and machine learning algorithms. In this domain the open source tools (Weka and R) lead the market, and proprietary tools are primarily for legacy use.
Reporting Tools: since you need to actually present your data, you'll need
reporting tools like BIRT, JasperReports,
or proprietary platforms like Business
Objects or MicroStrategy. These tools provide simple visualizations of your data, usually in the form of charts and graphs, which may be interactive. Recently the two leading open source options have caught up with proprietary competitors in ease of use, but it will take them a while to surpass them for features.
Online Analytical Processing (OLAP): a deceptive name which actually has a lot to do with providing a navigation-based interface to exploring your data, using "cubes." OLAP tools like Mondrian, Cognos, and Microsoft Analysis Services create a multi-dimensional spatial map of data which allows users to see different parts of the data by moving around within it. This is one area where open source tools are really lagging; open source databases have weak OLAP support compared to Oracle and SQL Server, and Mondrian is the only open source OLAP middleware.
I also need to mention Pentaho, which is a kind of all-in-one open source platform which glues together various open source ETL, data mining, and reporting tools.
Overall, there are open source tools for all levels of a data warehousing stack, but these tools often lack maturity or features in comparison to their proprietary competitors. However, given that most of the cutting-edge development in the analytics and data mining space is open source today, it's likely that the balance will tip towards open source over the next three to five years.
Conclusion
You should now be better acquainted with the world of data warehousing, especially from the database perspective. While we couldn't cover topics in detail or mention every project or product, at least you know where to start searching, and that there are many tools out there for every niche in the data warehousing world.
You can also see this same material presented in video format, from the recent Open Source Bridge conference in Portland, Oregon.
By Jonathan Corbet
July 20, 2011
Last week's Kernel Page included
an article on
the top contributors to the 3.0 kernel, an extended version of our
traditional look at who participated in each kernel development cycle.
Since this article is traditional, it tends not to draw a lot of
attention. This time was different: quite a few publications have picked
up on the fact that Microsoft was one of the top contributors of changesets
by virtue of a long series of cleanups to its "hv" driver in the staging
tree. Those sites, seemingly, got a lot more mileage out of those results
than LWN did, an amusing outcome which can be expected occasionally with
subscriber-only content. That said, this outcome is a bit dismaying for
other reasons.
Some background: the hv driver helps Linux to function as a virtualized
guest under
Windows. It is useful code which, with luck, will soon move out of the
staging tree and into the mainline kernel proper. After a period of
relative neglect, developers at Microsoft have started cleaning up hv with
a vengeance - 366 hv patches were merged for the 3.0 development cycle.
This work has clear value; it is aimed at getting this code ready to
graduate from staging; it is worth having.
That said, let's look at the actual patches. 47 of them simply
move functions around to eliminate the need for forward declarations; 39 of
them rename functions and variables; 135 of them take the form "get rid of
X" for some value of (usually unused) X. Clearly this 15,000-line driver
needed a lot of cleaning, and it's good that people are doing the work.
But it also seems somewhat uncontroversial to say that this particular body
of work does not constitute one of the more significant contributions to
the 3.0 kernel.
Part of the problem, beyond any doubt, is the creation of lists of top
changeset contributors in the first place. The number of changes is an
extremely poor metric if one is interested in how much real, interesting
work was contributed to the kernel. Some changes are clearly more
interesting than others. Highlighting changeset counts may have ill effects
beyond misleading readers; if the number of changesets matters, developers
will have an incentive to bloat their counts through excessive splitting of
changes - an activity which, some allege, has been going on for a while
now.
LWN does post another type of statistic - the number of lines changed. As
with changeset counts, this number does contain a modicum of information.
But as a metric for the value of kernel contributions, it is arguably even
worse than changeset counts. Picking a favorite Edsger Dijkstra quote is a
challenging task, but this one would be a contender:
If we wish to count lines of code, we should not regard them as
"lines produced" but as "lines spent".
Just because a patch changes (or adds) a lot of code does not mean that
there is a lot of value to be found therein.
Given these problems, one might be tempted to just stop producing these
statistics at all. Yet these metrics clearly have enough value to be
interesting. When LWN first started posting these numbers, your editor was
approached at conferences by representatives from two large companies who
wanted to discuss discrepancies between those numbers and the ones they had
been generating internally. We are routinely contacted by companies wanting to
be sure that all of their contributions are counted properly. Developers
have reported receiving job offers as a result of their appearance in the
lists of top contributors. Changeset counts are also used to generate the
initial list of nominees to the Kernel Summit. For better or for worse,
people want to know who the most significant contributors to the kernel
are.
So it would be good to find some kind of metric which yields that
information in a more useful way than a simple count of changesets or lines
of code. People who understand the code can usually look at a patch and
come to some sort of internal assessment - though your editor might be
disqualified by virtue of having once suggested that merging devfs would be
a good idea. But the reason why even that flawed judgment is not used in LWN's
lists is simple: when a typical development cycle brings in 10,000
changesets, the manual evaluation process simply does not scale.
So we would like to have a metric which would try, in an automated fashion,
to come up with an improved estimate of the value of each patch. That does
not sound like an easy task. One could throw out some ideas for heuristics
as a place to start; here are a few:
- Changes to core code (in the kernel, mm, and
fs directories, say) affect more users and are usually more
heavily reviewed; they should probably be worth more.
- Bug fixes have value. A heuristic could try to look to see if the
changelog contains a bug ID, whether the patch appears in a stable
update, or whether it is a revert of a previous change.
- Patches that shuffle code but add no functional change generally have
relatively low value. Patches adding documentation, instead, are
priceless.
- Patches that remove code are generally good. A patch that adds code
to a common directory and removes it from multiple other locations may
be even better.
Patches adding significant code which appears to be cut-and-pasted
from elsewhere may have negative value.
- Changes merged late in the development cycle may not have a high
value, but, if the development process is working as it should be,
they should be worth something above the minimum.
- Changes merged directly by Linus presumably have some quality which
caught his attention.
Once a scoring system was in place, one could, over time, try to develop a
series of rules like the above in an attempt to better judge the true value
of a developer's contribution.
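As a rough sketch of what such a scoring system might look like, the following encodes a few of the heuristics above. Every weight is an arbitrary guess, the Patch fields and the "bugzilla" changelog check are assumptions made for illustration, and the sample patches are hypothetical; nothing like this is in use for LWN's statistics.

```python
from dataclasses import dataclass

# Top-level directories the list above suggests treating as core code.
CORE_DIRECTORIES = ("kernel/", "mm/", "fs/")


@dataclass
class Patch:
    paths: list          # files touched by the patch
    changelog: str       # commit message text
    lines_added: int
    lines_removed: int
    in_stable_update: bool = False


def score(patch):
    """Return an illustrative value estimate; every weight is a guess."""
    value = 1.0  # every merged patch is worth something
    if any(p.startswith(CORE_DIRECTORIES) for p in patch.paths):
        value += 2.0  # core code affects more users
    log = patch.changelog.lower()
    if patch.in_stable_update or "bugzilla" in log or log.startswith("revert"):
        value += 1.5  # looks like a bug fix
    if any(p.startswith("Documentation/") for p in patch.paths):
        value += 2.0  # documentation is rated highly
    if patch.lines_removed > patch.lines_added:
        value += 1.0  # net code removal is generally good
    return value


# Hypothetical patches, not taken from the real 3.0 history.
cleanup = Patch(["drivers/staging/hv/channel.c"], "Get rid of unused variable", 0, 3)
core_fix = Patch(["mm/slab.c"], "Revert an earlier change", 5, 2, in_stable_update=True)
cleanup_score = score(cleanup)
core_fix_score = score(core_fix)
```

Even this toy version shows the difficulty: the cleanup patch earns a bonus for removing code, and any fixed set of weights invites exactly the sort of gaming that changeset counts do.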
That said, any such metric will certainly be seen as unfair by at least
some developers - and rightly so, since it will undoubtedly be
unfair. This problem has no solution that will be universally seen as
correct. So, while we may well play with some of these ideas, it seems
likely that we are stuck with changeset and lines-changed counts for the
indefinite future. These metrics, too, are unfair, but at least they are
unfair in an easily understood and objectively verifiable way.
July 20, 2011
This article was contributed by Nathan Willis
The Samba project has started
discussions to alter its long-standing copyright policy. For years, the
project has accepted only code contributions where the copyright is held by
an individual rather than by a corporation. Under the proposed change, the
project would accept corporate-copyrighted code, but would still prefer
that its contributors submit work as individuals.
Jeremy Allison posed the question to the Samba community in a July 12th email to the samba-technical mailing list. He described the proposal as coming from himself and Andrew Tridgell, a solicitation for input from the broader Samba community before putting the issue to a vote by Samba team members themselves.
The current
policy is detailed on the project's site, along with a note that
contributors for whom a personal copyright is infeasible can also choose to
assign their copyright to the Software
Freedom Conservancy (SFC), which has Samba as one of its member
projects.
Because Samba is a clean-room reimplementation of proprietary software, the
project also requires that
developers who have signed Microsoft's CIFS license agreement not submit
code, and that contributors ensure that their patches do not infringe on
known patents. Although all free software projects and developers are well
aware of the
trouble fueled by software patent litigation, Samba is in the minority of
projects by asking its contributors to assert up front that their work is non-infringing.
Contributions, yesterday and today
Allison lists two reasons for the project's historical stance against allowing corporate-copyrighted contributions. First, the team simply preferred that GPL enforcement decisions be made by individuals, not by corporations. "Enforcement decisions" would typically include how and when to contact suspected violators, who would speak on behalf of the project, and what course of action to pursue at each stage of the discussion.
The second reason deals specifically with the consequences of a
successfully-resolved license violation — meaning one in which the
violator has come back into compliance. Under GPLv2 (which Samba was
available under prior to the 3.2 release in July of 2007), a violator
automatically loses all rights to modify or distribute the software under
the license (section 4), and those rights can only be reinstated by the
copyright holders' collective decision. Individuals, the Samba project
believes, act much more reasonably in these rights-reinstatement
circumstances than corporations sometimes do.
The termination-of-rights section in GPLv2 is commonly
overlooked (see section 5.2) by casual readers, and was one of the key
points in drafting GPLv3. Because the earlier license did not specifically
address how copyright holders go about reinstating the rights of former
violators, every project could enact its own policy, or be completely
arbitrary on a case-by-case basis. A high-profile example of the confusion
over this issue is Richard Stallman's public argument
that early KDE releases automatically lost their redistribution rights to some
GPL-covered software by combining it with the QPL-licensed Qt toolkit, and
his call for the affected copyright-holders to publicly reinstate the project's
rights. In response, KDE called
Stallman's interpretation of the clause "absurd" and refused
to request "forgiveness" as Stallman suggested.
The corresponding section in GPLv3 (section 8) attempts to standardize the situation and provide for a simple reinstatement procedure. A former violator's rights to redistribute the licensed software are reinstated if the violation is corrected and the copyright holders do not make further complaints for 60 days, or if the violator corrects its first violation within 30 days of the first notice. Furthermore, "provisional" permission to redistribute the software is granted as soon as the violation is fixed, pending the expiration of the 30- or 60-day window (whichever applies).
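The procedure summarized above can be written out as a small decision function. This sketch encodes only the summary in the preceding paragraph, not the license text itself; the function name, its parameters, the three status labels, and the treatment of the 30- and 60-day boundaries as inclusive are all assumptions.

```python
from datetime import date, timedelta


def reinstatement_status(today, fixed_on=None, first_notice_on=None,
                         first_violation=False, complaint_after_fix=False):
    """Classify rights as 'terminated', 'provisional', or 'reinstated'."""
    if fixed_on is None or today < fixed_on:
        return "terminated"  # the violation has not been corrected yet
    # A first violation corrected within 30 days of the first notice.
    if (first_violation and first_notice_on is not None
            and fixed_on <= first_notice_on + timedelta(days=30)):
        return "reinstated"
    # Otherwise: 60 days after the fix with no further complaint.
    if not complaint_after_fix and today >= fixed_on + timedelta(days=60):
        return "reinstated"
    # Fixed, but still inside the waiting window (or complained about).
    return "provisional"


today = date(2011, 7, 20)
still_violating = reinstatement_status(today)
recently_fixed = reinstatement_status(today, fixed_on=date(2011, 7, 1))
long_fixed = reinstatement_status(today, fixed_on=date(2011, 5, 1))
quick_cure = reinstatement_status(today, fixed_on=date(2011, 7, 15),
                                  first_notice_on=date(2011, 7, 1),
                                  first_violation=True)
```

What matters for Samba's purposes is that none of the branches depends on who holds the copyright; the outcome follows from dates and notices alone.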
Due to that change, Allison said, the GPLv3 releases of Samba no longer benefit from excluding corporate-copyrighted patches outright. However, he did recommend that the project continue to encourage personal copyrights (or SFC copyright-assignment) over corporate contributions for most code, and continue to ban corporate copyrights for the project's library components or "code that might be moved into a library." Allison does not draw a bright line between the situations where personal copyright would still be required and where it would only be encouraged, but the intent seems to be to free up companies to contribute fixes and patches, rather than to open the door for entirely new features or large modules submitted under a corporate copyright.
The reason for the desire to retain the individual-only policy for more
significant changes is re-licensing (an issue which is also referred to on
the current policy page). Consent of the copyright holder is also required
to re-license the contribution. Although it is not a frequent occurrence,
the Samba project believes that getting assent for the re-license from
individuals is both simpler and faster than getting it from corporations
— which typically require decisions to be made by internal managerial
and legal teams.
Samba has re-licensed portions of its code in the past; notably, the tdb and talloc libraries were re-licensed under the LGPL. But for general patches, including "things like build fixes for specific platforms," Allison said the non-corporate policy can only delay or prevent potential contributors from sending in their work.
Feedback
Both individual and corporate contributors participated in the subsequent list discussion. By and large the community either approved of the proposal or simply deferred to the Samba team's wishes, but there were a few concerns voiced over the split-policy nature of the draft policy.
Simo Sorce, the Samba team GPL compliance officer, speculated as to whether corporate-copyrighted contributions of new features would hinder the team's later ability to split new features into separate libraries (thus triggering the re-licensing problem). Corporate copyrights not only slow down the re-licensing process, he said, but potentially cause greater problems when the corporation in question shuts down or is sold.
Andrew Bartlett expressed
concern that having a more complex policy regarding corporate
copyrights would make it harder to communicate to companies about their
responsibilities, which could slow down development by forcing them to
adjust their engineers' employment agreements. Generally, engineers'
employment agreements (or the legal norms of their jurisdiction) dictate that all code produced by a salaried employee is a "for hire" work with the copyright held by the company. Even in those situations where the developer retains copyright, however, he or she must typically get approval from upstairs to contribute.
The select circle of companies who devote employee time to Samba
development must make a contractual exception in order to comply with
Samba's current policy (though judging by list traffic these companies have
been willing to do so because of their desire to contribute to Samba).
Bartlett's concern, then, is that subdividing the copyright policy to
distinguish between different types of code contributions would further
complicate the contracts of corporate-paid Samba contributors. Bartlett
also pointed out that the contribution guidelines need clarification
between code and non-code contributions, which are currently not treated
differently. Clarifying the situation around non-code contributions could
encourage companies to contribute documentation or help to the wiki, he
said.
A week has passed since the initial message to the list, and so far there are no adamant objections, so it appears likely that the new policy (perhaps with clarifications) will eventually go into effect.
In harmony
In the end, the policy shift is a fairly minor change, and in effect
loosens some of the restrictions on corporate contributions. So it's not
too much of a surprise that there has been little opposition. It is also interesting to observe that the discussion came up so soon after the final release of the much-debated Harmony project, which set out to craft a suite of contributor agreements.
It does not appear that the Harmony project's templates or tools had an influence on the Samba team's decision or process. However, it also appears that the Harmony project's agreements leave several issues of importance to the Samba team unaddressed. Harmony encompasses two separate types of agreement: contributor license agreements (CLAs) and copyright assignment agreements (CAAs). Samba's copyright policy is essentially a CLA, albeit an informal one instead of a signed contract.
The Harmony CLA templates cover ensuring the originality of the submitted code, and deal at length with the project's right to re-license the work as a whole. They do not, however, deal with either the patent-non-infringement issue or the clean-room reimplementation guarantee that Samba requires of contributors. Regarding the re-licensing question, having that permission in place from the beginning might simplify Samba's occasional re-licensing work — although historically it is a rare enough occurrence that the project has never felt the need to address it up front. In his message to the list, Sorce did ask corporate contributors for feedback on the re-licensing process, including the need to get "promises about the kind of licenses we will consider in case changes are needed."
The Harmony CLAs also do not provide a framework for Samba to specify that only contributions under personal copyrights are permitted, nor to provide separate rules for minor patches, new functionality, or library code.
The next major release of Samba is version 3.6.0, which is currently in
its second release-candidate testing phase. Allison and the other members
of the team did not set a timeline for a new copyright policy to take
effect (if one is eventually adopted), but presumably it would follow a
vote of the development team and would not happen before the next
development cycle begins.
Page editor: Jonathan Corbet
Security
By Jake Edge
July 20, 2011
A bug in the Blender 3D graphics rendering
program,
which was recently fixed in Gentoo and Fedora, may not really be a bug at all,
depending on who you listen to. Even though it has been assigned CVE-2009-3850,
there is a vocal segment—perhaps an overwhelming majority—of
longtime Blender users who don't want to see problems like this fixed because the
fixes can seriously affect their workflow. It is an example of the classic
tradeoff between usability and security, and it would seem that usability
is winning out—at least for mainline Blender development.
The problem stems from Blender's use of Python as its scripting language.
A malicious script has access to all of the
power of Python running as the user, so it could completely compromise the
user's account. It is essentially the same problem that various macro
languages in office suites have had, but those languages are generally less
powerful
than Python—or are at least meant to be. An attacker could put up an
enticing .blend file, which promised to provide some interesting 3D
representation or effect, that instead (or in addition) installed a virus,
botnet client, or
spam-ware. Other nasty effects are possible too, of course.
For office suite macros and other similar application scripting languages,
there is often a dialog before running the code contained in a file, so
that users can decide
whether or not to run it. Or users can disable scripting entirely via the
application preferences. For Blender, though, the default is that
code inside .blend files is run, without prompting, making it
easy to craft attacks if
users can be enticed to open the file. That feature
can be turned off in the preferences, but that doesn't affect Blender when it
is running in background mode.
Background mode is a GUI-less version of Blender that is meant to be run on
"render farms" (multiple machines that render different parts of the scene
or animation). As might be guessed, scripts are used to control what gets
rendered by Blender running in background mode, so disabling scripts by
default in background mode would be fairly pointless for hardcore Blender
users. But, for
Blender neophytes—who are typically running in GUI
mode—grabbing a file from the internet to try out the
program is probably not something they expect can lead to system compromise.
The problem was discovered
by CoreLabs Research in October 2009 and communicated to the Blender
team, but there has been no real fix made in the mainline since then. It was
reported in the bugzillas for both Red Hat and
Gentoo in
November 2009, but very little action was taken by either distribution
until Sebastian Pipping started looking into it in April of this year. It
would seem that both distributions were assuming that a fix would be coming
from upstream, but none materialized.
As Pipping points out in
his analysis in the Gentoo bug report, upstream is indifferent, at best, to
changing the default. A long thread in
the blender-committers mailing list from April 2010 makes it clear that
many of the users
and developers of Blender find that security fixes are just getting in
their way. Part of the problem is that the "trusted
source" fix made for
Blender 2.50 was not fully baked and caused problems for
many—including many hours of wasted rendering time.
But distributions sometimes have different priorities than application
projects, and protecting the uninitiated from non-obvious ways to
compromise their system is generally high on any distribution's list. So,
Pipping created a patch for Gentoo and alerted
Fedora about it, which resulted in the Fedora fixes released on July
13. So far, Gentoo has not put out an advisory,
though the fix is in its repositories.
The fix itself is fairly straightforward, though there are a few wrinkles.
Part of the problem is that Blender uses different mechanisms to control
scripting depending on whether you are in GUI or background mode. So
enabling scripting in GUI mode does not affect what happens with background
mode and vice versa, which is one of the problems that Blender users were
complaining
about when 2.50 was released. In addition, the flags used for
controlling scripting (-y and -Y) have changed senses
between 2.49 and 2.50. So, Pipping chose -666 as the flag to
disable scripting in GUI mode. Security-conscious users (or
distributions) can put that flag in the .desktop file to disable
scripting in GUI mode, but leave background mode (where running code from
untrusted sources is unlikely) alone. Users who wish to run scripts in GUI
mode can still enable that through the interface.
One does wonder why Blender doesn't just make the defaults different for
the two different modes. If GUI mode defaulted to "scripting off", the
problem would largely go away, without adversely affecting the power-users
who are largely rendering in background mode. The minor inconvenience of
turning on the feature, once, in their GUI session would seem like a
reasonable tradeoff.
In the end, it is a fairly minor problem, and it's hard to imagine that
there are legions of attackers out there crafting malicious Blender
scripts—the payoff is just too small. Targeted attacks might be more
plausible, but finding targets with Blender installed and no understanding
of the potential danger of scripts in .blend files might be
something of a stretch.
But users do not expect that opening a spreadsheet will compromise their
system, and they should expect no less of opening a file in another kind of
application. Since it seems that Blender isn't interested in fixing the
problem, distributions are obviously right to step in and do so.
Brief items
17:05:49 <dvlasenk> I tried to understand what Trusted Boot *is*, and failed.
17:06:11 <ajax> dvlasenk: it's a complicated way of making your machine less likely to work.
--
Fedora
Engineering Steering Committee (FESCo) meeting log for July 18
It's on this point that IP Czar Victoria Espinel should really be
ashamed. After talking up how this agreement would help someone (not clear
who?)
"win the future," shouldn't she have been the least bit concerned
about the most obvious stakeholder who wasn't at the table? We see this way
too often with government officials these days. They think the only
stakeholders are the businesses, and leave out the
citizens they're
supposed to represent. Copyright law is supposed to benefit the public, but
the public wasn't at the table negotiating this agreement. In fact, pretty
much everyone admits that the government focused solely on bringing
together these two parties and putting tremendous pressure on the ISPs to
cave to the entertainment industry. Couldn't they have used some of that
"pressure" to make sure that the public's interest was included? Isn't that
what government is
supposed to do?
--
Mike
Masnick on the "six strikes" agreement
Dark Reading
previews a
talk that will be given at the upcoming Black Hat Conference about Android application security issues. The talk is based on a study that looked at Android applications to determine the kinds of security problems that they had.
"In the study, Dasient analyzed the live behavior of Android apps to determine their security posture. Of the 10,000 applications evaluated, more than 800 were found to be leaking personal data to an unauthorized server, [Neil] Daswani says.
[...]
In addition, the researchers found that 11 of the applications were sending potentially unwanted SMS messages out to other smartphones -- the mobile version of spam, Daswani says."
New vulnerabilities
drupal7: restriction bypass
| Package(s): | drupal7 |
| CVE #(s): | CVE-2011-2687 |
| Created: | July 18, 2011 |
| Updated: | July 20, 2011 |
| Description: |
From the Drupal advisory:
Listings showing nodes but not JOINing the node table show all nodes regardless of restrictions imposed by the node_access system. In core, this affects the taxonomy and the forum subsystem. |
| Alerts: |
|
kernel: denial of service
| Package(s): | kernel |
| CVE #(s): | CVE-2011-2479 |
| Created: | July 14, 2011 |
| Updated: | July 20, 2011 |
| Description: |
From the Scientific Linux advisory:
It was found that an mmap() call with the MAP_PRIVATE flag on "/dev/zero"
would create transparent hugepages and trigger a certain robustness check.
A local, unprivileged user could use this flaw to cause a denial of
service. (CVE-2011-2479, Moderate)
|
| Alerts: |
|
kernel: multiple vulnerabilities
| Package(s): | kernel |
| CVE #(s): | CVE-2011-2534, CVE-2011-1747 |
| Created: | July 14, 2011 |
| Updated: | August 9, 2011 |
| Description: |
From the Ubuntu advisory:
Vasiliy Kulikov discovered that the netfilter code did not check certain
strings copied from userspace. A local attacker with netfilter access could
exploit this to read kernel memory or crash the system, leading to a denial
of service. (CVE-2011-1170, CVE-2011-1171, CVE-2011-1172, CVE-2011-2534)
Vasiliy Kulikov discovered that the AGP driver did not check the size of
certain memory allocations. A local attacker with access to the video
subsystem could exploit this to run the system out of memory, leading to a
denial of service. (CVE-2011-1746, CVE-2011-1747)
|
| Alerts: |
|
kernel: multiple vulnerabilities
| Package(s): | kernel |
| CVE #(s): | CVE-2010-4256, CVE-2011-1076 |
| Created: | July 14, 2011 |
| Updated: | July 20, 2011 |
| Description: |
From the Ubuntu advisory:
It was discovered that named pipes did not correctly handle certain fcntl
calls. A local attacker could exploit this to crash the system, leading to
a denial of service. (CVE-2010-4256)
It was discovered that the key-based DNS resolver did not correctly handle
certain error states. A local attacker could exploit this to crash the
system, leading to a denial of service. (CVE-2011-1076)
|
| Alerts: |
|
kernel: multiple vulnerabilities
| Package(s): | kernel |
| CVE #(s): | CVE-2011-1576, CVE-2011-1936, CVE-2011-2213, CVE-2011-2492 |
| Created: | July 15, 2011 |
| Updated: | September 14, 2011 |
| Description: |
From the Red Hat advisory:
A flaw allowed napi_reuse_skb() to be called on VLAN (virtual LAN)
packets. An attacker on the local network could trigger this flaw by
sending specially-crafted packets to a target system, possibly causing a
denial of service. (CVE-2011-1576)
A flaw in the way the Xen hypervisor implementation handled CPUID
instruction emulation during virtual machine exits could allow an
unprivileged guest user to crash a guest. This only affects systems that
have an Intel x86 processor with the Intel VT-x extension enabled.
(CVE-2011-1936)
A flaw in inet_diag_bc_audit() could allow a local, unprivileged user to
cause a denial of service (infinite loop). (CVE-2011-2213)
Structure padding in two structures in the Bluetooth implementation
was not initialized properly before being copied to user-space, possibly
allowing local, unprivileged users to leak kernel stack memory to
user-space. (CVE-2011-2492) |
| Alerts: |
|
libapache2-mod-authnz-external: SQL injection
| Package(s): | libapache2-mod-authnz-external |
| CVE #(s): | CVE-2011-2688 |
| Created: | July 19, 2011 |
| Updated: | August 21, 2012 |
| Description: |
From the Debian advisory:
It was discovered that libapache2-mod-authnz-external, an apache
authentication module, is prone to an SQL injection via the $user
parameter.
|
| Alerts: |
|
libpng: multiple vulnerabilities
| Package(s): | libpng |
| CVE #(s): | CVE-2011-2690, CVE-2011-2691, CVE-2011-2692 |
| Created: | July 19, 2011 |
| Updated: | October 17, 2011 |
| Description: |
From the CVE entries:
Buffer overflow in libpng 1.0.x before 1.0.55, 1.2.x before 1.2.45, 1.4.x before 1.4.8, and 1.5.x before 1.5.4, when used by an application that calls the png_rgb_to_gray function but not the png_set_expand function, allows remote attackers to overwrite memory with an arbitrary amount of data, and possibly have unspecified other impact, via a crafted PNG image.
(CVE-2011-2690)
The png_err function in pngerror.c in libpng 1.0.x before 1.0.55, 1.2.x before 1.2.45, 1.4.x before 1.4.8, and 1.5.x before 1.5.4 makes a function call using a NULL pointer argument instead of an empty-string argument, which allows remote attackers to cause a denial of service (application crash) via a crafted PNG image. (CVE-2011-2691)
The png_handle_sCAL function in pngrutil.c in libpng 1.0.x before 1.0.55, 1.2.x before 1.2.45, 1.4.x before 1.4.8, and 1.5.x before 1.5.4 does not properly handle invalid sCAL chunks, which allows remote attackers to cause a denial of service (memory corruption and application crash) or possibly have unspecified other impact via a crafted PNG image that triggers the reading of uninitialized memory. (CVE-2011-2692) |
| Alerts: |
|
likewise-open: SQL injection
| Package(s): | likewise-open |
| CVE #(s): | CVE-2011-2467 |
| Created: | July 20, 2011 |
| Updated: | July 20, 2011 |
| Description: |
Likewise-open (an Active Directory authentication service) suffers from a local SQL injection vulnerability. |
| Alerts: |
|
mariadb: missing innodb support
| Package(s): | mariadb |
| CVE #(s): | |
| Created: | July 19, 2011 |
| Updated: | July 20, 2011 |
| Description: |
From the openSUSE advisory:
The last security version upgrade of MariaDB (a MySQL fork)
removed innodb support, breaking old databases.
|
| Alerts: |
|
nfs-utils: user-controlled /etc/mtab corruption
| Package(s): | nfs-utils |
| CVE #(s): | CVE-2011-1749 |
| Created: | July 14, 2011 |
| Updated: | March 22, 2012 |
| Description: |
From the Pardus advisory:
It was found that mount.nfs suffers from the same flaw as other mount
helpers (see CVE-2011-1089). Instead of using addmntent(), nfs-utils
implements its own similar function (nfs_addmntent()) which also fails
to anticipate whether resource limits would interfere with correctly
writing to /etc/mtab. A local user could use this to trigger corruption
of the /etc/mtab file via a process with a small RLIMIT_FSIZE value.
|
| Alerts: |
|
opera: multiple vulnerabilities
phpmyadmin: multiple vulnerabilities
| Package(s): | phpMyAdmin |
| CVE #(s): | |
| Created: | July 18, 2011 |
| Updated: | July 20, 2011 |
| Description: |
From the phpMyAdmin advisories [1; 2; 3; 4]:
It was possible to manipulate the PHP session superglobal using some of the Swekey authentication code. This could open a path for other attacks.
An unsanitized key from the Servers array is written in a comment of the generated config. An attacker can modify this key by modifying the SESSION superglobal array. This allows the attacker to close the comment and inject code.
Through a possible bug in PHP, a null byte can truncate the pattern string allowing an attacker to inject the /e modifier causing the preg_replace function to execute its second argument as PHP code.
Fixed filtering of a file path in the MIME-type transformation code, which allowed for directory traversal. |
| Alerts: |
|
seamonkey: multiple vulnerabilities
| Package(s): | seamonkey |
| CVE #(s): | |
| Created: | July 15, 2011 |
| Updated: | July 20, 2011 |
| Description: |
Seamonkey 2.2 fixes multiple issues. See the change log for details. |
| Alerts: |
|
system-config-firewall: privilege escalation/arbitrary code execution
| Package(s): | system-config-firewall |
| CVE #(s): | CVE-2011-2520 |
| Created: | July 19, 2011 |
| Updated: | August 2, 2011 |
| Description: |
From the Red Hat advisory:
It was found that system-config-firewall used the Python pickle module in
an insecure way when sending data (via D-Bus) to the privileged back-end
mechanism. A local user authorized to configure firewall rules using
system-config-firewall could use this flaw to execute arbitrary code with
root privileges, by sending a specially-crafted serialized object.
|
| Alerts: |
|
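The risk described in the system-config-firewall entry can be illustrated with a minimal sketch. Python's pickle format lets an object name a callable (through its __reduce__() method) that the loader will invoke, so calling pickle.loads() on bytes from a less-privileged sender runs code the sender chose. The class and function names below are hypothetical and the payload is deliberately harmless; this is not the system-config-firewall code, and the JSON variant is only one possible data-only encoding, not necessarily what the actual fix used.

```python
import json
import pickle


class Payload:
    """Hypothetical hostile object built by the unprivileged sender."""

    def __reduce__(self):
        # pickle records "call eval('6 * 7')"; the loader will obey.
        # A real attacker would pick something far less friendly.
        return (eval, ("6 * 7",))


def privileged_load_pickle(data: bytes):
    """The unsafe pattern: unpickling bytes from a less-trusted peer."""
    return pickle.loads(data)


def privileged_load_json(text: str):
    """Data-only format: strings stay strings, nothing gets called."""
    return json.loads(text)


if __name__ == "__main__":
    wire = pickle.dumps(Payload())
    # The receiver ends up evaluating the sender's expression.
    print(privileged_load_pickle(wire))
    # With JSON the receiver only ever gets plain data back.
    print(privileged_load_json('{"port": 22}'))
```

The receiver never sees a Payload instance at all; what comes back from pickle.loads() is the result of the call the sender asked for, which is why a privileged process should not unpickle input it does not fully trust.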
Page editor: Jake Edge
Kernel development
Brief items
The 3.0 kernel is not yet released as of this writing. Subtle core
kernel bugs, first with the VFS (see below), then with RCU, have delayed the
release, which is otherwise ready; history suggests it will come out
immediately after the LWN Weekly Edition is published.
Stable updates: no stable updates have been released in the last
week, and none are in the review process as of this writing.
This patch set contains fixes for a trainwreck involving RCU, the
scheduler, and threaded interrupts. This trainwreck involved RCU
failing to properly protect one of its bit fields, use of RCU by
the scheduler from portions of irq_exit() where in_irq() returns
false, uses of the scheduler by RCU colliding with uses of RCU by
the scheduler, threaded interrupts exercising the problematic
portions of irq_exit() more heavily, and so on.
--
Paul McKenney on why we can't have nice
3.0 things (yet)
That said, have fun and make sure that you have the fire
extinguisher ready when you start using this!
--
Thomas Gleixner
Matthew Garrett
investigates the subtleties of booting Linux with EFI. Once again, hardware vendors are myopically focusing on Windows.
"As we've seen many times in the past, the only thing many hardware vendors do is check that Windows boots correctly. Which means that it's utterly unsurprising to discover that there are some systems that appear to ignore EFI boot variables and just look for the fallback bootloader instead. The fallback bootloader that has no namespacing, guaranteeing collisions if multiple operating systems are installed on the same system.
[...]
It could be worse. If there's already a bootloader there, Windows won't overwrite it. So things are marginally better than in the MBR [Master Boot Record] days. But the Windows bootloader won't boot Linux, so if Windows gets there first we still have problems."
Those who follow the realtime preemption patch set know that it has been
stuck on 2.6.33 for some time. With the release of
a new patch based on 3.0-rc7, Thomas Gleixner
tells us why: the entire series has been reworked and cleaned up, a new
solution to the per-CPU variable problem has been implemented, and a nasty
bug held up what would otherwise have been a release based on 2.6.38.
"The beast insisted on destroying filesystems with reproduction times
measured in days and the total refusal to reveal at least a minimalistic
hint to debug the root cause. Staring into completely useless traces for
months is not a very pleasant pastime." The 3.0-rc7 version of the
patch, happily, shows no such behavior.
By Jonathan Corbet
July 20, 2011
One of the nice things that the IPv6 protocol was supposed to do for us was
to eliminate the need for network address translation (NAT). The address
space is large enough that many of the motivations for the use of NAT (lack
of addresses, having to renumber networks when changing providers) are no
longer present. NAT is often seen as a hack which breaks the architecture
of the Internet, so there has been no shortage of people who would be happy
to see it go; the IPv6 switch has often looked like the opportunity to make
it happen.
So it is not surprising that, when Terry Moës posted an IPv6 NAT implementation for Linux, the
first response was less than favorable. Anybody wanting to see the end of
NAT is unlikely to welcome an implementation which can only serve to
perpetuate its use after the IPv6 transition. The sad fact, though, is
that NAT appears to be here to stay. David Miller expressed it in a typically direct manner:
People want to hide the details of the topology of their internal
networks, therefore we will have NAT with ipv6 no matter what we
think or feel.
Everyone needs to stop being in denial, now.
Like it or not, we will be dealing with NAT indefinitely. For those who
are curious about how it might work in Linux, Terry's implementation can be
found on
SourceForge along with a paper describing the design of the code. Both
stateless (RFC 6296) and
stateful NAT are supported.
Kernel development news
By Jonathan Corbet
July 19, 2011
It's all Hugh's fault.
Linus was all set to release the final 3.0 kernel when Hugh Dickins showed
up on the list with a little problem:
occasionally a full copy of the kernel source tree fails because one of the
files found therein vanishes temporarily. What followed was a determined
bug-chasing exercise which demonstrates how subtle and tricky some of our
core code has become. The problem has been found and squashed, but there
may be more.
A bit of background might help in understanding what was happening here.
The 2.6.38 release included the dcache
scalability patches; this code uses a number of tricks to avoid taking
locks during the process of looking up file names. For the right kind of
workload, the "RCU walk" method yields impressive performance improvements.
But that only works if all of the relevant directory entry ("dentry")
structures are in the kernel's dentry cache and the lookup process does not
race with other CPUs which may be making changes on the same path.
Whenever such a situation is encountered, the lookup process will fall back
to the older, slower algorithm which requires locking each dentry.
The dentry cache (dcache) is a highly dynamic data structure, with dentries
coming and going at all times. So one CPU might be removing a dentry at
the same time that another is using it to look up a name. Chaos is avoided
through the use of read-copy-update (RCU) to manage the removal of dentries; a
dentry may be removed from the cache, but, if the thread using that dentry
for lookup got a reference to it before its removal, the structure itself will
continue to exist for as long as that thread needs it. The same should be
true of the inode structure associated with that dentry.
Hugh tracked the problem down to a bit of code in
walk_component():
    err = do_lookup(nd, name, path, &inode);
    /* ... */
    if (!inode) {
        path_to_nameidata(path, nd);
        terminate_walk(nd);
        return -ENOENT;
    }
If do_lookup() returns a null inode pointer,
walk_component() assumes that a "negative dentry" has been
encountered. Negative dentries are kept in the dentry cache to record the
fact that a specific name does
not exist; they are an important performance-enhancing feature in
the Linux virtual filesystem layer. To see an example, run any simple
program under strace and watch how many system calls return with
ENOENT; lookups on nonexistent files happen frequently. What Hugh
determined was that this inode pointer was coming back null even though the
file exists, leading the code to believe that a negative dentry had been
found and causing the "briefly vanishing file" problem.
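For readers who would rather see a failed lookup from a program than from strace output, here is a minimal sketch. It asks the kernel about a name that does not exist and reports the error code that comes back; the file name is made up, and whether any particular repeat lookup is answered from a negative dentry depends on the filesystem and kernel in use.

```python
import errno
import os
import tempfile


def lookup_errno(path):
    """Return 0 if the name resolves, else the errno from the failed lookup."""
    try:
        os.stat(path)
    except OSError as exc:
        return exc.errno
    return 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        missing = os.path.join(directory, "no-such-file")  # hypothetical name
        # Repeated lookups of the same missing name are the case that
        # negative dentries are meant to make cheap.
        for _ in range(3):
            print(errno.errorcode[lookup_errno(missing)])
```

Each pass through the loop fails with ENOENT, which is exactly the result that walk_component() was wrongly producing for a file that did exist.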
Hugh must have looked at this code for some time before concluding that the
kernel must be removing the dentry from the dcache at just the wrong time
during the lookup process. As described above, the dentry itself continues
to exist after its removal from the cache, but that does not mean that it
is unchanged: the removal process sets its d_inode pointer to
NULL. (It's worth noting that this behavior goes against normal
RCU practice, which calls for the structure to be preserved unmodified
until the last reference is known to be gone.)
Hugh concluded that this null pointer was being picked up
later by the lookup process, causing walk_component()
to conclude that the file does not exist when all that had happened was the
removal of a dentry from the cache. His problem report included a patch
causing the lookup code to check much more carefully when the inode pointer
comes up null.
Linus acknowledged the problem but didn't
like the fix which, he thought, was too specific to one particular
situation. He proposed an alternative: just don't set d_inode to
NULL; that would keep the inode pointer from picking up that value
later. Al Viro posted an alternative
fix which changed dcache behavior in less subtle ways, and worried about the possibility of introducing
other weird bugs:
I'm not entirely convinced that it's a valid optimization in the
first place (probably is, but I'm seriously scared by the
complexity we already have there), and I'm really not fond of the
idea of dealing with whatever subtle crap we might discover with
Linus' patch. Again, dcache is not in a healthy shape right now;
at this point dumb and straightforward is, IMO, better than subtle
and risking to step on toes of very odd code out there...
Once we are done with code audit, sure, I'm fine with ->d_inode
being kept until dentry is actually freed. Any code that relies
on that thing being cleared is asking for trouble and should be
rewritten anyway. The only thing is, it needs to be found before
we rewrite it...
Linus didn't like Al's fix either; it threatened to force slow lookups when
negative dentries are involved.
The discussion of the patches went on at some length; in the process of
trying to find the safest way to fix this subtle bug the participants slowly
came to the realization that they did not actually know what was
happening. After looking at things closely, Linus threw up his hands and admitted he didn't
understand it:
So how could Hugh's NULL inode ever happen in the first place? Even
with the current sources? It all looks solid to me now that I look
at all the details.
As it happens, Linus's exposition was enough to point Hugh at the real
problem. Just as the process of transiting through a specific dentry is
almost complete, do_lookup() makes a call to
__follow_mount_rcu(), whose job is to redirect the lookup process
if it is passing through a mount point. The inode pointer is passed to
__follow_mount_rcu() separately; Hugh noticed that this function
was doing the following:
    *inode = path->dentry->d_inode;
In other words, the inode pointer is being re-fetched from the dentry
structure; this assignment happens regardless of whether the dentry
represents a mount point.
That is the true source of the
problem: if the dentry has been removed from the dcache after the lookup
process gained a reference, d_inode will be NULL. So
__follow_mount_rcu() will zero a pointer which had pointed to a
valid inode, causing later code to think that the file does not exist at
all.
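The sequence of events is easier to follow in a toy model. The sketch below is single-threaded Python, not kernel code: the racing removal is simulated by calling prune() at the critical moment, and every name in it is hypothetical apart from d_inode. The "guarded" variant shows one plausible shape for a fix (only re-read the pointer when a mount point is actually crossed); it is an assumption for illustration, not the patch that was applied.

```python
class Inode:
    def __init__(self, number):
        self.number = number


class Dentry:
    def __init__(self, inode, is_mountpoint=False):
        self.d_inode = inode
        self.is_mountpoint = is_mountpoint


def prune(dentry):
    """Models removal from the dcache, which clears d_inode."""
    dentry.d_inode = None


def follow_mount_refetch(dentry, inode):
    """Models the problematic behavior: always re-read d_inode."""
    return dentry.d_inode


def follow_mount_guarded(dentry, inode):
    """Hypothetical alternative: re-read only when crossing a mount point.

    Real code would switch to the mounted filesystem's root dentry here;
    that step is left out to keep the model small.
    """
    if dentry.is_mountpoint:
        return dentry.d_inode
    return inode


def walk(dentry, follow_mount, racing_prune=False):
    """One lookup step; returns the inode number or "ENOENT"."""
    inode = dentry.d_inode      # the lookup picks up a valid inode pointer
    if racing_prune:
        prune(dentry)           # "another CPU" removes the dentry right here
    inode = follow_mount(dentry, inode)
    return "ENOENT" if inode is None else inode.number


if __name__ == "__main__":
    print(walk(Dentry(Inode(7)), follow_mount_refetch, racing_prune=True))
    print(walk(Dentry(Inode(7)), follow_mount_guarded, racing_prune=True))
```

With the unconditional re-fetch, the walk reports a missing file even though it started with a perfectly good inode; with the guarded version the pointer obtained before the removal survives.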
Linus posted a fix for the real problem
along with his now-famous
Google+ posting saying that he was delaying the 3.0 release for a day
just in case:
We have a patch, we understand the problem, and it looks
ObviouslyCorrect(tm), but I don't think I want to release 3.0 just
a couple of hours after applying it.
Linus delayed the release despite the
inconvenient fact that it will push the 3.1 merge window into his
planned vacation. That was a well-placed bit of caution on his part: the
ObviouslyCorrect(tm) patch had YetAnotherSubtleBug(tm) in it. A fixed
version of the patch exists, and this particular bug should, at this point,
be history.
There is a sobering conclusion to be drawn from this episode, though. The
behavior of the dentry cache is, at this point, so subtle that even the
combined brainpower of developers like Linus, Al, and Hugh has a hard time
figuring out what is going on. These same developers are visibly nervous
about making changes in that part of the kernel. Our once approachable and
hackable kernel has, over time, become more complex and difficult to
understand. Much of that is unavoidable; the environment the kernel runs
in has, itself, become much more complex over the last 20 years. But if we
reach a point where almost nobody can understand, review, or fix some of
our core code, we may be headed for long-term trouble.
Meanwhile, we should be able to enjoy a 3.0 release (and a 2.6.39 update)
without mysteriously vanishing files. One potential short-term problem
remains, though: given that the next merge window will push into Linus's
vacation, there is a distinct chance that he might be more than usually
grumpy with maintainers who get their pull requests in late. Wise
subsystem maintainers may want to be ready to go when the merge window
opens.
By Jake Edge
July 20, 2011
The setuid() system call has always been something of a security
problem for Linux (and other Unix systems). It interacts oddly with
security and other kernel features (e.g. the unfortunately named "sendmail-capabilities
bug") and is often used incorrectly in programs. But, it is part of the
Unix legacy, and one that will be with us at least until the 2038 bug puts
Unix systems out of their misery. A recent patch from Vasiliy Kulikov arguably shows
these kinds of problems in action: weird interactions with resource limits
coupled with misuse of the setuid() call.
There is a fair amount of history behind the problem that Kulikov is trying
to solve. Back in 2003, programs that used setuid() to switch to
a non-root user could be used to evade the limit on the number of processes
that an administrator had established for that user
(i.e. RLIMIT_NPROC). But that was fixed with a patch from Neil Brown that
would cause the setuid() call to fail if the new user was at or above
their process limit.
Unfortunately, many programs do not check the return value from calls to
setuid() that are meant to reduce their privileges. That, in
fact, was exactly the hole that sendmail fell into when Linux capabilities
were introduced, as it did not check to see that the change to a new UID
actually succeeded. Buggy programs that don't check that return
value can cause fairly serious security problems because they assume their
actions are limited by the reduced privileges of the
switched-to user, but
are actually
still operating with the increased privileges (often root) that they
started with. In effect, the 2003 change made it easier for attackers to
cause setuid() to fail when RLIMIT_NPROC was being used.
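A careful privilege drop treats any failure as fatal and then confirms that the switch really happened. The sketch below shows that discipline in Python, where a failed os.setuid() raises OSError rather than returning -1 as the C call does, so the failure is harder to ignore but still has to be handled. The function name, the exception class, and the injectable setuid/getuid/geteuid parameters are all hypothetical; the parameters exist only so that the failure path can be exercised without root. Real code would also need to deal with group IDs, which is omitted here.

```python
import os


class PrivilegeDropError(RuntimeError):
    """Raised when the process could not be confirmed as unprivileged."""


def drop_privileges(uid, setuid=os.setuid, getuid=os.getuid, geteuid=os.geteuid):
    """Switch to uid and refuse to continue unless the switch took effect."""
    try:
        setuid(uid)
    except OSError as exc:
        # Never carry on with the old (possibly root) identity.
        raise PrivilegeDropError(f"setuid({uid}) failed: {exc}") from exc
    # Belt and braces: verify the result instead of assuming it.
    if getuid() != uid or geteuid() != uid:
        raise PrivilegeDropError(
            f"still running as uid {getuid()} (effective {geteuid()})"
        )


if __name__ == "__main__":
    # Switching to the UID we already have is a harmless way to run this.
    drop_privileges(os.getuid())
    print("running as", os.getuid())
```

A caller that lets PrivilegeDropError propagate simply exits instead of continuing to run with the privileges it started with.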
Kulikov described the problem back in June,
noting that it was not a bug in Linux, but allowed buggy privileged
programs to wreak havoc:
I don't consider checking RLIMIT_NPROC in
setuid() as a bug (a lack of syscalls return code checking is a real
bug), but as a pouring oil on the flames of programs doing poorly
written privilege dropping. I believe the situation may be improved by
relatively small ABI changes that shouldn't be visible to normal
programs.
In the posting, he suggested two possible solutions to the problem. The
first is to
move the check against RLIMIT_NPROC from set_user()
(a setuid() helper function) to execve() as most programs
will check the status of that call (and can't really cause
any harm if they don't). The other suggestion is one that was proposed by Alexander
Peslyak (aka Solar Designer) in 2006 to cause a failed setuid()
call to send a SIGSEGV to the process,
which would presumably terminate those misbehaving programs.
The first solution is not complete because it would still allow users
to violate their process limit by using programs that do a
setuid() that is not followed by an execve(), but that is a
sufficiently rare case that it isn't considered to be a serious problem.
Peslyak's solution was seen as too big of a hammer when it was proposed,
especially for programs that do check the status of
setuid(), and might have proper error handling for that case.
There were no responses to his initial posting, but when he brought it back
up on July 6, he was pleasantly surprised
to get a positive response from Linus Torvalds:
My reaction is: "let's just remove the crazy check from set_user()
entirely". If somebody has credentials to change users, they damn well
have credentials to override the RLIMIT_NPROC too, and as you say,
failure is likely a bigger security threat than success.
The whole point of RLIMIT_NPROC is to avoid fork-bombs. If we go over
the limit for some other reason that is controlled by the super-user,
who cares?
That led to the patch, which changed do_execve_common() to return
an error (EAGAIN) if the user was over their process limit and
removed the check from set_user(). The patch was generally
well-received,
though several commenters were not convinced that it should go into the -rc
for 3.0 as Torvalds had suggested. In fact, as Brown dug into the patch, he
saw a problem that might need addressing:
Note that there is room for a race that could have unintended consequences.
Between the 'setuid(ordinary-user)' and a subsequent 'exit()' after execve()
has failed, any other process owned by the same user (and we know where are
quite a few) would fail an execve() where it really should not.
Basically, the problem is that switching the process to a new user could
now exceed the process limit, but that limit wouldn't actually be enforced
until an execve() was done (the failure of which would presumably
cause the process to exit). In the interim, any execve() from
another of the user's processes would fail. It's not clear how big of a
problem that is,
though it could certainly lead to unexpected behavior. Brown offered up
a patch that would address the problem by
adding a process flag (PF_NPROC_EXCEEDED) that would be set
if a setuid() caused the process to exceed RLIMIT_NPROC
and would then be checked in do_execve_common(). Thus, only the
execve() in the offending process would fail.
Kulikov and Peslyak liked the approach, though Peslyak was not convinced it
added any real advantages over Kulikov's original patch. He also pointed out that there could be an
indeterminate amount of time between the setuid() and
execve(), so the RLIMIT_NPROC test should be repeated when
execve() is called: "It would be surprising to see a process
fail on execve() because of RLIMIT_NPROC when that limit had been
reached, say, days ago and is no longer reached at the time of
execve()."
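The interaction between the two proposals is easier to follow with a toy model. The sketch below is a user-space simulation, not kernel code: the User and Task classes and the simplified counting are assumptions made for illustration, and nproc_exceeded merely stands in for the proposed PF_NPROC_EXCEEDED flag. It combines Brown's per-process flag with the re-check at execve() time that Peslyak asked for.

```python
import errno
from dataclasses import dataclass


@dataclass
class User:
    nproc: int   # processes currently owned by this user
    limit: int   # the user's RLIMIT_NPROC


@dataclass
class Task:
    user: User
    nproc_exceeded: bool = False  # stands in for PF_NPROC_EXCEEDED


def set_user(task, new_user):
    """Model of the setuid() helper: always succeeds, but records
    whether the switch pushed the new user over the process limit."""
    task.user.nproc -= 1
    new_user.nproc += 1
    task.user = new_user
    task.nproc_exceeded = new_user.nproc > new_user.limit
    return 0


def do_execve(task):
    """Model of the execve() check: fail only in the task that crossed
    the limit, and only if the limit is still exceeded right now."""
    if task.nproc_exceeded and task.user.nproc > task.user.limit:
        return -errno.EAGAIN
    task.nproc_exceeded = False
    return 0
```

In this model, a daemon that switches to a user already at the limit gets EAGAIN from its own do_execve(), while that user's other processes can still exec; if the user's process count drops before the daemon calls do_execve(), the call succeeds.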
So far, Brown has not respun the patch to add that test. There is also the
question of whether the problem that Brown is concerned about needs to be
addressed at all, and whether it is worth using up another process flag
bit (there are currently only three left) to do so. In the end, some kind
of fix is likely to go in for 3.1 given Torvalds's interest in seeing this
problem with buggy programs disarmed. It's unclear which approach will win
out, but either way, setuid() will not fail due to exceeding the
allowable number of processes.
As Kulikov and others noted, it is definitely not a bug in the
kernel that is being fixed here. But, it is a common enough error in
user-space programs—often with dire consequences—which makes it
worthwhile to fix as a pro-active security
measure. Peslyak listed several recent
security problems that arose from programs that do not check the return
value from setuid(). He also noted that the problem is not
limited to setuid-root programs, as other programs that try to switch to a
lesser—differently—privileged user can also cause
problems when using setuid() incorrectly.
The impact of this fix is quite small, and badly written user-space
programs—even those meant to run with privileges—abound, which
makes this change more palatable than some other pro-active fixes. As we
have seen before, setuid() is subtle and quick to anger; it can
have surprising interactions with other
seemingly straightforward security measures. Closing a hole with
setuid(), even if the problem lives in user space, will definitely
improve overall Linux security.
By Jonathan Corbet
July 19, 2011
There are numerous use cases for a checkpoint/restart capability in the
kernel, but the highest level of interest continues to come from the
containers area. There is clear value in being able to save the complete
state of a container to a disk file and restarting that container's
execution at some future time, possibly on a different machine. The
kernel-based checkpoint/restart patch has been discussed here a number of
times, including
a report from last year's
Kernel Summit and
a followup published
shortly thereafter. In the end, the developers of this patch do not seem
to have been able to convince the kernel community that the complexity of
the patch is manageable and that the feature is worth merging.
As a result, there has been relatively little news from the
checkpoint/restart community in recent months. That has changed, though,
with the posting of a new patch by Pavel
Emelyanov. Previous patches have implemented the entire checkpoint/restart
process in the kernel, with the result that the patches added a lot of
seemingly fragile (though the developers dispute that assessment) code into
the kernel. Pavel's approach, instead, is focused on simplicity and doing
as much as possible in user space.
Pavel notes in the patch introduction that almost all of the information
needed to checkpoint a simple process tree can already be found in
/proc; he just needs to augment that information a bit. So his
patch set adds some relevant information there:
- There is a new /proc/pid/mfd directory containing
information about files mapped into the process's address space. Each
virtual memory area is represented by a symbolic link whose name is
the area's starting virtual
address and whose target is the mapped file. The bulk of this
information already exists in /proc/pid/maps, but the
mfd directory collects it in a useful format and makes it
possible for a checkpoint program to be sure it can open the exact
same file that the process has mapped.
- /proc/pid/status is enhanced with a line listing all of the
process's children. Again, that is information which could be
obtained in other ways, but having it in one spot makes life easier.
- The big change is the addition of a /proc/pid/dump
file. A process reading this file will obtain the information about
the process which is not otherwise available: primarily the contents
of the CPU registers and its anonymous memory.
The
dump file has an interesting format: it looks like a new
binary executable format to the kernel. Another patch in Pavel's series
implements the necessary logic to execute a "program" represented in that
format; it restores the register and memory contents, then resumes
executing where the process was before it was checkpointed. This approach
eliminates the need to add any sort of special system call to restart a
process.
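As noted above, most of what the proposed mfd directory would expose is already present in /proc/pid/maps. As a rough sketch of what a user-space checkpoint tool could extract today, the following parses the maps format and returns, for each file-backed mapping, the starting address and the mapped path. The function names are hypothetical and this is not taken from Pavel's tools; it also cannot offer the guarantee the mfd links are meant to provide, since a path in maps may no longer refer to the same file.

```python
def parse_maps_line(line):
    """Parse one line of /proc/<pid>/maps.

    Returns (start, end, perms, offset, path) for file-backed mappings,
    or None for anonymous areas and pseudo-entries such as [heap].
    """
    fields = line.strip().split(None, 5)
    if len(fields) < 6:
        return None  # no pathname column: anonymous memory
    addr_range, perms, offset, _dev, _inode, path = fields
    if not path.startswith("/"):
        return None  # [heap], [stack], [vdso], and similar
    start, end = (int(part, 16) for part in addr_range.split("-"))
    return start, end, perms, int(offset, 16), path


def file_mappings(pid="self"):
    """Map each file-backed area's starting address to its path."""
    result = {}
    with open(f"/proc/{pid}/maps") as maps:
        for line in maps:
            parsed = parse_maps_line(line)
            if parsed is not None:
                start, _end, _perms, _offset, path = parsed
                result[start] = path
    return result
```

Calling file_mappings() with no argument inspects the calling process; reading another process's maps file is subject to the usual /proc permission checks.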
There is need for one other bit of support, though: checkpointed processes
may become very confused if they are restarted with a different process ID
than they had before. Various enhancements to (or replacements for) the
clone() system call have been proposed to deal with this problem
in the past. Pavel's answer is a new flag to clone(), called
CLONE_CHILD_USEPID, which allows the parent process to request
that a specific PID be used.
With this much support, Pavel is able to create a set of tools which can
checkpoint and restart simple trees of processes. There are numerous
things which are not handled; the list would include network connections,
SYSV IPC, security contexts, and more. Presumably, if this patch set looks
like it can be merged into the mainline, support for other types of objects
can be added. Whether adding that support would cause the size and
complexity of the patch to grow to the point where it rivals its
predecessors remains to be seen.
Thus far, there has been little discussion of this patch set. The fact
that it was posted to the containers list - not the largest or most active
list in our community - will have something to do with that. The few
comments which have been posted have been positive, though. If this patch
is to go forward, it will need to be sent to a larger list where a wider
group of developers will have the opportunity to review it. Then we'll be
able to restart the whole discussion for real - and maybe actually get a
solution into the kernel this time.
Page editor: Jonathan Corbet
Distributions
If all goes well, Debian might be adding another port for 7.0 ("Wheezy"): GNU/Hurd. According to the most recent Hurd news, there's now "a real plan to release a Hurd variant" of Debian with Wheezy. It's not definite, but the Hurd team is buckling down and trying to get Hurd into shape.
Samuel Thibault has created Debian GNU/Hurd CDs with a GUI installer for those who are interested. There's also work going on in the form of a Google Summer of Code (GSoC) project by Jérémie Koenig to improve Java on Hurd by porting OpenJDK, and create low-level Java bindings, which is also helping to improve Hurd for other applications that are being ported to it.
One might wonder why there's a renewed interest in Hurd. Linux
has now been around for nearly 20 years — an impressive run in terms
of longevity, especially considering all of the success that Linux has
enjoyed. But Linux might not have happened at all if GNU Hurd had been ready when Linus Torvalds started working on Linux. Despite the fact that Hurd predates Linux, the official kernel for GNU has never really been widely used or enjoyed more than a small fraction of the developer interest of Linux.
But in a world where even Duke Nukem Forever is finally completed, it makes sense to keep trying with Hurd, too — if for radically different reasons. Why are people still working on Hurd? I emailed Koenig and the Hurd maintainers list to find out what their motivation is.
In a response from the maintainers (Thibault, Arne Babenhauserheide, and Thomas Schwinge), the first reason given is that Hurd provides better support for GNU's Freedom 0 (The freedom to run the program for any purpose) "by giving users and programs as much control over their computing environment as possible." According to the maintainers, Hurd gives users more control over their system thanks to the translator architecture.
Understanding GNU Hurd and translators
GNU Hurd is a set of servers (translators) that run on top of the Mach
microkernel. Mach handles fewer things than the Linux kernel — it
mostly deals with memory management, inter-process communication, and task switching. In short, say the maintainers, "GNU Mach implements what is necessary to delegate all the rest to specialized and separated user-space translators." Mach also includes device drivers taken from Linux, but most are from the Linux 2.0 series. The maintainers say that a new driver framework is being developed that will allow embedding current drivers from Linux into user-space processes. Once complete, they say that this will reduce GNU Mach to providing only "basic hardware access abstractions, like forwarding IRQ events to the corresponding user-space processes."
Why are they called translators? Because they "translate" from one
representation of data into another — such as blocks stored on a hard
disk into a directory and file hierarchy. The maintainers say that Hurd
translators are similar to Linux's FUSE (filesystem in user space),
but they do not go through the kernel and have broader applications:
It thus looks more like gvfs, except that it's transparent for *all* applications in the system that are using the glibc/POSIX interfaces, including mere shells of course.
And this not only applies to file access, the same basic architecture also allows for setting up a VPN connection, for example, and make user-space applications use it without the need for administration rights (for altering the system's routing table, for example), or things like libsocks, since it's already meant to be pluggable from the glibc itself.
For some users the modularity of GNU Hurd may be a major advantage. The maintainers say that Hurd's translators provide more powerful features "that users have actually always wanted but never dared ask their administrator." The concepts of Hurd translators are interesting, but right now the list of translators is fairly small and there's little functionality that one wouldn't be able to get on Linux.
But functionality isn't everything. There's also the appeal of the
challenge of Hurd and learning in the process. The maintainers say that
learning and solving problems to get Hurd up and running "are indeed
a big part of the current maintainers' motivation." For example,
they say it has required getting involved with glibc "which proved to be useful to them in other contexts." Schwinge says that working on Hurd and "exposure and investigation of glibc," helped him to be accepted into CodeSourcery's apprentice program and to land a part-time job with the company while finishing his university studies. Thibault says that work with Hurd helped with his research and to land a position with XenSource.
The maintainers say that it's also fun to hack Hurd. "Debugging the TCP/IP stack can be done simply by running GDB on the pfinet translator (which handles the standard socket interface, and translates it to/from Ethernet frames)." They say it's easier to work on a translator for GNU Hurd than working inside the Linux kernel. "A translator can actually contain a complete TCP/IP stack implementation, and just be used as a container to debug and profile it."
Finally, for developers who want to have a direct impact, Hurd offers opportunities that Linux doesn't. "There are still some interesting tasks to achieve in the GNU Hurd, while contributing anything but drivers to the Linux kernel has become extremely challenging."
Missing pieces
Potential aside, GNU/Hurd is still not quite ready for widespread
deployment. However, the maintainers say that it's
reached the point of being "stable enough to be used
day-to-day." Now they say it's time to improve application
availability and support, and to provide Hurd to more users and developers by making it a full port of Debian.
There's still much to be done before Hurd is even up to par with the
Debian GNU/kFreeBSD port, much less Linux. So far, more than half of the
packages in the Debian archive (68%) have compiled successfully on Hurd, but many do not. Java, as already noted, is a problem. However, Hurd also fails to build or has problems with git, GNU Emacs, Screen, Wine, PulseAudio, and many others.
Koenig's work, if successful, will not only provide Hurd with a Java
implementation, but also some system interfaces that are required by
OpenJDK but not available yet in Hurd. Koenig says that he's working on
bringing Hurd's signal delivery code (in glibc) up to par with respect to
POSIX threads. He also says that he's adding features needed for Unix
sockets and other features needed by OpenJDK that are not present in Hurd. "OpenJDK is not very friendly from a porter's point of view; there are many places where the code assumes we're either on Solaris, Linux or Windows."
There's plenty of room for contributors, too. Koenig says that the Hurd team consists of "about a dozen people with various degrees of involvement," and the maintainers say that they're happy to see new developers. Instructions on getting started are included on the Hurd wiki, and there is even an option of getting shell access to a system running GNU/Hurd already.
Note that installing and running Debian GNU/Hurd at this time is for those who are not only adventurous, but also have a great deal of time to kill. Thibault has provided an image with the Debian GUI installer that will install a Debian GNU/Hurd distribution, but it's very slow and is unlikely to work on a large swath of real hardware. The image still uses Linux 2.0-era network drivers, so it's not likely to work on many newer machines. That is being worked on, however. The Hurd folks are using the Device Driver Environment (DDE) work to reuse current Linux drivers on Hurd. DDE is a universal interface for drivers that is meant to allow using drivers for other systems (like Linux) on systems without a lot of native drivers.
Instructions are provided for booting it under KVM, but running the install under KVM takes hours even on a relatively fast system. It took about five hours to run through the installation once, with a minimal package set, on a Core i7 system with 1GB of RAM given over to the virtual machine running Hurd. According to Koenig, this is due to a KVM bug that causes a performance issue for Hurd on recent kernels. Thibault says that he's building an image that will have DMA enabled to work around this KVM PIO performance issue.
Once installed, it can be fairly slow, and users may run into problems, though some are possibly due to the distribution being based on Debian Sid rather than being caused by Hurd. For example, I ran into a number of problems trying to install packages, but it was unclear whether the problems lie with Hurd or with Sid.
Is GNU/Hurd going to challenge Linux anytime soon? It doesn't seem likely. But the effort to provide a full-blown port for Debian 7.0 seems to be going well so far. In the process, the folks working on Hurd may continue to uncover bugs and improve software that's common to both systems, and provide an interesting alternative for developers who want to dig deeper into system development.
Brief items
There is no precedent for requiring Debian packages to avoid trademark
infringement as a condition of inclusion in the archive. I am very much
opposed to anything that would require Debian to remove potentially
trademark infringing logos from packages "until we have agreement with the
trademark owners". This is entirely the wrong way around - we should always
assume that our use is permitted wrt trademark law unless either a) a court
ruling determines otherwise, or b) we decide it's not in our interest to
fight a lawsuit over the matter and as a project decide to stop using the
mark. In no event should the ftpmasters be preemptively deciding that such
works should be excluded from the archive pending an agreement unless so
directed by Debian's counsel in the course of litigation.
--
Steve Langasek
Well, while we're putting stakes in the ground, I suppose I'll hammer mine
in there as well. I completely disagree to the point that I would take
that to a GR.
--
Russ Allbery (on systemd in Debian)
If we lag behind in features that are good for GNU/Linux users (who are
the vast majority of our users) just because users of some ports can't
have them, we might force users to choose other distros, renouncing to
some of the unique features that Debian has to offer (freedom, quality,
open development, etc.). This of course goes both way: we should not
hold back non-Linux features on non-Linux kernels because the Linux
kernel lack them. Adopting that as a general principle would mean
offering, overall, the intersection of features available in all our
ports, something which is doomed to reduce with the growth of the number
of ports.
--
Stefano Zacchiroli
I'm not denying that change has no cost. But upkeep has a cost as
well. Just because my father-in-law can keep his 40+ year old snow
blower operationally tweaking it doesn't mean its the most valuable
use of his time compared to buying a newer one and learning how to
maintain the newer design. Then again his stated mission in life
isn't to innovate snow blower design and be on the cutting edge of
snow removal.
--
Jeff Spaleta (on systemd in Fedora)
The next Ubuntu Developer Summit (UDS) will be held October 31-November 4,
2011 in Orlando, Florida. UDS is where Ubuntu developers share ideas and
plans for the next version of Ubuntu (in this case v12.04). There are a
limited number of sponsorships available, and applications are open until
August 24.
The LTIB project is an RPM-based build system that can be used to develop
and deploy BSPs (Board Support Packages) for a number of embedded target
platforms, including PowerPC, ARM, and Coldfire. The project is hosted at
savannah.org. (Thanks to Rick Foos)
Remember Plan 9, the distributed research operating system developed at
Bell Labs? 9Front is a community driven fork of Plan 9. The project has a
home page at Google code. Some new features include a newer (more reliable)
boot loader, an improved cwfs as the default file system, support for the
Go programming language, and much more. (Thanks to Stein Gjoen)
Distribution News
Mandriva Linux
The Mandriva board has decided to extend full support for the Mandriva
2010.1 and Mandriva 2010.2 distributions for an additional 6 months.
Other distributions
The Mageia community has been discussing the project's release cycle. "The
outcome of our discussions: the release cycle for Mageia will be 9 months.
We think it's a well-balanced choice, providing an up-to-date distribution
that's also stable. It should also give us enough time to build the
specifications, develop, package, innovate and finalize it. Each Mageia
release will be supported for 18 months." The project is also thinking
about a Long Term Support release every 18 months that will be supported
for 3 years.
Newsletters and articles of interest
The ninth installment in the "systemd for administrators" series is
actually a lengthy discussion on why Lennart thinks that distributions
should get rid of the /etc/sysconfig or /etc/default directories. "They are
inherently distribution-specific. With systemd we hope to encourage
standardization between distributions. Part of this is that we want that
unit files are supplied with upstream, and not just added by the packager
-- how it has usually been done in the SysV world. Since the location of
the directory and the available variables in the files is very different on
each distribution, supporting /etc/sysconfig files in upstream unit files
is not feasible. Configuration stored in these files works against
de-balkanization of the Linux platform."
Michael Reed takes a look at aptosid with KDE4. "[aptosid] works well, and
subjectively, it feels a bit faster than Kubuntu 11.04. Thanks to extensive
documentation on the website, the low level customizations and the freaky
artwork, this is more than a mere respin of Debian. The adherence to
Debian's take on what constitutes truly free software will be either an
attractive feature or something that has to be worked around, depending on
your belief system."
Page editor: Rebecca Sobol
Development
July 15, 2011
This article was contributed by Nathan Willis
DigiKam has always stood out among
the Linux photography tools because it incorporates the features of what
are often two separate tools: the photo manager and the raw image editor.
When the KIPI plug-in API is
added into that list, the feature set can grow quite large. DigiKam announced the first release candidate for version 2.0.0 recently and, as one might expect, a host of new features dominates the new builds.
DigiKam is a KDE program, but apart from dependencies on core KDE and Qt libraries, operates in stand-alone mode, so is perfectly usable in GNOME or other desktop environments. For example, the application's photo collection management features take center stage: it manages large image collections and a wealth of metadata about each entry. But it maintains this information in its own application-specific database (user-configurable for either MySQL or SQLite, and with a built-in migration tool lest one decide to change), rather than relying on Tracker, Zeitgeist, or another external indexer. This approach also allows digiKam to keep an eye on multiple discrete directory locations, rather than requiring you to move your library to a central location (or copying it to a separate location for you) as many of its competitors do.
The database-backed collections framework gives digiKam powerful search
capabilities. It is aware of IPTC,
EXIF,
XMP,
and Makernote
metadata tags, plus user-defined tags, labels, and ratings, geolocation
information, filesystem data (file size, modification time, etc.), and
more. The search tools enable you to drill down into large collections
with compound queries. DigiKam even allows you to create multiple
"metadata templates" that are pre-filled with frequently used information.
Since 21st Century Laziness is often the primary reason users do not make
use of the metadata formats and ontologies available to them, this helps
keep the collection organized.
Much of digiKam's functionality is implemented through KIPI
plug-ins. The KIPI API is shared with other KDE-based image programs,
such as Gwenview and KPhotoAlbum. The official digiKam packages use the plug-ins to implement
many of the export and display functions, plus auxiliary functions such
as DNG
conversion. Often, new functionality is first implemented as a plug-in,
such as support for editing a new type of metadata.
The 2.0.0 release candidate code can be downloaded from the project's SourceForge page. There, a source code bundle and a 32-bit Windows installer are available. Linux users wishing to test binary packages will need to find a distribution-specific build provided by a downstream maintainer — digiKam maintains a list of known packages, but does not currently release its own. FreeBSD and Mac OS X builds are also available from third parties.
Image organization
On the image management side of the application, there are a half-dozen or so new features in this release, several of which are the result of Google Summer of Code projects integrated into the main code base during a sprint this spring. The first is XMP sidecar support. XMP sidecars are metadata files that are associated with image formats that cannot store metadata internally. The sidecar files typically retain the base of the original filename, but use the .xmp extension.
As mentioned earlier, digiKam supports its own local metadata, such as user-assigned tags and ratings. The 2.0.0 series adds a pair of new label types: "color labels" and "pick labels." The pick labels appear as red, yellow, and green flags, and their meaning is described in tooltips as "rejected," "pending," and "accepted," respectively. Color labels are visible as a colored highlight around the thumbnail in the image browser. The colors available include the six basic primary and secondary colors, plus black, white, and gray, and there is no pre-defined semantic meaning assigned to any of them.
Considering that digiKam already gives users a wealth of other ways to sort and mark up collections (tags, star ratings, albums), it might seem odd to add more. But I think it is helpful to have multiple, orthogonal ways to mark up a collection, simply to sift through it on multiple factors — particularly when the sorting process may involve transitory issues not suitable for the assignment of a persistent tag. Consider trying to find the "best" image to accompany a particular blog post. Star ratings might reflect overall picture quality, which would leave the color labels open to use in some other part of the decision (such as illustrating different parts of the story). Thus sorted, the picks might come in handy for another user (e.g., an editor) to select among the alternatives. Attempting to do the same thing with star ratings alone or with tags would get confusing.
In addition to the new sorting dimensions, 2.0.0-RC introduces keyboard shortcuts for assigning common tags, and it allows the user to select and "group" images in the thumbnail browser. Groups of images seem to operate much like a multi-item selection, in the sense that the user can apply changes to the entire group simultaneously, but they do not disappear with a stray mouse click. The tag-assigning keyboard shortcuts are entirely user-configurable, provided that one does not choose a key combination also captured by the window manager or another system component.
The so-called "reverse geocoding" feature is also new. This allows the user to look up human-readable place names to associate with latitude and longitude coordinates typically assigned automatically by GPS tagging software. The upshot is simply metadata that is easier to browse and easier to search.
Technical and editing changes
Sorting is not the only area of improvement in this release, however. Several new technical features make their debut as well, starting with face recognition (yes, the facial recognition data can be searched on, but it constitutes a substantially new feature in its own right). Users can add "face tags" in two ways: either by drawing rectangles on faces in individual images, or by allowing digiKam to scan the entire image collection and automatically mark what it determines to be faces.
At the moment, the documentation of the feature is scant, but the workflow seems to involve marking as many faces as you can stand to manually, adding a name for each. The names are converted to "People tags" in the general tag database. Upon a blind scan-and-identify run, digiKam will compare the unknown faces to the already tagged-and-labeled specimens. Obviously, the higher the percentage of your suspects you tag, the more easily digiKam will recognize them in the future.
In the image editing arena, this release of digiKam uses an updated version of the LibRaw library (0.13.5), which adds a few noteworthy features of its own. LibRaw began as an attempt to massage Dave Coffin's dcraw utility into an API-stable shared library usable by other applications. This release, however, also imports several new advanced raw decoding options originally found in the RawTherapee application. Owners of Sigma DSLRs will also be happy to learn that the more recent version of LibRaw includes support for their cameras' Foveon sensors. The Foveon uses a three-layer light sensor that captures RGB data at the same grid location, as opposed to the matrix of single-color detectors found in most other cameras. As a result, entirely different decoding mechanisms are required. Canon is also reportedly working on a 3-layer sensor, so LibRaw and digiKam support for the decoding algorithms is important news.
DigiKam has also added support for file versioning in the editor component. As with all raw photo editors, the editing process is non-destructive to the original image, but most applications do not easily allow the user to save multiple versions of the "edit list" file. digiKam's editor component allows you to view the version history as a flat list (similar to the history pane in GIMP), or as a tree that preserves individual branches created when you roll back and make different edits.
DigiKam's editing capabilities lie somewhere in between the color- and
exposure-adjustment-only functions found in a typical raw converter and
those of a full-blown raster editor. There are a few filter effects and
simple touch-up tools (such as a red-eye corrector), but it also allows
you to open any image in an external editor application from the
right-click context menu.
Finally, digiKam has always supported easy export of images to devices and other applications, and this release adds two: the Czech web service RajCe, and MediaWiki. The MediaWiki exporter is compatible with Wikimedia properties (including Wikipedia) that require authenticated user accounts to upload content.
Focus
By and large, the new additions to digiKam are welcome. Most, such as
pick and color tags, keyboard shortcuts, or reverse geocoding, are designed
to make searching and managing your images a simpler and more intuitive
task. A few of the new features, however, I still find difficult to use.
The face recognition process, for example, is awkward. Drawing rectangles over people's faces is simple enough, but the pop-up window that appears once you do so is unhelpful: it pre-fills the top line with "Unknown," which you might expect to leave the newly-marked face in a blank state, but instead creates a People Tag named "Unknown." The other two buttons on the pop-up window are "Confirm" and "Remove" — but Confirm appears only to remove the unknown face from the set of faces to be scanned by the recognition software. Add to that the fact that by default all of the face tags are invisible to the eye, and you have a confusing user experience. Perhaps the documentation will improve on the situation.
Speaking of awkwardness, I have always disliked vertical side-tabs in
GUIs, in any application. At best they are difficult-to-read labels, and
at worst they make it unclear which portions of the UI belong to the "tab"
and which do not. That problem goes double for interfaces that feature a
set of vertical tabs on the left hand side and a separate set on the
right.
For horizontally-written
languages, vertical tabs give you sideways text labels (running in two
different directions depending on whether they are stacked on the left-
or right-hand edge), and applications that use them invariably also use
normal horizontal menus and toolbars across the top of the window,
introducing ambiguity as to which edge of the pane controls its contents.
digiKam inflicts this on you, plus it provides no text labels for the un-selected tabs, forcing you to hover the cursor over them to discern the meaning of the cruelly-tiny icons. I suppose the only good thing about this UI design is that it is a clear sign that the application is filled to the brim with features. Still, I wouldn't shed any tears if it went away.
Apart from the interface woes, though, I found the digiKam 2.0.0 release
candidate remarkably stable and fast. As always, it scores points for
managing sizable collections of images and for providing a myriad of ways
to arrange and edit content as the situation dictates. The final release
of 2.0.0 is slated for "late July," so it should be a short wait for what
looks to be a great update.
Comments (4 posted)
Brief items
But what brings us here today is a gentle reminder that when you
write code this bad, you can actually kill people.
I'm leaving the DB-dump images in the following quote as a reminder
of just how insane this code was. Think of these as skulls on
sticks at the edge of the wasteland, saying "Never pass this way
again".
--
Jamie Zawinski
It annoys me to no end that I even feel I need to write these down, but
apparently I do. So, here are a couple of simple rules to guide your
behaviour around here:
- Do not call other developers nor users idiots nor other derogatory
terms on the mailing list.
- Do not use Twitter or other public broadcasting systems to call
other developers or users idiots or other derogatory terms.
That's all, pretty simple actually. Most well-adjusted people would not
stand up in a crowd of people and start calling people around them
idiots. Just because there is a monitor and a network cable separating
you from the crowd doesn't make it ok, and I am tired of it.
--
Rasmus Lerdorf
People *state* that it would be good to have more Parrot
developers on Windows, but they really would like those developers
to be *somebody else*...
Our mantra: "Parrot is a virtual machine aimed at all dynamic
languages." The reality: "Parrot is a virtual machine aimed at
all dynamic languages, provided you're on Linux."
--
James E Keenan
Reality showed us that the balance of maintaining the code of other
OSs in the main repository is much more work than the work the few
lines of useful code the few people of the other OSs contribute. We
tried with many projects in the past, and decided against it. The
needed abstractions are just hard to manage and get into our way
all the time.
The only really thinkable solution for the niche OSs is to port the
needed Linux interfaces to their kernels. But I guess that will
never happen, and so systemd, udev, ... will probably never happen
for them.
--
Kay Sievers
Comments (4 posted)
Version 3.6.0 of the Parrot multi-language virtual machine is available.
This release cleans up some code and fixes bugs.
Full Story (comments: 2)
IBM's Rob Weir, noting that "we at IBM have not been exemplary
community members when it came to OpenOffice.org," has announced
that IBM will be contributing its "Symphony" OpenOffice.org fork to the new
Apache-based OpenOffice project. "First, we're going to contribute
the standalone version of Lotus Symphony to the Apache OpenOffice.org
project, under the Apache 2.0 license. We'll also work with project
members to prioritize which pieces make sense to integrate into OpenOffice.
For example, we've already done a lot of work with replacing GPL/LPGL
dependencies. Using the Symphony code could help accelerate that work and
get us to an AOOo release faster."
Full Story (comments: 36)
The Freedom to Tinker site carries an
announcement for Telex, a new approach to the circumvention of
censorship of the net by national governments. "As the connection
travels over the Internet en route to the non-blacklisted site, it passes
through routers at various ISPs in the core of the network. We envision
that some of these ISPs would deploy equipment we call Telex
stations. These devices hold a private key that lets them recognize tagged
connections from Telex clients and decrypt these HTTPS connections. The
stations then divert the connections to anti-censorship services, such as
proxy servers or Tor entry points, which clients can use to access blocked
sites. This creates an encrypted tunnel between the Telex user and Telex
station at the ISP, redirecting connections to any site on the
Internet." There is a proof-of-concept implementation available on
the Telex site.
Comments (23 posted)
From hacks.mozilla.org comes Tilt,
a Firefox extension which creates a 3D display of a page's document object
model. "Unlike other developer tools or inspectors, Tilt allows for
instant analysis of the relationship between various parts of a webpage in
a graphical way, but also making it easy for someone to see obscured or
out-of-page elements. Moreover, besides the 3D stacks, various information
is available on request, regarding each node's type, id, class, or other
attributes if available, providing a way to inspect (and edit) the inner
HTML and other properties." There's a video available for those who
want to see the eye candy in action without actually installing the
extension.
Comments (9 posted)
Version 4.1 of the VirtualBox virtualization system from Oracle is out.
New features include a virtual machine cloning mechanism, support for
guests with up to 1TB of RAM, better remote access, and more; see
the press release and the changelog for more information.
Full Story (comments: 1)
Newsletters and articles
Christopher Blizzard describes
the goals for - and motivations behind - the move toward multi-process
Firefox. "Physical pages of memory are allocated at the
operating system layer and handed to user processes, at the process level,
as virtual pages. The best way to return those to the operating system is
to exit the process. It's a pretty high-level granularity for recycling
memory, for very long-running browser sessions it's the only way to get
predictable memory behaviour. This is why content processes offer a better
model for returning memory to the operating system over time."
Comments (24 posted)
Gervase Markham has put up a
brief posting on Mozilla's advantages as he sees them, and on how those
advantages could be better used. "I'm sure a large proportion of
Chrome users are ex-Firefox users. In the time before Chrome, when we were
the new shiny, we missed an opportunity to educate them about why Mozilla
is different, why the open web is important, and why having the coolest,
fastest, slickest browser around is a great thing but it's not the most
important thing. So when something they perceived as cooler, faster and
slicker came long, they left us for precisely the same reason they
arrived. We didn't tell them why they should stay."
Comments (36 posted)
Linux Journal examines using the
Maxima computer algebra system for doing calculus. "Putting all these techniques together, you can solve a differential equation for a given variable—for example, solve dy/dx = f(x) for y. You can do this by doing all the required algebra and calculus, but you don't really need to. Maxima has the very powerful function, ode2, which can do it in one step."
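As a small worked instance of the equation form named in the quote, direct integration gives the general solution. The choice f(x) = 2x is an illustrative assumption, not something taken from the Linux Journal article:

```latex
\frac{dy}{dx} = f(x)
\quad\Longrightarrow\quad
y(x) = \int f(x)\,dx + C,
\qquad\text{e.g. } f(x) = 2x \;\Rightarrow\; y(x) = x^{2} + C .
```

In Maxima, a call along the lines of ode2('diff(y,x) = 2*x, y, x) should produce the same family of solutions, with the integration constant written as %c.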
Comments (4 posted)
Nathan Willis looks
at some tools for capturing full-motion video and audio of your
desktop. "The leading screen recorders at present are recordMyDesktop
and Istanbul. The feature sets are roughly comparable: both record to Theora-encoded video with Vorbis audio, both allow you to select just the portion of the screen you are interested in recording, and work in multiple desktop environments."
Comments (11 posted)
Page editor: Jonathan Corbet
Announcements
Brief items
One fallout from the acquisition of Novell was the laying-off of the Mono
team. Novell/SUSE has now announced
a deal with Xamarin (the company those Mono developers formed) to continue
supporting Mono for SUSE users. "The agreement grants Xamarin a
broad, perpetual license to all intellectual property covering Mono,
MonoTouch, Mono for Android and Mono Tools for Visual Studio. Xamarin will
also provide technical support to SUSE customers using Mono-based products,
and assume stewardship of the Mono open source community project."
Comments (none posted)
Articles of interest
Rob Tiller has a
brief article on opensource.com summarizing a one-year review of the
effects of the Bilski decision in the US Supreme Court. "That trend
is continuing in a good direction. A new study of the first full year of
decisions applying Bilski to software confirms that the direction of the
case law is toward finding software is not patentable subject
matter." The full study is available as
a 118-page PDF.
Comments (40 posted)
Allison Randal has put up a
lengthy collection of thoughts about the Harmony agreements 1.0 release
and the process that created them. "I expect that over the next year
a handful of projects will adopt Harmony agreements. That may not sound
like much, but I consider my time on Harmony well-spent when I count the
collective human-years that will be saved from drafting and redrafting
contributor agreements. That time can be much better spent on community
building, documentation, coding and everything else that makes FLOSS
projects great. Some posters expressed concern that the mere existence of
Harmony might divert some projects from one philosophy or legal strategy to
another. I just don't see that happening. The community of FLOSS developers
are some of the most legally aware and opinionated non-lawyers on the
planet. Harmony will be useful to those projects who would have adopted a
CLA/CAA anyway, or for projects who already have one and are looking for an
update."
Comments (11 posted)
New Books
Pragmatic Bookshelf has released "Continuous Testing with Ruby, Rails, and
JavaScript", by Ben Rady and Rod Coffin.
Full Story (comments: none)
Upcoming Events
The Desktop Summit 2011 will be held August 6-12, 2011 in Berlin, Germany.
Intel is a platinum sponsor. "The event also welcomes Collabora and
SUSE as Gold partners. The organization is delighted with the community
spirit of these generous corporate partners."
Full Story (comments: none)
This year's Linux Plumbers
Conference will have a track focusing on development tools on September
7, 2011. "Talks will cover a variety of topics, including bug
finding, patch management, and code vizualization." Click below for
a schedule.
Full Story (comments: none)
The 8th netfilter developer workshop will take place August 22-26, 2011 in
Freiburg, Germany. The project is seeking conference sponsors. "If
your company is using netfilter and always wanted to give something back,
this is a great opportunity. Sponsorship money will be used for travel and
accomodation expenses to get the developers to the workshop. Perks include
your company name and logo on the workshop page, depending on the level of
sponsorship an invitation for one or more of your own developers and a
heartily thanks from the netfilter project."
Full Story (comments: none)
Registration is open for the openSUSE Conference which will be held in
Nürnberg, Germany, September 11-14, 2011. Some travel subsidies will
be available. A standard invitation letter is available for those who need
to get a visa. "Last but not least, there is an option between two different tickets. Entrance is free for all but those who want to support the conference and can afford it can buy a professional ticket. For $350 or €250 you get, besides access to the conference, a networking dinner with the speakers and other professional ticket holders as well as a session with SUSE Product Management."
Full Story (comments: none)
The official schedule for PyCon Australia 2011 has been announced.
"This year's conference will feature 3 fantastic keynotes, 7
introductory classroom sessions, and 26 presentations on topics as diverse
as web programming, benchmarking, social issues and API design."
PyCon Australia takes place August 20-21, 2011 in Sydney, Australia.
Full Story (comments: none)
There are a number of Free and Open Source Software related events in the
Ukrainian city of Odessa, including FOSS Sea Conference, Summer and Winter
FOSS Fests, WebCamp and others. "Our events involve people from Ukraine, Russia, Belarus and Moldova, but we will be happy to expand the geography of our participants, and therefore we are looking for new participants and partners. We kindly invite you to join, and distribution of the information about our events among your friends, at your web resources or in your communities will be highly appreciated."
Full Story (comments: none)
Events: July 28, 2011 to September 26, 2011
The following event listing is taken from the
LWN.net Calendar.
| Date(s) | Event | Location |
| July 24 - July 30 | DebConf11 | Banja Luka, Bosnia |
| July 25 - July 29 | OSCON 2011 | Portland, OR, USA |
| July 30 - July 31 | PyOhio 2011 | Columbus, OH, USA |
| July 30 - August 6 | Linux Beer Hike (LinuxBierWanderung) | Lanersbach, Tux, Austria |
| August 4 - August 7 | Wikimania 2011 | Haifa, Israel |
| August 6 - August 12 | Desktop Summit | Berlin, Germany |
| August 10 - August 12 | USENIX Security 11: 20th USENIX Security Symposium | San Francisco, CA, USA |
| August 10 - August 14 | Chaos Communication Camp 2011 | Finowfurt, Germany |
| August 13 - August 14 | OggCamp 11 | Farnham, UK |
| August 15 - August 16 | KVM Forum 2011 | Vancouver, BC, Canada |
| August 15 - August 17 | YAPC::Europe 2011 Modern Perl | Riga, Latvia |
| August 17 - August 19 | LinuxCon North America 2011 | Vancouver, Canada |
| August 20 - August 21 | PyCon Australia | Sydney, Australia |
| August 20 - August 21 | Conference for Open Source Coders, Users and Promoters | Taipei, Taiwan |
| August 22 - August 26 | 8th Netfilter Workshop | Freiburg, Germany |
| August 23 | Government Open Source Conference | Washington, DC, USA |
| August 25 - August 28 | EuroSciPy | Paris, France |
| August 25 - August 28 | GNU Hackers Meeting | Paris, France |
| August 26 | Dynamic Language Conference 2011 | Edinburgh, United Kingdom |
| August 27 | PyCon Japan 2011 | Tokyo, Japan |
| August 27 | SC2011 - Software Developers Haven | Ottawa, ON, Canada |
| August 27 - August 28 | Kiwi PyCon 2011 | Wellington, New Zealand |
| August 30 - September 1 | Military Open Source Software (MIL-OSS) WG3 Conference | Atlanta, GA, USA |
| September 6 - September 8 | Conference on Domain-Specific Languages | Bordeaux, France |
| September 7 - September 9 | Linux Plumbers' Conference | Santa Rosa, CA, USA |
| September 8 | Linux Security Summit 2011 | Santa Rosa, CA, USA |
| September 8 - September 9 | Italian Perl Workshop 2011 | Turin, Italy |
| September 8 - September 9 | Lua Workshop 2011 | Frick, Switzerland |
| September 9 - September 11 | State of the Map 2011 | Denver, Colorado, USA |
| September 9 - September 11 | Ohio LinuxFest 2011 | Columbus, OH, USA |
| September 10 - September 11 | PyTexas 2011 | College Station, Texas, USA |
| September 10 - September 11 | SugarCamp Paris 2011 - "Fix Sugar Documentation!" | Paris, France |
| September 11 - September 14 | openSUSE Conference | Nuremberg, Germany |
| September 12 - September 14 | X.Org Developers' Conference | Chicago, Illinois, USA |
| September 14 - September 16 | Postgres Open | Chicago, IL, USA |
| September 14 - September 16 | GNU Radio Conference 2011 | Philadelphia, PA, USA |
| September 15 | Open Hardware Summit | New York, NY, USA |
| September 16 | LLVM European User Group Meeting | London, United Kingdom |
| September 16 - September 18 | Creative Commons Global Summit 2011 | Warsaw, Poland |
| September 16 - September 18 | Pycon India 2011 | Pune, India |
| September 18 - September 20 | Strange Loop | St. Louis, MO, USA |
| September 19 - September 22 | BruCON 2011 | Brussels, Belgium |
| September 22 - September 25 | Pycon Poland 2011 | Kielce, Poland |
| September 23 - September 24 | Open Source Developers Conference France 2011 | Paris, France |
| September 23 - September 24 | PyCon Argentina 2011 | Buenos Aires, Argentina |
| September 24 - September 25 | PyCon UK 2011 | Coventry, UK |
If your event does not appear here, please
tell us about it.
Page editor: Rebecca Sobol