June 10, 2013
This article was contributed by Josh Berkus
This year's pgCon, which concluded
May 25th,
included an unusually high number of changes to the PostgreSQL community,
codebase, and development. Contributors introduced multiple new major
projects which will substantially change how people use PostgreSQL,
including parallel query, a new binary document store type, and pluggable
storage. In addition, Tom Lane switched jobs, four new committers were
selected, pgCon
had the highest attendance ever at 256 registrations, and held its first unconference after the regular conference. Overall, it was a mind-bending and exhausting week.
pgCon is a PostgreSQL developer and advanced user conference held in
Ottawa, Canada every year in May. It usually brings together most of the
committers and major contributors to the project in order to share ideas,
present projects and new features, and coordinate schedules and code
changes. The main conference days are preceded by various summits,
including the PostgreSQL Clustering Summit, the Infrastructure Team meeting, and the Developer Meeting. The latter consists of a closed summit of 18 to 25 top code contributors to PostgreSQL who coordinate feature development and plans for the next version (PostgreSQL 9.4).
Parallel query
The longest and most interesting discussion at the developer meeting was about adding parallel query capabilities to PostgreSQL. Currently, the database is restricted to using one process and one core to execute each individual query, and cannot make use of additional cores to speed up CPU-bound tasks. While it can execute dozens of separate queries simultaneously, the lack of individual-query parallelism is still very limiting for users of analytics applications, for which PostgreSQL is already popular.
Bruce Momjian announced on behalf of EnterpriseDB that its engineering
team would be focusing on parallelism for the next version of PostgreSQL.
Noah Misch will be leading this project. The project plans to have index
building parallelized for 9.4, but most of its work will be creating a general framework for parallelism. According to Momjian, there are three things you need for any parallel operation in the database:
- An efficient way of passing data to the parallel backends, probably using a shared memory facility.
- A method for starting and stopping worker processes.
- The ability for the worker processes to share the reference data and state information of the parent process.
The EnterpriseDB team had explored using threads for worker processes,
but that was not seen as a productive approach, primarily because the
PostgreSQL development team is used to working with processes and the
backend code is structured around them. While the cost of starting up
processes is high compared to threads, the additional locking required for
threading looked to be just as expensive in performance terms. Momjian put it this way:
With threads, everything is shared by default and you have to take specific steps not to share.
With processes, everything is unshared by default, and you have to specifically share things.
The process model and explicit sharing is a shorter path from where we are currently.
The PostgreSQL developers plan to build a general framework for parallelism and then work on parallelizing one specific database task at a time. The first parallel feature is expected to be building indexes in parallel using parallel in-memory sort. This is an important feature for users because building indexes is often slower than populating the underlying table, and it is often CPU-bound. It's also seen as a good first task because index builds run for minutes rather than milliseconds, so optimizing worker startup costs can be postponed until later development.
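None of this parallel-query code exists yet, so the following is only a conceptual sketch in Python of the worker model Momjian described: start a pool of worker processes, hand each one a slice of the data, and merge the sorted runs, which is roughly the shape of a parallel in-memory sort feeding an index build. (Python's multiprocessing pickles data between processes; the PostgreSQL plan is to use a shared-memory facility instead, and the numbers here are made up.)

    import multiprocessing as mp
    from heapq import merge

    def sort_chunk(chunk):
        # Each worker sorts its slice independently; in PostgreSQL this would
        # be one backend sorting part of the data for a parallel index build.
        return sorted(chunk)

    if __name__ == "__main__":
        data = list(range(1000000, 0, -1))       # stand-in for a table column
        nworkers = 4
        size = len(data) // nworkers
        chunks = [data[i * size:(i + 1) * size] for i in range(nworkers)]

        with mp.Pool(nworkers) as pool:          # start (and later stop) workers
            runs = pool.map(sort_chunk, chunks)  # hand each worker its slice

        result = list(merge(*runs))              # merge the sorted runs
        assert result == sorted(data)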
PostgreSQL and non-volatile memory
Major contributor KaiGai Kohei of NEC brought up the recent emergence of non-volatile memory (NVRAM), or persistent memory, devices and discussed ideas on how to take advantage of them for PostgreSQL. In his lightning talk on the first day of pgCon, Intel engineer Matthew Wilcox further reinforced the message that NVRAM is coming. NVRAM persists its contents across a system power cycle, but is addressed like main memory and is around half as fast.
Initially, Kohei is interested in using NVRAM for the PostgreSQL Write Ahead Log (WAL), an append-only set of files that is used to guarantee transactional integrity and crash safety. This will work with the small sizes and limited write cycles of the early NVRAM devices. For servers with NVRAM, WAL writes would go to a memory region on the device allocated using mmap(). In later generations of NVRAM, developers can look at using it for the main database files.
There are many unknowns about this technology, such as what method can be employed to guarantee absolute write ordering. Developers speculated about whether transactional memory could somehow be employed for this. Right now, the PostgreSQL community is waiting to get its collective hands on an NVRAM device for testing and development.
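No PostgreSQL code exists for this yet; the sketch below, in Python, only illustrates the mmap()-based, append-only WAL idea. The device path is hypothetical (an ordinary file works for experimentation), and msync() stands in for whatever write-ordering primitive real NVRAM hardware ends up requiring.

    import mmap
    import os
    import struct

    WAL_PATH = "/mnt/pmem0/wal_region"   # hypothetical persistent-memory file
    WAL_SIZE = 16 * 1024 * 1024          # 16MB append-only region

    fd = os.open(WAL_PATH, os.O_RDWR | os.O_CREAT, 0o600)
    os.ftruncate(fd, WAL_SIZE)
    wal = mmap.mmap(fd, WAL_SIZE)        # writes go straight to the mapped region

    def append_record(offset, payload):
        """Append one length-prefixed WAL record and flush it."""
        record = struct.pack("I", len(payload)) + payload
        wal[offset:offset + len(record)] = record
        wal.flush()                      # msync(); stand-in for an NVRAM ordering primitive
        return offset + len(record)

    pos = append_record(0, b"BEGIN")
    pos = append_record(pos, b"INSERT INTO t VALUES (1)")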
Disqus keynote
Mike Clarke, Operations Lead of Disqus, delivered the
keynote for pgCon this year. Disqus is the leading comment-hosting
platform, which is used extensively by blogs and news sites all over the
internet. Its technology platform includes Python, Django, RabbitMQ, Cassandra and PostgreSQL.
Most of its over three terabytes of data, including comments, threads, forums, and user profiles, is stored in PostgreSQL. This adds up to over 50,000 writes per second to the database, and millions of new posts and threads a day.
Clarke extolled the virtues of SSD storage. Disqus uses a 6-node
master-slave replication database cluster running on fast machines with
RAIDed SSD storage. Only SSDs have allowed it to continue scaling to its
current size. Prior to moving to SSD-based storage, Disqus was at 100% IO
utilization and had continual problems with long IO wait times. Now
utilization is down and wait times are around one millisecond.
He also complained about some of the pain points of scaling out
PostgreSQL. Disqus uses Slony-I, PostgreSQL's older replication system,
which the company has customized to its workload so heavily that it feels
it cannot afford to upgrade. For that reason, Clarke is eagerly awaiting
the new logical
replication system expected with PostgreSQL 9.4 next year. He also was
unhappy about the lack of standard design patterns for PostgreSQL proxying and failover; everyone seems to build their stack differently. On the other hand, he praised extensions as the best feature of PostgreSQL, since they allow building applications inside the database.
Clarke ended with a request for some additional PostgreSQL features. He
wants tools to enable sharded multiserver databases to be built inside
PostgreSQL more easily, such as by improving PL/Proxy, the
distributed table interface extension for PostgreSQL introduced by Skype.
He'd also like to see a query progress indicator, something that was later presented at pgCon by Jan Urbański.
HStore and the future of JSON
During the regular talks, Oleg Bartunov and Teodor Sigaev introduced the
prototype of the next version of their "hstore"
extension for PostgreSQL. hstore allows storing a simple key-value store,
a "hash" or "dictionary", in a PostgreSQL field, and allows indexing the keys. Today, many users of PostgreSQL and JSON use it to store "flattened" JSON objects so that they can be indexed on all keys. The presentation introduced a new version of hstore that supports nesting and arrays, so that it will be a closer match for fully structured JSON, as well as for complex multi-level hashes and dictionaries in Perl and Python.
The new hstore prototype also supports indexing, which enables very fast
lookup of keys, values, and even document fragments many levels deep. In
their tests, they used the Del.icio.us dataset, which includes 1.2 million
bookmark documents, and were able to search out all values matching a
complex nesting expression in 0.1 seconds, or all instances of a common key
in 0.5 seconds. The indexes are also reasonably sized, at around 75% of
the size of the data to which they are attached. Earlier attempts to index tree-structured text data in PostgreSQL and
other databases have resulted in indexes which are significantly larger
than the base table.
Individual hstore fields can be up to 512MB in size.
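The nested hstore is still only a prototype, but the flat hstore that ships today already supports GIN-indexed containment lookups. Here is a minimal sketch using psycopg2, assuming a local database named "test"; the table and data are made up, and creating the extension requires superuser rights.

    import psycopg2
    import psycopg2.extras

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()

    cur.execute("CREATE EXTENSION IF NOT EXISTS hstore")
    cur.execute("CREATE TABLE IF NOT EXISTS bookmarks (id serial PRIMARY KEY, doc hstore)")
    conn.commit()
    psycopg2.extras.register_hstore(conn)   # map Python dicts to hstore values

    cur.execute("INSERT INTO bookmarks (doc) VALUES (%s)",
                ({"title": "pgCon 2013", "tag": "postgresql"},))
    cur.execute("CREATE INDEX bookmarks_doc_gin ON bookmarks USING gin (doc)")  # run once
    conn.commit()

    # The GIN index makes key/value containment (@>) lookups fast.
    cur.execute("SELECT doc FROM bookmarks WHERE doc @> %s", ({"tag": "postgresql"},))
    print(cur.fetchall())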
While many attendees were excited and impressed by the prototype, some were unhappy. Several contributors were upset that the new type wasn't JSON. They argued that the PostgreSQL project didn't need a non-standard type and interface, when what developers want is a binary, indexed JSON type. After extensive discussion, Bartunov and Sigaev agreed to work on JSON either instead of, or in addition to, a new hstore for the next version.
Hopefully, this means that users can expect a JSON type for version 9.4 that supports arbitrary nested key lookup and complex search expressions. This would make PostgreSQL more suitable for applications which currently use a JSON document database, such as MongoDB or CouchDB. With the addition of compatibility projects like Mongres, users might even be able to run such applications largely unaltered.
Pluggable storage
The final day of pgCon this year was the conference's first-ever "unconference day". An unconference is a meeting in which the attendees select sessions and compose the schedule at the event. Unconferences tend to be more discussion-oriented than regular conferences, and center more around recent events and ideas. Around 75 of the pgCon attendees stayed for the unconference.
One of the biggest topics discussed at the unconference was the idea of making PostgreSQL's storage "pluggable". Historically, companies have wanted to tailor the database for particular workloads by adding support for column store tables, clustered storage, graphs, streaming data, or other special-purpose data structures. These changes have created incompatible forks of PostgreSQL, such as Greenplum or Vertica, cutting off any development collaboration with those vendors. Other companies, such as Huawei and Salesforce, who are newly involved in PostgreSQL, would like to be able to change the storage model without forking the code.
The PostgreSQL contributors discussed methods of accomplishing this.
First, they discussed the possibility of using the Foreign Data
Wrapper (FDW) facility to attach new storage types. Foreign Data
Wrappers allow users to attach external data, such as other databases,
through a table interface. After some discussion, this was seen as
unsuitable in most cases, since users want to actually manage tables,
including creation, backup, and replication, through PostgreSQL, not just
"have a window" into them. They also want to support creating indexes on
different storage types.
If FDW won't work, the developers will need to create a new set of hooks and an API for "storage managers". This was actually supported by early versions of POSTGRES at the University of California, which had prototypes of both an in-memory and a write-once media (WORM) storage manager. However, that code has atrophied and doesn't support most current PostgreSQL features.
For any potential storage, the storage manager would need to support several conventions of PostgreSQL, including:
- having tuples (rows) which are structured like PostgreSQL's tuples, including metadata
- being transactional
- providing a method for resolving data visibility
- providing a physical row identifier for index building
The PostgreSQL system catalogs would stay on the current conventional native storage, regardless of what new types of storage managers were added.
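No such API exists yet. Purely as an illustration of the conventions listed above, a pluggable storage manager might be expected to cover something like the following interface; the names and signatures here are hypothetical, not anything the PostgreSQL project has defined.

    from abc import ABC, abstractmethod

    class StorageManager(ABC):
        """Hypothetical shape of a pluggable storage manager."""

        @abstractmethod
        def insert_tuple(self, rel, heap_tuple):
            """Store a PostgreSQL-format tuple (with metadata) and return a
            physical row identifier, the analogue of a ctid, for indexes."""

        @abstractmethod
        def fetch_tuple(self, rel, row_id, snapshot):
            """Return the tuple only if it is visible to the given snapshot,
            covering the visibility-resolution requirement."""

        @abstractmethod
        def commit(self, xid):
            """Make the transaction's changes durable (transactional rule)."""

        @abstractmethod
        def abort(self, xid):
            """Roll back the transaction's changes."""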
If implemented, this would be a major change to the database system. It
would become possible to use PostgreSQL as a query engine, transaction
manager, and interface for very different types of databases, both
proprietary and open source. It might even become possible for MySQL
"storage engine" vendors, such as Infobright and Tokutek, to port their
products. Peter van Hardenberg of Heroku suggested it might also make it
possible to
run PostgreSQL on top of HDFS.
Committer changes
The most frequently quoted news from pgCon this year was that Tom Lane, lead committer on PostgreSQL, was changing employers from Red Hat to Salesforce. While announced in a rather low-key way through the Developer Meeting notes and Lane's conference badge, this was big enough news that Wired picked it up. Lane had worked at Red Hat for 11 years, having joined to support Red Hat Database, its distribution of PostgreSQL. While Red Hat Database was eventually canceled, Lane stayed on at Red Hat, which was very supportive of his contributions to the project.
Lane's move is more significant in what it says about Salesforce's
commitment to PostgreSQL than any real change in his expected activities as
a committer. Until now, most commentators have suggested that Salesforce's
mentions of PostgreSQL were merely posturing, but hiring Lane suggests that
it plans to follow through on migrating away from Oracle Database. Six other Salesforce staff also attended pgCon. Its exact plans were not shared with the community, although it's reasonable to hypothesize from development discussions at the conference that Salesforce plans to contribute substantially to the open-source project, and that pluggable storage is a development target.
Lane memorialized his change of employment by putting his nine-year-old Red Hat laptop bag into the charity auction at the end of pgCon. It sold for $170.
The PostgreSQL Core Team, a six-member steering committee for the project, announced the selection of four new committers to PostgreSQL: Jeff Davis of Aster Data, author of the range types feature in version 9.2; Fujii Masao of NTT Data, main author of the synchronous replication feature; Stephen Frost of Resonate, author of several security features; and Noah Misch of EnterpriseDB, author of numerous SQL improvements. This brings the number of committers on PostgreSQL to twenty.
More PostgreSQL
Of course, there were many other interesting presentations and talks at pgCon. Keith Paskett ran a tutorial on optimizing and using PostgreSQL on ZFS atop OmniOS (an OpenSolaris fork), while other users talked about using PostgreSQL on ZFS for Linux. Jeff Davis presented strategies to use PostgreSQL's new anti-disk-corruption features. Josh McDermott ran another Schemaverse tournament, as a qualifier for the upcoming Defcon tournament. Robert Haas showed the most common failures of the PostgreSQL query planner, and sparked discussion about how to fix them.
On the second full conference day, Japanese community members presented the newly-formed PostgreSQL Enterprise Consortium of Japan, a group of 39 Japanese companies aiming to promote and improve PostgreSQL. This group is currently working on clustered PostgreSQL, benchmarking, and migration tools to migrate from other database systems. And just for fun, Álvaro Hernández Tortosa demonstrated creating one billion tables in a single PostgreSQL database.
Overall, it was the most exciting pgCon I've attended, and shows the many new directions in which PostgreSQL development is going. Everyone there came away with the impression that the project would be completely reinventing the database within a few years. If you work with PostgreSQL, or are interested in contributing to it, you should consider attending next year.
[ Josh Berkus is a member of the PostgreSQL Core Team. ]
Comments (6 posted)
As always, there were more sessions at the recently completed triumvirate
of Linux Foundation conferences in Tokyo than can be written up; as usual,
there were also more sessions than there were people to cover
them. The Automotive
Linux Summit Spring, LinuxCon
Japan, and CloudOpen
Japan covered a lot of ground in five days. Here are reports from three presentations at
LinuxCon.
OSS meetups in Japan
Hiro Yoshioka spoke about the types of open source gatherings that go on in
Japan. He is the technical managing officer for Rakuten, which is a large
internet services company in Japan. Before that, he was the CTO of Miracle
Linux from 2000 to 2008.
The goal of his talk was to encourage other Japanese people in the audience to start
up their own "meetups" and other types of technical meetings and seminars,
but the
message was
applicable anywhere. Organizing these meetings is quite rewarding, and
lots of fun, but it does take some time to do, he said.
Yoshioka used the "kernel code reading party" that he started in Yokohama in
April 1999 as a case study. He wondered if he would be able to read the kernel
source code, so he gathered up some members of the Yokohama Linux Users
Group to create an informal technical seminar to do so. The name of
the meeting
has stuck, but the group no longer reads kernel source. Instead, they have
presentations on kernel topics, often followed by a "pizza and beer party".
There are numerous advantages to being the organizer of such a meeting, he
said. You get to choose the date, time, and location for the event, as
well as choosing the speakers. When he wants to learn about something in
the kernel, he asks someone who knows about it to speak. Presenters also
gain from the experience because they get to share their ideas in a relaxed
setting. In addition, they can talk about an "immature idea" and get
"great feedback" from those attending. Attendees, of course, get to hear
"rich technical information".
Being the organizer has some downsides, mostly in the amount of time it
takes. The organizer will "need to do everything", Yoshioka said, but
sometimes the community will help out. In order to make the meetings
"sustainable", the value needs to exceed the cost. So either increasing
the value or decreasing the cost is a way to help make the meetings
continue. Finding great speakers is the key to making the value of the
meetings higher, while finding inexpensive meeting places is a good way to
bring down costs.
How to find the time to organize meetings like those he mentioned was one
question from the audience. It is a difficult question, Yoshioka said, but
as with many things it comes down to your priorities. Another audience
member noted that convincing your employer that the meeting will be useful
in your job may allow you to spend some of your work time on it. "Make it
part of your job".
Another example that Yoshioka gave was the Rakuten Technology
Conference, which has been held yearly since 2007. It is a free
meeting with
content provided by volunteers. In the past, it has had keynotes from Ruby
creator Matz and Dave Thomas of The Pragmatic Programmer. Proposals
for talks are currently under discussion for this year's event, which will
be held on October 26 near Shinagawa station in Tokyo. Unlike many other
technical meetings in Japan, the conference is all in English, he said.
The language barrier was of interest to several non-Japanese audience
members. Most of the meetings like those Yoshioka described are, unsurprisingly,
in Japanese, but for non-speakers there are a few possibilities. The Tokyo
hackerspace has half of its meetings in English, he said, and the Tokyo
Linux Users Group has a web page and
mailing list in English.
In addition, Yoshioka has an English-language blog with occasional posts covering the
kernel code
reading party meetings and other, similar meetings.
One laptop per child
A Kroah-Hartman different from the usual suspect spoke in the student track.
In a presentation that followed her father's, Madeline Kroah-Hartman looked
at the One Laptop Per Child (OLPC) project, its history, and some of its
plans for the future. She has been using the OLPC for a number of years,
back to the original XO version,
and she brought along the newest model, XO-4 Touch, to show.
The project began in 2005 with the goal of creating a laptop for children
that could be sold for $100. It missed that goal with the initial XO, but did
ship 2.5 million of the units, including 83,000 as part of the "Give 1 Get
1" program that started in 2007. The idea was to have a low-powered laptop
that would last the whole school day, which the XO is capable of, partly
because it "sleeps between keystrokes" while leaving the display on, she said.
Recharging the laptops has been something of a challenge for the project,
particularly in developing countries where electricity may not be
available. Various methods have been tried, from a hand crank to a "yo-yo
charger" that was never distributed. Using the yo-yo got painful after
ten minutes, she said, but it took one and a half hours to fully charge the
device. Solar-powered charging is now the norm.
OLPCs were distributed in various countries, including 300,000 to Uruguay
(where every child in the country got one) and 4,500 to women's schools in
Afghanistan, as well as to Nicaragua, Rwanda, and others. In Madagascar, the youngest
students were teaching the older ones how to use the laptops, while in
India the attendance rate neared 100% for schools that had OLPCs, she said.
OLPCs generally run the Sugar
environment on top of Fedora. It is a "weird" interface that sometimes
doesn't work, she said, but it is designed for small children. That means
it has lots of pictures as part of the interface to reduce clutter and make
it more intuitive for that audience. There are lots of applications
that come with the OLPC, including the Etoys authoring environment, a Python
programming environment, the Scratch 2D animation tool, a physics
simulation program, a local copy of
Wikipedia in the native language, a word processor, and more. The Linux
command line is also available in a terminal application, though children
may not actually use it in practice, she said.
The first model was designed so that you could "throw it at a wall" and it
wouldn't break, she said. Various other versions were created over the
years, including the XO-1.5, a
dual-touchscreen XO-2 that was
never released, and the XO-4 Touch. The latter will be shipping later this
year. There is also the Android-based XO tablet
that will be selling at Walmart for $100 starting in June. It is "very
different" than the Sugar-based XOs, Kroah-Hartman said, but will come
pre-loaded with education and mathematical apps.
There are lots of ways to participate in the project, she said, many of
which are listed on the Participate wiki page.
She noted that only 30% of the XO software is translated to Japanese, so
that might be one place for attendees to start.
OpenRelief
In an update to last year's
presentation, Shane Coughlan talked about the progress (and setbacks)
for the OpenRelief project. That project had its genesis at the 2011
LinuxCon Japan—held shortly after the earthquake, tsunami, and nuclear
accident that hit Japan—as part of a developer panel discussion about
what could be done
to create open source technical measures to help out disaster relief efforts.
That discussion led to the creation
of the OpenRelief project, which seeks
to build a robotic airplane (aka drone) to help relief workers "see through
the fog" to get the right aid to the right place at the right time.
The test airframe he displayed at last year's event had some durability
flaws: "airframes suck", he said. In particular, the airframe would
regularly break in ways that would be difficult to fix in the field.
Endurance is one of the key features required for a disaster relief
aircraft, and the project had difficulty finding one that would both be
durable
and fit into
its low price point ($1000 for a fully equipped plane, which left
$100-200 for the airframe).
In testing the original plane, though, OpenRelief found that the navigation
and flight software/hardware
side was largely a solved problem, through projects like ArduPilot and CanberraUAV. Andrew Tridgell
(i.e. "Tridge" of Samba and other projects) is part of the CanberraUAV
team, which won the 2012 Outback Rescue Challenge;
"they completely rock", Coughlan said. The "unmanned aerial vehicle" (UAV)
that was used by CanberraUAV was "a bit big and expensive" for the
needs of OpenRelief, but because it didn't have to focus on the flight
software side of things, the project could turn to other parts of the problem.
One of those was the airframe, but that problem may now be solved. The
project was approached by an "aviation specialist" who had created a
regular airframe as part of a project to build a vertical takeoff and
landing (VTOL) drone to be sold to the military. It is a simple design
with rails to attach the wings and wheels as well as to hang payloads
(e.g. cameras, radiation detectors, ...). There are dual servos for the
control surfaces, which provides redundancy. It is about the same size as
the previous airframe, but can go 40km using an electric engine rather than
20km as the older version did. It can also carry 9kg of payload vs. the
0.5kg available previously. With an optional gasoline-powered engine, the
range will increase to 200-300km.
OpenRelief released
the design files for this new airframe on the day of Coughlan's talk. It
is something that "anyone can build", he said. Test flights are coming
soon, but he feels confident that the airframe piece, at least, is now
under control. There is still plenty of work to do in integrating all of
the different components into a working system, including adding some
"mission control"
software that can interface with existing disaster relief systems.
Coughlan also briefly mentioned another project he has been working on,
called Data Twist. The
OpenStreetMap (OSM) project is
popular in Japan—where Coughlan lives—because the "maps are great", but the
data in those maps isn't always easy to get at. Data Twist is a Ruby
program that processes the OSM XML data to extract information to build
geo-directories.
A geo-directory might contain "all of the convenience stores in
China"—there were 43,000 of them as of the time of his talk—for example.
Data Twist uses the
categories tagged in the OSM data and can extract the locations into a WordPress
Geo Mashup blog
post, which will place the locations on maps in the posts.
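Data Twist itself is written in Ruby and its code was not shown; as a rough illustration of the same idea, here is a short Python sketch that pulls shop=convenience nodes out of an OSM XML extract. The filename is a placeholder.

    import xml.etree.ElementTree as ET

    def convenience_stores(osm_xml_path):
        """Yield (name, lat, lon) for every node tagged shop=convenience in an
        OpenStreetMap XML extract -- roughly what a geo-directory extraction does."""
        for _, node in ET.iterparse(osm_xml_path):
            if node.tag != "node":
                continue
            tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
            if tags.get("shop") == "convenience":
                yield (tags.get("name", "unnamed"), node.get("lat"), node.get("lon"))
            node.clear()   # keep memory use bounded on large extracts

    for store in convenience_stores("china-extract.osm"):
        print(store)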
Data Twist is, as yet, just an experiment in making open data (like OSM
data) more useful in other contexts. It might someday be used as part of
OpenRelief, but there are other applications too. The idea was to show
someone who didn't care about open source or disaster relief efforts some
of the benefits of open data. It is in the early stages of development and
he encourages others to take a look.
Wrap-up
All three conferences were held at the Hotel Chinzanso Tokyo and
its associated conference (and wedding) center. It was a little off the
beaten track—if that phrase can ever be applied to a city like Tokyo—in the
Mejiro section of the city. But the enormous garden (complete with
fireflies at night) was beautiful; it tended to isolate the conferences
from the usual Tokyo "hustle and bustle". As always, the events were
well-run and featured a wide array of interesting content.
[I would like to thank the Linux Foundation for travel assistance to Tokyo
for LinuxCon Japan.]
Comments (none posted)
By Nathan Willis
June 12, 2013
Google Reader, arguably the most widely-used feed aggregation tool,
is being unceremoniously dumped and shut down at the end of June. As
such, those who spend a significant chunk of their time consuming RSS
or Atom content have been searching for a suitable replacement. There
are a variety of options available, from third-party commercial
services to self-hosted web apps to desktop applications. Trade-offs
are involved simply in choosing which application type to adopt; for
example, a web service provides access from anywhere, but it also
relies on the availability of a remote server (whether someone else
administrates it or not). But there is at least one other option
worth exploring: browser extensions.
As Luis Villa pointed out in April,
browsers do at best a mediocre job of making feed content
discoverable, and they do nothing to support feed reading directly.
But there are related features in Firefox, such as "live
bookmarks," which blur together the notion of news feeds and
periodically polling a page for changes. Several Firefox add-ons
attempt to build a decent feed-reading interface where none currently
exists—not all of them exploit the live bookmark functionality
for this, although many do. Since recent Firefox releases are capable
of synchronizing both bookmarks and add-ons, the user can access
the same experience across multiple desktop and laptop machines (although an
extension-based feed reader does not offer universal availability, as
a purely web-based solution does).
Bookmarks, subscriptions; potato, potahto
The most lightweight option available for recent Firefox builds is
probably MicroRSS,
which offers nothing more than a list of subscribed feeds down the
left hand margin, and text of the entries from the selected feed on
the right. For some users that may be plenty, of course, but as a
practical replacement for Google Reader it falls short, since there is
no way to import an existing list of subscriptions (typically as an Outline Processor Markup
Language (OPML) file). It also does not count unread items, much
less offer searching, sorting, starring, or other news-management
features. On the plus side, it is actively maintained, but the
license is not specified.
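For readers unfamiliar with OPML, it is a small XML format in which each subscription is an <outline> element carrying the feed's URL. A minimal Python sketch of generating such a file follows; the feed list and URLs are placeholders.

    import xml.etree.ElementTree as ET

    feeds = [
        ("LWN.net", "https://lwn.net/headlines/rss"),
        ("Example blog", "http://example.org/feed.atom"),
    ]

    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = "My subscriptions"
    body = ET.SubElement(opml, "body")
    for title, url in feeds:
        # Each subscription becomes one <outline> with the feed URL in xmlUrl.
        ET.SubElement(body, "outline", type="rss", text=title,
                      title=title, xmlUrl=url)

    ET.ElementTree(opml).write("subscriptions.opml",
                               encoding="utf-8", xml_declaration=True)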
Feed
Sidebar is another lightweight option. It essentially just
displays the existing "live bookmarks" content in a persistent Firefox
sidebar. This mechanism requires the user to subscribe to feeds as
live bookmarks, but it has the benefit of being relatively simple.
The top half of the sidebar displays the list of subscriptions, with
each subscription as a separate folder; its individual items are
listed as entries within the folder, which the user must click to open
in the browser. Notably, the Firefox "sidebar" is browser chrome and
not a page element, which makes the feed sidebar visible in every tab,
as opposed to confining the feed reading experience to a single spot.
Feed Sidebar is licensed under GPLv2, which is a tad atypical for
Firefox extensions, where the Mozilla Public License (MPL) dominates.
When it comes to the simple implementations, it is also worth
mentioning that Thunderbird can subscribe to RSS and Atom feeds
natively. This functionality is akin to the email client's support
for NNTP news; like newsgroups, a subscription is presented to the
user much as a POP or IMAP folder is. Feeds with new content appear in
the sidebar, new messages are listed in the top pane, and clicking on
any of them opens the content in the bottom message pane. Subscribing to
news feeds does require setting up a separate "Blogs and News Feeds"
account in Thunderbird, though, and users can only read one feed at a
time—one cannot aggregate multiple feeds into a folder, for example.
Moving up a bit on the functionality ladder, Sage is an MPL-1.1-licensed extension
that stores your subscribed feeds in a (user-selectable) bookmark
folder. For reading, it provides a Firefox sidebar with two panes; the upper
one presents a list of the subscriptions, and the lower one presents a
list of available articles from the selected subscription. The main
browser tab shows a summary of each entry in the selected feed,
although opening any entry opens up the corresponding page on the
original site, rather than rendering it inside the feed-reader UI. As
rendering the original page in the browser might suggest, Sage does
not store any content locally, so it does not offer search
functionality.
The project is actively developed on GitHub, although it is
also worth noting that one of the project's means of fundraising is to
insert "affiliate" links into feed content that points toward certain
online merchants.
The high end
Digest
is a fork of a no-longer-developed extension called Brief. It attempts to provide a
more full-featured feed-reading experience than some of the other
readers; it keeps a count of unread items for each feed, allows the
user to "star" individual items or mark them as unread, and quite a
few other features one would expect to find in a web service like
Google Reader.
As is the case with several other extensions, Digest stores feed
subscriptions in a (selectable) bookmarks folder. However, it also
downloads entries locally—allowing the user to choose how long
old downloads are preserved (thankfully), which enables it to offer
content search. It also renders its entire interface within the
browser tab in HTML, unlike some of the competition. Digest is licensed as MPL
2.0, and is actively under development by its new maintainer at GitHub. It can import
(and export) OPML subscription files.
Like Digest, Newsfox
replicates much of the Google Reader experience inside Firefox. The
look is a bit different, since Newsfox incorporates a three-pane
interface akin to an email client. This UI is implemented in browser
chrome, but unlike the earlier live-bookmark–based options, it still
manages to reside entirely within one tab. That said, Newsfox expects
to find subscriptions in the default Live Bookmarks folder, and there
does not appear to be a way to persuade it to look elsewhere. Perhaps
more frustrating, it either does not understand subfolders within the
Live Bookmarks folder, or it chooses to ignore them, so dozens or
hundreds of feeds are presented to the user in a single, scrolling
list.
On the plus side, Newsfox offers multi-tier sorting; one can tell
it to first sort feeds alphabetically (increasing or decreasing), then
sort by date (again, increasing or decreasing), and so on, up to four
levels deep. It can also encrypt the locally-downloaded feed content, which
might appeal to laptop users, and is an option none of the other extensions seems to feature.
Downloaded entries can be searched, which is a plus, and on the whole
the interface is fast and responsive, more so than Digest's HTML UI.
The last major option on the full-fledged feed-aggregator front is
Bamboo,
an MPL-1.1-licensed extension that appears to be intentionally
aiming for Google Reader replacement status—right down to the
UI, which mimics the dark gray "Google toolbar" currently plastered
across all of the search giant's web services. The interface is
rendered in HTML, and uses the decidedly Google Reader–like sidebar
layout, rendering feed content within the right-hand pane. Bamboo supports all of
the basic features common to the high-end aggregators already
discussed: OPML import/export, folders, search, sorting, marking items
as read/unread, and locally storing feed content. It also adds more,
such as the ability to star "favorite" items, the ability to save
items for offline reading, a toggle-able headline-or-full-item display
setting, and a built-in ad blocker.
Interestingly enough, despite its comparatively rich feature set,
Bamboo uses a bookmark folder to keep track of feed subscriptions, but
it does not allow the user to select the folder where subscriptions
are saved. Instead, like Newsfox, it only examines the default Live
Bookmarks folder.
And the rest
If one goes searching for "RSS" on the Firefox Add-ons
site, there are plenty more options that turn up, many of which
reflect entirely different approaches to feed aggregation. For
example, SRR
offers a "ticker"-style scroll of headlines from subscribed feeds, which is useful for a handful of feeds at
best. Dozens or hundreds, however, will overpower even the toughest
attention span. Or there is Newssitter,
which provides a "bookshelf"-style interface that seems visually
designed for reading primarily on a mobile device. That may meet the
needs of many news junkies, of course, but it bears little resemblance
to the Google Reader experience; getting a quick overview of dozens of
feeds is not possible, for example.
Selecting a Google Reader replacement is not a simple task;
everyone uses the service in slightly different ways, and all of the
options offer different (and overlapping) subsets of the original
product's feature set.
The bare-bones feed reading extensions all have big limitations
that probably make them less useful as a drop-in replacement; for
instance they may not check for new content in the background, and
they certainly do not provide much search functionality. For a user
with a lot of subscriptions, supplementary features like searching and
saving items can take the application from mediocre to essential.
After all, it is frequently hard to backtrack to a barely-remembered
news story weeks or months after reading the original feed.
To that end, the more fleshed-out Google Reader alternatives offer
a much more useful experience in the long run. Only time will tell
how solid they are over the long haul, of course—it is not
beyond reason to think that some of them will start to slow down or
wobble with months of saved content to manage. On the other hand,
none of them can offer one key feature of Google Reader: the months'
(or in many cases years') worth of already read news items. Most
individual feeds do not publish their site's entire history, but
Google Reader could search years' worth of already read material.
That is just one of the things people lose when a web service shuts down.
Based on my early experiments, Bamboo offers the most features,
Newsfox is faster, and Digest is more flexible. It is tempting
to fall back on that familiar old saying: you pays your nickel and you
takes your chances (though sans nickel in free software circles). But
because all three options can follow and display the same set of
feeds, it may be worth installing more than one and giving them a
simultaneous test drive for a week or so. At the very least, Firefox
can synchronize the bookmarks and add-ons, providing you with some way
to get at your subscriptions when away from home—at least if there is a
Firefox installation nearby.
Comments (18 posted)
Page editor: Nathan Willis
Security
At the 2013 Tizen Developer Conference in San Francisco, there was
a range of security talks examining different facets of
hardening the mobile platform. Last week, we examined the Smack framework that implements
access control for system resources. There were also sessions that
explored the problem of protecting the device at higher levels of the
system stack. Sreenu Pilluta spoke about guarding against malware
delivered via the Internet, and Roger Wang offered an unusual
proposal for obfuscating JavaScript applications themselves: by
compiling them.
Content, secure
Pilluta is an engineer at anti-virus software vendor McAfee. As he
explained, Tizen device vendors are expected to manage their own "app
stores" through which users install safe applications, but that leaves
a lot of avenues for malicious content unblocked. Email, web
pages, and media delivery services can all download content from
untrusted sources that might contain a dangerous payload. Pilluta
described Tizen's Content Security Framework (CSF), a mechanism
designed to let device vendors add pluggable virus- and
malware-scanning software to their Tizen-based products.
The CSF itself provides a set of APIs that other components can use
to scan two distinct classes of content: downloaded data objects and remote
URLs. The actual engines that perform the scanning are plugins to
CSF, and are expected to be added to Tizen by device vendors.
Security engines come in two varieties: Scan Engines (for data) and
Site Engines (for URLs). Scan Engines inspect content and are
designed to retrieve malware-matching patterns from the vendor, as is
typical of PC virus-scanning programs today. Site Engines use a
reputation system, in which the vendor categorizes URLs and creates
block list policies by category (e.g., gambling, pornography, spyware,
etc.).
Applications dictate when the scanning is performed, Pilluta said,
which is intentionally a decision left up to the vendor. Some might
choose to scan a page before loading it at all, while others might
load the page but scan it before executing any JavaScript. It is also
up to the application what to do when infected content is found; the
rationale being that the application can provide a more context-aware
response to the user, and do so within the expected bounds of the user
interface, rather than popping up an imposing and unfamiliar warning
notification from a component the user was unaware even existed.
The CSF scanning APIs are high-level and event-driven, which
Pilluta said allowed applications to call them cooperatively. For
example, an email client could call the Site Engine to scan a URL
inside of an email message and the Scan Engine on a file attachment.
Similarly, the email client could call the Site Engine on a URL
clicked upon to be opened in the browser. Old-fashioned scanning
methods that use "deep hooks" into the filesystem would make this sort
of cooperation difficult, he said.
The APIs are also designed to provide flexibility to application
authors. For example, the Site Engine API is not tied to the Tizen
Web Runtime or even to the system's HTTP stack. Thus, an application
that uses its own built-in HTTP proxy can still take advantage of the
CSF to scan URLs without re-implementing the scanner.
Ultimately, CSF is a framework that device makers will take
advantage of, each in its own way. Presumably commercial vendors will
offer virus scanning engines to interested OEMs, but consumers will
likely not see any of them until a Tizen product hits the market. The
flexible framework also seems designed to support HTTP-driven services
like downloadable media and game content, which are frequently
the most-cited examples of why companies want to see Tizen in devices
like smart TVs and car dashboards.
CSF is an open source contribution to the Tizen platform, although
one would reasonably expect McAfee to also develop scanning engines to
offer to device vendors and mobile providers. As the CSF begins to
take shape in products coming to market, it will be interesting to see
if there are also any open source scanning engines, either in the
Tizen reference code or produced by third parties. One would hope so,
since malware detection is a concern for everyone, not just commercial
device makers.
JavaScript app protection
In contrast to Pilluta's talk, Wang was not presenting a component
of the Tizen architecture; rather, he was showing the progress he has
made on a personal effort that he hopes will appeal to independent
application developers. The issue he tackled was protecting
JavaScript applications against reverse-engineering. While that
is not an issue for developers of open source apps, building tools to
simplify the process on an open platform like Tizen could have
implications further down the road. Wang is a developer for Intel
working on the Tizen platform, although this particular project is a
personal side-effort.
In the past, he said, JavaScript was primarily used for incidental
page features and other such low-value scripts, but today JavaScript
applications implement major functionality, and HTML5-driven platforms
like Tizen should offer developers a way to protect their code against
theft and reverse-engineering. There are a number of techniques
already in use that side-step the issue, such as separating the core
functionality out into a server-side component, or building the
business model around the value of user-contributed data. But these
approaches do not work for "pure" client-side JavaScript apps.
Most app developers rely on an obfuscation system to "minify"
JavaScript that they want to obscure from prying eyes. Obfuscation
removes easily-understood function and variable names, and changes the
formatting to make the delivered code difficult to understand. The most
popular obfuscator, he noted, was Yahoo's YUI Compressor (which has other
beneficial features like removing dead code), followed by the Google
Closure Compiler, and UglifyJS. But obfuscators still produce
JavaScript which is delivered to the client browser or web runtime and
can ultimately be reverse-engineered.
The other major approach found in practice today is encryption, in
which the app is downloaded by the device and placed in encrypted
storage by the installer. Typically either the initial download is
conducted over a secure channel (e.g., HTTPS) or the download is done
in the clear and the installation program encrypts the app when it is
installed. Both have weaknesses, Wang said. Someone can dump the
HTTP connection if it is unencrypted and intercept the app, but a
skilled attacker could also run a man-in-the-middle attack against
HTTPS. Ultimately, he concluded, there is always dumping from memory,
so encryption is an approach that will always get broken one way
or another.
Although there are a few esoteric approaches out there, such as writing
one's app in another language and then compiling it to JavaScript (a
practice Wang said was out of scope for the talk, since he was addressing
the concerns of JavaScript coders), most
people simply "lawyer up" and apply licensing terms that forbid
examining the app. That may not work in every jurisdiction, he said,
and even when it does, it is expensive.
Wang's experiment takes a different approach entirely: compiling
the JavaScript app to machine code, just like one does with a native
app. The technique works by exploiting the difference between a
platform's web runtime (which does not allow the user to inspect or
save HTML content) and the web browser. A developer can work in
JavaScript, then deploy the app as a binary. The platform would have
to support this approach, both in the installer and in the web
runtime, however, and developers would need to rebuild their apps for
each HTML5 platform.
Wang has implemented the technique as an experimental feature of node-webkit, his
app runtime derived from Chromium and Node.js. It compiles a
JavaScript app using the V8 JavaScript engine's "snapshot" feature.
Snapshots dump the engine's heap, and thus contain all of the created
objects and Just-In-Time (JIT) compiled functions. In Chromium, snapshots
are used to cache contexts for performance reasons; the node-webkit
compiler simply saves them. The resulting binaries can then be
executed by WebKit's JSC utility.
There are, naturally, limitations. V8 snapshots are created very
early in the execution process, so some DOM objects (such as
window) have not yet been created when the snapshot is
taken. On the wiki
entry for the feature, Wang suggests a few ways to work around
this issue. A second limitation is that the snapshot tool
will throw an error if the JavaScript app is too large. Wang suggests
splitting the app up if this limitation poses a practical problem.
Another limitation is that the resulting binary also runs
significantly slower than JavaScript executed in the runtime.
He has been exploring other techniques for extending the idea, such
as using the Crankshaft
optimizer. Crankshaft is an alternative to the JavaScript compiler
currently used in V8. At the moment, using Crankshaft's compiler can
generate code that runs faster, Wang said, but it takes significantly
longer to compile, and it requires "training Crankshaft on your code."
Wang has defined an additional field for the package.json
file that defines Tizen HTML5 applications; "snapshot" :
"snapshot.bin" can be used to point to compiled JavaScript apps
and test them with node-webkit. He is still in the process of working
out the API required to connect JSC to the Tizen web runtime,
however. The feature is not currently slated to become part of the
official Tizen platform.
Obfuscating JavaScript by any means is a controversial subject. To
many in the free software world, it is seen as a technique to prevent
users from studying and modifying the software on their systems.
Bradley Kuhn of the Software Freedom Conservancy lambasted it at SCALE 2013, for instance.
Then again, obfuscation is not required to make a JavaScript app
non-free; as the Free Software Foundation notes,
licensing can do that alone. Still, it is likely that compiling
JavaScript apps to machine code offers a tantalizing measure of
protection to quite a few proprietary software vendors, beyond the
attack-resistance of traditional obfuscation techniques.
Many users, of course, are purely pragmatic about mobile apps: they
use what is available, free software or otherwise. But as the FSF
points out, unobfuscated JavaScript,
while it may be non-free, can still be read and modified. Perhaps the
longer-term concern about obfuscation or compiling to machine code is
that a device vendor could automate the technique on its mobile app
store. But automated or manual, the prospect of building JavaScript compilation into Tizen
did appear to ruffle several feathers at Tizen Dev Con; audience
members asked about the project during the Q&A sections of several
later talks. Nevertheless, for the foreseeable future, Wang's effort
remains a side project of an experimental nature.
[The author wishes to thank the Linux Foundation for travel assistance to Tizen Dev Con.]
Comments (3 posted)
Brief items
2. If I add a phone to my account, will those calls also be monitored?
Once again, the answer is good news. If you want to add a child or any other family member to your Verizon account, their phone calls—whom they called, when, and the duration of the call—will all be monitored by the United States government, at no additional cost.
— "
US
President Barack Obama" in a FAQ for Verizon customers
Knowing how the government spies on us is important. Not only because so
much of it is illegal -- or, to be as charitable as possible, based on
novel interpretations of the law -- but because we have a right to
know. Democracy requires an informed citizenry in order to function
properly, and
transparency
and accountability are essential parts of that. That means knowing what
our government is doing to us, in our name. That means knowing that the
government is operating within the constraints of the law. Otherwise, we're
living in a police state.
We need whistle-blowers.
— Bruce Schneier
Only one explanation seems logical. The government is afraid of us -- you and me. They're terrified (no pun intended) that if we even knew the most approximate ranges of how many requests they're making, we would suspect significant abuse of their investigatory powers.
In the absence of even this basic information, conspiracy theories have flourished, which incorrectly assume that the level of data being demanded from Web services is utterly unfettered and even higher than reality -- and the government's intransigence has diverted people's anger inappropriately to those Web services. A tidy state of affairs for the spooks and their political protectors.
— Lauren Weinstein
Even assuming the U.S. government never abuses this data -- and there is no reason to assume that! -- why isn't the burgeoning trove more dangerous to keep than it is to foreswear? Can anyone persuasively argue that it's virtually impossible for a foreign power to
ever gain access to it? Can anyone persuasively argue that if they did gain access to years of private phone records, email, private files, and other data on millions of Americans, it wouldn't be hugely damaging?
Think of all the things the ruling class never thought we'd find out about the War on Terrorism that we now know. Why isn't the creation of this data trove just the latest shortsighted action by national security officials who constantly overestimate how much of what they do can be kept secret? Suggested rule of thumb: Don't create a dataset of choice that you can't bear to have breached.
— Conor Friedersdorf
Comments (15 posted)
New vulnerabilities
bzr: denial of service
| Package(s): | bzr |
| CVE #(s): | CVE-2013-2099, CVE-2013-2098 |
| Created: | June 7, 2013 |
| Updated: | September 10, 2013 |
| Description: |
From the Red Hat bug report:
A denial of service flaw was found in the way SSL module implementation of Python3, version 3 of the Python programming language (aka Python 3000), performed matching of the certificate's name in the case it contained many '*' wildcard characters. A remote attacker, able to obtain valid certificate with its name containing a lot of '*' wildcard characters could use this flaw to cause denial of service (excessive CPU consumption) by issuing request to validate such a certificate for / to an application using the Python's ssl.match_hostname() functionality. |
| Alerts: |
Comments (2 posted)
cgit: directory traversal
| Package(s): | cgit |
| CVE #(s): | CVE-2013-2117 |
| Created: | June 6, 2013 |
| Updated: | July 17, 2013 |
| Description: |
From the Red Hat Bugzilla entry:
Today I found a nasty directory traversal:
http://somehost/?url=/somerepo/about/../../../../etc/passwd
[...] Cgit by default is not vulnerable to this, and the vulnerability only
exists when a user has configured cgit to use a readme file from a
filesystem filepath instead of from the git repo itself. Until a
release is made, administrators are urged to disable reading the
readme file from a filepath, if currently enabled. |
| Alerts: |
Comments (none posted)
chromium-browser: multiple vulnerabilities
| Package(s): | chromium-browser |
| CVE #(s): | CVE-2013-2855, CVE-2013-2856, CVE-2013-2857, CVE-2013-2858, CVE-2013-2859, CVE-2013-2860, CVE-2013-2861, CVE-2013-2862, CVE-2013-2863, CVE-2013-2865 |
| Created: | June 11, 2013 |
| Updated: | June 12, 2013 |
| Description: |
From the Debian advisory:
CVE-2013-2855:
The Developer Tools API in Chromium before 27.0.1453.110 allows
remote attackers to cause a denial of service (memory corruption) or
possibly have unspecified other impact via unknown vectors.
CVE-2013-2856:
Use-after-free vulnerability in Chromium before 27.0.1453.110
allows remote attackers to cause a denial of service or possibly
have unspecified other impact via vectors related to the handling of
input.
CVE-2013-2857:
Use-after-free vulnerability in Chromium before 27.0.1453.110
allows remote attackers to cause a denial of service or possibly
have unspecified other impact via vectors related to the handling of
images.
CVE-2013-2858:
Use-after-free vulnerability in the HTML5 Audio implementation in
Chromium before 27.0.1453.110 allows remote attackers to cause
a denial of service or possibly have unspecified other impact via
unknown vectors.
CVE-2013-2859:
Chromium before 27.0.1453.110 allows remote attackers to bypass
the Same Origin Policy and trigger namespace pollution via
unspecified vectors.
CVE-2013-2860:
Use-after-free vulnerability in Chromium before 27.0.1453.110
allows remote attackers to cause a denial of service or possibly
have unspecified other impact via vectors involving access to a
database API by a worker process.
CVE-2013-2861:
Use-after-free vulnerability in the SVG implementation in Chromium
before 27.0.1453.110 allows remote attackers to cause a
denial of service or possibly have unspecified other impact via
unknown vectors.
CVE-2013-2862:
Skia, as used in Chromium before 27.0.1453.110, does not
properly handle GPU acceleration, which allows remote attackers to
cause a denial of service (memory corruption) or possibly have
unspecified other impact via unknown vectors.
CVE-2013-2863:
Chromium before 27.0.1453.110 does not properly handle SSL
sockets, which allows remote attackers to execute arbitrary code or
cause a denial of service (memory corruption) via unspecified
vectors.
CVE-2013-2865:
Multiple unspecified vulnerabilities in Chromium before
27.0.1453.110 allow attackers to cause a denial of service or
possibly have other impact via unknown vectors. |
| Alerts: |
Comments (none posted)
kde: weak passwords generated by PasteMacroExpander
| Package(s): | kde |
| CVE #(s): | CVE-2013-2120 |
| Created: | June 12, 2013 |
| Updated: | June 17, 2013 |
| Description: |
From the Red Hat bugzilla:
A security flaw was found in the way PasteMacroExpander of paste applet of kdeplasma-addons, a suite of additional plasmoids for KDE desktop environment, performed password generation / derivation for user provided string. An attacker could use this flaw to obtain plaintext form of such a password (possibly leading to their subsequent ability for unauthorized access to a service / resource, intended to be protected by such a password). |
| Alerts: |
Comments (6 posted)
kernel: multiple vulnerabilities
| Package(s): | kernel |
| CVE #(s): | CVE-2013-1935, CVE-2013-1943, CVE-2013-2017 |
| Created: | June 11, 2013 |
| Updated: | June 13, 2013 |
| Description: |
From the Red Hat advisory:
* A flaw was found in the way KVM (Kernel-based Virtual Machine)
initialized a guest's registered pv_eoi (paravirtualized end-of-interrupt)
indication flag when entering the guest. An unprivileged guest user could
potentially use this flaw to crash the host. (CVE-2013-1935, Important)
* A missing sanity check was found in the kvm_set_memory_region() function
in KVM, allowing a user-space process to register memory regions pointing
to the kernel address space. A local, unprivileged user could use this flaw
to escalate their privileges. (CVE-2013-1943, Important)
* A double free flaw was found in the Linux kernel's Virtual Ethernet
Tunnel driver (veth). A remote attacker could possibly use this flaw to
crash a target system. (CVE-2013-2017, Moderate)
Red Hat would like to thank IBM for reporting the CVE-2013-1935 issue and
Atzm WATANABE of Stratosphere Inc. for reporting the CVE-2013-2017 issue.
The CVE-2013-1943 issue was discovered by Michael S. Tsirkin of Red Hat. |
| Alerts: |
|
Comments (none posted)
libraw: code execution
| Package(s): | libraw |
CVE #(s): | CVE-2013-2126
|
| Created: | June 7, 2013 |
Updated: | July 31, 2013 |
| Description: |
From the Secunia advisory:
Two vulnerabilities have been reported in LibRaw, which can be exploited by malicious people to potentially compromise an application using the library.
1) A double-free error exists when handling damaged full-color images within Foveon and sRAW files.
2) An error during exposure correction can be exploited to cause a buffer overflow.
Successful exploitation may allow execution of arbitrary code. |
| Alerts: |
|
Comments (none posted)
mediawiki: insecure file uploading
| Package(s): | mediawiki |
CVE #(s): | CVE-2013-2114
|
| Created: | June 7, 2013 |
Updated: | July 22, 2013 |
| Description: |
From the Red Hat bug report:
MediaWiki user Marco discovered that security checks for file uploads were not being run when the file was uploaded in chunks through the API. This option has been available to users who can upload files since MediaWiki 1.19. |
| Alerts: |
|
Comments (none posted)
mod_security: denial of service
| Package(s): | mod_security |
CVE #(s): | CVE-2013-2765
|
| Created: | June 6, 2013 |
Updated: | July 2, 2013 |
| Description: |
From the Red Hat Bugzilla entry:
Fixed Remote Null Pointer DeReference (CVE-2013-2765). When the forceRequestBodyVariable action is triggered and an unknown Content-Type is used,
mod_security will crash trying to manipulate msr->msc_reqbody_chunks->elts; however, msr->msc_reqbody_chunks is NULL. (Thanks Younes JAAIDI) |
| Alerts: |
|
Comments (none posted)
PackageKit: only allow patches for regular updates
| Package(s): | PackageKit |
CVE #(s): | CVE-2013-1764
|
| Created: | June 10, 2013 |
Updated: | June 12, 2013 |
| Description: |
From the openSUSE advisory:
The PackageKit zypp backend was fixed to only allow patches
to be updated. Otherwise a regular user could install new
packages or even downgrade older packages to ones with
security problems. |
| Alerts: |
|
Comments (none posted)
php: code execution
| Package(s): | php |
CVE #(s): | CVE-2013-2110
|
| Created: | June 11, 2013 |
Updated: | June 24, 2013 |
| Description: |
From the Slackware advisory:
A heap-based overflow in the quoted_printable_encode() function could be used by a remote attacker to crash PHP or execute code as the 'apache' user. |
| Alerts: |
|
Comments (none posted)
pki-tps: two vulnerabilities
| Package(s): | pki-tps |
CVE #(s): | CVE-2013-1885
CVE-2013-1886
|
| Created: | June 6, 2013 |
Updated: | June 12, 2013 |
| Description: |
From the Red Hat bugzilla entries [1, 2]:
CVE-2013-1885: It was reported that Certificate System suffers from XSS flaws in the /tus/ and /tus/tus/ URLs, such as:
GET /tus/tus/%22%2b%61%6c%65%72%74%28%34%38%32%36%37%29%2b%22
or
GET /tus/%22%2b%61%6c%65%72%74%28%36%31%34%35%32%29%2b%22
which will in turn output something like:
<!--
var uriBase = "/tus/"+alert(85384)+";
var userid = "admin";
This was reported against Certificate System 8.1 and may also affect Dogtag 9 and 10.
CVE-2013-1886: It was reported that Certificate System suffers from a format string injection flaw when viewing certificates. This could allow a remote attacker to crash the Certificate System server or, possibly, execute arbitrary code with the privileges of the user [running] the service (typically run as an unprivileged user, such as pkiuser). |
| Alerts: |
|
Comments (none posted)
pymongo: denial of service
| Package(s): | pymongo |
CVE #(s): | CVE-2013-2132
|
| Created: | June 11, 2013 |
Updated: | July 8, 2013 |
| Description: |
From the Debian advisory:
Jibbers McGee discovered that pymongo, a high-performance schema-free
document-oriented data store, is prone to a denial-of-service
vulnerability.
An attacker can remotely trigger a NULL pointer dereference causing MongoDB
to crash. |
| Alerts: |
|
Comments (none posted)
rubygem-passenger: insecure temp files
| Package(s): | rubygem-passenger |
CVE #(s): | CVE-2013-2119
|
| Created: | June 11, 2013 |
Updated: | July 10, 2013 |
| Description: |
From the Red Hat bugzilla:
Michael Scherer reported that the passenger ruby gem, when used in standalone mode, does not use temporary files in a secure manner. In the lib/phusion_passenger/standalone/main.rb's create_nginx_controller function, passenger creates an nginx configuration file insecurely and starts nginx with that configuration file:
@temp_dir = "/tmp/passenger-standalone.#{$$}"
@config_filename = "#{@temp_dir}/config"
If a local attacker were able to create a temporary directory that passenger uses and supply a custom nginx configuration file they could start an nginx instance with their own configuration file. This could result in a denial of service condition for a legitimate service or, if passenger were executed as root (in order to have nginx listen on port 80, for instance), this could lead to a local root compromise. |
| Alerts: |
|
Comments (none posted)
samba: multiple vulnerabilities
| Package(s): | samba |
CVE #(s): | |
| Created: | June 10, 2013 |
Updated: | June 12, 2013 |
| Description: |
From the openSUSE advisory:
- - Add support for PFC_FLAG_OBJECT_UUID when parsing
packets; (bso#9382).
- - Fix "guest ok", "force user" and "force group" for guest
users; (bso#9746).
- - Fix 'map untrusted to domain' with NTLMv2;(bso#9817).
- - Fix crash bug in Winbind; (bso#9854).
- - Fix panic in nt_printer_publish_ads; (bso#9830).
|
| Alerts: |
|
Comments (none posted)
subversion: denial of service
| Package(s): | subversion |
CVE #(s): | CVE-2013-1968
CVE-2013-2112
|
| Created: | June 10, 2013 |
Updated: | June 28, 2013 |
| Description: |
From the Debian advisory:
CVE-2013-1968:
Subversion repositories with the FSFS repository data store format
can be corrupted by newline characters in filenames. A remote
attacker with a malicious client could use this flaw to disrupt the
service for other users using that repository.
CVE-2013-2112:
Subversion's svnserve server process may exit when an incoming TCP
connection is closed early in the connection process. A remote
attacker can cause svnserve to exit and thus deny service to users
of the server. |
| Alerts: |
|
Comments (none posted)
wireshark: denial of service
| Package(s): | wireshark |
CVE #(s): | CVE-2013-3561
|
| Created: | June 7, 2013 |
Updated: | June 12, 2013 |
| Description: |
From the CVE database entry:
Multiple integer overflows in Wireshark 1.8.x before 1.8.7 allow remote attackers to cause a denial of service (loop or application crash) via a malformed packet, related to a crash of the Websocket dissector, an infinite loop in the MySQL dissector, and a large loop in the ETCH dissector. |
| Alerts: |
|
Comments (none posted)
wireshark: multiple vulnerabilities
| Package(s): | wireshark |
CVE #(s): | CVE-2013-4074
CVE-2013-4081
CVE-2013-4083
|
| Created: | June 12, 2013 |
Updated: | September 30, 2013 |
| Description: |
From the CVE entries:
The dissect_capwap_data function in epan/dissectors/packet-capwap.c in the CAPWAP dissector in Wireshark 1.6.x before 1.6.16 and 1.8.x before 1.8.8 incorrectly uses a -1 data value to represent an error condition, which allows remote attackers to cause a denial of service (application crash) via a crafted packet. (CVE-2013-4074)
The http_payload_subdissector function in epan/dissectors/packet-http.c in the HTTP dissector in Wireshark 1.6.x before 1.6.16 and 1.8.x before 1.8.8 does not properly determine when to use a recursive approach, which allows remote attackers to cause a denial of service (stack consumption) via a crafted packet. (CVE-2013-4081)
The dissect_pft function in epan/dissectors/packet-dcp-etsi.c in the DCP ETSI dissector in Wireshark 1.6.x before 1.6.16, 1.8.x before 1.8.8, and 1.10.0 does not validate a certain fragment length value, which allows remote attackers to cause a denial of service (application crash) via a crafted packet. (CVE-2013-4083) |
| Alerts: |
|
Comments (none posted)
Page editor: Jake Edge
Kernel development
Brief items
The current development kernel is 3.10-rc5, released on June 8. In the announcement Linus
Torvalds made it clear he was not completely pleased with the patches he
was getting this
late in the cycle: "Guys, guys, guys. I'm going to have to start
cursing again unless you stop sending me non-critical stuff. So the next
pull request I get that has "cleanups" or just pointless churn, I'm going
to call you guys out on, and try to come up with new ways to insult you,
your mother, and your deceased pet hamster."
Stable updates:
Five stable kernels were released on June 7: 3.9.5, 3.4.48, and
3.0.81 by Greg Kroah-Hartman;
from Canonical's extended stable trees Kamal Mostafa released 3.8.13.2 and Luis Henriques released 3.5.7.14.
The 3.9.6, 3.4.49, and 3.0.82 stable kernels are currently under
review. They can be expected June 13 or soon after.
Comments (2 posted)
If companies are going to go off and invent the square wheel, and
that makes *them* suffer the loss of being able to merge back into
the mainline kernel, thereby making *their* job of moving forward
with their kernel versions much harder, then yes, we *are* happy.
—
Russell King
Randomly not being able to connect AT ALL to wireless networks is
not a valid "rate control" model.
—
Linus Torvalds
First of all, we really need to stop thinking about choosing [CPU]
frequency (at least for x86). that concept basically died for x86
6 years ago.
—
Arjan van de Ven
Comments (5 posted)
Kernel development news
June 12, 2013
This article was contributed by Andrew Shewmaker
A skiplist is composed of a hierarchy of ordered linked lists, where each higher
level contains a sparser subset of the list below it. In
part
one, I described the basic idea of a skiplist, a little history of
various attempts to use it in the Linux kernel, and Chris Mason's new
cache-friendly skiplist for index ranges.
This article will continue with a description of the current state of Chris's skiplist API and his
future plans for it. I'll also discuss the performance of skiplists and rbtrees
in a simple RAM test, as well as Chris's more interesting IOMMU comparison.
Skiplist API
A skiplist can be declared and initialized to an empty state with lines like:
#include <linux/skiplist.h>
struct sl_list list;
sl_init_list(&list, GFP_ATOMIC);
Once the list exists, the next step is to populate it with data. As is
shown in the data structure diagram, each
structure to be placed in the list should embed an sl_slot
structure; pointers to this embedded structure are used with the skiplist
API.
Insertion into the skiplist requires the programmer to get a "preload
token" — skiplist_preload() ensures that the necessary
memory is available and
disables preemption. With the token in hand, it's possible to actually
insert the item, then re-enable preemption. Preloading helps
avoid the need
for atomic allocations and also minimizes the time spent inside a leaf's lock
during insertion. The preload function takes a pointer to a skiplist and a "get
free page" mask describing the type of allocation to be performed, and it
returns an integer token to be used later:
int skiplist_preload(struct sl_list *list, gfp_t gfp_mask);
Note that preemption is disabled by skiplist_preload() and must
not be re-enabled during insertion because the function is holding an RCU read lock
and working with per-CPU data structures.
The function that actually adds an item to the list,
skiplist_insert(), is called with that list, a slot to be inserted, and
a token returned by skiplist_preload():
int skiplist_insert(struct sl_list *list, struct sl_slot *slot,
int preload_token);
Here's an example insertion into a skiplist:
int preload_token, ret;
preload_token = skiplist_preload(skiplist, GFP_KERNEL);
if (preload_token < 0)
return preload_token;
ret = skiplist_insert(skiplist, slot, preload_token);
preempt_enable();
Deletion requires only one
function call, though it is implemented in two phases if a leaf becomes empty.
In that case, the leaf is marked "dead," then it is unlinked from the skiplist
level by level. In either case, skiplist_delete() returns a pointer to the slot it removed from
the list.
struct sl_slot *skiplist_delete(struct sl_list *list, unsigned long key,
unsigned long size);
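As a minimal sketch (reusing the mystruct embedding and the free_mystruct_mem() helper from the teardown example below, and assuming a NULL return when no matching slot is found), a deletion might look like:
    struct sl_slot *slot;
    struct mystruct *mystruct;

    /* Remove the extent starting at 'key' with length 'size'. */
    slot = skiplist_delete(skiplist, key, size);
    if (slot) {
            /* Recover and free the structure embedding the returned slot. */
            mystruct = sl_slot_entry(slot, struct mystruct, slot);
            free_mystruct_mem(mystruct);
    }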
Adding or removing a key to/from an existing leaf is simple and only requires a
lock at the leaf. However, if a leaf is created or destroyed, then more locking
is required. Leaves with higher levels require locks to be taken on neighboring
nodes all the way down to level zero so that everything can be re-linked
without a neighbor being deleted out from under them. The list of affected
leaves is tracked in a temporary sl_node list referred to as a
cursor.
(Chris is reworking his code to get
rid of cursors.)
The best-case scenario is a modification at level zero where only
a couple of locks are required. Both the preallocation and the insertion code are
biased in favor of creating a level-zero leaf. Regardless, the locking is only
required for a small window of time.
Unlike an rbtree, rebalancing of the skiplist is not required, even when
simultaneous insertions and deletions are being performed in different
parts of the skiplist.
A specialized insertion function is provided that finds a free index range
in the skiplist that is aligned and of a given size. This isn't required by
filesystems, but Chris implemented it so that he could directly
compare rbtrees to skiplists in the IOMMU code. The IOMMU requires this
functionality because each PCIE device's domain requires an aligned
range of memory addresses.
Calls to
skiplist_insert_hole() take a hint of where a hole might be inserted,
and must be retried with a new hint if the return value is -EAGAIN.
That error return happens when simultaneous holes are being created and the
one you hinted at
was good, but was stolen before you could use it. On successful insertion,
the slot passed in is updated
with the location of the hole.
int skiplist_insert_hole(struct sl_list *list, unsigned long hint,
unsigned long limit, unsigned long size, unsigned long align,
struct sl_slot *slot, gfp_t gfp_mask);
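Putting that together, a caller might simply loop until the hole is placed. This is only a sketch based on the description above; the use_hole() call and the surrounding variables are invented for illustration:
    int ret;

    do {
            /* Ask for an aligned hole of 'size' bytes between 'hint' and 'limit'. */
            ret = skiplist_insert_hole(skiplist, hint, limit, size, align,
                                       slot, GFP_KERNEL);
            /* -EAGAIN means the hinted hole was stolen by a concurrent
             * insertion; a real caller would pick a new hint before retrying. */
    } while (ret == -EAGAIN);

    if (ret == 0)
            use_hole(slot);    /* the slot now records where the hole was placed */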
Tearing down a whole skiplist requires a fair amount of work. First free
the structures embedding the slots of
each leaf, then use sl_free_leaf(), and finally, zero the pointers in
the head of the skiplist. Wrappers around container_of() for obtaining
the leaf embedding a node or the structure embedding a slot are provided by
sl_entry(ptr) and sl_slot_entry(ptr, type, member),
respectively. Comments in the code indicate future plans to add skiplist
zeroing
helpers, but for now you must roll your own as Chris did for his IOMMU patch.
Here's a generic example of destroying a skiplist:
struct sl_node *p;
struct sl_leaf *leaf;
struct sl_slot *slot;
struct mystruct *mystruct;
sl_lock_node(skiplist->head);
p = skiplist->head->ptrs[0].next;
while (p) {
leaf = sl_entry(p);
for (i = 0; i < leaf->nr; i++) {
slot = leaf->ptrs[i];
mystruct = sl_slot_entry(slot, struct mystruct, slot);
free_mystruct_mem(mystruct);
}
p = leaf->node.ptrs[0].next;
sl_free_leaf(leaf);
}
memset(skiplist->head->ptrs, 0, sl_node_size(SKIP_MAXLEVEL));
sl_unlock_node(skiplist->head);
Chris considered including slot iterators equivalent to rb_next() and
rb_prev(), but decided against it because of the overhead involved in
validating a slot with each call. Instead, skiplist_next() and
skiplist_prev() are leaf iterators that allow a caller to more
efficiently operate on slots in bulk. Chris hasn't posted the updated API
yet, but it seems likely that the iterators will resemble the existing
sl_next_leaf() and
friends.
Calls to
sl_first_leaf() and sl_last_leaf() return pointers to the
first and last entries of the skiplist. The sl_next_leaf() call is a
little different in that you must provide it with an sl_node (embedded
in your current leaf), and since each node potentially has many next entries, you
must also provide the level l you want to traverse.
struct sl_leaf *sl_first_leaf(struct sl_list *list);
struct sl_leaf *sl_last_leaf(struct sl_list *list);
struct sl_leaf *sl_next_leaf(struct sl_list *list, struct sl_node *p, int l);
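For example, a bulk walk over every leaf could follow level zero, much as the teardown code above does; in this sketch the process_slot() helper is hypothetical and locking is omitted:
    struct sl_leaf *leaf;
    int i;

    for (leaf = sl_first_leaf(skiplist); leaf;
         leaf = sl_next_leaf(skiplist, &leaf->node, 0)) {
            /* Each leaf holds leaf->nr slots in leaf->ptrs[]. */
            for (i = 0; i < leaf->nr; i++)
                    process_slot(leaf->ptrs[i]);
    }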
Since this skiplist implementation focuses on index ranges (or extents) defined
by key and size parameters, it can provide search functions. This is in contrast
to rbtrees—they are more diverse, so users must roll their own search
functions. Each of the skiplist search functions needs to be passed a pointer to
the skiplist, the key you are looking for, and the slot size (the number of
extents in a leaf). If successful, they return a pointer to the slot matching
the key.
struct sl_slot *skiplist_lookup(struct sl_list *list, unsigned long key,
unsigned long size);
struct sl_slot *skiplist_lookup_rcu(struct sl_list *list, unsigned long key,
unsigned long size);
The first, skiplist_lookup(), is appropriate for when a skiplist is
experiencing high read/write contention. It handles all the locking for you.
It protects the skiplist with read-copy-update
(RCU) while it finds the correct leaf and then it protects the leaf with a
spinlock during a binary search to find the slot. If no slot corresponds to the
key, then a NULL pointer is returned.
If skiplist contention is low or you need more control, then use the second
variant. Before calling skiplist_lookup_rcu(), you must call
rcu_read_lock() and you must take care of details such as reference
counting yourself. The search for the leaf uses the same helper function
as skiplist_lookup(), but the leaf spinlock is not held. Instead, it
depends on the skiplist's RCU read lock being held to also protect the slots in
a leaf while it performs a sequential search. This search is sequential because
Chris does not do the copy part of RCU. He does order the operations of
insertion/deletion to try to make the sequential search safe, and that should
usually work. However, it might not return the slot of interest, so it is
the responsibility of the caller
to verify the key of the returned slot, and then call
skiplist_lookup_rcu() again if the returned slot's key doesn't match
the key being searched for.
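A hedged sketch of that low-contention pattern follows; the slot->key field and the take_reference() helper are stand-ins, since the article only says that the caller must verify the key and handle reference counting itself:
    struct sl_slot *slot;

    rcu_read_lock();
    do {
            slot = skiplist_lookup_rcu(skiplist, key, size);
            /* A racing insertion or deletion may return the wrong slot,
             * so verify the key and retry on a mismatch. */
    } while (slot && slot->key != key);

    if (slot)
            take_reference(slot);    /* reference counting is the caller's job */
    rcu_read_unlock();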
Chris elaborated on his future plans for the API in a private email:
In terms of changes coming to the patches, the biggest will be in the
insert code. Right now skiplist_insert does the search, cursor
maintenance, and the insert, but that won't work for XFS because they
need more control over the
EEXIST condition.
It'll get split out into search and insert steps the caller can control,
and you'll be able to call insert with just a locked leaf from any
level...
The searching API will also improve, returning both the leaf and the
slot. This allows skiplist versions of rb_next() and rb_prev().
The skiplist code also indicates that there is work to be done to make lockdep
understand Chris's skiplist locking. It needs to be taught that holding multiple
locks on the same level of a skiplist is allowed as long as they are taken left
to right.
Testing
In addition to the IOMMU comparison between rbtrees and skiplists that
Chris posted
numbers for, his patch also includes a simple RAM-only comparison in the
form of a
kernel module called skiplist_test.
I tested 100,000 items for 100,000 rounds with multiple numbers of threads.
This table shows the results:
| ADT | Threads | Fill Time (ms) | Check Time (ms) | Delete Time (ms) | Avg. Thread Time (s) |
| rbtree | 1 | 37 | 9 | 12 | 0.752 |
| skiplist-rcu | 1 | 18 | 15 | 23 | 2.615 |
| rbtree | 2 | 36 | 8 | 12 | 2.766 |
| skiplist-rcu | 2 | 19 | 19 | 27 | 2.713 |
| rbtree | 4 | 36 | 11 | 10 | 6.660 |
| skiplist-rcu | 4 | 23 | 24 | 21 | 3.161 |
These results show skiplists beating rbtrees in fill time, but losing on
check and delete times. The skiplist average thread time is only slightly
better with two threads, and beats rbtree soundly with four threads (they
take half the time). However, rbtree wins the single threaded case, which
surprises Chris because it doesn't match what he sees in user-space
testing. He told me, "Most of the difference is the cost of calling
spin_lock (even single threaded)."
The more interesting numbers are from Chris's IOMMU comparison.
Even though he is mostly interested in using skiplists for Btrfs extents, he
chose to use the IOMMU because it is easier to isolate the performance of the
two data structures, which makes it both easier for non-Btrfs people to
understand and more meaningful to them. He also says, "... with the IOMMU,
it is trivial to consume 100% system time on the rbtree lock."
The rbtree lock is, in effect, a global lock held once at the start and once at
the end of an IO.
Chris kept the basic structure of the IOMMU code so that he could compare
skiplists to rbtrees. He was not trying to design a better IOMMU that
looked for free ranges of addresses differently or fix the IOMMU contention,
though he told me he would work with David Woodhouse on a proper solution that
tracks free extents later this year.
His benchmarks were run on a single socket server with two SSD cards.
He used a few FIO jobs doing
relatively large (20MB) asynchronous/direct IOs with 16
concurrent threads and 10 pending IOs each (160 total).
Here are his results for streaming and random writes:
Streaming writes
IOMMU off 2,575MB/s
skiplist 1,715MB/s
rbtree 1,659MB/s
Not a huge improvement, but the CPU time was lower.
[...]
16 threads, iodepth 10, 20MB random writes
IOMMU off 2,548MB/s
skiplist 1,649MB/s
rbtree 33MB/s
The existing rbtree-based IOMMU slows streaming writes down to 64.4% of the
maximum, and the skiplist's throughput is slightly better at 66.6% while
using less CPU time. Evidently the skiplist's advantages in concurrency and
in maintaining a balanced overall structure only give it a modest advantage
in the streaming write case. However, random writes cause rbtree
performance to only achieve 1.3% of the maximum throughput. In this case, a skiplist fares much better, dropping only to 64.7% of the maximum because different threads can hold locks simultaneously while in different parts of the skiplist and it doesn't need to go through a costly rebalancing operation like the rbtree.
16 threads, iodepth 10, 20MB random reads
IOMMU off 2,861MB/s (mostly idle)
skiplist 2,484MB/s (100% system time)
rbtree 99MB/s (100% system time)
... lowering the thread count did improve the rbtree performance, but the
best I could do was around 300MB/s ...
Reads are easier than writes, and we could expect streaming read results to all be close and relatively uninteresting. Certainly both the rbtree and skiplist do better at random reads than random writes. In fact, the skiplist achieves higher throughput for random reads than it does for streaming writes although it has to work hard to do so. And in case anyone thought the thread count was particularly unfair for rbtree in these tests, Chris points out that the best he got for random IOs with rbtree was around 300MB/s. That's still only 10% of the maximum throughput. Furthermore, Chris noted that all of the CPU time spent in the skiplist was in skiplist_insert_hole(), which isn't optimized.
In a recent discussion on the Linux filesystems mailing list, Mathieu
Desnoyers proposed another data structure that he is calling
RCU Judy
arrays. They can't be compared with skiplists just yet since the Judy
arrays are
only implemented in user space so far, but the competition between the two
ideas should improve them both.
Even though there are plenty of opportunities for refinement, this is a
promising start for a cache-friendly skiplist for the Linux kernel. It should
provide better performance for any subsystem that has high levels of contention
between concurrent accesses of its rbtrees:
various filesystem indexes, virtual memory areas (VMAs), the high-resolution
timer code, etc. CPU schedulers will probably not see any benefit from skiplists
because only one thread is making the scheduling decision, but perhaps
multiqueue schedulers for the network or block layer might in the case where
they have one queue per NUMA node.
Comments (3 posted)
At LinuxCon
Japan, Yasuaki Ishimatsu of Fujitsu talked about the status of memory
hotplug, with a focus on what still needs to be done to fully support both
hot adding and hot removing memory. If a memory device is broken in a
laptop or desktop, you can just replace that memory, but for servers, especially
ones that need to stay running, it is more difficult. In addition,
having a way to add and remove memory would allow for dynamic
reconfiguration on systems where the hardware has been partitioned into two
or more virtual machines.
The focus of the memory hotplug work is for both scenarios: broken memory
hardware and dynamic reconfiguration. Memory hotplug will be supported in
KVM, Ishimatsu said. It is currently supported by several operating
systems, but Linux does not completely support it yet. Fixing that is the
focus of this work.
There are two phases to memory hotplug: physically adding or removing
memory (hot add or hot remove) and logically changing the amount of memory
available to the system (onlining or offlining memory). Both phases have
to be completed before Linux can use any new memory, and taking the memory
offline (so that Linux is no longer using it) is required before it can be
removed.
The memory management subsystem manages physical memory by using two
structures, he said. The page tables hold a direct mapping for virtual to
physical addresses. The virtual memory map manages page structures. In
order to offline memory, any data needs to be moved out of the memory and
those data structures need to be updated. Likewise, when adding memory,
new page table and
virtual memory map entries must be added.
Pages are managed in zones and, when using the sparse memory model that is
needed for memory hotplug systems, zones are broken up into sections that are
128M in size. Sections can be switched from online to offline and vice
versa using the /sys/devices/system/memory/memoryX/state file.
Echoing offline or online into that file changes the state of the
pages in that section to unusable or usable, respectively.
In the 3.2 kernel, hot adding memory and onlining it were fully
supported. Offlining memory was supported with limitations, and hot
removing it was not supported at all. Work started in July 2012 to remove
the offline limitations and to add support for hot remove, Ishimatsu said.
The work for hot remove has been merged for the 3.9 kernel. It will invalidate
page table and virtual memory map entries that correspond to the memory
being removed. But, since the memory must be taken offline before it is
removed, the limitations on memory offline still make it impossible to
remove arbitrary memory hardware from the system.
When memory that is to be offlined has data in it, that data is migrated to
other memory in the system. But the only pages that are migratable this
way are the page cache and anonymous pages, which are known as "movable"
pages. If the memory contains non-movable memory, which Ishimatsu called
"kernel memory",
the section cannot be offlined.
There are two ways to handle that problem that are being considered. The
first is to support moving kernel memory when offlining pages
that contain it. The advantages to that are that all memory can be
offlined and there is no additional performance impact for NUMA systems
since there are no restrictions on the types of allocations that can be
made. On the
downside, though, the kernel physical to virtual address relationship will
need to change completely. The other alternative is to make all of a
node's memory movable, which would reuse the existing movable memory
feature, but means that only page cache and anonymous pages can be stored
there, which will impact the performance of that NUMA node.
Ishimatsu said that he prefers the first solution personally, but, as a
first step they are implementing the second: creating a node that consists
only of movable memory. Linux has the idea of a movable zone
(i.e. ZONE_MOVABLE), but zones of that type are not created
automatically. If a node consists only of movable memory, all of it can be
migrated elsewhere so that the node can be taken offline.
A new boot option, movablecore=acpi, is under development that will use
the memory affinity structure in the ACPI static resource affinity table
(SRAT) to choose which nodes will
be constructed of movable memory. The existing use for
movablecore allows setting aside a certain amount of memory that
will be
movable in the system, but that memory is spread evenly across all of the nodes
rather than being concentrated only on the nodes of interest. The "hotpluggable"
bit for a node in the SRAT will be used to choose the target nodes in the
new mode.
Using the online_movable flag to the sysfs memory state
file (rather than just online) allows an administrator to tell the
system to make that memory movable. Without that, the onlined memory is
treated as ZONE_NORMAL, so it may contain kernel memory and thus
not be able to be offlined. The online_movable feature was merged
for 3.8. That reduces the limitations on taking memory offline, but there
is still work to do.
Beyond adding the movablecore=acpi boot option (and possibly a
vm.hotadd_memory_treat_as_movable sysctl), there are some other
plans. Finding a way to put the page tables and virtual memory map into the
hot-added memory is something Ishimatsu would like to see, because it would
help performance on that node, but would not allow that memory to be
offlined unless those data structures can be moved. He is thinking about
solutions for
that. Migrating vmalloc() data to other nodes when offlining a
node is another feature under consideration.
Eventually, being able to migrate any kernel memory out of a node is
something he would like to see, but solutions to that problem are still somewhat
elusive. He encouraged those in attendance to participate in the
discussions and to help find solutions for these problems.
[I would like to thank the Linux Foundation for travel assistance to Tokyo
for LinuxCon Japan.]
Comments (7 posted)
By Jake Edge
June 12, 2013
While three kernel internships for women were originally
announced in late April, the size of the program has more than doubled
since then. Seven internships have been established for kernel work
through the Outreach Program for
Women (OPW); each comes with a $5000 stipend and a $500 travel grant. The
program officially kicks off on June 17, but the application process
already brought in several hundred patch submissions from eighteen
applicants, 137 of which were
accepted into the staging and Xen trees—all in thirteen days.
The program was initiated by the Linux Foundation, which found sponsors for
the first three slots, but Intel's Open Source Technology Center added
three more
while the OPW itself came up with funding for another. The OPW has
expanded well beyond its GNOME project roots, with eighteen different
organizations (e.g. Debian, KDE, Mozilla, Perl, Twisted, and many more)
participating in this round.
The program pairs the interns with a mentor from
a participating project to assist the intern with whatever planned work
she has
taken on for the three months of each program round. OPW is patterned
after the Google Summer of Code project, but is not only for students and
programmers as other kinds of projects (and applicants) are explicitly
allowed. As the name would imply, it also restricts applicants to those
who self-identify as a woman.
The kernel effort has been guided by Sarah Sharp, who is a USB 3.0 kernel
hacker for Intel. She is also one of the mentors for this round. In late
May, she put together a blog
post that described the application process and the patches it brought
in. Sharp filled us in on the chosen interns. In addition, most of
the patches accepted can be seen in her cherry-picked kernel
git tree.
The interns
Sharp will be mentoring Ksenia (Xenia) Ragiadakou who will be working on
the USB 3.0 host driver. Ragiadakou is currently studying for her bachelor's degree in
computer science at the University of Crete in Greece. In addition to her
cleanup patches for the rtl8192u wireless staging driver,
Ragiadakou has already found a bug
in Sharp's host controller driver.
Two of the interns will be working on the Xen subsystem of the kernel with
mentors Konrad Wilk of Oracle and Stefano Stabellini of Citrix. They are Lisa
T. Nguyen, who received a bachelor's degree in computer science from the
University of Washington in 2007, and Elena Ufimtseva, who got a master's
degree in
computer science from St. Petersburg University of Information
Technologies in 2006. Nguyen did several cleanup
patches for Xen (along with various other cleanups) as part of the
application process, while Ufimtseva focused on cleanups in the ced1401
(Cambridge Electronics 1401 USB device) driver in staging.
Lidza Louina will be working with Greg Kroah-Hartman as a mentor on further
cleanups in staging drivers. She was working on a bachelor's degree in
computer science at the
University of Massachusetts but had to take time off to work full-time.
Her contributions were to the csr wireless driver in the staging tree.
Tülin İzer is working on parallelizing the x86 boot process with mentor PJ
Waskiewicz of Intel. She is currently pursuing a bachelor's
degree in computer engineering at Galatasaray University in Istanbul,
Turkey. Her application included fixes for several staging drivers.
Two other Intel-mentored interns are in the mix: Hema Prathaban will be
working with Jacob Pan on an Ivy Bridge temperature sensor driver, while
Laura Mihaela Vasilescu will be working on Intel Ethernet drivers, mentored
by Carolyn Wyborny and Anjali Singhai. Prathaban graduated in 2011 from KLN College
of Engineering in India with a bachelor's degree in computer science. She
has been a full-time mother for the last year, so the internship provides
her a way to get back into the industry. Vasilescu is a master's student at
the University of Politehnica of Bucharest, Romania and is also the student
president of ROSEdu, an organization
for Romanian open source education. Both did a number of patches; Prathaban
in the staging tree (including fixing a bug
in one driver) and Vasilescu in Intel Ethernet drivers.
Getting started
As with many budding kernel developers, most of the applicants' patches
were to various staging drivers. There was a short application window as
the kernel portion didn't get announced until a little under two weeks before
the deadline. But that didn't seem to slow anything down as there were 41
applicants for the internships, with eighteen submitting patches and eleven
having those patches accepted into the mainline.
That level of interest—and success—is partly attributable to a first
patch tutorial that she wrote, Sharp said.
The tutorial
helps anyone get started with kernel development from a fresh Ubuntu 12.04
install. It looks at setting up email, getting a kernel tree, using git,
building the kernel, creating a patch, and more. The success was also due to strong
applicants and mentors that were "patient and encouraging",
she said.
The kernel OPW program was mentioned multiple times at the recently held
Linux Foundation conferences in Japan as a helpful step toward making the
gender balance of kernel developers better represent the world we live in
(as Dirk Hohndel put it). It is also nice to see the geographical diversity
of the interns, with Asia, Europe, and North America all represented.
Hopefully South America, Africa, and Oceania will appear in follow-on
rounds of the program—Antarctica may not make the list for some time to come.
Another round of the OPW, including kernel internships, is planned for
January through March 2014 (with application deadlines in December). The
program is seeking more interested projects, mentors, and financial backers
for the internships. While there are certainly critics of these types of
efforts, they have so far proved to be both popular and effective. Other
experiments, using different parameters or criteria, are definitely
welcome, but reaching out and making an effort to bring more women into the
free-software fold is something that will hopefully be with us for some
time—until that hoped-for day when it isn't needed at all anymore.
Comments (2 posted)
Patches and updates
Kernel trees
Core kernel code
Development tools
Device drivers
Filesystems and block I/O
Memory management
Networking
Architecture-specific
Security-related
Virtualization and containers
Miscellaneous
Page editor: Jake Edge
Distributions
Tizen is intended to serve as a base Linux distribution for
consumer electronics products, from phones to automobile dash systems,
all built and sold by different manufacturers—yet offering a
consistent set of application-level APIs. Consequently, one of
the problems the project clearly needs to address is assessing the
compliance of products marketed as Tizen offerings. At the 2013 Tizen
Developer Conference in San Francisco, several sessions examined
the compliance program and the testing process involved.
First, Intel's Bob Spencer addressed the goals and
scope of the compliance program. Perhaps the biggest distinction he
made was that only hardware products would be subject to any
compliance testing; applications will not need to be submitted for
tests. That said, applications will need to conform to packaging
and security guidelines, but as he put it "acceptance into the Tizen
app store means success" on that front. The project is working on a
tool to flag build and packaging errors for app developers, but it is
not ready for release.
As for hardware, Spencer said, companies developing a Tizen-based
product will not need to send in physical devices. Instead, they will
need to send the results of the compliance test suite to the Tizen
Association, along with a "branding request" for approval of the use
of the Tizen trademark.
In broad strokes, he said, the compliance model consists of a set
of specifications and the test suite that checks a device against
them. The specification distinguishes between requirements (i.e.,
"MUST" statements) and recommendations ("SHOULD" statements). There
is a common core specification that applies to all Tizen devices, plus
separate "profile" specifications that address particular device
categories. Each includes a combination of hardware and system software
features. To get the compliance seal of approval, a device must
pass both the common specification plus one or more device class
profiles.
The "or more" requirement might allow a device to
qualify as both a tablet (which falls under the "mobile" profile) and
as a "convertible" (which, sadly, is not part of the automotive profile,
but rather the "clamshell" profile also used for laptops).
Interestingly enough, devices will not be allowed to qualify for Tizen
compliance by meeting only the common core specification. At present,
the common core specification and the "mobile" profile have been published
for the Tizen 2.1 release as a "public draft." Spencer said that the
final 2.1 specification is expected by the end of June.
The draft
[PDF] does not separate out the common core and mobile profile
requirements into distinct sections, which the site says will be done
only once there are multiple device profiles published. On the
compliance discussion list, Mats Wichmann said
that this was due to the need to get a mobile profile specification out
the door.
Spencer provided an overview of the specification in his session.
He described the hardware requirements as being designed for
flexibility, supporting low-end feature phones up to high-end
smartphones, tablets from simple e-readers on up, and perhaps even
watches (which, incidentally, marked the first mention of Tizen
powered watches I have encountered). The list includes 512MB of RAM,
1GB of storage, at least one audio output, some form of Internet
connectivity (which SHOULD be wireless), display resolution of at least
480×320, USB 2.0, and touch-screen support (which MUST support
single-touch, but SHOULD support multi-touch).
There is considerably more flexibility regarding the vast
assortment of sensors and radios found in phones today; the
specification indicates that things like GPS, Near-Field
Communications (NFC), and accelerometers are all optional, but that if
a device provides any of them, it must implement the associated APIs.
At the moment, the draft requires supporting both Tizen's HTML5
APIs and its native APIs; Spencer said there were internal discussions
underway as to whether there should be separate "web-only" and
"web-plus-native" profile options. In addition to the application
APIs, the software side of the specification requires that devices be
either ARM or x86 architecture, defines application packaging and
management behavior, lists required multimedia codecs, SDK and
development tool compliance, and mandates implementation of the
"AppControl" application control interface (which defines a set of
basic cross-application operations like opening and displaying a
file).
The requirements are a bit more stringent in one area: web
runtimes. A device must provide both a standard web browser and a web
runtime for HTML5 applications. In addition, both must be built from
the official Tizen reference implementations (which are based on
WebKit), and must not alter the exposed behavior implemented
upstream. The browser and web runtime must also report a specific
user agent string matching the reference platform and version
information.
Testing, testing, testing
Immediately after Spencer finished his overview of the compliance
specification, Samsung's Hojun Jaygarl and Intel's Cathy Shen spoke
about the Tizen Compliance Test (TCT) used to assess devices. TCT is
designed to verify that the version of Tizen running on a product
conforms to the specifications, they said, although the project
requires that the Tizen reference code be ported to each
device, rather than implemented from scratch. Consequently, the TCT
tests are designed to test features that ensure a consistent
application development environment and a consistent customer
experience, but allow manufacturers to differentiate the user
experience (UX).
The TCT battery of tests includes both automated and manual tests,
they explained. The manual tests cover those features that require
interoperating with other devices, such as pairing with another device
over WiFi Direct, or human interaction (such as responding to button
presses). The automated tests are unit tests addressing the mandatory
hardware and software requirements of the specification, and
compliance with any of the optional features the vendor chooses to
implement.
TCT splits the native API and Web APIs into separate categories
(although, again, both are currently required for any device to
pass). The native TCT involves a native app called FtApp that
executes individual tests on the device in question. The tests
themselves are built on the GTest framework
developed by Google. Tests are loaded into FtApp from a PC connected
to the device via the Smart
Development Bridge (SDB) tool in the Tizen SDK. There is also a
GUI tool for the host PC to monitor test progress and generate the
reports necessary for submission. The "native" tests cover the native
application APIs, plus application control, conformance to security
privileges, and the hardware features.
The web TCT can use the GUI tool to oversee the process, but
there is a command line utility as well. This test suite involves
loading an embedded web server onto the device, since it tests the
compliance of the device's web runtime with the various Web APIs
(including those coming from the W3C and the supplementary APIs
defined by Tizen). It also tests the device's web runtime for
compliance with package management, security, and privacy
requirements, and can run tests on the device's hardware capabilities.
These may not be completely automated, for example, involving a human
to verify that the screen rotates correctly when the device is turned
sideways. Finally, there is a tool called TCT-behavior that tests
interactive UI elements; it, too, requires a person to operate the
device.
The web TCT currently covers more than 10,000 individual
tests, while the native TCT incorporates more than 13,000. Shen and
Jaygarl said the automated tests take three to four hours to complete,
depending on the device. The manual tests add about one more hour.
Reports generated by the test manager are fairly simple; they list the
pass/fail result for each test case, the elapsed time, the completion
ratio (if applicable), and link to a more detailed log for each case.
The test management tool is an Eclipse plugin, designed for use with
the Tizen SDK.
During the Q&A at the end of the session, the all-important
question of source code availability was raised by the audience. Shen
and Jaygarl said that they expected to release the TCT test tools by
the end of June. Currently, they are still working on optimizing the
manual test cases—although it also probably goes without saying
that the TCT can hardly be expected before the final release of the
specification it is intended to test.
With more than 23,000 test cases, compliance with Tizen 2.1 will
hardly be a rubber-stamp, though requiring vendors to port the
reference code ought to take much of the guesswork out of the
process. Jaygarl and Shen also commented that developers will be able
to write their own test cases in GTest format and run them using the
official TCT tools, so when the toolset arrives it may offer something
to application developers as well as system vendors.
Compliance with a specification is not necessarily of interest to
everyone building an embedded Linux system, nor even to everyone
building a system based on Tizen. The program is designed to meet the
needs of hardware manufacturers, after all, who already have other
regulatory and development tests built into their product cycle. It
will be interesting to see how the Tizen Compliance Program evolves to
handle the non-mobile-device profiles in the future, but even if that
takes a while, it could be amusing to run the tests against the first
batches of commercially available Tizen phones, which are reported to
arrive this year.
[The author wishes to thank the Linux Foundation for travel assistance to Tizen Dev Con.]
Comments (none posted)
Brief items
Quite a lot of people have said they find the current layout a bit
confusing, but then, we tried two other layouts before this one and
people found both of those confusing too. At this point we are running
out of possibilities, but perhaps we could label the button 'Unicorn'
and have it orbit the screen randomly. That would at least be different.
--
Adam Williamson
Comments (none posted)
The FreeBSD Release Engineering Team has
announced the
availability of FreeBSD 8.4. See the
detailed
release notes for more information.
Comments (none posted)
Newsletters and articles of interest
Comments (none posted)
Bruce Byfield
reviews
Linux Mint 15.
"
Linux Mint 15 is a solid release, but not an end in itself. Rather, it is part of the ongoing process of refining Cinnamon and Mate while minimizing innovation to keep users comfortable. With this release, the process is starting to meet some of its early promise, but remains ongoing.
It is still uncertain whether any distribution with Linux Mint's goal is capable of more than mild innovations. Users, perhaps, might consider this limited scope a good thing—and, after some of the events of the last few years, I can understand this attitude.
At this point, many users must be weary of thinking so much about their desktop environments. Such users have settled on Linux Mint precisely because it allows them to forget about their interfaces and concentrate on their work."
Comments (2 posted)
The H
takes
a brief look at the latest version of ROSA's Desktop Fresh. "
The developers say that users of Desktop Fresh R1 can now install Valve's Steam distribution platform on it, giving them access to over a hundred commercial games. The default desktop environment in ROSA Fresh R1 is KDE and the distribution includes version 4.10.3 of the desktop environment. The developers promise that GNOME and LXDE editions of the distribution will follow."
Comments (none posted)
Page editor: Rebecca Sobol
Development
June 8, 2013
This article was contributed by Neil Brown
The designers of a new programming language are probably most interested in
the big features — the things that just couldn't be done with whichever
language they are trying to escape from. So they are probably
thinking of the type system, the data model, the concurrency support,
the approach to polymorphism, or whatever it is that they feel will
affect the expressiveness of the language in the way they want.
There is a good chance they will also have a pet peeve about syntax,
whether it relates to the exact meaning of the humble semicolon, or
some abhorrent feature such as the C conditional expression which (they feel)
should never be allowed to see the light of day again.
However, designing a language requires more than just addressing the
things you care about. It requires making a wide range of decisions
concerning various sorts of abstractions, and making sure the choices
all fit together into a coherent, and hopefully consistent, whole.
One might hope that, with over half a century of language development
behind us, there would be some established norms which can be simply
taken as "best practice" without further concern. While this is
true to an extent, there appears to be plenty of room for languages to
diverge even on apparently simple concepts.
Having
begun
an exploration of the relatively new languages
Rust and Go
and, in particular, having two languages to provide illuminating
contrasts, it seems apropos to examine some of those language features
that we might think should be uncontroversial to see just how
uniform they have, or have not, become.
Comments
When first coming to C
[PDF] from Pascal, the usage of braces can be a bit of
a surprise. While Pascal sees them as one option for enclosing
comments, C sees them as a means of grouping statements. This harsh
conflict between the languages is bound to cause confusion, or at
least a little friction, when moving from one language to the next,
but fortunately appears to be a thing of the past.
One last vestige of this sort of confusion can be seen in the
configuration files for
BIND,
the Berkeley Internet Name Daemon.
In the BIND configuration files semicolons are used as statement
terminators while in the database files they introduce comments.
When not hampered by standards conformance as these database files
are, many languages have settled on C-style block comments:
/* This is a comment */
and C++-style one-line
comments:
// This line has a comment
these having won over from the other Pascal option of:
(* similar but different block comments *)
and Ada's:
-- again a similar yet different single line comment.
The other popular alternative is to start comments with a "#"
character, which is a style championed by the C-shell and Bourne shell, and
consequently
used by many scripting languages.
Thankfully the idea of starting a comment with "COMMENT" and ending
with "TNEMMOC" never really took off and may be entirely apocryphal.
Both Rust and Go have embraced these trends, though not as fully as
BIND configuration files and other languages like Crack which allow
all three (/* */, //, #). Rust and Go only
support the C
and C++ styles.
Go doesn't use the "#" character at all, allowing it only inside comments
and string constants, so it is available as a comment character for a
future revision, or maybe for something else.
Rust has another use for "#" which is slightly reminiscent of its use by the
preprocessor in C. The construct:
#[attribute....]
attaches arbitrary metadata to nearby parts of the program which can
enable or disable compiler warnings, guide conditional compilation,
specify a license, or any of various other things.
Identifiers
Identifiers are even more standard than comments. Any combination of
letters, digits, and the underscore that does not start with a digit is
usually acceptable as an identifier providing it hasn't already been
claimed as a reserved word (like if or while).
With the increasing awareness of languages and writing systems other than
English, UTF-8 is more broadly supported in programming languages these
days. This extends the range of
characters that can go into an identifier, though different languages
extend it differently.
Unicode
defines a category for every character, and Go simply extends
the definition given above to allow "Unicode letter" (which has 5
sub-categories: uppercase, lowercase, titlecase, modifier, and other) and
"Unicode decimal digit" (which is one of 3 sub-categories of "Number",
the others being "Number,letter" and "Number,other") to be combined
with the underscore.
The
Go FAQ
suggests this definition may be extended depending on how
standardization efforts progress.
Rust gives a hint of what these efforts may look like by
delegating the task of determining valid identifiers to the Unicode
standard.
The
Unicode Standard Annex #31
defines two character classes, "ID_Start" and "ID_Continue", that can be
used to form identifiers in a standard way. The Annex offers these as
a resource, rather than imposing them as a standard, and acknowledges
that particular use cases may extend them in various ways.
It particularly highlights that some languages like to allow
identifiers to start with an underscore, which ID_Start does not
contain. The particular rule used by Rust is to allow an identifier to
start with an ASCII letter, underscore, or any ID_Start, and to be
continued with ASCII letters, ASCII digits, underscores, or Unicode
ID_Continue characters.
Allowing Unicode can introduce interesting issues if case is
significant, as Unicode supports three cases (upper, lower, and title)
and also supports characters without case. Most programming languages
very sensibly have no understanding of case and treat two characters
of different case as different characters, with no attempt to fold case
or have a canonical representation. Go however does pay some
attention to case.
In Go, identifiers where the first character is an uppercase letter
are treated differently in terms of visibility between packages. A
name defined in one package is only exported to other packages if it
starts with an uppercase letter. This suggests that writing systems
without case, such as Chinese, cannot be used to name exported
identifiers without some sort of non-Chinese uppercase prefix.
The
Go FAQ
acknowledges this weakness but shows a strong reluctance to give up
the significance of case in exports.
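A small Go sketch makes the rule concrete (the package and function names here are invented for illustration):
    package geometry

    // Area is exported: it starts with an uppercase letter, so code in
    // other packages can call geometry.Area().
    func Area(w, h int) int { return w * h }

    // scale is unexported: it starts with a lowercase letter, so it is
    // visible only inside the geometry package itself.
    func scale(v, factor int) int { return v * factor }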
Numbers
Numbers don't face any new issues with Unicode though possibly that is
just due to continued English parochialism, as Unicode does contain a
complete set of Roman numerals as well as those from more current numeral
systems. So
you might think that numbers would be fairly well
standardized by now. To a large extent they are, but there still
seems to be wiggle room.
Numbers can be integers or, with a decimal point or exponent suffix
(e.g. "1.0e10"), floating point. Integers can be expressed in decimal, octal
with a leading "0", or hexadecimal with a leading "0x".
In C99 and D [PDF], floating point numbers can also be hexadecimal. The
exponent suffix must then have a "p" rather than "e" and gives a power of
two expressed in decimal. This allows precise specification of floating
point numbers without any risk of conversion errors.
C11 and D also allow a "0b" prefix on integers to indicate a binary
representation (e.g. "0b101010") and D allows underscores to be sprinkled
through numbers to improve readability, so 1_000_000_000 is clearly the
same value as 1e9.
Neither Rust nor Go have included hexadecimal floats. While Rust
has included binary integers and the underscore spacing character, Go
has left these out.
Another subtlety is that while C, D, Go, and many other languages allow a
floating point number to start with a period (e.g. ".314159e1"), Rust does
not. All numbers in Rust must start with a digit. There does not
appear to be any syntactic ambiguity that would arise if a leading
period were permitted, so this is presumably due to personal preference
or accident.
In the language Virgil-III
this choice is much clearer. Virgil has a
fairly rich
"tuple" concept [PDF] which provides a useful shorthand for a
list of values. Members of a tuple can be accessed with a syntax
similar to structure field references, only with a number rather than
a name. So in:
var x:(int, int) = (3, 4);
var w:int = x.1;
The variable "w" is assigned the value "4" as it is element one of the
tuple "x".
Supporting this syntax while also allowing ".1" to be a floating point
number would require the tokenizer to know when to report two tokens
("dot" and "int") and when it is just one ("float"). While possible, this
would be clumsy.
Many fractional numbers (e.g. 0.75) will start with a zero even in languages
which allow a leading period (.75). Unlike the case with integers,
the leading zero does not mean these numbers are interpreted in base eight.
For 0.75 this is unlikely to
cause confusion. For 0777.0 it might. Best practice for programmers
would be to avoid the unnecessary digit in these cases and it would be
nice if the language required that.
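A small Go sketch of the potential confusion (the constants here are arbitrary):
package main

import "fmt"

func main() {
	const i = 0777    // integer literal: the leading zero means octal, so the value is 511
	const f = 0777.0  // floating-point literal: decimal despite the leading zero, so the value is 777
	fmt.Println(i, f) // prints: 511 777
}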
As well as prefixes, many languages allow suffixes on numbers with a
couple of different meanings. Those few languages which have
"complex" as a built-in type need a syntax for specifying "imaginary"
constants. Go, like D, uses an "i" suffix. Python uses "j".
Spreadsheets like LibreOffice Calc or Microsoft Excel allow
either "i" or "j". It is a pity more languages don't take that approach.
Rust doesn't support native complex numbers, so it doesn't need to choose.
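For illustration, Go's "i" suffix in action (a minimal sketch):
package main

import "fmt"

func main() {
	c := 3 + 4i                      // 4i is an imaginary literal, so c is a complex128
	fmt.Println(c, real(c), imag(c)) // prints: (3+4i) 3 4
}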
The other meaning of a suffix is to indicate the "size" of the value -
how many bytes are expected to be used to store it. C and D allow
u, l,
ll, or f for unsigned, long, long long, and float, with a few
combinations permitted.
Rust allows u, u8, u16, u32,
u64, i8, i16, i32, i64,
f32, and f64 which
cover much
the same set of sizes, but are more explicit. Perhaps fortunately, i
is not a permitted suffix, so there is room to add imaginary numbers in
the future if that turned out to be useful.
Go takes a completely different approach to the sizing of constants.
The
language specification
talks about "untyped" constants though this seems to be some strange
usage of the word "untyped" that I wasn't previously aware of. There
are in fact "untyped integer" constants, "untyped floating point"
constants, and even "untyped boolean" constants, which seem like they
are untyped types. A more accurate term might be "unsized constants with
unnamed
types" though that is a little cumbersome.
These "untyped" constants have two particular properties. They are
calculated using high precision with overflow forbidden, and
they can be transparently converted to a different type provided that the exact
value can be represented in the target type. So "1e15" is an untyped
floating point constant which can be used where an int64 is
expected, but not where an int32 is expected, as it requires
50 bits to store as an integer.
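A brief sketch of that rule in Go (the variable names are arbitrary):
package main

import "fmt"

func main() {
	var a int64 = 1e15 // legal: the untyped constant has an exact integer value that fits in an int64
	// var b int32 = 1e15 // would be rejected: the value cannot be represented in an int32
	fmt.Println(a) // prints: 1000000000000000
}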
The specification states that "Constant expressions are always
evaluated exactly"; however, some edge cases are to be expected:
print((1 + 1/1e130)-1, "\n")
print(1/1e130, "\n")
results in:
+9.016581e-131
+1.000000e-130
so there does seem to be some limit to precision.
Maintaining high precision and forbidding overflow means that there
really is no need for size suffixes.
Strings
Everyone knows that strings are enclosed in single or double quotes.
Or maybe backquotes (`) or triple quotes ('''). And that
while they
used to contain ASCII characters, UTF-8 is preferred these days.
Except when it isn't, and UTF-16 or UTF-32 are needed.
Both Rust and Go, like C and others, use single quotes for characters
and double quotes for strings, both with the standard set of escape
sequences (though Rust inexplicably excludes \b, \v,
\a, and \f). This set includes \uXXXX and
\UXXXXXXXX so that all
Unicode code-points can be expressed using pure ASCII program text.
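For example, the following two Go string literals denote the same text; the second spells it using only ASCII escapes (the characters chosen here are arbitrary):
package main

import "fmt"

func main() {
	// 世 is U+4E16 and 界 is U+754C, so both literals contain the
	// same two characters.
	fmt.Println("世界" == "\u4e16\u754c") // prints: true
}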
Go chooses to refer to character constants as "Runes" and provides the
built-in type "rune" to store them. In C and related
languages "char" is used both for ASCII characters and 8-bit values.
It appears that the Go developers wanted a clean break with that and
do not provide a char type at all. rune
(presumably more aesthetic than wchar) stores (32-bit) Unicode
characters while byte or uint8 store 8-bit values.
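A short sketch of the distinction:
package main

import "fmt"

func main() {
	var r rune = '世'  // rune is an alias for int32 and holds a Unicode code point
	var b byte = 'A'  // byte is an alias for uint8 and holds an 8-bit value
	fmt.Println(r, b) // prints the numeric values: 19990 65
}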
Rust keeps the name char for 32-bit Unicode characters and
introduces u8 for 8-bit values.
The modern trend seems to be to disallow literal newlines inside
quoted strings, so that missing quote characters can be quickly
detected by the compiler or interpreter. Go follows this trend and, like
D, uses the
back quote (rather than the Python triple-quote) to surround "raw"
strings in which escapes are not recognized and newlines are
permitted. Rust bucks the trend by allowing literal newlines in strings
and does not provide for uninterpreted strings at all.
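Go's raw form looks like this (a minimal sketch):
package main

import "fmt"

func main() {
	// Inside back quotes, escape sequences such as \n are kept
	// literally and real newlines are permitted.
	s := `first line, with a literal \n
second line`
	fmt.Println(s)
}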
Both Rust and Go assume UTF-8. They do not support the prefixes of C
(U"this is a string of 32bit characters")
or the suffixes of D
("another string of 32bit chars"d),
to declare a string to be a multibyte string.
Semicolons and expressions
The phrase "missing semicolon" still brings back memories from first-year computer science and learning Pascal. It was a running joke that
whenever the lecturer asked "What does this code fragment do?"
someone would call out "missing semicolon", and they were right more
often than you would think.
In Pascal, a semicolon separates statements while in C it terminates
some statements — if, for, while,
switch and compound statements do not require a semicolon.
Neither rule is particularly difficult to get used to,
but both often require semicolons at the end of lines that can look
unnecessary.
Go follows Pascal in that semicolons separate statements — every pair
of statements must be separated. A semicolon is not needed before the
"}" at the end of a block, though it is permitted there.
Go also follows the pattern seen in Python and JavaScript where the
semicolon is sometimes assumed at the end of a line (when a newline
character is seen). The details of this "sometimes" are quite
different between languages.
In Go, the insertion of semicolons happens during
"lexical analysis",
which is the step of language processing that breaks the stream of
characters into a stream of tokens (i.e. a tokenizer). If a newline is
detected on a
non-empty line and the last token on the line was one of:
- an identifier,
- one of the keywords break, continue, fallthrough, or return,
- a numeric, rune, or string literal, or
- one of ++, --, ), ], or }
then a semicolon is inserted at the location of the newline.
This imposes some style choices on the programmer; for example:
if some_test
{
some_statement
}
is not legal (the open brace must go on the same line as the
condition), and:
a = c
+ d
+ e
is not legal — the operation (+) must go at the end of the
first line, not the start of the second.
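For contrast, a sketch of layouts that Go does accept (the names are hypothetical):
package main

func main() {
	someTest := true
	c, d, e := 1, 2, 3

	if someTest { // the brace shares a line with the condition, so no semicolon is inserted
		_ = c
	}

	a := c + // the line ends with "+", which is not in the list above, so no semicolon is inserted
		d +
		e
	_ = a
}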
In contrast to this, JavaScript waits until the "parsing" step of language
processing when the sequence of tokens is gathered into syntactic units
(statements, expressions, etc.) following a context-free grammar.
JavaScript will insert a semicolon, provided that semicolon would
serve to terminate a non-empty statement, if:
- it finds a newline in a location where the grammar forbids a
newline, such as after the word "break" or before the postfix
operator "++";
- it finds a "}" or end-of-file that is not expected by the grammar; or
- it finds any token that is not expected and that was separated from
the previous token by at least one newline.
This often works well but brings its own share of style choices
including the interesting
suggestion
to sometimes use a semicolon to start a statement.
While both of these approaches are workable, neither really seems
ideal. They both force style choices which are rather arbitrary and
seem designed to make life easy for the compiler rather than for the
programmer.
Rust takes a very different approach to semicolons than Go or JavaScript
or many other languages. Rather than making them less important and
often unnecessary, Rust makes them more important and gives them a
significant semantic meaning.
One use involves the attributes mentioned earlier. When followed by a
semicolon:
#[some_attribute];
the attribute applies to the entity (e.g. the function or module) that
the attribute appears within. When not followed by a semicolon, the
attribute applies to the entity that follows it. A missing
semicolon could certainly make a big difference here.
The primary use of semicolons in Rust is much like that in C — they
are used to terminate expressions by turning the expressions into
statements, discarding any result. The effect is really quite
different from C because of a related difference: many things that C
considers to be statements, Rust considers to be expressions. A
simple example is the if expression.
a = if b == c { 4 } else { 5 };
Here the if expression returns either "4" or "5", which is stored in
"a".
A block, enclosed in braces ({ }), typically includes a
sequence of expressions with semicolons separating them. If the last
expression is also followed by a semicolon, then the block-expression
as a whole does not have a value — that last semicolon discards the
final value. If the last expression is not followed by a semicolon,
then the value of the block is the value of the last expression.
If this completely summed up the use of semicolons, it would produce
some undesirable requirements.
if condition {
expression1;
} else {
expression2;
}
expression3;
This would not be permitted as there is no semicolon to discard the
value of the if expression before expression3. Having a semicolon
after the last closing brace would be ugly, and that if expression
doesn't actually return a value anyway (both internal expressions are
terminated with a semicolon), so the language does not require the ugly
semicolon, and the above is valid Rust code.
If the internal expressions did return values, for example if the
internal semicolons were missing, then a semicolon would be
required before expression3.
Following this line of reasoning leads to an interesting result.
if condition {
function1()
} else {
function2()
}
expression3;
Is this code correct or is there a missing semicolon? To know the
answer you need to know the types of the functions. If they do not
return a value, then the code is correct. If they do, a semicolon is
needed, either one at the end of the whole "if" expression, or one
after each function call. So in Rust, we need to evaluate the types
of expressions before we can be sure of correct semicolon usage in
every case.
Now the above is probably just a silly example, and no one would ever
write code like that, at least not deliberately. But the rules do
seem to add an unnecessary complexity to the language, and the task of
programming is complex enough as it is — adding more complexity through subtle
language rules is not likely to help.
Possibly a bigger problem is that any tool that wishes to accurately
analyze the syntax of a program needs to perform a complete type
analysis. It is a known problem that the correct parsing of C
code requires you to know which identifiers are typedefs and which are
not. Rust isn't quite that bad as missing type information wouldn't lead to an
incorrect parse, but at the very least it is a potential source of confusion.
Return
A final example of divergence on the little issues, though perhaps not
quite so little as the others, can be found in returning values from
functions using a return
statement. Both Rust and
Go support the traditional return and both allow multiple values to
be returned: Go by simply allowing a list of return types, Rust
through the "tuple" type which allows easy anonymous structures.
Each language has its own variation on this theme.
If we look at the half-million return statements in the Linux
kernel, nearly 35,000 of them return a variable called "ret",
"retval", "retn", or similar, and a further 20,000 return "err",
"error", or similar. That is more than 10% of all uses of
return in the kernel.
This suggests that there is often a need to declare a variable to hold
the intended result of a function, rather than to just return a result
as soon as it is known.
Go acknowledges this need by allowing the signature of a function to
give names to the return values as well as the parameter values:
func open(filename string, flags int) (fd int, err int)
Here the (hypothetical) open() function returns two integers
named fd (the file descriptor) and err.
This provides useful documentation of the meaning of the return values
(assuming programmers can be more creative than "retval")
and also declares variables with the given names. These can be set
whenever convenient in the code of the function and a simple:
return
with no expressions listed will use the values in those variables.
Go requires that this return be
present, even if it lists no values and is at the end of the function,
which seems a little unnecessary, but isn't too burdensome.
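A minimal sketch of how named results and a bare return work together (the function here is hypothetical):
package main

import "fmt"

// div returns a quotient and an error; the named results q and err
// behave like ordinary variables inside the function body.
func div(a, b int) (q int, err error) {
	if b == 0 {
		err = fmt.Errorf("division by zero")
		return // a bare return hands back the current values of q and err
	}
	q = a / b
	return
}

func main() {
	q, err := div(10, 3)
	fmt.Println(q, err) // prints: 3 <nil>
}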
There is
evidence
[YouTube]
that some Go developers are not completely comfortable with this feature,
though it isn't clear whether the feature itself is a problem, or rather the
interplay with other features of Go.
Rust's variation on this theme we have already glimpsed with the
observation that Rust has "expressions" in preference to
"statements". The whole body of a function can be viewed as an
expression and, provided it doesn't end with a semicolon, the value
produced by that expression is the value returned from the function.
The word return is not needed at all, though it is available and an
explicit return expression within the function body will cause an
early return with the given value.
Conclusion
There are many other little details, but this survey provides a good
sampling of the many decisions that a language designer needs to make
even after they have made the important decisions that shape the
general utility of the language.
There certainly are conventions that are emerging and being broadly
adhered to, such as for comments and identifiers, but it is a little
disappointing that there is still so much variability in the
available representations of numbers and strings.
The story of semicolons and statement separation is clearly not a
story we've heard the end of yet. While it is good to see language
designers exploring the options, none of the approaches explored above
seem entirely satisfactory. Treating a line break as distinct from
other kinds of white space amounts to acknowledging that the
two-dimensional appearance of the code has relevance for parsing
it. It is therefore a little surprising that we don't see the line
indent playing a bigger role in interpretation of code. The
particular rules used by Python may not be to everyone's liking, but
the principle of making use of this very obvious aspect of a program
seems sound.
We cannot expect ever to converge on a single language that suits
every programmer and every task, but the more uniformity we can find
on the little details, the easier it will be for programmers to move
from language to language and maximize their productivity.
Comments (158 posted)
Brief items
Make sure you've got meaningful examples. If you have functions named foo or bar, then that's a warning sign right there.
—
James Hague,
explaining how to write quality functional programming tutorials.
Wow, seems like Stallman was right all along. I might have to take
back my words on his antics after all.
—
William
McBee, on the recent privacy and government surveillance dust-up
in the US.
Comments (none posted)
Version 7.4 of the rsyslog system logger has been
released. This is the first release in the new 7.4 stable branch; it joins version 7.2.7 as a supported version of the tool. New headline features include
support for the systemd journal (both as input and output) along with log file
encryption,
signatures, and
anonymization.
Comments (18 posted)
Version 4.2 of Newscoop, the open source content management system for news
sites, is available.
This release adds a REST API, Arabic support, and improvements to theming.
Comments (none posted)
A new stable release of the GNU Autoconf Archive is available. The archive includes more than 450 user-contributed macros for GNU Autoconf.
Full Story (comments: none)
After five years of development, version 1.1 of GNU Teseq has been released. Teseq is a utility for analyzing files that contain control characters and control sequences. This new release adds support for color output, descriptions and labels of a variety of non-standard controls, and automatic character-set recognition.
Full Story (comments: none)
Newsletters and articles
Over at Phoronix, Eric Griffith has attempted to set the record straight on
X and Wayland, with assistance from X/Wayland developer Daniel Stone. He looks at the failings of X and the corresponding "fixings of Wayland", along with some misconceptions about the two and some generic advantages for Wayland. "
'X is Network Transparent.' Wrong. [It's] not. Core X and DRI-1 were network transparent. No one uses either one. Shared-Memory, DRI-2 and DRI-3000 are NOT network transparent, they do NOT work over the network. Modern day X comes down to synchronous, poorly done VNC. If it was poorly done, async, VNC then maybe we could make it work. But [it's] not. Xlib is synchronous (and the movement to XCB is a slow one) which makes networking a NIGHTMARE."
Comments (153 posted)
Linux.com
interviews Wolfgang Denk, creator of the U-Boot bootloader, about two great things that embedded Linux has achieved: abstracting away hardware differences for application developers and the rapid adoption of the
Yocto project. "
But the really dramatic changes do not happen in Linux, but in the hardware. If you consider the landslide-like move from Power Architecture to ARM systems in the last two or three years it is highly notable that this happened without disconcertment for both developers and users: thanks to Linux, the low level hardware details are well abstracted away, and on application level it does not really matter at all which exact architecture or SoC you are working with. This is really a great achievement."
Comments (none posted)
At his blog, Michael Stapelberg examines
the first batch of responses to Debian's recent survey about systemd
support. This round covers concerns over the number of dependencies,
complexity, and feature creep. Stapelberg concludes that "While
systemd consumes more resources than sysvinit, it uses them to make
more information available about services; its finer-grained service
management requires more state-keeping, but in turn offers you more
control over your services." Presumably more posts will
follow, addressing more of the survey responses.
Comments (none posted)
Page editor: Nathan Willis
Announcements
Brief items
The Free Software Foundation Europe reports that the German Parliament
decided upon a joint motion to limit software patents. "
The
Parliament urges the German Government to take steps to limit
the granting of patents on computer programs. Software should
exclusively be covered by copyright, and the rights of the copyright
holders should not be devalued by third parties' software patents. The
only exception where patents should be allowed are computer programs
which replace a mechanical or electromagnetic component. In addition the
Parliament made clear that governmental actions related to patents must
never interfere with the legality of distributing Free Software."
Full Story (comments: 10)
Upcoming Events
The
schedule
is available for the openSUSE Conference (oSC13), which will take place
July 18-22 in Thessaloniki, Greece.
Comments (none posted)
Linux Plumbers Conference will take place September 18-20, 2013 in New
Orleans, Louisiana. The refereed track paper submission deadline is June
17. Twelve microconferences have been announced, with some slots still
open. LPC is expected to sell out, and the conference hotel has other
events going on at the same time, so make your travel plans soon.
Full Story (comments: none)
Events: June 13, 2013 to August 12, 2013
The following event listing is taken from the
LWN.net Calendar.
| Date(s) | Event | Location |
| June 10 - June 14 | Red Hat Summit 2013 | Boston, MA, USA |
| June 13 - June 15 | PyCon Singapore 2013 | Singapore, Republic of Singapore |
| June 17 - June 18 | Droidcon Paris | Paris, France |
| June 18 - June 20 | Velocity Conference | Santa Clara, CA, USA |
| June 18 - June 21 | Open Source Bridge: The conference for open source citizens | Portland, Oregon, USA |
| June 20 - June 21 | 7th Conferenza Italiana sul Software Libero | Como, Italy |
| June 22 - June 23 | RubyConf India | Pune, India |
| June 26 - June 28 | USENIX Annual Technical Conference | San Jose, CA, USA |
| June 27 - June 30 | Linux Vacation / Eastern Europe 2013 | Grodno, Belarus |
| June 29 - July 3 | Workshop on Essential Abstractions in GCC, 2013 | Bombay, India |
| July 1 - July 5 | Workshop on Dynamic Languages and Applications | Montpellier, France |
| July 1 - July 7 | EuroPython 2013 | Florence, Italy |
| July 2 - July 4 | OSSConf 2013 | Žilina, Slovakia |
| July 3 - July 6 | FISL 14 | Porto Alegre, Brazil |
| July 5 - July 7 | PyCon Australia 2013 | Hobart, Tasmania |
| July 6 - July 11 | Libre Software Meeting | Brussels, Belgium |
| July 8 - July 12 | Linaro Connect Europe 2013 | Dublin, Ireland |
| July 12 | PGDay UK 2013 | near Milton Keynes, England, UK |
| July 12 - July 14 | 5th Encuentro Centroamerica de Software Libre | San Ignacio, Cayo, Belize |
| July 12 - July 14 | GNU Tools Cauldron 2013 | Mountain View, CA, USA |
| July 13 - July 19 | Akademy 2013 | Bilbao, Spain |
| July 15 - July 16 | QtCS 2013 | Bilbao, Spain |
| July 18 - July 22 | openSUSE Conference 2013 | Thessaloniki, Greece |
| July 22 - July 26 | OSCON 2013 | Portland, OR, USA |
| July 27 | OpenShift Origin Community Day | Mountain View, CA, USA |
| July 27 - July 28 | PyOhio 2013 | Columbus, OH, USA |
| July 31 - August 4 | OHM2013: Observe Hack Make | Geestmerambacht, the Netherlands |
| August 1 - August 8 | GUADEC 2013 | Brno, Czech Republic |
| August 3 - August 4 | COSCUP 2013 | Taipei, Taiwan |
| August 6 - August 8 | Military Open Source Summit | Charleston, SC, USA |
| August 7 - August 11 | Wikimania | Hong Kong, China |
| August 9 - August 11 | XDA:DevCon 2013 | Miami, FL, USA |
| August 9 - August 12 | Flock - Fedora Contributor Conference | Charleston, SC, USA |
| August 9 - August 13 | PyCon Canada | Toronto, Canada |
| August 11 - August 18 | DebConf13 | Vaumarcus, Switzerland |
If your event does not appear here, please
tell us about it.
Page editor: Rebecca Sobol