Leading items
Some unreliable predictions for 2015
Welcome to the first LWN Weekly Edition for 2015. We hope that the holiday season was good to all of you, and that you are rested and ready for another year of free-software development. It is a longstanding tradition to start off the year with a set of ill-informed predictions, so, without further ado, here's what our notoriously unreliable crystal ball has to offer for this year.

We will hear a lot about the "Internet of things" of course. For larger "things" like cars and major appliances, Linux is the obvious system to use. For tiny things with limited resources, the picture is not so clear. If the work to shrink the Linux kernel is not sufficiently successful in 2015, we may see the emergence of a disruptive competitor in that space. We may feel that no other kernel can catch up to Linux in terms of features, hardware support, and development community size, but we could be surprised if we fail to serve an important segment of the industry.
We'll hear a lot about "the cloud" too, and we'll be awfully tired of it by the end of the year. Some of the hype over projects like OpenStack will fade as the project deals with its growing pains. With some luck, we'll see more attention to projects that allow users to own and run their own clouds rather than depending on one of the large providers — but your editor has often been overly optimistic about such things.
While we're being optimistic: the systemd wars will wind down as users realize that their systems still work and that Linux as a whole has not been taken over by some sort of alien menace. There will still be fights — we, as a community, do seem to like fighting about such things — but most of us will increasingly choose to simply ignore them.
There is a wider issue here, though: we are breaking new ground in systems design, and that will necessarily involve doing things differently than they have been done in the past. There will certainly be differences of opinion on the directions our systems should take; if there aren't, we are doing something wrong. There is a whole crowd of energetic developers out there looking to do interesting things with the free software resources we have created. Not all of their ideas will be good ones, but it is going to be fun to watch what they come up with.
There will be more Heartbleed-level security incidents in 2015. There are a lot of dark, unmaintained corners in our software ecosystem, many of which undoubtedly contain ancient holes that, if we are lucky, nobody has yet discovered. But they will be discovered, and we'll not be getting off the urgent-update treadmill this year.
Investments in security will grow considerably as a consequence of 2014's high-profile vulnerabilities, high-profile intrusions at major companies, and ongoing spying revelations. How much good that investment will do remains to be seen; much will be swallowed up by expensive security companies that have little interest in doing the hard work required to actually make our systems more secure.
Investments in other important development areas will grow more slowly despite the great need in many areas. We all depend on code which is minimally maintained, if at all, and there are many unsolved problems out there that nobody seems willing to pick up. The Linux Foundation's Critical Infrastructure Initiative is a good start, but it cannot come close to addressing the whole problem.
Speaking of important development areas, serious progress will be made on the year-2038 problem in 2015. The pace picked up in 2014, but developers worked mostly on the easy part of the problem — internal kernel interfaces. But a real solution will involve user-space changes, and the sooner those are made, the better. The relevant developers understand the need; by the end of this year we'll know at least what the shape of the solution will be.
Some long-awaited projects will gain some traction this year. The worst Btrfs problems are being addressed thanks to stress testing at Facebook and real-world deployment in distributions like openSUSE. Wayland is reaching a point of usability for brave early adopters. Even Python 3, which has been ready for a while, will see increasing use. We'll have programs like X.org and Python 2 around for a long time, but the world does eventually move on.
There has been some talk of a decline in the number of active Linux distributions. If that is indeed the case, the decline will be short-lived. We may not see a whole lot more general-purpose desktop or server distributions; that ground has been pretty well explored by now, and, with the possible exception of the systemd-avoidance crowd, there does not appear to be a whole lot to be done in that area. But we will see more and more distributions that are specialized for particular applications, be it network-attached storage, routing, or driving small gadgets. The flexibility of Linux in this area is one of its greatest strengths.
Civility within our community will continue to be a hot-button issue in 2015. Undoubtedly somebody will say something offensive and set off a firestorm somewhere. But, perhaps, we will see wider recognition of the fact that the situation has improved considerably over the years. With luck, we'll be able to have a (civil!) conversation on how to improve the environment we live in without painting the community as a whole in an overly bad light. We should acknowledge and address our failures, but we should recognize our successes as well.
Finally, an easy prediction is that, on January 22, LWN will finish its 17th year of publication. We could never have predicted that we would be doing this for so long, but it has been a great ride and we have no intention of slowing down anytime soon. 2015 will certainly be an interesting year for those of us working in the free software community, with the usual array of ups, downs, and surprises. We're looking forward to being a part of it with all of you.
Dark Mail publishes its secure-email architecture
The Dark Mail Alliance has published the first description of the architecture that enables its secure-and-private alternative to the existing Internet email system. Called the Dark Internet Mail Environment (DIME), the system involves a new email message format and new protocols for email exchange and identity authentication. Nevertheless, DIME also makes an effort to be backward-compatible with existing email deployments. DIME includes several interesting ideas, but its main selling point remains security: it not only offers end-to-end encryption, but it also encrypts much of the message metadata that other systems leave in cleartext, and it offers resistance to attacks that target servers between the sender and the recipient.
The Alliance
Dark Mail was started in 2013, led
by Ladar Levison of the privacy-centric email service Lavabit and by PGP
creator Phil Zimmermann of Silent Circle. Both of those companies
abruptly shut down their email offerings in August 2013 in reaction to
a US government request for access to Edward Snowden's Lavabit
account—including a copy of the Lavabit SSL keys, which would
have enabled the government to decrypt all of the traffic between
Lavabit and its customers. Subsequently, Levison and Zimmermann
announced that they would be developing an "email 3.0"
system through Dark Mail, with the goal of preventing just the sort of
attacks that occurred in the Snowden case.
One key problem that the Snowden incident revealed was that, even if two users employ strong encryption on their email messages (such as with OpenPGP or S/MIME), the metadata in those messages remains unencrypted. And that metadata can contain vital information: the sender and receiver addresses, the subject line, various mail headers, and even the trail of mail servers that relayed the message from sender to destination. Changing that would necessitate a new email message format, new protocols for email transfer and retrieval, and some sort of new infrastructure to let users authenticate each other. A new authentication framework is needed to avoid revealing key owners' email addresses, as currently happens with public PGP keyservers—and to avoid the well-documented vulnerabilities of the certificate authority (CA) system used for SSL/TLS.
DIME is designed to be that replacement email system. It describes a message format that encrypts every part of the message separately, using separate encryption keys for the different parts. Thus, mail transfer agents (MTAs) along the way can decrypt the portions of the message they need to deliver the message—but nothing else—and mail delivery agents (MDAs) can deliver messages to the correct user's inbox without learning anything about their content or about the sender. DIME also describes a transport protocol for sending such encrypted messages—one in which the multiple key retrieval and authentication steps are handled automatically—and a framework for how the authentication tokens required by the system should be published and formatted.
The DIME package
The DIME system is detailed in a 108-page PDF
specification—although it should be noted that several sections
in the specification are empty, either blank or labeled "TBD." The
most significant of these is DIME's IMAP replacement, DMAP, about
which the document says: "This protocol specification will not
be released as part of the initial publication of this
document", followed by an assurance that a later release with
more details will follow.

There is also source code for a suite of DIME-related libraries available
through the Lavabit GitHub account. So far, none of those GitHub
repositories indicates what software license the code is under.
Mozilla's Hubert Figuiere filed an issue
requesting a license, but the request does not yet seem to have been
addressed.

At this point, however, digesting and understanding the
architecture and formats described in the DIME specification is
probably the more important concern.
A bird's-eye view of the system starts with the message format. A
DIME message object contains three separate sections: the Next-Hop
section (which is unencrypted and holds the routing information needed
for the current transport method), the Envelope section (which
includes two "chunks" for the origin and destination information, each
encrypted separately), and the Content section (which contains the
email message headers and body, with each header and each body part
encrypted separately).
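As a rough illustration of that layout, here is a hypothetical Python sketch of the three-section structure. The class and field names are invented for illustration and are not taken from the DIME specification:

```python
from dataclasses import dataclass, field

@dataclass
class EncryptedChunk:
    ciphertext: bytes                   # the chunk, encrypted with its own session key
    wrapped_keys: dict = field(default_factory=dict)  # party -> encrypted session key

@dataclass
class DimeMessage:
    next_hop: str        # cleartext routing information for the current transport
    envelope: dict       # separately encrypted "origin" and "destination" chunks
    content: list        # one EncryptedChunk per header and body part

msg = DimeMessage(
    next_hop="mx.example.org",
    envelope={
        "origin": EncryptedChunk(b"<origin>", {"sender": b"...", "origin-server": b"..."}),
        "destination": EncryptedChunk(b"<dest>", {"recipient": b"...", "dest-server": b"..."}),
    },
    content=[
        EncryptedChunk(b"<subject>", {"sender": b"...", "recipient": b"..."}),
        EncryptedChunk(b"<body>", {"sender": b"...", "recipient": b"..."}),
    ],
)
```

The point of the structure is visible even in this toy form: every chunk carries its own ciphertext and its own set of per-party key copies, so no single party can open more of the message than its role requires.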
Within the Envelope and Content sections, it
is critical that each chunk is encrypted separately and with a variety of
keys. This allows applications to decrypt only some parts of a
message if not all are of immediate importance (such as a mobile
client that only decrypts the Subject and Sender of new messages for a
summary screen, rather than downloading and decrypting everything).
It also allows the software to control which applications can decrypt
which sections by using several different keys.
By encrypting things like attachments and headers separately, there
is a clear security and privacy improvement—consider, for
example, that mailing-list thread information and return paths could
allow an attacker to collect a significant amount of information about
a conversation even without seeing the message body. Still, it may
come as a surprise to some that DIME also encrypts the sender and
recipient email addresses and names. The name of the sender and
recipient are optional, of course, but encrypting the addresses might
seem to make mail routing and delivery impossible.
DIME's solution to this problem is to adopt a domain-based
authentication scheme that the origin and destination mail servers can
use to validate each other's identities. Each mail server is also
responsible for authenticating the user on its end, but the
user-to-server authentication is logically separate from the
server-to-server authentication.
In other words, the scheme proceeds in three steps.
For each step (sender-to-origin, origin-to-destination,
destination-to-recipient), the necessary information to complete the
next step is encrypted separately, so that only the need-to-know
parties for that step have access to the information. The various
fields in the message are each encrypted with an ephemeral session
key, and a separate copy of that session key is included in the
message for each party trusted to access that field—with each
copy encrypted using a known public key for the appropriate party.
So there are three copies of the session key that protects the
recipient's email address: one encrypted with the sending user's
public key, one encrypted with the destination server's public key, and one
encrypted with the recipient user's public key. There are also three
copies of the (different) session key that protects the sender's
address: one for the sender, one for the recipient, and one for the
origin server. All of the keys in question are intended to be
generated automatically: users may naturally wish to have control over
their personal public/private key pairs (which will require software
support), but the session-key generation and retrieval of remote keys
is designed to be handled without explicitly involving the user.
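The key-wrapping arrangement can be sketched in a few lines of Python. This is a toy model only: the "wrap" operation below stands in for real public-key encryption, and the party names and key material are invented:

```python
import hashlib
import os

def toy_wrap(session_key: bytes, party_public_key: bytes) -> bytes:
    # Stand-in for public-key encryption: XOR with a hash-derived pad.
    pad = hashlib.sha256(party_public_key).digest()
    return bytes(a ^ b for a, b in zip(session_key, pad))

def toy_unwrap(wrapped: bytes, party_public_key: bytes) -> bytes:
    return toy_wrap(wrapped, party_public_key)  # XOR is its own inverse

# One ephemeral session key protects the recipient's address field...
session_key = os.urandom(32)

# ...and a copy is wrapped for each party allowed to read that field.
parties = {
    "sender": b"sender-public-key",
    "destination-server": b"destination-server-public-key",
    "recipient": b"recipient-public-key",
}
wrapped_copies = {name: toy_wrap(session_key, key) for name, key in parties.items()}

# Each party can recover the session key independently of the others.
for name, wrapped in wrapped_copies.items():
    assert toy_unwrap(wrapped, parties[name]) == session_key
```

In real DIME the wrapping would of course be done with each party's actual public key, but the bookkeeping (one ephemeral key per field, one wrapped copy per trusted party) is the same.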
The last piece in the puzzle is the actual transport method used to
send the message from the origin server to the destination server.
Here, DIME allows for several options: TLS, DIME's own SMTP
replacement DMTP, or even connecting over a Tor circuit.
Authenticating identities

Left up to the implementer are details such as exactly how the
users authenticate to their servers. There is a "paranoid" mode in
which the servers have no access to the user's key material and a full
key-exchange process is required for every connection, as well as a
"cautious" mode in which the server can store encrypted copies of the
user's keys to simplify the process somewhat, and a "trustful" mode in
which the server has full access to the user's secret keys.
The server-to-server authentication, however, is more precisely
specified. There are two authentication methods, both of which ought
to be used to protect against a well-funded adversary. The first is a
dedicated keyserver system akin to the OpenPGP keyserver network. The
other is based on DNS: each server publishes its DIME public key in a new DNS
resource record type, which (for security reasons) ought to be looked
up using DNSSEC. Thus, each server can look up the public key of its
peer in multiple ways, and verify that it generates an encrypted
session key matching the one included in the message before agreeing
to the message exchange.
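A toy sketch of that cross-check, with dictionaries standing in for the DNS and the keyserver network (real code would use DNSSEC-validated queries; the domain and key values here are invented):

```python
dns_records = {"example.org": b"example-org-public-key"}
keyserver_records = {"example.org": b"example-org-public-key"}

def fetch_peer_key(domain: str) -> bytes:
    via_dns = dns_records.get(domain)
    via_keyserver = keyserver_records.get(domain)
    if via_dns is None or via_dns != via_keyserver:
        # Disagreement (or absence) suggests tampering with one channel.
        raise ValueError("peer key mismatch; refusing message exchange")
    return via_dns
```

Requiring both channels to agree means an attacker must compromise the DNS record and the keyserver entry simultaneously to substitute a key.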
So far, we have been using the term "public key" to describe the
DIME keys published for both mail servers and users, but DIME's actual
identity system is a bit more complicated than that. The credentials
used are called signets, and they include not just a public
key, but also a series of signatures and a set of fields describing
the DIME options, ciphers, and other settings supported by that user
or server. Since DIME's functionality places a great deal of trust in
domain-wide identity, each user signet has to be signed by the key for
the controlling organization.
DIME is, by any measure, a complex system. Interested users are
encouraged to read the full specification, which (naturally) goes into
considerably more detail than is feasible here. But by looking at
DIME's constituent parts separately, it can be easier to follow the
overall design. The relevant fields of each message are encrypted
separately, and a copy of the decryption key for each field is
transmitted for each party that must decrypt the field for
processing. The per-party keys are published in a federated manner:
each mail domain is responsible for maintaining its own DIME DNS
records and keyserver, which places ultimate control of the
authentication scheme in the hands of the mail-server administrators,
not in a CA that can be compromised.
It is also noteworthy that the project seems to be taking pains to
consider how email providers and users might transition to
DIME—even if it is a wild success, there will necessarily be a
need for DIME users to interoperate with traditional email for many
years still to come. The new DNS records and the signet data format
include information that can be used to fall back to the most secure
alternative available, and several pieces of the overall architecture
are optional. Webmail providers, for example, could employ either the
"cautious" or "trustful" user-authentication models—the users
would have to decide if they indeed trust the provider enough to use
the service.
What next

The DIME specification also examines a number of possible attack
scenarios against the new system, and shows how DIME is designed to
cope with such attacks. Public scrutiny will, of course, be required
before most potential adopters consider implementing the
architecture. For now, even Lavabit and Silent Circle have not yet
announced any intention to deploy DIME-based mail services. When they
do so, no doubt the offerings will attract a great many users
interested in testing the system.
The other major dimension to any widespread roll-out scenario is
acceptance of the DIME architecture by some appropriate standards
body. Levison told
Ars Technica that he intends to pursue eventual IETF approval via a
set of RFCs. That will be a slow process, though, starting when he
begins "circulating the project's specifications document among
members of the IETF at the group's meeting this March".

That said, there is clearly considerable interest within the
technology community for the additional protections that DIME offers
beyond existing email encryption systems. The government surveillance
revealed in the Snowden case alarmed many a software developer (and
regular citizen), but the law-enforcement chase that followed
it—particularly where it affected Lavabit and Silent
Circle—was, in many ways, an even bigger call to arms for
privacy advocates.
Plotting tools for Linux: gnuplot

Gnuplot is a program for creating plots, charts, and graphs that
runs on Linux as well as on a wide
variety of free and proprietary operating systems.
The purpose of a plot, in general, is to help to understand data or
functional relationships by representing them visually.
Some plotting programs, including gnuplot, may perform calculations and massage data,
which can also be convenient. Some data-plotting tools are complete solutions, standalone
programs that can be controlled through a command line, a GUI,
or both. Others exist as subsystems of various tools, or as
libraries available for a specific programming language.
This article will introduce a prominent example of the first
type. Gnuplot is one of the earliest open-source programs in wide use.
It's free enough to be packaged with Debian, for example, but has
an idiosyncratic license, with unusual
restrictions on how modifications to the source code may be distributed.
The name is not derived from the GNU project, with which it has no
particular relationship, but came about when the original authors, who had
decided on the name "newplot", discovered that this name was already
in use.
You may already be using gnuplot without knowing it. The
plotting facilities of Maxima, Octave, gretl, the Emacs graphing calculator, and statist,
for example, all use gnuplot. Most of gnuplot is written in C and is quite fast and memory-efficient.
Its output is highly customizable, and can be seen in a multitude of
scientific and technical publications. It's also a popular choice with
system administrators who want to generate graphs of server performance,
as it can be run from a script on a remote machine and forward its graphs
over X11, without having to transfer the usually voluminous data sets. The same
arrangement makes gnuplot useful for monitoring the progress of simulations
running on remote machines or clusters. Gnuplot has an interactive command-line prompt, can run script
files stored on disk, can be controlled through a socket connection
from any language, and has interfaces in everything from Fortran to Clojure.
There are also several GUI interfaces for gnuplot, including an Emacs mode,
that are not too widely used, since much of gnuplot's power arises from its
scriptability. Gnuplot is actively developed, with desirable new features added regularly.
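To make the scripted-use point concrete, here is a short Python sketch of the kind of wrapper a monitoring job might use. The file names are invented for illustration, and gnuplot is only invoked if it is actually installed:

```python
import shutil
import subprocess

def make_plot_script(datafile: str, output_png: str) -> str:
    # Build a small gnuplot script of the sort a cron job might generate;
    # it assumes a two-column, whitespace-separated data file.
    return "\n".join([
        "set term png",
        f"set out '{output_png}'",
        f"plot '{datafile}' with lines title 'load'",
        "set out",
    ])

script = make_plot_script("load.dat", "load.png")

# Commands can also be piped straight into gnuplot's stdin; the "dumb"
# terminal renders ASCII art, handy over a plain ssh session.
if shutil.which("gnuplot"):
    result = subprocess.run(["gnuplot"], input="set term dumb\nplot sin(x)\n",
                            capture_output=True, text=True)
```

The same pattern works from essentially any language that can spawn a process, which is why gnuplot bindings are so thin and so plentiful.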
Installation

If you have Octave or Maxima installed, then you already have gnuplot somewhere,
although you might not have a recent version. Binaries are probably available
from your distribution's package management system, but they are likely to lag approximately
one major version behind the shiniest. The solution is to follow the Download link from gnuplot headquarters
to get the source tarball of the latest stable release (or a pre-release
version if you can't live without some feature in development). A simple
./configure and make will get you a working gnuplot, but you
probably want to check for some dependencies first. Having the right packages installed before compiling gnuplot
will ensure that the resulting binary supports the "terminals"
that you want to use. In gnuplot land, a terminal is the form taken by the
output: either a file on disk or a (possibly interactive) display on the
screen. Gnuplot is famous for the long list of output formats that it
supports. You can create graphs using ASCII art on the console,
in a canvas on a web page, in various ways for LaTeX and ConTeXt,
as a rotatable, zoomable object in an X window,
for Tektronix terminals, for pen plotters, and much else, including
Postscript, EPS, PNG, SVG, and PDF. Support for most of this will happen without any special action
on your part. But you will want to make sure that you have compiled
in the highest quality, anti-aliased graphics formats, using the
Cairo libraries; this makes a noticeable difference in the quality
of the results. You will need to have the development libraries for
Cairo and Pango installed. On my Ubuntu laptop, installing the
packages libcairo2-dev and libpango1.0-dev is sufficient
for the latest stable gnuplot version (v. 4.6.6). Pick up libwxgtk2.8-dev while
you're at it: it will add support for a wxWidgets interactive terminal
that's a higher quality alternative to the venerable X11 display.
Finally, if you envision using gnuplot with LaTeX, you might want the Lua
development package, which enables gnuplot's tikz terminal. Gnuplot comes with
extensive help. For extra information about any of the commands used below, try typing
"help command" at the gnuplot interactive prompt. For more, try the
official documentation [PDF], the many examples on the web,
or the two books about gnuplot:
one by Philipp K. Janert and one by me.
Using gnuplot

The command stanzas here can be entered as shown at the gnuplot prompt or saved in
a file and executed with: gnuplot file. Here is how to plot a pair of curves:

set title 'Bessel Functions of the First and Second Kinds'
set samp 1000
set xrange [-.05:20]
set y2tics nomirror
set ytics nomirror
set ylabel 'Y0'
set y2label 'J0'
set grid
plot besy0(x) axes x1y1 lw 2 title 'Y0', besj0(x) axes x1y2 lw 2 title 'J0'

The set ytics etc. commands create independent sets of tics and labels
on the two vertical axes. The final line illustrates the usual form of gnuplot's
2D plot command, and some of the program's support for special functions.
The axes parameters tell gnuplot what axis to associate with which curve,
lw is an abbreviation for "linewidth" (gnuplot's default is pretty thin), and
each curve has an individual title assigned, which is used in the automatically
generated legend. The sequence of colors used to distinguish the curves is chosen
automatically, but can, of course, be specified manually as well.

Gnuplot also excels at all kinds of 3D plots. Here is a surface plot with contours
projected on the x-y plane. There is a vector field embedded in the surface as well.

set samp 200
set iso 100
set xrange [-4:4]
set yrange [-4:4]
set hidd front
set view 45, 75
set ztics .5
set key off
set contour base
set style arrow 1 filled lw 3 lc 'black'
f(x,y) = x**2+y**2 < 2.0 ? x**2+y**2 > 0.5 ? besj0(x**2+y**2) : NaN : NaN
splot besj0(x**2+y**2), '++' using 1:2:(f($1,$2)):\
( -.5*sin(atan2($2,$1)) ):( .5*cos(atan2($2,$1)) ):(0)\
every 4:2 w vec as 1

The set hidd front command has the effect of making the surface
opaque to itself but transparent to the other elements in the plot. The
set style command is an example of gnuplot's commands for defining
detailed styles for lines, arrows, and anything else that can be made into
a plot element. After this command is entered, arrowstyle 1 (or as 1)
can be referred to wherever we want a black arrow with a filled arrowhead. This script defines a function, f(x,y), using gnuplot's
ternary notation (with an embedded ternary form to implement two conditions)
in concert with NaNs, to skip a range of coordinates when
plotting. The function is used on the following line to plot the vector
field over only part of the surface. Two additional details may be worth noting in this example. First, in gnuplot, NaN (for "not a number") is a special value that you
can use in conditional statements where you want to disable plotting,
as we did here. You can also use "1/0" and some other undefined
values, but using NaN makes the code easier to understand. Second,
gnuplot's ternary notation is borrowed from C. In the statement A ? B : C, B will be executed if A is true, otherwise C
will be executed. In order to have two conditions, as we have here, B
needs to be replaced by another ternary statement. The splot command is the 3D version of plot.
The part before the comma plots our Bessel function again, this time
as a surface depending on x and y.
The rest of it plots the vector field of a circular flow as an array of arrows
originating on the surface. Vector plotting uses gnuplot's data graphing syntax,
which refers to columns of data ($1 and $2 instead of x
and y). There are six components per vector, for the three spatial
coordinates on each side of the arrow. Finally, the every clause
skips some grid points to avoid crowding, and we invoke our defined arrow style at the end.

LaTeX support

Gnuplot can integrate with the LaTeX document processing system in several ways.
Most of these allow gnuplot to calculate and draw the graphic elements
while handing off the typesetting of any text within the plot
(including, of course, mathematical expressions) to LaTeX.
This is desirable because, first, TeX's typesetting algorithms produce superior
results, and, second, the labels that are typeset as part of the graph will
harmonize with the text of the paper in which it is embedded. The results
look like the figure here, which is a brief excerpt from an imaginary math textbook. Notice that the fonts used in the figure labels and the text in the paragraph
are the same — everything is typeset by LaTeX (even the numbers on the axes).

There is a two-step procedure to produce this result. First, we create the figure
in gnuplot, using the cairolatex terminal:

set term cairolatex pdf
set out 'fig3.tex'
set samp 1000
set xrange [-4:4]
set key off
set label 1 '\huge$\frac{1}{\sqrt{2\pi}\sigma}\,e^{-\frac{x^2}{2\sigma^2}}$' at -3.5,.34
set label 2 '\Large$\sigma = 1$' at 0.95,.3
set label 3 '\Large$\sigma = 2$' at 2.7,.1
plot for [s=1:2] exp(-x**2/(2*s**2))/(s*sqrt(2*pi)) lw 3
set out

We've used LaTeX syntax for the labels. Running this through gnuplot
creates a file called fig3.tex, which we include in the LaTeX
document, listed in the Appendix. The final step is to process the document with pdflatex.
This is just one of several workflows for integrating gnuplot with LaTeX.
If you use tikz to draw diagrams in your LaTeX documents,
for example, you can extend it with calls to gnuplot from within the
tikz commands. Gnuplot and LaTeX share a family resemblance. They are both early
open-source programs that demand a certain amount of effort on the part of the user
to achieve the desired
results, but that repay that effort handsomely. They're both popular
with scientists and other authors of technical publications. Both programs
are unusually extensively documented by both their creators and a cadre
of third parties. And both systems, originating in an era of more anemic
hardware, do a great deal with a modest amount of machine memory.
Gnuplot has a good reputation for the ability to plot large data files
that cause most other plotting programs to crash or exhaust the available
RAM.

Analysis

Gnuplot can do more than just plot data and functions. It can perform
several types of data analysis and smoothing — nothing like a specialized
statistics platform, but enough to fit functions or plot a smoothed curve
through noisy data. To illustrate, we first need to create some noisy data.
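A generator along the following lines will do. This is a sketch, not necessarily the exact program referenced in the Appendix; the step size and noise amplitude are arbitrary choices:

```python
import math
import random

random.seed(1)  # make the "noise" reproducible

# Write a Gaussian curve, with pseudorandom noise added to the
# ordinates, to rn.dat.
with open("rn.dat", "w") as f:
    steps = 161                      # x from -4 to 4 in steps of 0.05
    for i in range(steps):
        x = -4.0 + i * 0.05
        y = math.exp(-x**2 / 2.0) / math.sqrt(2.0 * math.pi)
        f.write(f"{x:.3f} {y + random.uniform(-0.02, 0.02):.5f}\n")
```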
The Appendix contains a
little Python program that will write the coordinates of a Gaussian curve
to a file, called rn.dat, with some pseudorandom noise added to the ordinates.

Suppose we are presented with this data and we want to fit a function
to it. Since it looks bell-shaped to us, we'll attempt to fit a Gaussian.
That kind of curve has two parameters, its amplitude and its width, or
standard deviation. We could write a program to search the parameter
space of these two numbers to optimize the fit of the curve to the data,
or we could ask gnuplot to do it for us.
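The do-it-yourself route might look like the following Python sketch, which brute-forces a grid search over the two parameters. Synthetic data with true parameters a = 1.0 and b = 0.5 (values invented for illustration) is generated in place so that the example is self-contained; gnuplot's fit uses a much smarter iterative algorithm to do the same job:

```python
import math
import random

random.seed(2)

# Synthetic noisy bell curve standing in for rn.dat.
data = [(x / 10.0, math.exp(-0.5 * (x / 10.0) ** 2) + random.uniform(-0.02, 0.02))
        for x in range(-40, 41)]

def sse(a: float, b: float) -> float:
    # Sum of squared residuals of the model a*exp(-b*x**2) against the data.
    return sum((y - a * math.exp(-b * x * x)) ** 2 for x, y in data)

# Exhaustive search of the two-parameter space on a coarse grid.
a_fit, b_fit = min(((a / 100.0, b / 100.0)
                    for a in range(50, 151)
                    for b in range(10, 101)),
                   key=lambda p: sse(*p))
```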
Gnuplot's built-in fitting routine is invoked like this:

fit a*exp(-b*x**2) 'rn.dat' via a,b

After typing that command into gnuplot's interactive prompt, it
will return its best guess for the free parameters a
and b, as well as its confidence in its estimates. It also
remembers the estimated values, so we can plot the fit function
on top of the data:

plot 'rn.dat' pointtype 7, a*exp(-b*x**2) lw 5 lc 'black'

gets us this plot:
The pointtype specifier selects the style
of marker used in the scatterplot of the data. There is a different
list for every terminal type, which you can see by typing
test at the gnuplot prompt. We've selected a thick line width
(lw 5) and a black line color (lc 'black'). Gnuplot is endowed with some simple language constructs providing
blocks, loops, and conditional execution. This is enough to do
significant calculation without having to resort to external
programs. Using looping, you can create animations on the screen.
Try the following gnuplot script to get a rotating surface plot:

set term wxt persist
set yr [-pi:pi]
set xr [-pi:pi]
end = 200.0
do for [a=1:end] {set view 70, 90*(a/end); splot cos(x)+sin(y); pause 0.1}

The first line tells gnuplot not to delete the window after the
script is complete, which it will otherwise do if these commands are
not run interactively. The last line contains the loop that creates
the animation. The pause command adds a tenth of a second
delay between each frame.

Gnuplot in the wild is not a rare encounter.
Its output can be found in
many of the math and science entries on Wikipedia; my article about calculating Fibonacci numbers;
the book Mechanics by Somnath Datta, an example of a complex text with
closely integrated intricate plots, using LaTeX and gnuplot; the book Modeling with Data: Tools and Techniques for Scientific Computing
by Ben Klemens, using gnuplot’s latex terminals;
and the free online text Computational Physics
by Konstantinos Anagnostopoulos, just to give a few examples.
In the system administrator field, check out the articles on benchmarking Apache,
graphing performance statistics on Solaris,
and using gnuplot with Dstat.
Conclusion

Gnuplot is a good choice if you have large data sets, if you prefer a
language-agnostic solution, if you need to automate your graphing,
and especially if you use LaTeX.
Page editor: Jonathan Corbet