Leading items
Welcome to the LWN.net Weekly Edition for June 29, 2017
This edition contains the following feature content:
- Ripples from Stack Clash: an examination of the Stack Clash vulnerability from several points of view.
- An introduction to asynchronous Python: a talk on how to do asynchronous programming in the Python language.
- ProofMode: a camera app for verifiable photography: ProofMode is a free camera app designed to make it easy for photographers to prove that their work is not faked.
- daxctl() — getting the other half of persistent-memory performance: a proposed new system call to help developers get the most out of persistent memory.
- CentOS and ARM: CentOS is thought of as an enterprise x86 distribution; now it is available on ARM systems as well.
- Distributing filesystem images and updates with casync: a look at Lennart Poettering's new image distribution tool.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Ripples from Stack Clash
In one sense, the Stack Clash vulnerability that was announced on June 19 has not had a huge impact: thus far, at least, there have been few (if any) stories of active exploits in the wild. At other levels, though, this would appear to be an important vulnerability, in that it has raised a number of questions about how the community handles security issues and what can be expected in the future. The indications, unfortunately, are not all positive.

A quick review for those who are not familiar with this vulnerability may be in order. A process's address space is divided into several regions, two of which are the stack and the heap areas. The stack contains short-lived data tied to the running program's call chain; it is normally placed at a high address and it grows automatically (toward lower addresses) if the program accesses memory below the stack's current lower boundary. The heap, instead, contains longer-lived memory and grows upward. The kernel, as of 2010, places a guard page below the stack in an attempt to prevent the stack from growing into the heap area.
The "Stack Clash" researchers showed that it is possible to jump over this guard page with a bit of care. The result is that programs that could be fooled into using a lot of stack space could be made to overwrite heap data, leading to their compromise; setuid programs are of particular concern in this scenario. The fix that has been adopted is to turn the guard page into a guard region of 1MB; that, it is hoped, is too much to be readily jumped over.
Not a new problem
There are certain developers working in the security area who are quite fond of saying "I told you so". In this case, it would appear that they really told us so. An early attempt to deal with this problem can be found in this 2004 patch from Andrea Arcangeli, which imposed a gap of a configurable size between the stack and the heap. Despite being carried in SUSE's kernels for some time, this patch never found its way into the mainline.
In 2010, an X server exploit took advantage of the lack of isolation between the stack and the heap, forcing a bit of action in the kernel community; the result was a patch from Linus Torvalds adding a single guard page (with no configurability) at the bottom of the stack. It blocked the X exploit, and many people (and LWN) proclaimed the problem to be solved. Or, at least, it seemed solved once the various bugs introduced by the initial fix were dealt with.
In the comments to the above-linked LWN article (two days after it was published), Brad Spengler and "PaX Team" claimed that a single-page gap was insufficient. More recently, Spengler posted a blog entry in his classic style on how they told us about this problem but it never got fixed because nobody else knows what they are doing. The thing they did not do, but could have done if they were truly concerned about the security of the Linux kernel, was to post a patch fixing the problem properly.
Of course, nobody else posted such a patch either; the community can only blame itself for not having fixed this problem. Perhaps LWN shares part of that blame for presenting the problem as being fixed when it was not; if so, we can only apologize and try to do better in the future. But we might argue that the real problem is a lack of people who are focused on the security of the kernel itself. There are few developers indeed whose job requires them to, for example, examine and address stack-overrun threats. Ensuring that this problem was properly fixed was not anybody's job, so nobody did it.
The corporate world supports Linux kernel development heavily, but there are ghetto areas that, seemingly, every company sees as being somebody else's problem; security is one of those. The situation has improved a little in recent times, but the core problem remains.
Meanwhile, one might well ask: has the stack problem truly been fixed this time? One might answer with a guarded "yes" — once the various problems caused by the new patch are fixed, at least; a 1MB gap is likely to be difficult for an attacker to jump over. But it is hard to be sure anymore.
Embargoes
Alexander "Solar Designer" Peslyak is the manager of both the open oss-security and the closed "distros" list; the latter is used for the discussion of vulnerabilities that have not yet been publicly disclosed. The normal policy for that list is that a vulnerability disclosed there can only be kept under embargo for a period of two weeks; it is intended to combat the common tendency for companies to want to keep problems secret for as long as possible while they prepare fixes.
As documented by Peslyak, the disclosure of Stack Clash did not follow that policy. The list was first notified of a problem on May 3, with the details disclosed on May 17. The initial disclosure date of May 30 was pushed back by Qualys until the actual disclosure date of June 19. Peslyak made it clear that he thought the embargo went on for too long, and that the experience would not be repeated in the future.
The biggest problem with the extended embargo, perhaps, was that it kept the discussion out of the public view for too long. The sheer volume on the (encrypted) distros list was, evidently, painful to deal with after a while. But the delay also kept eyes off the proposed fix, with the result that the patches merged by the disclosure date contained a number of bugs. The urge to merge fixes as quickly as possible is not really a function of embargo periods, but long embargoes fairly clearly delay serious review of those fixes. Given the lack of known zero-day exploits, it may well have been better to disclose the problem earlier and work on the fixes in the open.
That is especially true since, according to Qualys, the reason for the
embargo extension was that the fixes were not ready. The longer embargo
clearly did not result in readiness. There was a kernel patch of sorts,
but the user-space side of the equation is in worse shape. A goal like
"recompile all userland code with GCC's -fstack-check option" was never
going to happen in a short period anyway, even if -fstack-check were well
suited to this application — which it currently is not.
There is a related issue in that OpenBSD broke the embargo by publicly committing a patch to add a 1MB stack guard on May 18 — one day after the private disclosure of the problem. This has raised a number of questions, including whether OpenBSD (which is not a member of the distros list) should be included in embargoed disclosures in the future. But perhaps the most interesting point to make is that, despite this early disclosure, all hell stubbornly refused to break loose in its aftermath; Peslyak observed that no attacks appeared to result from it.
As was noted above, the "underlying issue" has been known for many years. A security-oriented system abruptly making a change in this area should be a red flag for those who follow commit streams in the hope of finding vulnerabilities. But there appears to be no evidence that this disclosure — or the other leaks that apparently took place during the long embargo — led to exploits being developed before the other systems were ready. So, again, it's not clear that the lengthy embargo helped the situation.
Offensive CVE assignment
Another, possibly discouraging, outcome from this whole episode was a
demonstration of the use of CVE numbers as a commercial weapon. It
arguably started with this tweet from Kurt Seifried, reading:
"CVE-2017-1000377 Oh you thought running GRsecurity PAX was going to save
you?". CVE-2017-1000377,
filed by Seifried, states that the grsecurity/PaX patch set also suffers from
the Stack Clash vulnerability — a claim which its developers dispute.
Seifried has not said whether he carried out these actions as part of his
security work at Red Hat, but Spengler, at least, clearly sees a connection
there.
Seifried's reasoning appears to be based on text from the Qualys advisory sent to the oss-security list.
The advisory is worth reading in its entirety. It describes an exploit against sudo under grsecurity, but that exploit depended on a second vulnerability and disabling some grsecurity protections. With those protections enabled, Qualys says, a successful exploit could take thousands of years.
It is thus not entirely surprising that
Spengler strongly denied that the CVE
number was valid; in his unique fashion, he made it clear that he
believes the whole thing was commercially motivated.

Meanwhile Spengler, not to be outdone, filed
for a pile of CVE numbers against the mainline kernel, to the
befuddlement of Andy Lutomirski, the author of
much of the relevant code. Spengler made it
appear that this was a retaliatory act and suggested that Lutomirski
talk to Seifried about cleaning things up.
The CVE mechanism was created as a way to make it easier to track and talk
about specific vulnerabilities. Some have questioned its value, but there
does seem to be a real use for a unique identifier for
each problem. If, however, the CVE assignment mechanism becomes a
factory for mudballs to be thrown at the competition, it is likely to lose
whatever value it currently has. One can only hope that the community will
realize that turning the CVE database into a cesspool of fake news will do
no good for anybody and desist from this kind of activity.
Our community's procedures for dealing with security issues have been
developed over decades and, in many ways, they have served us well over
that time. But they are also showing some signs of serious strain. The
lack of investment in the proactive identification and fixing of security
issues before they become an emergency has hurt us a number of times and
will continue to do so. The embargo processes we have developed are
clearly not ideal and could use improvement — if we only knew what form
that improvement would take.
It is becoming increasingly apparent to the world as a whole that our
industry's security is not at the level it needs to be. Hopefully, that
will create some commercial incentives to improve the situation. But it
also creates incentives to attack others rather than fixing things at
home. That is going to lead to some increasingly ugly behavior; let us
just hope that our community can figure out a way to solve problems without
engaging in divisive and destructive tactics. Our efforts are much better
placed in making Linux more secure for all users than in trying to take
other approaches down a notch.
In his PyCon 2017 talk, Miguel
Grinberg wanted to introduce asynchronous programming with Python to
complete beginners. There is a lot of talk about asynchronous Python,
especially with the advent of the
asyncio module, but there are multiple ways to create
asynchronous Python programs, many of which have been available for quite
some time. In the talk, Grinberg took something of a step back from the
intricacies of those solutions to look at what asynchronous processing
means at a
higher level.
He started by noting that while he does a lot of work on Flask, the
Python-based web microframework,
this talk would not be about Flask. He did write the Flask
Mega-Tutorial (and a book on
Flask), but he would be trying to mention it less than ten times during
the talk—a feat that he managed admirably. He has also developed a
Python
server for
Socket.IO that started
out as something for "that framework", but has since "taken on
a life of its own".
He asked attendees if they had heard people say that "async makes your code
go fast".
If so, he said, his talk would explain why people say that. He started
with a simple definition of "async" (as "asynchronous" is often
shortened). It is one way of doing concurrent programming, which means
doing many things at once. He is not only referring to asyncio
here as there are many ways to have Python do more than one thing at once.
He then reviewed those mechanisms. First up was multiple processes, where
the operating system (OS) does all the work of multi-tasking. For
CPython (the reference Python implementation), that is the only way to use
all the cores in the system. Another way to do more than one thing at once
is by using
multiple threads, which
is also a way
to have the OS handle the multi-tasking, but Python's Global Interpreter
Lock (GIL) prevents multi-core concurrency. Asynchronous programming, on
the other hand, does not require OS participation. There is a single
process and thread, but the program can get multiple things done at once.
He asked: "what's the trick?"
He turned to a real-world example of how this works: a chess exhibition,
where a chess
master takes on, say, 24 opponents simultaneously. "Before computers
killed the fun out of chess", these kinds of exhibitions were done
regularly, but he is not sure if they still are. If each game takes around 30
move pairs to complete, the master would require
twelve hours to finish the
matches if they were played consecutively (at one minute per move pair).
By making moves in each game in rotation, though, the whole
exercise can be completed in an hour. The master simply makes a move at
a board (in, say, five seconds) and then goes on to the next, leaving the
opponent lots of time
to move before the master returns (after making 23 other moves). The
master will
"cream everyone" in that time, Grinberg said.
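The arithmetic behind those numbers can be checked in a few lines (the figures are taken from the talk):

```python
opponents = 24
move_pairs = 30  # move pairs needed to finish one game

# Played one game after another, at one minute per move pair:
serial_hours = opponents * move_pairs * 1 / 60       # 12.0 hours

# Played in rotation, at five seconds per master move:
rotation_minutes = opponents * move_pairs * 5 / 60   # 60.0 minutes

print(serial_hours, rotation_minutes)
```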
It is "this kind of fast" that people are talking about for async
programming.
The chess master is not optimized to go faster; the work is arranged so
that they do not waste time waiting. "That is the complete secret" to
asynchronous programming, he said, "that's how it works". In that case,
the CPU is the chess master and it waits the least amount of time possible.
But attendees are probably wondering how that can be done using just one
process and one thread. How is async implemented? One thing that is
needed is a way for functions to suspend and resume their execution. They
will suspend when they are waiting and resume when the wait is over. That
sounds like a hard thing to do, but there are four ways to do that in
Python without involving the OS.
The first way is with callback functions, which is "gross", he said; so
gross, in fact, that he was not even going to give an example of that.
Another is using generator
functions, which have been a part of Python for a long time. More
recent Pythons, starting with 3.5, have the async and await keywords,
which can be used for async programs.
The fourth is a third-party package, greenlet, which uses a
C extension to support suspending and resuming Python code.
There is another piece needed to support asynchronous programming: a
scheduler that keeps track of suspended functions and resumes them at the
right time. In the async world, that scheduler is called an "event loop".
When a function suspends, it returns control to the event loop, which finds
another function to start or resume. This is not a new idea; it is
effectively the same as "cooperative
multi-tasking" that was used in old versions of Windows and macOS.
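That cooperative model can be illustrated with plain generators standing in for suspendable functions; this is a toy sketch of the idea, not how any real event loop is implemented:

```python
def task(name, steps):
    for i in range(steps):
        print(f"{name}: step {i}")
        yield  # suspend; control returns to the scheduler

def run(tasks):
    # A bare-bones "event loop": resume each suspended task in
    # round-robin order until all of them have finished.
    tasks = list(tasks)
    while tasks:
        for t in tasks[:]:
            try:
                next(t)
            except StopIteration:
                tasks.remove(t)

run([task("a", 2), task("b", 2)])
```

The two tasks interleave: "a: step 0", "b: step 0", "a: step 1", "b: step 1".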
Grinberg created examples
of a simple "hello world" program using some of the different mechanisms.
He did not get to all of them in the presentation and encouraged the
audience to look at the rest. He started with a simple synchronous example
that had a function that slept for three seconds between printing "Hello"
and "World!". If he called that in a loop ten times, it would take 30
seconds to complete since each function would run back to back.
He then showed two examples using asyncio. They were essentially
the same, but one used the @coroutine decorator for the function
and yield
from in the body (the generator function style), while the other used
async def for the
function and
await in the body. Both used the asyncio version of the
sleep() function to sleep for three seconds between the two
print() calls. Beyond those differences, and some boilerplate
to set up the
event loop and call the function from it, the two functions had the same
core as the original example. The non-boilerplate differences are by design;
asyncio makes the places where code suspends and resumes "very
explicit".
The two programs are shown below:
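(The listings did not survive the trip to this page; here is a reconstruction. The delay parameter and the asyncio.run() boilerplate are modernizations, not Grinberg's exact code, and the decorator style was removed from Python in 3.10, so it appears only as a comment.)

```python
import asyncio

# Generator style (deprecated in Python 3.8, removed in 3.10):
#
#     @asyncio.coroutine
#     def hello():
#         print("Hello")
#         yield from asyncio.sleep(3)
#         print("World!")

# Keyword style (Python 3.5 and later):
async def hello(delay=3):
    print("Hello")
    await asyncio.sleep(delay)  # the explicit suspension point
    print("World!")
```

Calling asyncio.run(hello()) prints "Hello", waits three seconds, then prints "World!".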
Running the program gives the expected result (three seconds
between the two strings), but it gets more interesting if you wrap the
function call in a loop. If the loop is for ten iterations, the result
will be ten "Hello" strings, a three-second wait, then ten "World!" strings.
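That looped behavior can be sketched with asyncio.gather(), which starts all ten coroutines at once (the helper names here are illustrative):

```python
import asyncio

async def hello(delay=3):
    print("Hello")
    await asyncio.sleep(delay)
    print("World!")

async def main(count=10, delay=3):
    # Each coroutine runs until its sleep and suspends there, so all
    # of the "Hello" lines appear first; after the shared delay, the
    # "World!" lines follow.
    await asyncio.gather(*(hello(delay) for _ in range(count)))
```

Running asyncio.run(main()) prints ten "Hello" lines, pauses three seconds, then prints ten "World!" lines.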
There are other examples for mechanisms beyond
asyncio, including for greenlet and Twisted. The greenlet examples
look almost exactly the same as the synchronous example, just using a different
sleep(). That is because greenlet tries to make asynchronous
programming transparent, but hiding those differences can be a blessing and
a curse, Grinberg
said.
There are some pitfalls in asynchronous programming and people "always trip
on these things". If there is a task that requires heavy CPU use, nothing
else will be done while that calculation is proceeding. In order to let
other things happen, the computation needs to release the CPU
periodically. That could be done by sleeping for zero seconds, for example
(using await asyncio.sleep(0)).
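A sketch of that pattern: a CPU-bound coroutine that periodically awaits asyncio.sleep(0) so other tasks get a turn (the function name is invented for illustration):

```python
import asyncio

async def crunch(n):
    # Sum of squares, yielding to the event loop every 10,000
    # iterations so that other tasks are not starved while the
    # computation proceeds.
    total = 0
    for i in range(n):
        total += i * i
        if i % 10_000 == 0:
            await asyncio.sleep(0)
    return total
```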
Much of the Python standard library is written in blocking fashion,
however, so the socket, subprocess, and
threading modules (and other modules that use them) and even
simple things like time.sleep() cannot be used in async programs.
All of the asynchronous frameworks provide their own non-blocking
replacements for those modules, but that means "you have to relearn how to
do these things that you already know how to do", Grinberg said.
Eventlet and gevent, which are built on greenlet, both
monkey patch the
standard library to make it async compatible, but that is not what
asyncio does. It is a framework that does not try to hide the
asynchronous nature of programs. asyncio wants you to think about
asynchronous programming as you design and write your code.
He concluded his talk with a comparison of processes, threads, and async in
a number of different categories.
All of the techniques optimize the waiting periods; processes and threads
have the OS do it for them, while async programs and frameworks do it for
themselves. Only processes can use all of the cores in the system;
threads and async programs cannot. That leads some to write programs that
combine one process per core with threads and/or async functions, which can
work quite well, he said.
Scalability is "an interesting one". Running multiple processes
means having multiple copies of Python, the application, and all of the
resources used by both in memory, so
the system will run out of memory after a fairly small number of
simultaneous processes (tens of processes are a likely limit), Grinberg
said. Threads are
more lightweight, so there can be more of those, on the order of hundreds.
But async programs are "extremely lightweight", such that
thousands or tens of thousands of simultaneous tasks can be handled.
The blocking standard library functions can be used from both processes and
threads, but not from async programs. The GIL only interferes with
threads; processes and async coexist with it just fine. But, he noted,
there is only "some" interference from the GIL even for threads in his
experience; when
threads are blocked on I/O, they will not be holding the GIL, so the OS
will give the CPU to another thread.
There are not many things that are better for async in that comparison.
The main advantage to asynchronous programs for Python is the massive
scaling they allow, Grinberg said. So if you have servers that are going
to be super busy
and handle lots of simultaneous clients, async may help you avoid going
bankrupt from buying servers. The async programming model may also be
attractive for
other reasons, which is perfectly valid, but looking strictly at
the processing advantages shows that scaling is where async really wins.
A
YouTube video of
Grinberg's talk is available; the Speaker
Deck slides are similar, but not the same as what he used.
[I would like to thank The Linux Foundation for travel assistance to
Portland for PyCon.]
The default apps on a mobile platform like Android are familiar
targets for replacement, especially for developers concerned about
security. But while messaging and voice apps (which can be replaced by
Signal and Ostel, for
instance) may be the best known examples, the non-profit Guardian
Project has taken up the cause of improving the security features
of the camera app. Its
latest such project is ProofMode, an app to let users take
photos and videos that can be verified as authentic by third parties.
Media captured with ProofMode is combined with metadata about the source
device and its environment at capture time, then signed with a
device-specific private PGP key. The result can be used to attest that the
contents of the file have not been retouched or otherwise tampered with,
and that the capture took place when and where the user says it did.
For professional reporters or even citizen journalists capturing sensitive imagery, such an attestation
provides a defense against accusations of fakery — an all-too-common
response when critiquing those in positions of power. But making that goal
accessible to real-world users has been a bit of a challenge for the
Guardian Project.
It is widely accepted that every facet of digital photography
has both an upside and a downside. Digital cameras are cheap and carry no film
or development costs, but digital images are impermanent and easily
erased. Instant cloud storage and online sharing make media distribution
easy, but do so at the cost of privacy and individual ownership. Perhaps nowhere is the
dichotomy more critical, however, than in the case of news
photography. Activists, journalists, and ordinary citizens have
documented important world events using the cameras in their mobile
devices, capturing everything from political uprisings to sudden acts of
unspeakable violence. The flipside, though, is that the authenticity
of digital photos and videos is hard to prove, and detractors are wont to
dismiss any evidence that they don't like as fake.
Improving the verification situation was the goal of the Guardian
Project's 2015 app, CameraV. The app provided a rather complex
framework for attesting to the untampered state of recorded images,
which the team eventually decided was inhibiting its adoption by
journalists, activists, and other potential users. ProofMode is an
attempt to whittle the CameraV model back to its bare essentials.
Nevertheless, a quick look at CameraV is useful for understanding the approach.
CameraV attests to the unmodified state of an image by taking a
snapshot of the device's sensor readings the same instant that the photograph is
taken. The sensor data recorded for the snapshot is user-configurable,
consisting of geolocation data (including magnetometer readings, GPS, and
network information). All of this metadata is stored in JSON
Mobile Media Metadata (J3M)
format and is appended to the image file, a process termed "notarization".
The file is then MD5-hashed
and the result signed with the user's OpenPGP key. CameraV provides
an Android Intent service to let users verify the hash on any
CameraV-notarized image they receive.
The signature can be published with the image, enabling third
parties to verify that the metadata matches what the photographer
claims about the location and context of the image. In theory, some
of that metadata (such as nearby cell towers) could also be verified
by an outside source. The app can also generate a short SHA-1 fingerprint of the signed file intended to be sent out separately.
This fingerprint is short enough to fit into an SMS message, so that users can
immediately relay proof of their recording, even if they do not have a
means to upload the image itself until later.
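The overall flow (hash the raw media, bundle it with the sensor metadata, then derive the short fingerprint of the signed result) can be sketched like this; the function and field names are invented for illustration, this is not CameraV's actual code, and the OpenPGP signing step is elided:

```python
import hashlib
import json

def notarize(pixel_data: bytes, metadata: dict) -> dict:
    # Hash the raw pixel data, not the encoded file, so that
    # re-encoding or editing the image invalidates the record.
    record = dict(metadata)
    record["media_md5"] = hashlib.md5(pixel_data).hexdigest()
    # The real app signs this bundle with its app-specific OpenPGP
    # key; here the serialized record stands in for the signed file.
    signed = json.dumps(record, sort_keys=True).encode()
    # The short SHA-1 fingerprint is small enough to relay by SMS.
    record["fingerprint_sha1"] = hashlib.sha1(signed).hexdigest()
    return record
```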
Users can share their digitally notarized images to
public services or publish them over Tor to a secure server that
the user controls.
CameraV takes a number of steps to ensure that images
are not altered while on the user's device, lest the app then be used
to create phony attestations and undermine trust in the system.
First, the
MD5 hash of the image or video that is saved alongside the device-sensor
metadata is computed over the raw pixel data (or raw
video frames), as a mechanism to protect against the image being faked
using some other on-camera app before the user publishes it for
consumption. Second, the full internal file path of the raw image file is saved
with the metadata, which serves as a record that the CameraV app is the source of
the file. Third, app-specific encrypted storage is used for the device's
local file storage — including the media, the metadata, and key material.
Finally, the OpenPGP key used is specific to the app
itself. The key is generated when the user first sets up CameraV; the
installer prompts the user to take a series of photos that are used as
input for the key-generation step.
CameraV's design hits a lot of the bullet points that
security-conscious developers care about, but it certainly never
gained a mass following. Among other stumbling blocks, the user had to decide
in advance to use the CameraV app to record any potentially sensitive
imagery. That might be fine for someone documenting human rights
violations as a full-time job, but is less plausible for a spur-of-the-moment
incident — and it does not work for situations where the user only
realizes the newsworthiness of a photo or video after the fact. In
addition, there may be situations where it is genuinely
harmful to have detailed geolocation information stored in a
photo, so using CameraV for all photos might frighten off some
potential users.
Consequently, in 2016 the Guardian Project began working on a sequel of
sorts to CameraV. That effort is what became ProofMode, which was first
announced
to the public on the project's blog in February 2017. The
announcement describes ProofMode as a "reboot," but it is worth noting
that CameraV remains available (through the Google Play Store as well
as through the F-Droid repository) and is still being updated.
ProofMode essentially turns CameraV's metadata-recording process
into a background service and makes it available to the user as a
"share"
action (through Android's Intent API). When any media is
captured with any camera app, ProofMode takes a snapshot of the device
sensor readings. The user then has the option of choosing "Share
Proof" from their camera app's sharing menu.
At present, ProofMode offers three sharing options: "Notarize Only"
(which shares only the SHA-1 fingerprint code), "Share Proof Only"
(which shares a signed copy of the metadata files), and "Share
In March, Bruce Schneier posted
about ProofMode on his blog, which spawned a series of in-depth
questions in the comment section.
As might be expected on such a public forum, the comments
ranged from complaints about the minutia of the app's approach to
security to bold assertions that true authentication on a mobile
device is unattainable.
Among the more specific issues, though, the commenters criticized ProofMode's
use of unencrypted storage space, its practice of extracting the PGP private key into
RAM with the associated passphrase, and how the keys are generated on
the device. There were also some interesting questions about how a
malicious user might be able to generate a fake ProofMode notary file
by hand.
The Guardian Project's Nathan Freitas responded
at length to the criticism in the comment thread, and later
reiterated
much of the same information on the Guardian Project blog. As to the
lower-level security steps, he assured commenters that the team knew
what it was doing (citing the fact that Guardian Project ported Tor to
Android, for example) and pointed to open issues on the ProofMode bug
tracker for several of the enhancements requested (such as the use of
secure storage for credentials).
On other issues, Freitas contended that there may simply be a valid
difference of opinion. For example, the on-device generation of key
pairs may seem less than totally secure, but Freitas noted that the
keys in question are app-specific and not designed for use as a long-term user
identity. Android also provides some APIs that can protect against
tampering. Freitas said that the project has already integrated the
SafetyNet
API, which is used to detect if the app is running in an emulator
(although ProofMode does not block this behavior; it simply notes it
in the metadata store). In the longer term, the team is also exploring implementing
stronger security features, such as more robust hashing mechanisms or
the Blockchain-based OpenTimestamps.
Ultimately, however, complexity is the enemy of growing a broad
user base, at least from the Guardian Project's perspective. Freitas
told the Schneier commenters that the goal is to provide notarization
and security for as broad an audience as possible.

Given all the talk about recording sensor input and geolocation
information, a privacy-conscious user might well ask whether or not
CameraV and ProofMode take a step backward for those users who are
interested in recording sensitive events but are also legitimately
worried about being identified and targeted for their trouble. This
is a real concern, and the Guardian Project has several approaches to
addressing it.
The first is that CameraV and ProofMode both provide options for
disabling some of the more sensitive metadata that can be captured.
For now, that includes the network information and geolocation data.
Second, potentially identifiable metadata like Bluetooth device MAC addresses are
not recorded in the clear, but only in hashed form. And the project
has an issue
open to allowing wiping ProofMode metadata files from a device.
For the extreme case, however — when a user might want to
completely sanitize an image of all traceable information before
sharing it — the Guardian Project offers a separate app.
That anonymizing app is called ObscuraCam.
It automatically removes geolocation data and the device make and
model metadata from any captured photo. It also provides a mechanism
for the user to block out or pixelate faces, signs, or other areas of
the image that might be sensitive.
At the moment, it is not possible to use ObscuraCam in conjunction
with ProofMode (attempting to do so crashes the ProofMode app), but the precise
interplay between the two security models likely would require some
serious thought anyway. Nevertheless, if anonymity is of importance,
it is good to know there is an option.
In the final analysis, neither CameraV nor ProofMode is of much
value if it remains merely a theoretical service: it has to be usable by
real-world, end-user human beings. In my own personal tests, CameraV
is complex enough that it is little surprise that it has not been
adopted en masse. The first step after installation requires
the user to set up a "secure database," the preferences screen is not
particularly user-friendly, and the sharing features are high on
detail but light on interface polish.
On the other hand, ProofMode makes serious
strides forward in ease-of-use but, at present, it lacks the built-in
documentation that a new user might require in order to make the right
choices. If one has not read the ProofMode blog posts, the sharing
options ("Notarize Only" and "Share Proof Only") might not be easy to
decipher. Obviously, the project is still in pre-release mode,
though, so there is plenty of reason to believe that the final version
will hit the right notes.
Readers with long memories might also recall that the
CameraV–ProofMode saga marks the second time that the Guardian Project
developed a security app only to later refactor the code into a
system service. The first instance was PanicKit, a framework for
erasing device data from multiple apps that grew out of the project's
earlier storage-erasing app InTheClear.
Freitas calls this a
coincidence, however, rather than a development trend. With PanicKit,
he said, the goal was to develop a service that third-party app
developers would find useful, too. ProofMode, in contrast, was merely
a simplification of the original concept designed to meet the needs of
a broader audience. Regardless of how one looks at it, though, most
will likely agree that if security features are built into the
operating system at a lower level — eliminating the need to choose
between "secure" and "insecure" apps — then end users will benefit.
The "DAX" mechanism allows an application to map a file in
persistent-memory storage directly into its address space, bypassing the
kernel's page cache. Thereafter, data in the file can be had via a
pointer, with no need for I/O operations or copying the data through
RAM. So far, so good, but there is a catch: this mode really only works
for applications that are reading data from persistent memory. As soon as
the time comes to do a write, things get more complicated. Writes can
involve the allocation of blocks on the underlying storage device; they
also create metadata updates that must be managed by the filesystem. If
those metadata updates are not properly flushed out, the data cannot be
considered properly written.
The end result is that applications performing writes to persistent memory
must call fsync() to be sure that those writes will not be lost.
Even if the developer remembers to make those calls in all the right
places, fsync() can create an arbitrary amount of I/O and, thus,
impose arbitrary latencies on the calling application. Developers who go
to the trouble of using DAX are doing so for performance reasons; such
developers tend to respond to ideas like "arbitrary latencies" with poor
humor at best. So they have been asking for a better solution.
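As a reminder of what that pattern looks like from user space, here is a minimal sketch (ordinary file I/O in Python, nothing DAX-specific) of the write-then-fsync() sequence whose latency is at issue:

```python
import os

def durable_write(path, data):
    """Write data and force it (and any metadata needed to find it later)
    out to stable storage.  The fsync() is the call with unpredictable
    cost: it can trigger an arbitrary amount of filesystem I/O."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        written = os.write(fd, data)
        os.fsync(fd)
        return written
    finally:
        os.close(fd)
```

The goal of the proposals discussed below is to let DAX applications skip that fsync() entirely, flushing their own data with CPU instructions instead.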
That is why Dan Williams wrote in the introduction to this patch series that "
Williams's proposal to implement this approach requires a couple of steps.
The first is that the application needs to call fallocate()
to ensure that the file of interest actually has blocks allocated in
persistent memory. Then it has to tell the kernel that the file is to be
accessed via DAX and that the existing block allocations cannot be changed
under any circumstances. That is done with a new system call:
Here, path indicates the file of interest, flags
indicates the desired action, and align is a hint regarding the
size of pages that the application would like to use. The
DAXFILE_F_STATIC flag, if present, will put the file into "no
changes allowed" mode; if the flag is absent, the file becomes an ordinary
file once again. While the static mode is active, any operation on the
file that would force metadata changes (changing its length with
truncate(), for example) will fail with an error code.
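The first step of that sequence can be done with existing interfaces; a sketch follows (Python is used here for illustration, and only the preallocation half is shown, since the daxctl() call itself existed only as a kernel patch and has no binding anywhere):

```python
import os

def preallocate(path, size):
    """Step one of the proposed scheme: ensure blocks are really allocated
    up front, so later writes need no allocation (and thus no metadata
    changes).  The follow-up daxctl(path, DAXFILE_F_STATIC, align) call
    from the patch series is not available and is omitted here."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)
```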
The implementation of this new mode would seem to require significant
changes at the filesystem level, but it turns out that this functionality
already exists. It is used by the swap subsystem which, when swapping to
an ordinary file, needs to know where the blocks allocated to the file
reside on disk. There are two pieces to this mechanism, the first of which
is this address_space_operations method:
A call to bmap() will return the physical block number on which
the given sector is located; the swap subsystem uses this
information to swap pages directly to the underlying device without
involving the filesystem. To ensure that the list of physical blocks
corresponding to the swap file does not change, the swap subsystem sets the
S_SWAPFILE inode flag on the file. Tests sprinkled throughout the
virtual filesystem layer (and the filesystems themselves) will block any
operation that would change the layout of a file marked with this flag.
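User space can poke at the same information through the old FIBMAP ioctl(), which is backed by the bmap() method. A hedged sketch: FIBMAP requires CAP_SYS_RAWIO and is unsupported on some filesystems, so this wrapper maps any failure to None rather than guessing.

```python
import fcntl
import struct

FIBMAP = 1  # _IO(0x00, 1) from <linux/fs.h>

def logical_to_physical(fd, logical_block):
    """Return the physical block backing the given logical block of an
    open file, or None if the query is not permitted or not supported.
    This is the user-space face of the bmap() method discussed above."""
    buf = struct.pack("i", logical_block)
    try:
        result = fcntl.ioctl(fd, FIBMAP, buf)
    except OSError:  # EPERM without CAP_SYS_RAWIO; EINVAL/ENOTTY elsewhere
        return None
    return struct.unpack("i", result)[0]
```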
This functionality is a close match to what DAX needs to make direct writes
to persistent memory safe. So the daxctl() system call has simply
repurposed this mechanism, putting the file into the no-metadata-changes
mode while not actually swapping to it.
Christoph Hellwig was not slow to register his opposition to this idea. He
would rather not see the bmap() method used anywhere else in the
kernel; it is, in his opinion, broken in a
number of ways. Its use in swapping is also broken, he said, though
"
An alternative approach, proposed by Andy
Lutomirski, has been seen before: it was raised (under the name
MAP_SYNC) during the "I know what I'm
doing" flag discussion in early 2016. The core idea here
is to get the filesystem to transparently ensure that any needed metadata
changes are always in place before an application is allowed to write to a page
affected by those changes. That would be done by write-protecting the
affected pages, then flushing any needed changes as part of the process of
handling a
write fault on one of those pages. In theory, this approach would allow
for a lot of use cases blocked by the daxctl() technique,
including changing the length of files, copy-on-write semantics, concurrent
access, and more. It's a seemingly simple idea that hides a lot of
complexity; implementing it would not be trivial.
Beyond implementation complexity, MAP_SYNC has another problem: it
runs counter to the original low-latency goal. Flushing out the metadata
changes to a filesystem can be a lengthy and complex task, requiring
substantial amounts of CPU time and I/O. Putting that work into the
page-fault handler means that page faults can take an arbitrarily long
amount of time. As Dave Chinner put it:
There was some discussion about how the impact of doing metadata updates in
the page-fault handler could be reduced, but nobody has come forth with an
idea that would reduce it to zero. Those (such
as Hellwig) who support the MAP_SYNC approach acknowledge that
cost, but see it
as being preferable to adding a special-purpose interface that brings its
own management difficulties.
On the other hand, this work could lead to improvements to the swap
subsystem as well, making it more robust and more compatible with
filesystems (like Btrfs) whose copy-on-write semantics work poorly with the
"no metadata changes" idea. There is another use case for this
functionality: high-speed DMA directly to persistent memory also
requires that the filesystem not make any unexpected changes to how the
file is mapped. That, and the relative simplicity of Williams's patch, may
help to push the daxctl() mechanism through, even though it is not
universally popular.
Arguably, the real lesson from this discussion is that persistent memory is
not a perfect match to the semantics provided by the Unix API and current
filesystems. It may eventually become clear that a different type of
interface is needed, at least for applications that want to get maximum
performance from this technology. Nobody really knows what that interface
should look like yet, though, so the current approach of trying to retrofit
new mechanisms onto what we have now would appear to be the best way
forward.
The CentOS distribution has long been
a boon to those who want an enterprise-level operating system without an
enterprise-level support contract—and the costs that go with it. In
keeping with its server orientation, CentOS has been largely focused on
x86 systems, but that has been changing over the last few
years. Jim Perrin has been with the project since 2004 and his talk at Open
Source Summit Japan (OSSJ) described the process of making CentOS
available for the ARM server market; he also discussed the status of that
project and some plans for the future.
Perrin is currently with Red Hat and is the maintainer of the CentOS 64-bit
ARM (aarch64) build. CentOS is his full-time job; he works on building the
community around CentOS as well as on some of the engineering that goes
into it. His background is as a system administrator, including stints
consulting for the defense and oil industries; with a bit of a grin, he
said that he is
"regaining a bit of my humanity" through his work at Red Hat on CentOS.
The initial work on CentOS for ARM started back in the CentOS 6 days
targeting 32-bit ARMv6 and ARMv7 CPUs. That distribution is now six or
seven years old and it was already old when the developers started working
on an ARM version of it. The software in CentOS 6 was simply too old
to effectively support ARM, Perrin said. The project ended up with a
distribution that mostly worked, but not one it was happy to publish. It
improperly mixed Fedora and RHEL components and was not up to the project's
standards, so that build was buried.
In January 2015, which was after CentOS 7 was released, the project
restarted using that base but targeting aarch64. There was "lots more
support for ARM" in that
code base, he said. After about six months, there was a working version of
the distribution that he and other project members were happy with, so it
was time to give the community access to it. Unfortunately, 64-bit ARM
chips were not widely available in July 2015, so the project needed to
decide where it wanted to go with the distribution.
There are multiple parts of the CentOS community, each of which has
its own needs. Hardware vendors are the first and foremost members of the
community, because they must create the hardware that all of the others
will use. If the hardware does not work well—or CentOS doesn't work well
on it—no one will be interested in it.
The second group is the business partners of the hardware vendors. These
are early adopters that get the hardware from the vendors and want to "kick
things around" to see that the hardware is working for their use cases.
CentOS needs to be able to provide help and support for these companies.
There are also early adopters who are not affiliated with the hardware
vendors. They buy and break new hardware and are particularly vocal on
social media. They will let it be known that they have this new
hardware and what software is or isn't being supported on it. They have
opinions and a project needs to take care of their needs, he said.
A group that is somewhat similar to early adopters is the maker community.
The difference is that early adopters are going to try out business use
cases using the system, while the makers will "blast it into space" or run
it at the bottom of a lake. Any folks that do "that level of
weird things with the hardware" deserve their own group, Perrin said.
Then there are the slower-moving parts of the community. Businesses will
typically allow others to work out the bugs in the hardware and software
before starting to use it; they have "a more cautious approach", he said. The
last group is the end users, who are system administrators and others whose
boss bought the hardware; they may not be particularly pleased about using
the distribution, but they need to get work done so it is important to try
to make their jobs easier.
Some of these communities are "more equal than others", which sounds
backwards or odd coming from a community person, Perrin said. But what he
is really talking about is timing; you don't need to worry about makers, say,
until there is working hardware available. So CentOS needed to take a
tiered approach to supporting its various communities.
It all starts with the hardware, naturally. Working with some of the
larger vendors on building the distribution for their aarch64 server
prototypes was the first step. That was facilitated by the unannounced
arrival of
hardware at his house. That was "fantastic, but really surprising". From
the audience, Jon Masters, who had arranged for some of those shipments,
jokingly warned attendees: "don't tell me your address". With a grin,
Perrin said: "my
electric bill does not thank you".
CentOS started by working with AppliedMicro; that was used as the reference
platform starting in March 2015. After that, the project also worked with
Cavium, Qualcomm, AMD, and
some other vendors that are not public.
Once the hardware is supported, it is time to move on to the early
adopters. It was not practical to work with the hardware vendors' business
partners as it is not his job to manage those kinds of relationships, he
said. But early adopters are different; CentOS wanted to work with folks
who are going to be loud about using the distribution. From those efforts,
the project learned about some optimizations for aarch64 that were not
working well for some users, for example.
One of the biggest things that helped with that was working with the Fedora
and Extra Packages for
Enterprise Linux (EPEL) communities to get the EPEL
packages working for aarch64. Those packages are valuable for day-to-day
work on servers, he said. The CentOS project focused on making the
hardware work and making a base set of packages, then getting out of the
way. The EPEL group has been "fantastic at packaging up things they think
people will need".
Part of the process of creating the distribution is figuring out what
software the community wants. The short answer turns out to be that it wants
"containers and virtualization". So one of the early projects was to get
docker (with a small "d", "not with a large 'D' that is now
trademarked", he said) running on aarch64. Docker is written in Go, which
meant that the golang package needed to be built.
When the process started, though, the released version of golang was 1.4,
which did not support aarch64. The project had to bootstrap a build of 1.5
beta using the 1.4 compiler in a container on an x86_64 system. That
"failed immediately" because of a calculation done by docker to determine
the page
size. It is 4KB on x86_64, but 64KB on CentOS aarch64. That got fixed
(and upstreamed) and CentOS was able to build docker by late 2015 or early
2016.
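The underlying fix is to query the page size at runtime rather than assume the x86-64 value; a small illustration (the function name here is just for this example):

```python
import os

def page_size():
    """Return the kernel's page size.  Hard-coding 4096 is exactly the
    docker bug described above: x86_64 uses 4KB pages, while CentOS
    aarch64 kernels use 64KB pages."""
    return os.sysconf("SC_PAGESIZE")
```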
The availability of docker started to accelerate other development on
CentOS for ARM. For example, Kubernetes is being ported. The same
page-size problem cropped up there, but the Kubernetes developers are quite
receptive to patches. Kubernetes 1.4 is "not 100% baked yet" for the
distribution but is getting there.
On the virtualization side, users wanted OpenStack. They wanted to be able
to do virtualization and virtualization management on ARM. As it turns out,
though, the bulk of the need for OpenStack was for network
function virtualization (NFV) rather than for OpenStack's
own sake. OpenStack is just a stepping stone to NFV, he said. The process
of porting OpenStack is under active development right now.
The overall goal for the CentOS on ARM project is to "get to boring". The
idea is that the distribution works just like every other distribution on
every other architecture. For some platforms, it has been difficult at
times to get to that point. There is a mindset of how software works in
the embedded world that doesn't translate well to the server world. If the
story is that "this system boots this way, this other one that way", it
will not sit well with customers.
So a lot of work was put into community building within the hardware vendor
community regarding standards. The idea is that ARM, Intel, AMD, and others all
need to work
the same way, install the same way, boot the same way, and so on. That
means support for PXE, UEFI, ACPI, and so on. There is something of a
balance required, though, because at the same time he is beating on the
vendors to
standardize, he is also asking them to provide hardware for makers and
other early adopters.
At this point, there is a functional base distribution that matches what there
is on the x86 side. The next step is to get things like Kubernetes and
OpenStack working; after that is less clear. He is no longer a system
administrator, so he is not attuned to what users may want and need. Part
of coming to OSSJ was to hopefully gather some feedback on what users would
like to see. He can take that back to Red Hat engineering as input for
upcoming plans. Maybe there are tools and technologies that CentOS doesn't
even know about that need to be added to the ARM ecosystem; he encouraged
attendees to let him know what they are.
In answer to an audience question, Perrin said that installing CentOS for
ARM is straightforward, much like the process for x86: download an ISO
image and boot it from a USB drive or via PXE. Instructions are available
on the wiki. That is for supported 64-bit ARM hardware; for 32-bit
hardware, like Raspberry Pi, a specialized image is needed for the
platform. The 64-bit Raspberry Pi 3 (RPi3) will ostensibly be supported in
three months or so, he said, once the U-Boot bootloader gets UEFI support.
Masters spoke up to note that Perrin is "stuck with" some of the decisions
that Masters made. One of those is the 64KB page size, which is good for
servers but not as good for embedded use cases like Raspberry Pi. Red Hat
(where Masters also works) is focused on the server market where the larger page
size makes a lot of sense.
Some ARM distributions did not think about that, he said, and will be stuck
with 4KB pages that are more suited to embedded use cases.
There are some other hardware choices, at a price point similar to the RPi3's,
that could be used for development and testing, Perrin said in answer to
another question. The ODROID C2 and C3 boards
have 64-bit ARM CPUs, but there is a "giant caution flag" for those kinds
of systems. Since the changes for those boards have not been pushed to the
upstream kernel, users will be running the CentOS user space with a vendor
kernel. That may be just fine, but there have been occurrences in the past
where vendor kernels have had problems—a remote root hole in one case.
If you want online hardware, Perrin suggested Packet, where you can get a bare-metal
aarch64 system. It is "kind of like AWS" but with ARM hardware.
When asked about 96Boards, Perrin
said the company has an "array of good hardware" that doesn't do what is
needed for CentOS. The HiKey board is the best, but there are some
implementation issues that cause CentOS difficulties and the DragonBoard
410c does not have the right
bootloader for CentOS. As Masters put it, the right answer is to spend
$1000 to get a real server.
The final question was about whether CentOS is talking with other ARM
distributions that do not emanate from Red Hat. Perrin said there is a
cross-distribution mailing list; he doesn't see eye to eye with the others
on it all the time, but that's true with his colleagues at Red Hat too at
times. Driving standards for the hardware helps everyone and the people on
the list are trying to do that. That is part of why there has been some
effort into supporting CentOS on RPi3; everyone has one, so it is a good
way to open
up development without having to tell interested people to go buy a $1000
server.
[I would like to thank the Linux Foundation for travel assistance to Tokyo
for Open Source Summit.]
Recently, Lennart Poettering announced
a new tool
called casync for
efficiently distributing filesystem and disk images.
Deployment of virtual machines or containers often requires such an image
to be distributed for them.
These
images typically contain most or all of an entire operating system and its
requisite data
files; they
can be quite large. The images also often need
updates, which can take up considerable bandwidth depending on how
efficient the update mechanism is. Poettering developed casync as an
efficient tool for distributing such filesystem images, as well as for
their updates. Poettering found that none of the existing system image delivery
mechanisms suited his
requirements. He wanted to conserve bandwidth when
updating the images, minimize disk space usage on the server and on
clients, make downloads work well with content delivery networks (CDNs),
and for the mechanism to be simple to use. Poettering considered Docker's
layered
tarball, OSTree's direct file
delivery via HTTP with packed deltas for updates, and other systems that
deliver entire filesystem images. Docker's approach of "layers" of updates on top of an initial tarball
required tracking revisions and history, which Poettering believes a
deployment should not be burdened with. OSTree's method of serving
individual files would be detrimental to the performance of content
distribution networks if there were a plethora of small files, as
synchronization will hammer the CDN with multiple HTTP GET
requests. Delivering entire filesystem images repeatedly for every update
would be an unacceptably high use of bandwidth and server disk space, even
though the delivery would be simple to implement. In the end, Poettering
decided that, while existing systems have their merits, he had to roll his
own solution optimized for the use case of filesystem image delivery with
frequent updates. Casync was inspired by rsync (which
copies and syncs files based on deltas) and Git (which provides
content-addressable storage based on
hashing). Casync can be used to distribute directory trees as well as raw disk
images. When operating on directories, all data in the target directory is
serialized into a stream of bytes, much like the tar
utility does. Poettering created his own serialization as the output of
tar varies from implementation to implementation; he required
consistent output without being dependent on any
particular flavor of tar.
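To illustrate why a fixed serialization matters, here is a toy sketch (not casync's actual format, which also records metadata such as ownership and ACLs): with a sorted traversal order and length-prefixed fields, identical trees always produce identical byte streams, which tar implementations do not guarantee.

```python
import os

def serialize(root):
    """Toy deterministic serializer for a directory tree (illustrative
    only).  Sorting the traversal and length-prefixing every field means
    the same tree always yields the same stream of bytes."""
    out = bytearray()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                      # fix the traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).encode()
            with open(path, "rb") as f:
                data = f.read()
            out += len(rel).to_bytes(4, "big") + rel
            out += len(data).to_bytes(4, "big") + data
    return bytes(out)
```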
When invoked on either a directory or a
disk image, casync will create a repository of data that mirrors the
original, but reorganized such that it is broken into data chunks of
similar size, stored inside a directory called a "chunk store", together
with an index file that stores the metadata for the repository. The directory
and index file can both be served via a web server (or with
another network file transfer protocol) to a client, which can use casync
to reassemble the original data. Casync works by chunking the target data
from a stream of bytes into a set of variable-sized chunks, though the
sizes do not vary by much. Chunking helps reduce the bandwidth consumed
when a user synchronizes their repository, since the index file can be used
to determine which chunks have changed or been added, and only those chunks need
to be downloaded. The chunking algorithm will create the same chunks for the same data,
even at different offsets. To accomplish this, casync makes use of a cyclic
polynomial
hash (also known as Buzhash) to find the offsets for identical
data. Buzhash is a hash-based search algorithm that can be used to find
patterns in a stream of bytes more efficiently than brute-force scanning.
The basic idea of Buzhash
is that, given two strings of data where one may contain the other, it is
possible to search for the target by looking at a few
bytes (called the window) at every
byte offset and hashing them (this is called a "rolling hash"). The resulting
hash is compared against the hash of
a search key of the same size as the window; a match provides a strong
indicator that the rest of the string might also match the other data, and
a full hash at the indicated offset can be executed to confirm
it. An advantage of a rolling hash is that it can be performed quickly
since the next hash can be generated from the previous one by a
computationally cheap
operation. When chunking for the first time, the Buzhash algorithm is used with a
48-byte window across the stream, moving one byte at a time and calculating
a hash. A chunk boundary is placed whenever the calculated
hash h satisfies the equation h mod k == k - 1.
The constant k is chosen to reflect the intended average chunk
size. Assuming an even distribution of the hash h, the probability
that the function h mod k will yield a specific value, in this case
k-1, is 1/k. Therefore, roughly once every k bytes read, the
equation will evaluate to true and a
boundary is placed. To guarantee that chunks are neither too small nor too
large, there are hard limits on the chunk size enforced by
the algorithm. The implementation is
similar to what rsync does when chunking, except that rsync uses a smaller
window, works with individual files rather than a serialized directory
tree, and the hash
algorithm used is the Adler-32 checksum. Each time a chunk is created, its contents are hashed with SHA-256 to create a digest
for the chunk, which is then recorded in an index together with the chunk
size. Serializing directories requires the preservation of metadata such
as ownership and access control lists (ACLs). A user can specify what metadata
to save in the chunk archive when running the tool. Casync will
also store extended attributes, file capabilities, ACLs, Linux
chattr file attributes, and FAT file attributes. Casync
recognizes pseudo-filesystems such as
/proc and sysfs; it will not include them when creating
an archive. Additionally, if the underlying filesystem supports reflinks, which save space by sharing disk
blocks for identical files
(with copy-on-write semantics), then casync can take advantage
of this: rather than creating identical files, it will reflink
them. Casync supplies a FUSE facility for read-only mounting of
filesystem or disk images directly from the HTTP source. There are packages for casync created by third parties available for Ubuntu,
Arch Linux, and
Fedora. I
tried it out by compiling it from the GitHub repository, which
requires the Meson build system. The
installed binaries let you create chunked repositories and reconstruct
them, both locally and over HTTP, FTP, or SFTP. The README
contains a list of commands you can run to try out the various features of
casync. Casync is not intended as a replacement for rsync or zsync, as it is
more for filesystem delivery than fine-grained file-based backup. It also
does not attempt to find optimal deduplication and the smallest
deltas, but has a "good enough" heuristic to save bandwidth and storage. It is
a welcome addition in the space of filesystem delivery, where something
like rsync would be useful, but the fine-grained, per-file granularity is
not required. Poettering has stated that he has "concrete plans" for adding encryption
to casync, so that it could be used as a backup tool like restic, BorgBackup, or Tarsnap. He also intends to automate
GPG validation of data, so that chunks can be signed and verified without
user intervention. Casync does not yet
expose an API for third-party tools, although it is designed to be able to
do so eventually. This will enable things such as GNOME's GVfs to access
casync repositories, and make it modular enough so that components like the
HTTP delivery mechanism can be replaced with customized
implementations.
Other plans are support for local network caches of
chunks and automated home-directory synchronization.
Casync only works on
Linux at the moment, but Poettering says he is open to accepting patches
for portability that do not interfere with the fundamental operation of
casync. Currently, casync is developed mainly by Poettering, with a few
other contributors. The project is not yet completely stable,
although it is usable and has many features implemented already. There may
be changes to the file formats down the road, so any index or serialized
files made with the current version might break in the future. Casync is a new option to complement tools like rsync, which may prove
useful to anyone who needs to distribute large filesystem images that
also need to be regularly updated. The granularity of "chunks" that
casync uses is reminiscent of BitTorrent, but the fact that it is
network
protocol independent should make the distribution of data friendlier to
firewalls and content distribution networks. It should be a useful tool for
cloud providers, software distributions, developers sharing customized
virtual machine images, and anyone else who needs an efficient way of
providing large and constantly updated bundles of data. [I would like to thank Lennart Poettering for his help in clarifying
some of the inner workings of casync.]
this taints the
CVE process
", he said. Seifried defended
the CVE as "
legitimate
" but suggested that he was getting
tired of the whole show and might give up on it.
"I am certain he will treat a member of upstream Linux the same as
I've been treated, as he is a very professional and equitable person."
In conclusion
An introduction to asynchronous Python
Chess
Examples
# async/await version
import asyncio

loop = asyncio.get_event_loop()

async def hello():
    print('Hello')
    await asyncio.sleep(3)
    print('World!')

if __name__ == '__main__':
    loop.run_until_complete(hello())
# @coroutine decorator version
import asyncio

loop = asyncio.get_event_loop()

@asyncio.coroutine
def hello():
    print('Hello')
    yield from asyncio.sleep(3)
    print('World!')

if __name__ == '__main__':
    loop.run_until_complete(hello())
Pitfalls
Comparison
ProofMode: a camera app for verifiable photography
CameraV
location information), accelerometer readings, and environmental
sensors (such as ambient light, barometric pressure, and
temperature). Network device state, such as the list of visible
Bluetooth devices and WiFi access points, can optionally be included as
well. In addition, the standard Exif image tags (which include the
make and model of the device as well as camera settings) are
recorded. A full list is provided in the CameraV user's guide.
Rethinking the complexity issues
Proof with Media" (which appends the metadata to the media file and
signs the result, as in the CameraV case). Whichever option the user
chooses, selecting it immediately brings up another "share" panel so
the user can pick an app to finalize the action — thus directing
the ProofMode file to email, SMS, a messaging app, Slack, or any other
option that supports attaching files.
Our thinking was more focused on integrity through
digital signatures, with a bit of lightweight, transient identity
added on.
" Nevertheless, he added, the project does have an
issue open to port key storage to the Android
Keystore system service.
every day activists around the world, who may
only have a cheap smartphone as their only computing device
"
rather than cryptographers. In an email, he also noted that ProofMode
requires little to no training for users to understand, which is a
stark contrast to the complexity of CameraV.
Verification versus anonymity
publishing it — there is too little overlap with the intent of
ProofMode, but the project has published a separate app that may fit
the bill.
In the pudding
daxctl() — getting the other half of persistent-memory performance
Persistent memory promises high-speed, byte-addressable access to storage,
with consequent benefits for all kinds of applications. But realizing those
benefits has turned out to present a number of challenges for the Linux
kernel community. Persistent memory is neither ordinary memory nor
ordinary storage,
so traditional approaches to memory and storage are not always well suited
to this new world. A proposal for a new daxctl() system call,
along with the ensuing discussion, shows how hard it can be to get the most
out of persistent memory.
daxctl()
the full promise
of byte-addressable access to persistent memory has only been half realized
via the filesystem-dax interface
". Realizing the other half
requires getting the filesystem out of the loop when it comes to write
access. If, say, a file could be set up so that no metadata changes would
be needed in response to writes, the problem would simply go away.
Applications would be able to write to DAX-mapped memory and, as long as
they ensured that their own writes were flushed to persistent store (which
can be done in user space with a couple of special instructions), there
should be no concerns about lost metadata.
int daxctl(char *path, int flags, int align);
/* Unfortunately this kludge is needed for FIBMAP. Don't use it */
sector_t (*bmap)(struct address_space *s, sector_t sector);
MAP_SYNC
we manage to paper over the fact
". He suggested that development should be focused
instead on making DAX more stable before adding new features.
CentOS and ARM
Community
More packages
Boring
Distributing filesystem images and updates with casync
Chunking
-based search algorithm that can be used to find patterns in a
stream of bytes more efficiently than brute-force scanning.
h mod k == k - 1
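A toy content-defined chunker built on that boundary condition might look like the following. This is illustrative only: casync's real Buzhash, 48-byte window, and size limits differ, and the polynomial rolling hash and constants here are simplified stand-ins.

```python
def chunk(data, window=48, k=1024, min_size=512, max_size=8192):
    """Split data where a rolling hash h over the last `window` bytes
    satisfies h % k == k - 1, with hard lower and upper size limits.
    Because boundaries depend only on local content, identical data
    tends to produce identical chunks even at different offsets."""
    B, MOD = 257, 1 << 31
    pw = pow(B, window, MOD)      # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * B + byte) % MOD
        if i >= window:
            h = (h - data[i - window] * pw) % MOD  # slide the window
        size = i - start + 1
        if (size >= min_size and h % k == k - 1) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Concatenating the chunks always reconstructs the input, every chunk respects the hard size limits, and the same input always chunks the same way, which is what makes index-based synchronization possible.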
and a
filename. The chunks are kept in compressed form in the chunk store. If a
chunk arriving in the store hashes to the same digest as an existing one in
the index, the chunk need not be added. This gives the chunk store
deduplication for data, which is particularly efficient for filesystem
images that do not differ much between versions. The chunks can then be
delivered over HTTP, along with the index, and they can be reassembled on
the client side.
Trying it out
Future work
Conclusion
Page editor: Jonathan Corbet
Next page:
Brief items>>