Leading items
Welcome to the LWN.net Weekly Edition for March 1, 2018
This edition contains the following feature content:
- Creating an email archive with public-inbox: a tool that may help build a replacement for Gmane
- Avoiding license violations in a large organization: Tom Yates covers a FOSDEM talk on educating developers about free-software licenses.
- Shedding old architectures and compilers in the kernel: the kernel is about to get rid of some unloved code.
- Some advanced BCC topics: developing BPF programs with Python.
- Habitica: a role-playing game for self improvement: a different way to motivate yourself to get that to-do list done.
- The true costs of hosting in the cloud: which is cheaper, running your own servers or using cloud instances? The answer is not entirely straightforward.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Creating an email archive with public-inbox
Keeping up with the free-software development community requires following a lot of mailing lists. For many years, the Gmane email archive has helped your editor to do that without going any crazier than he already is, but Gmane is becoming an increasingly unreliable resource. A recent incident increased the priority of a longstanding goal to find (or create) an alternative to Gmane. That, in turn, led to the discovery of public-inbox.
The decline of Gmane
At its peak, Gmane was by far the best way to follow many dozens of mailing lists. It holds archives of a vast number of lists — the front page currently claims over 15,000 — so most of the lists of interest can be found there. Crucially, Gmane offers an NNTP feed; newsreaders are the fastest way your editor has found to get through a day's email and pick out the interesting messages. Gmane also offered a web-based view into the archive that could be easily linked using a message ID; that made it easy to capture emails and link them back to the context in which they were sent.
Gmane was created by Lars Magne Ingebrigtsen, who operated it for many years before burning out and moving on in 2016. A company called Yomura picked up the archive and continued operating the NNTP feed, but that is where things stopped. The web interface disappeared, never to return, breaking thousands of links across the net. The front page still says "some things are very broken" and links to a blog page that was last updated in September 2016. Gmane has appeared to be on minimal life support for some time.
In mid-February, Gmane stopped receiving emails from every mailing list hosted at vger.kernel.org; those include most of the kernel-related lists, but also lists for other projects like Git. Your editor posted a query and learned that delivery problems had forced Gmane to be dropped from all lists hosted at vger. While this was happening, the main Gmane web page also ceased to work. Since then, a handful of vger lists have returned to Gmane, though the bulk of them remain unsubscribed.
Those lists could certainly be fixed too, if somebody were to find the right person to poke. But the fact that so many high-profile lists could disappear for a week or more without anybody even seeming to notice makes it clear that Gmane is not getting a lot of attention these days. The wait for the web interface to come back is in vain; it's not at all clear that even what's there now is going to last for much longer.
Gmane has served the community well for years, and we all owe the people who have worked to make that happen a huge round of thanks. But all things must end, and it may well be that Gmane's time is coming soon. So what is a frantic LWN editor to do to ensure his ability to keep up with the community?
public-inbox
In the same discussion mentioned above, Konstantin Ryabitsev mentioned that the Linux Foundation is working with a project called public-inbox to create a comprehensive archive for the linux-kernel list. That inspired your editor to go and take a look. The conclusion is that public-inbox may well be the tool for this job, but there are some rough edges to be smoothed out first. The first of those could be said to be the project's web site, which is an unadorned directory listing containing a handful of documentation files.
To summarize: public-inbox can be used to implement an archive for one or more mailing lists. There is a web interface (see the page for the project's own mailing list for an example); it is functional but not necessarily designed for aesthetic appeal. There is a search facility implemented with Xapian that can make it easy to find messages of interest, though it lacks notmuch-style tags. Public-inbox also, happily, implements an NNTP interface to the archive.
Public-inbox, created and almost exclusively developed by Eric Wong, does not appear to have the creation of a Gmane-style mailing-list archive as its primary use case. Instead, it is a tool allowing people to follow (and participate in) mailing lists without the hassle of actually subscribing to them. That shows up in various ways in the design of the system.
For example, there is an interesting design decision at the core of public-inbox: each mailing-list archive is stored in a Git repository. Every incoming message is added to the repository in its own file in a separate commit; the Git history is thus the history of incoming email. A bare Git repository is normally used, so the message files are not duplicated in a checked-out working tree. Viewing an email requires locating its file and checking it out of the repository — though none of that activity is visible to users of the system.
This use of Git would appear to be driven by a desire to make it easy for others to duplicate a specific list archive. And, perhaps more to the point, readers can "subscribe" to the list by periodically pulling new messages from the archive repository. There is a tool (called ssoma) that can be used to feed messages from a public-inbox repository into an email client. When readers get tired of a specific mailing list, they need only stop pulling from the relevant repository; no "unsubscribe" operations are needed. Whether people really want to follow mailing lists in this manner is unclear, but the capability is there.
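This pull-based reading model can be approximated with plain Git commands. The sketch below is only an illustration (the mirror path is hypothetical, and the real ssoma tool handles this job properly); it walks recent commits in a local mirror and prints the first header line of each message that was added:

    #!/usr/bin/env python
    # Rough sketch only: walk a local mirror of a public-inbox archive and
    # print something from each message that a commit added.  The path is
    # a placeholder; ssoma does this properly.
    import subprocess

    REPO = "/path/to/archive.git"   # bare mirror of the list archive

    def git(*args):
        return subprocess.check_output(("git", "-C", REPO) + args).decode()

    # Each commit adds one message as a file; list the last five commits
    # and the files they introduced.
    log = git("log", "--diff-filter=A", "--name-only",
              "--pretty=format:commit %H", "-5")

    commit = None
    for line in log.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]
        elif line.strip():
            # Retrieve the raw message without checking anything out.
            message = git("show", "%s:%s" % (commit, line.strip()))
            print(message.splitlines()[0])   # first header line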
There are various ways of feeding email into a public-inbox repository. The source comes with an import_maildir script; importing a 500,000-message linux-kernel archive with it took many hours. It is a somewhat fragile tool, crashing easily on email with malformed headers, but it worked well in the end and public-inbox is quite responsive with an archive of that size — at least, until it decides to run git prune on the repository. The public-inbox-mda utility will read a message from the standard input and inject it into an archive; it is meant to be used from a .forward or .procmailrc file. There is also public-inbox-watch, which will keep an eye on a maildir directory and feed new messages to the archive as they arrive. In general, setting up a new archive is a simple and easily scripted task once one understands how the utilities work.
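As an illustration of how little glue is involved (the stock import_maildir and public-inbox-watch tools already cover this case), here is a rough, untested sketch that feeds an existing maildir into an archive by piping each message to public-inbox-mda; it assumes the inbox (addresses, repository path) has already been configured as the documentation describes:

    #!/usr/bin/env python
    # Illustrative sketch, not a tested recipe: pipe every message in a
    # maildir to public-inbox-mda, which reads one message on stdin.
    # The maildir path is a placeholder.
    import mailbox
    import subprocess

    md = mailbox.Maildir("/path/to/maildir", create=False)
    for key in md.keys():
        raw = md.get_bytes(key)     # the full message, headers and all
        subprocess.run(["public-inbox-mda"], input=raw, check=True)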
A young project
The initial commit to the public-inbox repository was made in January 2014, just over four years ago. Since then, some 1,300 commits have built it up to 11,000 lines of code or so. In many ways, though, public-inbox feels like a young project that is still working to get some of the basic functionality in place. It will certainly need some work before it can be used to create archives that run at any sort of scale.
The project's documentation can be accurately described as "spartan", leaving much for the user to figure out on their own. To keep that task from being too easy, many of the commands will just silently fail if something is not set up to their liking. For example, public-inbox-mda will silently drop messages on the floor if the given mailing-list name does not appear in the To or CC headers. Your editor has more than once had to resort to placing print statements in the code (which is all Perl 5, tragically) in order to figure out where things were going wrong.
Other glitches abound. The web interface offers no customization or theming support. The NNTP server does not create proper Xref headers for messages that are cross-posted to more than one list, meaning that a reader of both lists will see a lot of duplicates. There are no tools for monitoring the flow of emails into the archive or troubleshooting problems. The Git-based design could make it interesting to remove an old email from the archive, should that become necessary — from looking at the code, it appears that rebasing the repository would break the archive, though your editor has not actually run this experiment. The X-No-Archive header is not honored. There are concerns about scalability to huge archives. There is also no word about what the project has done, if anything, to ensure the security of code that is exposed to the Internet via the email stream and the HTTP and NNTP ports.
Still, it seems that public-inbox has the core features that are needed to set up a no-nonsense email archive without a huge amount of work. Its simplicity is a nice contrast to something like HyperKitty, which quickly leads a hopeful user into a morass of Django setup and dependencies — and which lacks an NNTP server. There is enough apparent potential here that the Linux Foundation is funding some work to improve the scalability of public-inbox for its linux-kernel archive project. If public-inbox can generate some more interest and grow beyond an essentially single-developer project, it may well come to fill an important niche in our community.
Avoiding license violations in a large organization
Over the years, I have heard people from the Software Freedom Conservancy (SFC) say many times that most free-software license violations are not intentional. Indeed, the SFC's principles of community-oriented enforcement say that "most GPL violations occur by mistake, without ill will". I've always had some difficulty in believing that; after all, how hard can it be to create a GPL repository on GitHub and sync the code into it? But it is also said that managing programmers is like herding cats. It was therefore interesting to hear a large-scale cat herder talk at FOSDEM 2018 about the license violations that occurred in their organization and what he and his colleagues did about it.
Andreas Schreiber works for DLR, Germany's national aeronautics and space research center. DLR has some 8,000 employees across 40 institutions at 20 sites; of those, around 1,500 work on software development. Schreiber said its annual budget of some €150M for software development makes DLR one of the largest software developers in Germany. However, it is primarily an academic institution. Unlike many large commercial software developers, its software is largely written by people employed because of their expertise in such fields as aeronautics and space transportation, who have no formal computer science background, and often no formal training in software development.
[Andreas Schreiber]
Many different technologies and over 30 programming languages are routinely used at DLR. Much free software is both used and created there, under many different licenses, and that, said Schreiber, is where the problems come in. Because of the academic nature of DLR, an overview of all existing software development projects is simply not possible. Development often takes place with external partners and the resulting software is frequently released to the world at large under a free license.
In the past, DLR developers have mistakenly released software with license compatibility problems or unintentionally failed to honor license conditions. In one high-profile case, the German government gave some of DLR's software to another country, assuring the recipient that the software was open-source, when this was not in fact true. A journalist from Der Spiegel picked up on this, and much effort was required to rectify the situation. Although DLR's software licensing problems are not quite as visible or expensive as its hardware problems — which Schreiber illustrated with a picture of a rocket exploding — they were nevertheless visible at a high level inside DLR. In 2012, the CIO sent around a letter reminding everyone that free-software license conditions were not optional and must be honored. Nevertheless, developers who were not trained in the minutiae of free licenses often struggled to understand their obligations, especially when more than one license was involved.
So Schreiber's group at DLR developed a four-hour training program on the legal issues surrounding free software, which could be attended by any DLR employee that wanted to do so. It is given by two people, one a DLR software engineer and the other a legal expert from DLR's technology marketing department. His group has run this training once a year from 2012 onward, and each year between 10 and 30 people have signed up.
The group has also produced a paper brochure, which was commissioned from a law firm in Berlin. This brochure contains information about one's rights and obligations with respect to free software, for both unmodified and modified versions of the software. It includes the requirements of the licenses in common use at DLR, including the GPL, Apache, BSD, and MIT licenses, in an easy-to-honor checklist form. Furthermore, the brochure contains a decision tree for choosing a project license, depending on the project's requirements for strong, weak, or no copyleft, and whether special conditions are required. It is currently available only in German, but DLR is working on an English-language version that it intends to release under a Creative Commons license sometime in 2018.
DLR found that encouraging knowledge exchange between people inside the organization who had experience with licensing issues was important. DLR has an internal tradition of wikis and the infrastructure to support that, so Schreiber's group put up an "Open Source" wiki. There is also an internal tradition of WAWs (Wissens Austausch Workshops, or Knowledge Exchange Workshops) which are run on subjects as varied as visualization of huge data sets, autonomous flying, software engineering, and photonic systems. So Schreiber's group ran several DLR.Open WAWs, each lasting two days and limited to 60 participants; there are short lectures, lightning talks, and small-group breakout sessions. Last year all participants had to produce a poster about themselves and their work at DLR, examples of which are shown in the slides [PDF], though the main thing learned from some posters was that there are people at DLR with "very strange minds" and it's quite hard to follow their thinking, Schreiber said, to laughter.
What you can't measure, you don't really know. DLR is an academic institution, so Schreiber's group systematically tried to measure how useful these free-license initiatives were. In addition to receiving positive feedback, his group also learned that, although much free software is used at DLR, a lot of that was open libraries and tools brought in from outside. Most of the software developed at DLR is still not being published freely. The group also found that there was strong interest in freeing internally-developed software.
At the same time, though, there was criticism of the idea. Some worried that revenue-generating opportunities were being missed, but the bigger obstacles were the lack of a formal process for employees to follow in order to publish their work as free software and the demotivating effect of the extra time required to do it. Schreiber's group has since created the missing process definition. It also created a single internal email address for all free-software licensing queries, which gets questions on topics such as choosing a free license, best practices for one's own free-software project, and how to migrate away from commercial software. Some of these the group answers internally; sometimes it acts as an information exchange, putting the employee in touch with other internal resources such as the legal or technology marketing departments.
The group has also had three licenses formally approved by the legal department. Any employee can use any of these licenses on code they create in the course of work with no further oversight. The three licenses are BSD, Apache v2, and Eclipse Public License v1. Interestingly, none of these is a strong copyleft license, and one of them (the EPL) is incompatible with the GPL.
In answer to a question from the floor, Schreiber said the absence of a strong copyleft license from the list was unintentional; those three are the licenses about which his group is most often asked. The list is not intended to be set in stone for all time, and he would be perfectly happy to add a strong copyleft license, most likely the GPL; it hasn't happened yet, though. In answer to another question, Schreiber said that there are DLR projects released under the GPL, and the AGPL, but that doing so currently requires the approval of both the legal department and the developer's manager. It would seem to me that adding a strong copyleft license to the list of pre-approved licenses is important and should be addressed soon.
Schreiber also noted that both NASA and ESA have developed their own open-source licenses, whereas DLR has deliberately chosen not to do that. Given widespread concerns about license proliferation, and that NASA's license is both non-free and GPL-incompatible, this seems a good decision. In addition, in response to a later question, Schreiber said his group had tried mandating licenses for DLR projects, but that just did not work in the DLR culture, where researchers are used to doing what they like, how they like. Imposing a single institutional license would have been difficult; instead, the group provides advice and support, and will even recommend a license if asked, but it does not mandate one.
In summary, Schreiber said the approach DLR took was to offer specific and relevant information to DLR employees, and then provide time and space for peer discussions and knowledge exchange. Only after all that has been done is it time to introduce any kind of formal process or directions from management. The lessons learned and procedures developed have been taken up by some other interested organizations, including the Helmholtz Association. If I were in an organization of anything like the complexity and mindset of DLR, I'd be paying a lot of attention to what Schreiber's group is doing, too.
[We would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to Brussels for FOSDEM.]
Shedding old architectures and compilers in the kernel
The kernel development process tends to be focused on addition: each new release supports more drivers, more features, and often new processor architectures. As a result, almost every kernel release has been larger than its predecessor. But occasionally even the kernel needs to slim down a bit. Upcoming kernel releases are likely to see the removal of support for a number of unloved architectures and, in an unrelated move, the removal of support for some older compilers.
Architectures
The Meta architecture was added to the 3.9 kernel as "metag" in 2013; it is a 32-bit architecture developed by Imagination Technologies. Unfortunately, at about the same time as the code was merged, Imagination Technologies bought MIPS Technologies and shifted its attention to the MIPS architecture. Since then, the kernel's support for Meta has languished, and it can only be built with the GCC 4.2.4 release, which is unsupported. On February 21, James Hogan, the developer who originally added the Meta port to the kernel, proposed that it be removed, calling it "essentially dead with no users".
The very next day, Arnd Bergmann, working entirely independently, also proposed removing Meta. Bergmann, however, as is his way, took a rather wider view of things: he proposed that the removal of five architectures should be seriously considered. The other four were:
- The Sunplus S+core ("score") architecture entered the kernel for the 2.6.30 release in 2009. Since then, the maintainers for that architecture have moved on and no longer contribute changes to the kernel. GCC support for score was removed in the 5.0 release in 2015. Bergmann said: "I conclude that this is dead in Linux and can be removed". Nobody has spoken up in favor of retaining this architecture.
- The Unicore 32 architecture was merged for the 2.6.39 kernel in 2011. This architecture was a research project at Peking University. Bergmann noted that there was never a published GCC port and that the maintainer has not sent a pull request since 2014. This architecture, too, seems to lack users and nobody has spoken in favor of keeping it.
- Qualcomm's Hexagon is a digital signal processor architecture; support for Hexagon entered the kernel in 2011 for the 3.2 release. Hexagon is a bit of a different case, in that the architecture is still in active use; Bergmann said that "it is being actively used in all Snapdragon ARM SoCs, but the kernel code appears to be the result of a failed research project to make a standalone Hexagon SoC without an ARM core". The GCC 4.5 port for this architecture was never merged. Richard Kuo responded in defense of the Hexagon architecture, saying: "We still use the port internally for kicking the tools around and other research projects". The GCC port is indeed abandoned, he said, but only because Qualcomm has moved to using LLVM to build both kernel and user-space code. Bergmann responded that, since there is still a maintainer who finds the code useful, it will remain in the kernel. He would like to put together a working LLVM toolchain to build this port, though.
- The OpenRISC architecture was merged in the 3.1 release, also in 2011. Bergmann observed that OpenRISC "seems to have lost a bit of steam after RISC-V is rapidly taking over that niche, but there are chips out there and the design isn't going away". He added it to his list because there is no upstream GCC support, but said that the OpenRISC GCC port is easy to find and the kernel code is being actively maintained. Philipp Wagner responded that the GCC code has not been upstreamed because of a missing copyright assignment from a significant developer; that code is in the process of being rewritten. The end result is that there is no danger of OpenRISC being removed from the kernel anytime soon.
Bergmann also mentioned in passing that the FR-V and M32R architectures (both added prior to the beginning of the Git era) have been marked as being orphaned and should eventually be considered for removal. It quickly became apparent in the discussion, though, that nobody seems to care about those architectures. Finally, David Howells added support for the mn10300 architecture for 2.6.25 in 2008 and is still its official maintainer but, according to Bergmann, it "doesn't seem any more active than the other two, the last real updates were in 2013". Others in the discussion mentioned the tile (2.6.36 in 2010) and blackfin (2.6.21, 2007) architectures as being unused at this point.
The plan that emerged from this discussion is to remove score, unicore, metag, frv, and m32r in the 4.17 development cycle, while hexagon and openrisc will be retained. There will be a brief reprieve for blackfin and tile, which will be removed "later this year" unless a maintainer comes forward. And mn10300 will be marked for "pending removal" unless it gains support for recent chips. All told, there is likely to be quite a bit of code moving out of the kernel in the near future.
Compilers
The changes.rst file in the kernel documentation currently states that the oldest supported version of GCC is 3.2, which was released in 2002. It has been some time, though, since anybody has actually succeeded in building a kernel with a compiler that old. In a discussion in early February, Bergmann noted that the oldest version known to work is 4.1 from 2006 — and only one determined developer is even known to have done that. The earliest practical compiler to build the kernel would appear to be 4.3 (2008), which is still supported in the SLES 11 distribution.
Linus Torvalds, though, said that the real minimum version would need to be 4.5 (2010); that is the version that added the "asm goto" feature allowing inline assembly code to jump to labels in C code. Supporting compilers without this feature requires maintaining a fair amount of fallback code; asm goto is also increasingly needed for proper Meltdown/Spectre mitigation. Some developers would be happy to remove the fallback code, but there is a minor problem with that as pointed out by Kees Cook: LLVM doesn't support asm goto, and all of Android is built with LLVM. Somebody may need to add asm goto support to LLVM in the near future.
Peter Zijlstra would like to go another step and require GCC 4.6, which added the -mfentry feature; this replaces the old mcount() profiling hook with one that is called before any other function-entry code. That, too, would allow the removal of some old compatibility code. At that point, though, according to Bergmann, it would make sense to make the minimum version be 4.8, since that is the version supported by a long list of long-term support distributions. But things might not even stop there, since the oldest version of GCC said to have support for the "retpoline" Spectre mitigation is 4.9, released in 2014.
Nobody has yet made a decision on what the true minimum version of GCC needed to build the kernel will be so, for now, the documentation retains the fictional 3.2 number. That will certainly change someday. Meanwhile, anybody who is using older toolchains to build current kernels should probably be thinking about moving to something newer.
(Thanks to Arnd Bergmann for answering a couple of questions for this article. It's also worth noting that he has recently updated the extensive set of cross compilers available on kernel.org; older versions of those compilers can be had from this page.)
Some advanced BCC topics
The BPF virtual machine is working its way into an increasing number of kernel subsystems. The previous article in this series introduced the BPF Compiler Collection (BCC), which provides a set of tools for working with BPF. But there is more to BCC than a set of administrative tools; it also provides a development environment for those wanting to create their own BPF-based utilities. Read on for an exploration of that environment and how it can be used to create programs and attach them to tracepoints.

The BCC runtime provides a macro, TRACEPOINT_PROBE, that declares a function to be attached to a tracepoint and called every time the tracepoint fires. The following snippet of C code shows an empty BPF program that runs every time kmalloc() is called in the kernel:
    TRACEPOINT_PROBE(kmem, kmalloc) {
        return 0;
    }
The arguments to this macro are the category of the tracepoint and the event itself; this translates directly into the debugfs file system hierarchy layout (e.g. /sys/kernel/debug/tracing/events/category/event/). In true BCC-make-things-simple fashion, the tracepoint is automatically enabled when the BPF program is loaded.
The kmalloc() tracepoint is passed a number of arguments, which are described in the associated format file. Tracepoint arguments are accessible in BPF programs through the magic args variable. For our example, we care about args->call_site, which is the kernel instruction address of the kmalloc() call. To keep a count of the different kernel functions that call kmalloc(), we can store a counter in a hash table and use the call-site address as an index.
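The layout of those arguments can be inspected directly; the following snippet just dumps the kmalloc format file (assuming debugfs is mounted in the usual location and the script is run as root):

    #!/usr/bin/env python
    # Print the argument layout of the kmem:kmalloc tracepoint; the path
    # follows the events/category/event/ pattern described above.
    with open("/sys/kernel/debug/tracing/events/kmem/kmalloc/format") as f:
        print(f.read())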
While BCC provides access to the full range of data structures exported by the kernel (and covered in the first article of this series), the two most frequently used are BPF_HASH and BPF_TABLE. Fundamentally, all of BCC's data structures are maps, and higher-level data structures are built on top of them; the most basic of these is BPF_TABLE. The BPF_TABLE macro takes a type of table (hash, percpu_array, or array) as an argument, and other macros, such as BPF_HASH and BPF_ARRAY, are simply wrappers around BPF_TABLE. Because all data structures are maps, they all support the same core set of functions, including map.lookup(), map.update(), and map.delete(). (There are also some map-specific functions such as map.perf_read() for BPF_PERF_ARRAY and map.call() for BPF_PROG_ARRAY.)
Returning to our example program, we can store the kernel instruction-pointer address of the kmalloc() call-site (and the number of times it was called) using a BPF_HASH map and post-process it with Python. Here is the entire script, including the BPF program.
    #!/usr/bin/env python
    from bcc import BPF
    from time import sleep

    program = """
    BPF_HASH(callers, u64, unsigned long);

    TRACEPOINT_PROBE(kmem, kmalloc) {
        u64 ip = args->call_site;
        unsigned long *count;
        unsigned long c = 1;

        count = callers.lookup((u64 *)&ip);
        if (count != 0)
            c += *count;

        callers.update(&ip, &c);
        return 0;
    }
    """

    b = BPF(text=program)

    while True:
        try:
            sleep(1)
            for k,v in sorted(b["callers"].items()):
                print ("%s %u" % (b.ksym(k.value), v.value))
            print
        except KeyboardInterrupt:
            exit()
The syntax for the BPF_HASH macro is described in the BCC reference guide. The macro takes a number of optional arguments, but for most uses all you need to specify is the name of this hash table instance (callers in this example), the key data type (u64), and the value data type (unsigned long). BPF hash table entries are accessed using the lookup() function; if no entry exists for a given key, NULL is returned. update() will either insert a new key-value pair (if none exists) or update the value of an existing key. Thus, the BPF code for working with hashes can be quite compact because you can use a single function (update()) regardless of whether you're inserting a new item or updating an existing one.
Once a count has been stored in the hash table, it can be processed with Python. Accessing the table is done by indexing the BPF object (called b in the example). The resultant Python object is a HashTable (defined in the BCC Python front end) and its items are accessed using the items() function. Note that Python BCC maps provide a different set of functions than BPF maps.
items() returns a pair of Python c_long types whose values can be retrieved using the value member. For example, the following code from the example above iterates over all items in the callers hash table and prints the kernel functions (using the BCC BPF.ksym() helper function to convert kernel addresses to symbols) that invoked kmalloc() and the number of calls:
    for k,v in sorted(b["callers"].items()):
        print ("%s %u" % (b.ksym(k.value), v.value))
The output from this little program looks like:
    # ./example.py
    i915_sw_fence_await_dma_fence 4
    intel_crtc_duplicate_state 4
    SyS_memfd_create 1
    drm_atomic_state_init 4
    sg_kmalloc 7
    intel_atomic_state_alloc 4
    seq_open 504
    SyS_bpf 22
Though this example is relatively straightforward, larger tools will not be, and developers need ways to debug more complex tools. Thankfully, there are a few ways that BCC helps simplify the debugging process.
Controlling BPF program compilation and loading
Whenever a Python BPF object is instantiated, the BPF program source code contained within it is automatically compiled and loaded into the kernel. The compilation process can be controlled by passing compiler flag arguments in the cflags parameter to the BPF class constructor. These flags are passed directly to the Clang compiler, so any options that you might normally pass to the compiler can be used; all compiler warnings can be turned on with "cflags=['-Wall']", for instance.
A popular use of cflags in the official BCC tools is to pass macro definitions. For example, the xdp_drop_count.py script statically allocates an array with enough space for every online CPU using Python's multiprocessing library and Clang's -D flag:
cflags=["-DNUM_CPUS=%d" % multiprocessing.cpu_count()])
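Putting those pieces together, here is a hedged sketch of how such flags might be passed; the program is just a stand-in, and BPF_ARRAY is one of the BCC wrapper macros mentioned above:

    import multiprocessing
    from bcc import BPF

    # A stand-in program that consumes the NUM_CPUS macro defined below.
    program = """
    BPF_ARRAY(counts, u64, NUM_CPUS);
    TRACEPOINT_PROBE(kmem, kmalloc) { return 0; }
    """

    # -Wall enables all Clang warnings; -D passes a macro definition down
    # to the BPF program, here sized from the number of online CPUs.
    b = BPF(text=program,
            cflags=["-Wall", "-DNUM_CPUS=%d" % multiprocessing.cpu_count()])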
The BPF class constructor also accepts a number of debugging flags in the debug argument. Each of these flags individually enables extra logging during either the compilation or the loading process. For example, the DEBUG_BPF flag causes the BPF bytecode to be output, which can be a last hope for those really troublesome bugs. This output looks like:
    0: (79) r1 = *(u64 *)(r1 +8)
    1: (7b) *(u64 *)(r10 -8) = r1
    2: (b7) r1 = 1
    3: (7b) *(u64 *)(r10 -16) = r1
    4: (18) r1 = 0xffff8801a6098a00
    6: (bf) r2 = r10
    7: (07) r2 += -8
    8: (85) call bpf_map_lookup_elem#1
    9: (15) if r0 == 0x0 goto pc+3
     R0=map_value(id=0,off=0,ks=8,vs=8,imm=0) R10=fp0
    10: (79) r1 = *(u64 *)(r0 +0)
     R0=map_value(id=0,off=0,ks=8,vs=8,imm=0) R10=fp0
    11: (07) r1 += 1
    12: (7b) *(u64 *)(r10 -16) = r1
    13: (18) r1 = 0xffff8801a6098a00
    15: (bf) r2 = r10
    16: (07) r2 += -8
    17: (bf) r3 = r10
    18: (07) r3 += -16
    19: (b7) r4 = 0
    20: (85) call bpf_map_update_elem#2
    21: (b7) r0 = 0
    22: (95) exit
    from 9 to 13: safe
    processed 22 insns, stack depth 16
This output comes directly from the in-kernel verifier and shows every instruction of bytecode emitted by Clang/LLVM, along with the register state on branch instructions. If this level of detail still isn't enough, the DEBUG_BPF_REGISTER_STATE flag generates even more verbose log messages.
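To obtain a dump like the one above for your own program, passing the corresponding flag at construction time should suffice; here is a minimal sketch, assuming the debug constants are importable from the bcc Python module as in current releases:

    from bcc import BPF, DEBUG_BPF

    program = """
    TRACEPOINT_PROBE(kmem, kmalloc) { return 0; }
    """

    # DEBUG_BPF dumps the generated BPF bytecode at load time; the flags
    # can be OR-ed together (e.g. with DEBUG_BPF_REGISTER_STATE) for more.
    b = BPF(text=program, debug=DEBUG_BPF)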
For run-time debugging, bpf_trace_printk() provides a printk()-style interface for writing to /sys/kernel/debug/tracing/trace_pipe from BPF programs; those messages can then be consumed and printed in Python using the BPF.trace_print() function.
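As a quick illustration (a throwaway sketch rather than part of the example above), the BPF side can emit a line per event and the Python side can simply echo the trace pipe:

    from bcc import BPF

    program = """
    TRACEPOINT_PROBE(kmem, kmalloc) {
        // Write a line to trace_pipe for every kmalloc() call.
        bpf_trace_printk("kmalloc from %lx\\n", args->call_site);
        return 0;
    }
    """

    b = BPF(text=program)
    # Read trace_pipe and print each message; blocks until interrupted.
    b.trace_print()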
However, a major drawback of this approach is that, since the trace_pipe file is a global resource, it contains all messages written by concurrent writers, making it difficult to filter messages from a single BPF program. The preferred method is to store messages in a BPF_PERF_OUTPUT map inside the BPF program, then process them with open_perf_buffer() and kprobe_poll(). An example of this scheme is provided in the open_perf_buffer() documentation.
Using BPF with applications
This article has focused exclusively on attaching programs to kernel tracepoints, but you can also attach programs to the user-space tracepoints included with many popular applications using User Statically-Defined Tracing (USDT) probes. In the next and final article of this series, I'll cover the origin of USDT probes, the BCC tools that use them, and how you can add them to your own application.
Habitica: a role-playing game for self improvement
What if real-life chores could gain you fake internet points like in an online role-playing game? That's the premise of Habitica, a productivity application disguised as a game. It's a self-improvement application where players can list their daily tasks or to-do items in the game; every time one is checked off, the game rewards the player with points or game items.
The game dresses up the task-checking mechanics with the standard trappings of the genre; there are character classes, weapons, armor, and level progression. These are mapped onto real-life tasks in novel ways; it is designed to make daily chores fun. The game is hosted on the Habitica server and can be played from either a web interface or a mobile app (iOS and Android). Both the mobile apps and the server software are available under the GPLv3.
Playing the game
Habitica draws from the tropes of fantasy role-playing games. The player creates a character to represent themselves in the game. The character's appearance, such as skin color, hair, and gender, can be customized. The player's character will belong to a class: warrior, mage, rogue, or healer. All characters start off as warriors, but a player can choose one of the other classes once they reach level ten. Playing the game involves listing "habits" that are real-life activities that the player wishes to do on a regular basis. Habits can be anything, such as exercise or flossing; the player needs to state the difficulty level of each habit when listing it.
Whenever a player performs one of the listed habits in real life, the corresponding item in the game can be clicked; that action will be rewarded with experience points and game currency. Experience points help with the character's level progression and currency can be used to purchase equipment. A player can also earn pets, which are unique little cosmetic additions to the character's avatar. The pets can be "fed" with "food" (which is an in-game prize); with sufficient food the pet will grow into a mount, which the avatar can ride.
For habits that the player wants to cultivate regularly, there is a special list called "Dailies", though items on the list can be less frequent than daily (certain days of the week, weekly, etc.). Players lose health points if they miss checking off an item from their Dailies list. Finally, there is a to-do list where one-off items can be listed and checked off for points. The character's class has some modifiers to the point-scoring system, but it often does not make much of a difference which class the player's character is.
Players can join with others to form a questing party and take part in group quests, where every task accomplished contributes points to the quest at hand. Missed habits incur a penalty to the entire party, so friends can team up and motivate each other to stay the course for their daily habits. Besides the party system, there is also another way players can work together in Habitica: a group chat feature called guilds. A guild is formed around a theme or shared common goal. For example, there are guilds for artists, students, and healthy-living enthusiasts. Guilds can set up challenges, which will add tasks to the habits list; completing them will earn in-game prizes.
It is a relatively simple web game; the graphics are two-dimensional and lack the sort of sophistication you'd find in modern video games. The characters are reminiscent of the sprites from 8-bit Nintendo games of the 1980s, albeit lacking animation. There is an old-school kind of charm to the game's look and feel.
Since the game can't enforce whether or not you do a particular challenge in real life, it is up to the player to accurately record their habits in the game. To get the most out of the self-improvement aspect of the game, players need to be honest with themselves. The social aspect of the game helps in this regard, as party members encourage each other to stick to their respective routines and not give up.
Development
Habitica grew out of a personal project by Tyler Renelle to help him track his daily habits. The first version was called HabitRPG, and it was just a Google Docs spreadsheet. As interest grew for the game, it was converted into an online app. Eventually, Renelle was joined by Siena Leslie and Vicky Hsu to create a company called HabitRPG Inc. to develop and support the game. A Kickstarter campaign was launched in 2013 where 2,817 backers pledged $41,191 to help fund development. The code for the game is available from the company's GitHub account.
Playing the game on HabitRPG's servers is free, but to finance its operation, the company has a paid tier that offers players special items not available on the free tier. Paid-tier members can accumulate another in-game currency called gems that can be used to acquire special items and quests.
The site is developed mainly in JavaScript, with Vue.js providing the front-end rendering. Node.js and MongoDB power the server-side back-end of the application. The site is developed as a community effort; users of the site collaborate with each other and HabitRPG to add features, artwork, and bug fixes. The development community uses a combination of tools to aid in development: Trello, Bountysource, and Habitica itself. It is interesting how the software can be used as a way to gamify its own development.
There is a guild for contributors where development can be discussed and specific tasks will win prizes in the form of in-game gems. Bountysource is also used for further incentives; users can donate money toward a feature request, which can be claimed by anyone who successfully implements the feature. Contributors organize their tasks on Trello. Once a contribution is accepted, there are a number of in-game badges of honor, special items, and gems as rewards.
Unlike regular applications, games require not just code but all manner of assets such as graphics, music, sound, and playable content. Thus, the Habitica development community isn't just coders, but also artists and gaming enthusiasts. The community has created categories for contributors: blacksmiths (coders), artisans (artists), bards (sound designers), linguists (translators), linguistic scribes (wiki translators), socialites (question and answer support), and storytellers (game content writers). Each group has a dedicated community around creating new content and features for the site.
The coding guild has over 5,400 members, and the artist guild has over 1,700. The other guilds have fewer than a thousand members each, but even the smallest has more than 300. Granted, not all of the members are active, but the number of people joining is a measure of how interested people are in contributing to the game. Contributors include not only professional developers and artists but also people who write code or draw art as a hobby. The simplicity of the game's mechanics and graphics makes it relatively easy for contributors to jump in. The messages exchanged in the guild chat rooms are supportive and constructive, which keeps the environment friendly and welcoming.
The site exports an application programming interface for third-party programs. A third-party app can talk to the server using a RESTful interface. The documentation is comprehensive and lists every available action in the game that can be invoked via the API. This allows for innovative third-party tools such as a bulk pet feeder or a tool to create Habitica tasks from GitHub.
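As a hypothetical example of what a third-party client looks like, the snippet below lists a user's tasks over the REST interface; the endpoint path, header names, and response fields are assumptions based on the v3 API documentation and should be checked against it:

    #!/usr/bin/env python
    # Illustrative only: list the current user's tasks through Habitica's
    # REST API.  Endpoint, headers, and fields are assumptions to verify
    # against the API docs; the credentials are placeholders.
    import requests

    USER_ID = "your-user-id"
    API_TOKEN = "your-api-token"

    resp = requests.get(
        "https://habitica.com/api/v3/tasks/user",
        headers={"x-api-user": USER_ID, "x-api-key": API_TOKEN})
    resp.raise_for_status()

    for task in resp.json()["data"]:
        print(task["type"], task["text"])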
Conclusion
Habitica is both a game and a productivity application and, as such, the development of the software gets the best of both the application and game development worlds. Most games need to be "finished" before they are interesting enough to play, and this is why most open-source games have not really taken off. However, the Habitica development model lets players invest their time in the development of the game as part of the game, and offers gamified incentives to contribute.
Unlike a conventional game, Habitica is also useful in itself as a tool. However, the development process relies on a number of different tools that lack a degree of integration between them, so developers may run into problems such as tasks on Trello being unable to update bounties on Bountysource. These issues will probably be addressed in the future, as there is an enthusiastic community around the game that is maintaining a rapid pace of development. Habitica is useful as a self-improvement tool, fun to play, and fun to engage with as a developer.
The true costs of hosting in the cloud
Should we host in the cloud or on our own servers? This question was at the center of Dmytro Dyachuk's talk, given during KubeCon + CloudNativeCon last November. While many services simply launch in the cloud without the organizations behind them considering other options, large content-hosting services have actually moved back to their own data centers: Dropbox migrated in 2016 and Instagram in 2014. Because such transitions can be expensive and risky, understanding the economics of hosting is a critical part of launching a new service. Actual hosting costs are often misunderstood, or secret, so it is sometimes difficult to get the numbers right. In this article, we'll use Dyachuk's talk to try to answer the "million dollar question": "buy or rent?"
Computing the cost of compute
So how much does hosting cost these days? To answer that apparently trivial question, Dyachuk presented a detailed analysis made from a spreadsheet that compares the costs of "colocation" (running your own hardware in somebody else's data center) versus those of hosting in the cloud. For the latter, Dyachuk chose Amazon Web Services (AWS) as a standard, reminding the audience that "63% of Kubernetes deployments actually run off AWS". Dyachuk focused only on the cloud and colocation services, discarding the option of building your own data center as too complex and expensive. The question is whether it still makes sense to operate your own servers when, as Dyachuk explained, "CPU and memory have become a utility", a transition that Kubernetes is also helping push forward.
Another assumption of his talk is that server uptime isn't that critical anymore; there used to be a time when system administrators would proudly brandish multi-year uptime counters as a proof of server stability. As an example, Dyachuk performed a quick survey in the room and the record was an uptime of 5 years. In response, Dyachuk asked: "how many security patches were missed because of that uptime?" The answer was, of course "all of them". Kubernetes helps with security upgrades, in that it provides a self-healing mechanism to automatically re-provision failed services or rotate nodes when rebooting. This changes hardware designs; instead of building custom, application-specific machines, system administrators now deploy large, general-purpose servers that use virtualization technologies to host arbitrary applications in high-density clusters.
When presenting his calculations, Dyachuk explained that "pricing is complicated" and, indeed, his spreadsheet includes hundreds of parameters. However, after reviewing his numbers, I can say that the list is impressively exhaustive, covering server memory, disk, and bandwidth, but also backups, storage, staffing, and networking infrastructure.
For servers, he picked a Supermicro chassis with 224 cores and 512GB of memory from the first result of a Google search. Once amortized over an aggressive three-year rotation plan, the $25,000 machine ends up costing about $8,300 yearly. To compare with Amazon, he picked the m4.10xlarge instance as a commonly used standard, which currently offers 40 cores, 160GB of RAM, and 4Gbps of dedicated storage bandwidth. At the time he did his estimates, the going rate for such a server was $2 per hour or $17,000 per year. So, at first, the physical server looks like a much better deal: half the price and close to quadruple the capacity. But, of course, we also need to factor in networking, power usage, space rental, and staff costs. And this is where things get complicated.
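Those headline numbers are easy to check; here is a quick back-of-the-envelope calculation using the prices quoted in the talk (which will certainly have changed since):

    # Back-of-the-envelope check of the raw server numbers quoted above;
    # prices are the ones from the talk and will have changed since.
    server_price = 25000.0          # Supermicro chassis, 224 cores, 512GB RAM
    amortization_years = 3          # aggressive rotation plan
    colo_server_per_year = server_price / amortization_years

    aws_hourly = 2.0                # m4.10xlarge on-demand rate at the time
    aws_per_year = aws_hourly * 24 * 365

    print("colo server: $%.0f/year" % colo_server_per_year)  # about $8,300
    print("m4.10xlarge: $%.0f/year" % aws_per_year)          # about $17,500
    print("ratio: %.1fx" % (aws_per_year / colo_server_per_year))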
First, colocation rates will vary a lot depending on location. While bandwidth costs are often much lower in large urban centers because of proximity to fast network links, real estate and power prices are often much higher. Bandwidth costs are now the main driver in hosting costs.
For the purpose of his calculation, Dyachuk picked a real-estate figure of $500 per standard cabinet (42U). His calculations yielded a monthly power cost of $4,200 for a full rack, at $0.50/kWh. Those rates seem rather high compared to my local data center, where a cabinet is closer to $350 and power is around $0.12/kWh. Dyachuk took into account that power is usually not "metered billing", where you pay for the actual power usage, but "stepped billing", where you pay for a circuit with a (say) 25-amp breaker regardless of how much power you use in that circuit. This accounts for some of the discrepancy, but the estimate still seems rather too high to be accurate according to my calculations.
Then there's networking: all those machines need to connect to each other and to an uplink. This means finding a bandwidth provider, which Dyachuk pinned at a reasonable average cost of $1/Mbps. But the bandwidth itself is not the most expensive part; the cost of managing network infrastructure includes not only installing switches and connecting them, but also tracing misplaced wires, dealing with denial-of-service attacks, and so on. Cabling, a seemingly innocuous task, actually accounts for the majority of hardware expenses in data centers, as previously reported. From networking, Dyachuk went on to detail the remaining cost estimates, including storage and backups, where the physical world is again cheaper than the cloud. All this is, of course, assuming that crafty system administrators can figure out how to glue all the hardware together into a meaningful package.
Which brings us to the sensitive question of staff costs; Dyachuk described those as "substantial". These costs cover the system and network administrators who are needed to buy, order, test, configure, and deploy everything. Evaluating those costs is subjective: for example, salaries will vary between countries. He fixed the yearly cost per person at $250,000 (an actual salary of $150,000 plus overhead) and accounted for three people on staff. Those costs may also vary with the colocation service; some will include remote hands and networking, but he assumed in his calculations that the costs would end up being roughly the same because providers will charge extra for those services.
Dyachuk also observed that staff costs are the majority of the expenses in a colocation environment: "hardware is cheaper, but requires a lot more people". In the cloud, it's the opposite; most of the costs consist of computation, storage, and bandwidth. Staff also introduce a human factor of instability into the equation: in a small team, there can be a lot of variability in ability levels, which means there is more uncertainty in colocation cost estimates.
In our discussions after the conference, Dyachuk pointed out a social aspect to consider: cloud providers are operating a virtual oligopoly, and he worries about the impact of Amazon's growing power over different markets.
Demand management
Once the extra costs described above are factored in, colocation still appears to be the cheaper option. But that doesn't take into account the question of capacity: a key feature of cloud providers is that they pool together large clusters of machines, which allows individual tenants to scale up their services quickly in response to demand spikes. Self-hosted servers need extra capacity to cover future demand. That means paying for hardware that sits idle waiting for usage spikes, while cloud providers are free to re-provision those resources elsewhere.
Satisfying demand in the cloud is easy: allocate new instances automatically and pay the bill at the end of the month. In a colocation facility, provisioning is much slower and hardware must be systematically over-provisioned. Those extra resources might be used for preemptible batch jobs in certain cases, but workloads are often "transaction-oriented" or "realtime", which requires extra resources to deal with spikes. So the spike-to-average ratio is an important metric to evaluate when making the decision between the cloud and colocation.
Cost reductions are possible by improving analytics to reduce over-provisioning. Kubernetes makes it easier to estimate demand; before containerized applications, estimates were made per application, each with its own margin of error. By pooling all of the applications in a cluster, the problem is generalized and individual workloads balance out in aggregate, even if they fluctuate individually. Dyachuk therefore recommends using the cloud when future growth cannot be forecast, to avoid the risk of under-provisioning. He also recommended "The Art of Capacity Planning" as a good forecasting resource; even though the book is old, the basic math hasn't changed, so it is still useful.
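To see why that ratio matters, here is a toy comparison that is not from Dyachuk's spreadsheet; the unit costs are invented, with colocation hardware priced at half the cloud rate as in the server comparison above, and the point is only to show how over-provisioning for peaks eats the hardware savings:

    # Toy model: with colocation you provision for the peak, in the cloud
    # you pay (roughly) for what you use.  All numbers are invented purely
    # to illustrate the spike-to-average effect.
    def monthly_cost(avg_load, spike_ratio, unit_cost_cloud, unit_cost_colo):
        cloud = avg_load * unit_cost_cloud              # pay for average usage
        colo = avg_load * spike_ratio * unit_cost_colo  # provision for the peak
        return cloud, colo

    for ratio in (1.5, 3, 6):
        cloud, colo = monthly_cost(avg_load=100, spike_ratio=ratio,
                                   unit_cost_cloud=170, unit_cost_colo=85)
        print("spike/avg %.1f: cloud $%d, colo $%d" % (ratio, cloud, colo))

With colocation hardware at half the cloud price, the made-up colocation option only stays ahead while the spike-to-average ratio remains below two.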
The golden ratio
Once extra capacity and staff costs are added in, colocation ends up costing more than the cloud for smaller deployments. In closing, Dyachuk identified the crossover point, where colocation becomes cheaper, at around $100,000 per month, or 150 Amazon m4.2xlarge instances, as shown in a graph during the talk. Note that he picked a different instance type for the actual calculations: instead of the largest instance (m4.10xlarge), he chose the more commonly used m4.2xlarge instance. Because Amazon pricing scales linearly, the math works out to about the same once reserved instances, storage, load balancing, and other costs are taken into account.
He also added that the figure will change based on the workload; Amazon is more attractive for workloads with more CPU and less I/O. Conversely, I/O-heavy deployments can be a problem on Amazon; disk and network bandwidth are much more expensive in the cloud. For example, bandwidth can sometimes cost more than triple what you can easily find in a data center.
Your mileage may vary; those numbers shouldn't be taken as absolute. They are a baseline that needs to be tweaked according to your situation, workload, and requirements. For some, Amazon will be cheaper; for others, colocation is still the best option.
He also emphasized that the graph stops at 500 instances; beyond that lies another "wall" of investment due to networking constraints. At around the equivalent of 2000-3000 Amazon instances, networking becomes a significant bottleneck and demands larger investments in networking equipment to upgrade internal bandwidth, which may make Amazon affordable again. It might also be that application design should shift to a multi-cluster setup, but that implies increases in staff costs.
Finally, we should note that some organizations simply cannot host in the cloud. In our discussions, Dyachuk specifically expressed concerns about Canada's government services moving to the cloud, for example: what is the impact on state sovereignty when confidential data about its citizens ends up in the hands of private contractors? So far, Canada's approach has been to move only "public data" to the cloud, but Dyachuk pointed out that this already includes sensitive departments like correctional services.
In Dyachuk's model, the cloud offers significant cost reduction over traditional hosting in small clusters, at least until a deployment reaches a certain size. However, different workloads significantly change that model and can make colocation attractive again: I/O and bandwidth intensive services with well-planned growth rates are clear colocation candidates. His model is just a start; any project manager would be wise to make their own calculations to confirm the cloud really delivers the cost savings it promises. Furthermore, while Dyachuk wisely avoided political discussions surrounding the impact of hosting in the cloud, data ownership and sovereignty remain important considerations that shouldn't be overlooked.
A YouTube video and the slides [PDF] from Dyachuk's talk are available online.
[We would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to attend KubeCon + CloudNativeCon.]
Page editor: Jonathan Corbet