
Leading items

Welcome to the LWN.net Weekly Edition for August 1, 2019

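This edition contains the following feature content:

  • Python and public APIs
  • Access to complex video devices with libcamera
  • Completing the pidfd API
  • Bounded loops in BPF for the 5.3 kernel
  • KernelShark releases version 1.0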

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.


Python and public APIs

By Jake Edge
July 31, 2019

In theory, the public API of a Python standard library module is fully specified as part of its documentation, but in practice it may not be quite so clear cut. There are other ways to specify the names in a module that are meant to be public, and there are naming conventions for things that should not be public (e.g. the name starts with an underscore), but there is no real consistency in how those are used throughout the standard library. A mid-July discussion on the python-dev mailing list considered the problem and some possible solutions; the main outcome seems to be interest in making the rules more explicit.

It should be noted that the Python language does not enforce any access restrictions at all; any program that can import a module can access any top-level name defined in it. All of the "rules" that govern access restrictions are simply conventions, though they are meant to delineate things that can be changed by a module maintainer without going through the usual deprecation cycle. A big part of the public API is effectively a list of names that the module maintainer promises not to change without a good deal of warning (at least two full development cycles).

Rules?

Serhiy Storchaka raised the issue by listing the rules that he thought governed the public/private question for names in modules. They revolve around the use of the __all__ attribute, which is a way to list names (or submodules) that should be imported when a "from module import *" is executed. If there is no __all__, Python will import any names that do not start with an underscore in such an import, so those names would be part of the public API, Storchaka suggested. In addition, any name that was explicitly documented to be part of the public API would be.
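
The effect of __all__ on a star import can be seen with a small experiment. The sketch below builds two throwaway modules in memory (the module names, the helper functions, and the spam/eggs/_hidden names are all made up for illustration) and reports which names "from module import *" actually brings in:

```python
import sys
import types


def make_module(name, source):
    """Create an in-memory module from source text and register it."""
    mod = types.ModuleType(name)
    exec(source, mod.__dict__)
    sys.modules[name] = mod
    return mod


BODY = "def spam(): pass\ndef eggs(): pass\ndef _hidden(): pass\n"

# One module declares __all__; the other relies on the underscore convention.
make_module("demo_with_all", "__all__ = ['spam']\n" + BODY)
make_module("demo_without_all", BODY)


def star_import(name):
    """Return the names that 'from <name> import *' puts in a fresh namespace."""
    namespace = {}
    exec(f"from {name} import *", namespace)
    return sorted(key for key in namespace if key != "__builtins__")


with_all = star_import("demo_with_all")
without_all = star_import("demo_without_all")
print(with_all)      # ['spam']
print(without_all)   # ['eggs', 'spam']
```

Neither form stops a caller from writing "import demo_with_all" and then reaching for demo_with_all.eggs or demo_with_all._hidden directly; the lists only describe intent.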

He noted that two bug reports with some recent comments seemed to violate his mental model of how the public API is specified. In the first, Raymond Hettinger asked that all non-public functions in the calendar module be renamed to start with an underscore. In the other, Gregory P. Smith suggested documenting the escape_decode() function in the codecs module because "it is public by virtue of its name". escape_decode() is recommended in answers at sites like Stack Overflow, which is part of what motivated the suggestion.

But in both cases, Storchaka said, the modules have __all__ attributes where the names in question are not listed, so they should not be considered part of the public API. Thus they don't need underscores or documentation, Stack Overflow notwithstanding. Hettinger argued that the calendar module was one that had adhered to the underscore convention along the way until "a recent patch went against that practice". It came to his attention via a tweet from a confused user.

As Storchaka pointed out, however, calendar already had quite a few non-public functions that did not start with an underscore back in Python 3.6. Part of the problem is that some people are using the dir() builtin to examine the names in a module. But dir(module) will give a list of all of the names, public or private, without regard for __all__ or the underscore-prefix convention. Storchaka said that dir() is not the proper tool and suggested the help() builtin instead (e.g. help(module)).
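
The gap between what dir() shows and what __all__ declares is easy to measure. This is a rough sketch that lists the names in calendar that lack a leading underscore yet are absent from __all__; the exact list depends on the Python version in use, so no particular output is assumed:

```python
import calendar

# Names the module declares as its star-import API.
exported = set(calendar.__all__)

# Names that dir() shows and that do not look private by the underscore rule.
visible = [name for name in dir(calendar) if not name.startswith("_")]

# The ambiguous middle ground: no underscore, but not listed in __all__.
not_exported = sorted(name for name in visible if name not in exported)

print(len(visible), "names without a leading underscore")
print(len(not_exported), "of those are missing from __all__:")
print(not_exported)
```

Imported helper modules typically land in that last list alongside undocumented functions, which is part of why dir() is a poor guide to the public API.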

The first line of Hettinger's mail should cover the question: "The RealDefinition™ is that whatever we include in the docs is public, otherwise not." His point about maintaining the conventions used by a module (though calendar is apparently not a good example) was a good one, Brett Cannon said. He thought that the core developers should encourage the leading-underscore practice for new modules, in fact.

But a suggestion from Kyle Stanley to do a mass rename of the standard library did not get far. There are logistical hurdles, in terms of the deprecation cycle, but there is also a question of whether it would solve a real problem or not. Steven D'Aprano pointed out that he had rarely seen people misuse the private parts of a module, "but frankly that's going to happen even if we named them all '_private_implementation_detail_dont_use_this_you_have_been_warned' *wink*". Meanwhile, though, there are a lot of costs to making the change, which D'Aprano described at some length.

He also mentioned a "rule" that governs all of this: "unless explicitly documented public, all imports are private even if not prefixed with an underscore". Stanley replied that he was rethinking his advocacy of a tree-wide change, but wondered where the rule was specified. Steve Dower said that the rule was probably not documented anywhere, but thought D'Aprano's formulation was a reasonable one. In addition, Stanley suggested another path toward cleaning up the inconsistencies in the standard library: the @public decorator.

atpublic

Early on in the thread, Barry Warsaw pointed to his @public decorator project, noting one of the "Zen of Python" principles: "Explicit is better than implicit." The module is available from PyPI (thus via pip) under the name "atpublic"; it provides a simple decorator that can be used to explicitly indicate the public names in a module:

    @public
    def foo():
        pass

    def bar():
        pass

    @public
    class Baz:
        pass

    public(QUX=42)

The function foo() and class Baz would be listed in the __all__ attribute, while bar() would not. Since constants cannot be decorated, public() can be used to both define the name and add it to __all__ as seen with QUX above. That way, __all__ will always reflect the current state of the intended public API.

Dower did not think @public was the way forward, in part because it should be harder, rather than easier, to change the names of the public API. He also mentioned the runtime overhead of @public, but Warsaw and others pointed out various ways to make that essentially disappear (even though it is pretty tiny even with the pure Python implementation of the module). Dower backed away from the performance impact question and, instead, looked at the downsides of a tree-wide change. He was also concerned with making changes before the actual policy was clearly articulated:

We already maintain separate documentation from the source code, and this is the canonical reference for what is public or not. Until we make a new policy for __all__ to be the canonical reference, touching every file to use it is premature (let alone adding a builtin).

So I apologise for mentioning that people care about import performance. Let's ignore them/that issue for now and worry instead about making sure people (including us!) know what the canonical reference for public/internal is.

Stanley and Warsaw were both in favor of making it clear what the rule is for delineating the public API, and Cannon said that PEP 8 ("Style Guide for Python Code") would be the right place to put it. Warsaw noted that he was not envisioning some tree-wide operation should @public be adopted; that decorator can and should only be added incrementally. He was also concerned that there have been inconsistencies between the code and the documentation in the past. "The question always becomes whether the source or the documentation is the source of truth. For any individual case, we don't always come down on the same side of that question."

For the most part, there was widespread agreement on the underlying rules for determining the public API. D'Aprano's formulation seems to be a nice, compact way to put it, but a more detailed statement might make it more clear. APIs are tricky beasts; in general, development projects do not spend enough time designing, reviewing, and testing them before they commit to them for the long term. If the rules governing what is even in the API are not clear, it makes things that much worse. Resolving that ambiguity for Python would be a nice step forward.


Access to complex video devices with libcamera

By Jonathan Corbet
July 25, 2019

Laurent Pinchart began his Open Source Summit Japan 2019 talk with a statement that, once upon a time, camera devices were simple pipelines that produced a sequence of video frames. Applications could control cameras using the Video4Linux (V4L) API by way of a single device node; there were "lots of knobs", but the overall task was straightforward. That situation has changed over the years, and application developers need more help; that is where the libcamera project comes in.

In truth, if your editor may interject a brief comment, even the basic V4L API is not entirely straightforward for the uninitiated. There is a negotiation process that must happen so that the application can determine whether a given camera can deliver the sort of data stream that is needed. The number of parameters to tweak is large. It was not uncommon to find applications that worked with some camera devices, but not others. V4L makes many things possible, but even the simplest tasks may not be easy.

libcamera

Now (returning to the talk), consider the situation with contemporary hardware, which is much more complex. A typical camera device has a collection of processing blocks (handling tasks like image scaling, color correction, color-space conversion, autofocus, etc.) that can be interconnected in a variety of ways. The media controller subsystem was introduced to expose this complexity to user space, but it only helps so much. A special-purpose application that, for example, understands a single video device found on a specific handset can be managed, but writing an application that can handle a wide variety of video hardware is challenging at best.

Developers at Nokia had, some years ago, envisioned a plugin-based mechanism for camera configuration but, before it could be implemented, Nokia canceled its smartphone project and development lapsed. Ten years later, libcamera was started with the intent of being "the Mesa of the camera stack"; its purpose is to make it easy for applications to interface with camera devices.

What is envisioned is a four-layer stack:

  • libcamera is the lowest-level layer, interfacing directly with the kernel. It is implemented entirely in user space, with no changes to kernel APIs planned.
  • A set of bindings will make libcamera available in a range of different programming languages.
  • The "adaptation layer" provides a set of interfaces to libcamera for existing applications; they will include a V4L compatibility layer, an Android HAL interface, and a GStreamer interface. The intent is to make libcamera suitable for all Linux-based devices.
  • The application layer exists already, in the form of GStreamer, native V4L applications, Android applications, etc. There will also be native libcamera applications in the future.

Application interface

The first thing a libcamera application has to do is to enumerate the available cameras. A "camera" in this context is what users might see as a camera device; much of the underlying complexity (sensor, DMA bridge, processing units, etc.) is hidden within each camera device. These cameras expose a set of capabilities, such as how many concurrent video streams they can support, what types of controls they have, and the resolutions they are capable of. "Profiles" exist as a way of pulling together the capabilities needed for given tasks; there can be profiles for applications like "point-and-shoot camera" or "video conferencing".

The handling of concurrent streams is a key feature of libcamera. For example, a point-and-shoot device might have one mid-resolution stream used to preview a scene on a handset's screen and a full-resolution stream for image capture.

Controls in libcamera can be set on a per-frame basis, hardware permitting. These controls can include exposure time, focus settings, white balance, etc. This is not a useful feature for applications like video conferencing, Pinchart said, but it's important for tasks like face recognition or machine vision, where the application needs to know the parameters associated with each frame.

A native libcamera application will, after enumeration of the available devices, reserve access to the device(s) needed. Access is exclusive in libcamera; if multiplexing is needed, a framework like GStreamer can provide it. There is a configuration stage where the camera creates a template configuration from a set of available roles; the application will then tweak the parameters as needed and validate the result to ensure that the camera can support it. There is, thus, still a negotiation process required, but the creation of an initial configuration should ease that process considerably. This configuration is done for every stream that the application needs.

Once that is done, the application will allocate a set of buffers for incoming video data. A "create request" operation will create a request to capture a single video frame with a given set of parameters and queue it to the camera. Most applications will queue multiple requests to keep the video pipeline flowing; buffers can be turned around and queued with new requests after their data is consumed.
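
The request-and-buffer cycle can be modeled in a few lines. What follows is a toy simulation, not the libcamera API (which is C++ and was still evolving at the time); ToyCamera, queue_request(), complete_next(), and the exposure_us control are all hypothetical names chosen for illustration. It shows a small buffer pool being recycled while each request carries its own per-frame control value:

```python
from collections import deque


class ToyCamera:
    """Hypothetical stand-in for a camera; this is NOT the libcamera API."""

    def __init__(self):
        self._pending = deque()
        self._frame_number = 0

    def queue_request(self, buffer, controls):
        """Queue one capture request: one buffer plus per-frame controls."""
        self._pending.append((buffer, controls))

    def complete_next(self):
        """Pretend the hardware filled the oldest queued buffer."""
        buffer, controls = self._pending.popleft()
        self._frame_number += 1
        buffer["frame"] = self._frame_number
        buffer["controls"] = controls
        return buffer


def capture(camera, n_buffers, n_frames, exposure_for_frame):
    """Capture n_frames while recycling a fixed pool of n_buffers."""
    buffers = [{"id": i} for i in range(n_buffers)]
    submitted = 0
    # Prime the pipeline with as many requests as there are buffers.
    for buf in buffers:
        if submitted < n_frames:
            camera.queue_request(buf, {"exposure_us": exposure_for_frame(submitted)})
            submitted += 1
    captured = []
    while len(captured) < n_frames:
        done = camera.complete_next()
        captured.append((done["frame"], done["id"], done["controls"]["exposure_us"]))
        # The data has been consumed, so the buffer goes back with a new request.
        if submitted < n_frames:
            camera.queue_request(done, {"exposure_us": exposure_for_frame(submitted)})
            submitted += 1
    return captured


frames = capture(ToyCamera(), n_buffers=2, n_frames=5,
                 exposure_for_frame=lambda n: 1000 * (n + 1))
for frame in frames:
    print(frame)   # (frame number, buffer id, exposure used for that frame)
```

Because the controls travel with each request, the application always knows which parameters produced which frame, which is the property that matters for the machine-vision cases mentioned above.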

Advanced algorithms

Naturally, there is full support for image-processing algorithms, ranging from automatic exposure, white-balance, and focus setting through to advanced noise reduction and more. There is a balancing act required here: these algorithms, it seems, are often provided by the manufacturer of the camera, and many of them are proprietary software. Libcamera will support them as separate, loadable modules; Pinchart said that we want all of this code to be open source, but that's not the case now and the first priority is to make it all work in a safe and reliable manner.

One important design decision here is that image-processing modules do not talk directly to the hardware; they go through the standard interfaces like everything else. A typical module will get statistics (or image data) from the hardware, compute the optimal image parameters, then use libcamera interfaces to configure the device accordingly. "There will be no secret ioctl() calls", he said. Modules will also be sandboxed to limit the damage they can do to the rest of the system.

The low-level camera device abstraction is designed with as much device-independent code as possible. There are a lot of independent low-level camera implementations now; for example, the Android and ChromeOS teams do not talk to each other and each create their own implementations, he said. The intent is to make it easy for vendors to add support for their devices directly to libcamera so that everybody can work from the same implementation.

The libcamera project is in a relatively early stage of development; no actual releases have been made yet. It worked well enough for Pinchart to do a quick video-conferencing demonstration with an obliging developer in Europe who stayed awake until the time came; the image quality showed that work remains to be done, but the basic pipeline works. If libcamera continues to progress and meets its goals, it seems likely to show up on systems in the not-too-distant future.

[Your editor thanks the Linux Foundation for supporting his travel to the event.]


Completing the pidfd API

By Jonathan Corbet
July 26, 2019
Over the last few kernel releases, the kernel has gained the concept of a "pidfd" — a file descriptor that represents a process. What started as a way of sending signals to processes without race conditions has evolved into a more complete process-management interface. Now one of the last pieces is being put into place: the ability to wait for processes using pidfds. But, naturally, that API has had to go through some revisions first.

A pidfd recap

Unix-like systems traditionally represent many objects as files, but processes have always been an exception. They are, instead, represented by process IDs (PIDs), which are small integers — limited to 32767 by default, though that limit can be raised on Linux systems. There are a few problems with this representation, but the biggest one is arguably that PIDs are reused; when a process exits, its PID can be assigned to a new, unrelated process, and this can happen quickly. That creates a race condition where code that operates on a process, most often by sending it a signal, might end up performing an action on the wrong process.

A pidfd is, instead, a file descriptor that refers to an existing process. Once the pidfd exists, it will only refer to that one process, so it can be used to send signals without worry that the wrong process might end up being the recipient. This feature is valuable enough that some process-management systems, most notably the one used by Android, are being rewritten to take advantage of it.

There are two ways to create a pidfd. The preferred method in most cases will be to supply the CLONE_PIDFD flag to the clone() system call (or perhaps clone3() in the future); upon successful process creation, a pidfd representing the child will be returned to the parent. It is also possible to create a pidfd for an existing process with pidfd_open(), which was merged for the 5.3 kernel.

A process holding a pidfd for another process can send a signal to that process using pidfd_send_signal():

    int pidfd_send_signal(int pidfd, int signal, siginfo_t *info, unsigned int flags);

The 5.3 kernel also adds the ability to pass a pidfd to poll(), which will provide a notification when the process represented by that pidfd exits.
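
As a rough illustration of how these pieces fit together, the sketch below uses the wrappers that Python 3.9 and later provide (os.pidfd_open() and signal.pidfd_send_signal()) to signal a child through a pidfd and then poll() for its exit. The function name and the sleeping child are invented for the example; it returns None on systems where pidfds are not available:

```python
import os
import select
import signal
import subprocess
import sys


def terminate_via_pidfd(timeout_ms=5000):
    """Signal a child through a pidfd, then poll() the pidfd for its exit.

    Returns the child's return code, or None if pidfds are unavailable.
    """
    if not (hasattr(os, "pidfd_open") and hasattr(signal, "pidfd_send_signal")):
        return None
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    try:
        pidfd = os.pidfd_open(child.pid)
        try:
            # The pidfd pins the target, so PID reuse cannot redirect the signal.
            signal.pidfd_send_signal(pidfd, signal.SIGTERM)
            poller = select.poll()
            poller.register(pidfd, select.POLLIN)
            poller.poll(timeout_ms)   # becomes readable once the process exits
        finally:
            os.close(pidfd)
        return child.wait()
    except OSError:
        # Kernel too old, or the calls are blocked in this environment.
        return None
    finally:
        if child.poll() is None:
            child.kill()
            child.wait()


result = terminate_via_pidfd()
print("pidfds unavailable" if result is None else f"child return code: {result}")
```

Note that poll() only reports that the process has exited; the exit status still has to be collected with one of the wait() calls.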

Waiting on a pidfd

While it is now possible to use poll() to learn when a process has exited, that is not a complete solution for process-management systems, which need to be able to wait for specific processes and reap the exit information once they are done. That requires some sort of variant on the wait() system call. To fill in that gap, Christian Brauner proposed the addition of yet another new system call:

    int pidfd_wait(int pidfd, int *stat_addr, siginfo_t *info,
                   struct rusage *rusage, int states, int flags);

This call would wait for the given pidfd; the states parameter can be used to specify which state transitions (WSTOPPED for when the process receives a stop signal, for example) to wait for. The flags field offers additional options, including WNOHANG for non-blocking operation; see the above-linked patch cover letter for the full list.

This call, Brauner said, is "one of the few missing pieces to make it possible to manage processes using only pidfds". It is destined to remain missing, though, at least in that form; Linus Torvalds made it clear that he didn't like it. He had no objection to the desired functionality, but questioned the need for a new system call; instead, he said, the waitid() system call should simply be extended with a new flag.

That is exactly what was done in a new patch series posted by Brauner; waitid() has gained a new P_PIDFD ID-type value that causes the given ID to be interpreted as a pidfd. This approach ended up being a rather smaller patch that does not need to add a new system call; there have been no responses to it as of this writing, but it would be unsurprising if this change were to be merged for 5.4.
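
For a sense of what the waitid() extension looks like from user space, here is a short sketch using Python's wrappers. It assumes a kernel in which P_PIDFD support is present and Python 3.9 or later, which exposes the constant as os.P_PIDFD; the function name and the exit code are made up for the example, and the function returns None when the feature is not available:

```python
import os


def reap_via_pidfd(exit_code=7):
    """Fork a child and collect its exit status with waitid(P_PIDFD, ...).

    Returns the child's exit status, or None if the feature is unavailable.
    """
    if not (hasattr(os, "pidfd_open") and hasattr(os, "P_PIDFD")):
        return None
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)          # child: exit immediately
    try:
        pidfd = os.pidfd_open(pid)
    except OSError:
        os.waitpid(pid, 0)           # fall back to a PID-based wait
        return None
    try:
        # The ID argument is interpreted as a pidfd rather than a PID.
        info = os.waitid(os.P_PIDFD, pidfd, os.WEXITED)
    except OSError:
        os.waitpid(pid, 0)           # kernel lacks P_PIDFD; reap the usual way
        return None
    finally:
        os.close(pidfd)
    return info.si_status


status = reap_via_pidfd(7)
print("P_PIDFD unavailable" if status is None else f"child exited with {status}")
```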

Beyond the ability to unambiguously specify which process should be waited for, this change will eventually enable another interesting feature: it will make it possible to wait for a process that is not a child — something that waitid() cannot do now. Since a pidfd is a file descriptor, it can be passed to another process via an SCM_RIGHTS datagram in the usual manner. The recipient of a pidfd will, once this functionality is completed, be able to use it in most of the ways that the parent can to operate on (or wait for) the associated process.

There was one other interesting piece in the original pidfd_wait() proposal: a new clone() flag (CLONE_WAIT_PID) that would cause the newly created process to be invisible to most wait() calls. Only a variant of wait() that specified that process in particular (by specifying its pidfd, for example) would be able to reap its exit information. There are a few use cases for this functionality; one that was listed is a library that needs to create a helper process that won't show up if the calling application calls wait(). This feature was not part of the second patch set, but is expected to show up in a separate posting in the near future.

There will almost certainly be other pidfd-oriented enhancements in the future; this feature is new and should not be considered to be complete. But the ability to wait on a pidfd might be seen as the end of the first round of development for the pidfd concept. It has been a relatively quiet set of changes, but the move to pidfds is a fundamental change in how processes are managed on Linux systems.


Bounded loops in BPF for the 5.3 kernel

July 31, 2019

This article was contributed by Marta Rybczyńska

BPF programs have gained significantly in capabilities over the last few years and can now perform many useful operations. That said, BPF developers have had to work around an annoying limitation until recently: they could not use loops. This restriction was recently lifted by a patch set from Alexei Starovoitov that was merged for Linux 5.3. In addition to adding support for loops, it also greatly decreases the load time of most BPF programs.

The problem

Before a BPF program runs, it needs to be checked to ensure that it cannot cause harm to the system. For example, if it does not complete in a bounded time, it could be used to carry out a denial-of-service attack on the system. The task of checking programs is performed by the BPF verifier. Until recently, the verifier could not handle loops, meaning that all programs with loops were rejected.

Since loops are one of the basic ingredients of a computer program, this limitation was often hit by developers, who worked around it by unrolling loops (either by hand or using a compiler pragma). It comes as no surprise that there have been a number of attempts to change the situation by allowing loops that provably have a limited (and reasonable) number of iterations. Those include a proof-of-concept proposal from Edward Cree. Another attempt by John Fastabend tried to solve the problem with loop analysis: identifying each loop's induction variable and verifying its use. Fastabend covered the theoretical background and the possible solutions in a presentation at the 2018 Linux Plumbers Conference. The kernel developers decided not to take either solution at that time.

In parallel, important work has been put into optimizing the BPF verifier, as shown in a set of Linux Storage, Filesystem and Memory Management Summit slides by Jakub Kicinski. One important outcome of this work was increasing the size limitation for BPF programs in the 5.2 kernel; instead of 4096 instructions, a program can execute up to one million. These optimizations allow a more direct — even brute-force — approach to the bounded-loop-checking problem: instead of adding special handling for loops, the verifier can simply simulate the iterations of a loop as a collection of states no different from any others.

The BPF verifier

When the BPF verifier analyzes a program, it creates a model in the form of a state machine. It checks each state for incorrect behavior; while it does so, it records the states it has already verified. Later on, when it reaches a state that is equivalent to one that was already verified, it can conclude there is nothing more to do on that path, which can thus be pruned. This pruning significantly reduces the amount of work that the verifier must do.
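
The value of pruning is easiest to see on a toy model. The sketch below is not the kernel's verifier; the four-instruction "language", the explore() function, and the diamonds() program generator are inventions for illustration. It walks every path through a chain of if/else diamonds whose two arms leave the registers in the same state, and counts how many states are visited with and without remembering those already checked:

```python
def explore(program, prune=True):
    """Walk all paths of a toy program; return the number of states visited.

    A state is (program counter, register contents). With prune=True, a state
    that was already explored is skipped the second time it is reached.
    """
    seen = set()
    visited = 0
    stack = [(0, ())]                       # start at pc 0 with no registers set
    while stack:
        pc, regs = stack.pop()
        if prune and (pc, regs) in seen:
            continue                        # equivalent state already verified
        seen.add((pc, regs))
        visited += 1
        op = program[pc]
        if op[0] == "exit":
            continue
        if op[0] == "mov":                  # ("mov", register, constant)
            updated = dict(regs)
            updated[op[1]] = op[2]
            stack.append((pc + 1, tuple(sorted(updated.items()))))
        elif op[0] == "jmp":                # ("jmp", target)
            stack.append((op[1], regs))
        elif op[0] == "branch":             # ("branch", target): outcome unknown
            stack.append((pc + 1, regs))
            stack.append((op[1], regs))
    return visited


def diamonds(n):
    """Build n consecutive if/else diamonds whose arms end in the same state."""
    program = []
    for i in range(n):
        base = 4 * i
        program += [("branch", base + 3), ("mov", "r1", 0),
                    ("jmp", base + 4), ("mov", "r1", 0)]
    program.append(("exit",))
    return program


program = diamonds(10)
pruned = explore(program, prune=True)
unpruned = explore(program, prune=False)
print(pruned, unpruned)   # 41 5116
```

Each diamond doubles the number of paths, so the unpruned walk grows exponentially while the pruned one stays linear in the program length. The real verifier only compares states at selected pruning points and uses a looser notion of equivalence than exact equality.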

State-based pruning is essential for verifier performance, but it imposes its own cost in the form of comparing states and copying the safe ones. Recent analysis of a set of networking-related BPF programs showed that 80% of the saved states will never be matched and, thus, will never prune a future search. As a result, there may be performance gains to be had by reducing the number of states and pruning points maintained by the verifier.

The old optimizer worked by placing pruning points both before and after each jump instruction, resulting in pruning points being added approximately every four instructions. It turns out that simply placing a pruning point every ten instructions, regardless of the instruction type, improves performance considerably. The verifier was also modified to aggressively drop saved states that do not actually prune paths. These changes improve verifier performance by up to 20% and increase the length of a program that can be verified in a reasonable time by one-third.

Adding bounded loops

Starovoitov's patch set introducing bounded loops builds on that earlier work. It also includes a number of other improvements. This is partly the result of a big performance regression introduced by the first patch in the series that is a necessary building block for the final solution.

That regression takes the form of a performance degradation when adding variable tracking on the stack. The compiler often puts variables on the stack (or "spills" them) when it needs to free up some registers. The verifier should be able to track such variables, since it may need to make decisions based on their contents. Until now it did not do so, resulting in certain programs being incorrectly rejected. It turns out that loop induction variables are often spilled, so fixing the tracking of their contents was necessary to be able to verify loop termination.

On the other hand, tracking specific variable values (as opposed to ranges of possible values) creates more states, decreasing the effectiveness of state pruning; tracking them on the stack makes the problem even worse. The number of states increases, so the verification time also increases. When debugging this issue, Starovoitov found another effect; the performance penalty is aggravated by changes in the Clang compiler. It turns out that newer Clang versions spill fewer variables onto the stack, reducing state pruning even in the absence of complete value tracking. The two problems together caused an important degradation of verifier performance, with the exact results available in the commit message.

Another feature needed by the bounded-loop support is extending the way the verifier handles conditional branches. If a comparison takes place between two constants, the verifier can easily determine which branch will be taken. Comparisons involving variable values are clearly harder, which is why, until now, the verifier supported only comparisons of a register with a constant. In this patch set, Starovoitov added support for tests comparing two registers as well. This enhancement was necessary to be able to simulate the execution of loop tests.

The third and final piece consists of adding a parent-child relationship between the states. When the verifier needs to explore two branches in the program, it considers them both as child states of the parent state. It also counts the number of branches to explore so that it knows how many are left. With those features, and more aggressive heuristics in the exploration state pruning, the verifier can support bounded loops. It does so simply by simulating all possible iterations of their execution.
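
The "simulate every iteration" idea can be sketched with a toy checker. This is a deliberately simplified model, not the kernel code: it tracks only concrete register values along a single path, the instruction names are invented, and the budget parameter stands in for the verifier's limit on processed instructions. A loop is accepted if simulation reaches the exit before the budget runs out:

```python
def verify(program, budget=10_000):
    """Toy check: simulate the program and accept it only if it terminates.

    Instructions (all hypothetical):
      ("mov", reg, constant)        reg = constant
      ("add", reg, constant)        reg += constant
      ("jlt", reg_a, reg_b, target) jump to target if reg_a < reg_b
      ("exit",)
    """
    regs = {}
    pc = 0
    processed = 0
    while processed < budget:
        if not 0 <= pc < len(program):
            return False                      # ran off the end of the program
        op, *args = program[pc]
        processed += 1
        if op == "exit":
            return True
        if op == "mov":
            regs[args[0]] = args[1]
            pc += 1
        elif op == "add":
            if args[0] not in regs:
                return False                  # read of an uninitialized register
            regs[args[0]] += args[1]
            pc += 1
        elif op == "jlt":
            reg_a, reg_b, target = args
            if reg_a not in regs or reg_b not in regs:
                return False
            # Both values are known, so the branch outcome is known too.
            pc = target if regs[reg_a] < regs[reg_b] else pc + 1
        else:
            return False                      # unknown instruction
    return False                              # budget exhausted: reject


bounded = [("mov", "r0", 0), ("mov", "r1", 10),
           ("add", "r0", 1), ("jlt", "r0", "r1", 2), ("exit",)]
unbounded = [("mov", "r0", 0), ("mov", "r1", 10),
             ("add", "r0", 0), ("jlt", "r0", "r1", 2), ("exit",)]

bounded_ok = verify(bounded)
unbounded_ok = verify(unbounded)
print(bounded_ok, unbounded_ok)   # True False
```

The second program never makes progress on its induction variable, so it is rejected once the budget is spent rather than by any special loop analysis. The "jlt" test compares two registers, the kind of comparison that the patch set taught the verifier to evaluate.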

With bounded-loop support added, only one item remained: the regression introduced at the beginning of the series. The solution comes in the last patch of the series. Based on the improvements added before (especially the parentage tracking), Starovoitov added tracking of precise scalar values. Those values are stored in registers and will be modified during program execution. The verifier needs to have precise values to analyze branches correctly, but it need not track the value of every register precisely; only those that control branching require that precision. So the verifier does not incur the cost of tracking all registers precisely; instead, when the need arises, it backtracks through the use of a register in the code to generate a precise value if possible.

Summary

The BPF verifier has undergone a number of changes in this patch set. The resulting code not only adds support for bounded loops, but also a number of important optimizations. Writing BPF programs for the kernel should be rather easier in the 5.3 release, though undoubtedly BPF developers will still have ample opportunity to complain about the remaining hoops they have to jump through to convince the verifier that their programs are safe.


KernelShark releases version 1.0

By Jake Edge
July 31, 2019

It has been the better part of a decade since the last KernelShark article appeared here; in the interim, the kernel-tracing visualization tool has undergone some major changes. While the high-level appearance is largely similar, the underlying code has switched from GTK+ 2.0 to Qt 5. On July 26, maintainer Steven Rostedt announced the release of KernelShark version 1.0, which makes it a good time to take another peek.

KernelShark is a graphical interface to help track down information in the voluminous kernel traces that trace-cmd can produce. trace-cmd is a front end for the ftrace kernel tracer. Rostedt wrote about trace-cmd and ftrace (part 1 and part 2) for LWN nearly a decade ago as well. Ftrace can collect an enormous amount of information from within a running kernel; trace-cmd simply makes it much easier for users to configure and manage those traces. KernelShark adds yet another level of capabilities.

[KernelShark]

As can be seen in the screen shot from a lightly loaded system, KernelShark maintains its overall look, with two main panes to display the data it found in trace.dat (by default); trace.dat is the name of the default trace-cmd output file. The top gives a graphical view of the trace events that were gathered, organized by CPU, at least by default. Each horizontal bar indicates the activity on a particular CPU; that activity is color-coded based on the task running. Full-height indicators on the bar denote an event captured, while half-height indicators just show that the task is running; a bare line shows idle time.

The graphical display can show task bars instead of, or in addition to, the CPU bars. The task bars show the events only for tasks of interest, which are chosen from the "Tasks" entry in the "Plots" menu. The display shows a segment of the time covered by the trace; initially, it shows the entire duration, with start and end times at either end. But one can zoom in on the trace, reducing the amount of time shown and increasing the granularity of the view (the screen shot below is zoomed in from the one above). That will allow users to find and focus on places of interest in the trace.

Zooming can be accomplished a number of different ways. There are buttons directly above the graphical pane; "+" and "-" increase and decrease the zoom factor, while "++" and "--" zoom all the way in or out. The mouse scroll wheel can also be used to zoom in and out. One note: if there are more horizontal bars than the graphical pane can show, so that there is a scroll bar for the pane, it can be a little unexpected to get the zoom function rather than the expected scroll behavior. Scrolling requires using the scroll bar (or increasing the size of the pane or KernelShark window to eliminate the need). Yet another mechanism to zoom in is to click and drag within the graphical view, which outlines a rectangle that will define the region of interest; when the mouse is released it zooms in to show that region.

The second main pane is the list pane, which shows the events that were recorded in the trace. For each event, there is an index number for the event's position in the trace, the CPU on which it was recorded, a timestamp, the name of the task and its process ID, some flags (interrupts disabled, need reschedule, etc.), the event name, and the output from the event. Above the list pane is a search interface that can be used to search any of the columns, as well as a "Graph follows" checkbox, which governs the action in the graphical pane when a particular list entry is selected.
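To make the layout of the list pane concrete, here is a minimal Python sketch of one row and a column search. It is not KernelShark's code or data format; the field names, the sample rows, and the substring-matching rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TraceEvent:
    """One row of the list pane (field names are illustrative only)."""
    index: int        # position of the event in the trace
    cpu: int          # CPU on which the event was recorded
    timestamp: float  # seconds
    task: str         # task name
    pid: int          # process ID
    flags: str        # e.g. interrupts disabled, need reschedule
    event: str        # event name
    info: str         # output from the event


def search(events, column, text):
    """Return the rows whose chosen column contains the given text.

    Substring matching is an assumption; the article does not say
    how KernelShark's search compares values.
    """
    return [e for e in events if text in str(getattr(e, column))]


# Made-up sample rows, not taken from a real trace
events = [
    TraceEvent(0, 0, 100.0000010, "bash", 1200, "d", "sched_switch", "bash ==> kworker"),
    TraceEvent(1, 1, 100.0000025, "kworker", 37, "", "sched_wakeup", "comm=bash pid=1200"),
    TraceEvent(2, 0, 100.0000040, "kworker", 37, "", "sched_switch", "kworker ==> bash"),
]

switches = search(events, "event", "sched_switch")
on_cpu1 = search(events, "cpu", "1")
```

Because every column is just a field on the row, the same helper can search any of them, which mirrors the "search any of the columns" behavior described above.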

[KernelShark zoomed in]

If "Graph follows" is checked (which is the default), then a marker (vertical line) is placed in the graphical view where an event that was selected in the list view occurs. Users can see what else is going on around that time at various timescales by zooming in and panning (using the arrow keys). There are two markers available (A and B); they can be set from either the list view or the graphical view. Double-clicking in the graphical view will set the marker (whichever has its button highlighted above the view), but it is set on a "nearby" event, so the double-click must be close to an event or the marker will not be placed anywhere. Placing the marker will highlight the event in question in the list view as well.

Each marker has a timestamp associated with it (i.e. the same as the timestamp on the event it refers to) that is listed with the buttons above the graphical view. If both markers are set, the difference between their timestamps is also displayed there to 100ns precision.
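As a worked example of that arithmetic, the sketch below computes the difference between two marker timestamps and formats it with seven decimal places, which corresponds to 100ns precision. It assumes the timestamps are integer nanosecond counts and that extra digits are truncated rather than rounded; neither detail is stated above, and `format_delta` is a hypothetical helper, not a KernelShark function.

```python
def format_delta(a_ns: int, b_ns: int) -> str:
    """Format (B - A) in seconds with 100 ns resolution.

    Assumes integer nanosecond timestamps; digits below 100 ns
    are dropped (truncation is an assumption, not KernelShark's
    documented behavior).
    """
    delta = b_ns - a_ns
    sign = "-" if delta < 0 else ""
    ticks = abs(delta) // 100                 # whole 100 ns units
    secs, frac = divmod(ticks, 10_000_000)    # 10^7 ticks per second
    return f"{sign}{secs}.{frac:07d}"


# Marker A at 1.000000000 s, marker B 250,350 ns later
delta_text = format_delta(1_000_000_000, 1_000_250_350)
```

Using integer arithmetic throughout avoids the rounding surprises that floating-point seconds can introduce at this resolution.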

There are multiple filtering capabilities available from the "Filter" menu. Tasks, events, and CPUs can be removed (or added back in) from the graphical view, list view, or both. The advanced filtering can do even more, allowing access to the filtering mechanisms provided by trace-cmd/ftrace. Filters can also be saved in order to reuse them later.

KernelShark saves its current state when it exits and restores the state of that session when it is restarted. Beyond that, sessions can be saved at any time and then, naturally, can be loaded when they are needed.

The KernelShark documentation is rather terse, but covers nearly everything needed, though the "Tools" menu has some entries that could use more explanation (e.g. plugins, the seemingly non-functional color-scheme slider, and the "Record" option). The short build instructions provided everything needed to build KernelShark on a Fedora system; there is dependency information for Ubuntu as well.

KernelShark currently lives in the trace-cmd repository, though that will change eventually. In the announcement, Rostedt noted that he will be transitioning the maintainership of KernelShark to Yordan Karadzhov; they will be sharing the maintainer duties for a while. As part of that transition, KernelShark will be moving to its own repository.

For those digging into kernel traces, KernelShark looks to be an excellent tool to help find whatever problem they are investigating. It would seem that the KernelShark team is not resting on its laurels, either, as Rostedt specifically mentions getting started on work toward KernelShark 2.0. No roadmap seems to be available for that, but one would guess that user suggestions (and, better still, code) would be welcome at this point.


Darling: macOS compatibility for Linux

July 30, 2019

This article was contributed by Sean Kerner

There is an increasingly active development effort, known as Darling, that is aiming to provide a translation layer for macOS software on Linux; it is inspired in part by Wine. While Darling isn't nearly as mature as Wine, contributors are continuing to build out capabilities that could make the project more useful to a wider group of users in the future.

The project released a progress report for the second quarter of 2019 on July 23 outlining recent contributions and the state of the project overall:

We are very excited to say that in Q2 2019 (April 1 to June 30) we saw more community involvement than ever before. Many pull requests were submitted that spanned from bug fixes for our low level assembly to higher level modules such as the AppKit framework.

According to the project's website, the name Darling is a combination of "Darwin" and "Linux". Darwin is an open-source effort that is at the foundation of macOS, providing a Unix-based layer that macOS is built upon.

Licensing

Darling is licensed under GPLv3 and, according to the project home page, it does not violate Apple's End User License Agreement (EULA) since it only uses the parts of Darwin that have been released as free software. Darwin, however, is licensed under the Apple Public Source License (APSL), which is a free-software license, but is not compatible with the GPL according to the FSF.

In an issue posted on BountySource on July 25, Richard Yao suggested that Darling opt for a different license, such as the LGPLv2.1, which in his view would be compatible with the APSL. Darling contributor Lubos Dolezel disagreed, saying that it is possible to distribute GPL code with differently licensed code:

If you take most Linux distributions, you'll find packages that have mutually somewhat incompatible licenses. That doesn't mean they cannot co-exist in the same RPM/DEB repository or that you cannot write a Bash script that uses executables from both worlds.

Darling is more than an application, it is kind of a macOS distribution.

It is not clear that this reasoning would extend to the kernel module, which is licensed under the GPL but, according to Yao, contains a fair amount of XNU (Darwin kernel) code that is under the APSL.

Beyond Darwin

There are also tools and libraries outside of Darwin that are part of Darling, including Cocotron, which is an open-source implementation of Cocoa — Apple's API layer for desktop applications.

In an interview, Darling contributor Andrew Hyatt explained that it's best to think of the project as being made up of many different components. Anything on https://opensource.apple.com/ that is useful is pulled down and included by the project. "This is typically command line tools but does include some system libraries/frameworks, such as Security and libsystem," Hyatt said. "Things like AppKit have source unrelated to Apple. We based our AppKit and Foundation on the source of Cocotron, which was outdated when we forked it but we have been slowly adding missing bits to it. I'd say the issues we face are from our implementations being incomplete more than they are divergent."

AppKit is a framework that includes libraries and objects used to build user-interface elements for an application. There are many other frameworks that are now being added into Darling, with a good number of them started in the past quarter by contributor James Urquhart. The report noted that Urquhart's pull requests gave the project more stubs for many frameworks.

Urquhart explained in an interview that stubs are basic implementations of API functions that are meant to allow programs which use the API to load. "In most cases they do absolutely nothing besides this, so they don't provide any guarantees that the program will run correctly," he said. "So the stubs are mere stepping stones to implementing the full required API."
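The idea of a stub can be sketched in a few lines. The Python below is not Darling's code and does not model any real Apple API; the function names are hypothetical. It only shows the pattern Urquhart describes: the symbol exists so that callers can load and call it, but it does nothing useful.

```python
import sys


def stub(name, result=None):
    """Build a placeholder for an API function that is not implemented yet."""
    def placeholder(*args, **kwargs):
        # Report the call so developers can see what still needs real code
        print(f"STUB: {name} called", file=sys.stderr)
        return result
    placeholder.__name__ = name
    return placeholder


# Hypothetical API names, for illustration only
create_context = stub("create_context")
destroy_context = stub("destroy_context", result=False)

# A caller can now resolve and invoke both names without failing at load
# time, but nothing guarantees that it behaves correctly afterward.
ctx = create_context(width=640, height=480)
ok = destroy_context(ctx)
```

Logging each call is a design choice in this sketch: it turns the list of stubs a program actually hits into a to-do list for the real implementation.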

Among the framework stubs that Urquhart contributed is AGL, for creating and managing OpenGL rendering contexts. There are also multiple stubs contributed by Urquhart for frameworks that are related to Carbon, which is a C-based API used for enabling backward compatibility for Mac OS 8 and 9 applications on Mac OS X. The Core Services framework works with Carbon to provide identity and other services, while ApplicationServices brings additional features to Carbon.

Urquhart explained that, in general, the focus for his contributions was to get legacy applications running on a basic level, without a graphical user interface (GUI). This required stubs to be implemented for the aforementioned frameworks in order for the executables to even load.

The Darling status update also mentions the challenge of nested frameworks within macOS:

Some frameworks on macOS appear to be just one framework but under the hood are made up of sub-frameworks which are usually reexported so that when being linked to it appears to be one big framework. Until now, our build system didn't support nested frameworks. In June, Andrew [Hyatt] added support for this and system frameworks such as Accelerate now have the same file structure as they do on macOS. This was accomplished using some CMake magic and carefully replicating the structure of some system frameworks.

What can you do with Darling?

Unlike Wine, which can run full Windows GUI applications, Darling cannot yet run full macOS GUI applications such as the Xcode IDE on Linux. Xcode is a collection of Apple's core development tools that are used for building both macOS and iOS applications. Urquhart said that he's not using Darling in any production environment; most of his tinkering at the moment is for proof-of-concept work.

While a full-GUI macOS application won't run on Linux via Darling, that doesn't mean that macOS applications won't work. Hyatt explained that, if what you want to run can be done from the command line, there is a decent chance it will work. "In the past year we have made some big steps towards getting xcodebuild working, which allows you to compile Xcode projects from the command line," Hyatt said. "I think if we get that fully working it will really grow the project because then it would be possible to build your iOS/macOS apps on a Linux server."

Hyatt highlighted a GitHub issue for the Darling project that illustrates an interesting use case. In that issue, Tom Medema asked if it was possible to run sketchtool, which is the command-line interface for the popular Sketch macOS app. Progress has been made, as can be seen in the bug report, to the point that it will start up and print its usage text.

Another use case for Darling in the future could be for 32-bit software. With the upcoming macOS Catalina release set for later this year, Apple is dropping support for 32-bit applications. Hyatt emphasized that Darling has no such plans and will continue to include 32-bit support. "Our position also allows us to potentially have better compatibility for older apps than macOS itself because we can bake in conditional logic into our APIs that behaves in a more compatible way for older programs," he said.
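What such conditional logic might look like can be sketched as follows. This is purely hypothetical: the function, the version cutoff, and the two behaviors are invented for illustration and do not describe any real Darling or macOS API. The point is only that an API entry point can check which system version a program was built for and keep the older behavior for older programs.

```python
# Hypothetical cutoff: programs built for anything older get legacy behavior
LEGACY_CUTOFF = (10, 9)


def sort_names(names, built_for):
    """Sort names, preserving the older ordering for older programs.

    built_for is a (major, minor) tuple describing the system version the
    calling program targeted; both orderings are made up for this example.
    """
    if built_for < LEGACY_CUTOFF:
        # Legacy path: plain case-sensitive ordering
        return sorted(names)
    # Current path: case-insensitive ordering
    return sorted(names, key=str.casefold)


names = ["b", "A", "C"]
legacy_order = sort_names(names, built_for=(10, 6))
modern_order = sort_names(names, built_for=(10, 14))
```

An older program that depends on the legacy ordering keeps working, while newer programs get the current behavior from the same call.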

Looking forward, Hyatt commented that he expects that people will merge patches to improve compatibility for applications they want to use Darling for. "It's tough to put a number on full GUI app support," he said. "If it's in years, I'm not sure how many, I guess it depends on if we can attract even more contributors with the skill and time to work on that."

Overall, it remains to be seen how successful Darling will become and if it will eventually emulate Wine's success by enabling a broad range of macOS applications to run on Linux. It took years of effort for Wine to become stable and it is a continuous effort to enable a growing list of applications; the same trajectory is likely for Darling. What is clear is that there is a dedicated group of developers who are interested in, and committed to, figuring out how to make the macOS application layer work in some capacity on Linux.


Page editor: Jonathan Corbet


Copyright © 2019, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds