Kernel development
Brief items
Kernel release status
The 4.10 merge window remains open; it can be expected to close on or before December 25.

Stable updates: 4.8.15 and 4.4.39 were released on December 15.
Quotes of the week
Given the nature of RCU, the only possible answer I can give to that question is "probably".
Kernel development news
4.10 Merge window part 2
Just before the 4.9 release, Linus Torvalds said that the 4.10 development cycle was "shaping up to be on the smaller side". As of this writing, 11,087 non-merge changesets have been pulled into the mainline for 4.10. So this release isn't shaping up to be all that small in the end. It will not match 4.9 (which saw 14,308 changesets during the merge window), but it will be comparable with most other recent development cycles.
Just over 4,000 of those changes have been pulled since last week's summary. Some of the more interesting user-visible changes merged since then are:
- "Fail fast" support has been added to
the MD RAID subsystem. If a
drive starts showing errors in a RAID 1 or RAID 10 array, it
will be immediately marked bad and avoided, thus avoiding long delays
while error recovery is attempted.
- The XFS barrier and nobarrier mount options don't
actually do anything in current kernels; they have been deprecated
with removal targeted for 4.15 or later.
- The UBIFS filesystem now supports encryption.
- Tracepoint filters have long supported simple wildcard matching; in
4.10, that has been expanded to something closer to the file-name
matching supported by shells.
- There is a new crypto device in the virtio subsystem, allowing the
provision of cryptographic operations to virtualized guests.
- The PowerPC architecture has gained support for the kexec_file_load() system call and for compilation with stack protection (a brief sketch of invoking this system call from user space appears after the hardware list below).
- The logfs filesystem, unmaintained for
years and seemingly unused, has been removed from the kernel.
- The kernel's integrity measurement
subsystem has been extended with the ability to carry the measurement
list across a soft reboot of the system. The measurement depends on
TPM PCR registers, and those are only reset in a hard boot, so
validation after a soft reboot requires this capability.
- New hardware support includes:
- Audio:
Cirrus Logic CS35L34 and CS42L42 codecs,
Cirrus Logic wm8581 codecs,
Qualcomm MSM8916 WCD analog codecs,
Qualcomm MSM8916 WCD digital codecs,
Axentia TSE-850 transmitter station equipment,
Realtek rt5665 codecs, and
Allwinner sun8i codec analog controls.
- Input:
ASUS I2C touchpads and
THQ PS3 uDraw tablets.
- Miscellaneous:
HiSilicon BVT pulse-width modulators,
VMware paravirtualized RDMA adapters,
Mellanox CPLD-based I2C multiplexers,
IMX low-power I2C interfaces,
ARC 32-bit timers,
NVIDIA Tegra boot and power management processors,
NVIDIA Tegra generic memory interfaces,
Texas Instruments da8xx DDR2/mDDR memory controllers,
TI system control interfaces,
Renesas fine display processors,
Oxford Semiconductor OXNAS NAND controllers,
Tango NAND flash controllers, and
EPSON TOYOCOM RTC-7301SF/DG realtime clocks.
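As promised above, here is a brief, hypothetical sketch of how kexec_file_load() can be invoked from user space; glibc provides no wrapper, so syscall() is used, the file paths are placeholders, and CAP_SYS_BOOT is required:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
        /* Placeholder paths; any kernel and initramfs images will do. */
        int kernel_fd = open("/boot/vmlinuz", O_RDONLY);
        int initrd_fd = open("/boot/initrd.img", O_RDONLY);
        const char *cmdline = "root=/dev/sda1 ro";

        if (kernel_fd < 0 || initrd_fd < 0) {
            perror("open");
            return 1;
        }

        /* cmdline_len must include the terminating NUL byte. */
        if (syscall(SYS_kexec_file_load, kernel_fd, initrd_fd,
                    strlen(cmdline) + 1, cmdline, 0UL) < 0) {
            perror("kexec_file_load");
            return 1;
        }

        /* The new kernel is now staged; a subsequent
         * reboot(LINUX_REBOOT_CMD_KEXEC) would boot into it. */
        return 0;
    }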
Changes visible to kernel developers include:
- The crypto layer includes a new "acomp" subsystem to perform asynchronous compression of data; several algorithms are supported. For the time being, the only documentation available is the kerneldoc comments in this commit.
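As a rough illustration of how the new interface might be used (a minimal sketch based on the acomp API; the compress_buf() and acomp_cb() helpers here are our own invention, not kernel code), an in-kernel user could compress a buffer like this:

    #include <crypto/acompress.h>
    #include <linux/completion.h>
    #include <linux/err.h>
    #include <linux/scatterlist.h>

    struct acomp_done {
        struct completion done;
        int err;
    };

    static void acomp_cb(struct crypto_async_request *areq, int err)
    {
        struct acomp_done *d = areq->data;

        d->err = err;
        complete(&d->done);
    }

    /*
     * Compress slen bytes at src into dst; *dlen holds the size of the
     * destination buffer on entry and the compressed size on success.
     * Both buffers are assumed to be kmalloc()ed (linearly mapped).
     */
    static int compress_buf(const void *src, unsigned int slen,
                            void *dst, unsigned int *dlen)
    {
        struct crypto_acomp *tfm;
        struct acomp_req *req;
        struct scatterlist sg_src, sg_dst;
        struct acomp_done wait;
        int ret;

        tfm = crypto_alloc_acomp("lzo", 0, 0);
        if (IS_ERR(tfm))
            return PTR_ERR(tfm);

        req = acomp_request_alloc(tfm);
        if (!req) {
            crypto_free_acomp(tfm);
            return -ENOMEM;
        }

        sg_init_one(&sg_src, (void *)src, slen);
        sg_init_one(&sg_dst, dst, *dlen);
        init_completion(&wait.done);

        acomp_request_set_params(req, &sg_src, &sg_dst, slen, *dlen);
        acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
                                   acomp_cb, &wait);

        /* The operation may complete asynchronously; wait if so. */
        ret = crypto_acomp_compress(req);
        if (ret == -EINPROGRESS || ret == -EBUSY) {
            wait_for_completion(&wait.done);
            ret = wait.err;
        }
        if (!ret)
            *dlen = req->dlen;

        acomp_request_free(req);
        crypto_free_acomp(tfm);
        return ret;
    }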
At this point, the flow of changes into the mainline has slowed significantly; most maintainers appear to have respected Linus's request to get their pull requests in early. The 4.10-rc1 release can be expected on or before December 25; chances are that there will be few additional significant changes during that time. Should any happen, though, they will be summarized in the January 5 edition.
The LPC Android microconference, part 2
The Linux Plumbers Android microconference was held in Santa Fe on November 3rd; this is the second in a series of two articles covering the discussions held at that event. Part 1 looked at the staging tree, background updates, memory management, graphics, and more. Read on for a summary of the other discussions held that day.
OP-TEE and Trusty
Jens Wiklander from Linaro shared an overview of secure-world implementations, OP-TEE and Trusty in particular (slides [PDF]). Jens works on OP-TEE, which provides an open-source trusted execution environment (TEE) that has been a valuable reference implementation for research and education; most other TEE implementations are proprietary. He noted that Google's Trusty TEE is also open source, and said that he would try to compare the two where he could in his presentation.
He provided an outline of how TEEs and the secure-world applications interact with the kernel and user space, applicable to both OP-TEE and Trusty. OP-TEE's implementation, in particular, tries to implement a generic TEE API that user-space applications and the TEE supplicant can interact with; it forwards commands to the secure world via secure monitor calls (SMCs). Jens noted that Trusty's approach is similar, but it uses a virtio driver to communicate with the secure world.
One interesting aspect of the secure world is that there isn't necessarily any persistent storage that it can use, so OP-TEE implements secure storage by having trusted applications in the secure world encrypt and sign data that they want to persist. It passes that data out through the kernel to the TEE supplicant, which stores the encrypted data in a database in the filesystem.
Another aspect of the TEEs that he covered is how they handle scheduling. Entry into the secure world is normally done via an SMC or via an FIQ interrupt but, for OP-TEE, the work is done only via SMC calls, so all scheduling in the secure world happens as part of a command or request from the kernel in the normal world. Trusty, instead, includes an integrated scheduler that is triggered by the FIQ. Communication between the TEE and the normal world is done via shared memory, implemented as a reserved contiguous memory region. That region must be set up with the same cache settings in both the secure and non-secure worlds.
OP-TEE currently supports a number of arm and arm64 devices. He also covered the xtest test suite that OP-TEE uses for validation, which can be extended to support the GlobalPlatform TEE compliance test suite.
Rom Lemarchand asked how much of the GlobalPlatform API is supported with OP-TEE. The answer was that both the internal and client APIs are supported. There was also a question as to which crypto IP blocks are supported; currently none are. When asked if secure memory is supported on any of the supported devices, Jens's answer was all except HiKey. Rom was asked if he could further contrast Trusty and OP-TEE to call out anything that wasn't covered. He clarified that Trusty doesn't support GlobalPlatform at all, and that Trusty needs crypto block and secure memory functionality to work. Jens noted that, while OP-TEE is scheduled by the Linux side, Trusty can take some cores away for exclusively secure execution. At that point the discussion died down and the session moved on.
Multiple devices in AOSP
Rob Herring from Linaro discussed efforts to improve support for multiple devices in the Android Open Source Project (AOSP) (slides [PDF]). His larger goal is to consolidate the hardware abstraction layers (HALs) that Android uses, allowing vendors to develop kernel support for a device once and have it work on Android and ChromeOS, as well as traditional Linux distributions. Vendors could then avoid the effort of developing custom HALs when those HALs don't bring new value. This would make upgrading to newer Android releases easier, allow devices with mainline kernel support to "just work", and allow an upstream community for Android devices to grow.
Currently, if one wants to bring up Android on a new device, the seemingly "standard" approach is to copy over the device directory for a system that seems similar and rename all the files and variables with the new product's name. This trivial renaming of code has the unfortunate side effect of making diffs between implementations impossible. Then one might scan through other devices to find any missing functionality that is needed, and copy-and-paste it in before building and testing. This is a pretty poor state of affairs. Determining what's actually different between two builds for different devices is quite complicated; Rob recalled a time when he tweaked the build system to generate multiple megabytes of configuration logs just so he could diff two builds. Another problem facing the Android ecosystem is how to quickly upgrade devices to new Android releases, which now come out monthly. This problem compounds if you're trying to support more than one device.
The solution he's working on is trying to allow multiple devices to be supported from a single build target, ideally moving to supporting devices of the same architecture with a single filesystem image. He has implemented this using Kconfig, which is used to generate BoardConfig.mk and device.mk files. This provides a nice UI to allow configurations to be more discoverable, and allows higher-level features that otherwise require multiple configuration settings to be wrapped under a single element.
His build target currently supports DB410c, HiKey, Nexus 7, Raspberry Pi 3, and QEMU (for x86_64, arm, and arm64). He gave a brief demo of the Kconfig options and outlined what he sees as the next steps: trying to push this to AOSP, adding any missing Kconfig options as device support grows, supporting custom compilers or compiler flags, integrating a kernel build, and supporting other features like malloc() or filesystem selections. He then provided some input for folks who want to contribute: there should not be device-specific configurations but, instead, more fine-grained configuration options as needed, along with device-specific default configuration ("defconfig") files to provide an easy way for folks to get a build for a specific device.
He also outlined some drawbacks with Kconfig. It doesn't integrate nicely with the top-level Android build, so some stale files can be left in the build tree when a configuration changes. The defconfig files also have to be kept current or they can grow stale as new configurations are added. The Google developers in the crowd thought this effort was interesting, but they suggested that Rob talk with the developer who is reworking the Android build system to get his feedback.
HiKey in AOSP
John Stultz provided an update on support for HiKey boards in AOSP (slides [PDF]). HiKey is an arm64 board that was included as a target in AOSP back in March. Since then, work has continued and many features have been added, including moving to the 4.4 kernel and updating to Android 7.0 (Nougat). Recent work has been done on integrating the energy-aware scheduler, a collaboration between folks at Google, ARM, and Linaro. Work has also been done on integrating OP-TEE and Trusty so that they can coexist as build-time options. Since most Android devices don't have to deal with dynamic peripherals on buses that cannot be probed, work has been done on creating a device-tree overlay manager so that different device-tree fragments can be selectively applied, depending on boot arguments. Since downloading and building Android can take multiple hours (or days), pre-built factory images have been created so that developers can more easily get started using HiKey boards with the latest Android release.
There has been a lot of work related to HiKey in the generic Android common.git kernel tree: adding new features, cleaning up unused ones, and prepping for an android-4.9 branch. John pointed out Amit Pundir's talk [YouTube] on the status of the Android common tree, as Amit couldn't present it in person. He also mentioned that HiKey was a target for Rob Herring's generic-build effort discussed earlier.
Since HiKey will be supported for another year and a half in AOSP, there is still a fair amount of planned work on things like moving forward to the android-4.9 kernel (and beyond), finishing the OP-TEE/Trusty integration, enabling A/B-style updates, and enabling memory reductions. John considers this work useful because devices released over the last few years often use a kernel version that is one or sometimes two years old, and stick with it.
HiKey's immediate migration to the latest LTS-based Android kernel allows for testing and validation to help make those kernels more stable and reliable. That should result in less work to do when vendors do migrate to newer kernels, hopefully allowing it to be done more quickly. Further, since HiKey development follows upstream kernels as well, testing and validating mainline has caught a number of regressions, allowing them to be reverted or fixed quickly, keeping those issues out of future LTS releases.
Many of the kernel changes needed for HiKey have been upstreamed recently, with a handful of patches related to USB, HDMI-audio, and HDMI-output reliability issues being pushed upstream but not yet merged. Of course, there is still the large issue of Mali GPU support being based on proprietary user-space code, so upstreaming that kernel driver (which makes up by far the majority of the out-of-tree HiKey code) isn't an option, but the dream of an open-source Mali driver persists.
John emphasized why this HiKey effort is so important: it's because we really have two separate communities. One of those is the AOSP and hardware vendor community, which focuses on shipping a fully enabled, single device quickly, where the specific kernel version used doesn't really matter that much. The other is the upstream kernel community, where the priorities are on long-term maintenance and common solutions that allow multiple devices to work with a single binary, utilizing the mainline or linux-next. These different focuses both make sense, but they also make communication between the two communities difficult. So HiKey enables a small but useful area of overlap by providing an affordable device that works against the latest upstream kernels and is also supported by the latest AOSP user space. Thus, the device becomes a shared concern between the two communities, and hopefully will bring them closer together, allowing for more prototyping and validation between the communities.
Systems-level programming in Java
Elliott Hughes from Google shared recent developments to enable systems-level programming using the Java language in AOSP (slides [PDF]). He talked a bit about his own history with the Android project, working on native (C/C++ based) tools and libraries, as well as ART; his early efforts in particular focused on cleaning up much of the native code in Android.
The libcore code was initially written in C and suffered from lots of leaks and other issues. His group switched to C++ and used tools to help clean up many of the leaks and crashes, but that still left a fair amount of native code, which can be problematic due to the semantic differences between Java and Unix. So the developers decided instead to expose POSIX to Java, leaving the native code to just marshal data back and forth for the POSIX API calls. This allowed for easier debugging in Java, but also allowed for more flexibility.
With a few examples, he showed how simple the resulting code is, and how it handles the complexity that POSIX imposes by returning error conditions via the global errno variable. This code has started seeing use outside of libcore, as it allows folks to avoid writing their own Java Native Interface logic (which is difficult to do correctly), and soon normal apps in the Play Store wanted to use these interfaces too. So with Android Lollipop (5.0), Google introduced android.system, which makes POSIX generally available.
Some design choices were needed in its implementation: for example, in the translation of C types (such as choosing when to make a char * a String and when to make it an Object) and in creating Java classes to match C structures. Another example was how to handle errno; exceptions are used that carry the errno value as well as the name of the failing function, which provides nice debugging output like "open failed: ENOENT (No such file or directory)". The group also provided OsConstants, which allows the Java code to look much like C.
He covered a number of other miscellaneous design choices. For example, functions always return new structures rather than filling in existing ones, since that is safer and not all that expensive. All of this functionality is used in areas like the DHCP networking code, dup() and lseek() calls in the media code, extended-attribute handling, and so on. Elliott concluded by saying that, while this isn't a particularly new feature, he mostly wanted to use this talk to advertise it, as he thinks it will be useful.
Using Clang in AOSP
The last speaker of the day was Bernhard (Bero) Rosenkränzer from Linaro, speaking about the use of the LLVM Clang compiler in AOSP (slides [PDF]). Since the release of Nougat, Clang is now used instead of GCC to build Android's user space. This effort required mostly small changes like making the code stick closer to C standards, but there were some more difficult areas like variable-length arrays in structures and differing semantics in the definition of extern. Several real bugs were found and fixed thanks to Clang's extra warnings.
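As an example of the sort of change required, consider variable-length arrays in structures: GCC accepts them as an extension, while Clang refuses them outright. The function below is a contrived illustration of our own, not code from AOSP:

    /*
     * GCC accepts this as an extension; Clang rejects it with
     * "fields must have a constant size".
     */
    void fill(int n)
    {
        struct {
            int len;
            int data[n];    /* variable-length array in a structure */
        } buf;

        buf.len = n;
        for (int i = 0; i < n; i++)
            buf.data[i] = i;
    }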
The performance of the resulting system is unchanged, but build times are nicely reduced. GCC is now used only for the kernel and some older device HAL implementations. However, work to get the kernel building with Clang is also happening: Bero and others at Linaro have been working on getting the HiKey kernel building with Clang. That work is a little out of date and will soon be updated to the 4.9 kernel. While some of the patches needed are on their way upstream, there are still quite a few bad hacks involved.
Bero went over a number of the issues they encountered and workarounds they needed in this project and pointed to a Docker image he developed to make it easy for folks to get the proper toolchain environment and kernel source to try it themselves. Currently, the system runs well, but there are a few odd bugs like crashes in Firefox. While Bero wasn't sure if there was much interest from Google in moving to Clang for the kernel, the Google developers in attendance did say there was interest, but they had been working on this in earnest for just a few weeks, and suggested they sync up and collaborate to avoid duplicate work.
Now that the Clang patches needed to build the AOSP user space have been merged, Linaro is working to stay on top of the latest Clang. He outlined the continuous-integration testing done to build AOSP master with the latest Clang snapshots. He pointed out some of the new warnings and errors that Clang 4.0 is producing and noted that, while they aren't seeing too many compiler bugs when testing development branches of Clang, there are occasional issues that have to be reported. As for the future of GCC in AOSP: AOSP is using GCC 4.9 and Bero suspects that is not likely to change. Linaro does, however, continue to test building AOSP master with GCC 6.x releases; since different toolchains report different warnings, building with both can help uncover issues that one toolchain might miss. He worries, though, that the Android build system might someday depend too strongly on Clang plugins, preventing this effort from being useful.
Conclusion
With that, the microconference concluded, somehow on schedule despite the huge number of sessions. Many thanks to all the speakers and organizers for putting a great session together. It's really great to see all the participation between community members and Android developers. It's quite likely there will be another Android microconference at next year's Plumbers, which we look forward to.
Enhancing lockdep with crossrelease
Lockdep is a runtime locking-correctness validator that detects and reports deadlocks, or the possibility of one, by checking dependencies between locks. It is useful because it reports not just actual deadlocks, but also the possibility of deadlocks that have not actually happened yet. That enables problems to be fixed before they affect real systems.

However, this facility is only applicable to typical locks, such as spinlocks and mutexes, which are normally released within the context in which they were acquired. Under that assumption, the lockdep implementation is simple, but its detection capability is limited, with the result that it cannot find all possible deadlocks. In particular, synchronization primitives like page locks or completions, which are allowed to be released in any context, also create dependencies and can cause deadlocks. Lockdep should track these primitives too; it would do a better job if it were able to identify the dependencies they create. The proposed "crossrelease" feature provides a way to do that.
A page lock is used to ensure exclusive access to a page structure; it is allowed to be released in a context other than that in which it was acquired. For example, a page lock could be acquired in process context, then released in software interrupt context after the event it is waiting for has occurred. With the proposed crossrelease feature, the page-lock-related deadlock in the following example can be detected, which cannot be done by current lockdep.
    CONTEXT X           CONTEXT Y           CONTEXT Z

                        mutex_lock(A)
    lock_page(B)
                        lock_page(B)
                                            mutex_lock(A) /* DEADLOCK */
                                            mutex_unlock(A)
                                            unlock_page(B) /* acquired by X */
                        unlock_page(B)
                        mutex_unlock(A)
In this example, Y acquires the mutex A, then waits for B (a page lock) while holding A. Z, which can release B, is waiting for A; since A is held by Y, Z is blocked and cannot release B. In other words, both Y and Z are waiting for events which can never happen. It's a deadlock.
How can we detect that kind of deadlock? Let's start from lockdep fundamentals.
Lockdep fundamentals
A deadlock occurs when a context is waiting for an event to happen, but that event is impossible because another context that could trigger the event is also waiting for another event to happen, and that second event is also impossible for the same reason.
A dependency might exist between two waiters, and a deadlock might happen due to an incorrect relationship between dependencies. Thus, we first have to define what a dependency is. A dependency exists between two waiters if:
- There are two waiters waiting for each event at a given time.
- The only way to wake up each waiter is to trigger its event.
- The ability for one to be woken up depends on whether the other can be.
If any partial set of dependencies forms a loop, for example "A->B" and "B->A" (where "A->B" means that event A depends on event B), then it might lead to a deadlock, since no waiter can meet its condition to wake up. Thus, detecting circular dependencies is the key to detecting the possibility of a deadlock. Precisely speaking, a dependency exists between whether one waiter can be woken up and whether another waiter can be; from now on, though, we will describe a dependency as being between one event and another for simplicity. The purpose of lockdep is to track these dependencies in a graph and to identify situations where circular dependencies are created.
For example, consider a graph built by lockdep that looks like:

    [diagram: a dependency graph of lock classes]

In this diagram, each node is a specific lock class, and the arrows indicate dependencies between those locks. Lockdep will add a dependency to the graph whenever a new one is detected. For example, it will add a dependency "E->C" when a new dependency between lock E and lock C is detected. Then the graph will be:

    [diagram: the same graph with an E->C edge added]

This graph contains a subgraph which demonstrates a circular dependency:

    [diagram: the circular-dependency subgraph]

This is the sort of condition under which a deadlock might occur; lockdep reports it upon detection, after adding a new dependency.
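As a concrete (if contrived) sketch of how such a circular dependency arises with ordinary mutexes, consider two threads acquiring the same two locks in opposite orders; lockdep will complain at the second acquisition in thread_b(), even if the deadlock never actually triggers:

    #include <linux/mutex.h>

    static DEFINE_MUTEX(lock_a);
    static DEFINE_MUTEX(lock_b);

    static void thread_a(void)
    {
        mutex_lock(&lock_a);
        mutex_lock(&lock_b);    /* lockdep records the dependency A->B */
        mutex_unlock(&lock_b);
        mutex_unlock(&lock_a);
    }

    static void thread_b(void)
    {
        mutex_lock(&lock_b);
        mutex_lock(&lock_a);    /* records B->A, closing the loop; lockdep
                                 * reports a possible deadlock here */
        mutex_unlock(&lock_a);
        mutex_unlock(&lock_b);
    }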
What crossrelease does
Detecting and adding dependencies into the graph is important for lockdep to work; adding a dependency is the opportunity to check whether it might cause a deadlock. The more dependencies are added, the more thoroughly it can work. Therefore, lockdep has to do its best to add as many true dependencies as possible into the graph. By relaxing the assumption that locks must be released within their acquisition context, lockdep can add more dependencies reflecting how new types of locks, such as page locks or completions, are used.
Any dependency, for example "A->B", can be identified only in the context where A is released. That is not a problem for typical locks: each acquisition context is the same as its release context, so lockdep can determine dependencies at acquisition time. However, for "crosslocks" (those released in a different context), lockdep cannot make the decision in the acquisition context, but has to wait until the release context is finally identified. Therefore, lockdep has to queue all acquisitions that might create dependencies until the decision can be made. In this way, true dependencies can be identified even for crosslocks.
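As an illustration, consider this minimal sketch of our own using a completion, one of the crosslock types the patches target; the dependencies involving the completion can only be committed when complete() runs, in a different context:

    #include <linux/completion.h>
    #include <linux/mutex.h>

    static DEFINE_MUTEX(m);
    static DECLARE_COMPLETION(c);

    static void waiter(void)
    {
        mutex_lock(&m);
        wait_for_completion(&c);    /* waits for the event while holding m */
        mutex_unlock(&m);
    }

    static void completer(void)
    {
        mutex_lock(&m);     /* blocks forever: waiter() holds m */
        mutex_unlock(&m);
        complete(&c);       /* the crosslock "release"; only here can
                             * lockdep commit the queued dependencies */
    }

Here, waiter() holds the mutex while waiting for the completion, and completer() must take that same mutex before it can signal the completion; this is exactly the kind of cycle that deferring the decision to release time makes visible to lockdep.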
How crossrelease works
As described above, lockdep queues all acquisitions until their true dependencies can be identified, and then adds the dependencies into the graph in batches. We call this new step "commit", which is the key for the crossrelease feature to work. Lockdep works well even without commit for typical locks. However, the commit step is necessary once crosslocks are involved, until all outstanding crosslocks are released. With the introduction of commit, lockdep performs three steps: acquisition, commit, and release. What lockdep does in each step is:
- Acquisition: For a typical lock,
lockdep does what it originally did and queues the lock so that
lockdep can check dependencies using it at the commit step.
Crosslocks are added to a global linked list so that lockdep can
check dependencies at the commit step.
- Commit: No action is required for typical locks. For crosslocks,
lockdep adds true dependencies using the data saved at the
acquisition step.
- Release: No changes are required for typical locks. When a crosslock is released, lockdep just removes the crosslock from the linked list.
By queuing data properly and performing the commit step, lockdep is able to track dependencies created by both typical locks and crosslocks.
Conclusion
Detecting a deadlock (or the possibility of one) involving locks that are allowed to be released in any context may look impossible, but it's not. The crossrelease feature is designed to do deadlock detection in a more general way, so that both typical locks and crosslocks can be handled. Since the assumption that locks are released within their acquisition context makes the lockdep implementation simple and efficient, though, the original algorithm using this assumption is preferred when possible. But we cannot avoid using the crossrelease feature if we want lockdep to also work for crosslocks.
Crossrelease enables lockdep to handle more dependencies, which the original lockdep implementation cannot do. Yet there might be still more dependencies that cannot be handled even by crossrelease. If so, we will have to cover the additional ones by enhancing crossrelease or by introducing another feature. Currently, crossrelease cannot identify some dependencies between two crosslocks, since that is a rather complex problem; work on that issue is in progress.