A Q&A about the realtime patches

By Jake Edge
July 18, 2023

In a session at the 2023 Real Time Linux Summit, Thomas Gleixner answered questions about the realtime feature of the kernel, its status, and the Real-Time Linux project's plans for the future. The talk was billed as a "Q&A about PREEMPT_RT" with a caveat: "anything except printk() and documentation". As might be guessed, the first two questions were on just those topics, but there were plenty of other questions (and answers) too. The summit was held in conjunction with the inaugural Embedded Open Source Summit in Prague, Czechia at the end of June.

Documentation and printk()

Right at the start of the session, Steven Rostedt could not resist asking: "what's wrong with documentation?" That was met with a big laugh from Gleixner; "lots", he said. The biggest problem with documentation "is that it mostly doesn't exist" for realtime Linux. His printk() caveat was because the usual question is "when will it be done?", but that is "subject to crystal balls". He would be happy to answer technical questions about "why printk() is a horror-show".

With that, he advanced to his second (and final) slide: "Questions?", which elicited a big laugh from the audience. Tim Bird asked the next question, inevitably on the second "off-limits" topic: "is printk() okay if you are not using a serial console?" Gleixner said "no ... I mean, kind of"; there were some problems in the printk() core, aside from using consoles, that made it unsuitable to use with the realtime patches. Those have been fixed, but there are still the problems with using the console driver. Those problems are not truly realtime-specific, but running the kernel with a realtime configuration makes them even more obvious.

The printk() code contains a large amount of duct tape, he said, which is a pattern that the realtime developers have encountered in multiple parts of the kernel along the way. For example, the CPU hotplug code was in a similar position; everyone knew that the code was broken from a design perspective. Instead of fixing the design problems, more and more duct tape and ... other stuff ... has been added in, to the point where it "slowly composts into concrete, but it doesn't work". Eventually, "you have to bite the bullet and rewrite it".

Bird said that he had looked at the realtime patches recently, noting that there are around 80 of them scattered around the kernel, though mostly related to the serial console, with only about 4000 lines of code. He has been telling people that most of it is now upstream and that they do not have to apply the patches; "is that correct?" Gleixner had an initial one-word answer for that: "no".

You still cannot enable realtime on the mainline kernel due to the lack of the "printk() bits and pieces"; the other patches in the set are for things that can be disabled, so those are not required. Once the threaded printk() patches hit the mainline, then it will be time to ask Linus Torvalds to enable realtime for x86 and arm64. The problems with printk() have been solved, he said, according to John Ogness, who has been working on the code, and printk() maintainer Petr Mladek. "I will believe it once it hits upstream", Gleixner said.

THP and networking

An attendee asked about transparent huge page (THP) support; currently, it is disabled when PREEMPT_RT is chosen. He wondered if there is something that can be done about that. Gleixner said that the problems with THP for realtime need to be fixed, "patches welcome". The realtime project has been focused on getting other things done, and has not tackled THP yet. There is no technical reason why the two cannot work together; they just do not right now. The THP migration and coalescing for memory have unacceptable latencies for the realtime kernel.

The attendee mentioned the advantages of reduced translation lookaside buffer (TLB) pressure that come with THP, which Gleixner acknowledged, but said that the project needed to prioritize getting the core of the patch set into the mainline. There is nothing stopping others from doing that work (or hiring a consulting company to do so); the project will probably look into it at some point in the future, but it would be better if others who need the feature take it on now.

The priority of the software interrupts was the subject of a second question from the attendee; he wondered if their priority could be increased for the realtime kernel. Gleixner said that the priorities should be set from user space by the administrator, based on the needs of the system as a whole. The problem is that "software interrupts are semantically ill-defined", so the priority that might work for one application would be totally wrong for another. Those interrupts are "context stealing and not really controllable"; the network developers have defended using software interrupts "tooth and nail for a decade", but they have come around to the idea that they need to rethink that, he said.

The attendee said that currently networking is basically broken for realtime processes; but Gleixner said that it was a complaint "about facts that have been well-known for years". Once again, he wondered why people were simply complaining, rather than digging in and working on the problem.

Another attendee noted that you can switch the NAPI thread to a realtime priority using sysfs. Gleixner said that the networking developers are moving to a threaded NAPI right now, which solves a lot of the problem for realtime, but not all of it. There are still lots of bottom-half disables within the networking code, but those can be removed once networking fully switches to threaded NAPI.
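As a rough illustration of that kind of tuning, the C sketch below turns on threaded NAPI for one interface and gives an already-identified thread a SCHED_FIFO priority. It assumes the per-interface `threaded` attribute under /sys/class/net/ that recent kernels provide; the helper names, the interface name, and the priority value are all hypothetical, and locating the PID of the NAPI kernel thread (typically named along the lines of napi/<iface>-<id>) is left out. It needs root to do anything useful.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Build "/sys/class/net/<ifname>/threaded" into buf.
 * Returns 0 on success, -1 for a bad name or a buffer that is too small. */
int napi_threaded_path(const char *ifname, char *buf, size_t len)
{
    assert(buf != NULL);
    if (ifname == NULL || ifname[0] == '\0' || strchr(ifname, '/') != NULL)
        return -1;
    int n = snprintf(buf, len, "/sys/class/net/%s/threaded", ifname);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Ask the kernel to do NAPI polling for ifname in kernel threads. */
int napi_enable_threaded(const char *ifname)
{
    char path[128];

    if (napi_threaded_path(ifname, path, sizeof(path)) != 0)
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = (fputs("1\n", f) >= 0);
    if (fclose(f) != 0)   /* sysfs write errors may only show up at close */
        ok = 0;
    return ok ? 0 : -1;
}

/* Give an existing thread (a NAPI kthread, say) a SCHED_FIFO priority. */
int set_fifo_priority(pid_t pid, int prio)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = prio;
    return sched_setscheduler(pid, SCHED_FIFO, &sp);
}
```

A caller might run napi_enable_threaded("eth0") and then set_fifo_priority() on the thread's PID; which priority is right depends on the rest of the system, which is the point Gleixner made about software interrupts above.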

He likened the local bottom-half disable (i.e. local_bh_disable()), which prevents software interrupts from running, to the big kernel lock (BKL). Though it is a per-CPU lock, it is completely unspecified what local_bh_disable() protects, just like the BKL. And, as with the BKL, removing those calls breaks things, "but you couldn't tell why".

The process of removing the BKL was useful, in that regard, because it allowed the kernel developers to figure out what it was protecting everywhere within the kernel, with one exception: the TTY layer. That brought up a question about the TTY layer and the realtime patches. It turned out that the attendee really wanted to be able to use all of the UART devices available in the kernel, but the only path to those devices right now is via the TTY layer.

Toast and TTYs

"If you go through TTY, you're toast", Gleixner said. "Good luck fixing the TTY layer", he continued; he would not be "touching that with a ten-foot pole, even if you pay me money". If there is a need for serial communication from realtime processes, then some other mechanism needs to be added because TTY "is unfixable" for realtime.

The attendee said that they were not sending much data via the serial device, but Gleixner said that did not matter; if the only code path to access the device is via the TTY layer, then "you have a problem". If there is a real use case for non-TTY access to these devices, then some other code path could be added; the attendee agreed that his use case has nothing to do with TTY.

In fact, the questioner said that he has been maintaining an out-of-tree UART driver since the Linux 1.2 days, but it relies on a particular chip, which may not continue to be available. Gleixner said that a problem known since 1.2, which was released nearly 30 years ago, seems like it should have been fixed long ago by working with the upstream kernel developers on a proper solution. There is already support for so many different kinds of oddball devices that adding another should not pose much of a problem, given that there is a use case for it and a reason why the TTY layer needs to be bypassed.

A question about a return of the i915 (Intel graphics) driver brought laughter from Gleixner and much of the room. It is apparently disabled for realtime kernels. The only way to get a driver for that hardware is to wait for the new driver that is under development, Gleixner said. The current driver is not fixable and the patches that are in the realtime tree "are extremely horrible"; perhaps they could eventually be merged, but it will have to come later, and he seemed skeptical about it even then.

Some of the locking code paths in the existing i915 driver "are completely homebrew and out of any rational locking scheme in the world". That is one of the reasons that the new driver is being developed; "the replacement driver stack is coming along, you just have to wait for it". It is another example of "train wrecks in the kernel" that the realtime developers have tried to fix along the way.

Things to avoid

An attendee asked if there was a list of things to avoid using with the realtime patches beyond the TTY layer and i915 driver that had already been mentioned. Gleixner said that i915 actually works with the "hacky patches" in the realtime tree, at least "by some definition of 'works'". But using the TTY layer from a realtime task should surely be avoided; you can call printf() from your highest priority task, but it probably is not the best idea. Doing I/O from a realtime task is not generally the right design.

The questioner wondered about filesystems; are they problematic when running the realtime kernel? Gleixner said that he has not seen any problems with filesystems for a long time. Daniel Bristot de Oliveira said that he had seen lengthy latencies due to Btrfs recently, but Gleixner was not aware of those reports. It is the case that other kernel developers can always "needlessly slap a preempt_disable() somewhere in their code"; those kinds of problems need to be tracked down and the developers have to be asked not to do that. It is part of why he is "urgently needing to find the cycles to complete the 'Kernel Developers Guide to RT'"; once that is done, the realtime developers can point other kernel hackers at it.

But filesystems largely stay out of the way, because they are not part of the realtime computation, he said; "if you write your logfile from your realtime task, fine, you asked for it ... if the disk stalls, you wait". What about an in-memory filesystem of some sort, he was asked. Gleixner said that might work, "but, seriously, don't do it". Doing so violates all of the principles of realtime, he said; that is not a Linux-specific problem as all of the different realtime operating systems will warn people away from write(), read(), and the like.

A realtime program should either read its data up front or, if it needs to continuously update the data, have another, non-realtime process with large buffers to do the reading, he said. If the realtime task needs to write data continuously, it should be written to a ring buffer that a separate non-realtime task writes out. That is basic realtime theory, which is not at all Linux-specific, he said.
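The ring-buffer arrangement can be sketched in C11 as a single-producer, single-consumer queue: the realtime task only ever calls ring_push(), which never blocks, while a separate non-realtime thread drains the buffer with ring_pop() and does the actual write(). This is a minimal sketch rather than anything shown in the talk; the struct layout, the names, and the slot count are assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 1024u   /* hypothetical size; must be a power of two */
static_assert((RING_SLOTS & (RING_SLOTS - 1)) == 0, "RING_SLOTS must be 2^n");

struct sample {
    uint64_t timestamp_ns;
    int32_t value;
};

struct ring {
    _Atomic size_t head;   /* next slot to fill; stored only by the producer */
    _Atomic size_t tail;   /* next slot to drain; stored only by the consumer */
    struct sample slots[RING_SLOTS];
};

void ring_init(struct ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Realtime side: never blocks; returns false (sample dropped) when full. */
bool ring_push(struct ring *r, struct sample s)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SLOTS)
        return false;
    r->slots[head & (RING_SLOTS - 1)] = s;
    /* Release: the slot contents become visible before the new head does. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Non-realtime side: returns false when there is nothing to drain. */
bool ring_pop(struct ring *r, struct sample *out)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;
    *out = r->slots[tail & (RING_SLOTS - 1)];
    /* Release: the slot is handed back only after it has been copied out. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Dropping a sample when the buffer is full is one possible policy; the important property is that a stalled disk can delay only the consumer, never the realtime producer.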

There are systems that need to handle streaming data in realtime, from cameras for example, an attendee said; how should something like that be done? "That's a system-design problem", Gleixner said; you will need a dedicated network queue, for example, but there is no "general recipe how to make that work". The current networking code does not work all that well with realtime, but a system can be tuned to the point where it can handle high-speed, streaming data. There are also options for handling the network traffic in user space to avoid some of the problems with realtime and the current networking code.

Long journey

Bird asked about the Intel acquisition of Linutronix, which employs Gleixner and some of the other realtime developers; he wondered if Intel was now funding work on realtime Linux. Gleixner said that Intel had always helped fund the realtime work via the Real-Time Linux project; both Intel and Arm have an interest in realtime Linux, which is reflected in their project membership. Kate Stewart, Linux Foundation VP for Dependable Embedded Systems and the organizer of the project, said that Intel, Arm, TI, National Instruments, Red Hat, and others have all been part of the "long journey" to get the full realtime patch set into the mainline.

Rostedt noted that the long journey would be 20 years in 2024, but Gleixner said that was only the public part of the journey. It was first posted to the Linux kernel mailing list in 2004, but for him the journey started at the end of 1999. That means it will be 25 years for him since the beginning of the project; "it's a long journey and there are a lot of things we need to address and improve over time", though there is "only so much capacity". He has tried working day and night, but has found that "it doesn't make things more effective".

The final question was about the role of the cyclictest tool; is it a good reference application? Gleixner said that cyclictest is useful for testing, but that it does not "resemble any particular real-world application". The questioner wondered if there were any good examples of real-world applications that could be reviewed. Gleixner said that he did not know of one, but that cyclictest and other test/benchmarking applications do provide a kind of basic reference implementation; however, real-world applications have a wide variety of requirements and levels of complexity.
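For readers unfamiliar with this kind of test, the core idea can be sketched in a few lines of C: sleep until an absolute deadline, then record how late the wakeup was. The code below is a simplified illustration in that spirit, not cyclictest's actual implementation; the function names and parameters are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Advance an absolute timestamp by ns nanoseconds (ns must be >= 0). */
struct timespec ts_add_ns(struct timespec t, int64_t ns)
{
    assert(ns >= 0);
    t.tv_sec += ns / NSEC_PER_SEC;
    t.tv_nsec += ns % NSEC_PER_SEC;
    if (t.tv_nsec >= NSEC_PER_SEC) {
        t.tv_sec++;
        t.tv_nsec -= NSEC_PER_SEC;
    }
    return t;
}

/* Return a - b in nanoseconds. */
int64_t ts_diff_ns(struct timespec a, struct timespec b)
{
    return (int64_t)(a.tv_sec - b.tv_sec) * NSEC_PER_SEC
           + (a.tv_nsec - b.tv_nsec);
}

/* Sleep until successive absolute deadlines, interval_ns apart, and return
 * the worst wakeup latency seen (in ns), or -1 if a clock call fails. */
int64_t max_wakeup_latency_ns(int loops, int64_t interval_ns)
{
    struct timespec next, now;
    int64_t worst = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;
    for (int i = 0; i < loops; i++) {
        next = ts_add_ns(next, interval_ns);
        /* Absolute deadlines keep the period from drifting. */
        if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
            return -1;
        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;
        int64_t late = ts_diff_ns(now, next);
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

A meaningful measurement would also run the loop under a realtime scheduling policy with its memory locked, and under load; as Gleixner said, numbers like these do not resemble what any particular real-world application will see.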

Part of the problem with coming up with a reference realtime application is the need for specialized hardware, Gleixner said. That is a difficulty for testing and benchmarking realtime systems, he said; the results are effectively not reproducible without access to the same hardware. It is particularly hard to integrate such a test into continuous-integration (CI) systems. There is infrastructure that allows people running CI or other tests on their hardware to report it back to the realtime project, which can help detect and find regressions and other bugs. With that, Gleixner noted that he was no longer standing between attendees and the bar (or other evening activities), and the 2023 Real Time Linux Summit was complete.

[I would like to thank LWN's travel sponsor, the Linux Foundation, for assisting with my travel to Prague.]

Index entries for this article
Kernel	Realtime
Conference	Embedded Open Source Summit/2023

A Q&A about the realtime patches

Posted Jul 18, 2023 15:23 UTC (Tue) by andreashappe (subscriber, #4810) [Link] (5 responses)

Just curious: are there any well-known (mostly) out-of-tree patchsets that have survived 25+ years?

I remember when I first saw rt patches on the lkml, mad respect for the people developing/maintaining them (also a reminder of how quickly time passes).

A Q&A about the realtime patches

Posted Jul 18, 2023 16:54 UTC (Tue) by tsoni.lwn (subscriber, #139617) [Link]

..and let's not forget the FSM Labs RTLinux story around 2003-2004.

A Q&A about the realtime patches

Posted Jul 18, 2023 17:32 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

> Just curious: are there any well-known (mostly) out-of-tree patchsets that have survived 25+ years?

DAHDI drivers for telephony: https://github.com/asterisk/dahdi-linux

A Q&A about the realtime patches

Posted Jul 18, 2023 18:42 UTC (Tue) by vstinner (subscriber, #42675) [Link]

GRsecurity maintained large patches outside upstream and it took a few years to get most of them merged upstream, no? I don't know how many years, but it seems like it's still a work-in-progress 😁

A Q&A about the realtime patches

Posted Jul 18, 2023 20:44 UTC (Tue) by flussence (guest, #85566) [Link]

I've seen a few last around 10-15 years. Reiser4 and BFQ come to mind.

Those are bolt-on features in a framework designed for pluggable drivers though. RT is a fundamental rework of the kernel to make it do things beyond what it's designed/allowed to do, in hostile environments (cpu scheduler code, a burnout graveyard for enthusiastic hackers), *and* it's survived several major eras in kernel development (not just the major version numbers, but the BKL, and the state of the early-2000s LKML besides).

It's a statistical miracle that this project has gotten this far. It deserves respect for that.

A Q&A about the realtime patches

Posted Jul 18, 2023 21:00 UTC (Tue) by ppisa (subscriber, #67307) [Link]

The out-of-tree project mentioned, which requires special support at the UART chip level, is the uLAN RS-485 protocol https://ulan.sourceforge.net/ whose use in our instruments predates Linux. The actual driver uses a single source for DOS, Linux, Windows, NuttX, and system-less bare-metal applications. The API is compatible at the C level even with the assembly driver for 8051 chips. The C driver is usable with ISA, PCI, PCIe, and USB hardware, but with a limited range of chips.

It would be great to integrate the code into mainline Linux, but that would require a complete rewrite to get rid of all of the compatibility with old kernels and other systems, and I have minimal resources to invest in a rewrite. I have had no personal income from this project for its whole existence, even though others use it to obtain funding from the European Union. And our main area of investment, laboratory instruments, has often been used by others without paying a penny as well.

So I focus on CAN/CAN FD, where we strive to get into the Linux, NuttX, QEMU, and RTEMS mainlines with or, even more often, without funding, because this is an area where the broader community would profit.

As for a TTY-layer-based project, we implemented a LIN bus solution, slLIN https://github.com/lin-bus/linux-lin, under a Volkswagen contract; I try to keep it usable for others, again mostly in my spare time when I am not teaching https://comparch.edu.cvut.cz/ or working on our company's motion-control systems and other solutions for partner companies and, now, even ESA...

A Q&A about the realtime patches - CAN / CAN FD subsystem testing

Posted Jul 18, 2023 20:44 UTC (Tue) by ppisa (subscriber, #67307) [Link] (10 responses)

Even though the discussion confirmed that networking in its current form is no more suitable for areas requiring bounded latency than it was many years ago, there were and are applications which need at least some level of time predictability.

We have restarted our initiative to analyze CAN subsystem latency profiles, this time automated, with daily results for mainline and RT-Preempt kernels. The results overview is at https://canbus.pages.fel.cvut.cz/can-latester/ with an option to switch to a detailed inspection of individual loads and kernel variants. The actual tests are running on AMD/Xilinx Zynq-based MZ_APO educational kits with our own CTU CAN FD IP cores. But the device under test can be replaced easily; there are no specific needs, only two CAN or, better, CAN FD interfaces are required. The monitoring and traffic-generation system needs precise time-stamping in addition; CTU CAN FD offers 10 ns resolution with a common time base used for the four controllers in the given configuration.

The project is part of our CAN/CAN FD support investments in GNU/Linux, NuttX, RTEMS, QEMU... A page with links to some of the efforts, including the CAN FD IP core project, is at https://canbus.pages.fel.cvut.cz/ . Some more links to projects related to RT and control systems can be found at https://gitlab.fel.cvut.cz/otrees/org/-/wikis/knowbase

A short recapitulation of about 30 years with CAN at CTU and in my projects was presented at DevConf CZ 2023 https://devconfcz2023.sched.com/event/1MYjG . YouTube: https://youtu.be/RwmQYjfzQAg

A Q&A about the realtime patches - CAN / CAN FD subsystem testing

Posted Jul 19, 2023 7:06 UTC (Wed) by taladar (subscriber, #68407) [Link] (9 responses)

> Even though the discussion confirmed that networking in its current form is no more suitable for areas requiring bounded latency than it was many years ago, there were and are applications which need at least some level of time predictability.

Wouldn't networking with bounded latency require circuit switching instead of packet switching?

A Q&A about the realtime patches - CAN / CAN FD subsystem testing

Posted Jul 19, 2023 9:10 UTC (Wed) by paulj (subscriber, #341) [Link] (8 responses)

Presumably why they mentioned CAN. CAN has prioritisation. High priority node can always transmit, and collision resolution is bounded, via synchronisation. Assuming you have some bound on higher-priority nodes, that should let you have bounded latency on all nodes. Not switched though.
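For anyone who has not worked with CAN, the prioritisation comes from bitwise arbitration on the identifier: the bus behaves like a wired-AND, a dominant 0 bit overrides a recessive 1, and a node that sends 1 but reads back 0 drops out, so the lowest identifier always wins. The small C simulation below sketches that for classic 11-bit identifiers; the function name and the node limit are made up for illustration, and it assumes the identifiers are distinct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAN_ID_BITS 11   /* classic base-format identifier length */
#define MAX_NODES   64   /* arbitrary limit for this sketch */

/* Simulate arbitration among n nodes that start sending at the same time.
 * Returns the index of the winning node, or -1 if n is 0. */
int can_arbitrate(const uint16_t *ids, size_t n)
{
    bool contending[MAX_NODES];

    assert(n <= MAX_NODES);
    if (n == 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        contending[i] = true;

    /* Identifier bits go out most-significant bit first. */
    for (int bit = CAN_ID_BITS - 1; bit >= 0; bit--) {
        unsigned bus = 1;   /* recessive unless some node drives dominant */

        for (size_t i = 0; i < n; i++)
            if (contending[i] && ((ids[i] >> bit) & 1u) == 0)
                bus = 0;
        /* A node that sent recessive but sees dominant backs off. */
        for (size_t i = 0; i < n; i++)
            if (contending[i] && ((ids[i] >> bit) & 1u) != bus)
                contending[i] = false;
    }
    for (size_t i = 0; i < n; i++)
        if (contending[i])
            return (int)i;
    return -1;
}
```

With identifiers 0x123, 0x0F0, and 0x7FF contending, the node sending 0x0F0 wins; the losers simply retry once the bus is idle, so the winning frame is never destroyed by the collision.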

A Q&A about the realtime patches - CAN / CAN FD subsystem testing

Posted Jul 19, 2023 9:27 UTC (Wed) by ppisa (subscriber, #67307) [Link] (2 responses)

But the CAN protocol goes through the full networking packet (SKB) machinery in the case of the SocketCAN implementation. Simple character drivers provided less comfort but were not dependent on the soft-irq mess. There is light at the end of the tunnel now: individual, real NAPI threads per interface. So maybe we will see improvement in our tests over the years.

Another problem is that most SocketCAN drivers provide only a single FIFO Tx queue, and when the message at its head is the only one participating in arbitration, classical priority inversion is inevitable. So there is really a lot of room to improve the situation. The CTU CAN FD IP core is prepared for that because it allows changing the order of messages participating in Tx arbitration without needing to stop the current one's attempts to win the bus. This setup allows maintaining multiple virtual FIFO queues. The head messages with the highest priority then determine which queues' messages are pushed into the hardware Tx buffers, and if a message appears in a higher-priority FIFO, the messages in the hardware Tx buffers can be rescheduled.

But it would be a long way for SocketCAN, and even for our driver, to get there. We will probably test that approach during our planned bring-up of CAN FD support in RTEMS.org, which is a much smaller RTOS where we have the possibility to introduce more fundamental changes in a shorter time.

A Q&A about the realtime patches - CAN / CAN FD subsystem testing

Posted Jul 21, 2023 22:07 UTC (Fri) by andy_shev (subscriber, #75870) [Link] (1 responses)

Does Zephyr support CAN? It may be the best choice as the RTOS is well distributed.

A Q&A about the realtime patches - CAN / CAN FD subsystem testing

Posted Jul 31, 2023 7:27 UTC (Mon) by ppisa (subscriber, #67307) [Link]

Zephyr has CAN support. But I have no experience of my own with Zephyr. In the past, I did not like its approach of integrating MCU support based on layers specific to individual chip producers. NuttX has had a much more unified approach. RTEMS lacks broad MCU support, but its core and scheduler have been much more durable and scalable, and it is prequalified for serious space-grade missions now. But the precise support targets a small number of expensive platforms. Support for the Raspberry Pi and others is mainly a toy to allow playing with the system on cheap hardware, which can lead to core enhancements and/or to gaining experience to continue in real flight applications with appropriate hardware...

But I agree that Zephyr is gaining momentum and its API is being refined into a real POSIX alternative now.

A Q&A about the realtime patches - CAN / CAN FD subsystem testing

Posted Jul 19, 2023 9:30 UTC (Wed) by paulj (subscriber, #341) [Link] (4 responses)

There seems to be a push to replace CAN with "Automotive Ethernet". Basically, 10Base-T1S for a shared ethernet bus, using a single twisted-pair, with CSMA/CD. "But, doesn't CSMA/CD suck for bounded latency and performance collapse under load?", yes, so they're augmenting CSMA/CD with a "Phy-Level Collision Avoidance Reconciliation Scheme" (PLCA RS). In PLCA-RS the nodes on the ethernet bus are each assigned an ID, the IDs encode a priority (like CAN). The highest priority node (ID 0) sends a BEACON frame that marks the start of an ordered series of "transmission opportunities" - each node uses CSMA/CD access + a local timer to determine when it has a slot to transmit.

Kind of an adaptive TDMA, without a shared synchronised clock - but nodes still have to run local clocks.

A Q&A about the realtime patches - CAN / CAN FD subsystem testing

Posted Jul 19, 2023 10:09 UTC (Wed) by ppisa (subscriber, #67307) [Link]

I am not sure that Ethernet will prevail in the number of connections in automotive. I agree that for IP-based traffic and the interconnection of main nodes it can be more straightforward than CAN XL (up to 2 kB frames, full push-pull symmetric data phase, etc.). But it seems that the simplicity of CAN and its interfacing is leading to an actual increase of CAN FD cores on SoCs, for example 14 of them on a single SoC, so I expect that local electronics nests will use CAN even more intensively in the future. And there is CAN FD lite, which can replace LIN and can be integrated into chips directly like I2C, but allows communication even beyond a single PCB.

By the way, it is funny that our uLAN protocol mentioned above provided deterministic arbitration with 11 times lower pulses in the arbitration phase (max 64 nodes on the net) with dominant/recessive logic, then switched to a push-pull data phase, and provided priority rotation without tokens or other single-point-of-failure nodes for cases where 16 units were flooding data full time. All that with commodity hardware in 1991, fully documented and open source, without patent costs, etc...

A Q&A about the realtime patches - CAN / CAN FD subsystem testing

Posted Jul 19, 2023 18:31 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

> The highest priority node (ID 0) sends a BEACON frame that marks the start of an ordered series of "transmission opportunities"
Token Ring v2.0!

A Q&A about the realtime patches - CAN / CAN FD subsystem testing

Posted Jul 20, 2023 9:49 UTC (Thu) by paulj (subscriber, #341) [Link] (1 responses)

Yeah, a bit! Also, as the creator pointed out, it's also definitely /not/ TDMA, cause it's not synchronous - no master clock. There's just synchronisation via the BEACON frame, CSMA/CD, and the local clocks that run to determine the min slot times (hopefully those clocks can't drift too much from each other in the short time they need to be roughly consistent).

A Q&A about the realtime patches - CAN / CAN FD subsystem testing

Posted Jul 20, 2023 9:50 UTC (Thu) by paulj (subscriber, #341) [Link]

Oh, and token ring didn't need nodes to be statically configured with a priority-ID before hand, did it? :)

A Q&A about the realtime patches

Posted Jul 24, 2023 23:04 UTC (Mon) by DemiMarie (subscriber, #164188) [Link] (1 responses)

I seriously question whether Linux is the correct platform for hard realtime work, as opposed to a safety-certified RTOS.

A Q&A about the realtime patches

Posted Jul 25, 2023 7:58 UTC (Tue) by mpr22 (subscriber, #60784) [Link]

Live audio performance is a hard realtime workload.

A safety-certified RTOS is overkill for running Ardour or Blue or Cecilia.
