
Enhancing screen-reader functionality in modern GNOME

By Joe Brockmeier
June 17, 2025


Accessibility features, and the work that goes into developing them, are often overlooked and poorly understood by all but the people who actually depend on them. At Fedora's annual developer conference, Flock, Lukáš Tyrychtr sought to improve understanding and raise awareness about accessibility with his session on accessibility barriers and screen-reader functionality in GNOME. His talk provided rare insight into the world of using and developing open-source software for visually-impaired users—including landing important accessibility improvements in the latest GNOME release.

I did not attend Flock, which was held in Prague from June 5 to June 8. However, I was able to watch the talk shortly after it was given via the recording of the live stream from the event. Slides from the talk have not yet been published.

Understanding accessibility

Much of Tyrychtr's talk was about simply laying the groundwork for the audience to understand accessibility tools and what visually-impaired users need. He began by introducing himself as a member of Red Hat's tools and accessibility team. Most of the team is working on the tools part, but he is working on the accessibility part. "Of course, one of the reasons is that I'm blind", he said, and he runs into accessibility problems quite often. That may be an advantage, but it can also be a disadvantage, as the work would benefit from new perspectives—if anyone wanted to help, he said, they would be welcome. Before getting to the current status of GNOME accessibility, though, there was some background to cover:

First we will learn how the visually impaired use the computer, and why it's so different, why we need keyboard access and features so advanced that the security guys are frightened of us.

A screen reader, he said, is a piece of software that gives a visually-impaired person information about what is on the screen through speech output. It also announces changes to the screen content and allows the user to interact with that content: reviewing text on the screen, telling the user about available controls, and so on.

Linux users have "basically only one screen reader we can use", which is Orca. The job of a screen reader is to tell the user a lot of information, and it usually does this by using a speech synthesizer (or sometimes "speech engine"). It gets some text as an input, and it produces sound as an output. On Linux, the screen reader does not talk to the speech engine directly, Tyrychtr said. It uses a user-session daemon called a speech dispatcher so that screen-reader developers don't have to create a direct interface for each speech-synthesis engine.

He said that the most commonly used engine is a fork of eSpeak called eSpeak NG. Other open-source options include the Festival Speech Synthesis System and RHVoice, which he said was "quite nice if you give it the right data for a voice". The English voices were good, but it only has a few Czech voices to choose from. And, of course, there is an AI entrant as well in the form of Piper: a neural text-to-speech system that runs locally. Once again, he noted that the English voices for Piper are "quite nice", and complained that the Czech voices sounded weird. "But the computations aren't so complicated, and you don't need a huge graphics card to run this thing", so it is quite usable if the right voice is available.
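Speech Dispatcher ships Python bindings that make this division of labor easy to see. The following is a minimal sketch, assuming the python3-speechd bindings and a running speech-dispatcher session daemon, of a client handing text to the daemon and choosing an output module:

    # Minimal sketch of talking to Speech Dispatcher from Python; assumes the
    # speechd bindings (python3-speechd) and a running speech-dispatcher daemon.
    import speechd

    client = speechd.SSIPClient("demo-client")   # any client name will do
    client.set_output_module("espeak-ng")        # pick a synthesis engine
    client.set_language("en")
    client.set_rate(20)                          # -100..100, 0 is the default speed
    client.speak("Hello from Speech Dispatcher")
    client.close()

The client never loads or drives a synthesis engine itself; the daemon is responsible for whichever module the client names, which is exactly what spares screen-reader developers from writing one interface per engine.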

Using accessibility tools

Next, he wanted to talk about how visually-impaired people use the computer. "Very often I hear the same questions again and again, so let's get them out of the way". Even though there is assistive hardware for blind users or people with low vision, most use normal computers with normal keyboards for one simple reason: the cost of special assistive hardware can be "'wow', or at least crazy, maybe insane as well". He did not go into specifics, but a brief search shows that keyboards with built-in speech synthesis might run more than $600, while a portable Braille display can be nearly $10,000. Some visually-impaired people do use specialized hardware, he said, but the majority use standard computer hardware.

Touch typing is something that visually-impaired people have to learn, and learn early, Tyrychtr said: "I started learning the keyboard even before I had my own computer and that was a good decision because I wasn't lost after I got one". There was much more than touch typing to learn, too, because it's not very common for visually impaired people to use a mouse.

So, after we get a computer, we have to learn all the keyboard shortcuts, and there are a lot of them—even without using Emacs, which some of the visually impaired actually do. But, personally, I'm sticking to more mainstream user interfaces, so I can actually fix the issues for more users.

After learning all the shortcuts, "we actually need the applications to be keyboard accessible". He said that application accessibility was complicated; instead of trying to squeeze it into his Flock presentation, he mentioned presentations he had given previously at FOSDEM and DevConf.cz for those who wanted to learn about the keyboard-accessibility topic.

All the key events

He also wanted to explain why visually-impaired users had such "weird requirements for hardware system access". The screen reader needs to see all keyboard events, and do something with them. This is because the screen reader has its own keyboard shortcuts, perhaps in the hundreds, including a modifier key that toggles the screen reader on or off. To avoid clashing with system and application shortcuts, all keyboard input needs to be sent to the screen reader first. Some of the input is meant to request that the screen reader tell the user what their location is on the screen, review notifications, describe elements on the screen, and more. Those commands should not be sent on to the compositor or application at all:

The visually-impaired user very often wants to hear what he or she is typing on a keyboard, so you need to see the keyboard events for this as well. You can't actually restrict this to text controls. You want this service to be available everywhere, so it's consistent, and not surprising for the user. So you want to see the shortcut, see the event, but now you want to pass it along so it actually does the usual thing that it does in the application or compositor and so on.

Users also need a shortcut to tell Orca to shut up, he said. "When the speech synthesis is speaking, it may be something quite long you don't want to hear completely", so it's handy to have a shortcut to stop it from reading the entire text. In summary, the screen reader wants to see every keyboard event, every key press, every key release, even for modifiers. Then it can do one of two things: consume the event or observe it and pass it along to the compositor or application.
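As a rough illustration (this is not Orca's actual code), the consume-or-pass-along decision for each event looks something like the sketch below: screen-reader commands are consumed, typed characters are optionally echoed, and everything else is observed and passed on to the compositor or application.

    # Hypothetical illustration of the consume-or-pass-along decision a screen
    # reader makes for every key event; the names and structure are invented.
    ORCA_MODIFIER = "Insert"          # the screen-reader modifier (often Insert or CapsLock)
    SR_COMMANDS = {("Insert", "s"): "stop_speech",
                   ("Insert", "space"): "open_preferences"}

    def handle_key_event(modifiers, key, pressed, speak, run_command):
        """Return True to consume the event, False to pass it along."""
        if pressed and ORCA_MODIFIER in modifiers and (ORCA_MODIFIER, key) in SR_COMMANDS:
            run_command(SR_COMMANDS[(ORCA_MODIFIER, key)])
            return True                # screen-reader command: never reaches the app
        if pressed and len(key) == 1:  # echo printable characters as they are typed
            speak(key)
        return False                   # observe only: compositor/app still sees it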

GTK 4 and AT-SPI2

The next part of the story is how the screen reader actually works, or in this case, worked in the past before GTK 4 was released at the end of 2020. The Assistive Technology Service Provider Interface (AT-SPI2) registry daemon provided an interface over DBus for keyboard reporting to the screen reader. As Matthias Clasen explained in a blog post about GTK 4 accessibility changes, this was done "in an awkwardly indirect way" in GTK 2 and GTK 3 that involved translating from GTK widgets to accessibility-toolkit interfaces and then from the AT-SPI interfaces into Python objects:

This multi-step process was inefficient, lossy, and hard to maintain; it required implementing the same functionality over at least three components, and it led to discrepancies between the documented AT-SPI methods and properties, and the ones actually sent over the accessibility bus.

So, GTK 4 featured a complete rewrite of the accessibility subsystem. It now handles communication with the AT-SPI registry daemon directly, without any intermediate libraries. That meant, Tyrychtr said, that keyboard input from GTK 4 applications was broken for screen readers. The solution was simple and used direct communication with the X.org server, but it did not take Wayland into account because "not many users, or at least not many visually-impaired users, were using Wayland" at the time. Everything was working as intended, he said, until "Wayland actually happened".
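The legacy approach is visible in the Python AT-SPI bindings (pyatspi) that Orca has long used: an assistive technology registered a keystroke listener with the AT-SPI registry, which in turn depended on X11 to deliver the events. A rough sketch:

    # Rough sketch of the legacy AT-SPI keystroke-listener approach via pyatspi;
    # under the hood this relied on the registry (and X11) delivering key events.
    import pyatspi

    def on_key(event):
        # event.type is press/release, event.event_string is the key name
        print(event.type, event.event_string)
        return False   # False: let the event continue on to the application

    pyatspi.Registry.registerKeystrokeListener(
        on_key,
        kind=(pyatspi.KEY_PRESSED_EVENT, pyatspi.KEY_RELEASED_EVENT))
    pyatspi.Registry.start()   # enter the event loop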

GTK 4 applications did not send the legacy keyboard events, he said, which were the only source of keyboard events that remained in a Wayland session. Users were unable to do anything screen-reader specific; they could not stop speech synthesis of long text, open Orca's settings, review window components, or even read a text box. Meanwhile, the number of GTK 4 applications kept growing, and X.org was being phased out for GNOME on Fedora and RHEL. "The situation, I'd say, was quite sad." It was possible that visually-impaired users might switch away from Linux completely.

Something had to happen

Finally, he said, "we started to do something" about the state of accessibility on GNOME. The first step was to get people together who worked on the components that needed to change. Tyrychtr said that Clasen helped facilitate work with Mutter display server and AT-SPI maintainers to decide what should and shouldn't happen.

The first draft of the solution was an accessibility DBus interface that connected to the compositor's accessibility interface and sent only the keyboard shortcuts that should be used by the screen reader. "This was quite nice because now the decision whether to process the shortcut was handled by the same code that actually got the hardware events". This improved the speed, he said, but "just sending keyboard events whether someone wants them or not isn't a good idea".

The next iteration required the screen reader to ask the compositor for keyboard events, rather than parsing all of the events sent to the accessibility interface. That, however, had problems too. "You don't actually want an API which basically gives you keylog capability without any work". Any process running as the user could ask for, and receive, everything the user typed "without any question asked, so this definitely wasn't the best thing to have". They needed to limit access to privileged clients.

The solution was to provide a check that an application owns the right DBus service name. "There's no cryptography handshakes or so on". Currently, Mutter uses a hard-coded list of service names, but that could be changed. He said that, in theory, the compositor could ask the user for confirmation of a service before providing access to key events, but they "wanted to do anything to move this design to the finish line, so no interacting things were implemented". That decision is compositor-dependent, though, so other compositors could operate differently and implement their own access checks.
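The check itself is small. Here is a sketch of the idea in Python with GIO, purely for illustration: Mutter's real implementation is in C, and the allowed service name below is made up. The compositor asks the bus who owns each allowed well-known name and compares that owner to the caller.

    # Illustrative-only sketch of checking that a D-Bus caller owns one of the
    # allowed well-known names; the name in ALLOWED_NAMES is hypothetical.
    from gi.repository import Gio, GLib

    ALLOWED_NAMES = ["org.example.ScreenReader"]   # hypothetical privileged clients

    def sender_is_privileged(connection: Gio.DBusConnection, sender: str) -> bool:
        for name in ALLOWED_NAMES:
            try:
                reply = connection.call_sync(
                    "org.freedesktop.DBus", "/org/freedesktop/DBus",
                    "org.freedesktop.DBus", "GetNameOwner",
                    GLib.Variant("(s)", (name,)), GLib.VariantType("(s)"),
                    Gio.DBusCallFlags.NONE, -1, None)
            except GLib.Error:
                continue               # nobody owns that name right now
            if reply.unpack()[0] == sender:
                return True            # the caller owns an allowed name
        return False

Because only name ownership is checked, whoever grabs the well-known name first wins; that trade-off is exactly what the Q&A question below probes.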

Now there was a working prototype for keyboard monitoring that worked on Wayland; the next step was to get things upstream. To avoid having to do major rewrites in Orca, Tyrychtr created a backend called the a11y-manager for AT-SPI that was merged and included when version 2.56 was released in March.

Orca still needed a few changes, though. The old interfaces used hardware keyboard codes, he said, which did not work with Wayland. They decided to use XKB key symbols ("keysyms") instead, which, he said, every compositor uses somewhere in keyboard-event processing.
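Keysyms identify keys at the keyboard-layout level rather than by hardware scan code, which is what makes them meaningful to any compositor. GDK exposes the same values through its keyval functions, as in this small sketch:

    # Small sketch: GDK key values correspond to XKB keysyms, so the same numeric
    # identifier names a key independent of the hardware keycode that produced it.
    import gi
    gi.require_version("Gdk", "4.0")
    from gi.repository import Gdk

    insert_keysym = Gdk.keyval_from_name("Insert")   # the common Orca modifier
    print(hex(insert_keysym))                        # 0xff63
    print(Gdk.keyval_name(insert_keysym))            # "Insert"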

The most complicated thing, according to Tyrychtr, was getting support for a client interface into the compositor. It took some back-and-forth with the maintainer for Mutter, "but we managed to get this API into Mutter 48" as well as GNOME 48 with a recent AT-SPI release, so now everything works as he wants it to work on Wayland with GNOME. Mutter is not the only compositor, though. He said that the same interface would be included with the KWin compositor in KDE 6.4 and hoped that other compositors would include it as well.

Tyrychtr said that he has usually worked on smaller fixes, so this was one of the larger projects that he has completed, and he learned a lot from the experience. The most important thing, he said, was that talking to people was much more important than coding:

Because when you start talking with people and you get them together you can come up with some ideas which everyone can agree on quite easily. Then you have the programming, but you already know what to do and what's the right approach. So you just code it.

Q&A

The first question was whether Tyrychtr knew how different communities with the same goals could work more closely together on accessibility topics. Tyrychtr said that was a great question, but he had no specific answers. There are accessibility rooms, he said, presumably referring to GNOME's Matrix instance, but he had no suggestion that would bring together all of the developer groups that might be working on accessibility topics.

The next question was: what would stop another process from claiming the DBus name used for accessibility? Wasn't it just a race against the clock? Tyrychtr said he was afraid someone would ask that. "At least for the blind users, the screen reader is the very first thing or within the very first things which start". But, on a system where that was not the case, making it foolproof would require a portal proxy and some user verification.

Another attendee said that things regarding accessibility "often start with bad news of what was broken, and then it was fixed". Was there any coordination in place to avoid breaking things first, or was there some way for interested users to spread awareness of pending breakage? Tyrychtr said, "we actually knew what would happen, but the accessibility stuff wasn't popular hot new stuff in the old times around 2022". He added, "we tried to get to the developers, but we probably didn't say it so bluntly, so they didn't understand how much of a bad news this would be". Now, however, he thought that the situation was much better and that awareness of accessibility issues was much greater, so he hoped something like the accessibility problems with GTK 4 would not happen again in the future.


Index entries for this article
Conference: Flock/2025



Tests

Posted Jun 17, 2025 18:00 UTC (Tue) by SLi (subscriber, #53131) [Link] (3 responses)

I realize this is probably not the easiest thing to have automated tests for, but how much is actually possible? I'd assume the normal software engineering rule of "things that don't have a test break all the time" applies to this too.

Conversely, having tests documents some of the things that at least someone expects to hold, and applies that all-important subtle pressure to at least think about things and possibly ask someone if a change you make causes a test to fail and you are not sure of the implications.

Of course there are things to catch that very traditional tests do not manage well (someone adding text as an image, things like logical order of buttons). But I'd imagine it would be at least possible nowadays to flag changes for further inspection. There are modern ways to get an answer to the question of "is all the text in the screenshot present in this text".

Tests

Posted Jun 18, 2025 6:17 UTC (Wed) by tyrylu (subscriber, #129103) [Link] (2 responses)

Making tests for the a11y stack is definitely possible, and, yes, some A.I. approaches will certainly be useful for these tests. Doing tests on the a11y objects tree is certainly doable and is actually done, but it would seem that we also need end-to-end tests for the whole stack.
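A minimal sketch of what such an object-tree test can look like, using the dogtail library; the application name and widget role here are only examples, not any project's actual test:

    # Minimal sketch of an accessibility-tree test using the dogtail library;
    # the application name ("gedit") and widget role are only examples.
    from dogtail.tree import root
    from dogtail.utils import run

    run("gedit")                          # launch the application under test
    app = root.application("gedit")       # find it in the AT-SPI tree
    editor = app.child(roleName="text")   # the main text view
    editor.typeText("hello")              # drive it through the a11y layer
    assert "hello" in editor.text         # the a11y tree reflects the edit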

Tests

Posted Jun 18, 2025 7:32 UTC (Wed) by taladar (subscriber, #68407) [Link]

End-to-end might be hard to automatically test but I wonder if we could at least automate the bit up to the text generation that forms the input to the text-to-speech part.

Tests

Posted Jun 18, 2025 11:24 UTC (Wed) by sthibaul (✭ supporter ✭, #54477) [Link]

The difficult part in whole-stack testing is that, while you can produce testing scenarios that you can run, each time the end-user application interface changes you have to update the test, which is quite tedious. This is somewhat done in e.g. Firefox, but tedious to maintain.

Some systematic testing can be done; for instance, gla11y is run on LibreOffice to ensure a minimum level of accessibility in the interface.

I have ideas about some automatic keyboard-reachability testing that would be independent from the interface (essentially check that all widgets are somehow reachable with some shortcut, and that e.g. tab-browsing is consistent), but never managed to take the time to write something.

How about the others?

Posted Jun 17, 2025 18:17 UTC (Tue) by smurf (subscriber, #17840) [Link] (1 responses)

That's interesting but it's limited to GNOME. Is there some resource that tells me how things work with KDE, and/or wlroots-based compositors, and/or what happens when you run a KDE program within GNOME's a11y environment (or wlroots, and/or vice versa)?

How about the others?

Posted Jun 17, 2025 19:06 UTC (Tue) by willy (subscriber, #9762) [Link]

> He said that the same interface would be included with the KWin compositor in KDE 6.4 and hoped that other compositors would include it as well.

accessibility getting some resources

Posted Jun 18, 2025 4:18 UTC (Wed) by raven667 (subscriber, #5198) [Link] (7 responses)

Just the other day I was saying that infrastructure like accessibility needed resources beyond volunteers "scratching their own itch" to refactor it to fit in the current system design, as it was clear that the resources which existed could only do the minimal amount of maintenance, e.g. the 2020 decision to base GTK4 accessibility API on X11 when the default output mode was already Wayland by that point, punting until now when someone had the budget and priority to perform a more thorough reengineering and coordination effort. I hope that desktop Linux is seen as worth investing time and effort in to do all the professional work which isn't hot and exciting and isn't the minimum necessary, such as accessibility, documentation, scientific UX testing, etc.

accessibility getting some resources

Posted Jun 19, 2025 11:11 UTC (Thu) by khim (subscriber, #9252) [Link]

> I hope that desktop Linux is seen as worth investing time and effort in to do all the professional work which isn't hot and exciting and isn't the minimum necessary, such as accessibility, documentation, scientific UX testing, etc.

You can be absolutely sure that's the case! There is a lot of work around desktop Linux: accessibility, documentation, UX testing, everything…

Hardware vendors are really excited, too.

We will see how everything works out when Google-backed Android finally arrives on the desktop next year. I wonder if any preview will be released this year, though.

accessibility getting some resources

Posted Jun 19, 2025 11:24 UTC (Thu) by ebassi (subscriber, #54855) [Link] (5 responses)

> the 2020 decision to base GTK4 accessibility API on X11 when the default output mode was already Wayland by that point

There was no such decision.

Back in early 2020, almost 10 months before the GTK 4.0 release, we were looking at ways to redesign the accessibility stack with various stakeholders, even before rewriting the implementation of ATSPI in GTK4. Of course, the problem was that redesigning a whole accessibility stack—protocols, integration with toolkits, compositors, and assistive technologies—requires resources and time, and nobody was up for sponsoring this work.

The closest thing we've got was the STF-sponsored exploratory work on Newton, a new accessibility protocol; it's still very much experimental, and as far as I know there is no grant to actually make it work.

The harsh truth is that the accessibility stack has been limping along for nearly 20 years, after the initial inception during the Sun days. We've had *some* changes, like rewriting ATSPI using D-Bus instead of CORBA in 2010-2011 (in time for GNOME 3) and the current work to move it towards a Wayland-only environment (mainly done, once again, by GNOME). Of course, even after you fix the protocol and its implementations and integration with toolkits and system components, you're still facing the fact that application developers rarely spend time on making their projects accessible in the first place; toolkits can only do so much.

accessibility getting some resources

Posted Jun 20, 2025 14:15 UTC (Fri) by raven667 (subscriber, #5198) [Link]

> the problem was that redesigning a whole accessibility stack—protocols, integration with toolkits, compositors, and assistive technologies—requires resources and time, and nobody was up for sponsoring this work.

I may have misunderstood or spoken poorly, but this is what I was trying to communicate: volunteers are out there burning themselves out, taking flak from angry users (and the internet peanut gallery), but there aren't enough resources being allocated to make this stuff first-class, especially when it needs more than the minimal maintenance. It's great that Red Hat/IBM is sponsoring some work, and it's too bad that the kind of work needed doesn't fit with the customer base of the Steam Deck very well, as Valve has done great work in parts of the stack that affect their product, in the same way Sun did when they made a go of selling GNOME desktops.

accessibility getting some resources

Posted Jun 20, 2025 15:41 UTC (Fri) by linuxrocks123 (subscriber, #34648) [Link] (3 responses)

> you're still facing the fact that application developers rarely spend time on making their projects accessible in the first place; toolkits can only do so much.

Is that true? Isn't Windows the gold standard for screen readers, and doesn't Windows accessibility almost exclusively rely on the GDI, which is their toolkit level? I'm not arguing; I'm asking. My impression when researching dish was that making the toolkits accessible by default was ingenious and was probably the only realistic option for making most of the Linux GUI accessible, precisely because only a very small number of application developers are going to bother to support or even think about accessibility.

If you do need application developers to care about something, my suggestion in this comment later in the thread may help with that: https://lwn.net/Articles/1026300/

accessibility getting some resources

Posted Jun 20, 2025 17:49 UTC (Fri) by ebassi (subscriber, #54855) [Link] (1 responses)

> Isn't Windows the gold standard for screen readers, and doesn't Windows accessibility almost exclusively rely on the GDI, which is their toolkit level? I'm not arguing; I'm asking.

The toolkit is involved, of course, and can fill in the basics for you; but GTK, or Qt, or any other toolkit cannot come up with a textual description for your own application's UI components. Or the relations between those components, especially once you write your own custom UI elements to group things, for instance. A toolkit cannot read your mind, or read the mind of the people using your application. We can have educated guesses about it, and encode them into the API, but failing an educated guess is worse than not putting anything in: the latter is, at least, invisible, while the former can lead to data loss.

Just like the web, GTK has authoring practices that application developers have to follow: https://docs.gtk.org/gtk4/section-accessibility.html#auth...

accessibility getting some resources

Posted Jun 21, 2025 2:56 UTC (Sat) by linuxrocks123 (subscriber, #34648) [Link]

> Just like the web, GTK has authoring practices that application developers have to follow

No, just like the W3C, GTK has written some words that almost nobody actually pays any attention to.

accessibility getting some resources

Posted Jun 20, 2025 18:02 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Windows GDI does not participate in accessibility; it's too low-level for that. For the standard controls (implemented in user32.dll) Windows has built-in a11y support, and if you're making a custom control or don't use Windows controls, then you need to implement the IAccessible interface. It allows Windows to walk through the control hierarchy.

Display Shell

Posted Jun 20, 2025 15:29 UTC (Fri) by linuxrocks123 (subscriber, #34648) [Link]

I've discovered that AT-SPI2 is useful for programmatically interacting with GUI applications, so I've implemented it as an alternative backend for my dish tool:

https://github.com/linuxrocks123/dish

See dish-atk-backend.c in that repo.

I wrote dish as a way to script interactions with GUI applications -- start this web browser, log into this website, download this HTML, etc. -- and that's currently what I use it for. However, I also wrote it as a backend for accessibility software that allows control of the computer through the spoken word, for people whose hands don't like using the keyboard or mouse, or who don't have hands. sonic.cpp in that repo implements a very crude prototype of such software, taking as its input the output of voice2json transcribe-stream.

Unfortunately, the reason AT-SPI2 is an alternative backend to dish, rather than the primary backend, is because not every application supports ATK or AT-SPI2. I would like it very much if every application did support one of those interfaces. I am sure blind people would like that even more. But, since some applications do not register, the primary backend for dish is instead an AI OCR nightmare called paddleocr.
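For a sense of what that kind of scripting looks like at the AT-SPI2 level, here is a small sketch using the pyatspi bindings; the application and button names are arbitrary examples, not part of dish:

    # Sketch of scripting a GUI through AT-SPI2 with pyatspi; the application
    # ("gnome-calculator") and button label ("7") are arbitrary examples.
    import pyatspi

    desktop = pyatspi.Registry.getDesktop(0)
    app = next(a for a in desktop if a.name == "gnome-calculator")

    def find(node, role, name):
        """Depth-first search of the accessible tree for a matching widget."""
        if node.getRoleName() == role and node.name == name:
            return node
        for child in node:
            found = find(child, role, name)
            if found:
                return found
        return None

    button = find(app, "push button", "7")
    button.queryAction().doAction(0)      # invoke the button's default action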

My suggestion for the accessibility community is to make the AT-SPI2 layer as useful for GUI scripting and UI testing as possible. Explicitly support and advocate for that use case in addition to the primary use case of providing data for screen readers and similar tools. Shell scripting the GUI is cool, but the real reason to promote it is that the number of Linux users who would like to shell script the GUI is probably about two orders of magnitude larger than the number of Linux users who are blind.

Right now, if you break something that blind people need to use Linux, you'll get a few people complaining, and maybe it'll get fixed next year. But, if you break Firefox's CI, that thing you broke is going to get fixed within the hour. Make the AT-SPI2 layer the foundation of every major Linux GUI application's CI, as well as the foundation for lots of hackers who want to write GUI shell scripts, and you'll get a much larger outcry when some project's AT-SPI2 support breaks or degrades, and a lot more people asking for AT-SPI2 support as a new feature for software that currently doesn't have it.

Strength in numbers.


Copyright © 2025, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds