Enhancing screen-reader functionality in modern GNOME
Accessibility features, and the work that goes into developing them, tend to be overlooked and poorly understood by all but the people who actually depend on them. At Fedora's annual developer conference, Flock, Lukáš Tyrychtr sought to improve understanding and raise awareness of accessibility with his session on accessibility barriers and screen-reader functionality in GNOME. His talk provided rare insight into the world of using and developing open-source software for visually-impaired users, including the story of landing important accessibility improvements in the latest GNOME release.
I did not attend Flock, which was held in Prague from June 5 to June 8. However, I was able to watch the talk shortly after it was given via the recording of the live stream from the event. Slides from the talk have not yet been published.
Understanding accessibility
Much of Tyrychtr's talk was about simply laying the groundwork for the audience to
understand accessibility tools and what visually-impaired users need. He began by
introducing himself as a member of Red Hat's tools and accessibility team. Most of the team works on the tools part, but he works on the accessibility part. "Of course, one of the reasons is that I'm blind", he said, and he runs into accessibility problems quite often. That may be an advantage, but it can also be a disadvantage, since the work would benefit from new perspectives; if anyone wanted to help, he said, they would be welcome. Before getting to the
current status of GNOME accessibility, though, there was some
background to cover:
First we will learn how the visually impaired use the computer, and why it's so different, why we need keyboard access and features so advanced that the security guys are frightened of us.
A screen reader, he said, is a piece of software that gives a visually-impaired person information about what is on the screen through speech output. It also announces changes to the screen content and allows the user to interact with that content: reviewing text on the screen, telling the user about available controls, and so on.
Linux users have "basically only one screen reader we can use
",
which is Orca. The job of a screen reader is to
tell the user a lot of information, and it usually does this by using a speech synthesizer (or
sometimes "speech engine"). It gets some text as an input, and it produces sound as an
output. On Linux, the screen reader does not talk to the speech engine directly, Tyrychtr
said. It uses a user-session daemon called a speech dispatcher so that screen-reader
developers don't have to create a direct interface for each speech-synthesis engine.
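To make this concrete, here is a minimal sketch of speaking through Speech Dispatcher from Python, assuming the speechd bindings (packaged as python3-speechd on Fedora; an assumption about packaging) are installed; the client name is arbitrary:

```python
import speechd

# Connect to the user's Speech Dispatcher session daemon; "lwn-demo"
# is just an arbitrary client name for this sketch.
client = speechd.SSIPClient("lwn-demo")
client.set_language("en")
client.speak("Hello from Speech Dispatcher")
client.close()
```

The daemon, not the client, decides which synthesis engine actually renders the text, which is exactly the indirection Tyrychtr described.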
He said that the most commonly used engine is a fork of eSpeak called eSpeak NG. Other open-source options include the Festival Speech Synthesis System and RHVoice, which he said was "quite nice if you give it the right data for a voice". The English voices were good, but it only has a few Czech voices to choose from. And, of course, there is an AI entrant as well in the form of Piper, a neural text-to-speech system that runs locally. Once again, he noted that the English voices for Piper are "quite nice", and complained that the Czech voices sounded weird. "But the computations aren't so complicated, and you don't need a huge graphics card to run this thing", so it is quite usable if the right voice is available.
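As a sketch, the same Speech Dispatcher bindings can steer output to a particular engine and language; the module names below are assumptions about what a given system has installed:

```python
import speechd

client = speechd.SSIPClient("lwn-demo")
# Ask the daemon which output modules (engines) it knows about...
print(client.list_output_modules())   # e.g. ['espeak-ng', 'festival', 'rhvoice']
# ...then route speech through a specific one, with a Czech voice.
client.set_output_module("espeak-ng")
client.set_language("cs")
client.speak("Ahoj")
client.close()
```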
Using accessibility tools
Next, he wanted to talk about how visually-impaired people use the
computer. "Very often I hear the same questions again and again, so let's get them
out of the way
". Even though there is assistive hardware for blind users or
people with low vision, most use normal computers with normal keyboards for one
simple reason: the cost of special assistive hardware can be "'wow', or at least
crazy, maybe insane as well
". He did not go into specifics, but a brief search
shows that keyboards with
built-in speech synthesis might run more than $600, while a portable
Braille display can be nearly $10,000. Some visually-impaired people do use
specialized hardware, he said, but the majority use standard computer hardware.
Touch typing is something that visually-impaired people have to learn, and learn
early, Tyrychtr said: "I started learning the keyboard even before I had my own computer and that was a good decision because I wasn't lost after I got one". There was much more than touch typing to learn, too, because it's not very common for visually-impaired people to use a mouse.
So, after we get a computer, we have to learn all the keyboard shortcuts, and there are a lot of them—even without using Emacs, which some of the visually impaired actually do. But, personally, I'm sticking to more mainstream user interfaces, so I can actually fix the issues for more users.
After learning all the shortcuts, "we actually need the applications to be keyboard accessible". He said that application accessibility was complicated;
instead of trying to squeeze it into his Flock presentation, he mentioned
presentations he had given previously at FOSDEM and DevConf.cz for
those who wanted to learn about the keyboard-accessibility topic.
All the key events
He also wanted to explain why visually-impaired users had such "weird requirements for hardware system access". The screen reader needs to see all
keyboard events, and do something with them. This is because the screen
reader has its own keyboard shortcuts, perhaps in the hundreds, including a modifier
key that toggles the screen reader on or off. To avoid clashing with system and
application shortcuts, all keyboard input needs to be sent to the screen reader
first. Some of the input is meant to request that the screen reader tell the
user what their location is on the screen, review
notifications, describe
elements on the screen, and more. Those commands should not be sent on to the
compositor or application at all:
The visually-impaired user very often wants to hear what he or she is typing on a keyboard, so you need to see the keyboard events for this as well. You can't actually restrict this to text controls. You want this service to be available everywhere, so it's consistent, and not surprising for the user. So you want to see the shortcut, see the event, but now you want to pass it along so it actually does the usual thing that it does in the application or compositor and so on.
Users also need a shortcut to tell Orca to shut up, he said. "When the speech synthesis is speaking, it may be something quite long you don't want to hear completely", so it's handy to have a shortcut to stop it from reading the entire
text. In summary, the screen reader wants to see every keyboard event, every key
press, every key release, even for modifiers. Then it can do one of two things:
consume the event or observe it and pass it along to the compositor or
application.
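As a purely hypothetical sketch (none of these names come from Orca's actual code), the consume-or-observe decision amounts to something like this:

```python
# Hypothetical screen-reader key-event dispatch; illustrative only.

def speak(text: str) -> None:
    print(f"[speech] {text}")        # stand-in for real speech output

def stop_speech() -> None:
    print("[speech] (interrupted)")  # the "shut up" command

# Shortcuts the screen reader consumes entirely.
BINDINGS = {
    ("Insert", "space"): lambda: speak("opening preferences"),
    ("Insert", "h"): lambda: speak("entering learn mode"),
}

def on_key_press(modifiers: tuple, key: str) -> bool:
    """Return True to consume the event, False to observe and pass it on."""
    if key == "Control_L":           # pressing Ctrl stops speech...
        stop_speech()
        return False                 # ...but the event is still passed along
    action = BINDINGS.get(modifiers + (key,))
    if action is not None:
        action()
        return True                  # consumed; compositor/app never sees it
    if len(key) == 1:
        speak(key)                   # key echo while typing
    return False                     # observed only; forwarded as usual
```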
GTK 4 and AT-SPI2
The next part of the story is how the screen reader actually works, or in this
case, worked in the past before GTK 4 was released at the
end of 2020. The Assistive Technology Service Provider Interface (AT-SPI2) registry daemon provided an interface over D-Bus for keyboard reporting to the screen reader. As Matthias Clasen explained in a blog post about GTK 4 accessibility changes, this was done "in an awkwardly indirect way" in GTK 2 and GTK 3, which involved translating from GTK widgets to Accessibility Toolkit (ATK) interfaces and then from the AT-SPI interfaces into Python objects:
This multi-step process was inefficient, lossy, and hard to maintain; it required implementing the same functionality over at least three components, and it led to discrepancies between the documented AT-SPI methods and properties, and the ones actually sent over the accessibility bus.
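As an aside, the "accessibility bus" mentioned here is a separate D-Bus bus; its address is published by the org.a11y.Bus service on the session bus, as this short sketch using PyGObject (assumed installed) shows:

```python
from gi.repository import Gio

# The org.a11y.Bus service on the session bus hands out the address of
# the dedicated accessibility bus that AT-SPI traffic travels over.
session = Gio.bus_get_sync(Gio.BusType.SESSION, None)
reply = session.call_sync(
    "org.a11y.Bus", "/org/a11y/bus", "org.a11y.Bus", "GetAddress",
    None, None, Gio.DBusCallFlags.NONE, -1, None)
print(reply.unpack()[0])  # e.g. unix:path=/run/user/1000/at-spi/bus_0
```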
So, GTK 4 featured a complete rewrite of the accessibility subsystem. It now
handles communication with the AT-SPI registry daemon directly, without any
intermediate libraries. That meant, Tyrychtr said, that keyboard input from GTK 4 applications was broken for screen readers. The solution was simple and used direct communication with the X.org server, but it did not take Wayland into account because "not many users, or at least not many visually-impaired users, were using Wayland" at the time. Everything was working as intended, he said, until "Wayland actually happened".
GTK 4 applications did not send the legacy keyboard events, he said, which were the only source of keyboard events remaining in a Wayland session. Users were unable to do anything screen-reader specific; they could not stop speech synthesis of long text, open Orca's settings, review window components, or even read a text box. Meanwhile, the number of GTK 4 applications kept growing, and X.org was being phased out for GNOME on Fedora and RHEL. "The situation, I'd say, was quite sad." It was possible that visually-impaired users might switch away from Linux completely.
Something had to happen
Finally, he said, "we started to do something
" about the state of
accessibility on GNOME. The first step was to get people together who worked on the
components that needed to change. Tyrychtr said that Clasen helped facilitate work
with Mutter display server and AT-SPI
maintainers to decide what should and shouldn't happen.
The first draft of the solution was an accessibility D-Bus interface: the screen reader connected to the compositor's accessibility interface, and the compositor sent over only the keyboard shortcuts that should be used by the screen reader. "This was quite nice because now the decision whether to process the shortcut was handled by the same code that actually got the hardware events". This improved the speed, he said, but "just sending keyboard events whether someone wants them or not isn't a good idea".
The next iteration required the screen reader to ask the compositor for keyboard events, rather than parsing all of the events sent to the accessibility interface. That, however, had problems too. "You don't actually want an API which basically gives you keylog capability without any work". Any process running as the user could ask for, and receive, everything the user typed "without any question asked, so this definitely wasn't the best thing to have". They needed to limit access to privileged clients.
The solution was to provide a check that an application owns the right D-Bus service name. "There's no cryptography handshakes or so on". Currently, Mutter uses a hard-coded list of service names, but that could be changed. He said that, in theory, the compositor could ask the user for confirmation of a service before providing access to key events, but they "wanted to do anything to move this design to the finish line, so no interacting things were implemented". That decision is compositor-dependent, though, so other compositors could operate differently and implement their own access checks.
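Conceptually, such a check only needs the standard GetNameOwner call on the bus; here is a hypothetical compositor-side sketch (Mutter's real list and code differ, and the allowed name below is invented):

```python
from gi.repository import Gio, GLib

# Hypothetical allow-list; Mutter hard-codes its own set of names.
PRIVILEGED_NAMES = ("org.example.ScreenReader",)

def sender_is_privileged(bus: Gio.DBusConnection, sender: str) -> bool:
    """Check whether a D-Bus method caller owns one of the allowed names."""
    for name in PRIVILEGED_NAMES:
        try:
            reply = bus.call_sync(
                "org.freedesktop.DBus", "/org/freedesktop/DBus",
                "org.freedesktop.DBus", "GetNameOwner",
                GLib.Variant("(s)", (name,)),
                None, Gio.DBusCallFlags.NONE, -1, None)
        except GLib.Error:
            continue                  # the name currently has no owner
        if reply.unpack()[0] == sender:
            return True               # caller owns a privileged name
    return False
```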
Now there was a prototype for keyboard monitoring that worked on Wayland; the next step was to get things upstream. To avoid major rewrites in Orca, Tyrychtr created an AT-SPI backend called a11y-manager, which was merged and included when version 2.56 was released in March.
Orca still needed a few changes, though. The old interfaces used hardware keyboard codes, he said, which did not work with Wayland. They decided to use XKB key symbols ("keysyms") instead, which, he said, every compositor uses somewhere in its keyboard-event processing.
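The distinction is easy to see with python-xlib's keysym tables (python-xlib assumed installed), which are plain XKB constants and need no X connection, so they apply equally under Wayland:

```python
from Xlib import XK

# Keysyms are stable XKB constants (from keysymdef.h); hardware
# keycodes, by contrast, vary with the keyboard and the compositor.
print(hex(XK.string_to_keysym("Insert")))     # 0xff63
print(hex(XK.string_to_keysym("Caps_Lock")))  # 0xffe5
print(hex(XK.XK_F1))                          # 0xffbe
```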
The most complicated thing, according to Tyrychtr, was getting support for a client interface into the compositor. It took some back-and-forth with the Mutter maintainer, "but we managed to get this API into Mutter 48" as well as into GNOME 48 with a recent AT-SPI release, so now everything works as he wants it to on Wayland with GNOME. Mutter is not the only compositor, though. He said that the same interface would be included with the KWin compositor in KDE Plasma 6.4, and he hoped that other compositors would include it as well.
Tyrychtr said that he has usually worked on smaller fixes, so this was one of the larger projects that he has completed, and he learned a lot from the experience. The most important thing, he said, was that talking to people was much more important than coding:
Because when you start talking with people and you get them together you can come up with some ideas which everyone can agree on quite easily. Then you have the programming, but you already know what to do and what's the right approach. So you just code it.
Q&A
The first question was whether Tyrychtr knew how different communities with the same goals could work more closely together on accessibility topics. He said that was a great question, but that he had no specific answers. There are accessibility rooms, he said, presumably referring to GNOME's Matrix instance, but he had no suggestion that would include all of the developer groups that might be working on accessibility topics.
The next question was: what would stop another process from claiming the D-Bus name used for accessibility? Wasn't it just a race against the clock? Tyrychtr said he was afraid someone would ask that. "At least for the blind users, the screen reader is the very first thing or within the very first things which start". But, on a system where that was not the case, making it foolproof would require a portal proxy and some user verification.
Another attendee said that things regarding accessibility "often start with bad news of what was broken, and then it was fixed". Was there any coordination in place to avoid breaking things first, or is there some way for interested users to spread awareness of pending breakage? Tyrychtr said, "we actually knew what would happen, but the accessibility stuff wasn't popular hot new stuff in the old times around 2022". He added, "we tried to get to the developers, but we probably didn't say it so bluntly, so they didn't understand how much of a bad news this would be". Now, however, he thought that the situation was much better and that awareness of accessibility issues was much greater, so he hoped something like the accessibility problems with GTK 4 would not happen again in the future.
Index entries for this article
Conference: Flock/2025
Tests
Posted Jun 17, 2025 18:00 UTC (Tue) by SLi (subscriber, #53131)
Conversely, having tests documents some of the things that at least someone expects to hold, and applies that all-important subtle pressure to at least think about things, and possibly ask someone, if a change you make causes a test to fail and you are not sure of the implications.
Of course there are things to catch that very traditional tests do not manage well (someone adding text as an image, things like logical order of buttons). But I'd imagine it would be at least possible nowadays to flag changes for further inspection. There are modern ways to get an answer to the question of "is all the text in the screenshot present in this text".
Tests
Posted Jun 18, 2025 6:17 UTC (Wed) by tyrylu (subscriber, #129103)
Tests
Posted Jun 18, 2025 7:32 UTC (Wed) by taladar (subscriber, #68407)
Tests
Posted Jun 18, 2025 11:24 UTC (Wed) by sthibaul (✭ supporter ✭, #54477)
Some systematic testing can be done; for instance, gla11y is run on LibreOffice to ensure a minimum level of accessibility in the interface.
I have ideas about some automatic keyboard-reachability testing that would be independent from the interface (essentially check that all widgets are somehow reachable with some shortcut, and that e.g. tab-browsing is consistent), but never managed to take the time to write something.
How about the others?
Posted Jun 17, 2025 18:17 UTC (Tue) by smurf (subscriber, #17840)
How about the others?
Posted Jun 17, 2025 19:06 UTC (Tue) by willy (subscriber, #9762)
accessibility getting some resources
Posted Jun 18, 2025 4:18 UTC (Wed) by raven667 (subscriber, #5198)
accessibility getting some resources
Posted Jun 19, 2025 11:11 UTC (Thu) by khim (subscriber, #9252)
> I hope that desktop Linux is seen as worth investing time and effort in to do all the professional work which isn't hot and exciting and isn't the minimum necessary, such as accessibility, documentation, scientific UX testing, etc.
You can be absolutely sure that's the case! There is a lot of work around desktop Linux, accessibility, documentation, UX testing, everything… Hardware vendors are really excited, too. We would see how everything would work out when Google-baked Android would, finally, arrive on desktop next year. Wonder if any preview would be released this year, though.
accessibility getting some resources
Posted Jun 19, 2025 11:24 UTC (Thu) by ebassi (subscriber, #54855)
There was no such decision.
Back in early 2020, almost 10 months before the GTK 4.0 release, we were looking at ways to redesign the accessibility stack with various stakeholders, even before rewriting the implementation of ATSPI in GTK4. Of course, the problem was that redesigning a whole accessibility stack—protocols, integration with toolkits, compositors, and assistive technologies—requires resources and time, and nobody was up for sponsoring this work.
The closest thing we've got was the STF-sponsored exploratory work on Newton, a new accessibility protocol; it's still very much experimental, and as far as I know there is no grant to actually make it work.
The harsh truth is that the accessibility stack has been limping along for nearly 20 years, after the initial inception during the Sun days. We've had *some* changes, like rewriting ATSPI using D-Bus instead of CORBA in 2010-2011 (in time for GNOME 3) and the current work to move it towards a Wayland-only environment (mainly done, once again, by GNOME). Of course, even after you fix the protocol and its implementations and integration with toolkits and system components, you're still facing the fact that application developers rarely spend time on making their projects accessible in the first place; toolkits can only do so much.
accessibility getting some resources
Posted Jun 20, 2025 14:15 UTC (Fri) by raven667 (subscriber, #5198)
I may have misunderstood or spoken poorly, but this is what I was trying to communicate: volunteers are out there burning themselves out, taking flak from angry users (and the internet peanut gallery), but there aren't enough resources being allocated to make this stuff first-class, especially when it needs more than minimal maintenance. It's great that Red Hat/IBM is sponsoring some work, and it's too bad that the kind of work needed doesn't fit the customer base of the Steam Deck very well, as Valve has done great work in the parts of the stack that affect their product, in the same way Sun did when they made a go of selling GNOME desktops.
accessibility getting some resources
Posted Jun 20, 2025 15:41 UTC (Fri) by linuxrocks123 (subscriber, #34648)
Is that true? Isn't Windows the gold standard for screen readers, and doesn't Windows accessibility almost exclusively rely on the GDI, which is their toolkit level? I'm not arguing; I'm asking. My impression when researching dish was that making the toolkits accessible by default was ingenious and was probably the only realistic option for making most of the Linux GUI accessible, precisely because only a very small number of application developers are going to bother to support or even think about accessibility.
If you do need application developers to care about something, my suggestion in this comment later in the thread may help with that: https://lwn.net/Articles/1026300/
accessibility getting some resources
Posted Jun 20, 2025 17:49 UTC (Fri) by ebassi (subscriber, #54855)
The toolkit is involved, of course, and can fill in the basics for you; but GTK, or Qt, or any other toolkit cannot come up with a textual description for your own application's UI components. Or the relations between those components, especially once you write your own custom UI elements to group things, for instance. A toolkit cannot read your mind, or read the mind of the people using your application. We can have educated guesses about it, and encode them into the API, but failing an educated guess is worse than not putting anything in: the latter is, at least, invisible, while the former can lead to data loss.
Just like the web, GTK has authoring practices that application developers have to follow: https://docs.gtk.org/gtk4/section-accessibility.html#auth...
accessibility getting some resources
Posted Jun 21, 2025 2:56 UTC (Sat) by linuxrocks123 (subscriber, #34648)
No, just like the W3C, GTK has written some words that almost nobody actually pays any attention to.
accessibility getting some resources
Posted Jun 20, 2025 18:02 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)
Display Shell
Posted Jun 20, 2025 15:29 UTC (Fri) by linuxrocks123 (subscriber, #34648)
https://github.com/linuxrocks123/dish
See dish-atk-backend.c in that repo.
I wrote dish as a way to script interactions with GUI applications -- start this web browser, log into this website, download this HTML, etc. -- and that's currently what I use it for. However, I also wrote it as a backend for accessibility software that allows control of the computer through the spoken word, for people whose hands don't like using the keyboard or mouse, or who don't have hands. sonic.cpp in that repo implements a very crude prototype of such software, taking as its input the output of voice2json transcribe-stream.
Unfortunately, the reason AT-SPI2 is an alternative backend to dish, rather than the primary backend, is because not every application supports ATK or AT-SPI2. I would like it very much if every application did support one of those interfaces. I am sure blind people would like that even more. But, since some applications do not register, the primary backend for dish is instead an AI OCR nightmare called paddleocr.
My suggestion for the accessibility community is to make the AT-SPI2 layer as useful for GUI scripting and UI testing as possible. Explicitly support and advocate for that use case in addition to the primary use case of providing data for screen readers and similar tools. Shell scripting the GUI is cool, but the real reason to promote it is that the number of Linux users who would like to shell script the GUI is probably about two orders of magnitude larger than the number of Linux users who are blind.
Right now, if you break something that blind people need to use Linux, you'll get a few people complaining, and maybe it'll get fixed next year. But, if you break Firefox's CI, that thing you broke is going to get fixed within the hour. Make the AT-SPI2 layer the foundation of every major Linux GUI application's CI, as well as the foundation for lots of hackers who want to write GUI shell scripts, and you'll get a much larger outcry when some project's AT-SPI2 support breaks or degrades, and a lot more people asking for AT-SPI2 support as a new feature for software that currently doesn't have it.
Strength in numbers.