March 7, 2012
This article was contributed by Chase Douglas
The XInput multitouch extension provides for multitouch input events to
be sent from the device to the appropriate window on the desktop.
Multitouch events can then be used for gesture recognition, multi-user
interactions, or multi-point interactions such as finger painting. While
the general concepts behind delivering multitouch events through a
window server are fairly well defined, there are many devils hiding in
the details. Here, we provide a look into the development of the
multitouch extension and many of the issues encountered along the way.
Motivations
For Henrik Rydberg, it began as an attempt to make the trackpad on his
Apple Macbook work just as well on Ubuntu as it does on OS X. For
Stéphane Chatty, it began as a research project to develop new user
interface paradigms. For your author, it began as a quest to enable
smooth scrolling from an Apple Magic Mouse.
Like many undertakings in open source, multitouch on the Linux desktop
is the culmination of the many efforts of people with disparate goals.
With the release of the X.org server 1.12, we now have a modern
multitouch foundation for toolkits and applications to build upon.
The kernel
The beginning of multitouch support for Linux can be traced back to the
2.6.30 kernel.
Henrik had just merged additions to the evdev input subsystem for
multitouch along with support for the trackpads found in all Apple
unibody Macbooks. Stéphane then added multitouch support to some
existing Linux drivers, such as hid-ntrig, and some new drivers, such as
hid-cando.
Some developers started playing around with the new Linux multitouch
support. Over time, specialized media and user-interface toolkits such as
libavg and kivy added multitouch support based on the evdev interface.
However, there was a glaring
issue: the absence of window-based event handling. Applications had to
assume that they were full-screen, and all touch events were directed
to them exclusively. This was a fair assumption for games, which were
the main impetus for libavg's touch support. However, it was clear we
needed to develop a generic multitouch solution that worked through the
X server.
The X.org server and the X gesture extension
Discussions began on how to incorporate touch events into the X input
model shortly after kernel support was present. Initial work by
Benjamin Tissoires and Carlos Garnacho extended XInput 2.0's new
multi-device support for multitouch. Each time a touch began, a new
pointing "device" would be created. Alternatively, a pool of
pre-allocated touch "devices" could be used. However, this approach
broke many assumptions about how devices and events should be handled.
As a simple example, a "touch begin" event would appear to the client as though
the pointer had moved to a new location. How would the client know that
the previous touch hadn't simply moved, as opposed to a new touch
starting? At this point Peter Hutterer, the X.org input subsystem
maintainer, decided we needed completely new semantics for touch input
through X.
Around the same time, Canonical was interested in adding multitouch
interfaces to the Linux desktop. The uTouch team, of which your author
is a member, was formed to develop a gesture system that could
recognize and handle system-level and application-level gestures. Since
X did not have touch support yet, the team focused on providing
gestures through the X.org server using a server extension. The
result was shipped in Ubuntu 10.10 and the extension was proposed for
upstream X.org.
While many developers were enthusiastic about the potential for gesture
support through the X.org server, it was not meant to be. X.org as a
foundation holds backward compatibility in high regard. Applications
written over 20 years ago should still function properly today, in
theory. Though backward compatibility has benefits, it is a
double-edged sword. Any new functionality must be thoroughly reviewed,
and inclusion in one X.org release means inclusion in all future
releases. Even to this day, gesture support is not a settled
technology. It is highly probable that an X gesture extension created a
year and a half ago would not be sufficient for use cases we are coming
up with today, let alone potentially years from now. So the X developers
are reluctant to include gesture support at this time.
XInput multitouch was born
Those concerns notwithstanding, the need for touch through the X server
grew stronger. Peter and Daniel
Stone developed a first draft of the XInput 2.1 protocol, which later
became XInput 2.2, where touches send separate events from traditional
pointer motion and button press events. Three event types ("touch begin,"
"update," and "end") were specified. However, the need to support
system-level gestures added a requirement for a new method of event
handling: the touch grab.
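For illustration, here is a minimal sketch of how a client might select
for the new touch events using libXi. It assumes the server has already
been verified to support XInput 2.2, and error handling is omitted:

    #include <string.h>
    #include <X11/extensions/XInput2.h>

    /* Select for the three new touch event types on a window. */
    static void select_touch_events(Display *dpy, Window win)
    {
        unsigned char mask[XIMaskLen(XI_LASTEVENT)] = { 0 };
        XIEventMask evmask;

        XISetMask(mask, XI_TouchBegin);
        XISetMask(mask, XI_TouchUpdate);
        XISetMask(mask, XI_TouchEnd);

        evmask.deviceid = XIAllMasterDevices;  /* any master device */
        evmask.mask_len = sizeof(mask);
        evmask.mask = mask;

        XISelectEvents(dpy, win, &evmask, 1);
    }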
X11 input device grabs allow for one client to request exclusive access
to a device under certain criteria. A client can request an active grab
so that all events are sent to it. A client can also request a passive grab,
where events are sent to it when a button on the mouse is pressed while
the cursor is positioned over a window, or when a key is pressed on a
keyboard while a window is focused. Passive grabs allow for raising a
clicked window in a click-to-focus desktop environment, for example.
When the user presses the mouse button over a lower window, the window
manager receives the event first through a passive grab. It raises the
window to the top and then replays the event so the application can
receive the button press. However, X only allows a passively grabbing
client to receive one event before it must decide whether to accept that
event, and all subsequent events until a release, or to ask the server
to replay the event so that another client can receive it.
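As a rough sketch of that click-to-raise pattern, not taken from any
real window manager and with error handling omitted, the core protocol
calls look something like this:

    #include <X11/Xlib.h>

    /* Take a synchronous passive grab on button 1 so the window
     * manager sees every press on the root window's children first. */
    static void grab_click_to_raise(Display *dpy, Window root)
    {
        XGrabButton(dpy, Button1, AnyModifier, root, True,
                    ButtonPressMask, GrabModeSync, GrabModeAsync,
                    None, None);
    }

    /* On press: raise the clicked window, then replay the event so
     * the application underneath still receives the button press. */
    static void handle_press(Display *dpy, XButtonEvent *ev)
    {
        XRaiseWindow(dpy, ev->subwindow ? ev->subwindow : ev->window);
        XAllowEvents(dpy, ReplayPointer, ev->time);
    }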
This mechanism has been adequate for decades, but doesn't quite work
for system-level gestures. Imagine that the window manager wants to
recognize a three-touch swipe. It is impossible to know if a three-touch
swipe has been performed if the window manager can only view touch
begin events; it must be able to receive the subsequent events to determine
whether the user is performing a swipe or not.
The idea behind touch grabs is that the grabbing client can receive all
events until it makes a decision about whether to accept or reject a
touch sequence. Now, the window manager can listen for all touches that
begin around the same time and watch them as they move. If there are
three touches and they all move in the same direction, the window
manager recognizes the swipe and accepts the touch sequences. No
one else will see the touch events. However, if the touches don't match
for any reason, the window manager rejects the touch sequences so other
clients, such as a finger painting application, can receive the events.
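In libXi terms, a window manager might establish and resolve such a
touch grab roughly as follows. This is only a sketch assuming XInput
2.2; the gesture-recognition logic itself is left out, and
resolve_touch() is a hypothetical helper called once the recognizer has
made its decision:

    #include <string.h>
    #include <X11/extensions/XInput2.h>

    /* Passively grab touch sequences on the root window so the window
     * manager sees every touch first. */
    static void grab_touches(Display *dpy, Window root)
    {
        unsigned char mask[XIMaskLen(XI_LASTEVENT)] = { 0 };
        XIEventMask evmask;
        XIGrabModifiers mods = { XIAnyModifier, 0 };

        XISetMask(mask, XI_TouchBegin);
        XISetMask(mask, XI_TouchUpdate);
        XISetMask(mask, XI_TouchEnd);

        evmask.deviceid = XIAllMasterDevices;
        evmask.mask_len = sizeof(mask);
        evmask.mask = mask;

        XIGrabTouchBegin(dpy, XIAllMasterDevices, root, False,
                         &evmask, 1, &mods);
    }

    /* Accept the sequence (no other client sees it) or reject it
     * (events are replayed to the next client in line). */
    static void resolve_touch(Display *dpy, XIDeviceEvent *touch,
                              Window root, int is_gesture)
    {
        XIAllowTouchEvents(dpy, touch->deviceid, touch->detail, root,
                           is_gesture ? XIAcceptTouch : XIRejectTouch);
    }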
This works great for system-level gesture recognition. However, it
necessarily imposes lag between a physical touch occurring and an
application receiving the touch events if the system is attempting
to recognize gestures. At the X Developer Summit 2010, your author
presented an overview of the vision for an XInput multitouch-based
uTouch gesture stack. One afternoon, while eating lunch and discussing
things over beer, the issue of the potential for lag came up. Between
those at the table, including Peter, Kristian Høgsberg, and myself, the
solution was elusive. However, at some point later in the conference
the issue came up again on IRC. Keith Packard made the suggestion that
touch events be sent to all clients, even before they become the owner
of touch sequences. With the idea at hand, your author scurried home
and drafted up the beginning of what would later become ownership event
handling.
As Nathan Willis explained in his overview of the XInput 2.2 protocol, a client may elect to
receive events for a touch sequence before it becomes the owner of the
sequence by requesting touch ownership events alongside touch begin,
update, and end events. The client will receive touch events without
delay, but must watch for notification of ownership. Once a touch
ownership event is received for a sequence, the client owns the
sequence and may process it as normal. Alternatively, if a preceding
touch grab is accepted, the client will receive a touch end event for
the touch sequence without ever receiving a touch ownership event. This
mechanism allows a client to process touch events as they occur, but the
client must take care to undo any state changes if the touch sequence is
ultimately accepted by another client.
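A client's event loop using ownership events might be structured roughly
like the following sketch. The tentative processing is reduced to print
statements, xi_opcode is assumed to have come from XQueryExtension(),
and XI_TouchOwnership is assumed to be in the event mask alongside the
other touch events:

    #include <stdio.h>
    #include <X11/Xlib.h>
    #include <X11/extensions/XInput2.h>

    static void event_loop(Display *dpy, int xi_opcode)
    {
        XEvent ev;

        for (;;) {
            XNextEvent(dpy, &ev);
            if (ev.xcookie.type != GenericEvent ||
                ev.xcookie.extension != xi_opcode ||
                !XGetEventData(dpy, &ev.xcookie))
                continue;

            XIDeviceEvent *de = ev.xcookie.data;
            XITouchOwnershipEvent *oe = ev.xcookie.data;

            switch (ev.xcookie.evtype) {
            case XI_TouchBegin:
            case XI_TouchUpdate:
                /* Process tentatively; keep enough state to undo. */
                printf("tentative touch %d at %.1f,%.1f\n",
                       de->detail, de->event_x, de->event_y);
                break;
            case XI_TouchOwnership:
                /* The sequence is ours; commit tentative state. */
                printf("touch %u owned\n", oe->touchid);
                break;
            case XI_TouchEnd:
                /* If no ownership event arrived for this touch ID, a
                 * grabbing client accepted the sequence: undo any
                 * tentative state here. */
                printf("touch %d ended\n", de->detail);
                break;
            }
            XFreeEventData(dpy, &ev.xcookie);
        }
    }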
With the basic concepts hammered out, your author, with an initial base
of work contributed by Daniel Stone, began a prototype implementation
that shipped in Ubuntu 11.04 and 11.10. The uTouch gesture system based
around XInput multitouch began to take form. This was enough to prove
that the protocol was reasonably sound, and efforts began in earnest on
an upstream implementation for the X.org server 1.12 release.
It is interesting to note how XInput multitouch compares to other
window server touch handling. On one end of the spectrum are phones and
tablets, which run most applications full screen. This, combined with
the lack of support for indirect touch devices such as touchpads, makes
multitouch support in mobile window systems much simpler. On the other end of
the spectrum are desktop operating systems. Windows 7 shipped with
multitouch support, but only for touchscreens. For an unknown reason,
Windows also only supports raw multitouch events or gesture events on a
window, but not both. As an example of the consequences of this
shortcoming, Qt had to build its own gesture recognition system so it
could support both raw multitouch events and gestures at the same time.
OS X only supports touchpads, but this simplification alone ensures
that touches are only ever sent to one window at a time. The event
propagation model they chose would not work for touchscreens. In
comparison, the XInput multitouch implementation allows for system- and
application-level gestures and raw multitouch events at the same time
across both direct and indirect touch devices. In your author's biased
opinion, this is a key advantage of Linux on the desktop.
A few bumps in the road
Although development of multitouch through X took more time than anyone
wanted, it was shaping up well for the 1.12 X.org server release. Many
complex issues, such as pointer emulation for touchscreens, were behind
us. However, touchpad support had yet to be finalized. Two large issues
surfaced involving scrolling and other traditional touchpad gestures.
The first issue involved the ability to scroll in two separate windows
while leaving one finger on the touchpad at all times. Imagine there
are two windows side by side. The user positions the cursor over one
window and begins a two-touch scroll motion on the trackpad. The user
then lifts one finger and uses the remaining finger to move the cursor
over the second window. The second finger is then placed on the trackpad
again, and a second scroll motion is performed. Under the XInput
multitouch protocol, a touch sequence is locked to a window once it
begins. If two-touch scrolling is performed through gesture recognition
based on XInput touch events, the touch that began over the first
window could not be used for a scroll gesture over the second window
because the touch events would remain locked to the first. To resolve
this difficulty, it was decided that no touch events would be sent while
only one touch is active on a touchpad. To avoid sending two events for
one physical action, pointer motion is likewise suppressed while more
than one touch is present.
This fix resolved the pointer-motion problem, but other traditional
touchpad gestures are even more problematic. Particularly troublesome is
two-finger scrolling. When mice with scroll wheels were first
introduced, they had discrete scroll intervals. The wheels often
clicked up and down. This led to an unfortunate API design for scroll
events in the X server. The X core protocol
cannot send pointer events with arbitrary values,
such as a scroll amount. To provide for scrolling through the X core
protocol, buttons 4, 5, 6, and 7 were redefined from general-purpose
buttons to mean scroll up, down, left, and right.
When the user scrolls up using a scroll wheel, the X server
sends a button 4 press event and then a button 4 release event. As an
aside, this is the reason why we don't yet have smooth scrolling on the
Linux desktop.
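The consequence for clients is that every scroll "click" arrives as an
ordinary button press/release pair carrying no magnitude; a
core-protocol client sees something like this sketch:

    #include <stdio.h>
    #include <X11/Xlib.h>

    /* One wheel detent is a press/release pair on buttons 4-7; the
     * only available granularity is "one step". */
    static void handle_core_button(XEvent *ev)
    {
        if (ev->type != ButtonPress)
            return;

        switch (ev->xbutton.button) {
        case Button4: printf("scroll up one step\n");    break;
        case Button5: printf("scroll down one step\n");  break;
        case 6:       printf("scroll left one step\n");  break;
        case 7:       printf("scroll right one step\n"); break;
        default:      /* an ordinary button press */     break;
        }
    }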
The problem for multitouch lies in the possibility of sending two
separate events for one physical action. This would occur if we sent
touch events at the same time we sent scroll button events. It was
decided that touch events may not be sent while the server is also
sending other events derived from touches. This means that if the user
enables two-finger scrolling, touch events are inhibited unless three
touches are active on the touchpad. Likewise, if the user performs a
two-finger tap to emit a right click, touch events are also inhibited
unless three touches are active on the touchpad, and so on.
Many workarounds were considered, but none provided an airtight
solution. The double-edged sword of backward compatibility prevents X
from supporting scroll events, click emulation, and touch events at the
same time. Your author hopes this situation will end up hastening
support for traditional trackpad gestures on the client side of X
instead of the server side.
Wrapping up
The development of the multitouch extension finished with the release
of the X.org server 1.12 on March 5th, 2012. Many upcoming distribution
releases, including Ubuntu 12.04 LTS, will ship with it.
Although this is the end of the X.org multitouch story, it is only the
beginning for toolkits and applications. GTK+ recently merged an API
for handling raw touch events for 3.4, and your author hopes to merge
raw touch support for Qt in the near future. Next on the roadmap will
be gesture support included in standard toolkit widgets and APIs for
application developers. There is still plenty of work to do, but the
will of those hoping to bring smooth scrolling support to the Apple
Magic Mouse and many other multitouch features is quite strong.