The XInput multitouch extension provides for multitouch input events to be sent from the device to the appropriate window on the desktop. Multitouch events can then be used for gesture recognition, multi-user interactions, or multi-point interactions such as finger painting. While the general concepts behind delivering multitouch events through a window server are fairly well defined, there are many devils hiding in the details. Here, we provide a look into the development of the multitouch extension and many of the issues encountered along the way.
For Henrik Rydberg, it began as an attempt to make the trackpad on his Apple MacBook work just as well on Ubuntu as it does on OS X. For Stéphane Chatty, it began as a research project to develop new user interface paradigms. For your author, it began as a quest to enable smooth scrolling from an Apple Magic Mouse.
Like many undertakings in open source, multitouch on the Linux desktop is the culmination of the many efforts of people with disparate goals. With the release of the X.org server 1.12, we now have a modern multitouch foundation for toolkits and applications to build upon.
The beginning of multitouch support for Linux can be traced back to the 2.6.30 kernel. Henrik had just merged additions to the evdev input subsystem for multitouch along with support for the trackpads found in all Apple unibody MacBooks. Stéphane then added multitouch support to some existing Linux drivers, such as hid-ntrig, and some new drivers, such as hid-cando.
Some developers started playing around with the new Linux multitouch support. Over time, the libavg and kivy specialized media and user interface toolkits added multitouch support based on the evdev interface. However, there was a glaring issue: the absence of window-based event handling. Applications had to assume that they were full-screen, and all touch events were directed to them exclusively. This was a fair assumption for games, which were the main impetus for libavg touch support. However, it was clear we needed to develop a generic multitouch solution working through the X window system server.
Discussions began on how to incorporate touch events into the X input model shortly after kernel support was present. Initial work by Benjamin Tissoires and Carlos Garnacho extended XInput 2.0's new multi-device support for multitouch. Each time a touch began, a new pointing "device" would be created. Alternatively, a pool of pre-allocated touch "devices" could be used. However, this approach broke many assumptions about how devices and events should be handled. As a simple example, a "touch begin" event would appear to the client as though the pointer had moved to a new location. How would the client know that the previous touch hadn't simply moved, as opposed to a new touch starting? At this point Peter Hutterer, the X.org input subsystem maintainer, decided we needed completely new semantics for touch input through X.
Around the same time, Canonical was interested in adding multitouch interfaces to the Linux desktop. The uTouch team, of which your author is a member, was formed to develop a gesture system that could recognize and handle system-level and application-level gestures. Since X did not have touch support yet, the team focused on providing gestures through the X.org server using a server extension. The result was shipped in Ubuntu 10.10 and the extension was proposed for upstream X.org.
While many developers were enthusiastic about the potential for gesture support through the X.org server, it was not meant to be. X.org as a foundation holds backward compatibility in high regard. Applications written over 20 years ago should still function properly today, in theory. Though backward compatibility has benefits, it is a double-edged sword. Any new functionality must be thoroughly reviewed, and inclusion in one X.org release means inclusion in all future releases. Even to this day, gesture support is not a settled technology. It is highly probable that an X gesture extension created a year and a half ago would not be sufficient for use cases we are coming up with today, let alone potentially years from now. So the X developers are reluctant to include gesture support at this time.
Those concerns notwithstanding, the need for touch through the X server grew stronger. Peter and Daniel Stone developed a first draft of the XInput 2.1 protocol, which later became XInput 2.2, where touches send separate events from traditional pointer motion and button press events. Three event types ("touch begin," "update," and "end") were specified. However, the need to support system-level gestures added a requirement for a new method of event handling: the touch grab.
X11 input device grabs allow for one client to request exclusive access to a device under certain criteria. A client can request an active grab so that all events are sent to it. A client can also request a passive grab, where events are sent to it when a button on the mouse is pressed while the cursor is positioned over a window, or when a key is pressed on a keyboard while a window is focused. Passive grabs allow for raising a clicked window in a click-to-focus desktop environment, for example. When the user presses the mouse button over a lower window, the window manager receives the event first through a passive grab. It raises the window to the top and then replays the event so the application can receive the button press. However, X only allows for a passively grabbing client to receive one event before it needs to make a decision on whether to accept it and all future events until a release event, or to request that the server replay the event to allow another client to receive the events.
This mechanism has been adequate for decades, but doesn't quite work for system-level gestures. Imagine that the window manager wants to recognize a three-touch swipe. It is impossible to know if a three-touch swipe has been performed if the window manager can only view touch begin events; it must be able to receive the subsequent events to determine whether the user is performing a swipe or not. The idea behind touch grabs is that the grabbing client can receive all events until it makes a decision about whether to accept or reject a touch sequence. Now, the window manager can listen for all touches that begin around the same time and watch them as they move. If there are three touches and they all move in the same direction, the window manager recognizes a drag gesture and accepts the touch sequences. No one else will see the touch events. However, if the touches don't match for any reason, the window manager rejects the touch sequences so other clients, such as a finger painting application, can receive the events.
This works great for system-level gesture recognition. However, it necessarily imposes lag between a physical touch occurring and an application receiving the touch events if the system is attempting to recognize gestures. At the X Developer Summit 2010, your author presented an overview of the vision for an XInput multitouch-based uTouch gesture stack. One afternoon, while eating lunch and discussing things over beer, the issue of the potential for lag came up. Between those at the table, including Peter, Kristian Høgsberg, and myself, the solution was elusive. However, at some point later in the conference the issue came up again on IRC. Keith Packard made the suggestion that touch events be sent to all clients, even before they become the owner of touch sequences. With the idea at hand, your author scurried home and drafted up the beginning of what would later become ownership event handling.
As Nathan Willis explained in his overview of the XInput 2.2 protocol, a client may elect to receive events for a touch sequence before it becomes the owner of the sequence by requesting touch ownership events alongside touch begin, update, and end events. The client will receive touch events without delay, but must watch for notification of ownership. Once a touch ownership event is received for a sequence, the client owns the sequence and may process it as normal. Alternatively, if a preceding touch grab is accepted, the client will receive a touch end event for the touch sequence without ever receiving a touch ownership event. This mechanism allows for a client to perform any processing as touch events occur, but the client must take care to undo any state if the touch sequences are ultimately accepted by some other client instead.
With the basic concepts hammered out, your author, with an initial base of work contributed by Daniel Stone, began a prototype implementation that shipped in Ubuntu 11.04 and 11.10. The uTouch gesture system based around XInput multitouch began to take form. This was enough to prove that the protocol was reasonably sound, and efforts began in earnest on an upstream implementation for the X.org server 1.12 release.
It is interesting to note how XInput multitouch compares to other window server touch handling. On one end of the spectrum are phones and tablets, which run most applications full screen. This, and the lack of support for indirect touch devices, e.g. touchpads, means mobile device window manager multitouch support is much simpler. On the other end of the spectrum are desktop operating systems. Windows 7 shipped with multitouch support, but only for touchscreens. For unknown reasons, Windows also supports either raw multitouch events or gesture events on a given window, but not both. As an example of the consequences of this shortcoming, Qt had to build its own gesture recognition system to support both raw multitouch events and gestures at the same time. OS X only supports touchpads, but this simplification alone ensures that touches are only ever sent to one window at a time. The event propagation model they chose would not work for touchscreens. In comparison, the XInput multitouch implementation allows for system- and application-level gestures and raw multitouch events at the same time across both direct and indirect touch devices. In your author's biased opinion, this is a key advantage of Linux on the desktop.
Although development of multitouch through X took more time than anyone wanted, it was shaping up well for the 1.12 X.org server release. Many complex issues, such as pointer emulation for touchscreens, were behind us. However, touchpad support had yet to be finalized. Two large issues surfaced involving scrolling and other traditional touchpad gestures.
The first issue involved the ability to scroll in two separate windows while leaving one finger on the touchpad at all times. Imagine there are two windows side by side. The user positions the cursor over one window and begins a two-touch scroll motion on the trackpad. The user then lifts one finger and uses the remaining finger to move the cursor over the second window. The second finger is then placed on the trackpad again, and a second scroll motion is performed. Under the XInput multitouch protocol, a touch sequence is locked to a window once it begins. If two-touch scrolling is performed through gesture recognition based on XInput touch events, the touch that began over the first window could not be used for a scroll gesture over the second window because the touch events would remain locked to the first. To resolve this difficulty, it was decided that no touch events are sent when only one touch is active on a touchpad. To avoid sending two events for one physical action, pointer motion was likewise suppressed when more than one touch was present on a touchpad.
This fix resolved pointer motion, but other traditional touchpad gestures are even more problematic. Particularly troublesome is two-finger scrolling. When mice with scroll wheels were first introduced, they had discrete scroll intervals. The wheels often clicked up and down. This led to an unfortunate API design for scroll events in the X server. The X core protocol cannot send pointer events with arbitrary values, such as a scroll amount. To provide for scrolling through the X core protocol, buttons 4, 5, 6, and 7 were redefined from general purpose buttons to scroll up, down, left, and right. When the user scrolls up using a scroll wheel, the X server sends a button 4 press event and then a button 4 release event. As an aside, this is the reason why we don't yet have smooth scrolling on the Linux desktop.
The problem for multitouch lies in the possibility of sending two separate events for one physical action. This would occur if we sent touch events at the same time we sent scroll button events. It was decided that touch events may not be sent while the server is also sending other events derived from touches. This means that if the user enables two-finger scrolling, touch events are inhibited unless three touches are active on the touchpad. Likewise, if the user performs a two-finger tap to emit a right click, touch events are also inhibited unless three touches are active on the touchpad, and so on.
Many workarounds were considered, but nothing provided an air-tight solution. The double-edged sword of backward compatibility prevents X from supporting scroll events, click emulation, and touch events at the same time. Your author hopes this situation will end up hastening support for traditional trackpad gestures on the client side of X instead of the server side.
The development of the multitouch extension finished with the release of the X.org server 1.12 on March 5th, 2012. Many upcoming distribution releases, including Ubuntu 12.04 LTS, will be shipping it soon. Although this is the end of the X.org multitouch story, it is only the beginning for toolkits and applications. GTK+ recently merged an API for handling raw touch events for 3.4, and your author hopes to merge raw touch support for Qt in the near future. Next on the roadmap will be gesture support included in standard toolkit widgets and APIs for application developers. There is still plenty of work to do, but the will of those hoping to bring smooth scrolling support to the Apple Magic Mouse and many other multitouch features is quite strong.
Excellent article, and a suggestion
Posted Mar 7, 2012 18:51 UTC (Wed) by cnd (guest, #50542) [Link]
I believe the "correct" solution is to use the XACE extension, but I'm not very familiar with it. However, I think most distros use the XAUTHORITY mechanism instead, to keep clients from even connecting to displays they shouldn't have access to. It's assumed that if some malicious software has access to your X server you've already failed.
Excellent article, and a suggestion
Posted Mar 7, 2012 22:34 UTC (Wed) by dlang (subscriber, #313) [Link]
This doesn't sound right.
Read literally, this means that if you run angry birds, you should expect that it can see your password as you type it into your screen saver, or when you type your root password into your package manager (or sudo authorization)
I think this is what is meant.
Excellent article, and a suggestion
Posted Mar 7, 2012 22:42 UTC (Wed) by daniels (subscriber, #16193) [Link]
Even if you plug what you'd think would be all the obvious holes, there's nothing to stop Angry Birds taking over your browser and hijacking everything. And your terminals. Especially if you're running it as your user, it's already compromised your entire session, for good, and realistically the only way around it is trashing your entire profile and starting anew.
Either that, or just not running apps you don't trust if security is a concern.
Excellent article, and a suggestion
Posted Jun 7, 2012 10:27 UTC (Thu) by cheako (guest, #81350) [Link]
I know that it's not easy to be secure; even browsers have had issues with loading some image files. Leaving doors like this open because "there are other security issues" is not acceptable! There are always other security issues, and what kind of world do you think we'd have if that excuse worked? Sudo doesn't really need to verify passwords because we already know the user has logged in; network VPNs need to be encrypted, but there is no point in verifying the data isn't forged because no one would have the key used to make an encrypted packet. I hope I've made my point, but I'll try to give a few examples more on topic. Email servers can forward the BCC header to everyone; it'll be removed on the receiving end by the user's MUA. Passwords can be saved along with their hashes, because no one could ever read the shadow file. Hmm, I'm still not happy with any of these. Ahh, SSH doesn't need to be encrypted because there is telnet. Self-signed server certificates are just as good as any other because no one really knows what they have installed for trusted CA certs. I really like this last one, a lot.
Excellent article, and a suggestion
Posted Jun 7, 2012 10:42 UTC (Thu) by cheako (guest, #81350) [Link]
However, having the default be insecure, as this proposal suggests, is not the way Linux development should be done. There are a number of applications that should make use of the 'lock keyboard on me' feature to prevent keyloggers: yes, prevent keyloggers from getting passwords, not prevent keyloggers from being run in the first place. They say an ounce of prevention is worth a pound of cure, but having no cure at all because absolute prevention is supposedly better sounds wrong, because it is wrong.
If you work hard to prevent keyloggers from being able to log anything useful, then keyloggers become useless. If keyloggers are useless, you'll find fewer people using them. Thus your cure becomes your prevention; it's true that a good defense is a great offense. Make multitouch very offensive to any application that attempts to collect sensitive information. On the defensive side, the user will do their best to make sure applications like that don't connect to the X server. If you don't do your part, the team as a whole will suffer.
Excellent article, and a suggestion
Posted Jun 7, 2012 10:46 UTC (Thu) by cheako (guest, #81350) [Link]
No, that's not why. SSH doesn't expose the local X server to remote systems by default because it's more secure to have this feature disabled unless the user has specific need for it. Not because X is inherently insecure, if anything an SSH client that did not do this would be insecure.
Excellent article, and a suggestion
Posted Mar 7, 2012 21:50 UTC (Wed) by mhelsley (guest, #11324) [Link]
Sounds fine for a touchscreen but it doesn't make as much sense on a touchpad where even a small "private" zone would be a poor use of the small area. I think something similar to a keyboard grab -- except for touch input -- might be suitable for the latter. Yet keyboard grabs can be quite annoying if/when applications fail to release them. So I'm not sure what else could reasonably be done to handle the need for "private" input.
Excellent article, and a suggestion
Posted Mar 8, 2012 7:56 UTC (Thu) by michaeljt (subscriber, #39183) [Link]
> Sounds fine for a touchscreen but it doesn't make as much sense on a touchpad where even a small "private" zone would be a poor use of the small area.
I realise that this train has left, but what is the reason for handling touch screens and touch pads the same? I would naively assume that all touch pad gestures would automatically go to the currently focussed window.
Excellent article, and a suggestion
Posted Jun 7, 2012 10:58 UTC (Thu) by cheako (guest, #81350) [Link]
Thus we are convinced that touch screens and touch pads will have diverging code paths, and it would be possible to implement a fix or feature in one without affecting the other at all.
Thus the solution is to have private zones for a touch screen and use the existing keyboard locking for a touch pad.
Excellent article, and a suggestion
Posted Jun 7, 2012 10:52 UTC (Thu) by cheako (guest, #81350) [Link]
Another solution is needed, not that I have any ideas. I'm just saying that the current situation is not perfect and if we have the chance we should make things better.
Excellent article, and a suggestion
Posted Mar 7, 2012 22:12 UTC (Wed) by daniels (subscriber, #16193) [Link]
So what you've suggested doesn't really help in any meaningful way, but imposes a significant usability cost.
Excellent article, and a suggestion
Posted Mar 7, 2012 22:38 UTC (Wed) by daniels (subscriber, #16193) [Link]
And even all this still doesn't solve any of the security problems. The long and short of it is that if you give untrusted clients complete access to your X server, then you cannot -- cannot -- win in any way.
Excellent article, and a suggestion
Posted Jun 7, 2012 11:07 UTC (Thu) by cheako (guest, #81350) [Link]
I don't believe any system is totally secure. I know that a lock on a door can be picked easily. This doesn't stop me from locking my front door every time I leave. Just because something is hard, challenging, or even seemingly impossible, are you just going to give up and let the rest of your life amount to nothing?
That's fine by me, but don't take others with you when you go. If you're going to write the application equivalent of poisoned punch (even if it's only fatal very rarely, and even then only for stupid people), don't go giving it out at a party... That's not cool!
Excellent article, and a suggestion
Posted Mar 8, 2012 7:58 UTC (Thu) by michaeljt (subscriber, #39183) [Link]
Daniel, sorry for the really stupid question (never used multi-touch) but what is an example of a global gesture in a touch screen/multi-application context?
Excellent article, and a suggestion
Posted Mar 8, 2012 9:45 UTC (Thu) by dgm (subscriber, #49227) [Link]
Let me explain my idea a bit more: a "private" zone would correspond to parts of the client window. Clients should only be allowed to request such zones on the windows they own. Because of that, this would really not apply to touch pads, as they are "relative" input devices and do not map directly to screen coordinates.
The utility of a private zone would be to capture touch events (not gestures) initiating in those zones. Touch events initiated elsewhere should not be affected, even when crossing private zones. Of course, an application should not be allowed to "privatize" the whole screen (even if shown full screen).
The life story of the XInput multitouch extension
Posted Mar 8, 2012 21:34 UTC (Thu) by bronson (subscriber, #4806) [Link]
Will scrolling have momentum too? A quick flick is so much more satisfying than laboriously grinding your way down a document.
The life story of the XInput multitouch extension
Posted Mar 9, 2012 7:53 UTC (Fri) by tomeu (subscriber, #64689) [Link]
Yup: http://git.gnome.org/browse/gtk+/commit/?id=f6393199beb8
The life story of the XInput multitouch extension
Posted Mar 9, 2012 11:23 UTC (Fri) by daniels (subscriber, #16193) [Link]
Will scrolling have momentum too? A quick flick is so much more satisfying than laboriously grinding your way down a document.
It already does in the Synaptics driver even without smooth scrolling, but it just doesn't work particularly well. Of particular note is that momentum doesn't accumulate: you can have the scroll coasting to a slow stop, but then when you put two fingers down to scroll again, the scroll momentum starts from zero, rather than starting from the velocity of the coast. Patches are, of course, welcome. :)
The life story of the XInput multitouch extension
Posted Mar 10, 2012 22:38 UTC (Sat) by mpr22 (subscriber, #60784) [Link]
Anyone who takes advantage of their mouse having a disengageable scroll wheel ratchet has run into that problem already (and not just on X), of course.
The life story of the XInput multitouch extension
Posted Mar 8, 2012 21:39 UTC (Thu) by bronson (subscriber, #4806) [Link]
The ability to leave one finger on the touchpad is a good example: it's non-obvious at first and rather important. I'm really glad you guys took the extra time to get that right.
Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds