X.Org 1.12, the next release of the reference X server, is currently in release candidate status. With it comes several new features, but the most anticipated is probably multi-touch support, courtesy of version 2.2 of the XInput2 extension. Peter Hutterer, maintainer of XInput2, has been writing about adding multi-touch support in his blog since December 2011 — including the architecture and what application developers will need to address before they can bring multi-touch and gesture support to users.
Of course, any discussion of multi-touch X begins by explaining what it is and what it isn't. Multi-touch refers specifically to the ability to recognize and use multiple input points on a single hardware device. For example, using more than one finger to manipulate objects on a touchscreen device, or multi-finger gestures on a touchpad. It is a different animal entirely from multi-pointer X (MPX), which is the ability to use two or more on-screen cursors at the same time, controlled by separate hardware devices. MPX support was added — also by Hutterer — to XInput2 in 2009, and was first released with X.Org 1.7.
Touchpad users are probably already familiar with two-finger scrolling and two- or three-finger mouse clicks. Although detecting multiple points-of-contact is involved, this is also not genuinely multi-touch — detecting the multiple taps or scrolling simply triggers a different event from the touchpad driver. A simple way to tell the difference is that with two-finger middle-clicks, the position of the user's fingers do not matter; the cursor stays (more or less) in one place. With a multi-touch gesture event like a pinch, however, tracking and interpreting the motion and relative positions of the fingers makes all the difference in the world.
XInput2.2 defines two distinct classes of multi-touch device that correspond to the two major modes of multi-touch user interaction. An XIDirectDevice is one where the touch event occurs on screen (as is the case with tablets, touch-screen monitors, and the like). In these cases, the coordinates where the event happens come directly from the position of the touch. The other class is an XIDependentDevice, which is typically a non-display input device like a touchpad. An XIDependentDevice controls a cursor in the "normal" fashion most of the time, but supports multi-touch events, too. The positions of the touch events on an XIDependentDevice are interpreted relative to the cursor position.
XInput2.2 also defines three event types that together describe a touch sequence in the wild: XI_TouchBegin, XI_TouchUpdate, and XI_TouchEnd. By definition, each touch event starts with an XI_TouchBegin, followed by zero or more XI_TouchUpdates, and ends with an XI_TouchEnd. Client applications that want to catch touch events must use the XISelectEvents method to register for all three event types.
To a client application listening for the new event types, touches appear no different from any other XInput event — with the addition of a "touch ID" that is returned in the event's detail field. This ID is a 32-bit value that the application must use to keep track of multiple, simultaneous touches. The touch events use the XIDeviceEvent structure that is also used for pointer and keyboard events; handling touches separately (including interpreting gestures) is left up to the application, library, or toolkit.
Although the Begin, Update, and End events cover almost all touch event cases, there is a fourth touch event called XI_TouchOwnership defined by XInput2.2 in order to provide no-delay touch event processing in unusual situations.
The need arises because it is not always unambiguously clear which X client ought to process a series of touch events. For example, on a tablet, a window manager and an image viewer might both reasonably be expected to listen for touch events on the same window — the user could click and drag the window, or perhaps pinch and zoom in on the image. XInput allows a client to stake out a claim to the events that happen in a window by "grabbing" touch events.
Calling the grab API tells the X server that the grabby client wants exclusive access to the input device. But a client application that holds a grab on a device could begin to process a touch event sequence and only then discover that the sequence is not intended for it. To follow up on the example above, the image viewer might grab a touch event sequence, then process it and determine that it is not a gesture it recognizes. In that case, the grabby client sends a reject to the server, and the server then sends the sequence to the next client (i.e., the window manager), which hopefully does recognize the gesture and can react accordingly. When a client decides that it does want to process the touch sequence, it sends an accept instead.
That process works reasonably well when only one or two clients are involved, but every client that studies and rejects the gesture adds lag time, which is exacerbated because the X server re-sends the touch sequence to each subsequent client. If there are several involved, the UI suddenly becomes sluggish to the user. XInput2.2's solution is the XI_TouchOwnership event. A client can select to listen for this event in addition to the usual touch sequence events. When it does so, the X server sends a copy of each event sequence to the client immediately — even if another client has a grab on the device.
The result is that the client listening for XI_TouchOwnership can begin to process touch events immediately and lower its response times. However, because one of the other clients may accept the touch event sequence, whatever action the listening client takes needs to be either invisible and reversible, or hold off on execution until the X server tells the client that its turn has come by sending it an XI_TouchOwnership event.
As the grabbing/ownership scenario illustrates, multi-touch support in X is difficult precisely because X handles multiple active applications running simultaneously. In 2010, Hutterer observed that this was a higher bar than that faced by Apple iOS or Microsoft's Surface project, both of which function in pre-defined environments with only a single full-screen application active at any one time. X is also expected to make multi-touch devices co-exist peacefully with keyboards and standard pointers, which touch-only devices generally do not support.
The result is that XInput2.2's multi-touch support arrives via a separate set of input device types and events, that applications will need to add support for manually. Consequently it may be several release cycles after X.Org 1.12 before window managers, GIMP, Firefox, and the rest fully support the new multi-touch features.
GUI Toolkit support is closer. Qt already has a stable multi-touch API that supports gestures, introduced in version 4.6. Ubuntu recommended Qt to application developers in 2011, while it maintained its own uTouch framework for the Unity environment during the XInput2.1 development cycle. Although uTouch included a patchwork of components in the past, the distribution is migrating it to XInput2.2 for its 12.04 release.
That leaves the GNOME side of field. Carlos Garnacho and others are working on adding multi-touch support to GTK+ and GDK in the multitouch branch. However, it is not clear when this work is expected to land. Garnacho noted his progress in his blog twice in early 2011, but there have been no formal updates there or on the mailing lists for months. Hutterer indicated on the Fedora wiki's multi-touch page that support may land as soon as GTK+ 3.4. Interested parties can keep up to date by watching the GNOME Shell Touchscreen wiki page.
The essential building blocks are lined up (if not all in place) for true multi-touch on X.org systems, so it is plausible that before 2013 arrives, desktop and mobile Linux users will consider multi-touch applications commonplace. But that does not mean that they will be happy about it. As Hutterer mentions on his blog, one of the most challenging aspects of multi-touch is coming up with good, intuitive user interface designs to fit together with the technical underpinnings. On that front, 2012 may also be the year of multi-touch experimentation.
Just look at Mozilla Firefox (moving from 4 to 9 at the same pace as they went from 0.7 to 1.0) or Google chrome (what version-number are they using anyway?), or the linux-kernel, going from 2.6.0 to 2.6.39 with entire subsystems being rewritten from scratch, and then moving from 2.6.39 to 3.0 without any radical change whatsoever.
Really, moving from 4.8 to 4010 is not really that big a deal, if it serves the right purpose.
Newsletters and articles
Page editor: Jonathan Corbet
Next page: Announcements>>
Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds