Multi-touch support landing in X

January 18, 2012

This article was contributed by Nathan Willis

X.Org 1.12, the next release of the reference X server, is currently in release candidate status. With it come several new features, but the most anticipated is probably multi-touch support, courtesy of version 2.2 of the XInput2 extension. Peter Hutterer, maintainer of XInput2, has been writing about multi-touch support on his blog since December 2011, covering both the architecture and what application developers will need to address before they can bring multi-touch and gesture support to users.

Of course, any discussion of multi-touch X begins by explaining what it is and what it isn't. Multi-touch refers specifically to the ability to recognize and use multiple input points on a single hardware device: for example, using more than one finger to manipulate objects on a touchscreen device, or multi-finger gestures on a touchpad. It is a different animal entirely from multi-pointer X (MPX), which is the ability to use two or more on-screen cursors at the same time, controlled by separate hardware devices. MPX support was added to XInput2, also by Hutterer, in 2009, and was first released with X.Org 1.7.

Touchpad users are probably already familiar with two-finger scrolling and two- or three-finger mouse clicks. Although detecting multiple points of contact is involved, this is not genuinely multi-touch either; detecting the multiple taps or scrolling simply triggers a different event from the touchpad driver. A simple way to tell the difference is that with two-finger middle-clicks, the positions of the user's fingers do not matter; the cursor stays (more or less) in one place. With a multi-touch gesture event like a pinch, however, tracking and interpreting the motion and relative positions of the fingers makes all the difference in the world.
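To make the pinch case concrete, here is a minimal sketch of how an application might turn the relative positions of two fingers into a zoom factor. The function name and the choice of a simple distance ratio are illustrative assumptions, not something XInput2 provides; gesture interpretation is left to the application or toolkit.

```python
import math

def pinch_scale(start, current):
    """Return the zoom factor implied by two fingers moving from `start` to `current`.

    Each argument is a pair of (x, y) points, one per finger. The factor is
    the ratio of the current finger separation to the initial separation.
    """
    (a0, b0), (a1, b1) = start, current
    d0 = math.hypot(b0[0] - a0[0], b0[1] - a0[1])
    d1 = math.hypot(b1[0] - a1[0], b1[1] - a1[1])
    if d0 == 0:
        raise ValueError("fingers started at the same point")
    return d1 / d0

# Fingers start 100 units apart and end 200 units apart: zoom by a factor of 2.
zoom = pinch_scale(((100, 100), (200, 100)), ((50, 100), (250, 100)))

# Both fingers move by the same offset: the separation, and so the scale, is unchanged.
moved = pinch_scale(((0, 0), (3, 4)), ((10, 10), (13, 14)))
```

Unlike a two-finger click, the result depends entirely on where each finger is, which is why the application needs per-finger coordinates rather than a single driver event.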

Devices and events

XInput2.2 defines two distinct classes of multi-touch device that correspond to the two major modes of multi-touch user interaction. An XIDirectTouch device is one where the touch event occurs on screen (as is the case with tablets, touch-screen monitors, and the like). In these cases, the coordinates where the event happens come directly from the position of the touch. The other class is an XIDependentTouch device, which is typically a non-display input device like a touchpad. An XIDependentTouch device controls a cursor in the "normal" fashion most of the time, but supports multi-touch events, too. The positions of the touch events on an XIDependentTouch device are interpreted relative to the cursor position.

XInput2.2 also defines three event types that together describe a touch sequence in the wild: XI_TouchBegin, XI_TouchUpdate, and XI_TouchEnd. By definition, each touch sequence starts with an XI_TouchBegin, followed by zero or more XI_TouchUpdates, and ends with an XI_TouchEnd. Client applications that want to catch touch events must call the XISelectEvents function to register for all three event types.
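A real client does this registration in C, filling in an XIEventMask and passing it to XISelectEvents. The sketch below only models the bit arithmetic of the event mask in Python, mirroring what the XISetMask and XIMaskLen macros do. The numeric event codes are given as I recall them from the XI2.h header and should be treated as assumptions to check against your installed headers.

```python
# Event type numbers as recalled from XI2.h; verify before relying on them.
XI_TouchBegin = 18
XI_TouchUpdate = 19
XI_TouchEnd = 20

def mask_len(last_event):
    """Bytes needed to hold one bit per event type up to `last_event`."""
    return (last_event >> 3) + 1

def set_mask(mask, event):
    """Set the bit for `event` in the byte array `mask`."""
    mask[event >> 3] |= 1 << (event & 7)

def touch_event_mask():
    """Build a mask selecting all three touch sequence events together."""
    mask = bytearray(mask_len(XI_TouchEnd))
    for event in (XI_TouchBegin, XI_TouchUpdate, XI_TouchEnd):
        set_mask(mask, event)
    return mask

mask = touch_event_mask()
```

With these codes, all three bits land in the third byte of the mask, so a three-byte mask is enough to select the whole touch sequence.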

To a client application listening for the new event types, touches appear no different from any other XInput event — with the addition of a "touch ID" that is returned in the event's detail field. This ID is a 32-bit value that the application must use to keep track of multiple, simultaneous touches. The touch events use the XIDeviceEvent structure that is also used for pointer and keyboard events; handling touches separately (including interpreting gestures) is left up to the application, library, or toolkit.
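As a rough illustration of the bookkeeping this leaves to the client, the following sketch keeps one path per touch ID and hands back the completed path when the touch ends. TouchEvent and TouchTracker are hypothetical names; TouchEvent merely stands in for the handful of XIDeviceEvent fields that matter here (the event type, the detail field carrying the touch ID, and the coordinates).

```python
from dataclasses import dataclass, field

@dataclass
class TouchEvent:
    """Hypothetical stand-in for the relevant fields of an XIDeviceEvent."""
    kind: str      # "begin", "update", or "end"
    detail: int    # the touch ID
    x: float
    y: float

@dataclass
class TouchTracker:
    """Tracks simultaneous touches, keyed by touch ID."""
    active: dict = field(default_factory=dict)  # touch ID -> list of (x, y)

    def handle(self, ev):
        """Record `ev`; return the finished path when a touch ends, else None."""
        if ev.kind == "begin":
            self.active[ev.detail] = [(ev.x, ev.y)]
            return None
        if ev.detail not in self.active:
            return None  # update or end for a touch we never saw begin
        self.active[ev.detail].append((ev.x, ev.y))
        if ev.kind == "end":
            return self.active.pop(ev.detail)
        return None

tracker = TouchTracker()
events = [
    TouchEvent("begin", 7, 10, 10),
    TouchEvent("begin", 8, 90, 10),   # second finger lands while the first is down
    TouchEvent("update", 7, 20, 10),
    TouchEvent("end", 8, 90, 10),
    TouchEvent("end", 7, 30, 10),
]
finished = [path for path in map(tracker.handle, events) if path is not None]
```

Because events from different fingers interleave freely, the touch ID is the only reliable way to tell which path an update belongs to.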

Device grabbing and ownership

Although the Begin, Update, and End events cover almost all touch event cases, there is a fourth touch event called XI_TouchOwnership defined by XInput2.2 in order to provide no-delay touch event processing in unusual situations.

The need arises because it is not always unambiguously clear which X client ought to process a series of touch events. For example, on a tablet, a window manager and an image viewer might both reasonably be expected to listen for touch events on the same window — the user could click and drag the window, or perhaps pinch and zoom in on the image. XInput allows a client to stake out a claim to the events that happen in a window by "grabbing" touch events.

Calling the grab API tells the X server that the grabby client wants exclusive access to the input device. But a client application that holds a grab on a device could begin to process a touch event sequence and only then discover that the sequence is not intended for it. To follow up on the example above, the image viewer might grab a touch event sequence, then process it and determine that it is not a gesture it recognizes. In that case, the grabby client sends a reject to the server, and the server then sends the sequence to the next client (i.e., the window manager), which hopefully does recognize the gesture and can react accordingly. When a client decides that it does want to process the touch sequence, it sends an accept instead.
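The accept/reject hand-off can be modeled in a few lines. This is a simulation of the behavior described above, not the X server's implementation: the deliver function, the client list, and the toy recognizers are all assumptions made for illustration. In a real client, the accept or reject decision would be communicated to the server through the XInput2 C API.

```python
def touch_ids(sequence):
    """Distinct touch IDs in a sequence of (touch_id, x, y) samples."""
    return {touch_id for touch_id, _x, _y in sequence}

def is_pinch(sequence):
    """Toy recognizer for the image viewer: exactly two fingers."""
    return len(touch_ids(sequence)) == 2

def is_drag(sequence):
    """Toy recognizer for the window manager: exactly one finger."""
    return len(touch_ids(sequence)) == 1

def deliver(sequence, clients):
    """Offer `sequence` to each client in grab order until one accepts.

    `clients` is a list of (name, recognizer) pairs; a recognizer returns
    True to accept and False to reject. Returns the accepting client's name
    (or None) and how many times the sequence had to be delivered.
    """
    for attempts, (name, recognizes) in enumerate(clients, start=1):
        if recognizes(sequence):
            return name, attempts
    return None, len(clients)

clients = [("image viewer", is_pinch), ("window manager", is_drag)]
one_finger = [(1, 0, 0), (1, 5, 0), (1, 10, 0)]
owner, deliveries = deliver(one_finger, clients)
```

Here the image viewer rejects the one-finger drag, so the window manager only sees it on the second delivery; each rejecting client ahead of the eventual owner adds another round of this kind.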

That process works reasonably well when only one or two clients are involved, but every client that studies and rejects the gesture adds lag time, which is exacerbated because the X server re-sends the touch sequence to each subsequent client. If several clients are involved, the UI quickly feels sluggish to the user. XInput2.2's solution is the XI_TouchOwnership event. A client can select this event in addition to the usual touch sequence events. When it does so, the X server sends a copy of each event sequence to the client immediately, even if another client has a grab on the device.

The result is that the client listening for XI_TouchOwnership can begin to process touch events immediately and lower its response times. However, because one of the other clients may accept the touch event sequence, the listening client must either limit itself to actions that are invisible and reversible, or hold off on execution until the X server tells the client that its turn has come by sending it an XI_TouchOwnership event.
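One way to picture the "hold off on execution" strategy is a client that computes its response as events arrive but only makes it visible once it owns the sequence. The class and method names below are hypothetical; on_ownership models the arrival of XI_TouchOwnership, and on_sequence_lost models the case where another client accepts the sequence instead.

```python
class SpeculativeClient:
    """Processes touch events early but defers visible effects until it owns them."""

    def __init__(self):
        self.pending = []   # work computed speculatively, not yet visible
        self.applied = []   # work that has taken visible effect
        self.owner = False

    def on_touch(self, position):
        action = f"pan to {position}"
        if self.owner:
            self.applied.append(action)
        else:
            self.pending.append(action)

    def on_ownership(self):
        # Our turn has come: flush everything we prepared in advance.
        self.owner = True
        self.applied.extend(self.pending)
        self.pending.clear()

    def on_sequence_lost(self):
        # Another client accepted the sequence: discard speculative work.
        self.pending.clear()

client = SpeculativeClient()
client.on_touch((1, 1))
client.on_touch((2, 2))
visible_before = list(client.applied)
client.on_ownership()
client.on_touch((3, 3))
visible_after = list(client.applied)
```

The latency win comes from the first two events already being processed by the time ownership arrives, so the flush is immediate rather than waiting on a replay.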

Toolkit and application support

As the grabbing/ownership scenario illustrates, multi-touch support in X is difficult precisely because X handles multiple active applications running simultaneously. In 2010, Hutterer observed that this was a higher bar than that faced by Apple iOS or Microsoft's Surface project, both of which function in pre-defined environments with only a single full-screen application active at any one time. X is also expected to make multi-touch devices co-exist peacefully with keyboards and standard pointers, which touch-only devices generally do not support.

The result is that XInput2.2's multi-touch support arrives via a separate set of input device types and events that applications will need to add support for manually. Consequently, it may be several release cycles after X.Org 1.12 before window managers, GIMP, Firefox, and the rest fully support the new multi-touch features.

GUI toolkit support is closer. Qt already has a stable multi-touch API that supports gestures, introduced in version 4.6. Ubuntu recommended Qt to application developers in 2011, while it maintained its own uTouch framework for the Unity environment during the XInput2.1 development cycle. Although uTouch included a patchwork of components in the past, the distribution is migrating it to XInput2.2 for its 12.04 release.

That leaves the GNOME side of the field. Carlos Garnacho and others are working on adding multi-touch support to GTK+ and GDK in the multitouch branch. However, it is not clear when this work is expected to land. Garnacho noted his progress in his blog twice in early 2011, but there have been no formal updates there or on the mailing lists for months. Hutterer indicated on the Fedora wiki's multi-touch page that support may land as soon as GTK+ 3.4. Interested parties can keep up to date by watching the GNOME Shell Touchscreen wiki page.

The essential building blocks are lined up (if not all in place) for true multi-touch on Linux systems, so it is plausible that before 2013 arrives, desktop and mobile Linux users will consider multi-touch applications commonplace. But that does not mean that they will be happy about it. As Hutterer mentions on his blog, one of the most challenging aspects of multi-touch is coming up with good, intuitive user interface designs to fit together with the technical underpinnings. On that front, 2012 may also be the year of multi-touch experimentation.

Brief items

Quotes of the week

One always got the feeling that somebody was steering GNOME, but it wasn’t clear who. When it started, I thought it was Miguel and Nat, then Novell, then Redhat. Now it has that floaty, determined meandering that the best mass open source projects have. From a distance, everyone seems to be constantly bickering and regretting the next steps; but the steps get made, and slowly everyone adapts to them. GNOME feels like a nation now.
-- Danny O'Brien

But these days anything is possible with version-numbers really, except for going backwards. Which is precisely what we are avoiding here.

Just look at Mozilla Firefox (moving from 4 to 9 at the same pace as they went from 0.7 to 1.0) or Google chrome (what version-number are they using anyway?), or the linux-kernel, going from 2.6.0 to 2.6.39 with entire subsystems being rewritten from scratch, and then moving from 2.6.39 to 3.0 without any radical change whatsoever.

Really, moving from 4.8 to 4010 is not really that big a deal, if it serves the right purpose.

-- Stephan Arts for the Xfce project

Cheese 3.3.4 released

The Cheese webcam photo and video capturing application with multiple "fancy special effects" has released unstable version 3.3.4. The biggest new feature is support for photo and video sharing that "was added by GNOME Outreach Program for Women intern Patricia Santana Cruz". In addition, the test suite was improved, documentation creation now uses stylesheets, microphone control was added via PulseAudio, and more, including lots of bug fixes.

Media Goblin 0.2.0 released

Media Goblin, an "AGPL-licensed federated multi-media hosting platform" that is part of the GNU project, has released version 0.2.0 ("Our Tubes!"). New in this version is support for HTML 5 compliant video delivery, better handling of resizing uploaded images, additional customization options, and better documentation. "MediaGoblin's big picture goal is to support loads of different media types, so video is just the beginning. In the near future MediaGoblin will be able host slide sharing, three dimensional model uploading and viewing, even ascii art."

Salt 0.9.5 released

The Salt remote execution manager has released version 0.9.5. The release notes detail all the changes that have gone into "one of the largest steps forward in the development of Salt". Those changes include moving from Python pickles to Message Pack for better network serialization performance, automatic module loading and reloading without requiring a restart, easier module deployment, node groups, and more. "Salt allows commands to be executed across large groups of servers. This means systems can be easily managed, but data can also be easily gathered. Quick introspection into running systems becomes a reality. Remote execution is usually used to set up a certain state on a remote system. Salt addresses this problem as well, the salt state system uses salt state files to define the state a server needs to be in. Between the remote execution system, and state management Salt addresses the backbone of cloud and data center management."

Newsletters and articles

Organizing Open Source Efforts at NASA (

Over at, Rikki Endsley interviews two NASA workers who are responsible for organizing and expanding the US space agency's open source efforts. "In December, [William] Eshagh announced NASA's presence on GitHub, and their first public repository houses NASA's World Wind Java project, an open source 3D interactive world viewer. Additional projects are being added, including OpenMDAO, an open-source Multidisciplinary Design Analysis and Optimization (MDAO) framework; NASA Ames StereoPipeline, a suite of automated geodesy and stereogrammetry tools; and NASA Vision Workbench, a general-purpose image processing and computer vision library."

Kruisselbrink: Calligra on Android

On his blog, Marijn Kruisselbrink reports on getting Calligra Mobile working on Android. In it he describes various problems he ran into in porting the mobile office suite, including a lack of DBus and KSyCoCa support in Android. "So after some (sometimes frustrating) hacking, I've got the first results: Calligra Mobile running on an android tablet. There are still lots of rough edges, and not everything works correctly, but as you can see in these screenshots, it does actually run and work. To get to this point I had to make some rather ugly hacks though to work around some of the android limitations." (Thanks to Inge Wallin.)

Typing at 255 WPM shouldn't cost $4000: Plover, the open source steno system (

Mel Chua writes about Plover, an open suite for stenography, at "Plover isn't just a straight-out copy-paste of existing proprietary CART [Communication Access Realtime Transcription] software; it also has several feature advantages over them. Most steno software has a time-based buffer, forcing the user to conform to the software's timing; Plover is designed the other way around, so the software responds to a human, and typists can take their time to think and control the pacing of their words. Plover is also the first steno software of any kind that follows the Unix design principle of modularity, acting essentially as a keyboard emulator - no different from any other alternative input option such as on-screen keyboards for tablets or input methods for the disabled. In contrast, proprietary steno programs contain full-fledged word processors that typists are then forced to use."

Raghavan: PulseAudio vs. AudioFlinger: Fight!

On his blog, Arun Raghavan reports on comparing the performance of PulseAudio vs. Android's AudioFlinger, running them both on a Galaxy Nexus smartphone under Ice Cream Sandwich (Android 4.0). He compares CPU, memory, power usage, latency, and the features offered by both, and PulseAudio fares quite well. "For future work, it would be interesting to write a wrapper on top of PulseAudio that exposes the AudioFlinger audio and policy APIs — this would basically let us run PulseAudio as a drop-in AudioFlinger replacement. In addition, there are potential performance benefits that can be derived from using Android-specific infrastructure such as Binder (for IPC) and ashmem (for transferring audio blocks as shared memory segments, something we support on desktops using the standard Linux SHM mechanism which is not available on Android)."

Page editor: Jonathan Corbet

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds