May 22, 2012
This article was contributed by Chase Douglas
As the Linux desktop increases in popularity, the user interface experience
has become increasingly important. For example, most laptops today have
multitouch
capabilities that have yet to be fully exposed and exploited in the free
software ecosystem. Soon we will be carrying around multitouch tablets with
a traditional Linux desktop or similar foundation. To provide a
high-quality, rich experience, we must fully exploit multitouch gestures. The
uTouch stack developed by Canonical aims to provide a foundation for
gestures on the Linux desktop.
uTouch capabilities
The new X.org multitouch features allow for
multitouch support in
applications. We now have a software stack, uTouch, built on top of this
multitouch support that can provide for practically any gesture scenario
imaginable.
A "gesture" is normally thought of as a two-dimensional movement made by
the user on some sort of input device—a two-finger pinch, for example, or a
three-finger downward drag. Teaching a computer to recognize these
movements requires a lower-level description, though; in uTouch, this
description consists of values like the number of touches, movement
thresholds, and timeout values. An application may register a "gesture
subscription" describing a specific gesture and be notified when that
gesture is recognized by the uTouch subsystem. Those notifications take
the form of a sequence of events describing the gesture motion over time.
Key to understanding how uTouch works is knowledge of all the typical
gesture use cases. First, we have the concept of gesture primitives: drag,
pinch (including both "pinch" and "spread"), rotate, and tap. These
primitives make up the foundation of all intuitive gestures. They can be
strung together as needed for more complex gestures, such as a double
tap. Stroke gestures, such as drawing an ‘M’ to open the mail client, may
be recognized as a specific long gesture sequence, or as a sequence of drag
gestures. Note, however, that uTouch does not have stroke gesture detection
facilities built in.
Second, there are two fundamental object interaction types: single motion,
single interpretation gestures and direct object manipulation. The former
involves gestures like a two-touch swipe to go backward and forward through
browser history, while the latter involves gestures like a three-touch drag
to move an application window around the desktop.
The single motion, single interpretation gestures require thresholds and/or
timeouts. For example, the colloquially implied difference between a swipe
and a drag is that a swipe must be a quick motion in a given direction,
whereas a drag may be any motion that manifests in a displacement in
space. To put it in uTouch gesture subscription terms, a swipe is a drag
primitive gesture with a displacement threshold that must be crossed within
a specific amount of time. For example, browser history gestures might use
a two-touch swipe with a threshold of 100 pixels that must be crossed within
half a second. In contrast, direct object manipulation
usually implies a zero threshold. For example, as soon as three touches
begin on a window, the window should be movable.
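To make the swipe/drag distinction concrete, here is a minimal sketch of checking a drag against a subscription's touch count, displacement threshold, and timeout. The `DragSubscription` class and `drag_recognized` function are hypothetical illustrations, not part of the uTouch-Geis API, and the check looks at a single snapshot of the gesture rather than at a stream of events.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional


@dataclass(frozen=True)
class DragSubscription:
    """Hypothetical description of a drag subscription (not the Geis API)."""
    touches: int                 # number of touches the gesture needs
    threshold_px: float          # centroid displacement that must be reached
    timeout_ms: Optional[float]  # time allowed to reach it; None = no limit


def drag_recognized(sub: DragSubscription, touches: int,
                    dx: float, dy: float, elapsed_ms: float) -> bool:
    """Return True if a drag with this many touches and a centroid
    displacement of (dx, dy), observed elapsed_ms after it began,
    satisfies the subscription."""
    if touches != sub.touches:
        return False
    if sub.timeout_ms is not None and elapsed_ms > sub.timeout_ms:
        return False
    return hypot(dx, dy) >= sub.threshold_px


# Browser-history swipe: 100 pixels within half a second.
SWIPE = DragSubscription(touches=2, threshold_px=100, timeout_ms=500)
# Window move: zero threshold, so it matches as soon as three touches begin.
WINDOW_MOVE = DragSubscription(touches=3, threshold_px=0, timeout_ms=None)
```

With these values, a two-touch motion of 120 pixels observed after 300 ms counts as a swipe, while the same motion observed after 700 ms does not; the window-move subscription matches immediately.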
Most simple gesture interactions may be handled through gesture
subscriptions consisting of the required gesture primitives and the object
interaction types. However, there are times when an application needs to
have further control over gesture recognition. For example, a bezel drag
gesture occurs when the user begins a drag from the bezel of the screen and
moves inward. This gesture must be distinguished from the user initiating a
touch at the edge of the screen. The problem is that the bezel drag and a
direct touch near the edge of the screen look identical at the beginning of
the gesture. The
distinguishing aspect is that the bezel drag is perpendicular to the
bezel and
has a non-zero initial velocity as seen by the touchscreen, whereas the
direct touch near the edge of the screen will likely not have an initial
velocity and/or may not be moving perpendicular to the bezel. To
cater for a client that cares about one of these gestures but not the
other, uTouch requires the client to accept or reject every gesture. When a
gesture is rejected, the touches may be replayed to the X server, which
allows for the mixing of gestures and raw multitouch in the same
application.
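One way a client might make the accept/reject decision for a bezel drag from the left edge is sketched below. The edge margin, minimum speed, and "mostly perpendicular" ratio are invented values, and the function is a hypothetical client-side heuristic rather than anything uTouch provides.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"


def classify_left_edge_touch(start_x: float, vx: float, vy: float,
                             edge_margin: float = 10.0,
                             min_speed: float = 200.0) -> Decision:
    """Decide whether a touch near the left edge is a bezel drag.

    start_x is the first reported x position in pixels; (vx, vy) is the
    initial velocity in pixels per second. All thresholds are made-up
    values for illustration.
    """
    near_edge = start_x <= edge_margin
    moving_inward = vx >= min_speed          # already moving when first seen
    perpendicular = abs(vx) > 2 * abs(vy)    # mostly horizontal motion
    if near_edge and moving_inward and perpendicular:
        return Decision.ACCEPT   # this client handles the bezel drag
    return Decision.REJECT       # the touches may be replayed to the X server
```

A touch that appears at the edge already moving quickly inward is accepted; a stationary touch at the edge, or one moving diagonally, is rejected so that another client can use it.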
Another facet of uTouch, as hinted above, is that, by default, it operates
through
"touch grab" semantics. When used on top of X.org, uTouch gestures are
recognized from touches received through touch grabs. One benefit of this
approach is the ability to mix gestures and raw multitouch in the same
application. However, it also allows for priority handling of gestures. For
example, system gestures may be handled by a client listening to touches
through a grab on the root window. When gestures are not recognized or are
rejected by the uTouch client, the touches are replayed to the next touch
grab or selecting client. Thus, global gestures, application gestures, and
raw multitouch events are all possible when using uTouch.
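The replay behavior can be modeled as a chain of grabs tried in priority order. In this simplified sketch, which is not X server code, each grabbing client is a hypothetical predicate that accepts or rejects a summarized touch sequence; rejected touches move on to the next grab and finally to the client that selected for raw touch events.

```python
from typing import Callable, Optional, Sequence, Tuple

# A touch sequence is summarized here as (touch_count, kind), where kind
# is a hypothetical label such as "swipe" or "tap".
TouchSeq = Tuple[int, str]
# A grabbing client returns True to accept the touches, False to reject.
Grab = Callable[[TouchSeq], bool]


def deliver(grabs: Sequence[Tuple[str, Grab]],
            selecting_client: Optional[str],
            seq: TouchSeq) -> Optional[str]:
    """Offer seq to each grab in priority order (root window first).
    Rejected touches are replayed to the next grab and, if every grab
    rejects them, to the client selecting for raw touch events."""
    for name, wants in grabs:
        if wants(seq):
            return name
    return selecting_client
```

For example, with a window manager grab on the root window that accepts only three-touch swipes, a three-touch tap falls through the chain and reaches a finger-painting application that selected for raw touches.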
The last major feature of uTouch is the ability to recognize multiple
simultaneous gestures in the same area. For example, imagine a game where
the user pinches bugs on the screen to squash them. The screen is one large
gesture input area, but the user may use both hands to pinch bugs. In
order to facilitate this interaction mode, whenever new touches begin
within the gesture
area they are combinatorially matched with other touches that begin within
a "glue" time period. In our game example there is a two-touch pinch
gesture subscription. If four touches begin in the game area within the
glue time period, six combinations of potential gestures will be
matched. As touch events are delivered, the state of each matched gesture
will be updated and then checked against the threshold and timeout for the
gesture subscription. If a gesture meets the threshold and timeout
criteria, it will be delivered to the client. The client can then attempt
to match up the touches of the gesture against its context to determine
whether to accept or reject each gesture. In the example below, there will
be four pinch gestures sent to the client:
(Bug icons licensed under LGPL)
There will be potential pinch gestures for: AB, CD, AD, and BC (AC and BD,
by virtue of moving in the same direction, are not considered to be
potential pinches). The
application must determine which gestures make sense. One method would be
to hit test the initial centroid of each gesture against the bugs on the
screen. All gestures that hit a bug are accepted. Note that uTouch
automatically rejects overlapping gestures, so as soon as AB and CD are
accepted, AD and BC will be implicitly rejected.
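The matching, filtering, and resolution steps in the bug game can be sketched as follows. The same-direction test in `is_potential_pinch` is a simplified stand-in for uTouch's own criteria, and all function names are hypothetical; `resolve` models both the client's centroid hit test and uTouch's implicit rejection of gestures that overlap an accepted one.

```python
from itertools import combinations
from math import dist


def candidate_pairs(touch_ids):
    """Every pairing of touches that began within one glue period."""
    return list(combinations(touch_ids, 2))


def is_potential_pinch(v1, v2):
    """Simplified test: touches moving in the same direction (positive
    dot product of their velocities) are not treated as a pinch."""
    return v1[0] * v2[0] + v1[1] * v2[1] < 0


def resolve(gestures, bugs, bug_radius):
    """gestures: list of (touch_ids, initial_centroid) in delivery order.
    Accept a gesture whose initial centroid hits a bug; once a gesture is
    accepted, any later gesture sharing one of its touches is rejected."""
    accepted, rejected, claimed = [], [], set()
    for touches, centroid in gestures:
        if claimed.intersection(touches):
            rejected.append(touches)      # overlaps an accepted gesture
        elif any(dist(centroid, bug) <= bug_radius for bug in bugs):
            accepted.append(touches)
            claimed.update(touches)
        else:
            rejected.append(touches)      # missed every bug
    return accepted, rejected
```

Placing A and B on either side of one bug and C and D around another, with A and C moving right and B and D moving left, `candidate_pairs` yields six pairs, four of which pass the pinch test (AB, AD, BC, CD). Resolving them in the order AB, CD, AD, BC accepts AB and CD and rejects AD and BC.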
There is a twist to this complex logic, however. Gesture events are
received serially. The client may need to know if more gestures are
possible for a set of touches. For example, if both one-touch and two-touch
drag gestures are subscribed to, a two-touch drag will generate two
one-touch drag gestures and one two-touch drag gesture. If the uTouch client
receives a one-touch drag first, it may not realize that a two-touch drag
involving the same touch is also coming. To handle this issue, a gesture
property is provided to denote that construction has finished for all
gestures involving its touches. When a gesture has finished construction,
the client knows that it has received all possible gestures containing the
same touches. Thus, in the one- and two-touch drag example, the one-touch
gesture will not set the construction-finished property until at least the
begin event of the two-touch gesture has been sent to the client.
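A client that waits for the construction-finished property before deciding might look like the sketch below. The `Gesture` fields are hypothetical, and the policy of preferring the gesture with the most touches is this example client's own choice, not uTouch's behavior.

```python
from dataclasses import dataclass


@dataclass
class Gesture:
    gesture_id: int
    touch_ids: frozenset
    construction_finished: bool   # hypothetical name for the property


def decide(gestures):
    """Return {gesture_id: "accept" | "reject"} once every pending gesture
    has finished construction, or None if more gestures may still arrive.
    Policy (an assumption for this sketch): prefer gestures with more
    touches and reject any gesture sharing touches with an accepted one."""
    if not all(g.construction_finished for g in gestures):
        return None
    decisions, claimed = {}, set()
    for g in sorted(gestures, key=lambda g: len(g.touch_ids), reverse=True):
        if claimed & g.touch_ids:
            decisions[g.gesture_id] = "reject"
        else:
            decisions[g.gesture_id] = "accept"
            claimed |= g.touch_ids
    return decisions
```

While the first one-touch drag is still unfinished, `decide` returns `None`; once all three gestures from a two-touch drag are finished, the two-touch drag is accepted and both one-touch drags are rejected.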
The uTouch stack was designed to be flexible and provide for all possible
gesture use cases. However, it is recognized that not all clients will care
about multiple simultaneous gestures. There are plans to create a gesture
subscription option that precludes the ability to have multiple
simultaneous gestures. This will effectively push some policy into the
recognizer, such as a preference for gestures with more touches. This will
be particularly useful when subscribing to gestures on an indirect device,
like a touchpad, where multiple simultaneous gestures are likely not
wanted.
Lastly, uTouch is a complete gesture stack that surpasses the functionality
of all available consumer platforms. uTouch works well with both touchscreens and
touchpads, and supports both gestures and raw touch events in the same
window or region of an application. In contrast, Windows only supports
touchscreens and either gestures or raw touch events, but not both, in a
given window. OS X supports touchpads but not touchscreens. Mobile
platforms are limited to touchscreen support and, because of their modal
task design, to gestures in a single application at a time. Unlike each of
these platforms, uTouch has been designed from the ground up to support all
device types and all known use cases, including multiple applications and
windows at the same time.
The technical architecture of uTouch
uTouch consists primarily of three components: uTouch-Frame, uTouch-Grail,
and uTouch-Geis. Each of these will be described briefly below.
uTouch-Frame groups touches into units that
are easier for uTouch-Grail to operate on. Gestures are recognized
per-device and per-window, so touches are grouped into units representing
pairs of devices and windows. This is also where all backends for each
window system are implemented. uTouch-Frame events are platform
independent.
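Grouping touches by device and window can be illustrated with a dictionary keyed by (device, window) pairs. The `Touch` record and its field names are assumptions made for this sketch and are not uTouch-Frame's actual data structures.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Touch(NamedTuple):
    touch_id: int
    device_id: int
    window_id: int
    x: float
    y: float


def group_touches(touches: List[Touch]) -> Dict[Tuple[int, int], List[int]]:
    """Group touch IDs by (device, window) pair, the unit on which
    gesture recognition operates."""
    groups = defaultdict(list)
    for t in touches:
        groups[(t.device_id, t.window_id)].append(t.touch_id)
    return dict(groups)
```

Two touches from the same device on the same window land in one group, while a touch on another window, or from another device, starts a group of its own.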
Some window systems, like X11, also have the concept of touch sequence
acceptance and rejection. This functionality is provided through
uTouch-Frame as well.
Touch sequence acceptance and rejection is a core aspect of the uTouch
stack when used for system-level gestures. Imagine that a finger-painting
application listening for raw touch events (not gestures) is open on a
desktop environment where three-touch swipes are used to switch between
applications. When the user performs such a swipe, uTouch accepts the touch
sequences on behalf of the window manager and switches applications. This
prevents the painting application from handling (or even seeing) the
touches. In contrast,
when the user performs a three-touch tap, uTouch rejects the touch
sequences because they do not match a known gesture. The painting
application then receives the rejected touch sequences.
uTouch-Grail is the gesture recognizer of the uTouch project. It takes the
per-device, per-window touch frames from uTouch-Frame and analyzes them for
potential gestures.
Grail events are generated by frame events. Rather than duplicate the
uTouch-Frame data, grail events contain gesture data and a reference to the
frame event that generated them. This allows uTouch clients to see the
full touch data comprising a gesture.
Grail gesture events consist of a set of touches, a uniform set of
gesture properties, and a list of recognized gesture primitives. Again, the
supported primitives are: drag, pinch, rotate, and tap.
The gesture properties are:
- Gesture ID
- Gesture state (begin, update, end)
- A list of touch IDs for the touches comprising the gesture
- The uTouch-Frame event that generated the Grail event
- The original and current centroid position of the touches
- The original and current average radius, or distance from the
centroid, of the touches
- A best-fit 2D affine transformation of the touches from their original
positions
- A best-fit 2D affine transformation of the touches from their previous
positions
- A flag denoting whether the gesture construction has finished
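Two of the listed properties, the centroid and the average radius, are simple to compute from touch positions. The sketch below does so and collects a subset of the properties in a hypothetical `GestureProperties` class; the frame event reference and the affine transformations are omitted, and none of these names come from the Grail API.

```python
from dataclasses import dataclass
from math import dist
from typing import Sequence, Tuple

Point = Tuple[float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Mean position of the touches."""
    xs, ys = zip(*points)
    return (sum(xs) / len(points), sum(ys) / len(points))


def average_radius(points: Sequence[Point]) -> float:
    """Mean distance of the touches from their centroid."""
    c = centroid(points)
    return sum(dist(p, c) for p in points) / len(points)


@dataclass
class GestureProperties:
    """Hypothetical container for part of the property list above."""
    gesture_id: int
    state: str                    # "begin", "update", or "end"
    touch_ids: Tuple[int, ...]
    original_centroid: Point
    current_centroid: Point
    original_radius: float
    current_radius: float
    construction_finished: bool
```

For four touches at the corners of a 2x2 square, the centroid is (1, 1) and the average radius is the square root of 2; as the touches spread apart, the current radius grows relative to the original one.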
Drag, pinch, and rotate properties are encapsulated by the affine
transformations. For more detail on how to use 2D affine transformations,
please see the excellent Wikipedia article on transformation matrices.
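As an illustration, if a transformation happens to be a similarity transform (rotation, uniform scale, and translation), its drag, pinch, and rotate components can be read directly from the matrix. The 2x3 row layout used here is an assumption for this sketch, not Grail's documented representation, and a general best-fit affine transformation may also contain shear, which this decomposition ignores.

```python
from math import atan2, cos, hypot, pi, sin


def decompose(m):
    """m = ((a, b, tx), (c, d, ty)) maps (x, y) to
    (a*x + b*y + tx, c*x + d*y + ty). Assuming m is a similarity
    transform, return (tx, ty, scale, angle_in_radians)."""
    (a, b, tx), (c, d, ty) = m
    scale = hypot(a, c)   # length of the transformed x axis (pinch)
    angle = atan2(c, a)   # direction of the transformed x axis (rotate)
    return tx, ty, scale, angle


def similarity(tx, ty, scale, angle):
    """Build the matrix for a rotation by angle and a uniform scale,
    followed by a translation of (tx, ty) (drag)."""
    c, s = scale * cos(angle), scale * sin(angle)
    return ((c, -s, tx), (s, c, ty))


def revolutions(angle):
    """Express a rotation angle as a fraction of a full revolution."""
    return angle / (2 * pi)
```

Decomposing the matrix for a quarter turn, a doubling in size, and a shift of (5, -3) recovers exactly those values, with the rotation equal to 0.25 revolutions.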
During operation, a pool of recently begun touches is maintained. In the
current implementation, this pool includes any touches that have begun within
the past 60 milliseconds (the "glue" time). When a new touch begins, it is
combined in all possible combinations with touches in this pool in order to
create potential gestures matching any active subscriptions.
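The glue pool can be sketched as a list of recently begun touches that is pruned whenever a new touch begins. Here `TouchPool` is hypothetical, the 60 ms value is the one quoted above, and for simplicity each call produces candidates of a single, fixed gesture size.

```python
from itertools import combinations

GLUE_MS = 60  # glue time quoted for the current implementation


class TouchPool:
    """Sketch of a pool of recently begun touches (names hypothetical)."""

    def __init__(self, glue_ms=GLUE_MS):
        self.glue_ms = glue_ms
        self.recent = []  # list of (touch_id, begin_time_ms)

    def touch_begin(self, touch_id, t_ms, touches_per_gesture=2):
        """Register a new touch and return the new candidate gestures:
        the new touch combined with every group of recent touches that
        completes a gesture of touches_per_gesture touches."""
        # Forget touches that began more than glue_ms ago.
        self.recent = [(tid, t) for tid, t in self.recent
                       if t_ms - t <= self.glue_ms]
        others = [tid for tid, _ in self.recent]
        candidates = [combo + (touch_id,)
                      for combo in combinations(others, touches_per_gesture - 1)]
        self.recent.append((touch_id, t_ms))
        return candidates
```

With touches beginning at 0, 20, and 50 ms, a fourth touch at 70 ms is paired only with the second and third, because the first began more than 60 ms earlier.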
A new gesture instance is created for each combination of touches. Each
instance has an event queue, and new instances have one begin event
describing the original state of the touches. The events are queued until
any gesture primitive is recognized. When frame events are processed, any
changes to touches in a gesture instance generate a new grail event. The
new touch state is then checked against the subscription thresholds and
timeouts to determine whether any of the subscribed gesture primitives have
been recognized. For example, the default rotate threshold is 1/50th of a
revolution, and the default rotate timeout is half a second. If the
threshold is met before the timeout expires, the rotate gesture primitive
is recognized.
When a gesture primitive has been recognized, the grail event queue is
flushed to the client. The client must process the gesture events and make
a decision on whether to accept or reject each gesture.
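The queue-then-flush behavior for a rotate subscription with the default threshold and timeout can be sketched as below. `GestureInstance` is a hypothetical class that tracks only the total rotation angle; a real instance would track every subscribed primitive and the full touch state.

```python
from math import pi

ROTATE_THRESHOLD = 2 * pi / 50   # default: 1/50th of a revolution
ROTATE_TIMEOUT_MS = 500          # default: half a second


class GestureInstance:
    """Sketch of one gesture instance's event queue (names hypothetical)."""

    def __init__(self, begin_time_ms):
        self.begin_time_ms = begin_time_ms
        self.queue = [("begin", 0.0)]   # begin event: original state
        self.recognized = False

    def update(self, angle, t_ms):
        """Process a frame event in which the touches have rotated by
        `angle` radians in total since the gesture began. Returns the
        events flushed to the client (empty while still queuing)."""
        event = ("update", angle)
        if self.recognized:
            return [event]              # already recognized: deliver directly
        self.queue.append(event)
        within_timeout = t_ms - self.begin_time_ms <= ROTATE_TIMEOUT_MS
        if within_timeout and abs(angle) >= ROTATE_THRESHOLD:
            self.recognized = True
            flushed, self.queue = self.queue, []
            return flushed
        return []
```

A rotation that reaches 0.2 radians after 200 ms crosses the threshold (about 0.126 radians), so the begin event and every queued update are flushed together; the same rotation first seen after 600 ms is never recognized in this sketch.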
uTouch-Geis is the C API layer for the uTouch implementation. uTouch
originally began as a private X.org server extension. It has since been
updated, bringing it out of the X.org server and into the client side of
the X11 system. This required a complete rewrite of uTouch-Frame and
uTouch-Grail. However, we have managed to maintain API and ABI
compatibility through uTouch-Geis, albeit with a few behavioral
changes. uTouch-Geis has two API versions: version 1, a simpler interface,
and version 2, an advanced interface. Although both are currently
supported, the first version is deprecated in favor of the more flexible
second version.
uTouch-Geis also makes gesture event control simpler by wrapping much of
the X.org interaction behind an event loop abstraction. The uTouch stack
requires careful management of touch grabs and timers. Any client may use
uTouch-Frame and uTouch-Grail directly, but uTouch-Geis vastly simplifies
incorporating gestures into an application.
See the
uTouch-Geis API documentation for more information.
Toolkit and application development
uTouch-Geis is nice, but its C API is still a bit cumbersome in certain
scenarios. The uTouch team has created a
QML plugin called
uTouch-QML
in
order to make gesture integration in QML applications easier. This plugin
provides native QML elements for subscribing and handling gestures. It
currently uses a legacy gesture handling system in the uTouch stack that
does not provide for gesture accept/reject semantics or simultaneous
gestures, but we plan to update it to include those features over the next
six months.
We also have begun work on a gesture recognition system for the Chromium
web browser. There are many potential gesture interactions that we hope to
leverage in the browser. An initial implementation was proposed, but a
rearchitecture of the gesture plumbing in Chromium required us to refactor
it. We hope to merge an implementation into Chromium in the next few
months.
Conclusion
Over the past two years the uTouch team has been working hard to bring
multitouch gestures to the Linux desktop. We now have a complete stack that
rivals, and in many ways surpasses, what is possible on other platforms. We
look forward to further integration of uTouch gestures in desktop
environments and applications, and we encourage everyone to take a look at
what our stack has to offer.