A uTouch architecture introduction

Posted May 24, 2012 13:08 UTC (Thu) by tshow (subscriber, #6411)
Parent article: A uTouch architecture introduction

I may be misreading this, but it seems to me that the major problem of gesture recognition is being punted to the client.

Having spent a fair while doing touch-based work (DS, iOS, and effectively similar problems like Wii remote gestures and game-controller combo recognizers), I'd say the main problem beyond gesture detection is determining which gesture the user intended. The classic example is double tap; if you want to accept double tap as a gesture, you cannot process a single tap until you're sure it *isn't* a double tap.

What this means in practice is that unless the double tap's action is compatible with the single tap's action, you have to delay accepting the single tap. In general, this winds up meaning that your interface is laggy; it has to wait for the longest applicable potential gesture before confirming an action.
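In code, that delay looks something like this (a rough C sketch; emit_single_tap()/emit_double_tap() and the 300 ms window are made up for illustration, and the lag is exactly the gap between the tap ending and flush_pending() firing):

    #include <stdbool.h>
    #include <stdint.h>

    #define DOUBLE_TAP_WINDOW_MS 300   /* illustrative double-tap timeout */

    /* Hypothetical client callbacks. */
    void emit_single_tap(void);
    void emit_double_tap(void);

    struct tap_state {
        bool     pending_single;   /* a tap happened but hasn't been reported yet */
        uint64_t last_tap_ms;      /* when that tap finished */
    };

    /* Called for each completed tap recognized from the raw touch stream. */
    void on_tap(struct tap_state *s, uint64_t now_ms)
    {
        if (s->pending_single && now_ms - s->last_tap_ms <= DOUBLE_TAP_WINDOW_MS) {
            s->pending_single = false;
            emit_double_tap();
        } else {
            /* Can't report a single tap yet: it might still become a double. */
            s->pending_single = true;
            s->last_tap_ms = now_ms;
        }
    }

    /* Driven by a timer: flush the single tap once the double-tap window expires. */
    void flush_pending(struct tap_state *s, uint64_t now_ms)
    {
        if (s->pending_single && now_ms - s->last_tap_ms > DOUBLE_TAP_WINDOW_MS) {
            s->pending_single = false;
            emit_single_tap();   /* only now is it safe to act on the single tap */
        }
    }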

Unless I'm totally misreading this, uTouch seems to "fix" that by simply shoveling a giant pile of events at the client and saying "meh, you figure it out".

I know this is a hard nut to crack, but surely we can do better than this?

The main solution I've worked with to date on this is in iOS (everything else I've worked on was hand-rolled gesture recognition built on raw data streams), and the iOS solution isn't great. They let you register gesture recognizers and chain them in priority order, so you can say things like "if you get a tap and it isn't a double tap, give me a single tap" and it will handle the delay internally. They also have a "cancel" mechanism where it will occasionally say "Oops, that touch I told you about a moment ago? It was part of a gesture, you should totally disregard it...".
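Very roughly, the chaining behaves like this (a generic C sketch of the idea, not the actual UIGestureRecognizer API; all the names are invented):

    /* Hypothetical recognizer object, loosely modelled on the scheme described
     * above; not the real UIKit types. */
    enum recog_state { POSSIBLE, RECOGNIZED, FAILED };

    struct recognizer {
        enum recog_state    state;
        struct recognizer  *require_fail;          /* don't fire until this one fails */
        void              (*on_gesture)(void *ctx);
        void               *ctx;
    };

    /* Deliver a recognized gesture only once the higher-priority recognizer it
     * depends on has definitively failed; this is where the lag comes from. */
    static void maybe_fire(struct recognizer *r)
    {
        if (r->state != RECOGNIZED)
            return;
        if (r->require_fail && r->require_fail->state != FAILED)
            return;                                 /* still waiting: defer */
        r->on_gesture(r->ctx);
    }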

That system is... ok to work with, I guess. Passable. Usable. That's about as far as I'll go, though.

uTouch doesn't sound to me like an improvement, which is unfortunate.



A uTouch architecture introduction

Posted May 24, 2012 14:27 UTC (Thu) by cnd (guest, #50542)

You are correct in that uTouch gives you the events and you have to decide what to do with them. However, it is much easier to deal with a "tap" event than to try to detect a tap yourself. Single-touch tap events are fairly simple to recognize, but having uTouch as an abstraction helps. Multitouch tap events, on the other hand, are much more involved.

The other aspect is that a key design goal is leaving total control to the client. You say that iOS may come back at you some time in the future and tell you that a touch was part of a gesture and you should ignore it. At what point do you know that won't happen? When can you commit to an irreversible action based on a touch point?

In order to have that level of control, you have to tell the client a bunch of information and let them decide. There's not much else you can do, unless you only want to cater to trivially simple gesture handling.

A uTouch architecture introduction

Posted May 24, 2012 21:41 UTC (Thu) by tshow (subscriber, #6411)

Believe me, I'm not trying to defend the iOS model here. It gets a C- at best. Some days it gets a hard F.

My complaint here is that it really *isn't* that hard to detect a tap. Touch point appears, touch point remains within some epsilon of the point at which it first appeared, disappears before some specific time has elapsed. The basic gestures (tap, double tap, pinch/rotate, swipe, move) are all really easy to detect; the scariest thing you need to call is atan2().
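To be concrete, the whole tap test fits in a few lines (illustrative C; the thresholds are made up, not from any real toolkit):

    #include <math.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define TAP_MAX_MOVE_PX  10.0   /* epsilon around the initial contact point */
    #define TAP_MAX_TIME_MS  250    /* must lift before this much time passes */

    struct touch {
        double   down_x, down_y;    /* where the touch appeared */
        double   up_x, up_y;        /* where it disappeared */
        uint64_t down_ms, up_ms;
    };

    static bool is_tap(const struct touch *t)
    {
        double dx = t->up_x - t->down_x;
        double dy = t->up_y - t->down_y;
        return hypot(dx, dy) <= TAP_MAX_MOVE_PX &&
               (t->up_ms - t->down_ms) <= TAP_MAX_TIME_MS;
    }

    /* And the "scariest" call: the angle between two touch points, for
     * pinch/rotate detection. */
    static double touch_pair_angle(double x0, double y0, double x1, double y1)
    {
        return atan2(y1 - y0, x1 - x0);
    }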

I've put a basic touch gesture recognition package together in less than an hour, and I think anyone who isn't terrified of basic vector algebra could do the same; it's not a hard problem. The hard problem is the one this package isn't (?) solving, which is trying to winnow things down to the gestures that the client cares about.

OS X has a weak solution to the problem, which is that gesture recognition "objects" get attached to the touch input stream, and will consume touch events and produce gesture events. It has some basic priority handling, where you can say things like "don't tell me about a single tap unless the double tap fails", but that falls afoul of the input lag problem. uTouch seems to suffer from the same problem, holding off on finalizing gestures until all possible matches are in.

Of course, it's quite possible that the input lag problem is intractable in the general case. The problem always comes down to the fact that gestures seem more expressive than they actually are, and the machine can't sense intent. One fundamentally cannot, for instance, tell two simultaneous pinches from two simultaneous overlapping stretches if the two are close to parallel.

If anything, what would be really useful (at least to me) is a substantially more robust gesture system; something with some fuzzy logic in it or somesuch. There was a commercial tool for the Wiimote for a while (it may still be out there somewhere) which you could use to do data capture. You would perform a gesture with the Wiimote repeatedly, and it would use that data to generate the outline of a gesture: within this T value, the parameters must be within this box/sphere/whatever. You could adjust the slop on it a bit, or play with the bounding curves and the spatial and temporal tolerances, and the result was an arbitrary gesture recognition engine generator.

There's no real difference between accelerometer gesture recognition, mouse position gesture recognition, multitouch gesture recognition, or gamepad combo catchers; it's only a question of the number of parameters.

If I could feed in data saying "here's the signature of an input gesture which I care about, and here's a set of rectangular regions in which I care if it happens", and do so for different gestures in (potentially) overlapping regions, I'd be a happy man.
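Something along these lines, as a rough C sketch (the types and tolerances are invented; this isn't uTouch or any existing recognizer): a gesture "signature" is a list of expected samples, each with its own temporal and spatial slop, and an input stroke matches if every expected sample is satisfied. The same test works whether the samples come from touch points, mouse positions, or accelerometer axes; only the number of parameters changes.

    #include <math.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct sample { double t, x, y; };     /* one point of an input stroke */

    struct tpl_sample {
        double t, x, y;          /* expected sample (normalized stroke time) */
        double dt, radius;       /* temporal and spatial tolerance ("slop") */
    };

    static bool matches(const struct tpl_sample *tpl, size_t ntpl,
                        const struct sample *in, size_t nin)
    {
        for (size_t i = 0; i < ntpl; i++) {
            bool hit = false;
            for (size_t j = 0; j < nin && !hit; j++) {
                if (fabs(in[j].t - tpl[i].t) <= tpl[i].dt &&
                    hypot(in[j].x - tpl[i].x, in[j].y - tpl[i].y) <= tpl[i].radius)
                    hit = true;
            }
            if (!hit)
                return false;   /* a required waypoint was never visited in time */
        }
        return true;
    }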

A uTouch architecture introduction

Posted May 25, 2012 0:49 UTC (Fri) by mathstuf (subscriber, #69389)

If you look at the event stream for, say, a triple click in GTK, you get all of the events (single, double, and triple). We have this issue in uzbl: if we wire up triple-click events, the double-click handler has already fired by the time the triple arrives. Of course, if there is documentation anywhere on how to handle this situation, directions would be greatly appreciated (I've found GTK's docs could use some examples with a bit of complexity in them; as it is, I'm usually forced to trudge through GTK apps that have the behavior I want for examples).
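For what it's worth, the obvious workaround in GTK 2/3 C is the same deferral trick: delay the double-click action and cancel it if a GDK_3BUTTON_PRESS arrives. A sketch, connected to the widget's "button-press-event" signal (the 400 ms figure is a guess; the real multi-click time is a GtkSettings property):

    #include <gtk/gtk.h>

    static guint pending_double;    /* timeout source for a deferred double click */

    static gboolean fire_double(gpointer data)
    {
        pending_double = 0;
        g_print("double click\n");          /* the real double-click action */
        return FALSE;                       /* remove the timeout source */
    }

    /* g_signal_connect(widget, "button-press-event",
     *                  G_CALLBACK(on_button_press), NULL); */
    static gboolean on_button_press(GtkWidget *w, GdkEventButton *ev, gpointer data)
    {
        switch (ev->type) {
        case GDK_2BUTTON_PRESS:             /* might still become a triple */
            pending_double = g_timeout_add(400, fire_double, NULL);
            break;
        case GDK_3BUTTON_PRESS:             /* cancel the deferred double */
            if (pending_double) {
                g_source_remove(pending_double);
                pending_double = 0;
            }
            g_print("triple click\n");
            break;
        default:
            break;
        }
        return FALSE;
    }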

