May 4, 2011
This article was contributed by Michael Stapelberg
A few weeks ago, the initial
announcement of x11vis, an X11 protocol visualizer, was posted to the xorg
mailing list. Only a few people develop low-level
X programs these days (think of xkill, xwininfo, etc.), and all these tools
actually work. Why would anyone need a tool to visualize X protocol traffic?
In a way, x11vis is similar to tcpdump: both are wire-level analysis tools —
tcpdump shows network traffic while x11vis shows X traffic. Even though most
people are not network engineers, from time to time it comes in handy to check
if a web application is really using SSL.
X basics
When an X client, say Firefox, connects to the X server, it first has to
authenticate itself. As soon as the connection is established, data flows in
two directions: the client can only send requests while the server sends
replies (to those requests), errors, and events.
Let's have a look at the very basic task of creating an empty window on your
screen. The client starts by sending a CreateWindow request which
initializes a specified X ID (an unsigned 32-bit integer) with a
position/size, a parent window, border, etc. Afterwards, properties such as the
window title or icon are set with the ChangeProperty request. To
actually display the window on your screen, a MapWindow request is
sent.
These are the requests that the client sends. None of them generates a
reply, but any request can generate an error, for example if you pass an
invalid parent window in the CreateWindow request. After the window
has been mapped (made visible on your screen), the client will receive a
MapNotify event. Other frequently used events include
KeyPress and ButtonPress, which are generated by keyboard
and mouse input.
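To make the wire format a little more concrete, here is a minimal sketch of how a MapWindow request could be encoded by hand. It relies on the core protocol layout for this request (opcode 8, one unused byte, a length of 2 four-byte units, then the window ID); the function name encode_map_window is made up for this illustration, and real clients leave this work to Xlib or XCB.

```python
import struct

MAP_WINDOW_OPCODE = 8  # core protocol opcode of the MapWindow request


def encode_map_window(window_id: int, little_endian: bool = True) -> bytes:
    """Encode a MapWindow request as it would appear on the wire.

    Layout: 1 byte opcode, 1 unused byte, 2 bytes request length
    (counted in 4-byte units, so 2 means 8 bytes), 4 bytes window ID.
    The byte order is the one the client announced at connection setup.
    """
    byte_order = "<" if little_endian else ">"
    return struct.pack(byte_order + "BxHI", MAP_WINDOW_OPCODE, 2, window_id)


request = encode_map_window(0xF00)
print(request.hex())  # prints 08000200000f0000
```

Eight bytes are all it takes to map a window, which is why a single burst can carry many such requests.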
Before x11vis: xtrace
Before x11vis, the standard tool for analyzing X applications was xtrace. Like strace, it prints a
textual representation of what happens. While that works
just fine and fits well in the Unix text world, it is not easy to use for
analyzing problems for several reasons:
- The vast amount of plain text output is very hard to understand or even to
navigate in. Each line of xtrace output starts with a number representing the
connection which is used for this particular packet (Firefox could be number 1,
GVim number 2, and so on). The rest of the line contains a full dump of the
packet, including all data. For some requests, this data can be more than I can
fit on my 1280x800 screen.
- In the X protocol, a lot of IDs are used. There are IDs for windows,
atoms, fonts, pixmaps, and so on. xtrace translates atom IDs to human-readable names,
so you will see "UTF8_STRING" instead of 0x113. However, such
a translation is completely absent for window IDs. Analyzing X sessions with
more than one client quickly becomes difficult.
- While it is possible, hiding specific events is tedious in a text
editor. When debugging a real-world problem, you are usually not interested in
packets such as InternAtom or PropertyNotify.
- A user might want to display information related to the packet that is
currently being inspected (for example, all events for an affected window, not
only the CreateWindow packet). This is not possible with xtrace, as it presents
only textual, non-interactive output.
Inside x11vis
x11vis strives to do better in the areas outlined above. It consists of two
main parts: the so-called "interceptor" and the GUI. The interceptor is a Perl
daemon that implements a proxy between your client(s) and the X server,
dissecting all packets that are sent through it and dumping them into a JSON
file. The code that dissects the raw bytes into a nice data structure is
auto-generated from the XML protocol description in xcb-proto. The GUI
parses this JSON file and displays all packets in a well-arranged fashion.
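Once the packets are stored as JSON, hiding uninteresting packet types no longer requires a text editor. The following is only a sketch: the field names (connection, kind, name) and the sample data are invented for this illustration, and the actual x11vis JSON schema may look quite different.

```python
import json

# Hypothetical dump; the real x11vis JSON schema may differ.
SAMPLE_DUMP = """
[
  {"connection": 1, "kind": "request", "name": "InternAtom"},
  {"connection": 1, "kind": "request", "name": "CreateWindow"},
  {"connection": 1, "kind": "event",   "name": "PropertyNotify"},
  {"connection": 1, "kind": "request", "name": "MapWindow"},
  {"connection": 1, "kind": "event",   "name": "MapNotify"}
]
"""

# Packet types that are rarely interesting when chasing a real-world bug.
NOISE = {"InternAtom", "PropertyNotify"}


def interesting_packets(dump_text, hidden=NOISE):
    """Parse a JSON dump and drop every packet whose name is in `hidden`."""
    packets = json.loads(dump_text)
    return [packet for packet in packets if packet["name"] not in hidden]


for packet in interesting_packets(SAMPLE_DUMP):
    print(packet["connection"], packet["kind"], packet["name"])
```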
The GUI is not a stand-alone application, but is implemented as a web application
using jQuery. This decision was made because building the GUI on top of the
HTML Document Object Model (DOM) with CSS is a lot quicker than writing custom
widgets in Qt or GTK (in terms of development time). Also, it makes x11vis
easily usable on computers on your local network, which is a common setup when
debugging X problems.
Example: Comparing XCB and Xlib
I mentioned XCB as the project which includes the XML protocol
description. XCB stands for X protocol C-language Binding and is the successor of Xlib.
By automatically generating the bindings from the protocol description, XCB
achieves multiple goals. First of all, every function has a predictable name,
and because everything uses xcb_ as a prefix, it does not clutter the namespace
(unlike Xlib with types such as Font and Display). More
importantly, XCB does not hide the asynchronous nature of the X protocol from the
programmer. When a typical X application starts, it has to request the Atom
IDs for a number of atoms, say 20. With Xlib, there is the XInternAtom
function that returns the ID for a given name. XCB instead provides two
functions: xcb_intern_atom() and xcb_intern_atom_reply(). The
former returns a cookie which you pass to the latter to get the actual result.
The idea is that you place your requests as early as possible, do
something else, then fetch all the replies.
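The effect of the cookie pattern on round-trips can be sketched with a toy model. This is not the XCB API (which is C); FakeServer, AsyncConnection, and blocking_intern_atom are hypothetical stand-ins, the atom IDs are made up, and the model simplifies buffering by sending all queued requests when the first reply is asked for.

```python
class FakeServer:
    """Stand-in for an X server that counts round-trips (illustrative only)."""

    def __init__(self):
        self.atoms = {}
        self.round_trips = 0

    def process(self, names):
        # One call models one round-trip: a batch of requests goes out
        # and the matching replies come back together.
        self.round_trips += 1
        return [self.atoms.setdefault(name, len(self.atoms) + 1) for name in names]


def blocking_intern_atom(server, name):
    """Xlib-style call: send one request and wait for its reply."""
    return server.process([name])[0]


class AsyncConnection:
    """Cookie pattern: queue requests first, collect the replies later."""

    def __init__(self, server):
        self.server = server
        self.next_cookie = 0
        self.pending = []   # (cookie, atom name) pairs not yet sent
        self.replies = {}   # cookie -> atom ID

    def intern_atom(self, name):
        cookie = self.next_cookie
        self.next_cookie += 1
        self.pending.append((cookie, name))
        return cookie

    def intern_atom_reply(self, cookie):
        if cookie not in self.replies and self.pending:
            # First reply that is needed: send everything queued so far.
            cookies, names = zip(*self.pending)
            self.replies.update(zip(cookies, self.server.process(names)))
            self.pending = []
        return self.replies.pop(cookie)


names = ["WM_NAME", "WM_CLASS", "UTF8_STRING"]

xlib_like = FakeServer()
ids_blocking = [blocking_intern_atom(xlib_like, name) for name in names]

xcb_like = FakeServer()
conn = AsyncConnection(xcb_like)
cookies = [conn.intern_atom(name) for name in names]      # place requests early
ids_async = [conn.intern_atom_reply(c) for c in cookies]  # fetch replies later

print(xlib_like.round_trips, xcb_like.round_trips)  # prints 3 1
```

Both variants end up with the same atom IDs; only the number of times the client has to wait for the server differs.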
A good example of both XCB's asynchronous nature and x11vis is analyzing the
xwininfo(1) program. By starting:
xwininfo -id 0xf00 -children
the program will first query the given window (an iceweasel window in this
case) for all of its children and then request some properties for every child.
The screenshot above shows the x11vis output when using xwininfo 1.0.5, which
uses Xlib. On the left, you can see all the requests and replies, organized in
bursts. As Xlib is blocking, each burst contains only one packet.
Compare the Xlib shot
to the one above, where xwininfo 1.1.0 uses XCB to talk to the X server. While
you can still identify three round-trips, you can see that the burst on the
bottom of the screenshot contains requests for different information of more
than one window.
You can see that in the first burst, x11vis displays "Iceweasel" instead of the
window ID 0xf00, even though that information is only available later on. Also,
the description of each packet is a short representation of the most important
facts. The GetGeometry reply is labeled "(3362, 1112) 155 x 21" and
can be expanded by clicking on it. In the xtrace output, the equivalent is the
following line:
000:>:0003:32: Reply to GetGeometry: depth=0x18 root=0x000000be \
x=3362 y=1112 width=155 height=21 border-width=0
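To illustrate how such a short label relates to the full dump, the following sketch parses the xtrace line above and condenses it into the same summary. The helper names parse_fields and geometry_label are made up for this example and are not taken from the x11vis source.

```python
import re

XTRACE_LINE = (
    "000:>:0003:32: Reply to GetGeometry: depth=0x18 root=0x000000be "
    "x=3362 y=1112 width=155 height=21 border-width=0"
)


def parse_fields(line):
    """Collect the key=value pairs of an xtrace-style line as integers."""
    fields = {}
    for key, value in re.findall(r"([\w-]+)=(0x[0-9a-fA-F]+|\d+)", line):
        fields[key] = int(value, 16) if value.startswith("0x") else int(value)
    return fields


def geometry_label(fields):
    """Condense a GetGeometry reply into a short position/size label."""
    return f"({fields['x']}, {fields['y']}) {fields['width']} x {fields['height']}"


print(geometry_label(parse_fields(XTRACE_LINE)))  # prints (3362, 1112) 155 x 21
```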
Example: Identifying a race condition in i3-wm
x11vis has been used multiple times to solve real-world X problems. For
example, in the i3 window manager, there was a problem with themed mouse
cursors: they would not show up on the very first window decorations that were
created around already existing windows, but only on window decorations for
windows that were created later on.
We know that the problem is related to creating windows and setting the cursor
for these windows. Therefore, I started by scrolling down to the first
CreateWindow request and checked if there were any X errors (pink
background in x11vis). And in fact, as you can see on the screenshot, there is
one corresponding Error packet for every Request trying to use the themed
cursor. You can see the bad_value of the Cursor error being c_0
(unnamed cursor 0) which is precisely the cursor ID we are setting in the
ChangeWindowAttributes request above.
I then used the search function of my browser to see where c_0 was
actually initialized. The location of the CreateGlyphCursor request
for c_0 was actually after the X errors. Now this
explains the symptom, but in the code, the order is correct: First, the
cursor is initialized (line 291), then the
existing windows are handled (line 425). Having a closer look at
the burst reveals that the cursor initialization is actually sent via
the separate Xlib connection instead of the main XCB connection. As
both connections buffer their requests, my next guess was that the code
neglected to flush the Xlib connection. It turned out that the
guess was correct.
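The race can be reproduced with a toy model of two buffered connections. This is a simplified sketch, not the i3 code: BufferedConnection and run are hypothetical names, and the shared list server_log stands in for the order in which the X server receives requests.

```python
class BufferedConnection:
    """A connection that queues requests until flush() is called."""

    def __init__(self, name, server_log):
        self.name = name
        self.server_log = server_log  # shared list: arrival order at the server
        self.buffer = []

    def send(self, request):
        self.buffer.append(request)

    def flush(self):
        self.server_log.extend((self.name, request) for request in self.buffer)
        self.buffer.clear()


def run(flush_xlib_early):
    """Return the requests in the order the server receives them."""
    server_log = []
    xlib = BufferedConnection("xlib", server_log)
    xcb = BufferedConnection("xcb", server_log)

    # The program order is correct: create the cursor first ...
    xlib.send("CreateGlyphCursor c_0")
    if flush_xlib_early:
        xlib.flush()

    # ... then use it on the already existing windows.
    xcb.send("ChangeWindowAttributes cursor=c_0")
    xcb.flush()

    xlib.flush()  # without an explicit flush, the buffer is only sent later
    return [request for _, request in server_log]


print(run(flush_xlib_early=False))
print(run(flush_xlib_early=True))
```

Without the early flush, the server sees ChangeWindowAttributes before the cursor exists, which matches the Cursor errors seen in x11vis.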
This bug was found in a short time due to two factors. On the one hand, X
errors can be spotted very easily in x11vis. On the other hand, distinguishing
the different connections requires only a quick glance at the top of each burst.
Conclusion
In this article, I explained how x11vis tries to help X developers: it
visualizes the X protocol at the wire level, providing helpful features such as
markers, folding, and the mapping of human-readable names to connections and X IDs.
x11vis is still a young project and is looking for contributors. If you want
to help make x11vis a better tool, please do not hesitate to
contact me at michael@x11vis.org or get the source and documentation at the
project web site.