Development

An update on GDB

By Jake Edge
April 16, 2014
Collaboration Summit

In an amusingly titled ("Not Just Software Botox: Rejuvenating GDB") talk at the 2014 Collaboration Summit, Stan Shebs of Mentor Graphics presented the history of the debugger along with efforts and plans to update the tool. At 28 years old, GDB is one of the oldest free-software projects that is still in use. Partly due to its age, it needs renovation of various sorts, he said.

History and background

The oldest surviving version of GDB's source code is 2.0 from 1987, though it has a copyright of 1986 in the code. It ran natively on Motorola 68K (Sun 2 and 3) and DEC VAX systems and consisted of 25K lines of C. The current version is 7.7, with 35 target architectures (running both native and cross-architecture). It has 700K lines of C code, plus 23K lines of tests in the test suite. The basic command set is largely the same between those two versions, though essentially all of the code has changed in that time.

[Stan Shebs]

GDB is the standard debugger for Linux, and a commonly available one for the rest of the Unix family. It is also the most widely available debugger for embedded processors. There is a good chance, Shebs said, that most of the devices in the room have a GDB stub (the code that handles the remote GDB protocol) stored in their flash or ROM somewhere, though it may be disabled. GDB is also the standard debug engine for the Eclipse integrated development environment (IDE). Because of its long history, GDB contains the details of changes to many system internals (e.g. chip instruction sets, operating system and compiler behavior) over the years.

There are certain requirements that the project needs to continue to fulfill as it makes changes, Shebs said. GDB needs to be able to control and examine programs built from low-level languages. It needs to debug programs at the source level and to debug optimized code. It also needs to do both native and cross-debugging. The project also has to follow Free Software Foundation (FSF) policies, "even if we don't like it". For example, the project was told that it must switch to the GPLv3.

There is also a set of "non-requirements", he said; some would be "nice to haves", but none are hard-and-fast requirements. There is no push to support proprietary compilers; GDB does, but it doesn't have to. Nor does it need to target 16-bit architectures. There is no requirement to structure it as a library or set of components, nor to match the competition on features. It does not need to have a GUI. Perhaps surprisingly, it is not required to work with IDEs, though there would be "much howling from Eclipse" if the project stopped supporting that particular IDE.

Rework

There have been several rework efforts on GDB in the past, Shebs said. The first was in 1990 and is remembered by almost no one who still works on GDB; it added the BFD library to read executables and object files, rather than hardcoding it in the debugger itself. 1991 saw the addition of the target vector that separated out the handling of operating-system-specific interfaces needed to target a particular type of system (e.g. file targets, remote stub targets, etc.).
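The target-vector idea can be sketched in a few lines. This is Python rather than GDB's C, and the names (Target, FileTarget, RemoteStubTarget, examine, and the send callable) are hypothetical stand-ins, not GDB's real interface. The point is only that the core debugger code calls through one table of operations and never needs to know what kind of target sits behind it.

```python
from abc import ABC, abstractmethod


class Target(ABC):
    """Hypothetical operations table; GDB's real target vector is a C structure."""

    @abstractmethod
    def read_memory(self, addr: int, length: int) -> bytes:
        ...


class FileTarget(Target):
    """Serves 'memory' out of the bytes of an executable or core image."""

    def __init__(self, image: bytes, base: int):
        self.image = image
        self.base = base

    def read_memory(self, addr, length):
        off = addr - self.base
        if off < 0 or off + length > len(self.image):
            raise ValueError("address outside image")
        return self.image[off:off + length]


class RemoteStubTarget(Target):
    """Reaches a stub over some transport; 'send' is an assumed callable
    that takes a request payload and returns the reply payload."""

    def __init__(self, send):
        self.send = send

    def read_memory(self, addr, length):
        # 'm addr,length' is the remote protocol's memory-read request;
        # the reply carries the bytes as hex digits.
        reply = self.send(f"m{addr:x},{length:x}")
        return bytes.fromhex(reply)


def examine(target: Target, addr: int, length: int) -> str:
    """Core-debugger code: identical for every kind of target."""
    return target.read_memory(addr, length).hex()
```

In this sketch, supporting a new kind of system means writing one more Target subclass; examine() and everything above it stay untouched.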

In 1999, asynchronous event handling was added to GDB, so that local and remote events could be handled simultaneously, without one blocking the other. That year also saw the beginning of the "architectural object" effort. It took four to five years to get each supported architecture into that new object-oriented model. The project moved from a snapshot-based development model to a public CVS tree in 1999 as well. In 2003, a move to object-oriented frame objects was made. This replaced the simple stack-frame-pointer tracking that was done earlier with more detailed tracking.
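What "without one blocking the other" means in practice can be shown with a small event loop. The following is a minimal sketch using Python's standard selectors module, not GDB's actual event loop; run_event_loop and the source names are invented for illustration. The loop waits on every source at once and handles whichever becomes readable first.

```python
import selectors
import socket


def run_event_loop(sources, max_events):
    """Handle input from whichever source is ready, in arrival order.

    'sources' maps a label to a connected socket. Both the labels and
    this function are illustrative, not part of GDB.
    """
    sel = selectors.DefaultSelector()
    for name, sock in sources.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=name)

    handled = []
    while len(handled) < max_events:
        ready = sel.select(timeout=1.0)
        if not ready:
            break  # nothing arrived within the timeout
        for key, _mask in ready:
            payload = key.fileobj.recv(4096)
            if not payload:
                sel.unregister(key.fileobj)  # peer closed the connection
                continue
            handled.append((key.data, payload))
    sel.close()
    return handled
```

A blocking design would instead sit in a read on one source (say, the console) and never notice that the remote side had something to report.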

Both of the object-oriented changes foreshadow one of the bigger rework projects that is planned: moving from C to C++. GDB is too large for C, Shebs said. The project has introduced features that simulate C++ (e.g. target vector, architecture and frame objects) over the years. C++ is not so chaotic and non-portable any more, and its overhead is not really a problem now either, so it is a reasonable choice. The idea has been discussed since 2008, with a "rough consensus" in favor of it forming recently. There have been some changes made to smooth the path toward C++, but no one-way changes have been made yet.

Another effort is to document the internals of GDB. There is an internals manual that was started by John Gilmore and worked on by Shebs along the way, but it is well out of date at this point. The existing manual will be abandoned and the information will be moved to the GDB wiki. The plan is to identify widely used code as an "internal API" and then to use Doxygen to build a web-based manual of that API.

There was a complaint from the audience about Doxygen-generated documentation being rather "sterile". But Roland McGrath said that problem is not a tool issue exactly. Doxygen doesn't solve all of the problems with documentation, he said, but it makes it easier for people to do the right thing. Shebs noted that it allows developers to mark certain sections of the comments to become part of the documentation; it will also cross-reference symbols for the entries.

Finding the commonalities between GDB and the gdbserver (the remote debugging server) and factoring them out into a separate library is another planned rework. Both GDB and gdbserver call ptrace() to control the process being debugged, but they currently do it with separate code. In addition, handling the remote protocol is done by both, separately. Moving that all to a common library will not only reduce code duplication, it will also expose any assumptions that the native debugger side has, Shebs said.

The original assumption of there being just a single process to debug has long gone by the wayside, but there is still work to do to support multiple processes (and threads) on multiple cores in modern systems. The original single process to be debugged was known as the "inferior" process (it was being debugged, thus had bugs, thus it was inferior, he said with a chuckle). Since then, inferior objects have been added to track processes. In addition, "address space" and "program space" objects have been added. They are not particularly interesting for Unix processes, as there is just one of each, but there are uses for multiple programs in the same address space.
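The relationship among those three kinds of objects might be modeled roughly as follows. This is a sketch under stated assumptions: the class and field names are hypothetical and do not reflect GDB's actual structures, and the example programs are made up.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class AddressSpace:
    """One set of addresses; compared by identity, not by label."""
    label: str


@dataclass(eq=False)
class ProgramSpace:
    """One loaded program (its executable and symbols) living in an address space."""
    executable: str
    aspace: AddressSpace


@dataclass
class Inferior:
    """One process being debugged."""
    pid: int
    pspace: ProgramSpace
    threads: list = field(default_factory=list)


def programs_sharing(aspace, inferiors):
    """Names of the programs among 'inferiors' that live in 'aspace'."""
    return sorted({inf.pspace.executable
                   for inf in inferiors
                   if inf.pspace.aspace is aspace})
```

For an ordinary Unix process the answer is always a single program; the distinction only earns its keep when two programs share one address space.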

Multiple processes on multiple cores lead to a number of race conditions that need to be dealt with. For example, the "breakpoint dance" that occurs when the user steps over a breakpoint is race-prone. The breakpoint must be disabled, then reenabled after the step, which leaves a window where other processes may not break correctly. Inferior objects and threads must be added to the user interface as well, so that one could, for example, break on a particular function only in certain threads.
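The window is easy to see in a toy model. The sketch below is not how GDB is implemented; ToyInferior and step_over_breakpoint are invented names, "memory" is a bytearray, every instruction is one byte long, and a thread is nothing more than a program counter. The x86 int3 opcode (0xCC) serves as the trap marker.

```python
TRAP = 0xCC  # x86 int3 opcode, used here only as a recognizable marker


class ToyInferior:
    """Toy model of a debugged program with several threads."""

    def __init__(self, code: bytes, thread_pcs):
        self.memory = bytearray(code)
        self.pcs = dict(thread_pcs)  # thread id -> program counter
        self.saved = {}              # breakpoint address -> original byte

    def insert_breakpoint(self, addr):
        self.saved[addr] = self.memory[addr]
        self.memory[addr] = TRAP

    def remove_breakpoint(self, addr):
        self.memory[addr] = self.saved.pop(addr)

    def step(self, tid):
        """Run one instruction; return True if the thread trapped instead."""
        pc = self.pcs[tid]
        if self.memory[pc] == TRAP:
            return True  # stopped at the breakpoint, pc unchanged
        self.pcs[tid] = pc + 1
        return False


def step_over_breakpoint(inf, tid, addr, others_run=()):
    """The 'dance': lift the breakpoint, step one thread, put it back.

    'others_run' lists threads left running during the window. The return
    value is the subset that passed through addr without trapping.
    """
    inf.remove_breakpoint(addr)
    missed = []
    for other in others_run:
        at_breakpoint = inf.pcs[other] == addr
        if not inf.step(other) and at_breakpoint:
            missed.append(other)
    inf.step(tid)
    inf.insert_breakpoint(addr)
    return missed
```

With a second thread sitting at the breakpoint address and left running, step_over_breakpoint() reports it as missed; if every other thread is held stopped during the dance, nothing slips through, at the cost of stalling the whole program.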

Python scripting was added to GDB in 2008, while Guile scripting was just added this year. Scripting is useful for application-specific, higher-level commands, and it reduces the call for C code to be added into GDB, he said. But scripts are "not really a solved problem", as they are complicated by the control algorithms in the debugger.
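As an illustration of the kind of higher-level command that scripting makes possible, here is a minimal sketch of a user-defined command written against GDB's Python API. The command name short-bt and the summarize_frames helper are made up for this example; the gdb module exists only inside a GDB built with Python support, so the command is defined only when that import succeeds.

```python
try:
    import gdb  # only importable inside a GDB built with Python support
except ImportError:
    gdb = None


def summarize_frames(names, limit=3):
    """Pure helper: squeeze a list of frame function names onto one line."""
    shown = names[:limit]
    extra = len(names) - len(shown)
    line = " <- ".join(shown)
    return f"{line} (+{extra} more)" if extra > 0 else line


if gdb is not None:
    class ShortBacktrace(gdb.Command):
        """short-bt: print the innermost frames on one line (hypothetical command)."""

        def __init__(self):
            super().__init__("short-bt", gdb.COMMAND_STACK)

        def invoke(self, arg, from_tty):
            names = []
            frame = gdb.newest_frame()
            while frame is not None:
                names.append(frame.name() or "??")
                frame = frame.older()
            gdb.write(summarize_frames(names) + "\n")

    ShortBacktrace()  # instantiating the class registers the command
```

Saved under a name such as short_bt.py (an arbitrary choice), the script could be loaded with GDB's source command, after which short-bt would be available at the prompt while the program is stopped.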

Git, maintainers, and politics

GDB has moved to Git. The original CVS repository contained GDB, binutils, Cygwin, and other projects from what was originally the Cygnus source tree. When considering the move to Git, there was the question of what to do with the other projects. GDB decided to convert the binutils and GDB parts of the tree and leave the others to fend for themselves. Red Hat contributed a sanitized repository that had lots of history before the CVS tree became public. Tom Tromey did a bunch of scripting to convert the CVS history into Git. Some pre-public revisions were lost, but there is now a single Git repository for GDB and binutils. That work was completed toward the end of 2013.

Originally, there was a single maintainer for GDB; Richard Stallman was the first, and there were others, including Shebs, along the way. By around 2005, though, there were global and area maintainers, but no single person was in charge of the whole project. Decisions were more consensus-driven, but that meant the project was somewhat less decisive as a whole. A GDB steering committee was formed in 2000, but it never made any decisions. Stallman disbanded it in 2012 and "nobody mourned its passing", Shebs said.

On the issue of politics, Shebs said that he had no interest in dishing dirt on individuals, but that there were some issues that had come up over the years that were worth mentioning. GDB originally started out in the "cathedral" model of development, with no public repository. Moving to the public CVS tree was meant to combat that and move to a more "bazaar" style of development. That has been mostly successful, he said, pointing to the few forks that there have been over the years as evidence.

There is ongoing tension between experimentation and stability. It is sometimes hard to make changes because they could break GDB for users of some random language and operating system combination. That tends to make developers conservative, but GDB has moved toward a more experimental model. The project is willing to risk breaking 10% of its users to make progress.

But backward compatibility is one area where it needs to step lightly. If GDB breaks the remote protocol, lots of people get unhappy quickly. That means adding new packets rather than changing the old ones. In addition, the API that Eclipse uses is not tightly specified and the IDE depends on some undocumented behavior, which means that those are areas where changing the code must be done carefully.
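The reason new packets are the safe route follows from how the protocol is framed. A packet travels as $payload#checksum, where the checksum is the sum of the payload bytes modulo 256 written as two hex digits, and a stub answers any request it does not recognize with an empty reply. The sketch below shows that behavior; ToyStub and the request name qHypotheticalNewPacket are invented for illustration, and acknowledgments, escaping, and run-length encoding are left out.

```python
def frame_packet(payload: str) -> bytes:
    """Wrap payload as $<payload>#<checksum>, checksum = byte sum mod 256."""
    data = payload.encode("ascii")
    return b"$" + data + b"#" + f"{sum(data) % 256:02x}".encode("ascii")


def parse_packet(raw: bytes) -> str:
    """Return the payload of one framed packet after verifying its checksum."""
    if not raw.startswith(b"$") or len(raw) < 4 or raw[-3:-2] != b"#":
        raise ValueError("not a framed packet")
    data = raw[1:-3]
    checksum = int(raw[-2:].decode("ascii"), 16)
    if sum(data) % 256 != checksum:
        raise ValueError("bad checksum")
    return data.decode("ascii")


class ToyStub:
    """Hypothetical stub that understands only the '?' (why stopped) request."""

    def __init__(self):
        # 'S05' is a stop reply reporting signal 5.
        self.handlers = {"?": lambda args: "S05"}

    def handle(self, raw: bytes) -> bytes:
        payload = parse_packet(raw)
        for prefix, handler in self.handlers.items():
            if payload.startswith(prefix):
                return frame_packet(handler(payload[len(prefix):]))
        # Unknown request: an empty reply tells the debugger "not supported".
        return frame_packet("")
```

An old stub that receives a packet invented after it was built simply returns $#00, and the debugger can fall back to older requests. Changing the meaning of an existing packet offers no such escape hatch.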

Retaining old code, even for systems that no longer work at all, has been a source of some tension in the project. There is anxiety about removing the old, dead code and configurations, he said, but the project now does so routinely. There used to be some issues between hobbyist/volunteer developers and those who are paid to work on GDB, but that is mostly in the past. These days, most who are working on GDB are paid to do so.

Shebs's last point was about responding to the competition. He noted that the LLVM debugger (LLDB) is far behind where GDB is, so it does not really provide much competition. Other debuggers, such as TotalView, are focused on niches (e.g. high-performance computing) rather than being general-purpose, so again those are not spurring much feature development.

It was good to get an overview of the state of GDB today, as it has been some time since that kind of report has come to our attention. It is a project that many use, often daily, but that somehow doesn't generate the attention that tools like GCC seem to garner. The project seems healthy and headed in a reasonable direction, so it will likely be the Linux debugger of choice for many years to come.

[ Thanks to the Linux Foundation for supporting my travel to the Collaboration Summit. ]

Brief items

Quotes of the week

Of course, dog:Greyhound and dog:Chihuahua are still prevented from eating cat_chow:Siamese by type enforcement, even if the MLS type Greyhound dominates Siamese.
Dan Walsh in The SELinux Coloring Book, which is now available under CC-BY-SA terms.

Now, consider that we're operating in an environment where multi-billion dollar companies are relying on our software while making only relatively small contributions to its ongoing support and evolution, and where we have multiple prominent community members wishing vocally (and encouraging others to advocate) for the core development team to devote our volunteer efforts to improving a legacy language rather than the new one we shifted en masse to working on instead. (Note that the latter actually makes about as much sense to me as telling the Rust and Go developers they should spend their free time working on C compilers instead because the latter would be more immediately useful to commercial users)
Nick Coghlan

Whenever git-forecast-log reapplies a history, the --stroke-fear-submodule flag can be used to filter-branch a history for the commit that is archived by an automatic origin, so the --cancel-index option can be used to filter-branch a commit for the submodule that is set by a temporary stash. Provided that RACE_CHANGE is not pruned, after a git-ravage-history (grepped by git-fashion-log or git-individualize-base) names an origin, successfully returned areas are annotated for the user, and upstreams that were counted during reflogging are left in a committed state.
— A statistically-unlikely-to-be-helpful selection randomly served up by the git man page generator.

cmocka 0.4.0 released

Version 0.4.0 of Andreas Schneider's cmocka unit-testing framework for C has been released. The update adds support for collecting unit tests into groups, and makes several improvements to error messages.

GNU Guix 0.6 released

Version 0.6 of the GNU Guix package manager for GNU is now available. The main feature highlighted in this release relates to testing Guix: "This release comes with an updated QEMU virtual machine image that shows preliminary work toward building a stand-alone GNU system with Guix. The image uses the GNU Linux-Libre kernel and the GNU dmd init system, and runs X11. It may be used primarily to try out Guix and dmd."

Benetl 4.5 for PostgreSQL available

Version 4.5 of the Benetl extract, transform, and load (ETL) utility for PostgreSQL databases has been released. Notably, the update moves to Java 7, in addition to various bug fixes and functional updates.

EasyTAG 2.2.0 available

Version 2.2.0 of the EasyTAG audio-file tag editing tool has been released. This version moves to GTK+3 as the default toolkit, adds support for GIF images in tags, fixes a truncation error that happened when saving Vorbis audio, and adds support for the Opus audio codec, among other changes.

Cinnamon 2.2 released

Version 2.2 of the Cinnamon desktop environment is out. New features include a lot of improvements to the settings dialogs, tweaks to the "hot corners" and heads-up display mechanisms, better high-resolution display support, and more.

GCC 4.9.0 release candidate available

The GCC 4.9.0 release candidate is available for testing; the final 4.9.0 release is expected to happen on April 22. The list of new features in this release is quite long; see this page for details.

Emacs 24.4 pretest available

The first "pretest" release for the forthcoming Emacs 24.4 release is now available from the GNU FTP server. Such releases are akin to betas and release candidates in many other projects; Glenn Morris indicates that six or so should be expected between now and the final 24.4 release.

WordPress 3.9 is available

Version 3.9 of the WordPress blog framework and content-management system (CMS) has been released. The update, codenamed "Smith," introduces a suite of updated visual editing tools, improved multimedia support (such as audio and video playlists), and live previewing of some theme-editing options.

Newsletters and articles

Development newsletters from the past week

Notes from the Python Language Summit

Guido van Rossum has posted his notes from the just-concluded Python Language Summit in Montreal. "We (I) still don't want to do a 2.8 release, and I don't want to accelerate 3.5, but I do think we should make things better for people who have to straddle Python 2 and 3 in a single codebase, by developing more tools, and by security and possibly installer updates to 2.7 (PEP 466)." See the thread for notes from other participants as well.

Numerical Python (Linux Journal)

The Linux Journal digs into the NumPy Python extension. "The key element that NumPY introduces is an N-dimensional array object. The great flexibility of Python lists, allowing all sorts of different types of elements, comes at a computational cost. NumPY arrays deal with this cost by introducing restrictions. Arrays can be multidimensional, and they must all be the same data type. Once this is done, you can start to take some shortcuts by taking advantage of the fact that you already know what the type of the elements is. It adds extra overloading functions for the common operators and functions to help optimize uses of arrays."

Wirzenius: development as performance art

To celebrate his thirty years as a programmer, Lars Wirzenius has undertaken a "performance art" development project, during which all interested observers can watch his terminal as he writes an as-yet-undisclosed software tool from scratch. "It'll be something I have wanted to have for a while, but I'm not saying beforehand what it will be. For me, the end result is interesting; for you, the interesting part is watching me be stupid and make funny mistakes." The show, so to speak, begins on April 18 at 09:00 UTC. No word yet as to where the adventurous can wager on the outcome....

Page editor: Nathan Willis


Copyright © 2014, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds