How to fix an ancient GDB problem
The problem in question, Alves said, has to do with the handling of keyboard interrupts, which normally result from the user hitting control-C. The user's normal expectation is that an interrupt within GDB while the target program is running will stop the program and return the GDB prompt. If, however, that program has blocked the SIGINT signal, the interrupt will never be delivered. At best, GDB will not stop; at worst, the entire debugging session can become stuck and need to be killed from another terminal. GDB users, it seems, tend not to like that behavior.
This problem results from how GDB handles both terminals and interrupt signals. A "session", in the Unix sense, is a set of process groups, all of which share a single controlling terminal. Normally, the debugged process runs in the same session as — and shares the terminal with — GDB, but GDB puts that process into a different process group. Multiple process groups can share a terminal, but only one of those — the foreground group — will receive signals generated by the user at that terminal. GDB normally runs as the foreground group but, when it runs the target program, it designates that program's group as the foreground group instead.
Normally, if the target process receives a SIGINT signal, it will be intercepted by GDB; that happens as part of how tracing with ptrace() works. GDB will respond by stopping the target program and putting out a prompt; the signal is never actually delivered to that program. If, however, the program has blocked SIGINT then the signal remains pending; since it is never delivered, ptrace() has nothing to intercept. That can result in everything getting stuck. There are other paths to the same situation; sigwait() calls, for example, can consume pending signals in a way that causes them to never actually be delivered.
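The pending-signal behavior described above can be reproduced without a debugger. The following is a minimal sketch, not GDB code: the function name `demo_blocked_sigint` is made up for illustration. It blocks SIGINT, raises it, checks that it is pending rather than delivered, and then shows sigwait() consuming it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: block SIGINT, send one to ourselves, and report
   whether it was left pending. Then consume it with sigwait() and
   report whether anything is still pending afterwards.
   Returns 0 on success, -1 on error. */
int demo_blocked_sigint(bool *was_pending, bool *pending_after_sigwait)
{
    sigset_t block, old, pending;
    int signo = 0;

    assert(was_pending != NULL && pending_after_sigwait != NULL);

    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;

    /* With SIGINT blocked, this does not interrupt the process... */
    if (raise(SIGINT) != 0)
        return -1;

    /* ...the signal simply sits in the pending set. */
    if (sigpending(&pending) != 0)
        return -1;
    *was_pending = sigismember(&pending, SIGINT) == 1;

    /* sigwait() removes the pending signal without any handler running,
       so the normal delivery path is never taken. */
    if (sigwait(&block, &signo) != 0 || signo != SIGINT)
        return -1;

    if (sigpending(&pending) != 0)
        return -1;
    *pending_after_sigwait = sigismember(&pending, SIGINT) == 1;

    /* Nothing is pending any more, so restoring the old mask is safe. */
    return sigprocmask(SIG_SETMASK, &old, NULL);
}
```

Calling it from a small `main()` should report the signal as pending after raise() and as gone after sigwait(). At no point is the signal delivered, and delivery is the step that ptrace() would need in order to intercept it.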
The solution, Alves said, is the same as for any other problem in computer science: add another layer of indirection. In this case, that layer takes the form of a pseudo-terminal (PTY) that is given to the target process rather than the real controlling terminal. GDB then acts as an intermediary between the two terminals. Any output written by the target program to the PTY is simply copied to the real terminal. Input is a bit trickier, since the target can have changed the terminal's modes; GDB has to put the real terminal into raw mode, then copy all of the input from the real terminal into the PTY. When the target is not running, the terminal is put back into "readline mode" for interaction with GDB.
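As a rough illustration of the indirection layer, here is a sketch of creating a PTY pair and reading on the master side what was written to the slave side. It is not taken from the GDB patches; `open_pty_pair` and `read_target_output` are hypothetical names. In a real debugger the slave would also have to become the target's controlling terminal, which this sketch leaves out.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a new pseudo-terminal pair.
   Returns 0 and fills in both descriptors on success, -1 on error. */
int open_pty_pair(int *master_fd, int *slave_fd)
{
    assert(master_fd != NULL && slave_fd != NULL);

    int m = posix_openpt(O_RDWR | O_NOCTTY);
    if (m < 0)
        return -1;

    /* Set permissions on the slave and unlock it before opening. */
    if (grantpt(m) != 0 || unlockpt(m) != 0) {
        close(m);
        return -1;
    }

    const char *name = ptsname(m);
    if (name == NULL) {
        close(m);
        return -1;
    }

    /* The slave side is what the target process would be given. */
    int s = open(name, O_RDWR | O_NOCTTY);
    if (s < 0) {
        close(m);
        return -1;
    }

    *master_fd = m;
    *slave_fd = s;
    return 0;
}

/* Stand-in for the forwarding step: whatever the target wrote to the
   slave side shows up as readable data on the master side. */
ssize_t read_target_output(int master_fd, char *buf, size_t len)
{
    return read(master_fd, buf, len);
}
```

An intermediary like the one described would copy bytes read this way to the real terminal, and copy input from the real terminal to the master descriptor in the other direction.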
Now, the target can do anything it wants with SIGINT without affecting GDB, which, as the foreground process on the real terminal, can handle events directly. Since that terminal is in raw mode, that means recognizing the interrupt character and responding accordingly. There are other advantages as well; since GDB remains in control of when output goes to the (real) terminal, it can avoid intermixing its own output with that from the target. Another advantage is that GDB is now able to preserve the user's thread selection (the specific thread that debugging activity is focused on) after an interrupt; this wasn't possible before.
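With the real terminal in raw mode, the interrupt character arrives as an ordinary byte, so the intermediary has to look for it itself. The sketch below shows one way that scan might look; the function name and the decision to stop forwarding at the first interrupt byte are assumptions made for illustration, not details from the talk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: copy input bytes destined for the target into
   `out`, stopping at the first occurrence of `intr_char` (0x03 for
   control-C). Returns the number of bytes copied and sets
   *interrupted if the interrupt character was seen. `out` must have
   room for at least `len` bytes. */
size_t forward_until_interrupt(const unsigned char *in, size_t len,
                               unsigned char intr_char,
                               unsigned char *out, bool *interrupted)
{
    size_t copied = 0;

    assert(in != NULL && out != NULL && interrupted != NULL);

    *interrupted = false;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == intr_char) {
            /* Not forwarded: the debugger handles this itself. */
            *interrupted = true;
            break;
        }
        out[copied++] = in[i];
    }
    return copied;
}
```

The caller would write the copied bytes to the PTY master and, if `*interrupted` is set, stop the target instead of passing the byte along.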
There is, he said, an "escape hatch" for anybody wanting the previous behavior; it needs to be there to support other Unix systems in any case.
There were a few other remaining problems, he said. The process that creates a session is considered the "session leader" by the kernel; if that process exits, the processes in the terminal's foreground process group will be sent a SIGHUP signal. Most applications are not prepared for that and will be killed as a result. Now that the target has its own terminal, it becomes the session leader once it starts. If that process forks and exits, its child processes are likely to meet an untimely end, which is not the debugging experience that the user is likely to have had in mind.
The solution in this case is a variation on the double-fork technique; before launching the target, GDB will fork twice, with the first process doing nothing but waiting. It will become the session leader; since it doesn't exit, no SIGHUP signals will be generated if the target does surprising things.
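A sketch of the double-fork arrangement might look like the following. It only mirrors what is described above (a first child that becomes session leader and waits, and a second child standing in for the target); the function name, the pipe used for reporting, and the absence of any terminal setup are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a "leader" that creates a new session and
   then forks a stand-in target. The target reports its session ID back
   through a pipe. Returns 0 on success, -1 on error. */
int spawn_under_leader(pid_t *leader_pid, pid_t *target_sid)
{
    int fds[2];
    int status;

    assert(leader_pid != NULL && target_sid != NULL);

    if (pipe(fds) != 0)
        return -1;

    pid_t leader = fork();
    if (leader < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (leader == 0) {
        /* First fork: become the leader of a brand-new session. */
        close(fds[0]);
        if (setsid() < 0)
            _exit(1);

        pid_t target = fork();
        if (target < 0)
            _exit(1);
        if (target == 0) {
            /* Second fork: a real debugger would exec the target here.
               This stand-in just reports which session it is in. */
            pid_t sid = getsid(0);
            ssize_t n = write(fds[1], &sid, sizeof sid);
            _exit(n == (ssize_t)sizeof sid ? 0 : 1);
        }

        /* The leader does nothing but wait, so the session leader
           stays alive whatever the target forks or exits. */
        close(fds[1]);
        waitpid(target, &status, 0);
        _exit(0);
    }

    close(fds[1]);
    pid_t sid = -1;
    ssize_t n = read(fds[0], &sid, sizeof sid);
    close(fds[0]);
    waitpid(leader, &status, 0);
    if (n != (ssize_t)sizeof sid)
        return -1;

    *leader_pid = leader;
    *target_sid = sid;
    return 0;
}
```

The session ID reported by the stand-in target equals the leader's PID, which shows that the waiting process, not the target, is the session leader.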
GDB still has to be able to stop programs that block SIGINT; for obvious reasons, it cannot use SIGINT for that purpose. The solution here, he said, is to use SIGSTOP, which cannot be blocked, instead.
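That SIGSTOP gets through even when a process tries to block it can be checked with a small experiment. This is a sketch and does not involve ptrace() or GDB; the function name is hypothetical. A child blocks SIGINT (and attempts to block SIGSTOP), and the parent then stops it with SIGSTOP anyway.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a child that blocks SIGINT was
   stopped by SIGSTOP, 0 if not, -1 on setup error. */
int sigstop_stops_blocking_child(void)
{
    int ready[2];
    if (pipe(ready) != 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }

    if (child == 0) {
        sigset_t set;
        close(ready[0]);
        sigemptyset(&set);
        sigaddset(&set, SIGINT);
        /* Attempts to block SIGSTOP are silently ignored. */
        sigaddset(&set, SIGSTOP);
        sigprocmask(SIG_BLOCK, &set, NULL);
        /* Tell the parent that the mask is in place. */
        if (write(ready[1], "r", 1) != 1)
            _exit(1);
        for (;;)
            pause();
    }

    close(ready[1]);
    char byte;
    int status = 0;
    int stopped = 0;
    if (read(ready[0], &byte, 1) == 1
        && kill(child, SIGSTOP) == 0
        && waitpid(child, &status, WUNTRACED) == child)
        stopped = WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP;
    close(ready[0]);

    /* Clean up: SIGKILL works on a stopped process too. */
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return stopped;
}
```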
As is often the case, Emacs users present their own special challenges. Emacs uses control-C for its own purposes, and remaps the interrupt character to control-G instead. In cases like this, the user almost certainly wants control-C to be passed through to the target. The answer is a GDB command that allows the user to specify which key should interrupt the process and return to the GDB prompt.
This patch was first prototyped in 2019, but didn't make it to the GDB list until 2021. There were a few problems that turned up at that point, including the session-leader difficulty. Those have all been resolved, and Alves intends to post the patch set again sometime soon. His objective, he concluded, is to post it at least once per year until the problem is finally solved.
[Thanks to LWN subscribers for supporting my travel to this event.]
Index entries for this article | |
---|---|
Conference | GNU Tools Cauldron/2022 |
Posted Sep 29, 2022 22:41 UTC (Thu)
by dullfire (guest, #111432)
[Link] (2 responses)
Anyhow, this is sure to make my debugging time less a matter of "fury/anguish at my debugging tools". Thanks for the explanation and write-up.
Posted Sep 30, 2022 7:36 UTC (Fri)
by fw (subscriber, #26023)
[Link] (1 responses)
Posted Sep 30, 2022 13:17 UTC (Fri)
by gray_-_wolf (subscriber, #131074)
[Link]
Posted Sep 30, 2022 2:53 UTC (Fri)
by nyanpasu64 (guest, #135579)
[Link] (3 responses)
Also is "a GDB command that allows the user to specify which key should interrupt the process and return to the GDB prompt" already present, or will it be added in the rewrite?
Posted Sep 30, 2022 15:58 UTC (Fri)
by ballombe (subscriber, #9523)
[Link]
Unlikely. Usually 'graceful shutdown' is implemented by forking and waiting for the child to terminate (or crash).
Posted Oct 3, 2022 18:30 UTC (Mon)
by palves (guest, #91099)
[Link] (1 responses)
Back when v2 was posted to the list last year, someone tested it with PipeWire specifically, and confirmed it works there.
From https://inbox.sourceware.org/gdb-patches/1c54ccee2e4a2980... :
>> "I was recently fixing some bug in Pipewire. To be exact, I was working on
..
> Also is "a GDB command that allows the user to specify which key should interrupt the process
It does not exist yet. It will be added in the next version of the patches.
Posted Nov 18, 2022 21:04 UTC (Fri)
by Hi-Angel (guest, #110915)
[Link]
That was me. The patchset still hasn't been merged…? Oh, my :c
Posted Sep 30, 2022 7:53 UTC (Fri)
by rwmj (subscriber, #5474)
[Link] (3 responses)
Posted Sep 30, 2022 13:19 UTC (Fri)
by gray_-_wolf (subscriber, #131074)
[Link]
Posted Sep 30, 2022 13:51 UTC (Fri)
by fw (subscriber, #26023)
[Link] (1 responses)
Posted Oct 4, 2022 11:52 UTC (Tue)
by palves (guest, #91099)
[Link]
When you attach to a process running in another terminal, and hit Ctrl-C in the terminal running GDB, that Ctrl-C is turned into a SIGINT sent to GDB. So that half of the problem does not exist in that scenario: GDB sees the SIGINT first, not the inferior. However, currently, in the "I am attached" scenario, GDB forwards that SIGINT to the target process, using plain "kill(pid, SIGINT)", and then relies on ptrace intercepting that SIGINT. If the target process blocks SIGINT, then you're back to square one. To get that scenario working properly, we still need part of the proposal in place, specifically the part about pausing the target process in a different way, with SIGSTOP.
Posted Sep 30, 2022 8:24 UTC (Fri)
by kleptog (subscriber, #1183)
[Link] (1 responses)
I'm only half joking.
Posted Sep 30, 2022 14:43 UTC (Fri)
by leromarinvit (subscriber, #56850)
[Link]
Since the tty code can't be built as a module, the only other way to achieve what I wanted would either require rebuilding the kernel (which I'm usually too lazy for) or ugly runtime patching.
Posted Sep 30, 2022 14:13 UTC (Fri)
by ballombe (subscriber, #9523)
[Link] (5 responses)
Posted Sep 30, 2022 14:46 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (3 responses)
I've worked on a codebase that blocked SIGINT whenever it entered a critical section, and unblocked it afterwards. This was a poor way to avoid being killed by the end user hitting Ctrl-C to escape the program at an inopportune moment (not least because it didn't account for Ctrl-\ or Ctrl-Z), but it worked well enough for the use case in question, since users just spammed Ctrl-C until it died.
A proper fix was to replace this with a signal handler, and teach the critical sections to check to see if there had been a SIGINT while they were in progress (exiting immediately once you were out of the critical section). But that took maintenance work being done for other reasons - it wasn't something that was going to get fixed until I had to do some work to port to a new OS.
Posted Sep 30, 2022 15:15 UTC (Fri)
by ballombe (subscriber, #9523)
[Link] (2 responses)
Yes, that is how my project does it too. It just adds a signal handler that sets a flag, and checks the flag when leaving the critical section. Much nicer than blocking SIGINT.
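For readers unfamiliar with the pattern, a minimal sketch of "set a flag in the handler, check it when leaving the critical section" could look like this. All names are invented for illustration and are not from either commenter's codebase; the two `demo_work_*` functions exist only to exercise the pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Set from the handler; sig_atomic_t is safe to write there. */
static volatile sig_atomic_t got_sigint = 0;

static void on_sigint(int signo)
{
    (void)signo;
    got_sigint = 1;
}

/* Install the flag-setting handler. Returns 0 on success. */
int install_sigint_flag_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Run work(arg) to completion as the critical section. Returns 1 if a
   SIGINT has been recorded since the flag was last cleared (the caller
   should now shut down cleanly), 0 otherwise. */
int run_critical_section(void (*work)(void *), void *arg)
{
    assert(work != NULL);
    work(arg);
    if (got_sigint) {
        got_sigint = 0;
        return 1;
    }
    return 0;
}

/* Demo critical sections: one quiet, one "interrupted" halfway. */
void demo_work_quiet(void *arg)
{
    (void)arg;
}

void demo_work_interrupted(void *arg)
{
    (void)arg;
    raise(SIGINT);  /* simulates the user hitting Ctrl-C mid-section */
}
```

Because SIGINT is handled rather than blocked, it is still delivered in the ordinary way, which is the path a ptrace()-based debugger relies on.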
Posted Sep 30, 2022 15:17 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
Yes, but this also requires competent programmers, as opposed to people who flail around until they find something that fixes the reported bug (users killing the program with Ctrl-C when it can't safely be interrupted without triggering a full DB recheck).
Posted Oct 3, 2022 18:06 UTC (Mon)
by iabervon (subscriber, #722)
[Link]
Posted Sep 30, 2022 19:52 UTC (Fri)
by NYKevin (subscriber, #129325)
[Link]
* Make a signalfd(2) and add it to your main event loop's select(2)/poll(2)/etc. call.
You might also block signals for a short time while the thread or process is holding a mutex, in cases where this could cause deadlock or violate an invariant.
Frankly, there is absolutely nothing in either POSIX or Linux to suggest that blocking signals is "wrong" so long as the process actually responds to them in a valid and appropriate way. SIGINT means "please stop what you're doing," not "please push a weird extra frame on the stack so the debugger can break in."
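As an illustration of the signalfd(2) approach, here is a Linux-only sketch; the helper names are hypothetical. SIGINT is blocked so that it is never delivered asynchronously, and is then read as data from a file descriptor that could sit in an event loop's poll(2) set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: block SIGINT and return a descriptor that
   becomes readable when a SIGINT is pending. Returns -1 on error. */
int make_sigint_fd(void)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);

    /* The signal must be blocked, otherwise its default disposition
       would still terminate the process. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) != 0)
        return -1;

    return signalfd(-1, &mask, SFD_CLOEXEC);
}

/* Read one pending signal from the descriptor. Returns the signal
   number, or -1 on error. Blocks until a signal is available. */
int read_signal(int sfd)
{
    struct signalfd_siginfo info;

    assert(sfd >= 0);

    ssize_t n = read(sfd, &info, sizeof info);
    if (n != (ssize_t)sizeof info)
        return -1;
    return (int)info.ssi_signo;
}
```

This is exactly the kind of program the article is about: the signal is consumed while still blocked, so it is never delivered in the way that ptrace() intercepts.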
The segfault when running GDB under GDB might come from Guile support, or more precisely, from the Boehm-Demers-Weiser garbage collector library Guile uses: in some versions, that library probes mapping boundaries, deliberately triggering segfaults, rather than enumerating shared objects using the dl_iterate_phdr function. If that's indeed the problem, you could build GDB without Guile support, or apply this garbage collector patch.
This means that you are running gdb against the wrong program, so of course you do not get anything useful.
You need to tell gdb to follow the fork.
> (like PipeWire and Audacious), pressing Ctrl+C will trigger the app's graceful shutdown process
> (terminating execution) rather than breaking into the debugger like I want. Will the changes to gdb fix this issue?
>> `pipewire-media-session` (one of daemons that Pipewire is comprised of). Process
>> of debugging basically included setting breakpoints somewhere, then hopefully
>> triggering the breakpoint, then inspecting the state. Nothing unusual.
>>
>> So, the thing that was slowing me down and annoying is that I couldn't just
>> pause the debuggee through the usual means of ^C, then add a breakpoint.
>> Pressing ^C was causing the process to exit!
>>
>> Pipewire project seems to have a number of various daemons, and all of them set
>> SIGINT handlers (judging by git-grepping for SIGINT). So basically, it is hard
>> to debug Pipewire with GDB due to GDB not being able to interactively pause the
>> process.
>>
>> Now, I tested the patchset, and I confirm it solves the problem! GDB built from
>> the palves' branch pauses pipewire-media-session correctly on ^C! "
> and return to the GDB prompt" already present, or will it be added in the rewrite?
> operation hasn't really suffered from these problems, as far as I understand it.
Why block SIGINT?
I can understand installing a signal handler, but blocking it outright?
This does not seem very user-friendly.
* Make a dedicated "handle the signals" thread, block all the signals you care about (including SIGINT), and call sigwaitinfo(2) from that thread.