
Conill: Rethinking sudo with object capabilities

Ariadne Conill is exploring a capability-based approach to privilege escalation on Linux systems.

Inspired by the object-capability model, I've been working on a project named capsudo. Instead of treating privilege escalation as a temporary change of identity, capsudo reframes it as a mediated interaction with a service called capsudod that holds specific authority, which may range from full root privileges to a narrowly scoped set of capabilities depending on how it is deployed.



Useability is not good

Posted Dec 14, 2025 9:34 UTC (Sun) by NHO (subscriber, #104320) [Link] (24 responses)

I understand the intent. But.

No one will use it until all command line capability selection is automagiced away. Too much to write. Usable in scripts, not usable online.

Also, what's the deployment/global configuration story?

Useability is not good

Posted Dec 14, 2025 18:01 UTC (Sun) by nix (subscriber, #2304) [Link] (23 responses)

Nobody will use it because... it's a command-line tool, just like sudo is? Why would you ever want a privilege escalation tool to be usable "online" -- from the web? It's not clear what you even mean, but remote privilege escalation is not a thing people usually *seek*.

Useability is not good

Posted Dec 14, 2025 18:37 UTC (Sun) by NHO (subscriber, #104320) [Link] (22 responses)

By online I didn't mean "from the internet", sorry.
I meant "Interactively".
It's unlikely to be used *as is* because there are a lot of missing steps between it and being comfortable for the user. It's a proof of concept, not even a minimum viable product.

Also, the examples from the article should be in the README, probably along with capsudod.1 and capsudo.1 man pages.

Useability is not good

Posted Dec 15, 2025 1:43 UTC (Mon) by cytochrome (subscriber, #58718) [Link]

It is clear from the blog that this is a work in progress and that the creator of capsudo is exploring various ideas. There was no implication that others should be using the tool at this early stage.

Useability is not good

Posted Dec 15, 2025 18:02 UTC (Mon) by nix (subscriber, #2304) [Link] (20 responses)

It seems to me to be a foundational building block from which authentication systems can be built as much as it is a full-blown sudo replacement -- but it does show that you don't need very much to implement something at least as expressive as sudo, but without the horrifying EBNF-grammar-laden manpage and wildly irregular language full of nasty traps. (I mean I use sudo all the time, but, *shiver*).

Useability is not good

Posted Dec 16, 2025 9:30 UTC (Tue) by taladar (subscriber, #68407) [Link] (19 responses)

I wonder if it would make sense to use a more standardized file format for security tools like sudo, and by that I mean some sort of relatively minimal but widely used format like JSON (not YAML, that one is way too complex) which is unlikely to have security issues and is well-understood in its low level details by pretty much everyone.

Useability is not good

Posted Dec 16, 2025 13:57 UTC (Tue) by mbunkus (subscriber, #87248) [Link] (18 responses)

Having read & participated in way too many bug reports & discussions about how numbers work in JSON makes me doubt the "well-understood by everyone" part.

Additionally, standard JSON doesn't support comments, making it completely unsuitable for anything meant to be edited by humans in my opinion.

Useability is not good

Posted Dec 17, 2025 9:09 UTC (Wed) by taladar (subscriber, #68407) [Link] (17 responses)

Comments can trivially be added to a JSON file format by specifying that any object can have a "comment" key or something along those lines. The point was that it would make sense from a security perspective to have a well-tested parser instead of a completely custom one and ideally a file format that doesn't have 27 types of strings like YAML.

Useability is not good

Posted Dec 17, 2025 13:26 UTC (Wed) by pizza (subscriber, #46) [Link] (15 responses)

> Comments can trivially be added to a JSON file format by specifying that any object can have a "comment" key or something along those lines.

That makes said comments _data_, not _documentation_.

> The point was that it would make sense from a security perspective to have a well-tested parser instead of a completely custom one and ideally a file format that doesn't have 27 types of strings like YAML.

Comments in Javascript (ie the 'J' in 'JSON') are already well-defined. There was no need to reinvent any wheel here, and whether or not the syntax supports comments is orthogonal to how well-tested it is.

Useability is not good

Posted Dec 17, 2025 13:48 UTC (Wed) by mbunkus (subscriber, #87248) [Link] (14 responses)

> Comments in Javascript (ie the 'J' in 'JSON') are already well-defined. There was no need to reinvent any wheel here, and whether or not the syntax supports comments is orthogonal to how well-tested it is.

The JSON standard that's actually called JSON (aka ECMA 404 aka RFC 8259) does _not_ allow for comments and several other things that would make using it easier for humans (e.g. a trailing comma after the last array/object element, or multi-line strings containing literal line breaks instead of \n escapes). Most JSON parsers out there do fail to parse JSON documents that contain JavaScript-style comments.
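This is easy to check with any strict parser. A small sketch using Python's standard-library `json` module (one RFC 8259 parser among many; the `parses_as_strict_json` helper and the sample documents are illustrative) shows a JavaScript-style comment and a trailing comma, both natural for a human editor, being rejected:

```python
import json

def parses_as_strict_json(text: str) -> bool:
    """Return True if Python's strict json parser accepts `text`."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

plain = '{"user": "alice", "allow": ["reboot"]}'
with_comment = '{"user": "alice", // who may escalate\n "allow": ["reboot"]}'
trailing_comma = '{"user": "alice", "allow": ["reboot",]}'

for name, doc in [("plain", plain), ("comment", with_comment),
                  ("trailing comma", trailing_comma)]:
    print(f"{name}: {parses_as_strict_json(doc)}")
```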

There are other standards/projects with similar names such as JSON5 that _do_ have those features, or jsonc which is base JSON with comments. But that's not JSON.

And neither is JavaScript.

Sorry for being pedantic here; this is just a pet peeve of mine. In my opinion there's simply no really good text configuration format. Some of my objections:

- JSON: lack of support for comments, multi-line strings, trailing commas
- JSON: no (consistent) support for 64-bit integers (or larger) in a lot of parsers
- YAML: huge complexity due to all the features
- YAML: security implications due to object type thingies
- YAML: way too lenient with string values & trying to auto-guess data types resulting in a lot of surprising conversions that highly depend on the parser used
- YAML: incredibly easy to screw up the format for inexperienced authors
- YAML: sub-par tooling support due to lack of structural information in the format itself
- TOML: nested hashes require repeating all upper-level key names over & over again (e.g. "[settings]" → "[settings.auth]", "[settings.database]" etc.)
- TOML: lack of wide-spread language support
- XML: attributes vs data in child elements
- XML: easy to create structures that don't map 1:1 into array/hash hierarchies
- XML: without external type information it's impossible to know what's supposed to be array-like & what isn't
- XML: even less type information than any of the others

Even despite all of its drawbacks I tend to use YAML a lot more than JSON for anything that a human has to touch semi-regularly (e.g. Ansible stuff), simply due to basic JSON being so anti-maintenance.

Useability is not good

Posted Dec 17, 2025 14:09 UTC (Wed) by pizza (subscriber, #46) [Link]

> Even despite all of its drawbacks I tend to use YAML a lot more than JSON for anything that a human has to touch semi-regularly (e.g. Ansible stuff), simply due to basic JSON being so anti-maintenance.

Yep, that level of "anti-[human-]maintenance" is my fundamental beef with JSON...

Useability is not good

Posted Dec 18, 2025 9:05 UTC (Thu) by taladar (subscriber, #68407) [Link] (1 responses)

Where YAML really completely breaks is when you have to do any kind of templating of the file. Significant whitespace is just a really bad fit for that. Not only is it basically impossible to read but it is also very close to unwritable to anyone who isn't a masochist.

Useability is not good

Posted Dec 18, 2025 12:09 UTC (Thu) by mbunkus (subscriber, #87248) [Link]

100% agree. I really hated that back when I was still using SaltStack. In SaltStack, unlike Ansible, the templating happens on the "whole file" level of the YAML rules/roles, meaning it'll be interleaved with regular YAML stuff — often completely breaking formatting, linting etc., as you said.

For example, in SaltStack you could do something like this to distribute two files:

{% set file_list = [ 'vimrc', 'bashrc' ] %}
{% for item in file_list %}
/etc/{{ item }}:
  file.managed:
    - source: salt://{{ item }}
{% endfor %}

In Ansible templating can only be used in YAML values, though, meaning they're always part of a YAML string. In order to provide basic loops & conditions Ansible itself has special hash keys it recognizes, evaluating the template code in the corresponding values & making the decision based on it. For example:

- ansible.builtin.copy:
    src: "files/{{ item }}"
    dest: "/etc/{{ item }}"
  loop: "{{ file_list }}"
  vars:
    file_list: [ 'vimrc', 'bashrc' ]

Whole-file is obviously much more powerful as you have a Turing-complete templating language at your disposal, but dealing with all but the simplest cases becomes a real pain.

Useability is not good

Posted Dec 18, 2025 16:03 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (1 responses)

It frustrates me that nobody seems to know that textproto exists. That's the text serialization format of protocol buffers.[1] In general, it has the following advantages:

* Bindings for quite a few major languages.
* No semantic type ambiguities (schemas are mandatory, so the type of every field is known in advance). This is also used to propagate static type information into the language bindings (by generating per-schema serialization code).
* Few/no syntactic type ambiguities (strings are quoted, floats have decimal points or an "f" suffix, etc.).
* Comments are supported.
* Losslessly (and easily) converts into a binary format for efficient on-wire representation.
* Supports a JSON-like syntax, which should feel familiar to most people.

Disadvantages:

* Not opinionated enough. Several things can be spelled in multiple ways, and these spellings can be freely mixed.
* Quite a few bits of the linked spec say things like "depending on the implementation...," although this is mostly confined to edge cases where you write something silly and the implementation has to guess what you mean.
* Many enterprisey features that are not required for simple use cases. Expect to see "com.google" etc. show up a lot.
* Similar to TOML, it has support in several languages, but not in every language under the sun (contrast with JSON).
* If your build system is... less than ideal, then you probably think that generating code is scary and problematic. As you might expect, it works fine under Bazel, because that's how Google uses it internally.

Disclaimer: I'm a Google engineer, and Google invented protobufs.

[1]: https://protobuf.dev/reference/protobuf/textformat-spec/

Useability is not good

Posted Dec 19, 2025 10:26 UTC (Fri) by mbunkus (subscriber, #87248) [Link]

You're spot-on that I did not know about textproto until now. Thanks for mentioning it. It definitely looks interesting & I'll take a serious look at it.

Useability is not good

Posted Dec 18, 2025 19:49 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

> - XML: even less type information than any of the others

That's not quite correct. There's XSD that even allows you to restrict numbers to a specific subset.

Useability is not good

Posted Dec 19, 2025 10:25 UTC (Fri) by mbunkus (subscriber, #87248) [Link] (2 responses)

You're correct that with additional, out-of-band information you can map it unambiguously, but there's out-of-band info for other formats, too, e.g. JSON schema.

What I meant was that the following XML cannot be read by naive parsers & converted into hash-array structures without additional information such as a stylesheet or hints to the parser:

<xml>
  <settings>
    <something>42</something>
  </settings>

  <auth>
    <option/>
  </auth>
</xml>

First of all, the program might expect <settings> to be either a hash or an array; it's not obvious from stylesheet-less XML alone. Here are two possible corresponding JSON representations:

{
  "settings": {
    "something": 42
  }
}

or even

{
  "settings": [
    { "something": "42" }
  ]
}

Second, there's no info about the type of <option> either. It might be: an empty string; a None/undefined/null type of value…
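Both problems show up as soon as one writes such a naive converter. The sketch below uses Python's `xml.etree.ElementTree`; the `naive_to_dict` helper and its "repeated tags become a list" rule are assumptions for illustration, not the behavior of any particular library. The shape of `something` flips between a scalar and a list depending on how many elements appear, every leaf comes back as a string, and `<option/>` turns into `None`:

```python
import xml.etree.ElementTree as ET

def naive_to_dict(elem):
    """Hypothetical naive converter: guesses structure with no schema.

    Leaf elements become their text (None when empty); a tag seen more than
    once under the same parent becomes a list, otherwise a plain dict entry.
    """
    children = list(elem)
    if not children:
        return elem.text  # '<option/>' has no text at all, so this is None
    result = {}
    for child in children:
        value = naive_to_dict(child)
        if child.tag in result:
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

single = naive_to_dict(ET.fromstring("<settings><something>42</something></settings>"))
repeated = naive_to_dict(ET.fromstring(
    "<settings><something>42</something><something>43</something></settings>"))
empty = naive_to_dict(ET.fromstring("<auth><option/></auth>"))
print(single, repeated, empty)
```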

Of course this is because XML is capable of expressing more complex structures than nested hash-arrays, but most programs nowadays use nested hash-arrays for any kind of configuration information: it's more or less natural to build such structures, they're trivial to implement in most programming languages, and they map cleanly to all kinds of binary & text representations. XML's flexibility & capabilities work to its detriment when considering it as a human-maintainable configuration format.

For me an ideal human-maintainable format has a couple of properties:

  • allows for comments (JSON loses here)
  • has in-band structural information to make tooling a viable option (pretty printing, structure validation, auto-indenting in an editor, easy navigation with "jump to key XYZ…" functionality; YAML loses here)
  • makes it harder to mess up the format (YAML & XML lose here)
  • has little repetition in what I have to type all the time (XML & TOML lose here)
  • is optically easy to grasp for us meatbags, not just agile computers (YAML loses here, but so does JSON when you have to deal with long strings)

I did not actually know about textproto which NYKevin has just mentioned. I will definitely look into that.

Useability is not good

Posted Dec 19, 2025 21:27 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

Yeah, if you want to carry all the information in-band then there's no real good option. Integers/booleans are especially trippy. E.g. is 129387 an int32 or int64? Then what about JavaScript?
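A concrete case of that ambiguity: the value below is a valid int64 but exceeds 2^53, beyond which an IEEE 754 double (JavaScript's ordinary number type, leaving BigInt aside) can no longer represent every integer exactly. The sketch uses Python's `json` module, passing `parse_int=float` to imitate a parser that stores every number as a double:

```python
import json

# 2**53 + 1: fits in an int64, but has no exact IEEE 754 double representation.
doc = '{"id": 9007199254740993}'

as_int = json.loads(doc)["id"]                      # Python ints are arbitrary precision
as_double = json.loads(doc, parse_int=float)["id"]  # mimic a double-only parser

print(as_int)          # 9007199254740993
print(int(as_double))  # 9007199254740992, silently off by one
```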

Useability is not good

Posted Dec 20, 2025 9:28 UTC (Sat) by Wol (subscriber, #4433) [Link]

Until the target system importing (or exporting) the JSON just doesn't care - it could be a straight infinite precision integer (or number) ...

Cheers,
Wol

Useability is not good

Posted Dec 19, 2025 10:23 UTC (Fri) by gioele (subscriber, #61675) [Link] (4 responses)

I'd add:

- TOML: no difference between "this field is set to null" and "this field is not set and thus it has the default value". (See https://lobste.rs/s/h50lml/toml_1_1_0_released#c_wiibz1 for a longer discussion.)

And I'd remove:

- XML: attributes vs data in child elements

Why is that a problem? Many IDLs offer you a way to specify non-structured metadata about an object (attributes) or encapsulated structured/non-structured data (child elements).

Useability is not good

Posted Dec 19, 2025 11:13 UTC (Fri) by mbunkus (subscriber, #87248) [Link] (3 responses)

>> - XML: attributes vs data in child elements

> Why is that a problem? Many IDLs offer you a way to specify non-structured metadata about an object (attributes) or encapsulated structured/non-structured data (child elements).

Due to how the distinction between data & metadata isn't actually adhered to or, god forbid, enforced somehow. Just check the Apache Tomcat server.xml configuration files. While you can map that data into hash-array structures, you cannot do the reverse without additional information provided to the writer (stylesheet, explicit writer configuration etc.).

Again, I'm only arguing from a standpoint of having a format that humans can easily maintain. In this situation the duality or overlap of functionality of attributes & child elements is a clear detriment. Not only do you have to remember which options exist, but also whether the parser/application expects those to be an attribute or a child element. Furthermore, element & attribute names are often spelled differently, again placing more cognitive load on us humans.

Sure, good tooling & stylesheets fix some of those concerns. And no, XML isn't unusable, of course. I'm just… frustrated by the lack of a format I can consider really easy to use for us humans with only very minor drawbacks.

Useability is not good

Posted Dec 22, 2025 9:15 UTC (Mon) by taladar (subscriber, #68407) [Link] (2 responses)

Where XML really loses is its ridiculous escaping system with giant lookup tables, making every XML library comparatively huge and escaping support a pain to implement.

Useability is not good

Posted Dec 22, 2025 9:37 UTC (Mon) by gioele (subscriber, #61675) [Link] (1 responses)

> Where XML really loses is its ridiculous escaping system with giant lookup tables

What are you referring to exactly? XML's escaping system defines exactly 5 entities (lt, gt, amp, apos, quot), a way for DTD authors to define their own entities (sadly known for the "billion laughs" attack), and a generic mechanism for referring to Unicode codepoints (&#248; for ø). None of that requires "giant lookup tables".
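This is easy to confirm with a conforming XML parser. The sketch below uses Python's `xml.etree.ElementTree` (backed by expat): the five predefined entities and a numeric character reference resolve without any larger table, while an HTML-only name such as `&npart;` is rejected as an undefined entity when no DTD declares it:

```python
import xml.etree.ElementTree as ET

# The five predefined entities plus a numeric character reference.
elem = ET.fromstring("<p>&lt;&gt;&amp;&apos;&quot; &#248;</p>")
print(elem.text)  # <>&'" ø

# An HTML named entity is not part of plain XML.
try:
    ET.fromstring("<p>&npart;</p>")
    html_entity_ok = True
except ET.ParseError:
    html_entity_ok = False

print(html_entity_ok)  # False
```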

Maybe you're mixing XML with HTML, whose predefined list of char entities is quite long and contains unusual things like "&npart;" (partial differential with combining long solidus overlay)? https://en.wikipedia.org/wiki/List_of_XML_and_HTML_charac...

Useability is not good

Posted Dec 22, 2025 10:24 UTC (Mon) by taladar (subscriber, #68407) [Link]

Well, HTML and pretty much any concrete XML format I have ever had the displeasure of dealing with. The basic design of the entity system is just deeply flawed when compared to the much simpler escaping mechanisms in other text formats. I'd much rather deal with five levels of backslash escaping in a template that generates shell code that uses regular expression parameters to modify some other stuff with backslash escaping than with XML, and that is not because nested backslash escaping is fun to deal with (especially when each level has slightly different things that need backslash escaping).

Useability is not good

Posted Dec 17, 2025 14:02 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

Trivial? The "use 'comment' keys" fails in the following ways:

- only a single instance for an entire object
- if you do support multiple, you still need to manually avoid conflicts
- no way to comment an array (say an `argv` where you want to comment terse flags with a description of what/why)
- formatters will easily re-sort them away from immediate context

I do like that JSON is a dead simple parse (though semantics around its "Unicode" requirement are unfortunate[1] and "number" being BigInt and InfinitePrecision at the same time sucks). YAML has second-system syndrome and tries to fix too much at once and ends up tripping over itself. Never mind that I'm not aware of any 100% accurate strict parsers for the format either.

[1] IETF's JSON RFC states UTF-8 though, so there is a good reference to lock that down at least.

be afraid, very afraid

Posted Dec 14, 2025 9:58 UTC (Sun) by grmnsftphr (subscriber, #178591) [Link]

This all has its roots in the POSIX capabilities model, which was only put to the test in real life after POSIX had died. It failed then and still fails in real life. I see enough of the problems it causes us daily, and it has the hallmarks of a security cult. I fail to see how this newest attempt addresses the underlying problem of capabilities being extremely use-case dependent and thus exploding in our faces. Also, when you do services you quickly realize there are many different angles that lead to this explosion.

run0

Posted Dec 14, 2025 17:56 UTC (Sun) by mb (subscriber, #50428) [Link] (29 responses)

I have long since replaced all my uses of sudo with run0.

How does capsudo compare to run0?

run0 vs capsudo?

Posted Dec 14, 2025 22:45 UTC (Sun) by nickodell (subscriber, #125165) [Link] (1 responses)

One similarity between capsudo and run0 is that rather than using a setuid binary, both run0 and capsudo have a daemon process. Instead of invoking a setuid binary to gain additional privileges, the client process makes an RPC call to the privileged daemon asking it to run the command.

Where they differ is in two things:
1) How many daemon processes may run?
2) How is a user allowed or disallowed to run a command?

1) IIUC, systemd runs a single run0 process, as root, and all authorization decisions are made inside this process. In contrast, capsudo runs as many processes as there are delegated permissions. Each capsudo process may run at a different privilege level.

2) systemd uses polkit to determine whether a user is allowed to run a command. In contrast, capsudo allows a user to specify the command when creating the capability object. Which users are permitted to use the object is determined by standard Linux permissions on the unix socket used to perform RPC.
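To make the "capability object" idea concrete, here is a minimal Python sketch of the model as described above: a daemon bound to a Unix socket runs one fixed command for whoever can connect, and the socket's file mode acts as the access-control list. This is not capsudo's actual protocol or code; the function names, the one-request-then-exit behavior, and the `echo hello` command are illustrative assumptions:

```python
import os
import socket
import subprocess
import tempfile
import threading

def serve_capability(sock_path: str, command: list[str], ready: threading.Event) -> None:
    """Hypothetical one-shot capability daemon: whoever can connect runs `command`."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(sock_path)
        os.chmod(sock_path, 0o660)  # only owner and group may connect
        srv.listen(1)
        ready.set()
        conn, _ = srv.accept()
        with conn:
            # The client cannot choose the command; it can only invoke the capability.
            result = subprocess.run(command, capture_output=True, check=False)
            conn.sendall(result.stdout)

def invoke(sock_path: str) -> bytes:
    """Client side: connecting to the socket is exercising the capability."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        cli.connect(sock_path)
        chunks = []
        while chunk := cli.recv(4096):
            chunks.append(chunk)
        return b"".join(chunks)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "echo-cap.sock")
    ready = threading.Event()
    server = threading.Thread(target=serve_capability,
                              args=(path, ["echo", "hello"], ready))
    server.start()
    ready.wait(timeout=5)
    output = invoke(path)
    server.join()

print(output)  # b'hello\n'
```

Granting a second, different privilege in this model means starting another daemon on another socket, which matches the "one process per delegated permission" point above.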

So, they share some attributes, but I would say that run0 is more similar to sudo than capsudo is similar to sudo.

run0 vs capsudo?

Posted Dec 15, 2025 4:59 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

The thing about run0, however, is that it's little more than a thin bit of glue code between systemd and polkit. It is highly opinionated and relatively inflexible, not because of any technical limitation, but because systemctl (and the rest of systemd) already exists and supports all the flexibility in the world.

Put another way: Comparing capsudo to run0 is more than a little unfair, because run0 is the tip of the iceberg that is systemd.

run0

Posted Dec 15, 2025 6:42 UTC (Mon) by zdzichu (subscriber, #17118) [Link] (26 responses)

I've just tried to use `run0` over an SSH connection to my desktop computer. It appeared to hang. Then I discovered it had displayed a Polkit prompt on the graphical session (which is mainly used for Kodi and playing music, and is hardly ever interacted with via keyboard and mouse). Not really user-friendly.

run0

Posted Dec 15, 2025 12:57 UTC (Mon) by barryascott (subscriber, #80640) [Link] (23 responses)

Do you have DISPLAY or WAYLAND_DISPLAY set when ssh into the system?

run0

Posted Dec 15, 2025 14:03 UTC (Mon) by zdzichu (subscriber, #17118) [Link] (2 responses)

No, I do not.

run0

Posted Dec 15, 2025 14:48 UTC (Mon) by ballombe (subscriber, #9523) [Link] (1 responses)

XDG_RUNTIME_DIR maybe?

run0

Posted Dec 15, 2025 17:30 UTC (Mon) by zdzichu (subscriber, #17118) [Link]

Yes:

XDG_RUNTIME_DIR=/run/user/[myuid]
XDG_SESSION_CLASS=user
XDG_SESSION_ID=5
XDG_SESSION_TYPE=tty

run0

Posted Dec 16, 2025 6:02 UTC (Tue) by intelfx (subscriber, #130118) [Link] (19 responses)

> Do you have DISPLAY or WAYLAND_DISPLAY set when ssh into the system?

No, it does not work that way.

Whatever DE the GP was using on another seat had registered a GUI polkit authentication agent, which subsequently intercepted the authentication prompt.

I wonder if polkit should be patched to avoid contacting authentication agents that belong to another seat (and treat absence of a defined seat as a unique ephemeral seat that does not compare equal to anything but itself).

OTOH, this will break tmux, screen, and any other client-server terminal emulator that launches user processes under `systemd --user`.

run0

Posted Dec 16, 2025 15:34 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (2 responses)

It is inherently hard to get this right in every case. But IMHO we could get reasonably close by having each agent report whether it is a GUI agent or a text-based agent (and/or its controlling terminal/session). Then you could have the requestor specify the same information, and filter the available agents to ones that are likely to be usable:

* If the requestor is a CLI app, and there is no $DISPLAY variable or similar, then contacting GUIs is probably a bad idea. It's probably also a bad idea to contact text-based agents with a different pty/tty, but that might be appropriate if there is no alternative (e.g. we're running under systemd or otherwise do not have a controlling terminal).
* Similarly, if the requestor is a GUI, contacting text-based agents is probably a bad idea.
* Finally, if we're a CLI app with a $DISPLAY, then it might be OK to contact GUIs on that specific display, as well as text-based agents on the same pty/tty. But it should not contact some random other session that has nothing to do with us.
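The three rules can be sketched as a small filter function. Everything here is hypothetical: the `Agent`/`Requestor` types and their `kind`/`display`/`tty` fields are invented for illustration and are not part of polkit's agent registration API. Restricting a GUI requestor to agents on its own display is an added assumption in the spirit of the last rule:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Agent:
    kind: str                      # "gui" or "text"
    display: Optional[str] = None  # display a GUI agent runs on
    tty: Optional[str] = None      # terminal a text agent runs on

@dataclass(frozen=True)
class Requestor:
    is_gui: bool
    display: Optional[str] = None  # $DISPLAY or similar, if set
    tty: Optional[str] = None      # controlling terminal, if any

def usable_agents(req: Requestor, agents: list[Agent]) -> list[Agent]:
    """Filter registered agents to those the requester can plausibly reach."""
    same_display = [a for a in agents if a.kind == "gui"
                    and req.display is not None and a.display == req.display]
    same_tty = [a for a in agents if a.kind == "text"
                and req.tty is not None and a.tty == req.tty]
    if req.is_gui:
        return same_display  # never text agents; only GUIs on our display
    if req.display is None:
        # CLI without a display: never GUIs. Use our own terminal if we have one;
        # otherwise (e.g. started by systemd) fall back to any text agent.
        if req.tty is not None:
            return same_tty
        return [a for a in agents if a.kind == "text"]
    return same_display + same_tty  # CLI with a display

agents = [Agent("gui", display=":0"), Agent("text", tty="/dev/pts/3"),
          Agent("text", tty="/dev/pts/7")]
over_ssh = usable_agents(Requestor(False, tty="/dev/pts/7"), agents)
under_systemd = usable_agents(Requestor(False), agents)
gui_app = usable_agents(Requestor(True, display=":0"), agents)
print(over_ssh)  # [Agent(kind='text', display=None, tty='/dev/pts/7')]
```

In the SSH scenario from upthread, the GUI agent on :0 is never asked.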

run0

Posted Dec 17, 2025 9:12 UTC (Wed) by taladar (subscriber, #68407) [Link] (1 responses)

Alternatively maybe some sort of approach similar to systemd-ask-password could be used where it is possible to call a command to get the prompt to your current terminal even if the automatic prompting is happening somewhere else?

run0

Posted Dec 17, 2025 13:58 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

There is the `wall` implementation. `systemd-tty-ask-password-agent` is not polkit-enabled to answer system requests as a user though. See https://github.com/systemd/systemd-ui/pull/7 for updating a simple GUI agent to modernity. A TTY agent would be nice to have too.

run0

Posted Dec 17, 2025 10:07 UTC (Wed) by mchapman (subscriber, #66589) [Link] (15 responses)

> I wonder if polkit should be patched to avoid contacting authentication agents that belong to another seat (and treat absence of a defined seat as a unique ephemeral seat that does not compare equal to anything but itself).

As far as I know this is supposed to happen already.

A polkit agent is always registered against a particular process or logind session. When polkit needs to talk to an agent, it determines it according to the subject of the authorization: either that process itself (i.e. it's acting as its own agent), or the logind session that owns that process.

Where I have seen this break is when people start Tmux or Screen in their GUI session, then, at a later time, reconnect to that over SSH. When they reconnect to it they are effectively back in that GUI session, even if they aren't sitting at that seat.

run0

Posted Dec 17, 2025 21:50 UTC (Wed) by raven667 (subscriber, #5198) [Link]

This makes a lot of sense, and I wish there were a better understanding of how this stuff actually works, and how it's intended to work, so that it's easier to intuitively guess at possible problems/solutions rather than assuming the system is fundamentally broken and incapable of doing a bunch of things it already does. I sometimes see people going the long way and working around problems which are fundamentally misunderstandings of how a system was designed to work, then complain about how the system is b0rken and incomplete because it doesn't do $THING when it does; they just didn't know how (and sometimes it's _very_ poorly documented, or hard to find). I hate to see people's effort going to waste building systems that don't get used.

run0

Posted Dec 18, 2025 7:38 UTC (Thu) by SLi (subscriber, #53131) [Link] (10 responses)

Is there a fix to this, other than don't use tmux or screen?

run0

Posted Dec 19, 2025 13:56 UTC (Fri) by paulj (subscriber, #341) [Link] (9 responses)

I'm also curious on the answer to this.

run0

Posted Dec 21, 2025 4:37 UTC (Sun) by mathstuf (subscriber, #69389) [Link] (8 responses)

I've posited it elsewhere, but it only helps those running `systemd` (probably not an issue for `run0` users though).

If we could get a polkit agent to forward through password agents[1] and a TTY-based agent that rings the `\a` bell when a new request comes in, you could open it in a screen/tmux window, go over to it, enter the password, then go back to what you were doing. I've asked udiskie to plumb requests through agents. A polkit agent that did the same would be nice to have as well (whether a new one or a change to an existing agent).

I'm willing to work on the TTY password agent part if someone else wants to start on the polkit agent side.

[1] https://systemd.io/PASSWORD_AGENTS/
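The scanning half of such a TTY agent is small. Below is a rough Python sketch based on our reading of the password agent protocol linked above: requests appear as `ask.*` files with an `[Ask]` section naming a reply `Socket=` and a `Message=`, and the agent answers with a datagram of `+` followed by the password, or `-` to cancel. It only scans once (a real agent would also watch the directory with inotify and honor keys such as `NotAfter=`), the function names are hypothetical, and the demo uses a temporary directory instead of /run/systemd/ask-password:

```python
import configparser
import socket
import tempfile
from pathlib import Path
from typing import Optional

ASK_DIR = Path("/run/systemd/ask-password")  # where ask.* request files appear

def pending_requests(ask_dir: Path = ASK_DIR) -> list:
    """Scan ask_dir once and return the requests it describes."""
    requests = []
    for path in sorted(ask_dir.glob("ask.*")):
        parser = configparser.ConfigParser(interpolation=None)
        parser.optionxform = str  # keep key case: Socket, Message, ...
        try:
            parser.read(path)
            ask = parser["Ask"]
        except (configparser.Error, KeyError):
            continue  # malformed or half-written; a later scan may succeed
        if "Socket" not in ask:
            continue
        requests.append({"file": path.name, "socket": ask["Socket"],
                         "message": ask.get("Message", "")})
    return requests

def answer(request: dict, password: Optional[str]) -> None:
    """Reply with '+' followed by the password, or '-' to cancel."""
    payload = b"-" if password is None else b"+" + password.encode()
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, request["socket"])

def announce(requests: list) -> None:
    """Ring the terminal bell and list prompts, e.g. in a dedicated tmux window."""
    if requests:
        print("\a", end="", flush=True)
        for req in requests:
            print(f"{req['file']}: {req['message']}")

# Demo against a temporary directory standing in for ASK_DIR.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    reply_path = str(tmp_dir / "reply.sock")
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as reply_sock:
        reply_sock.bind(reply_path)
        (tmp_dir / "ask.demo").write_text(
            f"[Ask]\nPID=1\nSocket={reply_path}\nMessage=Passphrase for /dev/sda2\n")
        found = pending_requests(tmp_dir)
        announce(found)
        answer(found[0], "hunter2")
        received = reply_sock.recv(4096)
```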

Interaction multiplexing

Posted Dec 21, 2025 14:05 UTC (Sun) by SLi (subscriber, #53131) [Link] (7 responses)

This feels to me something of a specific solution to a generic problem, and my instinct would be to ask if we can solve the class of problems instead. It doesn't apply only to $SSH_ASKPASS but also things like $DISPLAY; and a large part of the problem is that environment variables are frozen at program startup time.

I would frame the generic problem as:

> Given this process, what human-facing I/O endpoints are actually reachable right now, and which one is the user currently attending to, even if that differs from the environment at exec time?

(Now this is still ambiguous to an extent; one can attach to tmux from multiple places.)

Is there a generic way we could approach solving this? I *think* there is.

Terminal managers like tmux are at a position where they can have that information, probably better than anyone else. They just aren't able to modify the environment of the running program to inform it, so there needs to be an out-of-band way to query the terminal emulator for the information.

I first thought of weird query-response OSC escape sequences; but they are tricky, and I don't think they have an advantage in this case, because the one thing tmux *can* provide to all programs is an out-of-band mechanism.

Here's where this line of thought evolved. Option A keeps tmux self-contained and maximally useful on its own; option B minimizes tmux-specific policy and makes it easier to share semantics with other multiplexers or brokers.

# Option A ("flat"):

- We have an environment variable. I'm bad at naming, so I call it INTERACTIVE_CONTEXT.

- it contains a capability-ish _pointer_ to "who can route interaction for me", not the answer:

> INTERACTIVE_CONTEXT=unix:/run/user/1000/interactive/ctx-<token>.sock

- It is set initially by PAM or the systemd user manager (or, in non-systemd contexts, by something built on a mechanism like XDG_RUNTIME_DIR)

- On a GUI login, it points to the GUI session broker. On an SSH login, it points to the SSH session broker.

- Multiplexers like tmux wrap it. When tmux starts a session/window, it sets it to

> INTERACTIVE_CONTEXT=unix:/run/user/1000/interactive/tmux-ctx-<serverid>.sock

- The tmux context service can select "the most suitable client" (last-active, last-attached, view-of-pane, whatever); then either handle prompting itself, forward to the outer context of the chosen client, or maybe just tell "sorry, interaction is not available".

- To forward, tmux needs to remember each client's "outer" INTERACTIVE_CONTEXT at attach time. So when `tmux attach` runs, it captures the attaching environment's INTERACTIVE_CONTEXT and stores it per-client.
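Option A's per-client bookkeeping can be sketched in a few lines of Python. This models only the routing decision, using "most recently active client" as the selection policy (one of the options listed above); the class names, the concrete socket paths, and the plain timestamps are illustrative assumptions, and no real tmux API is involved:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Client:
    name: str
    outer_context: Optional[str]  # INTERACTIVE_CONTEXT captured at `tmux attach`
    last_active: float            # time of the client's most recent input

@dataclass
class MuxContextService:
    """Hypothetical broker behind tmux-ctx-<serverid>.sock (Option A)."""
    clients: Dict[str, Client] = field(default_factory=dict)

    def attach(self, name: str, attaching_env: Dict[str, str], now: float) -> None:
        # Remember the attaching environment's outer context, per client.
        self.clients[name] = Client(name, attaching_env.get("INTERACTIVE_CONTEXT"), now)

    def touch(self, name: str, now: float) -> None:
        self.clients[name].last_active = now

    def detach(self, name: str) -> None:
        self.clients.pop(name, None)

    def route_prompt(self) -> Optional[str]:
        """Outer context of the most recently active client; None means
        'sorry, interaction is not available'."""
        candidates = [c for c in self.clients.values() if c.outer_context]
        if not candidates:
            return None
        return max(candidates, key=lambda c: c.last_active).outer_context

GUI = "unix:/run/user/1000/interactive/ctx-gui.sock"  # hypothetical tokens
SSH = "unix:/run/user/1000/interactive/ctx-ssh.sock"

svc = MuxContextService()
svc.attach("desktop", {"INTERACTIVE_CONTEXT": GUI}, now=100.0)
svc.attach("laptop-ssh", {"INTERACTIVE_CONTEXT": SSH}, now=200.0)
first = svc.route_prompt()   # laptop attached last: SSH broker
svc.touch("desktop", now=300.0)
second = svc.route_prompt()  # user typed at the desk: GUI broker
svc.detach("desktop")
svc.detach("laptop-ssh")
third = svc.route_prompt()   # nobody attached: None
```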

# Option B (delegating pointer chain)

Like option A, but instead of tmux implementing everything, its socket would implement only

- `resolve()` -> returns INTERACTIVE_CONTEXT for best client
- `describe()` -> says "I'm a multiplexer; here's what I'd choose"
- optionally `prompt()` implemented by forwarding (not UI)
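In Option B each socket only answers `resolve()`, and the caller follows the chain of pointers to the broker that should actually prompt. A sketch of that client-side walk in Python, with the chain modeled as a plain function; the names, the hop limit, and the loop check are assumptions (a runaway or circular chain is one of the robustness questions such a design would have to settle):

```python
from typing import Callable, Optional

# resolve(ctx) returns the next context to follow (a multiplexer pointing at its
# best client's outer context), or None when ctx is a broker that prompts itself.
Resolver = Callable[[str], Optional[str]]

def resolve_chain(start: str, resolve: Resolver, max_hops: int = 8) -> str:
    """Follow resolve() pointers from start to the broker that should prompt."""
    seen = {start}
    current = start
    for _ in range(max_hops):
        nxt = resolve(current)
        if nxt is None:
            return current
        if nxt in seen:
            raise RuntimeError(f"context loop at {nxt}")
        seen.add(nxt)
        current = nxt
    raise RuntimeError(f"gave up after {max_hops} hops")

BASE = "unix:/run/user/1000/interactive/"
chain = {  # hypothetical: tmux nested in another tmux that was attached over SSH
    BASE + "tmux-ctx-inner.sock": BASE + "tmux-ctx-outer.sock",
    BASE + "tmux-ctx-outer.sock": BASE + "ctx-ssh.sock",
    BASE + "ctx-ssh.sock": None,
}
endpoint = resolve_chain(BASE + "tmux-ctx-inner.sock", chain.__getitem__)

try:
    resolve_chain("a", {"a": "b", "b": "a"}.__getitem__)
    loop_detected = False
except RuntimeError:
    loop_detected = True
```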

---

I think there are some security implications that need thinking through. I think the most useful model here is that the terminal multiplexer arbitrates "the most suitable client" and external policy decides "what this endpoint is allowed to do". One useful mental model may be to treat INTERACTIVE_CONTEXT as a bearer capability scoped to a login or multiplexer context, rather than as a global per-user resource (instead of "any same-UID process can pop auth prompts everywhere").

Interaction multiplexing

Posted Dec 22, 2025 5:31 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (6 responses)

> > Given this process, what human-facing I/O endpoints are actually reachable right now, and which one is the user currently attending to, even if that differs from the environment at exec time?

Is this really answerable in a generic way? Say I am ssh'd into a machine from my laptop. I attach to a tmux session. How is there any indication that the desktop is at my feet with the screen unlocked and keyboard accessible at the desk I'm sitting in front of? I don't want to manage that state by hand and I really have no idea how one would automate it (maybe some BLE presence metrics, but I am not my phone (or even watch), so that isn't really crossing over the "works most of the time, but enough to be frustrating when it doesn't" threshold in my book).

I don't see why it needs all this extra stuff. You run an agent to answer requests (polkit, passwords, etc.) where you interact (the DE, the I-ssh'd-in tmux session, etc.). Any agent can answer a request; any agent's answer is sufficient. Requests come over the standard protocols (DBus, inotify-on-a-directory) and you manage them given the UI you provide. If you want to know you're in tmux and broadcast it with `tmux display-message -c $client` (or `display-popup`) so that you don't even need the agent's session to be active, that sounds a lot simpler to me.

Interaction multiplexing

Posted Dec 22, 2025 9:23 UTC (Mon) by taladar (subscriber, #68407)

I like the general idea, but why do they need to be requests to agents at all? Wouldn't it be even simpler if they were just notifications about a new password request, so that even seats where the user logged in after the request was created could respond to it, by polling for pending requests after login or when the user explicitly asks for them?

Interaction multiplexing

Posted Dec 22, 2025 14:11 UTC (Mon) by mathstuf (subscriber, #69389)

Polkit sits on DBus, so you're stuck with a "proper" agent there. The systemd protocol is "drop files in a directory", so you could definitely use notification actions to trigger the actual prompt (systemd-gnome-ask-password-agent does this). It also scans on startup and sends notifications for anything already "lying around". All that's needed is something that doesn't rely on a GUI and a notification daemon (the TTY agent I mentioned).
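To make the "drop files in a directory" protocol concrete: as described in systemd's password-agent document, each pending request is an INI-style ask.* file in /run/systemd/ask-password/ whose [Ask] section carries fields such as Message=, Socket=, NotAfter= and PID=. The Python sketch below covers only the startup scan mentioned above; a real agent would also watch the directory with inotify, and the specification, not this sketch, is authoritative for the full set of fields and rules. AskRequest and pending_requests are hypothetical names.

```python
import configparser
import os
import time
from dataclasses import dataclass
from pathlib import Path
from typing import List

# System-wide request directory described in systemd's PASSWORD_AGENTS document.
ASK_DIR = Path("/run/systemd/ask-password")


@dataclass
class AskRequest:
    path: Path
    message: str
    socket_path: str  # where the agent sends its answer
    echo: bool


def _process_alive(pid: int) -> bool:
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user (e.g. root)
    return True


def pending_requests(ask_dir: Path = ASK_DIR) -> List[AskRequest]:
    """Collect requests already "lying around", as an agent does at startup."""
    now_usec = time.clock_gettime_ns(time.CLOCK_MONOTONIC) // 1000
    found = []
    for path in sorted(ask_dir.glob("ask.*")):
        parser = configparser.ConfigParser(interpolation=None)
        try:
            parser.read(path)
            ask = parser["Ask"]
            not_after = ask.getint("NotAfter", fallback=0)
            pid = ask.getint("PID", fallback=0)
            echo = ask.getboolean("Echo", fallback=False)
        except (configparser.Error, KeyError, ValueError):
            continue  # half-written or malformed file: skip it
        if not_after and now_usec > not_after:
            continue  # NotAfter= is a CLOCK_MONOTONIC time in microseconds
        if pid and not _process_alive(pid):
            continue  # the asking process has gone away
        found.append(
            AskRequest(path, ask.get("Message", ""), ask.get("Socket", ""), echo)
        )
    return found
```

Because the requests are plain files, a freshly started agent (or a user logging in later) can discover them without having been registered when they were created.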

Interaction multiplexing

Posted Dec 22, 2025 17:01 UTC (Mon) by SLi (subscriber, #53131)

Yes, I think this makes sense: essentially, decoupling request creation from request fulfillment. It pushes in the direction of making requests durable and discoverable, which I like. I think it's largely orthogonal to the idea of making the routing contextual.

Interaction multiplexing

Posted Dec 22, 2025 16:48 UTC (Mon) by SLi (subscriber, #53131)

> Is this really answerable in a generic way?

No, not in any foolproof way. You cannot infer presence. But you can use heuristics that do better than defaulting to whatever was current when the program started, by answering questions like: which interaction endpoints are currently attached and responsive, and which of those has most recently seen user activity?

tmux, for example, does not know if the local desktop seat is unlocked, but it does know whether there is a client attached over SSH that has recently been active.

So, sure, perfect correctness is probably impossible, but that just limits what the abstraction can promise: being less likely to be completely wrong.

Interaction multiplexing

Posted Dec 23, 2025 14:40 UTC (Tue) by taladar (subscriber, #68407)

Being correct all the time is also not necessarily required if you allow the user to override the heuristic when it guesses wrong, e.g. by running a command on a terminal they do have access to in order to get to the prompt. 100% correctness is hard, but it only becomes necessary when there is no remedy for a wrong guess.
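SLi's heuristic and taladar's override fit in a few lines. This Python sketch assumes tmux's list-clients -F output using the client_activity and client_name format variables; pick_client and popup_command are hypothetical helpers, "most recent activity wins" is just one possible heuristic, and nothing here can tell whether the local seat is unlocked.

```python
import subprocess
from typing import Dict, List, Optional

# tmux format: last-activity time (seconds since the epoch), a tab, the client name.
LIST_FORMAT = "#{client_activity}\t#{client_name}"


def list_tmux_clients() -> str:
    """Query the running tmux server for its attached clients."""
    result = subprocess.run(
        ["tmux", "list-clients", "-F", LIST_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def pick_client(listing: str, override: Optional[str] = None) -> Optional[str]:
    """Most recently active client, unless the user explicitly chose one."""
    activity: Dict[str, int] = {}
    for line in listing.splitlines():
        stamp, _, name = line.partition("\t")
        if name and stamp.isdigit():
            activity[name] = int(stamp)
    if override is not None:
        # The remedy for a wrong guess: the user names the client to use.
        return override if override in activity else None
    if not activity:
        return None  # nothing attached: interaction is not available
    return max(activity, key=activity.get)


def popup_command(client: str, agent_cmd: str) -> List[str]:
    """argv that opens agent_cmd in a popup on the chosen tmux client."""
    return ["tmux", "display-popup", "-c", client, "-E", agent_cmd]
```

When pick_client returns a name, the argv from popup_command could be passed to subprocess.run to open an agent on that client (display-popup needs tmux 3.2 or later); when it returns None, the caller should report that interaction is not available rather than guess.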

Interaction multiplexing

Posted Dec 22, 2025 16:58 UTC (Mon) by SLi (subscriber, #53131)

> I don't see why it needs all this extra stuff. You run an agent to answer requests (polkit, passwords, etc.) where you interact (the DE, the I-ssh'd-in tmux session, etc.). Any agent can answer a request; any agent's answer is sufficient. Requests come over the standard protocols (DBus, inotify-on-a-directory) and you manage them given the UI you provide. If you want to know you're in tmux and broadcast it with `tmux display-message -c $client` (or `display-popup`) so that you don't even need the agent's session to be active, that sounds a lot simpler to me.

A broadcast model like this would probably work for a password prompt, in a sense. It would fail for other examples of the same problem class ("which DISPLAY to connect to"). Even for the password prompt, I think it does have some downsides.

First, it's not very symmetric across transports: GUI agents work well with dbus and desktop notifications, TTY agents require explicit discovery and focus, and headless or multiplexed contexts are rather second-class.

Second, I think a generally useful principle is "route this request to the same place the user typed the command". A broadcast model of, essentially, "route this request somewhere and hope the user notices" is sufficient for things like boot-time disk unlocks, but more frustrating for interactive admin commands over ssh+tmux.

So, yes, I think the broadcast model works, but it gives up the correlation. Maybe that's sufficient.
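The trade-off can be made concrete with a hypothetical Router (all names invented here): it keeps the correlation when the request recorded where the command was typed and that endpoint is still registered, and degrades to the broadcast model otherwise.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Deliver = Callable[[str], None]


@dataclass
class Request:
    text: str
    # Context in which the user typed the command, if it was recorded.
    origin: Optional[str] = None


@dataclass
class Router:
    endpoints: Dict[str, Deliver] = field(default_factory=dict)

    def route(self, req: Request) -> List[str]:
        """Deliver req; return the names of the endpoints that received it."""
        if req.origin in self.endpoints:
            # Correlated: show the prompt where the command was typed.
            self.endpoints[req.origin](req.text)
            return [req.origin]
        # Uncorrelated fallback: broadcast and hope the user notices.
        for deliver in self.endpoints.values():
            deliver(req.text)
        return list(self.endpoints)
```

Requests with no recorded origin, such as a boot-time disk unlock, take the broadcast path, which matches the cases where broadcasting was judged sufficient.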

run0

Posted Dec 18, 2025 13:25 UTC (Thu) by intelfx (subscriber, #130118)

> polkit agent is always registered against a particular process or logind session. When polkit needs to talk to an agent, it determines it according to the subject of the authorization: either that process itself (i.e. it's acting as its own agent), or the logind session that owns that process.

Ah, but the part about how to treat the "absence of a defined seat" was load-bearing in my reply.

> Where I have seen this break is when people start Tmux or Screen in their GUI session, then, at a later time, reconnect to that over SSH. When they reconnect to it they are effectively back in that GUI session, even if they aren't sitting at that seat.

In my personal setup, tmux server runs under `systemd --user`, outside of any defined session (and certainly outside of the session that owns seat0). Yet, when I run pkexec under tmux, it still gets ahold of the GUI authentication agent, despite having no reason to do so.

Moreover, I went to some pains to forward the proper environment into each new tmux pane (so that a pane created from an SSH connection inherits specific environment variables of that connection), and it does inherit the SSH connection's $XDG_SESSION_ID. But none of this keeps polkit from contacting the GUI agent.

run0

Posted Dec 18, 2025 13:32 UTC (Thu) by intelfx (subscriber, #130118)

> In my personal setup, tmux server runs under `systemd --user`, outside of any defined session

Slight correction: `systemd --user` is not outside of any defined session — it does, in fact, run under its own logind session (of class "manager"). But that does not materially change what I said (it's still not the session that owns seat0).

run0

Posted Dec 22, 2025 7:04 UTC (Mon) by mchapman (subscriber, #66589)

Ah, you're right. Polkit explicitly asks logind for the user's current "display" session if a session can't be identified for the process.

Perhaps it shouldn't do that, since it thwarts any attempt by that process to provide a fallback agent (which all the systemd tools do when they're using polkit).
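A rough sketch of why the tmux case can land on the GUI agent, under stated assumptions: that the process-to-session lookup is derived from the session-N.scope unit in the process's cgroup path (which is how sd_pid_get_session() has traditionally worked), and that, as described above, polkit falls back to the user's display session when that lookup fails. A tmux server started under systemd --user lives below user@UID.service rather than in a session scope, so this lookup finds nothing, and an inherited $XDG_SESSION_ID never enters into it. Newer systemd versions that give the user manager its own "manager" session may behave differently; the helper names are hypothetical and this is not polkit's actual code.

```python
import re
from pathlib import Path
from typing import Optional

# Matches a logind session scope unit such as "session-3.scope" or "session-c2.scope".
_SESSION_SCOPE = re.compile(r"/session-([^/]+)\.scope(?:/|$)")


def session_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Session ID implied by /proc/PID/cgroup contents, or None."""
    for line in cgroup_text.splitlines():
        # Each line is "hierarchy-ID:controllers:path"; cgroup v2 uses "0::path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = _SESSION_SCOPE.search(parts[2])
        if match:
            return match.group(1)
    return None


def session_for_pid(pid: int) -> Optional[str]:
    return session_from_cgroup(Path(f"/proc/{pid}/cgroup").read_text())


def agent_session(pid_session: Optional[str], display_session: Optional[str]) -> Optional[str]:
    """Fallback as described above: no session of its own -> display session."""
    return pid_session if pid_session is not None else display_session
```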

run0

Posted Dec 18, 2025 11:25 UTC (Thu) by mgedmin (subscriber, #34497)

I've had that experience trying to unlock a LUKS device with udisksctl over ssh. I had to use a remote desktop client to see the polkit auth prompt.

run0

Posted Dec 18, 2025 18:11 UTC (Thu) by mathstuf (subscriber, #69389)

I wonder if it would be useful to have udiskie forward its requests through password agents[1], so that a TTY-based agent could be used for that use case. I filed an issue with `udiskie`[2] to do this. All that would then be needed is a terminal UI that listens for password-agent requests and offers a selector between them; you could pop one open in tmux/screen and leave it there.

[1] https://systemd.io/PASSWORD_AGENTS/
[2] https://github.com/coldfix/udiskie/issues/330
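The terminal UI described above needs little more than a selector and a reply. Per the password-agent document [1], an agent answers by sending a single datagram to the AF_UNIX socket named in the request's Socket= field: "+" followed by the password, or "-" to cancel. The Python sketch below uses hypothetical helper names and assumes the caller already has the list of pending requests. In practice those sockets are normally writable only by privileged processes, so an unprivileged agent would go through a helper (systemd ships systemd-reply-password for this); the specification should be consulted for details.

```python
import socket
from typing import Callable, List, Optional


def choose(messages: List[str], read: Callable[[str], str] = input) -> Optional[int]:
    """Minimal terminal selector: list pending requests, return the chosen index."""
    for number, message in enumerate(messages, start=1):
        print(f"{number}) {message}")
    reply = read("Answer which request? ").strip()
    if reply.isdigit() and 1 <= int(reply) <= len(messages):
        return int(reply) - 1
    return None  # invalid choice: leave all requests pending


def answer(socket_path: str, password: Optional[str]) -> None:
    """Reply with one datagram: "+" and the password, or "-" to cancel."""
    payload = b"-" if password is None else b"+" + password.encode()
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, socket_path)
```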


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds