Hashimoto: We rewrote the Ghostty GTK application
Mitchell Hashimoto has written a blog post about "fully embracing the GObject type system" with a rewrite of the GTK version of Ghostty:
In addition to memory management [improvements], we can now more easily create custom GTK widgets. This let us fully embrace modern GTK UI technologies such as Blueprint. For example, here is our terminal window Blueprint file. This has already led to more easily introducing GUI features like a new GTK titlebar tabs option, an animated border on bell, etc.
The rewrite is now the default if one builds Ghostty from source, and will be included in the 1.2 release that is expected in the next few weeks. LWN covered Ghostty in January.
GObject type system: slow?
Posted Aug 15, 2025 20:52 UTC (Fri) by atai (subscriber, #10977) [Link] (27 responses)
Subjectively, for a long time, especially before GPU acceleration becoming common, Qt apps felt faster than the corresponding gtk+ ones (
GObject type system: slow?
Posted Aug 15, 2025 23:44 UTC (Fri) by intelfx (subscriber, #130118) [Link] (2 responses)
I don't think C++ STL data structures (which are, btw, *very* far from optimal) fill the same niche as the GObject type system.
Essentially, GLib/GObject is a runtime introspection system on top of classes and virtual dispatch "emulated" in C code, just like Qt is a runtime introspection system (MOC/QObject) on top of classes and virtual dispatch written in C++ code. There is no difference (at least conceptually, quality of implementation aside). Why would C preprocessor macros and hand-rolled vtables be "slower" than C++ classes and automatically built vtables?
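As a sketch of that equivalence (hypothetical widget types, not GObject's actual macros or API), here is a hand-rolled vtable next to a native C++ one; on mainstream ABIs both compile down to "load table pointer, load slot, indirect call":
---
#include <cstdio>

// Hand-rolled vtable in the GObject style, written in C++ for comparison.
struct WidgetClass {
    void (*draw)(struct Widget *self);  // a "virtual" slot as a function pointer
};
struct Widget {
    const WidgetClass *klass;           // per-instance pointer to the class table
};
void widget_draw(Widget *w) { w->klass->draw(w); }  // manual indirect dispatch

// The same shape using native C++ virtual functions.
struct CxxWidget {
    virtual ~CxxWidget() = default;
    virtual void draw() = 0;            // compiler builds the vtable for us
};
void cxx_widget_draw(CxxWidget *w) { w->draw(); }   // compiler-generated dispatch

static void label_draw(Widget *) { std::puts("label (hand-rolled vtable)"); }
static const WidgetClass label_class = { label_draw };

struct CxxLabel : CxxWidget {
    void draw() override { std::puts("label (C++ vtable)"); }
};

int main() {
    Widget label{ &label_class };
    widget_draw(&label);
    CxxLabel cxx_label;
    cxx_widget_draw(&cxx_label);
}
---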
GObject type system: slow?
Posted Aug 16, 2025 1:20 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)
GObject type system: slow?
Posted Aug 16, 2025 1:39 UTC (Sat) by intelfx (subscriber, #130118) [Link]
But honestly, it sounds like a design issue in whatever application does that. Why would you even need to ref/unref GObjects concurrently at such a rate that the cache coherence overhead becomes significant? It's not like it's impossible to abuse QSharedPointers either.
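For illustration, a minimal sketch of the kind of abuse being described (hypothetical object, not any real application's code): several threads doing nothing but ref/unref on one shared object, so every atomic update has to win exclusive ownership of the same cache line:
---
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical refcounted object; only the shared atomic counter matters here.
struct Shared {
    std::atomic<long> refs{1};
    void ref()   { refs.fetch_add(1, std::memory_order_relaxed); }
    void unref() { refs.fetch_sub(1, std::memory_order_acq_rel); }
};

int main() {
    Shared obj;
    std::vector<std::thread> threads;
    // Four cores bounce the counter's cache line back and forth; the atomic
    // operations serialize the cores even though no real work is shared.
    for (int t = 0; t < 4; ++t)
        threads.emplace_back([&obj] {
            for (int i = 0; i < 1000000; ++i) { obj.ref(); obj.unref(); }
        });
    for (auto &th : threads) th.join();
}
---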
GObject type system: slow?
Posted Aug 16, 2025 3:00 UTC (Sat) by DemiMarie (subscriber, #164188) [Link] (7 responses)
My understanding is that entity-component-system is much more friendly to modern hardware.
GObject type system: slow?
Posted Aug 17, 2025 17:54 UTC (Sun) by emk (subscriber, #1128) [Link] (5 responses)
In a constrained, high-performance domain like video games, it can be a great performance optimization. But language support is iffy, and it's an alien programming paradigm for a lot of programmers. People who want to get a feel for what kinda-sorta decent language support would feel like should try writing a simple game (like Space Invaders) in Rust, using the Bevy engine. It's not all sunshine and daffodils, unfortunately, but it's an interesting experience.
Personally, I wouldn't bother using ECS for GUI building unless the language I was using was deeply hostile to mutable object trees.
Hardware strongly prefers ECS
Posted Aug 18, 2025 15:48 UTC (Mon) by DemiMarie (subscriber, #164188) [Link] (4 responses)
My understanding is that ECS has much better performance on modern hardware, unless algorithmic overheads are too high. Furthermore, the data size one must reach for other solutions to be better is likely to increase with time.
I’m no expert on performance, but to the best of my knowledge modern CPUs are incredibly fast, provided one can minimise cache misses and branch mispredictions and make effective use of SIMD. Solutions that can do this will be much faster than those that cannot, and even asymptotic improvements cannot overcome that unless the data size is quite large. Also, flat data layouts with minimal branching unlock offloading to accelerators like GPUs, which can be another huge performance win.
It’s all about what the hardware needs in order to keep the execution units busy and fed with data. Code using sea-of-pointers data structures can’t do that unless it can make very effective use of prefetching. Code using large contiguous arrays can.
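A small sketch of the two layouts (illustrative only): the list traversal is a chain of dependent loads, while the flat loop has addresses the prefetcher can compute ahead of time and a body the compiler can vectorize:
---
#include <memory>
#include <numeric>
#include <vector>

// Sea of pointers: the next node's address is known only after the
// current load completes, so the prefetcher and SIMD units sit idle.
struct Node { int value = 0; std::unique_ptr<Node> next; };

long sum_list(const Node *n) {
    long total = 0;
    for (; n != nullptr; n = n->next.get()) total += n->value;  // dependent loads
    return total;
}

// Flat layout: contiguous data streams through the cache, and the loop
// is trivially vectorizable.
long sum_array(const std::vector<int> &v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}

int main() {
    auto head = std::make_unique<Node>();
    head->value = 1;
    head->next = std::make_unique<Node>();
    head->next->value = 2;
    std::vector<int> flat = {1, 2};
    return sum_list(head.get()) == sum_array(flat) ? 0 : 1;
}
---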
Tradeoffs for ECS versus tree of objects
Posted Aug 18, 2025 18:06 UTC (Mon) by farnz (subscriber, #17727) [Link] (3 responses)
It's a tradeoff, as with so much in computer science. An ECS architecture, used naïvely, is as bad as a tree of objects architecture; as long as you always go from object/entity to interesting fields/components of that object/entity, both are equally awful for performance.
Where ECS gets you a big win is when you're looking at a subset of interesting components of many entities. With tree of objects, the best I can do is have a work queue of objects with interesting fields on them, sorted by object location. ECS lets me do better; I can have a work queue of tuples of interesting components, which I can sort by component location, and I get much better locality of reference (since I'm now looking at interesting components, not whole objects, and doing so in a predictable fashion).
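As a toy illustration of that layout (not any particular ECS library): components of one kind live contiguously, an entity is just an index, and a system streams over exactly the component arrays it cares about:
---
#include <cstdio>
#include <vector>

struct Position { float x, y; };
struct Velocity { float dx, dy; };

int main() {
    // One contiguous array per component kind, one slot per entity.
    std::vector<Position> positions  = {{0, 0}, {1, 1}, {2, 2}};
    std::vector<Velocity> velocities = {{1, 0}, {0, 1}, {1, 1}};

    // The "system" touches only the two interesting component arrays, in
    // index order: predictable, cache-friendly, and independent of how
    // many other components the entities also carry.
    for (std::size_t e = 0; e < positions.size(); ++e) {
        positions[e].x += velocities[e].dx;
        positions[e].y += velocities[e].dy;
    }
    std::printf("entity 0 moved to (%g, %g)\n", positions[0].x, positions[0].y);
}
---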
Tradeoffs for ECS versus tree of objects
Posted Aug 18, 2025 18:48 UTC (Mon) by daroc (editor, #160859) [Link] (2 responses)
But I do agree that they're a tradeoff; there's a subset of problems where they work well, and a subset where they don't.
Tradeoffs for ECS versus tree of objects
Posted Aug 18, 2025 19:20 UTC (Mon) by farnz (subscriber, #17727) [Link] (1 responses)
Even though the components may be the same size, they can have padding of their own; a 30 byte component that needs 4 byte alignment still needs 2 bytes of padding when stored in ECS form, and the object may (through sheer chance) be better by not needing padding (if there's a way to rearrange the fields so that something with 2 byte alignment fills the gap).
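Those numbers, checked in code (an illustrative component, nothing more): 30 bytes of payload with 4-byte alignment rounds up to 32 bytes per array element:
---
#include <cstdint>

struct Component {
    std::int32_t words[7];  // 28 bytes, forcing 4-byte alignment
    std::int8_t  tail[2];   // +2 bytes of payload = 30 bytes total
};
static_assert(alignof(Component) == 4, "4-byte alignment");
static_assert(sizeof(Component) == 32, "30 payload bytes + 2 padding bytes");

int main() {}
---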
Tradeoffs for ECS versus tree of objects
Posted Aug 18, 2025 19:54 UTC (Mon) by daroc (editor, #160859) [Link]
GObject type system: slow?
Posted Aug 21, 2025 15:22 UTC (Thu) by anton (subscriber, #25547) [Link]
> Indirect dispatch is a good way to stall the pipeline of a modern CPU.
Not particularly. Some branches (direct or indirect) are hard to predict, and on a misprediction all instructions behind the branch are canceled and the whole pipeline is restarted to follow the correct path; there is no actual pipeline stall involved (that's something from the times of simpler microarchitectures from the mid-1980s).
Many indirect dispatches are well predictable. In those cases where they are not (e.g., when you are processing an array with objects that are instances of pseudo-random classes), if you implement the same functionality with conditional branches, such as in "Efficient Dynamic Dispatch without Virtual Function Tables. The SmallEiffel Compiler.", or by writing an imperative program that contains some ifs, you will probably also get branch mispredictions.
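A sketch of the two spellings of dispatch (illustrative types): with a pseudo-random mix of shapes, the virtual call mispredicts on the indirect branch and the tag-based version mispredicts on the conditional branch; neither spelling removes the unpredictability:
---
#include <vector>

struct Shape { virtual ~Shape() = default; virtual double area() const = 0; };
struct Square : Shape { double s = 2; double area() const override { return s * s; } };
struct Circle : Shape { double r = 1; double area() const override { return 3.14159 * r * r; } };

// Indirect dispatch: well predicted when the sequence of classes is
// regular, mispredicted when it is pseudo-random.
double total_area(const std::vector<Shape *> &shapes) {
    double total = 0;
    for (const Shape *s : shapes) total += s->area();
    return total;
}

// The same functionality with conditional branches on a type tag, in the
// spirit of the SmallEiffel approach: the unpredictability just moves.
enum class Tag { square, circle };
struct TaggedShape { Tag tag; double dim; };

double total_area_tagged(const std::vector<TaggedShape> &shapes) {
    double total = 0;
    for (const TaggedShape &s : shapes)
        total += s.tag == Tag::square ? s.dim * s.dim : 3.14159 * s.dim * s.dim;
    return total;
}

int main() {
    Square sq; Circle ci;
    std::vector<Shape *> shapes = {&sq, &ci};
    std::vector<TaggedShape> tagged = {{Tag::square, 2}, {Tag::circle, 1}};
    return total_area(shapes) == total_area_tagged(tagged) ? 0 : 1;
}
---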
GObject type system: slow?
Posted Aug 16, 2025 8:07 UTC (Sat) by quotemstr (subscriber, #45331) [Link] (15 responses)
Why would they be? At the end of the day, you're invoking functions through vtables. The C++ compiler doesn't have magical powers that let it generate better vtables. (Clang will happily devirtualize manually-coded vtable calls too.) Macros and C++ templates both boil down to plain old codegen once you strip away the ergonomics. Qt guarantees ABI compatibility so it's certainly not doing monomorphization across interface boundaries --- not that it'd help anyway.
> Subjectively, for a long time, especially before GPU acceleration becoming common, Qt apps felt faster than the corresponding gtk+ ones (
The perceived performance difference is a combination of your imagination (blinding is hard) and language-independent design choices that differ between the toolkits. For example, Qt uses hierarchy-based memory management for some things while GObject and GTK use reference counting exclusively. That's not a C++-vs-portable-ABI thing. It's a software architecture thing.
It's always tempting to point to infrastructure as the cause of performance gaps. It's right there, where you can see it. Everyone knows C++ is faster than C, so GTK is slower obviously because it's in C, yes? Everyone knows JNI is slow, so OF COURSE some random Java program's perf problems come from language interop issues. Everyone knows Python is slow, so OF COURSE it's because of cpython or the GIL or whatever that a Python program is slow. Everyone knows TCP is slow, so OF COURSE if your program is laggy, you should switch to QUIC, right?
Life is seldom so simple.
Performance issues usually have a lot more to do with what a program is doing than the system the programmer used to express the doing. Yet myths about spelling refuse to die, in part because ultimately irrelevant differences show up in microbenchmarks, and in part because when someone rewrites a program using technology X in technology Y, the new program is often faster, and the programmer often attributes the difference to Y being better than X --- incorrectly, usually, because rewrites clear away cruft and often just do less than the originals, so of course they seem faster at first. The software matures and the cycle repeats.
No, GObject as implementation strategy will not affect a terminal program's performance in any meaningful way. It will, however, make it easy to work with that terminal system from any language and evolve its interface in an ABI-stable manner. *That's* the GObject differentiator.
GObject type system: slow?
Posted Aug 16, 2025 11:34 UTC (Sat) by tialaramex (subscriber, #21167) [Link] (14 responses)
Maybe. It's very easy for a language to inadvertently encourage programmers to write an expensive thing unnecessarily. So in a sense the expensive thing is "What a program is doing" but the programmer never intended that.
C++ implicit conversion is a nasty example of this. In Rust, if you needed a String but provided an &str, that won't compile; read the diagnostics. In C++, if you needed a std::string but provided a char*, that compiles... to code which will make a std::string, initialize it from the value of the char*, and then carry on.
The Rust programmer might curse and add to_owned(), or their preferred way to express what they meant, but now they know what's happening; the C++ programmer remains ignorant until the performance numbers are terrible.
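The C++ half of that, spelled out (a hypothetical function, for illustration): the call site compiles cleanly, and every invocation silently allocates a std::string just so the callee can read the bytes:
---
#include <string>

// Takes const std::string&, so a char* argument triggers an implicit
// conversion: allocate, copy, destroy, on every call.
static std::size_t count_dots(const std::string &s) {
    std::size_t n = 0;
    for (char c : s) n += (c == '.');
    return n;
}

int main() {
    const char *host = "www.example.com";
    return static_cast<int>(count_dots(host));  // silently builds a std::string
}
---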
GObject type system: slow?
Posted Aug 16, 2025 16:21 UTC (Sat) by linuxrocks123 (subscriber, #34648) [Link] (5 responses)
If you're going to modify the string in the callee, there's absolutely no way around doing an strcpy, no matter what language you're in, because you need the original char array for the next time you make the call.
If you're NOT going to modify the string in the callee, then C++ has been getting a lot smarter about initializing stuff at compile-time, which would result in not doing the unnecessary initialization you're describing. See, for instance, https://quuxplusone.github.io/blog/2023/09/08/constexpr-s...
GObject type system: slow?
Posted Aug 16, 2025 18:16 UTC (Sat) by tialaramex (subscriber, #21167) [Link] (4 responses)
The "innovation" which is maybe more relevant is std::string_view from 2017. This finally gave C++ a type analogous to &str except as a standard library type rather than a primitive. Before that the choices were to take const std::string (invoking the needless conversion I discussed but promising not to change the string) or char* (a raw pointer with unclear semantics).
We know this caused the overhead I'm talking about in real codebases such as Google's Chrome browser. My contention is that while not having a basic string slice reference until 2017, and not having a basic slice reference until 2020 (yes std::span took longer) is silly, there is always more of this because the root cause is the implicit conversion which is a core language feature, and that similar features exist in many languages, causing the intent of the programmer to diverge from "what the program does" with significant performance implications.
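For contrast with the example above (same hypothetical function), the post-2017 spelling: std::string_view is just a pointer and a length, so literals, std::strings, and slices all pass without allocating:
---
#include <string>
#include <string_view>

static std::size_t count_dots(std::string_view s) {
    std::size_t n = 0;
    for (char c : s) n += (c == '.');
    return n;
}

int main() {
    std::string owned = "www.example.com";
    return static_cast<int>(count_dots(owned)    // views the buffer, no copy
                          + count_dots("a.b.c")  // no allocation for literals
                          + count_dots(std::string_view(owned).substr(0, 3)));  // a slice
}
---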
GObject type system: slow?
Posted Aug 16, 2025 21:48 UTC (Sat) by linuxrocks123 (subscriber, #34648) [Link] (3 responses)
Although I don't doubt you that there are programmers who used the implicit conversion that way in performance-critical code, and now aren't because of string_view, when I first looked at string_view, I thought, "that's weird, why would anyone want that?", and I still don't think I've ever used it or ever would.
If I needed the semantics of string_view in performance-critical code, I would have the method take const char* and pass str.c_str() plus whatever constant I needed to the callee. Maybe I'd also pass a length parameter to the method if I didn't want the entire tail of the string and the desired length was not obvious from context. I don't think string_view adds any value to that approach. In fact, I think it subtracts value since it's not obvious to novices that it causes nasal demons if the host string gets destructed.
But, hey, my way still works, so I don't mind if there's also another way others find easier. Good for them.
GObject type system: slow?
Posted Aug 20, 2025 2:17 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (2 responses)
IME, much of the perf benefit comes from not having to call `strlen` on every C-string API. For example, comparisons can short-circuit on length comparisons instead of doing some `memcmp` SIMD shenanigans. This really helps the `if (str == "literal") { /* … */ } else if (str == "other_literal") { /* … */ }` trees when you can suffix those literals with `_s` and have a compile-time length just baked into the binary for each comparison[1].
On the API side, the main benefit of `string_view` is that it communicates that the string is only looked at. If the callee always adopts the argument value (e.g., by moving it into a member), just taking a `std::string` by value is best. This lets the caller then use `std::move` themselves if *they* can relinquish ownership and just elide an entire allocate/memcpy/deallocate dance. So most `std::string const&` parameters should instead be a by-value `std::string_view` so that you can "pass the buck" on where the `strlen` is called and have it appropriately accounted to the site which *is* just slinging plain C strings around.
Yes, you have the lifetime footguns with `std::string_view`…but that's really no different than with `const char*` juggling either.
[1] It still helps to get a general frequency of comparisons so you can sort the tree in an optimal way and put the "never taken" branches at the end of the chain, but the hidden `strlen` calls really don't help things at all.
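A sketch of that comparison tree with views (the standard-library spelling of the suffix is sv, from std::string_view_literals; a project could define its own _s equivalent): both operands carry a length, so mismatched lengths never reach memcmp:
---
#include <string_view>

using namespace std::string_view_literals;

static int classify(std::string_view str) {
    // Each literal's length is baked into the binary by the sv suffix;
    // operator== compares lengths first, then does at most one memcmp.
    if (str == "literal"sv) return 1;
    else if (str == "other_literal"sv) return 2;
    return 0;
}

int main() { return classify("nope"sv); }
---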
GObject type system: slow?
Posted Aug 20, 2025 8:55 UTC (Wed) by linuxrocks123 (subscriber, #34648) [Link] (1 responses)
strlen is supposed to be constexpr for string literals, so what you said didn't make sense to me. I tried it out on my machine just now to make sure:
---
#include <string>
using namespace std;
bool is_elf_friend(string x)
{
return x=="mellon";
}
---
This compiles to:
---
_Z13is_elf_friendNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE:
.LFB1425:
.cfi_startproc
xorl %eax, %eax
cmpq $6, 8(%rdi)
je .L7
ret
.p2align 4,,10
.p2align 3
.L7:
movq (%rdi), %rax
cmpl $1819043181, (%rax)
je .L8
.L3:
movl $1, %eax
.L4:
testl %eax, %eax
sete %al
ret
.p2align 4,,10
.p2align 3
.L8:
cmpw $28271, 4(%rax)
jne .L3
xorl %eax, %eax
jmp .L4
.cfi_endproc
---
That cmpq $6 instruction is doing an obvious immediate comparison based on the length of "mellon", followed by an early return if the length of the parameter does not match. So, I don't think you're correct.
GObject type system: slow?
Posted Aug 21, 2025 13:45 UTC (Thu) by mathstuf (subscriber, #69389) [Link]
Even without that, the different connotations implied at the API boundaries are still useful regardless IMO.
GObject type system: slow?
Posted Aug 16, 2025 20:17 UTC (Sat) by Sesse (subscriber, #53779) [Link] (7 responses)
GObject type system: slow?
Posted Aug 20, 2025 2:22 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (6 responses)
I really wish C++ would get a `Cow` type so that I can stop allocating space for static strings all over the place just because a handful of places use runtime values.
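A minimal sketch of such a type (purely hypothetical, not a standard or proposed API): borrow statically-known strings for free, and allocate only when a runtime value or a mutation actually needs it:
---
#include <string>
#include <string_view>
#include <variant>

class CowString {
    std::variant<std::string_view, std::string> data_;  // Borrowed | Owned
public:
    CowString(std::string_view borrowed) : data_(borrowed) {}
    CowString(std::string owned) : data_(std::move(owned)) {}

    std::string_view view() const {                 // read access never copies
        if (auto *sv = std::get_if<std::string_view>(&data_)) return *sv;
        return std::get<std::string>(data_);
    }
    std::string &to_mut() {                         // clone-on-write upgrade
        if (auto *sv = std::get_if<std::string_view>(&data_))
            data_ = std::string(*sv);               // the single copy happens here
        return std::get<std::string>(data_);
    }
};

int main() {
    using namespace std::string_view_literals;
    CowString label("static default"sv);             // no allocation
    label.to_mut() += '!';                           // first mutation copies
    CowString runtime(std::string("user:") + "42");  // allocates, as it must
    return label.view() == "static default!"sv ? 0 : 1;
}
---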
GObject type system: slow?
Posted Aug 20, 2025 7:35 UTC (Wed) by Sesse (subscriber, #53779) [Link]
GObject type system: slow?
Posted Aug 20, 2025 18:03 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)
CoW ended up not working out. Atomic refcounting of the string buffer ate up all the time savings. It also violated the exception safety guarantees specified in the C++ standard.
GObject type system: slow?
Posted Aug 21, 2025 13:51 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (3 responses)
In CMake's implementation (where most of my C++ code ends up lately), this would mean that when a new variable scope is created (a function call or `add_subdirectory` call), we could store variable values as either "value from parent scope" (Borrow) or "custom local value" (Owned) and avoid copying *all* of the variable values into a new scope object. When the scope overwrites the value, we just unconditionally write it as an "Owned" value rather than whatever was there before. You then also only pay for O(n_local_variables) rather than O(n_variables_in_call_stack).
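A sketch of that scheme (illustrative, not CMake's actual code): a child scope stores only its own writes and borrows everything else from its parent chain:
---
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>

struct Scope {
    const Scope *parent = nullptr;
    std::unordered_map<std::string, std::string> locals;  // Owned values only

    // A hit in `locals` is Owned; anything found further up the chain is,
    // in effect, Borrowed from an enclosing scope.
    std::optional<std::string_view> get(const std::string &name) const {
        for (const Scope *s = this; s != nullptr; s = s->parent) {
            auto it = s->locals.find(name);
            if (it != s->locals.end()) return std::string_view(it->second);
        }
        return std::nullopt;
    }
    // Writes always become Owned in the current scope, so the cost is
    // O(n_local_variables), not O(n_variables_in_call_stack).
    void set(std::string name, std::string value) {
        locals[std::move(name)] = std::move(value);
    }
};

int main() {
    Scope global;
    global.set("CMAKE_CXX_STANDARD", "20");
    Scope function{&global};            // entering a scope copies nothing
    function.set("local_only", "yes");  // local writes shadow, locally
    return function.get("CMAKE_CXX_STANDARD").has_value() ? 0 : 1;
}
---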
GObject type system: slow?
Posted Aug 21, 2025 22:42 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)
Ah, so the data never dies and thus doesn't need refcounting? This can work. But I think only with static data.
GObject type system: slow?
Posted Aug 21, 2025 23:47 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (1 responses)
GObject type system: slow?
Posted Aug 22, 2025 9:14 UTC (Fri) by excors (subscriber, #95769) [Link]
The Cow can be implicitly coerced to (non-mut) &str, so the Cow is often an implementation detail that the caller doesn't have to care about - they can just treat it like a &str. But if they want to mutate it, or want it to outlive `v`, they can explicitly convert it to an owned type (at which point it will clone `v`, if necessary).
It sounds like the problem with pre-C++11 COW std::string was that seemingly read-only operations like `c = s[0]` (where `s` is not declared as const) might have to unshare `s`, because `operator[]` doesn't know whether the caller is going to mutate the reference that's returned, and that made it too easy to write subtly buggy code with data races or unexpected reference invalidation. C++11 added some requirements to prevent a COW implementation and reduce that risk. (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/...)
std::borrow::Cow avoids that problem because unsharing can only happen via an exclusive `&mut` reference, so the borrow checker guarantees there aren't any other references that might be unexpectedly changed.
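The hazard in miniature (nothing here is the old COW libstdc++ code, just the shape of the problem): on a non-const string, operator[] must return a mutable char&, and it cannot see whether the caller will write through it:
---
#include <string>

int main() {
    std::string s = "hello";
    const std::string &cs = s;
    char c1 = cs[0];  // const overload: a COW string could stay shared
    char c2 = s[0];   // non-const overload: a COW string had to unshare here,
                      // even though this caller only reads the reference
    return c1 == c2 ? 0 : 1;
}
---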
Managing lifetimes bugs?
Posted Aug 18, 2025 9:24 UTC (Mon) by ehiggs (subscriber, #90713) [Link] (3 responses)
"""
Whatever your feelings are about OOP and memory management, the reality is that if you choose GTK, you're forced into interfacing in some way with the GObject type system. You can't avoid it. Well you can avoid it and we did avoid it. And it leads to a mess trying to tie the lifetimes of your non-reference-counted objects to the reference counted ones. There was an entire class of bug that kept popping up in the Ghostty GTK application that could basically be summed up as: the Zig memory or the GTK memory has been freed, but not both.
"""
Gosh, they sound like really annoying bugs. Debugging those issues must have caused people to utter some 4 letter words.
Mostly the one beginning with 'R'.
Managing lifetimes bugs?
Posted Aug 18, 2025 13:13 UTC (Mon) by pizza (subscriber, #46) [Link] (2 responses)
> Mostly the one beginning with 'R'.
So... please explain how that four-letter-R-word would have made any difference when you've deliberately ignored the built-in lifetime management mechanisms of the codebase you're using?
Managing lifetimes bugs?
Posted Aug 18, 2025 15:21 UTC (Mon) by tialaramex (subscriber, #21167) [Link]
Managing lifetimes bugs?
Posted Aug 18, 2025 21:17 UTC (Mon) by ehiggs (subscriber, #90713) [Link]
"""
...you've deliberately ignored the built-in lifetime management mechanisms of the codebase you're using
"""
I didn't fully understand if you're saying Zig or Ghostty has built-in lifetime management mechanisms and I knew about them and ignored them. Or if you meant something else. AFAIK, Zig and hence Ghostty don't have built-in lifetime management mechanisms and one needs to manually pair allocations and deallocations using defer. If that's not the case I'm always happy to learn!
But those bindings didn't happen magically. They were done by hard work from talented developers.
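For readers who have not used Zig: its defer statement runs an expression at scope exit, which is what pairs each allocation with its deallocation. A rough C++ analogue (a hypothetical scope guard, not a standard facility):
---
#include <cstdlib>
#include <utility>

template <typename F>
class Defer {                  // runs a callback when the scope ends
    F fn;
public:
    explicit Defer(F f) : fn(std::move(f)) {}
    ~Defer() { fn(); }
    Defer(const Defer &) = delete;
    Defer &operator=(const Defer &) = delete;
};

int main() {
    void *buf = std::malloc(1024);
    if (buf == nullptr) return 1;
    Defer free_buf([&buf] { std::free(buf); });  // ~ Zig: defer allocator.free(buf)
    // ... every path out of this scope now frees buf exactly once ...
    return 0;
}
---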
Doesn't work without a modern GPU
Posted Aug 25, 2025 20:18 UTC (Mon) by IanKelling (subscriber, #89418) [Link] (2 responses)
Doesn't work without a modern GPU
Posted Aug 26, 2025 0:16 UTC (Tue) by intelfx (subscriber, #130118) [Link]
There is only so much time the world can stay in place to accommodate those who prefer to stick to outdated technology, for any kind of reason. There is a certain amount of inertia, and it's mostly a good thing, but it's finite, which is also a good thing.
Modern roads and motorways are not horse-friendly anymore, either. A computer without a fully shader-capable (like OpenGL 3.3+ or even 4+) GPU is increasingly becoming a "horse carriage". You can still use it on rural roads, sure, but not on Autobahns.
Doesn't work without a modern GPU
Posted Aug 26, 2025 5:01 UTC (Tue) by raven667 (subscriber, #5198) [Link]