The underlying std::string is freed?

Posted Jul 8, 2024 11:21 UTC (Mon) by mathstuf (subscriber, #69389)
In reply to: The underlying std::string is freed? by NYKevin
Parent article: New features in C++26

The main use case I have is to help alleviate the performance leakage from CMake's stringly typed variable setup. You have a variable with value `foo;bar;baz`. Any time CMake wants to use this as a list, a `vector<string>` is made, which allocates for each piece of the list (here, 3 strings: `foo`, `bar`, `baz`). When the command ends, this parse is dropped, and the next command that wants to do a (reading) list operation needs to redo the breakdown with fresh allocations. Instead, I want to just have `.getAsList()`, which caches the as-a-list parsing using `string_view` where possible. However, there is the rule that `foo\;bar;baz` is two elements: `foo;bar` and `baz`. I cannot store a `string_view` for the first item because its contents differ from the bytes of the underlying value. Dropping the optimization for this case is…unfortunate (though rare, it's something to consider). So for me, the lifetime is obvious, but it may indeed be hard to determine in general…just as it already is in C++ today.

Analogously, I'd like to have `.getAsPathComponents()`, `.getAsBool()`, etc. pre-parsings cached where possible.
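For illustration, a cached `.getAsList()` along these lines might look like the sketch below. The class name `CachedValue` and its members are hypothetical and are not CMake's actual implementation. The sketch only handles the `;` separator and the `\;` escape; it ignores CMake's bracket handling and keeps empty elements. Unescaped items are views into the original value, and only items containing `\;` get an owned copy.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical wrapper around one variable value (an assumption for this sketch).
class CachedValue {
public:
  explicit CachedValue(std::string value) : value_(std::move(value)) {}

  // The cached views point into this object's own storage, so copying or
  // moving it could leave them dangling. Deleting the copy operations also
  // suppresses the implicit move operations.
  CachedValue(const CachedValue&) = delete;
  CachedValue& operator=(const CachedValue&) = delete;

  // Parses on first use; later calls return the cached result.
  const std::vector<std::string_view>& getAsList() const {
    if (!parsed_) {
      parse();
      parsed_ = true;
    }
    return items_;
  }

private:
  void parse() const {
    const std::string_view src = value_;
    if (src.empty()) return;  // an empty value is treated as an empty list
    std::size_t start = 0;
    bool escaped = false;
    for (std::size_t i = 0; i <= src.size(); ++i) {
      if (i + 1 < src.size() && src[i] == '\\' && src[i + 1] == ';') {
        escaped = true;
        ++i;  // step over the escaped ';' so it does not split the item
        continue;
      }
      if (i == src.size() || src[i] == ';') {
        emit(src.substr(start, i - start), escaped);
        start = i + 1;
        escaped = false;
      }
    }
  }

  void emit(std::string_view raw, bool escaped) const {
    if (!escaped) {
      items_.push_back(raw);  // zero-copy: a view into value_
      return;
    }
    // The item's real contents differ from the raw bytes, so own a copy.
    std::string owned;
    owned.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
      if (raw[i] == '\\' && i + 1 < raw.size() && raw[i + 1] == ';') continue;
      owned.push_back(raw[i]);
    }
    assert(owned.size() < raw.size());  // at least one backslash was dropped
    // std::deque::push_back never relocates existing elements, so views
    // into earlier owned strings stay valid.
    unescaped_.push_back(std::move(owned));
    items_.push_back(unescaped_.back());
  }

  std::string value_;
  mutable bool parsed_ = false;
  mutable std::vector<std::string_view> items_;
  mutable std::deque<std::string> unescaped_;
};
```

With the value `foo\;bar;baz`, `getAsList()` yields `foo;bar` (owned copy) and `baz` (view into the original). The `std::deque` is one possible choice for the side storage; anything with stable element addresses would do.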



The underlying std::string is freed?

Posted Jul 9, 2024 17:52 UTC (Tue) by NYKevin (subscriber, #129325)

If I were doing that, I would just use strings for the individual components, cache/intern them aggressively with e.g. a `std::unordered_map` or the like, and then have lookup functions that hand out `string_view`s everywhere. Then you're using no more than twice the size of the input CMake file, which is probably on the order of kilobytes (I don't use CMake, but surely its files are not huge?). In fact, probably much less than that, because I would tend to assume the average CMake file is not 100% made up of stringly-typed lists (but again, I don't use CMake).

The main problem with this approach is that cache invalidation is hard. But I'm not sure how many CMake files you're going to parse in one run of your program, so I don't know if that's actually a problem or not. Probably you can have a per-file cache if needed.
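A minimal sketch of the interning idea, assuming a hypothetical `StringInterner` helper (not anything from CMake). It uses `std::unordered_set` rather than a map, since only the keys are needed here. The set is node-based, so references to stored strings survive rehashing and the returned views stay valid for as long as the pool lives.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical helper for this sketch: owns one copy of each distinct string
// and hands out views into that copy.
class StringInterner {
public:
  // Returns a view of the pooled copy of `text`, inserting it if needed.
  // The view is valid until the interner is destroyed.
  std::string_view intern(std::string_view text) {
    // emplace() builds a std::string from the view; if an equal string is
    // already pooled, the existing element is returned instead.
    auto it = pool_.emplace(text).first;
    assert(*it == text);  // the pooled copy always equals the request
    return *it;
  }

  std::size_t size() const { return pool_.size(); }

private:
  std::unordered_set<std::string> pool_;
};
```

Interning `foo` twice returns two views with the same `data()` pointer and leaves one entry in the pool. Nothing is ever evicted in this sketch, which is the cache invalidation question raised above; a per-file pool would simply be one `StringInterner` per parsed file.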

The underlying std::string is freed?

Posted Jul 12, 2024 14:53 UTC (Fri) by mathstuf (subscriber, #69389)

The thing is that "lists" in CMake are just *interpretations* of the actual values. There is no actual list type; some APIs just interpret the values as `;`-separated lists. So interning values over something like:

```
foreach (item IN LISTS some_glob_result)
  list(APPEND absolute_sources "${CMAKE_CURRENT_SOURCE_DIR}/${item}")
endforeach ()
```

would end up interning O(N²) string data to store the "real" value of `absolute_sources` across the loop.
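To make the quadratic growth concrete, here is a small C++ simulation under a simplifying assumption: every intermediate value of the appended-to variable is interned as a whole string. The function name `internedBytesAfterAppends` is made up for this sketch, and it counts only the list variable's values, not the individual items.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Simulates appending `items` one at a time to a ';'-separated value and
// interning the full value after each append. Returns the total number of
// bytes held by the pool at the end.
std::size_t internedBytesAfterAppends(const std::vector<std::string>& items) {
  std::unordered_set<std::string> pool;
  std::string list;
  for (const auto& item : items) {
    if (!list.empty()) list += ';';
    list += item;
    pool.insert(list);  // each intermediate value is a distinct string
  }
  // The final value is always one of the pooled strings.
  assert(items.empty() || pool.count(list) == 1);

  std::size_t total = 0;
  for (const auto& s : pool) total += s.size();
  return total;
}
```

For N items of L bytes each, the k-th intermediate value is kL + (k - 1) bytes, so the pool holds (L + 1)·N(N + 1)/2 - N bytes in total. Three 4-byte items give 4 + 9 + 14 = 27 bytes, while the final value alone is 14.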

CMake's inspirations came from Tcl (which is why it is stringly-typed) and the backwards compatibility guarantees make it very hard to actually break away from that.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.