The underlying std::string is freed?

Posted Jul 7, 2024 7:10 UTC (Sun) by NYKevin (subscriber, #129325)
In reply to: The underlying std::string is freed? by mathstuf
Parent article: New features in C++26

The whole point of Cow is to "forget" whether you own something or not. In a borrow-checked language like Rust, you can make that the compiler's problem, but in C++, it would have to be tracked at runtime (presumably via refcounting), or else it's going to have all the disadvantages of an extra indirection layer without any of the upside.

In other words: The closest you can get to Cow in C++ is shared_ptr on a type with a non-deleted copy constructor, and maybe a thin wrapper to make the whole thing slightly more ergonomic.
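As a rough illustration of that idea, here is a minimal sketch of such a thin wrapper. The name `CowPtr` and its member functions are hypothetical, and the sketch assumes single-threaded use, since checking `use_count()` is not a reliable uniqueness test when other threads may copy the handle concurrently. Unlike Rust's `Cow`, it never borrows: ownership is always tracked at runtime by the reference count.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical name: a thin copy-on-write wrapper over std::shared_ptr.
// Assumes single-threaded use; T must be copy-constructible.
template <typename T>
class CowPtr {
public:
    explicit CowPtr(T value) : ptr_(std::make_shared<T>(std::move(value))) {}

    // Read access never copies; all handles may see the same object.
    const T& read() const { return *ptr_; }

    // Write access clones the object first if another handle shares it.
    T& write() {
        if (ptr_.use_count() > 1) {
            ptr_ = std::make_shared<T>(*ptr_);  // uses T's copy constructor
        }
        return *ptr_;
    }

    // True if both handles currently refer to the same object.
    bool shares_with(const CowPtr& other) const { return ptr_ == other.ptr_; }

private:
    std::shared_ptr<T> ptr_;
};
```

Copying a `CowPtr<std::string>` is then just a refcount bump, and the string is duplicated only when one of the handles calls `write()` while the object is still shared.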


The underlying std::string is freed?

Posted Jul 8, 2024 11:21 UTC (Mon) by mathstuf (subscriber, #69389)

My main use case is alleviating the performance leakage from CMake's stringly typed variable setup. Say a variable has the value `foo;bar;baz`. Any time CMake wants to use it as a list, it builds a `vector<string>`, which allocates for each piece of the list (here, 3 strings: `foo`, `bar`, `baz`). When the command ends, this parse is dropped, and the next command that wants to do a (reading) list operation has to redo the breakdown, allocations included. Instead, I want a `.getAsList()` that caches the as-a-list parse, using `string_view` where possible. However, there is a rule that `foo\;bar;baz` is two elements: `foo;bar` and `baz`. I cannot store a `string_view` for the first item because its contents differ from the stored value. Giving up the optimization for this case is…unfortunate (it is rare, but still something to consider). So for me, the lifetime is obvious, but in general it may indeed be hard to determine, just as it already is in C++ today.

Analogously, I'd like to have `.getAsPathComponents()`, `.getAsBool()`, etc. pre-parsings cached where possible.
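To make the borrow-or-own split concrete, here is a minimal sketch, not CMake's actual code. `ListItem`, `view_of`, and `split_list` are hypothetical names, and the only rule implemented is the one described above (`\;` is a literal `;`); real CMake list parsing has further rules that this ignores. Treating an empty value as an empty list is also an assumption.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical element type: borrows from the source value when the bytes
// match it exactly, owns a fresh string when an escape had to be removed.
using ListItem = std::variant<std::string_view, std::string>;

// Uniform read access to either alternative.
inline std::string_view view_of(const ListItem& item) {
    if (const auto* borrowed = std::get_if<std::string_view>(&item)) {
        return *borrowed;
    }
    return std::get<std::string>(item);
}

// Splits `value` on unescaped ';' and treats "\;" as a literal ';'.
// Borrowed items point into `value`, so they are valid only while the
// storage behind `value` is alive and unmodified.
inline std::vector<ListItem> split_list(std::string_view value) {
    std::vector<ListItem> items;
    if (value.empty()) {
        return items;  // assumption: an empty value is an empty list
    }
    std::size_t start = 0;    // start of the not-yet-consumed text
    std::string unescaped;    // filled only once an escape is seen
    bool has_escape = false;

    // Finishes the current element, which ends just before index `end`.
    auto flush = [&](std::size_t end) {
        if (has_escape) {
            unescaped.append(value.substr(start, end - start));
            items.emplace_back(std::move(unescaped));  // owned copy
            unescaped.clear();
            has_escape = false;
        } else {
            items.emplace_back(value.substr(start, end - start));  // borrowed
        }
    };

    for (std::size_t i = 0; i < value.size(); ++i) {
        if (value[i] == '\\' && i + 1 < value.size() && value[i + 1] == ';') {
            // Copy the text before the backslash, then the literal ';'.
            unescaped.append(value.substr(start, i - start));
            unescaped.push_back(';');
            has_escape = true;
            ++i;             // skip the escaped ';'
            start = i + 1;
        } else if (value[i] == ';') {
            flush(i);
            start = i + 1;
        }
    }
    flush(value.size());
    return items;
}
```

With this shape, only elements that contain an escape pay for an allocation; every other element is a view into the cached variable value, which is why the cache has to be dropped whenever that value changes.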

The underlying std::string is freed?

Posted Jul 9, 2024 17:52 UTC (Tue) by NYKevin (subscriber, #129325)

If I were doing that, I would just use strings for the individual components, cache/intern them aggressively with e.g. an std::unordered_map or the like, and then have functions for looking these up which hand out string_views everywhere. Then you're using no more than twice the file size of the input CMake file, which is probably on the order of kilobytes (I don't use CMake, but surely its files are not huge?). In fact, probably much less than that because I would tend to assume the average CMake file is not 100% made up of stringly-typed lists (but again, I don't use CMake).

The main problem with this approach is that cache invalidation is hard. But I'm not sure how many CMake files you're going to parse in one run of your program, so I don't know if that's actually a problem or not. Probably you can have a per-file cache if needed.
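A minimal sketch of that interning idea, under stated assumptions: `StringInterner` is a hypothetical name, and it uses `std::unordered_set` rather than a map because only the strings themselves are stored. Since the standard unordered containers keep references to elements valid across rehashing, the views handed out stay usable until the interner is cleared or destroyed.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical interner: owns exactly one copy of each distinct string.
class StringInterner {
public:
    // Returns a view of the pooled copy, inserting it on first sight.
    // Note: this builds a temporary std::string for every lookup; a
    // transparent hash (C++20) could avoid that, but is omitted here.
    std::string_view intern(std::string_view text) {
        return *pool_.emplace(text).first;
    }

    std::size_t size() const { return pool_.size(); }

    // Crude invalidation: every previously returned view dangles after this.
    void clear() { pool_.clear(); }

private:
    std::unordered_set<std::string> pool_;
};
```

A per-file cache, as suggested above, would then amount to one `StringInterner` per parsed file, destroyed together with everything that holds views into it.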

The underlying std::string is freed?

Posted Jul 12, 2024 14:53 UTC (Fri) by mathstuf (subscriber, #69389)

The thing is that "lists" in CMake are just *interpretations* of the actual values. There's no such actual thing. Some APIs just interpret the values as `;`-separated lists. So interning values over something like:

```
foreach (item IN LISTS some_glob_result)
  list(APPEND absolute_sources "${CMAKE_CURRENT_SOURCE_DIR}/${item}")
endforeach ()
```

would end up interning O(N²) string data to store the "real" value of `absolute_sources` across the loop.
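The quadratic growth can be checked with a small model. This is only a sketch: the function name is hypothetical, and it assumes a naive interner that retains every intermediate value of the variable, with all items non-empty. For N items of length L each, the retained total is L·N(N+1)/2 bytes of item text plus N(N−1)/2 separator bytes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sums the sizes of every intermediate value that a naive interner would
// retain while a ';'-separated variable is built by repeated appends.
// Hypothetical model only; items are assumed to be non-empty.
inline std::size_t interned_bytes_after_appends(
        const std::vector<std::string>& items) {
    std::string value;      // current value of the variable
    std::size_t total = 0;  // bytes retained across all iterations
    for (const std::string& item : items) {
        if (!value.empty()) {
            value += ';';
        }
        value += item;
        total += value.size();  // each new value is a distinct string
    }
    return total;
}
```

For example, four items of two bytes each produce intermediate values of 2, 5, 8, and 11 bytes, so 26 bytes are retained to represent a final value of only 11 bytes.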

CMake's inspirations came from Tcl (which is why it is stringly-typed) and the backwards compatibility guarantees make it very hard to actually break away from that.

