What goes into default Debian?
Posted Feb 21, 2021 10:11 UTC (Sun) by zblaxell (subscriber, #26385) In reply to: What goes into default Debian? by Sesse
Parent article: What goes into default Debian?
updatedb doesn't use whitelists--it indexes everything not on a blacklist. Network filesystems are filtered with a blacklist that had to be updated when nfs4 came along, then again when smbfs^H^H^H^H^Hcifs came along, then again and again and again when a thousand fuse filesystems came along (the horror...semi-infinite generated file namespaces with expensive iterators popping up at random, vs. updatedb). If I start using a new filesystem tomorrow, I can have an updatedb-related disaster the following day.
updatedb would also index any directory under / that wasn't explicitly excluded by a blacklist. The whole point of using novel directories under / that no existing software knows about is that existing software will stay the hell out of them until explicitly directed there. Not updatedb! That thing goes looking for trouble, and as long as trouble isn't in a default list of a half-dozen excluded directories, it'll find it!
Most of the problems would have been trivially solved if updatedb used a whitelist of FHS directories (/etc, /bin, /lib, /lib64, /sbin, /srv, /usr, /var, /opt, /home if local, all with -xdev in find) and searched only those until told to do otherwise. A "normal user" will not store anything outside of $HOME anyway. Users who attach huge file stores to their machines can add the mount points to updatedb.conf, or use a standard (for Debian) path like /srv for the big file store. Users who use 'locate' every day can edit /etc/updatedb.conf.
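For illustration, here is a minimal Python sketch of the whitelist-plus-xdev idea described above. It is not how updatedb works; the names `WHITELIST`, `walk_xdev` and `index` are made up for this example, and /home is left out because the "if local" check is not implemented here.

```python
import os

# Hypothetical whitelist: the FHS directories named in the comment above.
WHITELIST = ["/etc", "/bin", "/lib", "/lib64", "/sbin", "/srv", "/usr", "/var", "/opt"]


def walk_xdev(root):
    """Yield directories and files under root, staying on root's filesystem."""
    try:
        root_dev = os.stat(root).st_dev
    except OSError:
        return  # a whitelisted root that does not exist is simply skipped
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        kept = []
        for name in dirnames:
            try:
                # A subdirectory on a different device is a mount point: prune it.
                if os.lstat(os.path.join(dirpath, name)).st_dev == root_dev:
                    kept.append(name)
            except OSError:
                pass  # vanished or unreadable entries are dropped
        dirnames[:] = kept  # in-place edit stops os.walk from descending
        yield dirpath
        for name in filenames:
            yield os.path.join(dirpath, name)


def index(roots=WHITELIST):
    """Yield every path found under the whitelisted roots, and nothing else."""
    for root in roots:
        yield from walk_xdev(root)
```

Anything mounted under a whitelisted root, or created as a new directory under /, is never visited unless someone adds it to the list.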
My host configuration logs indicate locking didn't start happening in Debian until 2015 (give or take a year) with the introduction of updatedb.mlocate. Even then, the locking had an obvious bug until 2017. This was long after I had stopped caring what the locate package maintainer did any more--by 2002 I was already using /etc/apt/preferences to ensure the locate package could not be installed on production machines, and a few years later I stopped testing new versions.
I don't know what systemd or plocate does, but if it doesn't start with bind-mounting whitelisted filesystems in a private namespace and running updatedb chroot in that namespace, I don't need to see the rest of it. I've seen updatedb's blacklisting accidentally defeated by users and junior sysadmins and upstream software updates, and I've seen nothing to prevent this from happening again.
When we index files, we define a service profile, purchase or assign hardware from inventory, task that hardware with running the service, i.e. storing hundreds of millions of files, and providing an indexing service for them. We assign staff and robots to operate and monitor the hardware, periodically check that the hardware is healthy and services in the profile are all running correctly, check that the indexes are correct and up to date and indexing files in scope and not indexing files not in scope, and ensure there is enough storage for the index and enough free iops to update it with whatever frequency the service profile says we need. In other words, these indexers are _supervised_.
None of this happens when updatedb is installed by default. It's a production risk and wasted cost until you turn it off or take control of it. If you turn your back on it, it inevitably fills up /var without warning, and burns power and media lifetime even when it's working normally--and when it's not, it can take big servers all the way down.
As far as I can tell, among the standard Unix packages, this is a unique property of *locate? Off the top of my head I can't think of any other past or present default-installed service that potentially or actually does scheduled work proportional to the number of files your host can access.
Posted Mar 13, 2021 18:18 UTC (Sat)
by nix (subscriber, #2304)
Posted Mar 14, 2021 2:50 UTC (Sun)
by zblaxell (subscriber, #26385)
That means sandboxing it to prevent accidents, and determining what it can access by whitelist to minimize surprises. Possibly also disabling it by default, but that might not be necessary if the default sandbox is sufficient.
systemd can set up a chroot filesystem namespace, or a few lines of shell script can set up bind mounts to map whitelisted filesystems into a chroot tree, then run some more robust version of 'chroot $sandbox_path updatedb'. This isn't rocket science--it's how modern system services should be designed.
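A rough sketch of the bind-mount-and-chroot setup, written as a Python dry run that only builds the command list rather than executing anything. The function names and the sandbox path are hypothetical. The commands are meant to be run as root inside a private mount namespace (for example under `unshare --mount`), and the sketch leaves out the writable location the indexer needs for its database.

```python
import shlex
from pathlib import PurePosixPath


def sandbox_commands(sandbox, whitelist, indexer=("updatedb",)):
    """Return the argv lists, in order, that populate the sandbox and run the indexer."""
    cmds = []
    for src in whitelist:
        dst = str(PurePosixPath(sandbox) / src.lstrip("/"))
        cmds.append(["mkdir", "-p", dst])
        # Non-recursive bind: filesystems mounted below src are not carried over.
        cmds.append(["mount", "--bind", src, dst])
        # Second step makes the bind mount read-only.
        cmds.append(["mount", "-o", "remount,bind,ro", dst])
    cmds.append(["chroot", sandbox, *indexer])
    return cmds


def as_script(cmds):
    """Render the command list as shell lines for review before running it."""
    return "\n".join(shlex.join(argv) for argv in cmds)
```

Calling `print(as_script(sandbox_commands("/run/updatedb-sandbox", ["/etc", "/usr"])))` shows the plan without touching the system. The whitelist has to include whatever the indexer binary itself needs (/usr, /lib and so on), since nothing else is visible inside the chroot.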
Posted Mar 16, 2021 19:00 UTC (Tue)
by nix (subscriber, #2304)
Posted Mar 19, 2021 7:17 UTC (Fri)
by zblaxell (subscriber, #26385)
Giant filesystems are run by professionals who know what updatedb is and have either "purge updatedb" or "have a plan to manage updatedb" on their deployment checklist. We don't have to worry about them. I mean, they'll obviously be annoyed by having to add a new package name to their blacklists every few years, but they've lived with this for a quarter century already. They're fine.
Cloud nodes and IoT devices are built by professionals who know that updatedb strictly wastes energy and shortens device lifespan with zero upside. Nobody ever logs into one of these hosts, so nobody benefits from locate--they do everything by orchestration, or by dropping a root filesystem image built somewhere else onto the device. If they use locate at all, they use it on their development system with the prototype filesystem tree--every node has an identical copy of that. "Don't install updatedb" is burned into their toolchain. We don't need to worry about them either.
Managed clients are somewhere between totally ad-hoc and an IoT device, depending on what the manager permits and how strictly permissions are enforced. Here we have to rely on the manager to make a good decision: either not install updatedb, or permit only whitelisted filesystems and mount points on the host that updatedb can safely handle. This is a reasonable expectation of a client manager, and often required for other reasons like security. If the manager doesn't manage well, then these hosts fall into the next category.
The problem happens when everyone else runs updatedb: ad-hoc servers and desktops run by people who are not aware updatedb exists, or how it will interact with whatever random third-party thing they've bought and installed, and who also buy and install random third-party things. These are the ones that pick all-default options during install. These are also the ones that are most likely to combine something new (recall that lots of things are new to a stable Debian system) with updatedb's default configuration, and trigger a bespoke flavor of disaster. This is the most common kind of person to have an updatedb failure case in my experience.
Every failure is slightly different, and easily fixed by extending the blacklists in updatedb.conf, but none of the individual fixes help any other user (they are always something like "exclude /DavidsBigAndUnreliableUSBFilePile", which doesn't work for users named Peter or Alice). There's no possible patch to send upstream to prevent the problem from happening to anyone else, other than "throw out updatedb.conf and start over." None of the failures could have happened in a properly configured sandbox with a whitelist, but Debian's updatedb.conf syntax provides no way to whitelist anything. Predictable and avoidable failures just keep happening, year after year. Those who are most affected are least equipped to deal with them.
Fix the design so updatedb is properly whitelisted and sandboxed, or don't install it at all. Those are the two safe options for most users. Based on how it's going so far, I think I'm just going to have to keep repeating that every 5 to 10 years until the updatedb maintainers finally get it.
Posted Mar 20, 2021 1:49 UTC (Sat)
by nix (subscriber, #2304)
I do think the default updatedb and locate configurations could do with updating to make lightly-distributed setups employing NFS work better (ensuring no traversals of remotely-mounted filesystems by updatedb, while still making them searchable by locate). I have patches to do that: I should submit them... alas they require multiple databases, so no plocate support.
Posted Mar 26, 2021 4:04 UTC (Fri)
by zblaxell (subscriber, #26385)
bind mounts, private mount namespace, and chroot? If those aren't for sandboxing dodgy software (or protecting critical software from dodgy users), then I've been using them wrong for years. The chroot gets a curated (whitelisted) list of filesystems and mount points imported into it, from either a preconfigured list or /etc/fstab. Anything else--user mounts, removable media, new filesystems, giant external file stores--is not merely ignored, but not even accidentally accessible inside the sandbox. Can't accidentally wipe out a big tree of files from the locate db by mounting something in the middle of /home, since a mount like that is not propagated into the sandbox namespace (this is more of a problem for backups than locate, but we do run backups sandboxed this way to avoid that problem).
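As one way the "curated list from /etc/fstab" step might look, here is a small Python sketch that picks mount points out of fstab text by filesystem type. The set `LOCAL_FSTYPES` and the function name are assumptions for this example, and escaped characters in mount points (such as `\040` for a space) are not decoded.

```python
# Hypothetical whitelist of filesystem types considered safe to index.
LOCAL_FSTYPES = {"ext4", "xfs", "btrfs"}


def whitelisted_mounts(fstab_text, fstypes=LOCAL_FSTYPES):
    """Return mount points from fstab text whose type is whitelisted."""
    mounts = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed entry; ignore rather than guess
        _spec, mountpoint, fstype, options = fields[:4]
        # Skip anything not on the type whitelist, and anything not mounted at boot.
        if fstype in fstypes and "noauto" not in options.split(","):
            mounts.append(mountpoint)
    return mounts
```

A filesystem type nobody has heard of yet is excluded by default here, which is the opposite of what happens with a blacklist.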
If your question is "how to implement it in a way that is backward compatible with updatedb.conf", I don't have an answer. plocate is an amazing technical leap forward from traditional updatedb, and yet the 1% of updatedb that wasn't rewritten from scratch for plocate is the 1% of updatedb that causes all the problems in practice. There is a robust supply of competing file indexers out there, and I've always just used one that doesn't reimplement the worst 1% of updatedb.
I thought I had seen all the ways updatedb can fail, but I ran plocate on a test VM for a while, and discovered a new (to me) one: it spends most of its time indexing trees that will not exist at locate time. updatedb.plocate traverses the filesystem with the openat() family of functions, which means it will block umounts and snapshot deletion until it's finished indexing the entire tree--then it will close its FD, and the tree will cease to exist. Unlike traditional updatedb, updatedb.plocate still has access to the umounted or deleted tree through its open directory FDs, and I didn't see any sign of updatedb.plocate periodically checking to see if the tree it is indexing is still reachable from /. That can multiply indexing time for snapshots (especially if you are using one of snapper's default configs which creates 24 new snapshots every day), and interfere with umounts if the user were trying to disconnect or remount that filesystem.
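The kind of periodic check meant here could be sketched in Python like this. It is a hypothetical helper, not anything plocate does: it compares the directory held open by file descriptor against whatever the same absolute path resolves to from / at that moment.

```python
import os


def still_reachable(dir_fd, path):
    """True if path, resolved right now, is the same directory held open as dir_fd."""
    held = os.fstat(dir_fd)
    try:
        current = os.stat(path)
    except OSError:
        return False  # the path no longer exists (e.g. a deleted snapshot)
    # Same device and inode means the open FD and the path name the same directory.
    return (held.st_dev, held.st_ino) == (current.st_dev, current.st_ino)
```

An indexer that ran a check like this every so often during traversal could abandon a tree once the path stops pointing at it, instead of finishing a full scan of something locate will never be able to find.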
> When we index files, we define a service profile, purchase or assign hardware from inventory, task that hardware with running the service, i.e. storing hundreds of millions of files, and providing an indexing service for them. We assign staff and robots to operate and monitor the hardware, periodically check that the hardware is healthy and services in the profile are all running correctly, check that the indexes are correct and up to date and indexing files in scope and not indexing files not in scope, and ensure there is enough storage for the index and enough free iops to update it with whatever frequency the service profile says we need. In other words, these indexers are _supervised_.
It mystifies me that anyone running at this sort of scale would expect findutils' locate, or any comparable tool, to do a decent job. This is just way outside its design parameters, and putting it inside its design parameters would likely make it so complex that it would be unusable for its intended purpose.
You are building your argument from the opposite direction, but we agree at the conclusion: updatedb should not be actively wandering around unsupervised across every available path reachable from /, looking for things that it wasn't designed to handle, especially if there is any possibility it will be installed and enabled by default.
> I don't see how your properly configured sandbox is even *implementable*,