
What goes into default Debian?

Posted Feb 17, 2021 20:43 UTC (Wed) by mathstuf (subscriber, #69389)
Parent article: What goes into default Debian?

I've forgotten about locate myself. I remember fighting with `updatedb` for disk access at times too (no idea which implementation it was either). Personally, I've found `fzf`[1] to be of far more use since it uses thread pools and has interactive filtering. But I also tend to know the ballpark of where the files I'm interested in are, so maybe that's another axis.

Also, using mtimes seems weird for system packages. If I downgrade a package, or install a newer version that happened to be built before the one it replaces (copr repositories on Fedora, AUR on Arch, or whatever Ubuntu is providing…can't remember the name), does `updatedb` get confused (AFAIU, timestamps tend to come from the package, not install time)?

[1] https://github.com/junegunn/fzf/


What goes into default Debian?

Posted Feb 17, 2021 21:14 UTC (Wed) by juliank (guest, #45896) [Link]

I'm not sure if mtime works like that for directories. But also you can just do != instead of > to see if you need to rescan.
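A minimal sketch of the `!=` check suggested here, assuming the indexer records each directory's mtime when it last read that directory. The function name `needs_rescan` is made up for illustration and is not taken from any locate implementation. A directory's mtime changes when entries are added, removed, or renamed in it, so any difference from the recorded value, in either direction, is treated as "rescan".

```python
import os
import tempfile


def needs_rescan(directory: str, recorded_mtime_ns: int) -> bool:
    """Return True when the directory's mtime differs from the recorded one.

    Comparing with != instead of > means a timestamp that moved
    *backwards* (downgrade, restored backup, clock change) still
    triggers a rescan.
    """
    return os.stat(directory).st_mtime_ns != recorded_mtime_ns


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        recorded = os.stat(d).st_mtime_ns
        print("unchanged:", needs_rescan(d, recorded))
        # Push the mtime one second into the past: a ">" test would miss this.
        os.utime(d, ns=(recorded, recorded - 1_000_000_000))
        print("moved backwards:", needs_rescan(d, recorded))
```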

What goes into default Debian?

Posted Feb 17, 2021 22:08 UTC (Wed) by warrax (subscriber, #103205) [Link] (24 responses)

It's a relic from a time long gone. Even spinning metal and most network file systems are plenty fast enough that just doing a find / ... is usually fine. (IIRC the standard updatedb even goes so far as to exclude networked file systems, so that's moot, I guess.)

I mean, if your busy loop is 'find a file' then maybe something like it makes sense, but if you need to do that then you should find a better way to do it than calling locate.

What goes into default Debian?

Posted Feb 17, 2021 22:25 UTC (Wed) by Sesse (subscriber, #53779) [Link] (23 responses)

“find / -name \*LWN\*” on my main machine takes 9 minutes 11 seconds.

“plocate LWN” takes 8 milliseconds.

I wrote plocate because mlocate's slowness was a real impediment to my (volunteer) sysadmin tasks. find is fine if you have a tiny system or a narrow search scope, but even on my laptop's SSD with a pretty small installation, it takes 2–3 seconds to run.

What goes into default Debian?

Posted Feb 17, 2021 23:28 UTC (Wed) by clump (subscriber, #27801) [Link] (5 responses)

The -iname flag for GNU find will do case-insensitive searches. Just in case you didn't know about it.

I like the ease of locate but switched to find years ago because find is always up to date.

What goes into default Debian?

Posted Feb 18, 2021 6:12 UTC (Thu) by dowdle (subscriber, #659) [Link]

locate shouldn't be more than 24 hours out of date... and if you are looking for something more recent, you most likely created it and it's in your home directory... and you can find it quickly enough. But for the vast majority of the filesystem outside of your home directory, locate smokes find.

If I've done a lot of package installs or updates... I'll often run updatedb before using locate. The updatedb action usually only takes a second or two. So making locate as up-to-date as find, is still way, way faster than find.

Yes, there are feature differences because locate only matches file/dir names whereas find has a whole slew of properties you can search for. It does NOT need to be an either or... or one is better than the other. Use both, they are both great.

What goes into default Debian?

Posted Feb 18, 2021 7:29 UTC (Thu) by anton (subscriber, #25547) [Link] (3 responses)

"rlocate is an implementation of the ``locate'' command that is always up-to-date." Except that rlocate itself is not up-to-date; it was written before inotify/fanotify, so it uses its own kernel module instead. But maybe one of the current locate implementors can add an always-up-to-date feature based on fanotify.

It's funny that some people argue that updatedb is too costly while others argue that "find /" (which costs hardly less) is fast enough.

What goes into default Debian?

Posted Feb 18, 2021 9:20 UTC (Thu) by smcv (subscriber, #53363) [Link] (2 responses)

It depends on how much you use it. Assume updatedb runs once a day, for example. If you run locate multiple times a day, then the daily updatedb is definitely "cheaper" than using find every time; but if you only run locate once a year, then you're reading the whole filesystem hierarchy 365 times as often as you need to.
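The arithmetic can be written out as a small sketch. It assumes, as a simplification, that one updatedb run and one `find /` each cost one full filesystem walk and that a locate query against the database costs roughly nothing. The function name is hypothetical.

```python
def full_scans_per_year(queries_per_year: int,
                        updatedb_runs_per_year: int = 365) -> dict:
    """Count full filesystem walks per year for each approach.

    Assumption: updatedb and `find /` cost about the same per run,
    and a locate lookup is negligible by comparison.
    """
    return {
        "find": queries_per_year,          # one full walk per query
        "locate": updatedb_runs_per_year,  # one full walk per scheduled updatedb
    }


if __name__ == "__main__":
    for queries in (1, 365, 5 * 365):
        scans = full_scans_per_year(queries)
        ratio = scans["locate"] / scans["find"]
        print(f"{queries} queries/year: {scans}, locate/find walk ratio {ratio:g}")
```

With one query a year the daily updatedb does 365 walks against find's one; the two break even at one query a day.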

What goes into default Debian?

Posted Feb 18, 2021 9:39 UTC (Thu) by anton (subscriber, #25547) [Link] (1 responses)

It also depends on how you value the user's time vs. the computer's time. However, on my personal system I indeed do not run updatedb automatically, because last time I did (long ago) it would run right on system startup (i.e., every morning) and make the system sluggish.

What goes into default Debian?

Posted Feb 18, 2021 9:44 UTC (Thu) by Sesse (subscriber, #53779) [Link]

While this is a valid concern, do note that updatedb in plocate is run with the “idle” I/O class, so it should get lower priority than your interactive use. (At least that's what the kernel claims!)
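For an updatedb that does not set its own I/O class, a similar effect can be requested from outside with util-linux's `ionice`, where class 3 is "idle". This is only a sketch of building such a command line; the helper name is hypothetical, and whether the kernel honours the class depends on the I/O scheduler in use.

```python
import shlex
from typing import List, Sequence


def with_idle_io(command: Sequence[str]) -> List[str]:
    """Prefix a command so it runs in the idle I/O scheduling class (class 3)."""
    return ["ionice", "-c", "3", *command]


if __name__ == "__main__":
    # Prints the command line; pass the list to subprocess.run() to execute it.
    print(shlex.join(with_idle_io(["updatedb"])))
```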

What goes into default Debian?

Posted Feb 18, 2021 6:02 UTC (Thu) by atai (subscriber, #10977) [Link] (3 responses)

Funny how some people reacted to daemons of locate-like indexers:

how to disable GNOME Tracker: https://www.linuxquestions.org/questions/ubuntu-63/how-to...

how to disable KDE Baloo: https://askubuntu.com/questions/1214572/how-do-i-stop-and...

What goes into default Debian?

Posted Feb 18, 2021 8:45 UTC (Thu) by Wol (subscriber, #4433) [Link]

Well, when the default install basically screws your system ...

I'm a bit like that - my grief was with Akonadi, I think the longest I waited to log in was about 36 hours, I ended up installing xfce in order to get a usable system.

(That 36 hours - that wasn't "login until usable desktop", it was "login until I killed the system in frustration". For someone who uses their PC as a desktop, ie switch it off every night, login times like that just aren't acceptable. Well, they're not acceptable full stop, but ...)

Cheers,
Wol

What goes into default Debian?

Posted Feb 19, 2021 20:18 UTC (Fri) by clump (subscriber, #27801) [Link]

Tracker is obnoxious. People have been asking for a simple way to disable it for years.

What goes into default Debian?

Posted Feb 25, 2021 10:56 UTC (Thu) by oak (guest, #2786) [Link]

Trackers index file *contents*, not just the file names. That’s at least an order of magnitude heavier than what locate does.


What goes into default Debian?

Posted Feb 18, 2021 8:52 UTC (Thu) by josh (subscriber, #17465) [Link]

Personally, I would love to have an optional mechanism for "find" to cache some information *when run*, so that it can reduce the amount of work it needs to do while still giving up-to-date results.
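One way such a cache could work, purely as a hypothetical sketch and not something find offers: remember each directory's listing together with its mtime, and on later runs re-read a directory only when its mtime has changed. Each directory is still stat'ed once per run, so results stay current, but unchanged directories skip the listing step. All names below are invented for illustration. One caveat is that a change landing in the same timestamp tick as the cached read could be missed.

```python
import os
from typing import Dict, List, Tuple

# directory path -> (mtime_ns when last read, [(entry name, is_directory)])
Cache = Dict[str, Tuple[int, List[Tuple[str, bool]]]]


def read_dir(path: str, cache: Cache) -> List[Tuple[str, bool]]:
    """List a directory, reusing the cached listing if its mtime is unchanged."""
    mtime = os.stat(path).st_mtime_ns
    hit = cache.get(path)
    if hit is not None and hit[0] == mtime:
        return hit[1]
    with os.scandir(path) as it:
        entries = [(e.name, e.is_dir(follow_symlinks=False)) for e in it]
    cache[path] = (mtime, entries)
    return entries


def find_names(root: str, needle: str, cache: Cache) -> List[str]:
    """Return sorted paths under root whose final component contains needle."""
    matches = []
    stack = [root]
    while stack:
        directory = stack.pop()
        try:
            entries = read_dir(directory, cache)
        except OSError:
            continue  # unreadable or vanished directory: skip it
        for name, is_dir in entries:
            full = os.path.join(directory, name)
            if needle in name:
                matches.append(full)
            if is_dir:
                stack.append(full)
    return sorted(matches)
```

The cache is a plain dictionary here; persisting it between runs (for example with `pickle`) is left out to keep the sketch short.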

What goes into default Debian?

Posted Feb 19, 2021 18:48 UTC (Fri) by jond (subscriber, #37669) [Link]

I saw your blog post and it was the first I’d heard of plocate, but I am interested. I worked on the mlocate package this cycle because I wanted to fix some issues I had with it, but it hadn’t occurred to me to explore alternatives at the time. We should definitely evaluate the situation for bullseye+1.

What goes into default Debian?

Posted Feb 25, 2021 13:02 UTC (Thu) by Hello71 (subscriber, #103412) [Link] (3 responses)

this seems like a bogus comparison if you don't specify -xdev to skip searching devtmpfs, procfs, and sysfs, the latter of which contains deeply nested trees. unless plocate also searches /sys?
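For reference, the effect of `-xdev` can be sketched as a walk that compares each directory's device number with that of the starting point and refuses to descend when they differ. This is an illustration of the idea under that assumption, not find's actual code; the function name is made up.

```python
import os
from typing import Iterator


def walk_one_filesystem(root: str) -> Iterator[str]:
    """Yield paths under root without descending into other filesystems."""
    root_dev = os.lstat(root).st_dev
    yield root
    stack = [root]
    while stack:
        directory = stack.pop()
        try:
            with os.scandir(directory) as it:
                entries = list(it)
        except OSError:
            continue  # unreadable directory: skip it
        for entry in entries:
            yield entry.path
            try:
                if (entry.is_dir(follow_symlinks=False)
                        and entry.stat(follow_symlinks=False).st_dev == root_dev):
                    # Same device as the starting point: safe to descend.
                    stack.append(entry.path)
                # A mount point (e.g. /proc or /sys when starting from /) is
                # listed above but never pushed, so its contents are skipped.
            except OSError:
                continue


if __name__ == "__main__":
    print(sum(1 for _ in walk_one_filesystem(".")), "entries on this filesystem")
```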

What goes into default Debian?

Posted Feb 25, 2021 13:52 UTC (Thu) by zdzichu (subscriber, #17118) [Link] (2 responses)

Those filesystems are virtual, "find" goes through them in single-digit milliseconds.

What goes into default Debian?

Posted Feb 25, 2021 14:03 UTC (Thu) by pizza (subscriber, #46) [Link] (1 responses)

On the system I'm writing this from, 'time find /sys > /dev/null' claims 81ms (with 55042 entries)

(So it's double-digit-ms speeds..)

What goes into default Debian?

Posted Feb 25, 2021 15:33 UTC (Thu) by zdzichu (subscriber, #17118) [Link]

Before posting I checked on my work environment (Fedora in Hyper-V on Windows 10, on a ~5-year-old ThinkPad T560). I was getting 5-8 ms for /dev, /proc and /sys.
It doesn't matter at all when we are talking about "find / …" taking over 9 minutes, so let's end this subthread here.

What goes into default Debian?

Posted Mar 13, 2021 18:10 UTC (Sat) by nix (subscriber, #2304) [Link] (6 responses)

I was really happy when I saw plocate existed... until I noticed it didn't support multiple databases. This makes it a complete non-starter for larger installations exporting large numbers of files over NFS, the very sorts of installations in which a fast locate is most necessary: with locate, slocate and mlocate this is easily supportable by putting a locatedb covering each filesystem at the root of each exported filesystem and pointing LOCATE_PATH through all of them, but with plocate there is no replacement except for skipping all the remote filesystems entirely (my $HOME is on one of them: that's out) or traversing them from the client (they're huge: no). Even though plocate has a two-phase db generation process, you can't even pass multiple dbs in to the second phase to emit a single plocate db that is the union of all of them.

I know nobody cares about people with networked filesystems any more, but this made me sad :(

What goes into default Debian?

Posted Mar 20, 2021 8:45 UTC (Sat) by Sesse (subscriber, #53779) [Link] (5 responses)

There's nothing preventing plocate from searching multiple databases, especially if you just want them to be searched serially. It's literally a for loop in the client—you probably want some sort of path rewriting in updatedb, but that could be done, too. (Then there's the question on whether you want to try to be maximally clever by pruning away the prefix or not. YMMV.) Due to io_uring, it should be reasonably performant to search databases on NFS, although I haven't tried.

You could probably even just make a shell script that calls plocate multiple times. The main reason I've never done it is that it's such a niche case nobody's ever asked for it—it requires a lot of admin intervention.
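The wrapper-script idea might look roughly like this: call plocate once per database and concatenate the output. It is a sketch only. It assumes plocate's `-d` option for selecting a database, the database paths shown are placeholders, and the function name is invented. It does none of the path rewriting mentioned above.

```python
import subprocess
from typing import Callable, Iterable, List


def locate_in_databases(pattern: str,
                        databases: Iterable[str],
                        run: Callable = subprocess.run) -> List[str]:
    """Search each database in turn and return all matching paths."""
    results: List[str] = []
    for db in databases:
        # check=False: a database with no matches should not abort the loop.
        proc = run(["plocate", "-d", db, pattern],
                   capture_output=True, text=True, check=False)
        results.extend(line for line in proc.stdout.splitlines() if line)
    return results


if __name__ == "__main__":
    # Placeholder paths: one database per exported filesystem.
    dbs = ["/srv/export1/plocate.db", "/srv/export2/plocate.db"]
    try:
        for path in locate_in_databases("wombat", dbs):
            print(path)
    except FileNotFoundError:
        print("plocate is not installed on this machine")
```

The `run` parameter exists only so the loop can be exercised without plocate installed.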

What goes into default Debian?

Posted Mar 23, 2021 20:12 UTC (Tue) by nix (subscriber, #2304) [Link] (4 responses)

Aha! If it's that easy, I might work on some patches. I do like the idea of plocate, but unless it works with a setup that allows N databases, one per NFS export point, it's kinda hard to make it do anything useful on a system where most filesystems you would like to run locate over are on NFS. (In my case, I usually run locate on the desktop and almost all the things I ever want it to find are on the server. Obviously the server's locate databases have to be built *on* the server: even with 10GbE I don't want updatedb throwing the stat data for thirty million files over the network every night :) and that means multiple databases if I want locate on the client to scan everything visible from the client, whether it's server-side or not).

What goes into default Debian?

Posted Mar 28, 2021 10:39 UTC (Sun) by Sesse (subscriber, #53779) [Link] (3 responses)

I pushed code for searching multiple databases to the git repository. Please give it a go.

What goes into default Debian?

Posted Mar 28, 2021 10:52 UTC (Sun) by zdzichu (subscriber, #17118) [Link]

That's why I love open source community!

What goes into default Debian?

Posted Apr 27, 2021 12:54 UTC (Tue) by nix (subscriber, #2304) [Link] (1 responses)

Ooh, excellent! I'll give this a try this weekend :) (yes, sometimes I do go months between catching up with LWN. Mea culpa etc.)

What goes into default Debian?

Posted Mar 27, 2022 15:50 UTC (Sun) by nix (subscriber, #2304) [Link]

Aaaand months and months after I said I'd try it, I finally did. This thing is fantastic :)

Before, with GNU findutils: 40 mins to build the locatedb, 10 mins if everything was in cache. Afterwards (hot cache figures only): 56 seconds. DBs about five times smaller. As for times:

% /usr/bin/time locate wombat
[...]
8.09user 0.08system 0:08.25elapsed 99%CPU (0avgtext+0avgdata 2084maxresident)k

% /usr/bin/time locate -r wombat
[...]
19.75user 0.40system 0:20.38elapsed 98%CPU (0avgtext+0avgdata 2196maxresident)k

% /usr/bin/time locate -r womb.t
[...]
24.98user 0.03system 0:25.08elapsed 99%CPU (0avgtext+0avgdata 2184maxresident)k

Afterwards:
% /usr/bin/time locate wombat
[...]
0.00user 0.00system 0:00.02elapsed 71%CPU (0avgtext+0avgdata 4140maxresident)k

% /usr/bin/time locate -r wombat
[...]
4.95user 0.10system 0:01.68elapsed 299%CPU (0avgtext+0avgdata 10952maxresident)k

% /usr/bin/time locate -r womb.t
[...]
5.15user 0.06system 0:01.72elapsed 302%CPU (0avgtext+0avgdata 11012maxresident)k

This is with a LOCATE_PATH with 19 databases in it, so I think we can safely say that the 20-fold increases in plocate time implied by this are... well... still pretty insignificant :)

What goes into default Debian?

Posted Feb 22, 2021 9:29 UTC (Mon) by amarao (guest, #87073) [Link]

As a cloud operator I have always hated all locate-like software. It causes a lot of cold reads on many unused and otherwise idle machines. On shared storage it causes a spike in cold I/O, which causes a spike in latency.

Before crontab randomization was introduced, it was even noticed by the data center's electrical operator. He asked, 'What is happening at 4:00 every night?' It was crontab, running cron.daily on all machines. I suspect the locate update was part of that electricity spike too.
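To illustrate why randomization flattens such a spike, here is a hypothetical sketch that gives each machine a stable start offset derived from its hostname, so a fleet spreads its daily jobs across a window instead of all starting at 4:00. It is not how any particular cron implementation does it; the function name and the six-hour window are assumptions.

```python
import hashlib
from collections import Counter


def daily_start_offset(hostname: str, window_minutes: int = 360) -> int:
    """Map a hostname to a stable offset in minutes within the window."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_minutes


if __name__ == "__main__":
    # Hypothetical fleet of 1000 hosts; count how many start in each hour.
    hosts = [f"node{i:04d}.example.org" for i in range(1000)]
    per_hour = Counter(daily_start_offset(h) // 60 for h in hosts)
    for hour in sorted(per_hour):
        print(f"hour +{hour}: {per_hour[hour]} hosts")
```

Hashing the hostname rather than drawing a fresh random number keeps each machine's schedule the same from day to day.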


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds