TALPA strides forward

Posted Aug 28, 2008 9:00 UTC (Thu) by evgeny (guest, #774)
In reply to: TALPA strides forward by dlang
Parent article: TALPA strides forward

> compilers are running into problems

I believe it's `make' that runs into the problem.



TALPA strides forward

Posted Aug 28, 2008 9:16 UTC (Thu) by liljencrantz (guest, #28458) [Link]

Sure, many other build systems, like scons, allow you to use file checksums instead of mtime to determine whether a file has been modified. But once your project gets big enough, that is slow. The largest project I've used scons on contained a few megabytes of source code, and scons would take a noticeable amount of time checking the dependencies. So it's not only make.

TALPA strides forward

Posted Aug 28, 2008 10:30 UTC (Thu) by rvfh (subscriber, #31018) [Link]

Looking at mtime should be enough, provided that we look for any change in it and not just for it to be greater than that of another file.

Let me explain: the time may be wrong on one machine, causing the mtime to go backwards, like when editing a file that's on a build server share, but it is very unlikely that the mtime will be exactly the same as it was before the edit.

It is quicker to check mtime for a change than to checksum the whole file.
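
To make the difference between the two checks concrete, here is a minimal Python sketch. The function names and the idea of a "recorded" mtime kept from the previous build are assumptions for illustration; this is not how make or scons actually store their state.

```python
import os


def stale_if_newer(source, target):
    """make-style rule: rebuild only when the source is newer than the target."""
    return os.stat(source).st_mtime_ns > os.stat(target).st_mtime_ns


def stale_if_changed(source, recorded_mtime_ns):
    """Rebuild when the source mtime differs in any way from the value
    recorded at the last build (hypothetical stored value), even if the
    new mtime is older because a clock went backwards."""
    return os.stat(source).st_mtime_ns != recorded_mtime_ns
```

If a source file is edited from a machine whose clock is behind, its new mtime can be older than the target's. The first check then misses the edit, while the second still notices that the mtime is no longer the recorded one.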

TALPA strides forward

Posted Sep 4, 2008 20:27 UTC (Thu) by renox (subscriber, #23785) [Link]

Mmm, in this case a version number attribute associated with each file would be better (if only because SW developers would be less likely to compare the mtimes of different files), though it may be a bit costly to maintain, especially on CPUs which don't have an atomic increment instruction.

TALPA strides forward

Posted Aug 28, 2008 10:57 UTC (Thu) by nix (subscriber, #2304) [Link]

And the right solution to this is finer-grained timestamps.

TALPA strides forward

Posted Aug 28, 2008 11:05 UTC (Thu) by evgeny (guest, #774) [Link]

I'm not sure. Consider distributed compilation farms (here "distributed" may refer to either filesystem and/or compiler; or just an NFS-mounted volume in the simplest case). Then maintaining nanosecond-accuracy time sync between several computers is needed, which is not trivial.

TALPA strides forward

Posted Aug 28, 2008 11:58 UTC (Thu) by nix (subscriber, #2304) [Link]

NTP can already report jitter and offset values. Maybe what we need is a
way to have those values *reduce* the precision of the kernel-provided
nsec timestamps, so that you get timestamps as accurate as possible for
your timebase, but no more accurate? (Of course, if the jitter changes a
lot, interesting things may happen, but that's quite rare.)
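
As a rough illustration of that idea, the sketch below truncates a nanosecond timestamp to a granularity derived from the clock's estimated error. The function name and the choice of "offset plus jitter" as the granularity are assumptions; nothing like this exists in the kernel, and truncation alone does not help two clocks that fall on opposite sides of a boundary.

```python
def coarsen_timestamp(ts_ns, offset_ns, jitter_ns):
    """Truncate ts_ns so it claims no more precision than the clock has.

    offset_ns and jitter_ns stand in for the values NTP reports; how they
    would be combined in practice is an open question, so this simply
    adds their magnitudes (an assumption).
    """
    granularity = abs(offset_ns) + abs(jitter_ns)
    if granularity <= 1:
        return ts_ns  # clock is as good as the timestamp format
    return ts_ns - (ts_ns % granularity)
```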

TALPA strides forward

Posted Aug 28, 2008 16:01 UTC (Thu) by bfields (subscriber, #19510) [Link]

Note that no linux filesystem has time resolution better than a jiffy. (The on-disc format may use nanoseconds, but the mtime/ctime/atime aren't updated using a nanosecond-precision time source.)

TALPA strides forward

Posted Aug 28, 2008 18:43 UTC (Thu) by SEJeff (subscriber, #51588) [Link]

"""Then maintaining nanosecond-accuracy time sync between several computers is needed, which is not trivial."""

It is actually really easy if you are not using Cisco switches. The latency of the switches makes a big difference.

Use ptpd from the linux hosts:
http://ptpd.sourceforge.net/

It will allow you to keep nanosecond time sync of all machines in a lan using multicast.

TALPA strides forward

Posted Aug 28, 2008 23:41 UTC (Thu) by njs (guest, #40338) [Link]

No, you just need a cleverer algorithm -- like someone mentioned above, you should look for changed timestamps rather than simply "future" timestamps (because clocks get set back all the time, but it's extraordinarily unlikely that a second edit will come along at exactly the moment when the old timestamp is repeated). Then to fix the quickly-repeated-edits problem, if the timestamp is within 2*resolution of the current time (for some conservative definition of resolution), don't write that timestamp down in your cache. Easy and safe, and causes hardly any speed degradation.

(High-quality VCS's already do this; I first learned the trick from bzr, dunno if any other popular ones have picked it up.)
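
A minimal Python sketch of the two rules described above: treat any changed timestamp as a modification, and refuse to cache a timestamp that is too close to the current time. The class name, the 2-second default resolution and the `now_ns` parameter are assumptions for illustration; this is not bzr's actual code.

```python
import os
import time


class MtimeCache:
    """Remembers the mtime seen when each file was last processed."""

    def __init__(self, resolution_ns=2_000_000_000):
        # Conservative guess at the filesystem's timestamp resolution.
        self.resolution_ns = resolution_ns
        self.seen = {}  # path -> mtime_ns recorded after processing

    def is_unchanged(self, path):
        """True only if the mtime is exactly the one recorded earlier.
        A timestamp that moved backwards counts as a change too."""
        return self.seen.get(path) == os.stat(path).st_mtime_ns

    def record(self, path, now_ns=None):
        """Cache the file's mtime unless it is within 2*resolution of now.

        A very recent mtime could be repeated by a second quick edit, so
        it is not written down and the file is processed again next time.
        Returns True if the mtime was cached.
        """
        now = time.time_ns() if now_ns is None else now_ns
        mtime = os.stat(path).st_mtime_ns
        if abs(now - mtime) < 2 * self.resolution_ns:
            self.seen.pop(path, None)
            return False
        self.seen[path] = mtime
        return True
```

The cost of the second rule is only that recently touched files are re-read once more on the next run, which is why the speed degradation stays small. A real tool would want to pass in the mtime it read before processing the contents rather than stat the file again afterwards.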

TALPA strides forward

Posted Aug 29, 2008 0:08 UTC (Fri) by dlang (subscriber, #313) [Link]

remember that the notification goes out while the file is still open.

so a program writes to a file, the scanner gets notified, scans the file, notes the mtime, the program writes to the file again.

on a fast machine it's very possible that this can all take place in a short enough time that the mtime does not change

TALPA strides forward

Posted Aug 29, 2008 0:35 UTC (Fri) by njs (guest, #40338) [Link]

>so a program writes to a file, the scanner gets notified, scans the file, notes the mtime, the program writes to the file again.

and the scanner gets notified again, and scans the file again, yes.

All the things you say are true, but I'm afraid I don't understand why you are saying them here (i.e., I'm missing your point somewhere)?

TALPA strides forward

Posted Aug 29, 2008 0:47 UTC (Fri) by dlang (subscriber, #313) [Link]

if the scanner is only notified when mtime changes, then if the mtime doesn't change no notification will be sent out.

I posted a proposal for a slightly different approach where instead of using mtime and a single 'clean' bit I suggested stealing a chunk of xattr namespace and have the kernel clear this namespace when the file was dirtied.

this would let a scanner set a placeholder in the namespace to indicate that it was looking at the file, then when it was done it could check to see if the placeholder was still there, if so the file didn't change while it was being scanned and it's safe to mark it as scanned, if the placeholder is not there then you know the file changed and the scan you just did is worthless.

by using a chunk of namespace you can also support multiple scanners (without them needing to know anything about each other)
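
The placeholder protocol can be modelled in a few lines of Python. This is purely an in-memory simulation: `TrackedFile`, its `markers` dictionary and the `scan` function are hypothetical stand-ins for the file, the reserved xattr namespace and a scanner, and no such kernel interface exists.

```python
class TrackedFile:
    """Stands in for a file plus the reserved chunk of xattr namespace."""

    def __init__(self, data=b""):
        self.data = data
        self.markers = {}  # scanner id -> marker value

    def write(self, data):
        self.data = data
        # The "kernel" clears the whole namespace when the file is dirtied.
        self.markers.clear()


def scan(f, scanner_id, check, write_during_scan=None):
    """Scan f and mark it only if it was not written to meanwhile.

    write_during_scan simulates another program dirtying the file while
    the scan is in progress. Returns True if the result could be kept.
    """
    f.markers[scanner_id] = "scanning"  # set the placeholder
    verdict = check(f.data)             # the (slow) scan itself
    if write_during_scan is not None:
        f.write(write_during_scan)
    if scanner_id in f.markers:         # placeholder survived: file unchanged
        f.markers[scanner_id] = "clean" if verdict else "infected"
        return True
    return False                        # file changed, scan is worthless
```

Each scanner uses its own key, so several scanners can mark the same file without knowing about each other, and a single write invalidates all of their marks at once. In a real implementation the final check-and-mark step would have to be atomic with respect to writes.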

TALPA strides forward

Posted Aug 29, 2008 7:46 UTC (Fri) by njs (guest, #40338) [Link]

Oh, I see. Sure. I was reading quickly and just assumed that anyone talking about "notify when the mtime changes" actually meant, "hook into the kernel's poke-that-file's-mtime routine so it sends a notification", whether the resulting mtime was modified or not.

(In practice I'm pretty sure that the mtime *would* always be updated, though, because in linux, in-memory inodes always get nanosecond-accurate timestamps. The extra resolution gets stripped away by the filesystem driver when the metadata gets pushed out to disk, but the actual data structures used in the core kernel don't care about that.)

TALPA strides forward

Posted Aug 29, 2008 16:45 UTC (Fri) by bfields (subscriber, #19510) [Link]

> In practice I'm pretty sure that the mtime *would* always be updated, though, because in linux, in-memory inodes always get nanosecond-accurate timestamps.

That's not true. On a recent kernel, try running a simple test program that does, e.g., write, stat, usleep(x), write, stat. You'll see that on ext2/ext3 "x" has to be at least a million (a second) before you see a difference in the two stats, and that on something like xfs, it has to be at least a thousand to ten thousand (a few milliseconds--the time resolution used is actually jiffies).

(On older kernels I think the ext2/3 behavior might look like xfs's; that was fixed because of problems with unexpected changes in timestamps (due to lost nanoseconds field) when an inode got flushed out of cache and then read back.)
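
A test along those lines might look like the following Python sketch (write, stat, sleep, write, stat). The function name and the `directory` parameter are assumptions; the result depends entirely on the kernel and on the filesystem holding the temporary file, so no particular output is claimed for short delays.

```python
import os
import tempfile
import time


def mtime_changes_after(delay_s, directory=None):
    """Write, stat, sleep for delay_s seconds, write, stat again.

    Returns True if the two stats report different mtimes. Pass
    directory to pick the filesystem under test.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"a")
        first = os.fstat(fd).st_mtime_ns
        time.sleep(delay_s)
        os.write(fd, b"b")
        second = os.fstat(fd).st_mtime_ns
        return second != first
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling it with a range of delays, for example 0.0001, 0.001, 0.01 and 1.0, on the filesystem of interest shows roughly where the effective timestamp resolution lies.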

TALPA strides forward

Posted Aug 30, 2008 1:47 UTC (Sat) by njs (guest, #40338) [Link]

I was aware of the issues with confusing timestamp changes, but didn't realize it had been changed. Thanks.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds