Hardlinks are hardly a source of complexity .. or is that 'difficulty'

Posted Dec 3, 2010 5:11 UTC (Fri) by neilbrown (subscriber, #359)
In reply to: Hardlinks are hardly a source of complexity by ksandstr
Parent article: Ghosts of Unix past, part 4: High-maintenance designs

I think you might be confusing 'complexity' with 'difficulty'. While they often go together, they don't have to.

Digging a big hole is difficult (if, like me, you aren't very fit) but it is hardly complex. Solving sudoku is certainly complex but I do not find it particularly difficult (fun though).

You could think of 'complexity' as meaning 'room for errors to creep in'. It is certainly easy to make mistakes in sudoku. Less so in hole digging.

The complexity is not in the code, but in the need for the code. It means that I cannot simply archive each file in isolation, but need to interpret it in a larger context. It means that to extract a file from an archive, I need either two passes, or to remember where every linked file was and rewind to read it.
It means it is imperative that the filesystem provides a unique inode number for every file, one that is not re-used during the time that tar runs. This is not always as easy as it sounds.
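
The bookkeeping described above can be sketched roughly as follows. This is only an illustration in Python, not how tar is actually implemented; the function name `plan_archive` and the shape of the returned plan are made up for the example. It shows why no file can be handled in isolation: every file with a link count above 1 has to be looked up in, and possibly added to, a table keyed by device and inode number.

```python
import os

def plan_archive(root):
    """Walk root and decide, per entry, whether to store data or a link record.

    Hypothetical sketch: returns a list of ("file", path) or
    ("link", path, earlier_path) tuples.
    """
    seen = {}   # (st_dev, st_ino) -> first path archived with that identity
    plan = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                      # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            key = (st.st_dev, st.st_ino)
            if st.st_nlink > 1 and key in seen:
                # Same inode seen before: record a link, not the data.
                plan.append(("link", path, seen[key]))
            else:
                if st.st_nlink > 1:
                    # More names exist somewhere; remember this one.
                    seen[key] = path
                plan.append(("file", path))
    return plan
```

The `seen` table is only trustworthy if the (device, inode) pair keeps identifying the same file for the whole run, which is exactly the requirement on the filesystem mentioned above.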

Suppose that while tar is running it finds a file '/long/path/foo' with a link count of 2. Immediately thereafter I remove both links and create a new file with two links, one of them at '/other/path/foo', and it happens to get the same inode number. When tar gets to that other foo, what does it do? Is it the other link to the first file, which happens to have been changed in the meantime - so probably best to record the link and not the file - or is it a brand new file - so best to archive it and forget about finding the second link to the first foo?

Even if you think the answer to the above is obvious, the fact that I had to ask the question is a complexity.

So no: it isn't difficult to fix the glaringly obvious issues. But it still adds complexity which we might be better off without.

Hardlinks are hardly a source of complexity .. or is that 'difficulty'

Posted Dec 6, 2010 17:34 UTC (Mon) by stevem (subscriber, #1512)

While you're right that hard links carry a complexity penalty, I'm not convinced that any alternatives might be any better. As an example, when building Debian CDs we:

* make a hard-link tree of the files that we want to fit within each image
* add some sym-links within that tree for various reasons
* run mkisofs/genisoimage/whatever on the tree to make the output ISO image
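
The first step above could look something like this minimal sketch. It is not the actual Debian CD build code; `link_tree` and its parameters are hypothetical names, and it assumes the source and destination trees live on the same filesystem (hard links cannot cross filesystems).

```python
import os

def link_tree(source_root, wanted, dest_root):
    """Hard-link each relative path in `wanted` from source_root into dest_root.

    Hypothetical helper: `wanted` is an iterable of paths relative to
    source_root; dest_root must be on the same filesystem.
    """
    for rel in wanted:
        src = os.path.join(source_root, rel)
        dst = os.path.join(dest_root, rel)
        # Create the parent directories inside the temporary tree.
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        # New name for the same inode: no file data is copied.
        os.link(src, dst)
```

Each call to `os.link` only adds a directory entry, which is why this is so much cheaper than copying.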

As an alternative, we *could* simply copy all the files that we want into the temp trees, but that costs a vast amount more in terms of disk space and time spent.

Or (as we have done in the past) create a tree of symbolic links instead. But then we've got to resolve where those links point when we build the image to know whether they belong inside or outside and hence how we should resolve them - more complexity.

Hard links are *cute* and I like them. :-)

Hardlinks are hardly a source of complexity .. or is that 'difficulty'

Posted Dec 10, 2010 4:19 UTC (Fri) by neilbrown (subscriber, #359)

Hard links certainly are cute. But also painful.

Your comment leads perfectly to exercise 3. You have identified a use-case that isn't really well handled by symlinks. So: what other technologies could serve the purpose -- and would they be better or worse than hard links?

There are at least two that I can think of, but I'd rather not give my answers away - better to encourage others to think - someone will probably have a better idea than mine...

Hardlinks are hardly a source of complexity .. or is that 'difficulty'

Posted Dec 10, 2010 4:31 UTC (Fri) by sfeam (subscriber, #2841)

For that particular use, one might attach an attribute either to the individual files or to their respective directory entries that would control whether or not the file is visible in the current context. The process burning the ISO image would grab all visible files, as it does now, but many other files in the same directories would be effectively invisible to it.

Hardlinks are hardly a source of complexity .. or is that 'difficulty'

Posted Dec 11, 2010 16:06 UTC (Sat) by MrWim (subscriber, #47432)

It seems that what is really wanted in this case is a copy-on-write link, as discussed in the LWN articles "COW Links" (29/3/2004) and "The two sides of reflink()".

Hardlinks are hardly a source of complexity .. or is that 'difficulty'

Posted Dec 14, 2010 19:06 UTC (Tue) by adriboca (guest, #71850)

I agree with neilbrown that hard links are not the right solution for that problem. In fact, the "-path-list" and "-graft-points" options of mkisofs should allow you to select any files that must be included in the disc image, renaming them as desired. Creating a small text file and passing it to mkisofs is certainly much faster than hard-linking the whole tree.
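
As a rough sketch of what generating such a text file might involve (in Python; `graft_lines` is a hypothetical helper, and the backslash-escaping of '=' and '\' reflects my understanding of the graft-point syntax, so check the mkisofs man page before relying on it):

```python
def graft_lines(mapping):
    """Return path-list text: one 'name_in_image=path_on_disk' entry per line.

    Hypothetical helper: `mapping` maps the desired name inside the image
    to the existing path on disk.
    """
    def esc(p):
        # Assumed rule: '\' and '=' inside a name are escaped with a backslash.
        return p.replace("\\", "\\\\").replace("=", "\\=")

    return "".join(
        f"{esc(image)}={esc(disk)}\n"
        for image, disk in sorted(mapping.items())
    )
```

The result would be written to a file and passed along the lines of "mkisofs -o image.iso -graft-points -path-list list.txt", with no temporary tree needed at all.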

If for some reason, those options do not work exactly like you need them, then mkisofs or whatever tool you use must be improved, not the file system. I have a lot of experience but I have never seen any application for which hard links are the best solution, but I have seen a lot of cases when they are an inconvenience.

I must make a correction to the article, the phrase "the idea of "hard links", known simply as links before symbolic links were invented" is not true. The first type of links that were invented were what are called now symbolic links, and they were introduced in the Multics file system.

UNIX made four simplifications of the Multics FS and the last two of them were stupid (i.e. they made negligible economies in time & space, but they created problems that are not solved even today in the successors of UNIX):
1 Short 14-character names instead of long names
2 A single set of file mode bits instead of ACLs
3 Hard links instead of symbolic links
4 Merged write & append rights

Later, BSD did the right thing by reintroducing in their improved file system the long names & the symbolic links, which were copied afterwards by the other UNIX derivatives.

Hardlinks are hardly a source of complexity .. or is that 'difficulty'

Posted Dec 15, 2010 22:25 UTC (Wed) by neilbrown (subscriber, #359)

Hard links do clearly provide a simple solution for this problem, but as I have hinted, I don't think that value is worth the cost. However I don't really like the approach of depending on cleverness in mkisofs either as it is a solution that would need to be implemented in any tool that has this need.

reflinks (already mentioned) are certainly a possible solution. I'm not entirely sure I'm comfortable with reflinks, though I cannot really explain why, so it might be irrational. I would generally prefer that any deduplication happened transparently rather than with a new syscall, but whatever...

My favourite technology for this need is overlayfs (or possibly union mounts, though I like overlayfs more). Clearly it would require non-privileged users to be able to create mountpoints, but I think the pressure is building for that and it is going to become a reality some day soon. Other than that issue, it is a perfect solution!
