User: Password:
|
|
Subscribe / Log in / New account

Soft updates, hard problems

Soft updates, hard problems

Posted Jul 2, 2009 7:12 UTC (Thu) by jzbiciak (subscriber, #5246)
Parent article: Soft updates, hard problems

Wow... Humans computed those dependence graphs? Yipe! This feels a little like the "sufficient thrust" principle was applied: With sufficient thrust, any pig will fly.

I've programmed in assembly language on VLIW computers (and still do from time to time). These computers allow you to schedule many instructions in parallel, but you have to take into account their resource usage, their latencies, and most importantly the dependences[*] between instructions. For short programs (10s of instructions), this isn't too difficult to do. For anything non-trivial, it very, very quickly gets unmanageable for a human.

That's why we use code generation tools that know about the architecture and can compute and track the dependence chains for the programmer. The programmer may still be picking the individual instructions, but the tool computes the actual dependence edges.

And, when it comes time to enhance the architecture, the tool itself needs to be reconfigured. Rather than manually write all the new rules, it makes more sense to provide higher level descriptions of how the features all relate, and let the tool figure out how to apply that to new code. (This can be invaluable when exploring different potential tweaks to a CPU architecture.)

I see this as a direct analog to the dependence discovery one needs to do on a filesystem to implement soft-updates, at least based on Valerie's description above.

It seems like it ought to be possible to describe the on-disk format in some higher level form that allows a tool to work out the dependences between various updates automatically, dramatically improving the maintainability of the filesystem. If nothing else, it seems like it ought to reduce the chance of error.

Is the argument that the filesystem is fairly static, and therefore it's not worth the cost of building the tool the first time? That could be short sighted. For one, such a tool may open the way for another level filesystem optimization that focuses on reorganizing things to achieve different objectives: reduce the depth of dependence chains, or minimizing the number of rewrites to sectors, or what-have-you. Depending on the type of media you're writing to, different objectives become more or less important. Rewrites are a bigger issue on SSDs due to wear leveling concerns, but seeking for reads is "free." etc. etc. etc.

[*]Yes, "dependences", not "dependencies." Flip open a copy of Hennessy and Patterson if you disagree... Call me a curmudgeon. ;-)


(Log in to post comments)

Soft updates, hard problems

Posted Jul 2, 2009 7:16 UTC (Thu) by jzbiciak (subscriber, #5246) [Link]

PS. The dependencies vs. dependences thing isn't meant to be a dig at Valerie, since actually she's in very good company with folks that prefer proper English to the bastardized version that I learned when learning computer architecture. :-) My footnote was meant more as a "don't make fun of me because I talk funny and prefer it that way."

Thank you for the well written article.

Soft updates, hard problems

Posted Jul 2, 2009 12:05 UTC (Thu) by rsidd (subscriber, #2582) [Link]

Not just "folks that prefer proper English": I'm pretty sure the more common term in mathematics, too, is "dependency" (as in "dependency graph") and not "dependence". Cf. any number of well-known textbooks.

dependencies

Posted Jul 3, 2009 20:19 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

(as in "dependency graph")

Well, it's logical in that sense. The problem would be saying that the graph shows dependencies.

I think of dependency like connectivity and functionality, two other words computer people widely misuse because they find longer words sound smarter. Connectivity is the degree or quality of being connected, so that a mesh network has more connectivity than a star. But "I have connectivity to Chicago" is nonsense. It's "I have a connection to Chicago." Similarly, functionality is the degree or quality of being functional, so "I'm improving the functionality of the word processor" sensible, but "I'm adding undelete functionality" is not. (It's "I'm adding undelete function").

So I think dependency is the degree or quality of being dependent. A graph that shows how things depend on other things shows dependency. But an instance of A depending upon B is dependence.

That's apparently not a traditional use of "dependency" -- traditionally, I think it just doesn't exist. I just find it a logical construction of the word.

dependencies

Posted Jul 3, 2009 20:31 UTC (Fri) by nix (subscriber, #2304) [Link]

"I'm adding undelete function" sounds desperately ungrammatical to me: at
the very least it's missing an article.

"I'm adding an undelete function" is grammatical but does *not* mean the
same thing as 'I'm adding undelete functionality" (one has definite
number, the other does not).

Your statement about 'connectivity' is also nonsense: your
sample "ungrammatical" sentence is perfectly grammatical.

dependencies

Posted Jul 3, 2009 22:00 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

"I'm adding undelete function" sounds desperately ungrammatical to me: at the very least it's missing an article.

You don't recognize "function" as a mass noun? Form follows function? Some function is too costly provide? Which half of the function shall we leave out?

You're right that "I have connectivity to Chicago" is grammatically correct, and it isn't nonsense as I said. It's just a nonsensical phrasing for a statement that I have a connection, when there are plainer ways to phrase it. Like saying that what distinguishes a window from a door is that a window has transparency.

dependencies

Posted Jul 3, 2009 22:37 UTC (Fri) by jzbiciak (subscriber, #5246) [Link]

Time flies like an arrow; fruit flies like a banana. -- Groucho Marx

Gotta love the chameleon-like nature of certain words in English. "Function" is such a word. In a computer context, it's use pretty much boils down to one of three of its many potential meanings:

  • A mathematical construct, such as f(x) = x2.
  • A self contained piece of code (in some sense modeled after the mathematical construct).
  • A synonym for the word "role."

In all three cases, the noun is not a mass noun. It makes sense to use an article with "function" in the original example, regardless of which of the two senses ("piece of code" or "role") were intended.

Even in your example, "Form follows function," "function" does not act as a mass noun. In that example, there is an implied "its" in front of both "form" and "function:" "(Its) form follows (its) function." And "its" can be replaced with any possessive: "Fred's form follows Fred's function." "The iPhone's form follows the iPhone's function."

And here we see that the "form follows function" example is bogus. I imagine Forrest Gump would take no issue with "iPhone's form follows iPhone's function," but most people would be more comfortable saying "The iPhone's form follows the iPhone's function."

Here's a better test. When there's "many" of something, we use "many" if it's a counting noun, and "much" if it's a mass noun. ("Many chairs" vs. "much furniture.") So, by means of a concrete example: Would you say complicated, overreaching software has:

  1. Too much function.
  2. Too much functionality.
  3. Too many functions.

I reckon the latter two are accepted more widely than the first.

Remind me what this had to do with filesystems again? I must thoroughly apologize for having highlighted the obscure trivia of "dependences" vs. "dependencies."

dependencies

Posted Jul 4, 2009 2:33 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

Hmm. "Too much function" sounds perfectly fine to me, but maybe it's just something I made up. I see the dictionary doesn't mention anything close to that meaning. It also doesn't cover other uses I'm sure I've heard around the engineering world, like "we're testing function this week and performance next week," or "is the patch just for maintainability, or does it affect function?"

And with "form follows function," I don't see how an implied "its" fits in there. I think an architect might well say, "I used to think just about form, but now I spend most of my time worrying about function."

But even if a certain feature of a device can't be considered a "piece of function," I can't see how it could be considered a "piece of functionality" either.

dependencies

Posted Jul 4, 2009 2:48 UTC (Sat) by jzbiciak (subscriber, #5246) [Link]

At the very least, "piece of functionality" has nearly 100k hits on Google, generally referring individual features of a product or device. In contrast, "piece of function" only gets 14,500, and most of those that I looked at are phrases where "function" modifies something else--ie. "piece of function noun".

*shrug*

dependencies

Posted Jul 4, 2009 3:30 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

Sure, but popularity is irrelevant to the point I'm making. In fact, I mentioned in my first post that "functionality" is widely used this way - it's why I thought it was worth mentioning. "Dependency" is quite a bit more popular than "dependence" in computer science discussions, but still wrong. Other phrasings that are in majority use but wrong: IDE to mean ATA, DB-9 to mean DE-9, RJ45 to mean the 8 position modular connector we use with Ethernet. "Could care less" to mean couldn't care less, gridlock to mean slow traffic, exponential growth to mean fast growth, steep learning curve to mean shallow learning curve.

dependencies

Posted Jul 4, 2009 9:18 UTC (Sat) by SiB (subscriber, #4048) [Link]

> Sure, but popularity is irrelevant to the point I'm making.

Language is supposed to serve people, not the other way. What people use and understand defines language, not the dictionary. The dictionary is supposed to record how people use language. Language evolves. Dictionaries need to follow that change. Popularity is all that matters. (I'm not a native English speaker, but what I said should apply to all languages in use.)

dependencies

Posted Jul 4, 2009 18:12 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

Sure, but popularity is irrelevant to the point I'm making.
Language is supposed to serve people, not the other way. What people use and understand defines language, not the dictionary.

I agree, but I don't know why you bring it up. While I made the statement above about popularity, I didn't say anything about dictionaries except to say that the dictionary doesn't support my usage of "function."

Popularity is all that matters

Language serves people best by being logical, consistent, precise, and easily expressive. Those are not implied by popularity -- the number of people using a particular phrasing. When one chooses between two phrasings to write, the relative number of times one has heard one or the other should be a fairly minor factor.

dependencies

Posted Jul 4, 2009 12:12 UTC (Sat) by ajf (subscriber, #10844) [Link]

At the very least, "piece of functionality" has nearly 100k hits on Google
... and definately has nearly 14 million.

dependencies

Posted Jul 4, 2009 12:26 UTC (Sat) by jzbiciak (subscriber, #5246) [Link]

...many of which are people complaining that it's to be spelled "definitely," which incidentally gets nearly 10x as many hits (over 128M).

How exactly have you invalidated the notion that the relative hit count between two directly comparable alternatives suggests which one is more likely to be correct? I'd be worried if "definitely" got 100k hits but "definately" got 14M.

dependencies

Posted Jul 4, 2009 16:29 UTC (Sat) by ajf (subscriber, #10844) [Link]

Quite right, I should have compared it to the definantly count and concluded definately was "more likely to be correct" - which an utterly worthless conclusion, because both are still wrong. And that's exactly where your comparison leaves us.

dependencies

Posted Jul 10, 2009 8:55 UTC (Fri) by Lennie (guest, #49641) [Link]

I think the one about connectivity is easily explained.

The provider says: we provide connectivity.

The customer gets it connection and things: I guess now I have connectivity too.

Well, maybe. :-)

PS English is not my first language.

Soft updates, hard problems

Posted Jul 3, 2009 21:11 UTC (Fri) by jzbiciak (subscriber, #5246) [Link]

I recall specifically reading somewhere (but I cannot find it), that the person who coined "dependence"/"dependences"/"dependence graph" purposefully avoided the word "dependency" because he felt it had negative connotations.

In any case, a directed graph that illustrates relationships such as "A depends on B" is a graph of dependence relationships. Whether you call it a dependency graph or a dependence graph, and whether you call the edges in said graph dependencies or dependences doesn't bother me. The former is more standard English, I believe, but the latter is fairly common in computer science textbooks. I try to stick to the latter when using the computer science concept, but I admit it sounds funny.

Soft updates, hard problems

Posted Jul 3, 2009 4:03 UTC (Fri) by njs (guest, #40338) [Link]

Yeah, ever since I first heard a real description of soft updates, I've been waiting to see an LWN article about some PhD student showing up on linux-kernel: "I haven't written soft updates code, but I wrote an OCaml program that wrote soft updates code, here is a patch for the filesystem and to add the program to MAINTAINERS..."

Soft updates, hard problems

Posted Jul 3, 2009 16:34 UTC (Fri) by quotemstr (subscriber, #45331) [Link]

It seems like it ought to be possible to describe the on-disk format in some higher level form that allows a tool to work out the dependences between various updates automatically, dramatically improving the maintainability of the filesystem. If nothing else, it seems like it ought to reduce the chance of error.
Hear, hear! An even better example than your VLIW compiler would be the humble parser generator: have you read the incomprehensible muck that vomits forth from bison? However unintelligible, it works every time.

The difference is that for VLIW instruction sets and parsers, respectively, we have a higher-level forms in which to express the problem: high-level languages and BNF grammars, respectively. What would the equivalent representation be for on-disk data structures? Does there exist today a general representation for data dependencies?

Soft updates, hard problems

Posted Jul 3, 2009 21:16 UTC (Fri) by jzbiciak (subscriber, #5246) [Link]

An LJ user pointed me to some very interesting work in this area. He wrote the following:

The Featherstitch filesystem uses clever formal systems techniques to derive softupdates ordering requirements and optimise away unnecessary disk operations. They've basically generalized and automated Kirk McKusick's one-off analysis of FFS's ordering requirements. This technology means softupdates is no longer difficult.

http://featherstitch.cs.ucla.edu/

I thought I should share it over here.

Soft updates, hard problems

Posted Jul 3, 2009 21:27 UTC (Fri) by vaurora (guest, #38407) [Link]

Yes, I think Featherstitch is very interesting - I saw the paper when it was first presented and was impressed. Would anyone be interested in reading an LWN article about it?

Soft updates, hard problems

Posted Jul 3, 2009 21:47 UTC (Fri) by jzbiciak (subscriber, #5246) [Link]

Put my vote in the "Yes" column. :-)

Soft updates, hard problems

Posted Jul 4, 2009 2:39 UTC (Sat) by dlang (subscriber, #313) [Link]

definantly yes.

it would either doucment a tool that could be used by kernel developers, or give us answers for the (now inevitable) swarm of questions about why don't the kernel devs just use featherstitch and implement soft updates ;-)

is there a filesystem simple enough to show examples of it's analysis? (I'm thinking either ext2 of ext4 without journling as possibilities)

Featherstich

Posted Jul 4, 2009 5:16 UTC (Sat) by ewen (subscriber, #4772) [Link]

Yes, I'd be keen to see an article on Featherstich. The article on soft updates was good to see here too; I'd read the original paper on soft updates and admired the approach taken even though it seemed to have a "Level of Difficulty == Superhuman". But there are many problems in computer science like that (write a modern operating system in assembly language, anyone?), for which the computer science answer is another layer of abstraction (and tools to translate back to the lower levels). It feels like that's what we need here too: a way to express the dependency graph of all the operations such that both the code to do it, and the code to track the order in which things hit disk, is auto-generated.

Ewen

Soft updates, hard problems

Posted Jul 4, 2009 0:20 UTC (Sat) by njs (guest, #40338) [Link]

Huh, and what's *really* cool about it is that they apparently export a userspace API for expressing IO write barriers. So with the featherstitch patch, you really can say things like "write these bytes to that file, and then rename it, but don't let the rename hit the disk until after the write has completed", except with arbitrarily complex dependency structures (!).

I don't really care whether I'm using soft-updates or journalling, just so long as my disk is safe and things are reasonably fast, but real write barriers could totally change the performance profile of higher level apps. (Except for ACID databases, of course; I guess that's why historically no-one has cared about write barriers, because the only important high-level storage service was ACID databases.)

Soft updates, hard problems

Posted Jul 4, 2009 0:41 UTC (Sat) by njs (guest, #40338) [Link]

They got a bunch of the important stuff right, too... the "patchgroup" handles are basically fd's, they correctly prevent circular dependencies, they do the Right Thing for all different modes of fs operation (soft-updates: barriers use soft-update magic, full journaling: barriers optimize out, metadata journaling: barriers magically upgrade only those operations with dependencies to use full journaling), and check out this example from the paper:

> The following command line would ensure that 'in' is not removed until all changes in the preceding sort have committed to disk: $ depend 'sort < in > out' 'rm in'

(Of course, really you want shell support, so you can just say 'sort <in >out &&! rm in' or whatever.)

Aw, man, now I'm all excited about a research project from 2007 that will never go anywhere in real life :-(.

Soft updates, hard problems

Posted Jul 4, 2009 12:06 UTC (Sat) by nix (subscriber, #2304) [Link]

Seconded. This looks really nice...


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds