LWN: Comments on "Ensuring data reaches disk" https://lwn.net/Articles/457667/ This is a special feed containing comments posted to the individual LWN article titled "Ensuring data reaches disk". Fri, 12 Sep 2025 17:47:32 +0000 Ensuring data reaches disk https://lwn.net/Articles/899023/ https://lwn.net/Articles/899023/ b10s <div class="FormattedComment"> Hi, thank you for the article!<br> <p> <p> <font class="QuotedText">&gt; Now, since the amount of data transferred is already known, and given the nature of network communications (they can be bursty and/or slow), we&#x27;ve decided to use libc&#x27;s stream functions (fwrite() and fflush(), represented by &quot;Library Buffers&quot; in the figure above) in order to further buffer the data.</font><br> <p> But the reading is actually done without libc&#x27;s stream functions:<br> <p> 11 ret = read(sockfd, buf, MY_BUF_SIZE);<br> <p> <p> May I ask why you mentioned the nature of the network here?<br> <p> <p> ---<br> <p> What if we call fsync() but not fflush() in your code example?<br> Most likely fsync() does nothing for our data, since it flushes the OS cache while our data is still sitting in the libc buffer.<br> <p> <p> ---<br> <p> <font class="QuotedText">&gt; Writes using these functions may not result in system calls, meaning that the data still lives in buffers in the application&#x27;s address space after making such a function call</font><br> <p> Except fflush()?<br> <p> &quot;Flushing output on a buffered stream means transmitting all accumulated characters to the file&quot;.<br> <a rel="nofollow" href="https://www.gnu.org/software/libc/manual/html_node/Flushing-Buffers.html">https://www.gnu.org/software/libc/manual/html_node/Flushi...</a><br> <p> It sounds like fflush() ends with some system call, doesn&#x27;t it? If so, which? 
(just write()?)<br> <p> ---<br> <p> <font class="QuotedText">&gt; I/O operations performed against files opened with O_DIRECT bypass the kernel&#x27;s page cache, writing directly to the storage. Recall that the storage may itself store the data in a write-back cache, so fsync() is still required for files opened with O_DIRECT in order to save the data to stable storage.</font><br> <p> Then why would O_DIRECT be needed at all if we still have to call fsync()? What are the use cases?<br> <p> ---<br> <p> Do storage devices hide their cache? Can the kernel write directly to a drive&#x27;s stable storage?<br> I&#x27;ve heard that some devices do hide their cache.<br> </div> Sat, 25 Jun 2022 10:48:15 +0000 Ensuring data reaches disk https://lwn.net/Articles/836686/ https://lwn.net/Articles/836686/ farnz <p>To be fair to btrfs, that's its USP compared to ext4: when hardware fails, it lets you know that your data has been eaten at the time of the issue, and not months down the line. <p>And knowing consumer hardware, chances are very high that it did commit everything properly, and then had a catastrophic failure when there was a surprise power-down. Unfortunately, unless you have an acceptance lab verifying that kit complies with the intent of the spec, it often complies with the letter of the spec (if you're lucky) and no more :-( Mon, 09 Nov 2020 18:27:56 +0000 Ensuring data reaches disk https://lwn.net/Articles/836679/ https://lwn.net/Articles/836679/ zlynx <div class="FormattedComment"> More fun are consumer-grade SSDs that protect their metadata during power loss but not necessarily the data.<br> <p> I had to rebuild a btrfs volume because my laptop battery ran down in the bag, and on reboot the drive contained blocks saying writes had completed, but those data blocks had old data in them. In other words, data that had been committed to physical storage (or that the drive CLAIMED had been committed) was no longer present after power loss.
It probably had to fsck or equivalent on the Flash FTL and lost some bits. <br> <p> btrfs gets very upset about that.<br> <p> I guess this behavior is still better than some older SSDs which had to be secure-erased and reformatted after losing their entire FTL? I guess.<br> </div> Mon, 09 Nov 2020 17:11:31 +0000 Ensuring data reaches disk https://lwn.net/Articles/836585/ https://lwn.net/Articles/836585/ Wol <div class="FormattedComment"> Ouch. As a database guy I&#x27;m desperate for &quot;queued flush straight barrier&quot;, because if you want data integrity that at least makes reasoning possible - &quot;if the transaction log is incomplete, revert; if the data write is complete, continue; if the log is complete and the data write isn&#x27;t, re-play the log&quot;.<br> <p> If you can&#x27;t be sure what has or hasn&#x27;t hit the disk - the nightmare scenario is &quot;part of the log, and part of the data&quot; - then you get the hoops that I believe SQLite and PostgreSQL go through :-(<br> <p> Cheers,<br> Wol<br> </div> Mon, 09 Nov 2020 10:17:29 +0000 How disappointing https://lwn.net/Articles/836584/ https://lwn.net/Articles/836584/ Wol <div class="FormattedComment"> Except I was comparing Unix against proprietary OS&#x27;s from the likes of DEC, Honeywell, Pr1me, etc.<br> <p> While Unix was eating the mini-computers&#x27; lunch, yes, Windows came along and started eating its lunch ...<br> <p> Cheers,<br> Wol<br> </div> Mon, 09 Nov 2020 10:11:54 +0000 Ensuring data reaches disk https://lwn.net/Articles/836583/ https://lwn.net/Articles/836583/ farnz <p>Yes, such a command is needed, and the various interface specs (ATA, SCSI, NVMe) all have standardised commands for flushing the cache. 
<p>At a minimum, you get a FLUSH CACHE or SYNCHRONIZE CACHE type command, which is specified as not completing until all data in the cache is in persistent storage; this is enough to implement fsync() behaviour; beyond that, you can also have forced unit access (FUA) commands, which do not complete until the data written is on the persistent media, and even partial flush commands that only affect some sections of the drive. <p>There's an added layer of complexity in that some standards have queued flushes which act as straight barriers (all commands before the flush complete, then the flush happens, then the rest of the queue); others have queued flushes that only affect commands issued before the flush in this queue (and can over-flush by flushing data from later commands in the queue), and yet others only have unqueued flushes which require you to idle the interface, wait for the flush to complete, and then resume issuing commands. Mon, 09 Nov 2020 09:56:43 +0000 How disappointing https://lwn.net/Articles/836579/ https://lwn.net/Articles/836579/ jem <p><q>The trouble with POSIX is it is based on Unix and, sorry guys, Unix is crap as a commercial OS. It won because it was cheap and good enough.</q></p> <p>No, commercial Unix lost to Windows because <i>Windows</i> was cheap and good enough. <a href="https://www.youtube.com/watch?v=sforhbLiwLA">Only 99 dollars!</a>. (This was not a real ad, though.)</p> <p>I also don't buy the argument that Unix cost more to maintain <i>per user</i>. Back then, Unix was a multi-user operating system that was centrally administered. 
Then came DOS and Windows and every user had their individual problems.</p> Mon, 09 Nov 2020 07:52:10 +0000 avoiding orphan files https://lwn.net/Articles/836574/ https://lwn.net/Articles/836574/ Wol <div class="FormattedComment"> Except it doesn&#x27;t work - see my comment about hard-linked files ...<br> <p> Cheers,<br> Wol<br> </div> Sun, 08 Nov 2020 23:17:30 +0000 Ensuring data reaches disk https://lwn.net/Articles/836573/ https://lwn.net/Articles/836573/ Wol <div class="FormattedComment"> Not that this is necessarily the way it&#x27;s done, but linux does have a disk database. I see that often enough watching the raid stuff. It&#x27;s quite possible (though probably not) that linux looks up the drive characteristics.<br> <p> Cheers,<br> Wol<br> </div> Sun, 08 Nov 2020 23:15:40 +0000 License of example files https://lwn.net/Articles/836571/ https://lwn.net/Articles/836571/ Wol <div class="FormattedComment"> How about &quot;these examples are too simple to be worthy of copyright&quot; ... ?<br> <p> The law itself sets out a vague line, so just declare that this stuff falls the wrong side of the line.<br> <p> Cheers,<br> Wol<br> </div> Sun, 08 Nov 2020 23:09:33 +0000 How disappointing https://lwn.net/Articles/836570/ https://lwn.net/Articles/836570/ Wol <div class="FormattedComment"> And then what happens if the file is hardlinked, and you want to modify the original file, not make a modified copy...<br> <p> The trouble with POSIX is it is based on Unix and, sorry guys, Unix is crap as a commercial OS. It won because it was cheap and good enough.<br> <p> And I curse it regularly because, unlike a lot of people today, I&#x27;ve actually had experience of real commercial OSs. 
Trouble is, they&#x27;ve died because they cost too much to maintain :-(<br> <p> (Mind you, I&#x27;ve used real commercial OSs that had those flags to do fancy file-system stuff, and when they have bugs they really do have bugs ...)<br> <p> Cheers,<br> Wol<br> </div> Sun, 08 Nov 2020 23:05:06 +0000 Ensuring data reaches disk https://lwn.net/Articles/836567/ https://lwn.net/Articles/836567/ yzou93 <div class="FormattedComment"> An awesome article.<br> My question about fsync() is how the OS can control, or even know about, the device-internal caching behavior.<br> When designing block device hardware (for example, if Samsung wants to design a new SSD), is support for a cache-control command that the OS can issue for fsync() required?<br> <p> Thank you.<br> </div> Sun, 08 Nov 2020 21:55:01 +0000 Ensuring data reaches disk https://lwn.net/Articles/623830/ https://lwn.net/Articles/623830/ ppai <div class="FormattedComment"> I'm wondering about this case:<br> Directory "a/b/c" already exists.<br> <p> create("/tmp/whatever")<br> write("/tmp/whatever")<br> fsync("/tmp/whatever")<br> os.makedirs("a/b/c/d/e")<br> rename("/tmp/whatever", "a/b/c/d/e/obj.data")<br> fsync("a/b/c/d/e/")<br> <p> Is it really required to fsync the directories all the way from e up to a? An fsync is clearly unnecessary for "a/b/c", since it already existed. But after doing a makedirs(), there's no way to know which part of the path "a/b/c/d/e" needs fsync().<br> Is it reasonable to fsync only the containing directory and expect the filesystem to take care of the rest (making sure the entire tree reaches disk)?<br> </div> Mon, 01 Dec 2014 13:47:25 +0000 Direct Reads https://lwn.net/Articles/548066/ https://lwn.net/Articles/548066/ etienne <div class="FormattedComment"> <font class="QuotedText">&gt; ...
data read from disk and not kernel buffers ...</font><br> <p> Maybe echo 1, 2 or 3 to /proc/sys/vm/drop_caches?<br> </div> Mon, 22 Apr 2013 11:37:00 +0000 Direct Reads https://lwn.net/Articles/548009/ https://lwn.net/Articles/548009/ nikm <div class="FormattedComment"> I am interested in whether there is a way to ensure that, when you are reading data, the data comes directly from the disk. My suspicion is that, even if the O_DIRECT flag is used, the data may still be read from a kernel buffer. Is that correct?<br> <p> thanx<br> </div> Sun, 21 Apr 2013 03:16:31 +0000 avoiding orphan files https://lwn.net/Articles/477223/ https://lwn.net/Articles/477223/ droundy <div class="FormattedComment"> <font class="QuotedText">&gt; The write/sync/rename process is hardly an ideal way to implement atomic replacement semantics. There are simply too many potential points of failure.</font><br> <p> True, but it's also the only one we've got, right?<br> </div> Wed, 25 Jan 2012 20:38:48 +0000 code feedback ... https://lwn.net/Articles/477222/ https://lwn.net/Articles/477222/ droundy <div class="FormattedComment"> I prefer to avoid putting buffers on the stack simply to reduce the difficulties associated with buffer overflows. Not for security reasons (most of my programming is scientific), but simply so that overwriting the buffer won't trash the stack and make debugging harder.
And also to allow valgrind to immediately recognize a buffer overwrite...<br> </div> Wed, 25 Jan 2012 20:34:47 +0000 Ensuring data reaches disk https://lwn.net/Articles/460306/ https://lwn.net/Articles/460306/ oak <div class="FormattedComment"> How much would using fdatasync() instead of fsync() help?<br> <p> </div> Fri, 23 Sep 2011 20:56:34 +0000 How disappointing https://lwn.net/Articles/460114/ https://lwn.net/Articles/460114/ spitzak <div class="FormattedComment"> If it is opened with the atomic-replace semantic, I would just have plain close() do the replacement.<br> <p> There may be a need to somehow "abort" the file so that it is as though you never started writing it. But it may be sufficient to do this if the process owning the fd exits without calling close().<br> <p> I very much disagree with others who say POSIX should be followed. The suggested method of writing a file is what is wanted probably 95% of the time that files are written. It should be the basic operation, while "other processes can dynamically see the blocks change as I write them" is an extremely rare operation that should be the one requiring complex hacks.<br> <p> </div> Fri, 23 Sep 2011 00:59:49 +0000 avoiding orphan files https://lwn.net/Articles/459407/ https://lwn.net/Articles/459407/ nybble41 <div class="FormattedComment"> <font class="QuotedText">&gt; Also in the write/sync/rename workflow, what happens if the temp file is on a separate filesystem?</font><br> <p> In the write/sync/rename workflow, this is never supposed to occur. The temp file must always be on the same filesystem as the real file for the atomic-rename guarantee to apply.<br> <p> Naturally, this can be extremely difficult to achieve in some cases. The file may be a symlink, which must be fully resolved to a symlink-free path to determine the real filesystem. The file may be the target of a bind mount, in which case I doubt there is any portable way to determine which filesystem it came from.
And then there's the possibility that you can write to the file, but not to the directory _containing_ the file...<br> <p> The write/sync/rename process is hardly an ideal way to implement atomic replacement semantics. There are simply too many potential points of failure.<br> </div> Mon, 19 Sep 2011 18:17:35 +0000 avoiding orphan files https://lwn.net/Articles/459398/ https://lwn.net/Articles/459398/ mathstuf <div class="FormattedComment"> As someone who uses symlinks to manage dotfiles in another repository, I find the write/sync/rename workflow annoying as all hell. Tin (since moved to slrn), gpg, pidgin, weechat (which is why I'm still using irssi), and more all force me to manually copy the file to the real location and remake the symlink. If there were a way to do the workflow and then replay the writes to an fd returned by open() on the original path, it would be much better. So far, it looks as if all of the above solutions still fail for me.<br> <p> Also in the write/sync/rename workflow, what happens if the temp file is on a separate filesystem? There's a copy involved there, so there is a time when the file is not atomically replaced (unless I'm missing some guarantee by POSIX in this case).<br> </div> Mon, 19 Sep 2011 17:15:29 +0000 avoiding orphan files https://lwn.net/Articles/459337/ https://lwn.net/Articles/459337/ aeriksson <div class="FormattedComment"> The upside of using the write/sync/rename workflow is of course that the original file is left untouched until there is a fully committed replacement ready on disk. The downside is that you need to create a temporary file with a temporary filename while doing it.
This is bad for crash recovery, where you'd leave orphan files on the filesystem.<br> <p> Is there a way to solve that which I have overlooked?<br> <p> fd=open("",O_UNNAMED);<br> ....<br> rename_unnamed(fd,"/some/file");<br> </div> Mon, 19 Sep 2011 12:06:44 +0000 Opening device nodes https://lwn.net/Articles/459246/ https://lwn.net/Articles/459246/ bjencks <div class="FormattedComment"> What are the semantics when you open a device node (either block, e.g. disk, or char, e.g. tape)? Does the kernel ever use the page cache for device files? Does O_DIRECT do anything? What about O_SYNC? Does fsync always generate a barrier?<br> <p> Also, what about the different disk abstraction layers (LVM, dm-crypt, MD RAID, DRBD, etc.) -- what's involved in passing an fsync() all the way down the stack?<br> </div> Fri, 16 Sep 2011 22:50:48 +0000 code feedback ... https://lwn.net/Articles/459237/ https://lwn.net/Articles/459237/ bronson <div class="FormattedComment"> That was painfully true 20 years ago. With modern memory management it doesn't matter much. Within reason of course -- a 20 MB buffer should probably still go on the heap.<br> </div> Fri, 16 Sep 2011 19:17:22 +0000 code feedback ... https://lwn.net/Articles/459196/ https://lwn.net/Articles/459196/ renox <div class="FormattedComment"> <font class="QuotedText">&gt; you forgot to free(buf) after the while loop and in the early returns inside of the loop.
might as well just use the stack: char buf[MY_BUF_SIZE];</font><br> <p> Is it such a good idea?<br> I thought that it was better to keep the stack small.<br> <p> </div> Fri, 16 Sep 2011 13:43:18 +0000 Ensuring data reaches disk https://lwn.net/Articles/459180/ https://lwn.net/Articles/459180/ andresfreund <div class="FormattedComment"> If the storage device has a write cache but no independent power supply, you have the problem that you will lose data on power loss, because O_DIRECT only guarantees that the write reaches the device, not that it reaches persistent storage inside that device.<br> For that you need to issue special commands, which fsync(), for example, knows how to do.<br> Besides, an O_DIRECT write doesn't guarantee that metadata updates have reached stable storage.<br> </div> Fri, 16 Sep 2011 10:35:52 +0000 Ensuring data reaches disk https://lwn.net/Articles/459175/ https://lwn.net/Articles/459175/ scheck <div class="FormattedComment"> "Recall that the storage may itself store the data in a write-back cache, so fsync() is still required for files opened with O_DIRECT in order to save the data to stable storage."<br> <p> Why should I use fsync() for files opened with O_DIRECT, and what does the storage device's cache have to do with it?<br> <p> Apart from that, a very nice and comprehensible article. Thank you.<br> <p> <p> </div> Fri, 16 Sep 2011 09:16:57 +0000 code feedback ... https://lwn.net/Articles/458888/ https://lwn.net/Articles/458888/ jwakely <div class="FormattedComment"> the cast isn't needed with a C++ compiler either<br> </div> Wed, 14 Sep 2011 11:59:16 +0000 code feedback ... https://lwn.net/Articles/458866/ https://lwn.net/Articles/458866/ vapier <div class="FormattedComment"> in the past i've been bitten where fwrite was given a char* and size==num bytes to write and nmemb==1 (like in the example here). but perhaps that was a bug in the lower layers (it was a ppc/glibc setup).
i do know that size==sizeof(*buf) has always worked for me ;).<br> </div> Wed, 14 Sep 2011 05:33:28 +0000 code feedback ... https://lwn.net/Articles/458668/ https://lwn.net/Articles/458668/ jzbiciak <P>Err... I guess the<TT> read </TT>error path returns -1 <I>or </I> 0, which again I think may be an error, unless you wanted to return 0 when the connection drops before "<TT>nrbytes</TT>" gets read. Oops.</P> <P>That raises a different question: If you exit early due to the socket dropping, you won't<TT> fflush/fsync</TT>. Seems like you want a '<TT>break</TT>' if <TT>read</TT> returned 0 and <TT>errno != EINTR</TT>, don't you?</P> Tue, 13 Sep 2011 06:08:33 +0000 code feedback ... https://lwn.net/Articles/458665/ https://lwn.net/Articles/458665/ jzbiciak <BLOCKQUOTE><I>another common mistake: the size/nmemb args are swapped ... the size is "1" (since sizeof(*buf) is 1 (a char)), and the number of elements is "ret". once you fix the arg order, the method of clobbering the value of ret won't work in the "if" check ... </I></BLOCKQUOTE> <P>I don't think it's an error. In the example, if<TT> fwrite </TT>returns anything other than '1', then it reports an error. This is an "all-or-nothing"<TT> fwrite</TT>. If it fails, 'ret' will be 0, otherwise it will be 1. The semantic is "write 1 buffer of size 'ret' bytes."</P> <P>I see nothing wrong with this, and it matches the<TT> if (ret != 1) </TT>statement that follows. Sure, you don't get to find out how many bytes did get written, but the code wasn't interested in that anyway. And, it's one less variable that's "live across call," so the resulting compiler output may be fractionally smaller/faster. (While I can think of smaller microoptimizations, this type of microoptimization <I>is</I> pretty far down the list, I must admit.)</P> <P>Personally, I think the code might be clearer breaking 'ret' up into multiple variables. 
For example, if you did switch size/nmemb, you might rewrite the loop like so:</P> <PRE> while (tot_written < nrbytes) { int remaining = nrbytes - tot_written; int to_read = remaining > MY_BUF_SIZE ? MY_BUF_SIZE : remaining; read_ret = read(sockfd, buf, to_read); if (read_ret <= 0) { if (errno == EINTR) continue; return read_ret; } write_ret = fwrite((void *)buf, 1, read_ret, outfp); tot_written += write_ret; if (write_ret != read_ret) return ferror(outfp); } </PRE> <P>Written that way, you could easily add a way to return how many bytes <I>did</I> get written.</P> <P>Also, the return value is inconsistent. I think "<TT>return ferror(outfp)</TT>" is wrong. <TT>ferror </TT> returns non-zero on an error, but it isn't guaranteed to be negative. The other paths through this function return positive values on success, so shouldn't it be simply "<TT>return -1;</TT>" to match the<TT> read </TT>error path (which also simply returns -1, and maybe should be written as such)? ie: </P> <PRE> while (tot_written < nrbytes) { int remaining = nrbytes - tot_written; int to_read = remaining > MY_BUF_SIZE ? MY_BUF_SIZE : remaining; read_ret = read(sockfd, buf, to_read); if (read_ret <= 0) { if (errno == EINTR) continue; return -1; } write_ret = fwrite((void *)buf, 1, read_ret, outfp); tot_written += write_ret; if (write_ret != read_ret) return -1; } </PRE> Tue, 13 Sep 2011 06:00:56 +0000 How disappointing https://lwn.net/Articles/458457/ https://lwn.net/Articles/458457/ nix <div class="FormattedComment"> Again, ugh ('with'?). I'd simply say close_replace(), no need for a flag or indeed any parameters at all. 
This means it has the same prototype as close(), so if anyone wants to choose between calling close() or close_replace() at runtime, they can just use a function pointer.<br> <p> </div> Sun, 11 Sep 2011 23:51:19 +0000 How disappointing https://lwn.net/Articles/458448/ https://lwn.net/Articles/458448/ sionescu <div class="FormattedComment"> How about "close_with_flags" ?<br> </div> Sun, 11 Sep 2011 22:14:21 +0000 How disappointing https://lwn.net/Articles/458446/ https://lwn.net/Articles/458446/ nix <div class="FormattedComment"> True, though close2() is a horrible name (as is wait$num() and accept$num()): give it a name that reflects its purpose.<br> <p> </div> Sun, 11 Sep 2011 21:52:56 +0000 How disappointing https://lwn.net/Articles/458441/ https://lwn.net/Articles/458441/ sionescu <div class="FormattedComment"> Why POSIX ? There are other Linux-specific open flags, did Ulrich object to every one of them ?<br> <p> The new syscall could be called close2, adding a "flags" parameter - in the spirit of accept4() et al.<br> </div> Sun, 11 Sep 2011 21:10:24 +0000 How disappointing https://lwn.net/Articles/458440/ https://lwn.net/Articles/458440/ nix <div class="FormattedComment"> Sure. Practicalities: you could do it to open() (though you'd have to get the change into POSIX before Ulrich would let it past), but you could never do that to close() without breaking every C program ever written. 
You could call it close_replace(), perhaps?<br> </div> Sun, 11 Sep 2011 20:45:55 +0000 How disappointing https://lwn.net/Articles/458412/ https://lwn.net/Articles/458412/ sionescu <div class="FormattedComment"> No, it's the common way of implementing atomic commit when *modifying* the data, but it's not what I have in mind, which is this:<br> <p> open(path, O_REPLACE) only allocates a new inode<br> <p> close(fd, CLOSE_COMMIT) atomically replaces the reference to the old inode with the new inode(just like rename) copying all metadata except for the (a|c|m)time, then calls fsync()<br> <p> easy, isn't it ?<br> </div> Sun, 11 Sep 2011 16:00:42 +0000 How disappointing https://lwn.net/Articles/458389/ https://lwn.net/Articles/458389/ butlerm <div class="FormattedComment"> <font class="QuotedText">&gt;Who said anything about locking ?</font><br> <p> I mention locking because it is the most common way to implement atomic commit semantics, from the perspective of all other processes. Your idea makes great sense as long as you have multiversion read concurrency, so that existing openers can see an old, read only version of the file indefinitely.<br> <p> POSIX simply has a different solution for that, as I am sure you know - the name / inode distinction, which allows you to delete a file, or rename replace it with a new version without locking other processes out, waiting, or disturbing existing openers. <br> <p> It is unfortunate of course that there is no standard call to clone an existing file's extended attributes and security context for use in a rename replace transaction - perhaps one should be added, it would be a worthwhile enhancement. Hating UNIX when it is vastly superior to the most widely distributed alternative in this respect seems a bit pointless to me. <br> <p> </div> Sun, 11 Sep 2011 00:12:37 +0000 code feedback ... 
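The following is a minimal sketch of the rename-replace approach discussed in this thread: write a temporary file, fsync() it, rename() it over the original, then fsync() the containing directory so that the rename itself is durable. It assumes a POSIX system such as Linux/glibc. The temporary file is created with mkstemp() in the same directory as the target, so rename() never crosses filesystems (the requirement nybble41 points out). The names replace_file() and write_all(), the ".name.XXXXXX" temporary-name pattern and the 4096-byte path buffers are hypothetical choices made for illustration. They are not an existing API.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Write all len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: atomically replace dir/name with len bytes of data.
 * Returns 0 on success, -1 on error. */
int replace_file(const char *dir, const char *name, const char *data, size_t len)
{
    char tmp[4096], final[4096];
    int a = snprintf(tmp, sizeof tmp, "%s/.%s.XXXXXX", dir, name);
    int b = snprintf(final, sizeof final, "%s/%s", dir, name);
    if (a < 0 || a >= (int)sizeof tmp || b < 0 || b >= (int)sizeof final)
        return -1;

    /* Same directory as the target, so rename() stays on one filesystem. */
    int fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    /* 1. Put the complete new contents on stable storage. */
    if (write_all(fd, data, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }

    /* 2. Atomically point the name at the new inode. */
    if (rename(tmp, final) < 0) {
        unlink(tmp);
        return -1;
    }

    /* 3. fsync the directory so the rename itself survives a crash. */
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    int ret = fsync(dfd);
    close(dfd);
    return ret;
}
```

Two caveats raised in the comments still apply to this sketch. The new file gets mkstemp()'s 0600 mode rather than the old file's permissions or extended attributes. If the target is a symlink, rename() replaces the link itself, not the file it points to.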
https://lwn.net/Articles/458380/ https://lwn.net/Articles/458380/ vapier <pre> 5 char *buf = malloc(MY_BUF_SIZE);</pre> you forgot to free(buf) after the while loop and in the early returns inside of the loop. might as well just use the stack: char buf[MY_BUF_SIZE]; <pre>11 ret = read(sockfd, buf, MY_BUF_SIZE);</pre> common mistake. the len should be min(MY_BUF_SIZE, nrbytes - written). otherwise, if (nrbytes % MY_BUF_SIZE) is non-zero, you read too many bytes from the sockfd and they get lost. <pre>12 if (ret =< 0) {</pre> typo ... should be "<=" as "=<" doesn't compile. <pre>18 ret = fwrite((void *)buf, ret, 1, outfp); 19 if (ret != 1)</pre> unless you build this with a C++ compiler, that cast is not needed. and another common mistake: the size/nmemb args are swapped ... the size is "1" (since sizeof(*buf) is 1 (a char)), and the number of elements is "ret". once you fix the arg order, the method of clobbering the value of ret won't work in the "if" check ... <pre>27 ret = fsync(fileno(outfp)); 28 if (ret < 0) 29 return -1; 30 return 0;</pre> at this point, you could just as easily write: <pre> return fsync(fileno(outfp));</pre> Sat, 10 Sep 2011 21:02:28 +0000 How disappointing https://lwn.net/Articles/458304/ https://lwn.net/Articles/458304/ sionescu <div class="FormattedComment"> Who said anything about locking ?<br> </div> Fri, 09 Sep 2011 22:45:06 +0000
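The following is a sketch of the article's receive-and-sync loop with the fixes suggested in the code-feedback comments applied. It uses a stack buffer, so there is nothing to free(). Each read is capped at the number of bytes still expected. fwrite() is called with size 1 and nmemb equal to the byte count. errno is examined only when read() actually returns -1. fflush() comes before fsync(). This is not the article's original code. The function name copy_to_stable(), the 4096-byte value of MY_BUF_SIZE and the decision to treat an early EOF as an error are all assumptions.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

#define MY_BUF_SIZE 4096  /* assumed value for illustration */

/* Hypothetical helper: copy exactly nrbytes from fd into outfp, then push
 * them to stable storage. Returns 0 on success, -1 on error or early EOF. */
int copy_to_stable(int fd, FILE *outfp, size_t nrbytes)
{
    char buf[MY_BUF_SIZE];  /* stack buffer: nothing to free() on early return */
    size_t written = 0;

    while (written < nrbytes) {
        /* Never read more than is still expected, so no bytes are lost. */
        size_t want = nrbytes - written;
        if (want > sizeof buf)
            want = sizeof buf;

        ssize_t n = read(fd, buf, want);
        if (n < 0) {
            if (errno == EINTR)  /* errno is only meaningful when read() fails */
                continue;
            return -1;
        }
        if (n == 0)              /* peer closed before nrbytes arrived */
            return -1;

        /* size = 1, nmemb = n, so the return value is a byte count. */
        if (fwrite(buf, 1, (size_t)n, outfp) != (size_t)n)
            return -1;
        written += (size_t)n;
    }

    /* fflush(): libc buffer -> kernel; fsync(): kernel -> storage. */
    if (fflush(outfp) != 0)
        return -1;
    return fsync(fileno(outfp));
}
```

Returning -1 on an early EOF is only one choice. If a short transfer is acceptable, jzbiciak's suggestion of breaking out of the loop and still flushing the partial data would work just as well.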