<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF 
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
>

  <channel rdf:about="http://lwn.net/headlines/325304/">
    <title>LWN: Comments on "Wheeler: Fixing Unix/Linux/POSIX Filenames"</title>
    <link>http://lwn.net/Articles/325304/</link>
    <description>
This is a special feed containing comments posted
to the individual LWN article titled &quot;Wheeler: Fixing Unix/Linux/POSIX Filenames&quot;.

    </description>

    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>2</syn:updateFrequency>
    <items>
      <rdf:Seq>
	<rdf:li resource="http://lwn.net/Articles/362018/rss" />
	<rdf:li resource="http://lwn.net/Articles/362006/rss" />
	<rdf:li resource="http://lwn.net/Articles/362000/rss" />
	<rdf:li resource="http://lwn.net/Articles/361998/rss" />
	<rdf:li resource="http://lwn.net/Articles/361997/rss" />
	<rdf:li resource="http://lwn.net/Articles/328525/rss" />
	<rdf:li resource="http://lwn.net/Articles/327173/rss" />
	<rdf:li resource="http://lwn.net/Articles/326932/rss" />
	<rdf:li resource="http://lwn.net/Articles/326695/rss" />
	<rdf:li resource="http://lwn.net/Articles/326538/rss" />
	<rdf:li resource="http://lwn.net/Articles/326499/rss" />
	<rdf:li resource="http://lwn.net/Articles/326477/rss" />
	<rdf:li resource="http://lwn.net/Articles/326395/rss" />
	<rdf:li resource="http://lwn.net/Articles/326380/rss" />
	<rdf:li resource="http://lwn.net/Articles/326376/rss" />
	<rdf:li resource="http://lwn.net/Articles/326375/rss" />
	<rdf:li resource="http://lwn.net/Articles/326312/rss" />
	<rdf:li resource="http://lwn.net/Articles/326227/rss" />
	<rdf:li resource="http://lwn.net/Articles/326202/rss" />
	<rdf:li resource="http://lwn.net/Articles/326190/rss" />
	<rdf:li resource="http://lwn.net/Articles/326161/rss" />
	<rdf:li resource="http://lwn.net/Articles/326129/rss" />
	<rdf:li resource="http://lwn.net/Articles/326127/rss" />
	<rdf:li resource="http://lwn.net/Articles/326126/rss" />
	<rdf:li resource="http://lwn.net/Articles/326123/rss" />
	<rdf:li resource="http://lwn.net/Articles/326122/rss" />
	<rdf:li resource="http://lwn.net/Articles/326121/rss" />
	<rdf:li resource="http://lwn.net/Articles/326120/rss" />
	<rdf:li resource="http://lwn.net/Articles/326119/rss" />
	<rdf:li resource="http://lwn.net/Articles/326109/rss" />
	<rdf:li resource="http://lwn.net/Articles/326092/rss" />
	<rdf:li resource="http://lwn.net/Articles/326091/rss" />
	<rdf:li resource="http://lwn.net/Articles/326090/rss" />
	<rdf:li resource="http://lwn.net/Articles/326087/rss" />
	<rdf:li resource="http://lwn.net/Articles/326088/rss" />
	<rdf:li resource="http://lwn.net/Articles/326082/rss" />
	<rdf:li resource="http://lwn.net/Articles/326059/rss" />
	<rdf:li resource="http://lwn.net/Articles/326057/rss" />
	<rdf:li resource="http://lwn.net/Articles/326052/rss" />
	<rdf:li resource="http://lwn.net/Articles/326049/rss" />
      
      </rdf:Seq>
    </items>

  </channel>
    <item rdf:about="http://lwn.net/Articles/362018/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/362018/rss</link>
      <dc:date>2009-11-15T13:15:57+00:00</dc:date>
      <dc:creator>nix</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
ksh93 was too sodding hard to require because building it was a nightmare. &lt;br&gt;
At the time it wasn't free enough either.&lt;br&gt;
&lt;p&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/362006/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/362006/rss</link>
      <dc:date>2009-11-15T01:06:04+00:00</dc:date>
      <dc:creator>yuhong</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
&quot;ksh was too buggy (thanks, Linux, for pdksh, with its broken &lt;br&gt;
propagation of variables out of loops-with-redirection)&quot;&lt;br&gt;
Was ksh93 tried?&lt;br&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/362000/rss">
      <title>Leading spaces are common, actually</title>
      <link>http://lwn.net/Articles/362000/rss</link>
      <dc:date>2009-11-15T00:32:25+00:00</dc:date>
      <dc:creator>yuhong</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
&quot;Classic Mac OS loads files in /System Folder/Extensions in lexicographic &lt;br&gt;
order, and the load order matters, and the leading space trick is used very &lt;br&gt;
frequently there. &quot;&lt;br&gt;
Yep, look at what they had to do about this when Apple introduced HFS+ in Mac &lt;br&gt;
OS 8.1:&lt;br&gt;
&lt;a rel=&quot;nofollow&quot; href=&quot;http://developer.apple.com/legacy/mac/library/technotes/tn/tn1121.html#HFSPlus&quot;&gt;http://developer.apple.com/legacy/mac/library/technotes/t...&lt;/a&gt;&lt;br&gt;
&lt;a rel=&quot;nofollow&quot; href=&quot;http://developer.apple.com/legacy/mac/library/technotes/tn/tn1123.html&quot;&gt;http://developer.apple.com/legacy/mac/library/technotes/t...&lt;/a&gt;&lt;br&gt;
&lt;p&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/361998/rss">
      <title>NT (Windows kernel) doesn't care about filenames any more than Linux</title>
      <link>http://lwn.net/Articles/361998/rss</link>
      <dc:date>2009-11-15T00:06:30+00:00</dc:date>
      <dc:creator>yuhong</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
Another trick you can use with CreateFile is to start the filename with \\.\. &lt;br&gt;
If that is done, the only processing done on the filename before CreateFile &lt;br&gt;
calls NtCreateFile with the name is that \\.\ is replaced with \??\, which is &lt;br&gt;
an alias of \DosDevices\.&lt;br&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/361997/rss">
      <title>NT (Windows kernel) doesn't care about filenames any more than Linux</title>
      <link>http://lwn.net/Articles/361997/rss</link>
      <dc:date>2009-11-14T23:58:04+00:00</dc:date>
      <dc:creator>yuhong</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
&quot;files that are more than 2GB long&quot;&lt;br&gt;
Yep, NT had supported both files and disks larger than 2GB from the first &lt;br&gt;
version (NT 3.1) using the NTFS filesystem. Exercise: compare the design of &lt;br&gt;
the GetDiskFreeSpace and SetFilePointer APIs (look them up using MSDN or &lt;br&gt;
Google), both of which have existed since NT 3.1.  Which one was so much more &lt;br&gt;
error-prone that the versions of Windows released in 1996 had to cap the &lt;br&gt;
result to 2GB, even though older versions of NT supported returning more than &lt;br&gt;
2GB using it, and why?&lt;br&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/328525/rss">
      <title>Bad understanding of UTF-8</title>
      <link>http://lwn.net/Articles/328525/rss</link>
      <dc:date>2009-04-15T10:38:01+00:00</dc:date>
      <dc:creator>epa</dc:creator>
      <description>
      &lt;blockquote&gt;Because a whole lot of stupid people thought that &quot;wide characters&quot; are the solution and put them into certain systems we have to live with it and interoperate. The most popular solution is to translate invalid bytes in UTF-8 into 0xDCxx. This can be used as a stopgap until they finally realize that leaving the data in UTF-8 is the real solution.&lt;/blockquote&gt;They cannot 'leave the data in UTF-8' because it is not in UTF-8 to start with!  If it contains invalid bytes then by definition it's not UTF-8.  It is just a string of arbitrary bytes and certainly, yes, the application can treat it as such.  That does make life difficult when you want to display the filename to the user or otherwise treat it as human-readable text.
&lt;p&gt;
And indeed, the Python developers are living in a magic fairy land where filenames are sanely encoded and are always human-readable text, but wouldn't it be better to change things so that this situation is no longer wishful thinking, but part of the ordinary things userspace can rely on?  That is what Wheeler is proposing.
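&lt;p&gt;
The 0xDCxx stopgap mentioned above corresponds to Python 3's surrogateescape error handler (PEP 383). A minimal sketch, assuming Python 3 and a made-up filename, showing that the mapping round-trips the original bytes exactly while the decoded string still is not valid text:
&lt;pre&gt;
```python
# Hypothetical filename: valid UTF-8 "café-" followed by two bytes
# that can never appear in valid UTF-8.
raw = b"caf\xc3\xa9-\xff\xfe.txt"

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))  # 'caf\xe9-\udcff\udcfe.txt'

# The mapping is reversible, so the original bytes survive a round trip...
assert name.encode("utf-8", "surrogateescape") == raw

# ...but the string is not real text: a strict UTF-8 encode refuses it.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print("not valid text:", exc.reason)
```
&lt;/pre&gt;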
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/327173/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/327173/rss</link>
      <dc:date>2009-04-03T18:49:33+00:00</dc:date>
      <dc:creator>anton</dc:creator>
      <description>
      &lt;blockquote&gt;It could also recognise the null character as an argument
separator as in 'find -print0'.&lt;/blockquote&gt;

A few weeks ago I wanted to process my .ogg files which contain all
kinds of characters that are treated as meta-characters by the shell
or other programs I use in shell scripts.  I eventually ended up
writing a new shell &lt;a
href=&quot;http://www.complang.tuwien.ac.at/forth/programs/dumbsh.fs&quot;&gt;dumbsh&lt;/a&gt;
that uses NUL as argument separator, and feeding it from find, with
some intermediate processing in awk (which is quite flexible about
meta-characters).
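&lt;p&gt;
NUL is the only byte that cannot occur anywhere in a Unix pathname, which is why it works as an argument separator. A minimal sketch of consuming find -print0 output without any shell word splitting, assuming Python 3.7+ and a find that supports -print0 (GNU and BSD find both do); split_nul and find_files are hypothetical helper names:
&lt;pre&gt;
```python
import subprocess


def split_nul(data: bytes) -> list:
    """Split find -print0 style output into raw byte pathnames."""
    # Every name is terminated by NUL, so drop the empty trailing field.
    return [name for name in data.split(b"\0") if name]


def find_files(root: str, pattern: str) -> list:
    """Run find and return matching paths as bytes, never decoding them."""
    out = subprocess.run(
        ["find", root, "-name", pattern, "-print0"],
        check=True,
        capture_output=True,
    ).stdout
    return split_nul(out)


# Names containing spaces, newlines or leading dashes survive intact.
print(split_nul(b"./a song.ogg\0./-weird\nname.ogg\0"))
```
&lt;/pre&gt;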

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326932/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/326932/rss</link>
      <dc:date>2009-04-02T15:54:06+00:00</dc:date>
      <dc:creator>forthy</dc:creator>
      <description>
      &lt;p&gt;It is actually not that bad. As collating sequence, ß=ss (i.e. Mass 
and Maß sort to the same bin). Except for Austrian telephone books, where 
ß follows ss, but comes before st (though St. follows Sankt ;-).&lt;/p&gt;

&lt;p&gt;However, there's a huge mess in the CJK part of UCS: short and long 
forms of the same character (sometimes even a special variant for the 
Japanese character). This should never have happened; the different forms 
of the same character should be encoded in fonts, not in UCS. So far, not 
even Mac OS X normalizes these characters, but it is obvious that a 
mainland China file called &quot;&amp;#20013;&amp;#22269;&quot; and a Taiwan file called &quot;&amp;#20013;&amp;#22283;&quot; not only 
mean the same, but they also refer to the same word, and can be 
interchanged at will (see for example the Chinese Wikipedia entry: the 
lemma is the short form, the headline is the long form). And it is not 
easy to access long and short forms with usual input methods (mainland 
China: Pinyin, Canton: Cantonese Pinyin (gives traditional characters, 
but you need to know Cantonese), etc.).&lt;/p&gt;
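&lt;p&gt;Both points can be checked in Python 3. A minimal sketch; note that str.casefold is full Unicode case folding, not locale-aware collation, so it only approximates the ß=ss sorting rule, and that no Unicode normalization form unifies simplified and traditional characters:&lt;/p&gt;
&lt;pre&gt;
```python
import unicodedata

# Full case folding maps "ß" (U+00DF) to "ss", so "Maß" and "Mass" fold together.
print("Ma\u00df".casefold() == "Mass".casefold())  # True

# U+56FD (simplified) and U+570B (traditional) are distinct code points,
# and neither NFC nor NFKC normalization maps one to the other.
simplified = "\u4e2d\u56fd"
traditional = "\u4e2d\u570b"
for form in ("NFC", "NFKC"):
    same = unicodedata.normalize(form, simplified) == unicodedata.normalize(form, traditional)
    print(form, same)  # False for both forms
```
&lt;/pre&gt;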
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326695/rss">
      <title>Bad understanding of UTF-8</title>
      <link>http://lwn.net/Articles/326695/rss</link>
      <dc:date>2009-04-01T16:38:38+00:00</dc:date>
      <dc:creator>spitzak</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
A program that treats bytes with the high bit set as &quot;this may be a piece of a UTF-8 character&quot;, and puts all those bytes into a single class such as &quot;may be a part of an identifier&quot;, can safely handle UTF-8 strings (including invalid ones) as bytes. This is FAR better than trying to detect and handle errors, in particular because it is a hundred times simpler and thus more reliable and less likely to have bugs.&lt;br&gt;
&lt;p&gt;
Do NOT throw exceptions on bad strings. This turns a possible security error into a guaranteed denial-of-service (DoS) error. Working around it (as I have had to do countless times due to stupid string-drawing routines that refuse to draw a string with an error in it) means I have to write my *own* UTF-8 parser, just to remove the errors, before displaying it or using it. I hope you can see how forcing programmers to use their own code to parse the strings rather than providing reusable routines is a bad idea.&lt;br&gt;
&lt;p&gt;
And I don't want exceptions thrown when I compare two strings for equality. That way lies madness. It is unfortunate that too much of this stuff is being designed by people who never use it or they (and you) would not make such trivial design errors.&lt;br&gt;
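&lt;p&gt;
A minimal sketch of the byte-classification approach described above, assuming Python 3; identifier_runs and the sample input are hypothetical. ASCII letters, digits and underscore, plus every byte with the high bit set, count as identifier bytes, so valid and invalid UTF-8 both pass through untouched and nothing is ever decoded:
&lt;pre&gt;
```python
import string

# ASCII bytes allowed in identifiers, plus every byte 0x80-0xFF
# ("may be part of a UTF-8 character"), whether or not the sequence is valid.
IDENT = set((string.ascii_letters + string.digits + "_").encode("ascii"))
IDENT.update(range(0x80, 0x100))


def identifier_runs(data: bytes) -> list:
    """Split data into maximal runs of identifier bytes, never decoding."""
    runs, start = [], None
    for i, b in enumerate(data):
        if b in IDENT:
            if start is None:
                start = i
        elif start is not None:
            runs.append(data[start:i])
            start = None
    if start is not None:
        runs.append(data[start:])
    return runs


# "naïve" in UTF-8 plus a stray invalid byte 0xFF: both stay intact.
print(identifier_runs(b"x = na\xc3\xafve + bad\xff;"))
# [b'x', b'na\xc3\xafve', b'bad\xff']
```
&lt;/pre&gt;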
&lt;p&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326538/rss">
      <title>Bad understanding of UTF-8</title>
      <link>http://lwn.net/Articles/326538/rss</link>
      <dc:date>2009-04-01T05:12:40+00:00</dc:date>
      <dc:creator>njs</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; I am sure that &quot;errors in UTF-8 only contain bytes with the high bit set&quot;, which is what I thought you were asking.&lt;/font&gt;&lt;br&gt;
&lt;p&gt;
Okay, fair enough. I agree, all ASCII characters are valid UTF-8. I was objecting to your claim that bytes with the high bits set &quot;do not cause any problems with any programs&quot;.&lt;br&gt;
&lt;p&gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; An overlong encoding consists of a leading byte with the high bit set. This is an error.&lt;/font&gt;&lt;br&gt;
&lt;p&gt;
All characters with codepoint &amp;gt;= 128 are encoded in UTF-8 as a string of bytes with the high bit set (including on the leading byte). Having the high bit set is *certainly* not an error. I can't tell what you're saying in general, but it's just not true that the only time strings need to be interpreted as text is for display. In many, many cases text needs to be processed as text, and it's often impossible and rarely practical to write algorithms in such a way that they do something sensible with invalid encodings. Those serious security bugs I pointed out up above are examples of what happens when you try.&lt;br&gt;
&lt;p&gt;
(You're right that invalid strings usually shouldn't be silently transmuted to valid strings; they should usually signal a hard error.)&lt;br&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326499/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/326499/rss</link>
      <dc:date>2009-03-31T19:28:56+00:00</dc:date>
      <dc:creator>nix</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
I've contributed fixes now and then, but I just read a lot. :) The &lt;br&gt;
projects are public, after all.&lt;br&gt;
&lt;p&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326477/rss">
      <title>Bad understanding of UTF-8</title>
      <link>http://lwn.net/Articles/326477/rss</link>
      <dc:date>2009-03-31T17:59:20+00:00</dc:date>
      <dc:creator>spitzak</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
I am sure that &quot;errors in UTF-8 only contain bytes with the high bit set&quot;, which is what I thought you were asking.&lt;br&gt;
&lt;p&gt;
An overlong encoding consists of a leading byte with the high bit set. This is an error. That may be followed by any byte. If it is another leading byte then it might start another UTF-8 character, or it might be an error. If it is a continuation byte then it is an error. If it is an ASCII character then it is not an error. As before, EVERY ERROR BYTE has the high bit set!&lt;br&gt;
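&lt;p&gt;
The claim that every error byte has the high bit set can be spot-checked against Python 3's strict UTF-8 decoder. A minimal sketch, assuming the surrogateescape handler is used only to report which bytes were rejected; error_bytes is a hypothetical helper, and b&quot;\xc0\xaf&quot; is the classic overlong encoding of &quot;/&quot;. This demonstrates the property for one decoder and one input, not in general:
&lt;pre&gt;
```python
def error_bytes(data: bytes) -> list:
    """Return the bytes a strict UTF-8 decoder rejects, in order."""
    text = data.decode("utf-8", "surrogateescape")
    # surrogateescape turns each rejected byte 0xNN into U+DCNN.
    return [ord(ch) - 0xDC00 for ch in text if 0xDCFF >= ord(ch) >= 0xDC80]


# Overlong "/" followed by plain ASCII, then a lone continuation byte.
sample = b"\xc0\xafetc\x80passwd"
errs = error_bytes(sample)
print([hex(b) for b in errs])  # ['0xc0', '0xaf', '0x80']
assert all(b >= 0x80 for b in errs)  # every rejected byte has the high bit set
```
&lt;/pre&gt;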
&lt;p&gt;
I might have misunderstood your question. You said &quot;are you sure&quot; in response to me saying that all error bytes have the high bit set. The reason I was confirming that all error bytes have the high bit set is that if they are mapped to a 128-long range of Unicode then the adjacent 128-long range makes a good candidate for &quot;quoting&quot; characters that are not allowed in filenames.&lt;br&gt;
&lt;p&gt;
I do believe there are some serious mistakes in a lot of modern software. UTF-8 should NOT be converted until the very last moment when it is converted to &quot;display form&quot; for drawing on the screen. This is the only reliable way of preserving identity of invalid strings. People who think invalid strings will not occur, or that it is acceptable for them to compare equal to other strings or to be silently changed into other invalid or valid strings, are living in a fantasy land.&lt;br&gt;
&lt;p&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326395/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/326395/rss</link>
      <dc:date>2009-03-31T07:47:30+00:00</dc:date>
      <dc:creator>michaeljt</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
I was wondering now whether to ask about this on the Bash mailing lists.  Just out of interest, are you involved with the development of Bash/the GNU tools in any way?  You seem well informed about them.&lt;br&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326380/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/326380/rss</link>
      <dc:date>2009-03-31T05:14:50+00:00</dc:date>
      <dc:creator>njs</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
We have that -- that's what file descriptors are. It would be nice if programs passed them back and forth more often, but my guess is that they mostly get used where they should, and to make their use more ubiquitous you'd need to radically re-architect a lot of stuff. (If one wanted to be provocative, one could claim that the whole goal of EROS/Coyotos is to figure out what that re-architecting looks like.)&lt;br&gt;
&lt;p&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326376/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/326376/rss</link>
      <dc:date>2009-03-31T05:00:40+00:00</dc:date>
      <dc:creator>njs</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
I think you're overcomplicating things -- I wouldn't implement UTF-8 requirements at the VFS level (it just doesn't make sense, since there manifestly exist filesystems where you don't know the encoding, both from pre-existing Linux installs and with &quot;foreign&quot; filesystems). I'd make it a filesystem feature -- a flag in the ext2/3/4 header that's set at mkfs time, say. That removes all the issues about translating invalid filenames -- if that flag is set and a filename is invalid, then *your filesystem is corrupt*. fsck can check for such corruption if it feels like it.&lt;br&gt;
&lt;p&gt;
Then you just get distros to set that flag on the root filesystem by default, add a few bits of API for programs who want to know &quot;is this filesystem utf8-only?&quot; or &quot;how does this filesystem normalize names?&quot; (which would be really useful calls anyway), and away you go.&lt;br&gt;
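&lt;p&gt;
Until such a flag exists, the fsck-style check can be roughly approximated from userspace. A minimal sketch, assuming Python 3 on a POSIX system; invalid_utf8_names is a hypothetical helper, not an existing fsck option, and it walks the tree with bytes paths so that no name is decoded along the way:
&lt;pre&gt;
```python
import os


def is_valid_utf8(name: bytes) -> bool:
    """True if name decodes as strict UTF-8 (encoded surrogates are rejected)."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def invalid_utf8_names(root: bytes) -> list:
    """Return every path under root whose final component is not valid UTF-8."""
    bad = []
    # A bytes root makes os.walk yield bytes names, so nothing is decoded.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                bad.append(os.path.join(dirpath, name))
    return bad
```
&lt;/pre&gt;
For example, invalid_utf8_names(b&quot;/srv/share&quot;) (an illustrative path) would list the offending paths as raw bytes.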
&lt;p&gt;
(It's unfortunate that the Win32 designers screwed this up, but that's hardly an argument to perpetuate their mistake.)&lt;br&gt;
&lt;p&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326375/rss">
      <title>Bad understanding of UTF-8</title>
      <link>http://lwn.net/Articles/326375/rss</link>
      <dc:date>2009-03-31T04:49:07+00:00</dc:date>
      <dc:creator>njs</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; Yes I am sure.&lt;/font&gt;&lt;br&gt;
&lt;p&gt;
So -- just checking we're on the same page here -- what you're saying is that you're sure that those three security bugs I found in 5 minutes of googling were &quot;not problems in any program&quot;.&lt;br&gt;
&lt;p&gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; The first two references are about programs failing to recognize overlong encodings as being invalid.&lt;/font&gt;&lt;br&gt;
&lt;p&gt;
Right -- if invalid codings are interpreted differently in different parts of a system, then that creates bugs and security holes.&lt;br&gt;
&lt;p&gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; But those invalid sequences start with a byte with the high bit set (following bytes may not have it set, but the fact that decoders consider them part of the first byte is the decoder's error; a fixed decoder would consider it a one-byte error with the high bit set, followed by normal ASCII characters which are unchanged and thus cannot cause a security hole).&lt;/font&gt;&lt;br&gt;
&lt;p&gt;
I'm sorry -- I cannot make out a word of this. The bug in the first two links is that the invalid sequences are over-long (but like all the bugs mentioned here, involve only bytes with the high bits set -- do you know how UTF-8 works?). The decoder should have an explicit check for such sequences and throw an error if they are encountered, but this check was left out.&lt;br&gt;
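&lt;p&gt;
A minimal sketch of the explicit check described above, assuming Python 3; is_overlong is a hypothetical helper that inspects only the first two bytes of a multi-byte sequence, using the lead-byte ranges from RFC 3629 (lead bytes C0 and C1 are never valid, E0 must be followed by A0-BF, and F0 by 90-BF):
&lt;pre&gt;
```python
def is_overlong(seq: bytes) -> bool:
    """True if seq starts with an overlong UTF-8 sequence (RFC 3629 rules)."""
    if not seq:
        return False
    lead = seq[0]
    if lead in (0xC0, 0xC1):
        # Any 2-byte form with these leads encodes a code point below U+0080.
        return True
    if len(seq) >= 2:
        second = seq[1]
        # 3-byte forms encoding below U+0800.
        if lead == 0xE0 and second in range(0x80, 0xA0):
            return True
        # 4-byte forms encoding below U+10000.
        if lead == 0xF0 and second in range(0x80, 0x90):
            return True
    return False


for s in (b"\xc0\xaf", b"\xe0\x80\xaf", b"\xf0\x80\x80\xaf", b"\xc3\xa9", b"/"):
    print(s, is_overlong(s))
```
&lt;/pre&gt;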
&lt;p&gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; The last one is EXACTLY the bug I am trying to fix: stupid people who somehow believe that throwing errors or replacing with non-unique strings is how invalid UTF-8 should be handled.&lt;/font&gt;&lt;br&gt;
&lt;p&gt;
Errrr... quite so. I wasn't sure how useful this was to start with, but when you say in so many words that the proper solution to XSS security holes is to stop sanitizing web form inputs and instead convert all web browsers so that they *don't interpret unicode* then... maybe it's time I step out of this thread. Best of luck to you.&lt;br&gt;
&lt;p&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326312/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/326312/rss</link>
      <dc:date>2009-03-30T19:36:47+00:00</dc:date>
      <dc:creator>rickmoen</dc:creator>
      <description>
      mrshiny wrote:

&lt;p&gt;&lt;em&gt;You can pry my spaces from my filenames out of my cold dead fingers.&lt;/em&gt;

&lt;p&gt;ObMenInBlack:  &quot;Your offer is acceptable.&quot;

&lt;p&gt;(I remember having to write AppleScript to recurse through directories cleaning up files created  
on network shares by MacOS-using munchkins who put space characters at the &lt;em&gt;ends&lt;/em&gt;  
of filenames, in order for them to become valid filenames when seen by MS-Windows-using 
employees looking at the same network shares.  The converse problem was files, from MS-Windows 
users, with names containing colon, which is a reserved character in MacOS file namespace.  
What a pain in the tochis.)

&lt;p&gt;Rick Moen&lt;br&gt;
rick@linuxmafia.com
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326227/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/326227/rss</link>
      <dc:date>2009-03-30T16:41:21+00:00</dc:date>
      <dc:creator>Hawke</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
I don't think any DOS applications use backslash for their option marker.  Some use dash, and most use slash.  But I'm pretty sure that practically none, if any, use backslash.&lt;br&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326202/rss">
      <title>Bad understanding of UTF-8</title>
      <link>http://lwn.net/Articles/326202/rss</link>
      <dc:date>2009-03-30T16:08:23+00:00</dc:date>
      <dc:creator>spitzak</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
Yes I am sure.&lt;br&gt;
&lt;p&gt;
The first two references are about programs failing to recognize overlong encodings as being invalid. But those invalid sequences start with a byte with the high bit set (following bytes may not have it set, but the fact that decoders consider them part of the first byte is the decoder's error; a fixed decoder would consider it a one-byte error with the high bit set, followed by normal ASCII characters which are unchanged and thus cannot cause a security hole).&lt;br&gt;
&lt;p&gt;
The last one is EXACTLY the bug I am trying to fix: stupid people who somehow believe that throwing errors or replacing with non-unique strings is how invalid UTF-8 should be handled. The bug is that it maps more than one different string to the same one. The proper solution is to stop translating UTF-8 into something else and treat it as a stream of bytes. Nothing should care that it is UTF-8 except stuff that draws it on the screen.&lt;br&gt;
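&lt;p&gt;
The &quot;maps more than one different string to the same one&quot; failure is easy to reproduce. A minimal sketch in Python 3, with two made-up filenames that differ only in a single invalid byte:
&lt;pre&gt;
```python
name_a = b"report\xff.txt"
name_b = b"report\xfe.txt"

# "replace" substitutes U+FFFD for each bad byte, so distinct names collide.
print(name_a.decode("utf-8", "replace") == name_b.decode("utf-8", "replace"))  # True

# Keeping the bytes, or using a reversible mapping, preserves their identity.
print(name_a == name_b)  # False
print(
    name_a.decode("utf-8", "surrogateescape")
    == name_b.decode("utf-8", "surrogateescape")
)  # False
```
&lt;/pre&gt;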
&lt;p&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326190/rss">
      <title>NT (Windows kernel) doesn't care about filenames any more than Linux</title>
      <link>http://lwn.net/Articles/326190/rss</link>
      <dc:date>2009-03-30T15:13:36+00:00</dc:date>
      <dc:creator>foom</dc:creator>
      <description>
      &lt;font class=&quot;QuotedText&quot;&gt;&amp;gt;&amp;gt; Does that mean if you code against the NT API directly, you 
can create files foo and FOO in the same directory?
&lt;br&gt;&amp;gt; Yes. This is what the POSIX subsystems for NT do
&lt;/font&gt;
&lt;p&gt;
You can actually do this through the Win32 API: see the FILE_FLAG_POSIX_SEMANTICS flag for &lt;a 
href=&quot;http://msdn.microsoft.com/en-us/library/aa363858(VS.85).aspx&quot;&gt;CreateFile&lt;/a&gt;. 
However, MS realized this was a security problem, so as of WinXP, this option will in normal 
circumstances do absolutely nothing. You now have to explicitly enable case-sensitive support on 
the system for either the &quot;Native&quot; or Win32 APIs to allow it.
&lt;p&gt;
(the SFU installer asks if you want to do this, but even SFU has no special dispensation)

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326161/rss">
      <title>NT (Windows kernel) doesn't care about filenames any more than Linux</title>
      <link>http://lwn.net/Articles/326161/rss</link>
      <dc:date>2009-03-30T10:55:30+00:00</dc:date>
      <dc:creator>nye</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt;Does that mean if you code against the NT API directly, you can create files foo and FOO in the same directory?&lt;/font&gt;&lt;br&gt;
&lt;p&gt;
Yes. This is what the POSIX subsystems for NT do; they're implemented on top of the native API, as is the Win32 API. Note that Cygwin doesn't count here as it's a compatibility layer on top of the Win32 API rather than its own separate subsystem.&lt;br&gt;
&lt;p&gt;
Unfortunately the Win32 API *does* enforce things like file naming conventions, so it's impossible (at least without major voodoo) to write Win32 applications which handle things like a colon in a file name, and since different subsystems are isolated, that means that no normal Windows software is going to be able to do it.&lt;br&gt;
&lt;p&gt;
(I learnt all this when I copied my music collection to an NTFS filesystem, and discovered that bits of it were inaccessible to Windows without SFU/SUA, which is unavailable for the version of Windows I was using.)&lt;br&gt;
&lt;p&gt;
&lt;a href=&quot;http://en.wikipedia.org/wiki/Native_API&quot;&gt;http://en.wikipedia.org/wiki/Native_API&lt;/a&gt;&lt;br&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326129/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/326129/rss</link>
      <dc:date>2009-03-30T00:07:22+00:00</dc:date>
      <dc:creator>njs</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
You cannot, in general, convert a filename to text. That's the fundamental problem that any of the proposals would solve.&lt;br&gt;
&lt;p&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326127/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/326127/rss</link>
      <dc:date>2009-03-29T22:37:31+00:00</dc:date>
      <dc:creator>epa</dc:creator>
      <description>
      &lt;blockquote&gt;No, it is very possible to write such a function. The character encoding issue only prevents you from assuring that the string matches what the file's creator thought it should be.&lt;/blockquote&gt;Well, yeah.  If you allow the function to return the wrong answer, then it is easy to write.  But it is not possible to in all cases return the correct filename to the user, matching the original one chosen by the user.  If you pick a known encoding everywhere (UTF-8 being the obvious choice) then the problem goes away.
&lt;blockquote&gt;This doesn't represent a security problem.&lt;/blockquote&gt;Correct (at least none that I can think of).  The security issue is with special characters and control characters in filenames, and is separate to the issue of how to encode characters that don't fit in ASCII.
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326126/rss">
      <title>Re: Not A System Problem</title>
      <link>http://lwn.net/Articles/326126/rss</link>
      <dc:date>2009-03-29T22:32:28+00:00</dc:date>
      <dc:creator>nix</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
You don't get it. In order to permit / and \0 as valid filename &lt;br&gt;
characters, syscalls like open() must change. Library calls like fopen() &lt;br&gt;
have to change, because they too accept a \0-terminated string, with /s &lt;br&gt;
separating path components. Every single call in every library that &lt;br&gt;
accepts pathnames has to change. Probably the very notion of a string has &lt;br&gt;
to change to something non-\0-terminated.&lt;br&gt;
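&lt;p&gt;
The \0-terminated-string constraint leaks into every language that wraps these C calls. A minimal sketch in Python 3: because the name must eventually reach open(2) as a C string, CPython rejects an embedded NUL before making the syscall, while &quot;/&quot; is always interpreted as a path separator rather than as part of a name:
&lt;pre&gt;
```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # An embedded NUL cannot be represented in the C string that open(2)
    # receives, so CPython refuses the name before making the syscall.
    try:
        open(os.path.join(d, "foo\0bar"), "w")
    except ValueError as exc:
        print("rejected:", exc)

    # "/" is never part of a name: "foo/bar" means entry bar inside
    # directory foo, and there is no directory foo here.
    try:
        open(os.path.join(d, "foo/bar"), "w")
    except FileNotFoundError:
        print("no directory named foo")
```
&lt;/pre&gt;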
&lt;p&gt;
So whatever you're describing, userspace cannot any longer use standard &lt;br&gt;
POSIX calls: in fact, it can't any longer use ANSI C calls! I suspect that &lt;br&gt;
such a system would be almost unusable with C, simply because you couldn't &lt;br&gt;
use C string literals for anything.&lt;br&gt;
&lt;p&gt;
If you want VMS, you know where to find it.&lt;br&gt;
&lt;p&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326123/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/326123/rss</link>
      <dc:date>2009-03-29T22:07:27+00:00</dc:date>
      <dc:creator>foom</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
Eh...but OSX *does* run 40+ years of UNIX programs. It's pretty clear that the change to require &lt;br&gt;
UTF-8 (and even the change to be case insensitive!) didn't bother most programs.&lt;br&gt;
&lt;p&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326122/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/326122/rss</link>
      <dc:date>2009-03-29T21:58:46+00:00</dc:date>
      <dc:creator>clugstj</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
No, it is very possible to write such a function.  The character encoding issue only prevents you from assuring that the string matches what the file's creator thought it should be.  This doesn't represent a security problem.&lt;br&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326121/rss">
      <title>Conventions are great! Let's go back to FAT!</title>
      <link>http://lwn.net/Articles/326121/rss</link>
      <dc:date>2009-03-29T21:44:21+00:00</dc:date>
      <dc:creator>clugstj</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
&quot;UNIX is not broken. Your head, on the other hand, is&quot;&lt;br&gt;
&lt;p&gt;
Wow, childish personal attacks.  How droll.&lt;br&gt;
&lt;p&gt;
&quot;Number of correct scripts is not important metric. Number of bad scripts is&quot;&lt;br&gt;
&lt;p&gt;
I would think that the percentage of each would (possibly) be a useful metric.  But what is the damage from these &quot;bad scripts&quot;?  If you are writing shell scripts that MUST be absolutely bullet-proof against bad input, perhaps because they run setuid-root, then you are already making a much worse mistake than the possible bugs in the script.&lt;br&gt;
&lt;p&gt;
Still don't understand the FAT reference.  Sorry, maybe I'm just slow.&lt;br&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326120/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/326120/rss</link>
      <dc:date>2009-03-29T21:30:44+00:00</dc:date>
      <dc:creator>clugstj</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
OS X is trivial to handle.  It only has to continue to work in a compatible way with the previous Mac OS - which wasn't UNIX.  So using it as an example of how to &quot;fix&quot; these problems is not a good idea if you care about supporting 40+ years of UNIX programs - which is why this is difficult to change.&lt;br&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326119/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/326119/rss</link>
      <dc:date>2009-03-29T21:27:08+00:00</dc:date>
      <dc:creator>clugstj</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
I'm sorry, but when you said that any of these propositions is better than the current situation, I HAD to disagree.  In what way is the current situation so bad that any proposal is better than it?&lt;br&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326109/rss">
      <title>Re: Not A System Problem</title>
      <link>http://lwn.net/Articles/326109/rss</link>
      <dc:date>2009-03-29T19:47:30+00:00</dc:date>
      <dc:creator>ldo</dc:creator>
      <description>
      &lt;P&gt;nix wrote:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;FONT STYLE=&quot;color : #0000FF&quot;&gt;&lt;P&gt;What you're describing is not POSIX anymore.&lt;/P&gt;&lt;/FONT&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Nothing to do with POSIX. POSIX is a userland API; it doesn't dictate how the kernel should work.&lt;/P&gt;
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326092/rss">
      <title>Simplicity is better than complexity.</title>
      <link>http://lwn.net/Articles/326092/rss</link>
      <dc:date>2009-03-29T15:03:37+00:00</dc:date>
      <dc:creator>epa</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
To check for control characters&lt;br&gt;
&lt;p&gt;
for (const char *c = filename; *c; c++)&lt;br&gt;
   if ((unsigned char)*c &amp;lt; 32) return EINVAL;&lt;br&gt;
&lt;p&gt;
Adding a fixed list of 'bad characters' (please excuse lack of indentation, the LWN comment form eats it):&lt;br&gt;
&lt;p&gt;
for (const char *c = filename; *c; c++)&lt;br&gt;
   if ((unsigned char)*c &amp;lt; 32 || *c == '&amp;lt;' || *c == '&amp;gt;' || *c == '|') return EINVAL;&lt;br&gt;
if (filename[0] == '-') return EINVAL;&lt;br&gt;
&lt;p&gt;
To check valid UTF-8 is a little more complex, but not much.  You do not need to check that assigned Unicode characters are being used, or worry about combining characters, upper and lower case, etc.  See &amp;lt;&lt;a href=&quot;http://www.cl.cam.ac.uk/~mgk25/unicode.html&quot;&gt;http://www.cl.cam.ac.uk/~mgk25/unicode.html&lt;/a&gt;&amp;gt; for a list of valid byte sequences.  The code would be something like&lt;br&gt;
&lt;p&gt;
/* First pad the filename with 4 extra NUL bytes at the end.  Then, */&lt;br&gt;
int is_cont(unsigned char c) { return 128 &amp;lt;= c &amp;amp;&amp;amp; c &amp;lt; 192; }&lt;br&gt;
const unsigned char *p = (const unsigned char *)filename;&lt;br&gt;
while (*p) {&lt;br&gt;
  if (*p &amp;lt; 128) ++p;&lt;br&gt;
  else if (192 &amp;lt;= *p &amp;amp;&amp;amp; *p &amp;lt; 224 &amp;amp;&amp;amp; is_cont(p[1])) p += 2;&lt;br&gt;
  else if (224 &amp;lt;= *p &amp;amp;&amp;amp; *p &amp;lt; 240 &amp;amp;&amp;amp; is_cont(p[1]) &amp;amp;&amp;amp; is_cont(p[2])) p += 3;&lt;br&gt;
  /* RFC 3629 limits UTF-8 to 4 bytes, so lead bytes stop at 0xF4 */&lt;br&gt;
  else if (240 &amp;lt;= *p &amp;amp;&amp;amp; *p &amp;lt; 245 &amp;amp;&amp;amp; is_cont(p[1]) &amp;amp;&amp;amp; is_cont(p[2])&lt;br&gt;
           &amp;amp;&amp;amp; is_cont(p[3])) p += 4;&lt;br&gt;
  else return EINVAL;&lt;br&gt;
}&lt;br&gt;
&lt;p&gt;
For a self-contained system, that takes care of it.  Put some code like the above into a function and call it at each place a filename is taken from user space.  Coping with 'foreign' filesystems (e.g. NFS servers) returning non-UTF-8 filenames is a bit more complex.&lt;br&gt;
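&lt;p&gt;Here is a minimal, self-contained sketch of that function. The name is_valid_filename() is hypothetical. It returns 1 for an acceptable name and 0 otherwise. It rejects control bytes (0x01 to 0x1F and 0x7F) and byte sequences that are not structurally valid UTF-8 under RFC 3629, which allows at most 4 bytes per character. For brevity, it leaves out the optional bad-character list above. It also does not reject every overlong 3- or 4-byte form or encoded UTF-16 surrogate, so a production check would need a few more range tests.
&lt;pre&gt;
```c
/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_cont(unsigned char c)
{
    return (c >> 6) == 2;
}

/* Hypothetical check: returns 1 if name contains no control bytes and
 * is structurally valid UTF-8 (RFC 3629), else 0. Does not catch every
 * overlong 3/4-byte form or encoded surrogate. */
static int is_valid_filename(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;

    while (*p) {
        int len;

        if (*p == 0x7F || 0x20 > *p)
            return 0;               /* control character */
        else if (0x80 > *p)
            len = 1;                /* plain ASCII */
        else if (0xC2 > *p)
            return 0;               /* stray continuation or overlong 0xC0/0xC1 */
        else if (0xE0 > *p)
            len = 2;
        else if (0xF0 > *p)
            len = 3;
        else if (0xF5 > *p)
            len = 4;
        else
            return 0;               /* 0xF5 to 0xFF never appear in UTF-8 */

        /* NUL is not a continuation byte, so a truncated sequence is
         * rejected here without reading past the terminator. */
        for (int i = 1; i != len; i++)
            if (!is_cont(p[i]))
                return 0;
        p += len;
    }
    return 1;
}
```
&lt;/pre&gt;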
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326091/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/326091/rss</link>
      <dc:date>2009-03-29T14:43:25+00:00</dc:date>
      <dc:creator>epa</dc:creator>
      <description>
      &lt;blockquote&gt;a function which takes a zero-terminated byte array representing a filename and returns a string suitable for display&lt;/blockquote&gt;Currently it is impossible to reliably write such a function, because you don't know whether the byte array is encoded in Latin-1, Shift-JIS, UTF-8 or whatever.
&lt;p&gt;
Imagine removing the character encoding headers from the http protocol.  There would then be no reliable way to take the content of a page and display it to the user - just a panoply of hacks and rules of thumb that differed from one browser to another.  This is the situation we have now with filenames, which are *names* and intended for human consumption just as much as the content of a typical web page.  The two choices are (a) add headers to the protocol saying what encoding is in use (or in the case of filenames, an extra parameter in all FS calls), or (b) mandate a single encoding everywhere.
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326090/rss">
      <title>NT (Windows kernel) doesn't care about filenames any more than Linux</title>
      <link>http://lwn.net/Articles/326090/rss</link>
      <dc:date>2009-03-29T14:36:20+00:00</dc:date>
      <dc:creator>epa</dc:creator>
      <description>
      &lt;blockquote&gt;
NT (the kernel API in Windows NT, 2000, XP and etc.) doesn't care about filename encodings. The only thing that makes NT's attitude to such things different from that of Linux's is that NT's arbitrary sequences of non-zero code units used for filenames use 16-bit code units, and in Linux obviously they're 8-bit.
&lt;p&gt;
Everything else you see, such as case-insensitivity, bans on certain characters or sequences of characters, is implemented in other layers of the OS or even in language runtimes, not the kernel. Low-level programmers, just as on Unix, can call a file anything they like.&lt;/blockquote&gt;Does that mean if you code against the NT API directly, you can create files foo and FOO in the same directory?  I expect that opens up all sorts of juicy security holes - many of them theoretical, since a typical NT system has just one user and there is not much need for privilege escalation - but still it sounds fun.
&lt;blockquote&gt;using UTF-8 and blindly trusting that everything you work with is actually legal and meaningful display-safe UTF-8 are quite different things.&lt;/blockquote&gt;Indeed.  Hence the benefit of enforcing this at the OS level: it gets rid of the need for sanity checks that slow down the good programmers and were never written anyway by the bad programmers.
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326087/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/326087/rss</link>
      <dc:date>2009-03-29T14:31:15+00:00</dc:date>
      <dc:creator>epa</dc:creator>
      <description>
      Yes, validate every filename that comes from user space to check it is valid UTF-8 and does not have control characters.  This is not in fact an expensive operation (especially not compared to the cost of opening or creating a file in the first place).
&lt;p&gt;
Every non-Unix OS already forbids control characters in filenames, so there would not be much extra checking to do in filesystems like smbfs or ntfs.  (Except out of paranoia to detect disk corruption, which is probably a good thing to do anyway.)  As you point out, there remains the question of network filesystems like NFS, where the server could legitimately return filenames containing arbitrary byte sequences.  And there would have to be some policy decision about what to do.  But I would rather have one single place to deal with the mess than leave it to 101 different bits of code in user space.  (Python 3.0 pretends that invalid-UTF-8 filenames do not exist when returning a directory listing; other programs will show them but may or may not escape control characters when displaying to the terminal; goodness knows what different Java implementations do.)
&lt;p&gt;
I would favour silently discarding filenames that contain control characters from the directory listing, and for those in some legacy encoding like Latin-1 or Shift-JIS, translating them to UTF-8.  (The legacy encoding would be specified with a mount parameter.  Again, this is a bit awkward but a hundred times less complicated than leaving every userspace program to do its own peculiar thing.)
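&lt;p&gt;For Latin-1 the translation is mechanical, because each Latin-1 byte is simply the code point U+0000 to U+00FF. Here is a minimal sketch. The name latin1_to_utf8() is hypothetical, and the caller is assumed to supply an output buffer of at least twice the input length plus one. Shift-JIS has no such arithmetic mapping and would need a conversion table (for example via iconv), so it is not covered here.
&lt;pre&gt;
```c
/* Hypothetical helper: translate a NUL-terminated Latin-1 (ISO 8859-1)
 * name into UTF-8. ASCII bytes are copied; each byte 0x80 to 0xFF becomes
 * the two-byte sequence 110000xx 10xxxxxx, i.e. 0xC2 or 0xC3 followed by a
 * continuation byte. out needs room for 2 * strlen(in) + 1 bytes. */
static void latin1_to_utf8(const char *in, char *out)
{
    const unsigned char *p = (const unsigned char *)in;
    unsigned char *q = (unsigned char *)out;

    for (; *p; p++) {
        if (*p >= 0x80) {
            *q++ = (unsigned char)(0xC0 + (*p >> 6));  /* top 2 bits */
            *q++ = (unsigned char)(0x80 + (*p % 64));  /* low 6 bits */
        } else {
            *q++ = *p;
        }
    }
    *q = 0;
}
```
&lt;/pre&gt;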

&lt;blockquote&gt;Meanwhile application developers get no benefit for many years because of compatibility considerations.&lt;/blockquote&gt;Not really true.  The benefit in closing existing security holes is immediate.  In writing new code, you can note that there may be corner-case bugs on systems that permit control characters in filenames, but for 90% of the user base they do not exist.  That is 90% better than the current situation, where everyone just writes code assuming that filenames are sane, but no system enforces it.  By analogy, consider that many classic UNIX utilities had fixed limits on line length.  If I write a shell script that uses sort(1), I just write it for GNU sort and other modern implementations.  I might note that people on older systems may encounter interesting effects using my script with large input data, but I don't have to wait for every last Xenix system to be switched off before I can get the benefit in new code.
&lt;p&gt;
&lt;blockquote&gt;Personally I think the issue to look at is spaces. Spaces are legal. They are undoubtedly going to remain legal. But they are inconvenient. How can we tweak our basic Unix processes (including the shell and many old tools) so that spaces are harmless ?&lt;/blockquote&gt;
This is true in principle but in thirty years of Unix, essentially no progress has been made on this.  Nobody bothers to fix the shell or utilities such as make(1) to cope with arbitrary characters, despite much wishing that they would.  Nobody bothers to write shell scripts that cope with all legal filenames, mostly because it is all but impossible.  Instead, people who care about bug-free code end up rewriting shell scripts in other languages such as C (for example, some of the git utilities), people who think life is too short are happy to distribute software that misbehaves or has security holes, and many others just don't realize there is a problem.
&lt;p&gt;
OS X is something of a special case because of case insensitivity.  If you don't want case insensitivity, then you do not need to worry about Unicode composition; a simple byte-sequence check for valid UTF-8 is enough.  But OS X is a useful example in another way: a case-insensitive filesystem is a much bigger break with Unix tradition than what's proposed here, and yet the world did not come to an end, and it was trivial for most Unix software to adapt.
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326088/rss">
      <title>Re: Not A System Problem</title>
      <link>http://lwn.net/Articles/326088/rss</link>
      <dc:date>2009-03-29T13:54:54+00:00</dc:date>
      <dc:creator>nix</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
What you're describing is not POSIX anymore. Every single POSIX app would &lt;br&gt;
need rewriting, for essentially zero gain (ooh, you can't have nulls in &lt;br&gt;
filenames: that's why UTF-8 is *defined* to avoid nulls in filenames).&lt;br&gt;
&lt;p&gt;
I'm sure users would love not being able to type in pathnames anymore, &lt;br&gt;
too.&lt;br&gt;
&lt;p&gt;
Good luck getting anyone to do it.&lt;br&gt;
&lt;p&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326082/rss">
      <title>Re: Not A System Problem</title>
      <link>http://lwn.net/Articles/326082/rss</link>
      <dc:date>2009-03-29T10:30:45+00:00</dc:date>
      <dc:creator>ldo</dc:creator>
      <description>
      &lt;P&gt;nix wrote:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;FONT STYLE=&quot;color : #0000C0&quot;&gt;&lt;P&gt;Um, if you remove the prohibition on nulls, how do you end the filename? This isn't Pascal.&lt;/P&gt;&lt;/FONT&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Nothing to do with Pascal. C is perfectly capable of dealing with arbitrary data bytes; otherwise large parts of both kernel and userland code wouldn't work.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;FONT STYLE=&quot;color : #0000C0&quot;&gt;&lt;P&gt;And if you remove the prohibition on slashes, how do you distinguish between a file called foo/bar and a file called bar in a subdirectory foo?&lt;/P&gt;&lt;/FONT&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Simple. The kernel-level filesystem calls will not take a full pathname. Instead, they will take a parent directory ID and the name of an item within that directory. Other OSes, like VMS and old MacOS, were doing this sort of thing decades ago.&lt;/P&gt;
&lt;P&gt;Full pathname parsing becomes a function of the userland runtime. The kernel no longer cares what the pathname separator, or even what the pathname syntax, might be.&lt;/P&gt;
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326059/rss">
      <title>At last, a hope of progress</title>
      <link>http://lwn.net/Articles/326059/rss</link>
      <dc:date>2009-03-29T00:01:58+00:00</dc:date>
      <dc:creator>mikachu</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
On days when I'm feeling paranoid I always say ./* instead of just *, especially when talking to /bin/rm. On the other hand, touch -- -i in directories where you have important files is a nice trick too.&lt;br&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326057/rss">
      <title>Meta-discussion</title>
      <link>http://lwn.net/Articles/326057/rss</link>
      <dc:date>2009-03-28T22:21:03+00:00</dc:date>
      <dc:creator>man_ls</dc:creator>
      <description>
      Hmmm, I'm not so sure. I feel strongly about ext4 losing data, but I don't have a strong opinion about this issue. Really. Not for lack of sensitivity to the problem -- I've had an administrator at work erase a whole directory of files because of a leading space (so that 'rm -rf /dir/file' became 'rm -rf /dir/ file'). But there are advantages and disadvantages, and I cannot pick a side.
&lt;p&gt;
Bojan has only posted once, and his message contains the words &quot;not sure&quot;. I would say that this debate attracts a different subset of (opinionated) people.
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326052/rss">
      <title>Leading spaces are common, actually</title>
      <link>http://lwn.net/Articles/326052/rss</link>
      <dc:date>2009-03-28T20:36:45+00:00</dc:date>
      <dc:creator>nix</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
It's called 'sort by version' because the function it calls (strverscmp()) &lt;br&gt;
was designed to sort version numbers, and because the expected use of &lt;br&gt;
ls -v was sorting a directory full of version-named directories in version &lt;br&gt;
order.&lt;br&gt;
&lt;p&gt;
(And you're right on the collation sort thing: I spoke carelessly.)&lt;br&gt;
&lt;p&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/326049/rss">
      <title>Wheeler: Fixing Unix/Linux/POSIX Filenames</title>
      <link>http://lwn.net/Articles/326049/rss</link>
      <dc:date>2009-03-28T19:50:37+00:00</dc:date>
      <dc:creator>dwheeler</dc:creator>
      <description>
      Thanks for your comments!  In particular, you're absolutely right about swapping the order of \t and \n in IFS - that makes it MUCH simpler.  I prefer IFS=`printf '\n\t'` because then it's immediately obvious that \n and \t are the new values.  I've put that into the document, with credit.
      
      </description>
    </item>
</rdf:RDF>

