User: Password:
|
|
Subscribe / Log in / New account

Control characters in file names

Control characters in file names

Posted Nov 23, 2010 20:55 UTC (Tue) by Yorick (subscriber, #19241)
In reply to: Control characters in file names by zlynx
Parent article: Ghosts of Unix past, part 4: High-maintenance designs

I'm in no way suggesting "huge complicated rulesets", only to expand the set of disallowed bytes from {0, '/'} to {0..31, '/'}. I believe the benefits of doing so would outweigh the disadvantages, of which precious few have been shown.

I'm also curious what file names can be created on OS X that "it can't handle", and how.
(Editing raw bytes on disk doesn't count - that way invalid file names could be created in ext2 as well.)


(Log in to post comments)

Control characters in file names

Posted Nov 23, 2010 21:26 UTC (Tue) by jzbiciak (subscriber, #5246) [Link]

I wonder if this could just be controlled with a feature flag and tune2fs and/or a mount option? When set, just disallow creating files which have the troublesome characters. Still allow access to existing files with the troublesome characters, though, so you never have "files you can't get to."

Invariably, whenever I've created filenames with control characters in them, it's been through some strange fat-fingering. I would rather have the OS throw those files away. :-)

Control characters in file names

Posted Nov 24, 2010 0:40 UTC (Wed) by zlynx (subscriber, #2285) [Link]

An example from my Mac laptop. It was created by a recursive wget from the terminal. This file has been in my .Trash for a year now...

$ ls ShowXml.asp?user_group=5&user_path=user1%2F30462&userid=30462&blogname=ѩӣ֮%C0%E1

$ ls | xxd
0000000: 5368 6f77 586d 6c2e 6173 703f 7573 6572  ShowXml.asp?user
0000010: 5f67 726f 7570 3d35 2675 7365 725f 7061  _group=5&user_pa
0000020: 7468 3d75 7365 7231 2532 4633 3034 3632  th=user1%2F30462
0000030: 2675 7365 7269 643d 3330 3436 3226 626c  &userid=30462&bl
0000040: 6f67 6e61 6d65 3dd1 a9d0 b8cc 84d6 ae25  ogname=........%
0000050: 4330 2545 310a                           C0%E1.

I think that is just the representation the terminal sees and not what is actually in the filesystem because any attempt to delete with that name results in a file not found.

Control characters in file names

Posted Nov 24, 2010 11:08 UTC (Wed) by vonbrand (guest, #4458) [Link]

Have you tried quoting that (with ', not ")? There are many characters special to the shell in there. Sometimes a "rm -i *" helps by giving the "correct" filename to the selection. In very recalcitrant cases, you could write a proggie that unlinks the file by hardcoded name...

Control characters in file names

Posted Nov 24, 2010 12:50 UTC (Wed) by Yorick (subscriber, #19241) [Link]

Interesting - I tried creating a file by that name in OS 10.5, but when reading out the resulting name, the last two combining characters (U+0304 and U+05ae, corresponding to cc 84 and d6 ae respectively) had been transposed, presumably for reasons of canonical order. I had no problems removing it afterwards.

This would explain why you had trouble removing the file but now how it came to be created in the first place. I have heard claims that the normalisation algorithm in OS X has changed between versions; perhaps you upgraded your system between the creation and removal of the file? It could also just be a plain bug, of course, or a RAM single-bit error, etc. If you can reproduce it, I'm sure Apple would like to know about the bug.

Control characters in file names

Posted Nov 24, 2010 17:01 UTC (Wed) by michaeljt (subscriber, #39183) [Link]

> I'm in no way suggesting "huge complicated rulesets", only to expand the set of disallowed bytes from {0, '/'} to {0..31, '/'}. I believe the benefits of doing so would outweigh the disadvantages, of which precious few have been shown.

It seems to me that what is broken here is the shell language, not the filesystem, and encouraging people to use better languages is the right fix. For what it's worth, I have had occasion to miss the '/' character that you can't have in Unix filenames. I find it rather silly that that character is encoded into the low-level APIs (the kernel in this case, although having it in the libc API would amount to the same) instead of letting higher levels handle it.

Control characters in file names

Posted Nov 29, 2010 9:57 UTC (Mon) by quotemstr (subscriber, #45331) [Link]

Good luck with that. Unix shells won't change any time soon. It's hard enough to get filenames with whitespace working properly.

Forbidding control characters has an immediate upside and comes at almost zero cost. We could do it tomorrow, and nobody would notice except for increased robustness. Not doing that and instead pining for a perfect solution is just unrealistic.

Control characters in file names

Posted Nov 29, 2010 10:12 UTC (Mon) by michaeljt (subscriber, #39183) [Link]

> Good luck with that. Unix shells won't change any time soon.

I wouldn't be quite that pessimistic. Unix shells won't change, but now there is python which is being used for a lot of things that shell used to be, and things like upstart and systemd are also reducing the need for new shell code. Old shell code won't all go away, but problems in it can be fixed, and if less new shell code is written the problem is greatly reduced.

Control characters in file names

Posted Nov 29, 2010 18:24 UTC (Mon) by dlang (subscriber, #313) [Link]

Python has not replaced shell in many areas, and probably never will (just like Perl is used in a lot of places where shell used to be used, but will never replace shell)

Control characters in file names

Posted Dec 2, 2010 17:25 UTC (Thu) by Ross (guest, #4065) [Link]

I don't know anyone using Python as their login shell. That would be a pretty terrible idea IMHO. On the other hand shell does make it easy to handle filenames wrong (just leave out some quotes and it will still seem to work) so python scripts are more likely to do a good job. That's not really going to fix the problem though, no matter how popular Python becomes for scripting compared to shell.

Control characters in file names

Posted Dec 2, 2010 17:40 UTC (Thu) by michaeljt (subscriber, #39183) [Link]

> I don't know anyone using Python as their login shell.

I was assuming that the biggest problem with filenames and shell script came from actual well-known script files that got exploited, not stuff typed in at the command line. I'm sure that can be a problem too of course.

Control characters in file names

Posted Nov 29, 2010 14:40 UTC (Mon) by nix (subscriber, #2304) [Link]

Handling directory separation at a higher level would be a classic That Hideous Name nightmare, converting paths from a simple string to a complex structure involving N components with associated lengths, which would *still* have to be somehow converted to a string a lot of the time: and if you can do that, you need a quoting mechanism to make it unambiguous, and if you have *that*, you could use the same quoting mechanism at input time, and still retain the /.

The current situation with /-and-no-quoting-characters is simplest of all, and eliminates the numerous attacks we have seen on SQL and other languages involving incorrectly processed quoting characters.

Control characters in file names

Posted Dec 2, 2010 17:27 UTC (Thu) by Ross (guest, #4065) [Link]

Thank you for the sanity :)

Yes, having to concoct paths with some helper function in some nasty encoding would not be an improvement. If people think it's too hard for scripts to handle spaces -- wait until filenames can have slashes in them too!

Control characters in file names

Posted Nov 24, 2010 18:41 UTC (Wed) by brother_rat (subscriber, #1895) [Link]

I don't know about "can't handle", but there are definitely quirks.

One quirk with OSX is that the GUI is consistent with earlier Macs that permitted / in filenames (when : was used as the folder separator), but as / is now restricted to be the folder separator the two characters are swapped over behind the scenes.

This causes very odd bugs with GUI tools that launch CLI utilities. For example, Hugin uses make to process photos, and make doesn't support : in filepaths. However many users put photos in folders with a date in the name, and the dates.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds