
Quote of the week

properly quote rpath $ORIGIN so it can be passed from make to shell to configure to generated Makefile to libtool to invoked gcc without loss of valuable dollars.

It is an open question to what extent this commit should be credited to the designers of sh, autoconf, libtool, make, and/or Solaris ld.

Michael Stahl (hat tip to Cesar Eduardo Barros)



Quote of the week

Posted Feb 28, 2013 19:57 UTC (Thu) by nix (subscriber, #2304) [Link]

I defy anyone to look at
-Wl,-rpath,\\"\$$\$$ORIGIN:'\'\$$\$$ORIGIN/../ure-link/lib
and not think "there should be a better way than multiple levels of quotation"? (Though, of course, there isn't, not with these tools. Make has a bunch of nearly-insoluble quotation problems anyway: try to build anything in a directory with a space, or worse, a $(...) or : in the path, and burst into tears.)
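[For readers counting the layers: a minimal sketch, with hypothetical file names, of pushing a single literal $ORIGIN through just make and the shell - each tool strips one layer of quoting.]

```shell
#!/bin/sh
# make consumes one '$' of '$$'; the single quotes then keep the shell
# from expanding the survivor, so the link line carries a literal $ORIGIN.
printf 'all:\n\t@echo gcc -Wl,-rpath,%s main.o\n' "'\$\$ORIGIN/../lib'" > rpath-demo.mk
make -s -f rpath-demo.mk
# prints: gcc -Wl,-rpath,$ORIGIN/../lib main.o
```

Add configure, libtool, and a generated Makefile in between, and each stage demands its own round of escaping.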

Quote of the week

Posted Mar 1, 2013 1:11 UTC (Fri) by dashesy (subscriber, #74652) [Link]

Maybe use @ instead and patch the linker?

Quote of the week

Posted Mar 1, 2013 18:03 UTC (Fri) by nix (subscriber, #2304) [Link]

That means temporary files, and now you have two problems.

Quote of the week

Posted Mar 2, 2013 19:02 UTC (Sat) by cesarb (subscriber, #6266) [Link]

You only looked at the first ones. Look further down in the patch, there are some with one more level of quotation:

-Wl$(COMMA)-rpath$(COMMA)\\"\$$\$$ORIGIN:'\'\$$\$$ORIGIN/../ure-link/lib"

Quote of the week

Posted Mar 2, 2013 20:31 UTC (Sat) by nix (subscriber, #2304) [Link]

That$(APOSTROPHE)s$(SPACE)getting$(SPACE)insane$(PERIOD)

Quote of the week

Posted Mar 2, 2013 21:27 UTC (Sat) by madscientist (subscriber, #16861) [Link]

You're talking about something quite different: you're talking about problems using make to create target files (foo.o or whatever) that have special characters in the name. You're right that make is bad at that. Although $ is not a huge problem, whitespace and colons are pretty much impossible, although for different reasons.

However, this article is discussing command lines, and here make is actually reasonable, considering the alternatives. Make does not care about any character in a command script except $. If you want a literal $ you have to double it. However, every other character, including backslash (except in line-continuation context), quotes, whitespace, *, &, (), [], <>, etc. etc., is given, unmolested, to the command processor (e.g., the shell).

Although it would perhaps have been nice to avoid the need for "$$" somehow, imagine how bad it would be if make ALSO used backslash for quoting, or interpreted other characters besides "$". At least as it is the real complexity in these quoting statements is not introduced, or much exacerbated, by make.
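[A small check of that behaviour, using a hypothetical makefile: every character below except the doubled dollar reaches the shell untouched.]

```shell
#!/bin/sh
# make rewrites only '$$' -> '$'; braces, brackets and quotes pass
# through to the shell, which then applies its own interpretation.
dir=$(mktemp -d); cd "$dir"
printf 'all:\n\t@echo {braces} [brackets] "quotes" \\$$dollar\n' > passthru.mk
make -s -f passthru.mk
# prints: {braces} [brackets] quotes $dollar
```

The quotes disappear and $dollar stays literal because the *shell* handled them; make's only contribution was collapsing $$ to $.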

Quote of the week

Posted Mar 3, 2013 0:59 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

I wonder…how well does make handle files named ";\n\tgcc"?

Quote of the week

Posted Mar 3, 2013 3:32 UTC (Sun) by madscientist (subscriber, #16861) [Link]

$ cat Makefile
FILE := ;\n\tgcc
$(FILE): ; @echo '$@'

$ make
;\n\tgcc

Quote of the week

Posted Mar 3, 2013 23:48 UTC (Sun) by quotemstr (subscriber, #45331) [Link]

Yet again: the kernel has no business allowing these characters in filenames. They do only harm.

Quote of the week

Posted Mar 4, 2013 17:49 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

I agree. A good restriction would be UTF-8 and no control characters. If the VFS had utf8 and warn_nonutf8 (modulo spelling bikeshedding) mount options to handle this, I'd be happy. In the meantime, I'll keep my "I wonder about…" instincts with regard to escaping.

In reality, the thing I was interested in was something more along the lines of:

FILES := $(wildcard *.c)
all: $(patsubst %.c,%.o,$(FILES))
%.o: %.c
	gcc -c '$<' -o '$@'

In the face of a file named ";\n\tgcc.c", this breaks pretty badly with "make: gcc.o: Command not found".
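[The failure is easy to reproduce; a sketch, with an arbitrary scratch directory:]

```shell
#!/bin/sh
# Create the hostile file, then a Makefile like the one above, and watch
# make misparse the expanded file name: the ';' and whitespace in the
# name get re-read as rule-plus-recipe syntax, so the build fails.
dir=$(mktemp -d); cd "$dir"
touch "$(printf ';\n\tgcc.c')"
printf 'FILES := $(wildcard *.c)\nall: $(patsubst %%.c,%%.o,$(FILES))\n%%.o: %%.c\n\tgcc -c $< -o $@\n' > Makefile
make 2>&1; echo "exit status: $?"
```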

Quote of the week

Posted Mar 6, 2013 22:31 UTC (Wed) by Jandar (subscriber, #85683) [Link]

The kernel has no business patronising me about filenames I like. If you want to be restricted, a mount option like -nodev would be right: -filename_characteristic=(undisturbing|no_shell_quote|no_profanity|raw).

Quote of the week

Posted Mar 7, 2013 16:53 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

You *like* having vertical tabs, bells, newlines, etc. in your filenames? If so, I trust that all of the shell and command-line globbing you do is safe. Control characters can definitely be kernel-enforced and I think that should be the default.

For encoding, on the other hand, enforcing UTF-8 should be an option when mounting (with a default of warning on non-UTF-8 filenames). I can see the use case for non-UTF-8 filenames (much as I think it's silly, since there's no per-directory LC_FILENAME (or even per-file, if you have quite the mix), but whatever).

Quote of the week

Posted Mar 7, 2013 17:41 UTC (Thu) by Jandar (subscriber, #85683) [Link]

I don't have control over all filesystems. Network filesystems or movable ones on USB sticks may have strange characters within filenames. It doesn't matter if my own created and populated filesystems have only sane filenames (for an arbitrary value of sane). A program unable to handle strange characters within filenames is only usable on a computer not interacting with the real world.

Quote of the week

Posted Mar 7, 2013 18:01 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

There is the problem of how to handle such filenames when they do pop up. The kernel can't act as if they don't exist (DoS by creating a large file that "doesn't exist"), and it can't deny the filename in general (stat/lstat, rename, unlink would all be good to be able to call on the thing to fix the problem).

How does Windows handle filenames which are NTFS-valid but not Windows-valid? Or is Windows-valid defined as NTFS-valid these days (which I also would presume is a superset of FAT-valid)?

Quote of the week

Posted Mar 7, 2013 19:29 UTC (Thu) by quotemstr (subscriber, #45331) [Link]

> (DoS by creating a large file that "doesn't exist")

Alias files with invalid names to "INVALID_FILE_NAME~1" or something like that. These names don't have to be stable.

> How does Windows handle filenames which are NTFS-valid but not Windows-valid? Or is Windows-valid defined as NTFS-valid these days (which I also would presume is a superset of FAT-valid)?

You mean NT-valid? As far as NT is concerned, filenames are sequences of UCS-2 code points. There's no validation or normalization. Win32 wraps the NT filesystem API, and although Win32-invalid files can exist, the Win32 API cannot enumerate them. User-mode programs can, however, just call the NT APIs.

Quote of the week

Posted Mar 8, 2013 19:13 UTC (Fri) by JanC_ (guest, #34940) [Link]

NTFS is a POSIX-compatible filesystem, so it can work with every filename that works with any of the "native" Linux filesystems, including case sensitivity, etc. (And I think NTFS supports UTF-8 too, not only UCS-2.)

I'm pretty sure that the Windows APIs don't work with those features very well though...

Quote of the week

Posted Mar 18, 2013 15:15 UTC (Mon) by nye (guest, #51576) [Link]

>I'm pretty sure that the Windows APIs don't work with those features very well though...

I have in the past had files on NTFS filesystems with colons in the name[0]. I don't recall whether the name of the file was correctly listed in that case (though I *think* it was), but a lot of accesses would result in the process blocking indefinitely and needing a reboot to kill, i.e. the same thing as being stuck in uninterruptible sleep on Unix.

This was back on Windows XP mind, so maybe more recent versions behave less badly, but I wouldn't bet on it.

If you are using a version of Windows with SFU/SUA/whatever it's called now, then you can do what you like with those files from there, because the POSIX subsystem is built directly on the Windows native API, which supports them just fine. OTOH Cygwin, of course, is layered on top of win32[1], so it inherits all of its problems.

In the unlikely event that anyone is interested, I went into this with a little more detail a few years back: http://blog.steamsprocket.org.uk/2010/02/26/posix-file-se...

[0] Created by telling Amarok to organise music files based on the tags, and lacking the foresight to look up characters disallowed by win32 and munge them.

[1] Actually I'm fairly sure Cygwin does use the native API in some places to provide certain features, but in the main it sits on top of win32.

Quote of the week

Posted Mar 7, 2013 18:06 UTC (Thu) by viro (subscriber, #7872) [Link]

Control characters are a function of encoding; the kernel really has no business enforcing either. A pathname is a zero-terminated stream of octets; the only special ones are 0x2f ('/' in ASCII or UTF8), splitting it into segments, and to some extent 0x2e ('.' in the same) - when a segment consists of a solitary 0x2e or a pair of them, it gets special interpretation. Other than that, the kernel does not and should not care. How you map your userland encoding onto that is your business. And what gets interpreted as bells, vertical tabs, etc. is *really* none of the kernel's business. Hell, should we ban things like 0x9b in pathnames? VT100 interprets it as a special character, after all...

If you want VFS to enforce anything of that kind, feel free to fork the kernel and maintain the damn thing yourself. Any patches of that kind will be NAKed.
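[The rule described above - only 0x2f and the terminating 0x00 are special - is easy to check on any Linux filesystem:]

```shell
#!/bin/sh
# The kernel accepts control bytes in a file name without complaint;
# only '/' (0x2f) and NUL (0x00) are off limits.
dir=$(mktemp -d); cd "$dir"
touch "$(printf 'bell\007tab\tname')"   # BEL and TAB embedded in one name
ls | od -c | head -n 2                  # the raw bytes really are there
```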

Quote of the week

Posted Mar 7, 2013 19:22 UTC (Thu) by quotemstr (subscriber, #45331) [Link]

> Other than that, kernel does not and should not care.

You haven't addressed the reason the kernel should not care. The kernel is in an excellent position to reduce the likelihood that real users will be harmed. Why *not* impose these restrictions? You're just saying that because the kernel does not today impose any structure on filenames, it never should, no matter the advantages gained by doing so. You're arguing by assertion. Perhaps you're accustomed to winning arguments this way, but it doesn't make your position any stronger.

> Any patches of that kind will be NAKed.

That's unfortunate. Maybe libc can provide the same benefits --- unless a special environment variable is set, hide these files from directory enumerations, and delete them transparently if rmdir(2) would otherwise fail due to these hidden files being left in a directory.

Quote of the week

Posted Mar 7, 2013 19:55 UTC (Thu) by jimparis (subscriber, #38647) [Link]

> Maybe libc can provide the same benefits --- unless a special environment variable is set, hide these files from directory enumerations, and delete them transparently if rmdir(2) would otherwise fail due to these hidden files being left in a directory.

No discussion about bad characters in filenames would be complete without a link to http://www.dwheeler.com/essays/fixing-unix-linux-filename.... It covers lots of options, and talks about why a libc-level fix isn't necessarily going to work.

Quote of the week

Posted Mar 7, 2013 20:55 UTC (Thu) by viro (subscriber, #7872) [Link]

BTW, there's a fairly strong reason why enforced UTF8 for filenames alone won't work. There's such a thing as text file contents. _If_ we lived in a world where we could say "everything is in UTF8", life would be much simpler. As it is, consider a situation where you have a text file in e.g. KOI8, with cyrillic-containing filenames mentioned in it. Do you put those in UTF8 or in KOI8? Life would be *much* nicer if all we had to cope with had been in the same encoding; no arguments about that. Just as it would be much nicer if we didn't have to deal with languages like sh(1). I like Plan9. For many reasons, including uniform UTF8 for everything. _And_ rc(1) instead of sh(1). See dwheeler's point regarding the impossibility of getting rid of srb's little monster and its bigger offspring...

Quote of the week

Posted Mar 8, 2013 21:36 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

As a Cyrillic user, I very much HATE non-UTF8 names.

Most of my editors support recoding just fine, but I'm still encountering filesystems with rubbish in filenames. And they're much more complex to deal with.

Quote of the week

Posted Mar 7, 2013 20:19 UTC (Thu) by viro (subscriber, #7872) [Link]

> You haven't addressed the reason the kernel should not care.

Yes, I have. The set you want to filter out is a function of encoding (which the kernel has no notion of), the set of userland code likely to run on that system (ditto; different interpreted languages have different sets that need to be quoted) and your preferences.

Policy like that doesn't belong in the kernel. And drop the demagoguery, please. You demand assistance in imposing your personal preferences on everything and accuse those who have the gall to refuse of arguing by assertion? Nice... BTW, what's the difference between "user" and "real user"? Just curious...

Quote of the week

Posted Mar 7, 2013 20:46 UTC (Thu) by quotemstr (subscriber, #45331) [Link]

> The set you want to filter out is a function of encoding (which the kernel has no notion of)

Not today, no, but it could. Even absent explicit knowledge of encoding, the kernel can get very far under the assumption that a byte below 0x20 is a forbidden control character. This scheme works even for Shift-JIS. Additionally, forbidding leading 0x2D bytes still allows the full expression of characters in almost all encodings, even Shift-JIS.

The scheme I'm proposing is compatible with any character set that is itself compatible with the kernel's use of 0x2F as a directory separator.

> imposing your personal preferences

This issue has nothing to do with personal preference. I wasn't on the POSIX committee. I didn't write the Bourne shell. The way we interpret filenames is not my "personal preference": it's a reality. I'm trying to make real users safer. You're the one making systems less robust because you don't want to think about encodings.

> And drop the demagoguery, please.

Demagoguery? You're the one calling legitimate technical options "personal preferences", calling mitigation approaches "impositions" (is ASLR also an imposition of personal preferences?), and generally not touching the core point, which is that a component in a position to significantly improve the robustness of millions of lines of code without at the same time causing significant adverse side effects, well, should do so.

You need to either identify why the problem we're trying to ameliorate isn't actually a problem, describe why the solution we're discussing doesn't actually address the problem, or describe why the cost of this solution is too high.

Charitably, I can take your comment as suggesting that the cost of teaching the kernel about filename encodings is too high. I don't agree. The kernel doesn't need to know about particular encodings because all commonly-used encodings have enough in common that a byte-by-byte coding-agnostic analysis will suffice.
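[A sketch of that coding-agnostic check, with valid_name as a hypothetical helper. Note that the C locale's [[:cntrl:]] class also catches 0x7f, which is slightly stricter than the proposal.]

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
# Hypothetical filter: reject bytes below 0x20 and a leading 0x2d ('-');
# every other byte sequence, in any encoding, passes through untouched.
valid_name() {
    case $1 in
        -*)            return 1 ;;   # leading '-' reads as an option flag
        *[[:cntrl:]]*) return 1 ;;   # control bytes (0x00-0x1f, plus 0x7f)
    esac
    return 0
}
valid_name 'report.txt'       && echo 'report.txt: accepted'
valid_name "$(printf 'a\nb')" || echo 'embedded newline: rejected'
valid_name '-rf'              || echo 'leading dash: rejected'
```

Because the test is byte-by-byte, it behaves identically for ASCII, UTF-8, KOI8 or Shift-JIS names, as the comment above argues.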

Quote of the week

Posted Mar 7, 2013 21:03 UTC (Thu) by viro (subscriber, #7872) [Link]

0x9b > 0x2d and it *is* a control character, so you still need to filter output. A bunch of shell metacharacters are outside of that range as well, so you still need to quote. And then there's SQL/HTML/make/etc. - any number of languages with their own needs wrt quoting...

Quote of the week

Posted Mar 7, 2013 21:23 UTC (Thu) by quotemstr (subscriber, #45331) [Link]

> 0x9b > 0x2d and it *is* a control character

Sure, but even if you restrict the filtering to characters below 0x20 (because characters between 0x20 and 0x2D are used all over the place) and to leading 0x2D, you still eliminate a big class of problems, particularly in contexts other than terminal display. 0x9B isn't special to the shell, but 0xA is. There are lots of shell idioms that break only when filenames contain newlines. Can't we just make these idioms work all the time?

> And then there's SQL/HTML/make/etc. - any number of languages with their own needs wrt quoting...

So? The kernel can't solve all problems, but it can go a long way toward eliminating a very well-known subset of "all problems relating to metacharacter injection".

Quote of the week

Posted Mar 7, 2013 22:21 UTC (Thu) by viro (subscriber, #7872) [Link]

> Can't we just make these idioms work all the time?

Unless you can guarantee that your script never runs on older and/or non-Linux kernels, you _still_ need to quote properly. And if you can guarantee that, the usual objections re -print0 and its ilk being non-portable do not apply.

PS: the verb "impose" had been brought into that thread by yourself. The context had been about the kernel being in excellent position to impose the restriction you asked for. Check it yourself. And yes, it smells like demagoguery, especially since you proceeded to complain indignantly that I was treating something like an imposition, etc.
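[For reference, the tradeoff being argued: the NUL-terminated idiom survives an embedded newline where naive word splitting does not. GNU or BSD find is assumed for -print0.]

```shell
#!/bin/sh
dir=$(mktemp -d); cd "$dir"
touch 'plain.txt' "$(printf 'evil\nname.txt')"
# Robust: one NUL-terminated record per file, newline or not.
find . -type f -print0 | tr -cd '\0' | wc -c     # counts 2 files
# Fragile: the newline in the second name yields three "words".
set -- $(ls); echo "$# words from ls"
```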

Quote of the week

Posted Mar 8, 2013 21:39 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Hey! I have an idea.

How about removing all this ACL crap? Everything should just be world-writable. After all, we all know that all editors are vulnerable and all kernels have locally exploitable rootholes.

Quote of the week

Posted Mar 7, 2013 20:03 UTC (Thu) by sfeam (subscriber, #2841) [Link]

So how would you propose to enforce UTF-8 only? Scan the entire filesystem during mount to see if you like the file names? That sounds like a recipe for painfully slow mounts. And what if it does encounter a non-UTF-8 name - refuse to mount? Mangle it? How does the user recover from this? Would corruption of a file name, intentional or otherwise, make the volume unmountable in the future?

Even if the option only pertained to file creation on an already mounted volume - how would that work? I get sent tarballs or zip data archives from people using SJIS encoding for example. Yeah it's a pain, but refusing to unpack the thing would only make matters worse.

Quote of the week

Posted Mar 7, 2013 22:23 UTC (Thu) by raven667 (subscriber, #5198) [Link]

> So how would you propose to enforce UTF-8 only? Scan the entire filesystem during mount to see if you like the file names? That sounds like a recipe for painfully slow mounts.

No, that sounds silly. It would be enforced when files are created/(re)named and when those names are read before giving the names to userspace.

> And what if it does encounter an non-UTF name - refuse to mount? mangle it? How does the user recover from this? Would corruption of a file name, intentional or otherwise, make the volume unmountable in the future?

To begin with it should probably just log a warning about the invalid filename (without having an attacker break the logging system 8-) and return to userspace as normal. After a few years it could be made mandatory with an option to turn it back to a warning or to turn validation off.

The kernel probably shouldn't return filenames that fail validation, or allow them to be created; it should return an error instead. Worst case, an invalid file could be renamed into lost+found with its inode number, but that might be too invasive.

> I get sent tarballs or zip data archives from people using SJIS encoding for example.

That seems to be a thorny problem. Those filenames are going to be broken anyway unless your whole environment is operating in SJIS mode, right? Or do the cases you care about already have encodings that overlap with UTF8, the way ASCII does? If your environment is already dealing with multiple legacy encodings, then it may be able to normalize the encoding to UTF8 before using names in syscalls. It seems like it'll take a long time before every utility that uses open() can translate between whatever encoding the app/env/term is using and the system encoding. At the least this will only break for people still using legacy encodings; that might be enough reason to scrap the idea.

Quote of the week

Posted Mar 7, 2013 22:57 UTC (Thu) by viro (subscriber, #7872) [Link]

FWIW, for cyrillic there are *four* encodings in real use, counting UTF8: Alt (cp866), Windows (cp1251), KOI8 and UTF8. There's also 8859-5, but I have yet to see that used in the wild. All four are used quite a bit; fortunately, UTF8 is becoming more and more common. The other three are 8-bit, with the lower half being ASCII-compatible. KOI8 puts cyrillic letters into 0xc0--0xff (0x80 | latin transliteration, mostly); Alt uses 0x80--0xaf/0xe0--0xef (alphabetic order, skipping the IBM pseudographic range - fancy borders/not-quite-ASCII art/etc.; used in files that originated on DOS); CP1251 is 0xc0--0xff, alphabetic (used on Windows). Note that Alt *does* hit 0x9b...

And yes, it's a goddamn mess. To make life really interesting, there's a bunch of fun with TeX - there had been several cyrillic font families, with different encodings used, so an old paper sitting around may be nasty. So can old documentation, for that matter - transliterating from Alt to KOI or UTF isn't a problem, but then you might want to rip your eyes out trying to make sense of what used to be a diagram describing a hardware register, with some well-intentioned twit having done boxes and arrows in that kind of pseudo-ASCII-graphics back in '93 or so...
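[The byte ranges above can be spot-checked with iconv, assuming a glibc iconv with the KOI8-R, CP866 and CP1251 tables installed: the same letter А (U+0410) lands in a different block in each legacy encoding.]

```shell
#!/bin/sh
# U+0410 (Cyrillic capital A) is d0 90 in UTF-8; its single byte in the
# three legacy encodings matches the ranges described above.
A=$(printf '\320\220')                    # UTF-8 for 'А'
for enc in KOI8-R CP866 CP1251; do
    printf '%s' "$A" | iconv -f UTF-8 -t "$enc" | od -An -tx1 | sed "s/^/$enc:/"
done
# KOI8-R: e1  (0xc0-0xff block; 0x80 | 'a', the case-swapped transliteration)
# CP866:  80  (alphabetic block starting at 0x80)
# CP1251: c0  (alphabetic block at 0xc0-0xff)
```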

Quote of the week

Posted Mar 7, 2013 23:02 UTC (Thu) by quotemstr (subscriber, #45331) [Link]

Would you be more open to filtering out 0x9B and other control bytes in a few years, once UTF-8 becomes truly ubiquitous? Would you be open to the kernel performing (perhaps optionally) UTF-8 validation and normalization at that time as well?

Quote of the week

Posted Mar 8, 2013 0:41 UTC (Fri) by mgedmin (subscriber, #34497) [Link]

0x9B is used as a continuation byte for actual characters in UTF-8, so it can't possibly be filtered out without breaking valid filenames.

I assume UTF-8-capable terminal emulators are able to cope.
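[A one-liner confirming the point: 0x9b occurs as an ordinary continuation byte - for example Û (U+00DB) encodes as c3 9b.]

```shell
#!/bin/sh
# The second byte of the UTF-8 encoding of U+00DB is 0x9b, so any
# filter dropping that byte would corrupt perfectly valid names.
printf '\303\233' | od -An -tx1
# prints: c3 9b
```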

Quote of the week

Posted Mar 8, 2013 21:44 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

So stop compounding the problem!

Just add a freaking UTF-8 filter (optional) and synthetic names. Besides, the kernel already DOES support encodings (for mounting FAT filesystems). And Linux ALREADY has a notion of NLS: http://lxr.free-electrons.com/source/include/linux/nls.h

Quote of the week

Posted Mar 7, 2013 23:39 UTC (Thu) by sfeam (subscriber, #2841) [Link]

>> I get sent tarballs or zip data archives from people using SJIS encoding for example.
> That seems to be a thorny problem. Those filenames are going to be broken anyway unless your whole environment is all operating in SJIS mode, right?

The filenames are broken, yes, although they are at least guaranteed to be unique. Fortunately the file contents are usually binary, so text encoding is not relevant. So long as I can figure out which file is which, I can rename it to a UTF-8 encoded equivalent. E.g. filter the output of `tar tzvf` through an SJIS to UTF8 conversion step. But knowing how to rename the files would not help if the system were refusing to unpack the files onto disk in the first place.
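[A sketch of that rename step, assuming a glibc iconv with an SJIS table; the sample file below stands in for a real archive member.]

```shell
#!/bin/sh
# Convert each extracted file name from Shift-JIS to UTF-8 and rename
# the file to match, skipping any name iconv cannot convert.
dir=$(mktemp -d); cd "$dir"
touch "$(printf '\203A')"          # Shift-JIS bytes 0x83 0x41: katakana 'ア'
for f in *; do
    new=$(printf '%s' "$f" | iconv -f SJIS -t UTF-8) || continue
    [ "$new" = "$f" ] || mv -- "$f" "$new"
done
ls                                  # now shows the UTF-8 name 'ア'
```

The point of the comment stands: this repair is only possible because the kernel let the "invalid" names onto disk in the first place.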

Quote of the week

Posted Mar 8, 2013 15:48 UTC (Fri) by Jandar (subscriber, #85683) [Link]

> Those filenames are going to be broken anyway unless your whole environment is all operating in SJIS mode, right?

The environment has to be *agnostic* about the meaning of individual bytes apart from '/' and '\0'. Any interpretation of the bytes composing a filename should be restricted to the end-user interface, be it ls in an xterm or a GUI.

Quote of the week

Posted Mar 8, 2013 16:43 UTC (Fri) by raven667 (subscriber, #5198) [Link]

But that's part of the problem: the environment is very much not agnostic about the meaning of the bytes. It presumes an encoding, and often that filenames are simple alphanumeric strings, even to the point of having difficulty if there is whitespace in the filename, let alone terminal control characters and whatnot.

One could implement a standard encoding and a whitelist of acceptable characters, but that would be difficult and enormous and would still have problems with existing software. The other option would be for all existing software to filter characters they can't handle, and to not have bugs, which is an impossible dream. Maybe instead of something in the kernel, something in glibc or a library suitable for LD_PRELOAD could at least be a proof of concept that validating characters in filenames has merit, or alternately is unworkable.

Quote of the week

Posted Mar 11, 2013 12:07 UTC (Mon) by dgm (subscriber, #49227) [Link]

> the environment is very much not agnostic about the meaning of the bytes

Correct. And _each_ environment has the responsibility to filter what it cannot interpret. The kernel has no business in that, because there are _many_ possible environments, each with its own limitations.

Quote of the week

Posted Mar 11, 2013 14:51 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

Yeah. And isn't it fun when you have to use two environments with disjoint requirements? Fun funfunfunfunfun!!!

Just imagine the cool double and triple-quoting necessary to deal with complex paths in sed invoked from $(eval) in Makefiles! Nobody should be deprived of this. No, we must preserve this insanity at all costs.

Guys, simply normalizing everything to UTF-8 is not going to harm any sane filesystem uses. And for insane users a mount-time option can be added.

I'm actually tempted to write a patch and submit it - Linux already has all the required NLS pieces.

Quote of the week

Posted Mar 11, 2013 15:20 UTC (Mon) by daglwn (subscriber, #65432) [Link]

There is in fact a better way. Pass the string "ORIGIN" to a make function and $(eval) it.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds