perhaps running out of inodes could be taken "more seriously"?

Posted Jun 3, 2017 0:36 UTC (Sat) by Richard_J_Neill (subscriber, #23093)
Parent article: Improved block-layer error handling

We recently hit a bug where the disk had plenty of free space but couldn't create new files, making the server unusable. It turned out we'd run out of inodes (due to a misbehaving web-app creating hundreds of 0-byte lock files per minute). It was really hard to diagnose this because of the lack of any helpful messages. I'd have expected that, if the kernel encounters a hard error like this, it would at least have put something into dmesg or syslog (it didn't). The design philosophy seems to be that running out of inodes is more akin to a permissions error (i.e. nothing wrong with the system) than to a fatal disk error, and that, while even a trivial USB hotplug event generates lots of log traffic, an unusable root filesystem (from inode exhaustion) is deemed not important enough to merit a log message!

Posted Jun 3, 2017 4:18 UTC (Sat) by k8to (guest, #15413)

The applications are all told ENOSPC in this situation, so lots of them should be complaining and some of that should be hitting logs.

It's unclear to me that the kernel should also log for each such failure. It might be so noisy as to cause more breakage. I would want the system to do something like log when this situation is near-occurring and when it has occurred in some throttled way, which suggests monitoring logic. Should that be implemented in-kernel or in userland?
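
A userland version of that monitoring logic could be sketched roughly as follows. This is only an illustration in Python: the `ThrottledWarner` name, the 90% threshold and the five-minute interval are assumptions made up for the example, not anything the kernel or a standard tool provides. It reads the inode counters through `os.statvfs` and emits at most one warning per interval.

```python
import os
import time


def inode_usage(path="/"):
    """Return the fraction of inodes in use on the filesystem holding path."""
    st = os.statvfs(path)
    if st.f_files == 0:
        # Some filesystems report no fixed inode count; treat as "not limited".
        return 0.0
    return 1.0 - st.f_ffree / st.f_files


class ThrottledWarner:
    """Hypothetical helper: warn above a threshold, at most once per interval."""

    def __init__(self, threshold=0.9, min_interval=300.0, clock=time.monotonic):
        self.threshold = threshold
        self.min_interval = min_interval
        self.clock = clock
        self._last = None  # time of the last warning, if any

    def check(self, usage):
        """Return a warning string, or None if below threshold or throttled."""
        if usage < self.threshold:
            return None
        now = self.clock()
        if self._last is not None and now - self._last < self.min_interval:
            return None
        self._last = now
        return f"inode usage at {usage:.0%}, above {self.threshold:.0%}"
```

A cron job or small daemon could call `warner.check(inode_usage("/"))` periodically and pass any non-None result to syslog.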

Posted Jun 3, 2017 7:44 UTC (Sat) by MarcB (guest, #101804)

I don't think running out of free inodes is conceptually different from running out of free space. Also, it is not a problem per se from the kernel's PoV: It cannot be prevented, it does not happen at random, it is not something exceptional at all.
It is just another resource exhaustion that user space has to deal with - and perhaps even is dealing with, so nothing is actually wrong.

Also, this used to be much more common in the past, when many filesystems allowed far fewer inodes by default. So, perhaps some administrators have simply forgotten (or never learned) that inode exhaustion is a real thing.

And diagnosing this - once you are aware that it can happen - is not harder than diagnosing "out of space" (in practice: even easier, as it is unlikely that large numbers of inodes are held by deleted yet opened files).
It can, and should, also be monitored just like free disk space.

Posted Jun 3, 2017 23:16 UTC (Sat) by Richard_J_Neill (subscriber, #23093)

Yes, you're right, excepting that the common "no space left on device" message is actually very misleading when there is in fact plenty of space.

Also, while the sysadmin can add extra monitoring and debugging, surely the point of a reliable system is to minimise the chance of human error.
We are used to the abstraction of storage being "somewhere you can fill up with data"; the very existence of inodes should be no more the concern of the average programmer/sysadmin than the specifics of which pointer has which address... it should be "the computer's" problem, not "the operator's" problem. If the computer is going to break that rule, and do so rarely, but catastrophically, the least it can do is to fail "noisily".

Anyway... in these days of LVM and resizeable volumes, why shouldn't the filesystem be able to automatically notice that it has lots of space but too few inodes, and automatically create more inodes as needed?

Posted Jun 4, 2017 1:39 UTC (Sun) by rossmohax (guest, #71829)

That is exactly what XFS does: inodes are allocated dynamically, and you can never run out of them as long as you have free space. Try using XFS instead of ext4; it is awesome.

Posted Jun 4, 2017 5:09 UTC (Sun) by matthias (subscriber, #94967)

Even XFS can run out of inodes. Inode numbers are mapped to blocks by a very simple mapping. Roughly, every i-th block can hold inodes. The possible inodes are just numbered starting from one. Once these blocks are filled (with inodes or data), XFS cannot create new inodes. Changing the mapping would change every single inode number.

We once had the following problem after growing a filesystem. The standard at that time was to use only 32-bit inode numbers. After growing the filesystem, the 32-bit inode numbers were all in the already filled lower part of the filesystem (*). Thus no new inodes could be created. It took a while to find that one, having only the meaningful message "No space left on device". Luckily it was a 64-bit system. Thus, we could just switch to 64-bit inode numbers. The other solution would have been to recreate the filesystem, not the quickest solution with a 56 TB filesystem.

That said the circumstances under which XFS runs out of inodes are very rare. So it would be very important to have meaningful error messages, to notice that one of these rare circumstances just happened.

(*) On fs creation, XFS usually chooses the number i such that all possible inodes have 32-bit numbers. After growing, this condition was no longer satisfied, as this number cannot be changed. On 32-bit systems, one would need to set this number i manually at fs creation time if one wants to be able to grow the filesystem.

Posted Jun 4, 2017 14:15 UTC (Sun) by MarcB (guest, #101804)

Remember that the possible error codes for syscalls were defined by POSIX, so simply adding an EOUTOFINODES would be non-compliant and could easily do more harm than good, because in practice, ENOSPC is a good fit for "out of inodes" and software might actually expect it to cover both cases:

If the software is some kind of cache, discarding the files that are least relevant is a proper course of action for both kinds of ENOSPC.
If the software is some kind of archival system, moving the oldest files to the next tier of storage will also help in both cases.

If the software can't freely discard or move data, all it can do is scream for help anyway.

Also, an ENOSPC due to lack of inodes will usually happen on open() while an ENOSPC due to lack of disk space will usually happen on write() or similar.
So applications could already translate this to proper error messages. It is common that the same error code has different meaning for different syscalls and developers should know this.
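
As a rough sketch of how an application might do that translation (Python; `explain_enospc` and `create_file` are hypothetical helper names, and the classification is only a heuristic under the assumption that the `statvfs` counters are still accurate when the error is handled), one can look at the free inode and free block counts after an ENOSPC:

```python
import errno
import os


def explain_enospc(path, statvfs=os.statvfs):
    """Guess why an operation on the filesystem holding path hit ENOSPC."""
    st = statvfs(path)
    # f_favail: inodes available to unprivileged users; f_files: total inodes.
    if st.f_files > 0 and st.f_favail == 0:
        return "out of inodes"
    # f_bavail: free blocks available to unprivileged users.
    if st.f_bavail == 0:
        return "out of disk space"
    return "no space left on device (cause unclear)"


def create_file(path):
    """Create an empty file, attaching a more specific message on ENOSPC."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except OSError as e:
        if e.errno == errno.ENOSPC:
            reason = explain_enospc(os.path.dirname(path) or ".")
            raise OSError(e.errno, f"cannot create {path}: {reason}") from e
        raise
    os.close(fd)
```

The error code seen by callers is still ENOSPC, so existing handling keeps working; only the message becomes more specific.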

Of course, ideally filesystems would solve this problem completely. In fact, some do: btrfs has an upper limit of 2^64 inodes, as does XFS or ZFS (might be 2^48).
btrfs is fully dynamic, i.e. each btrfs filesystem that is large enough to hold the inode information can in fact contain 2^64 inodes. XFS is dynamic enough in practice (make sure to use "inode64", though; otherwise inodes can only be stored in the lowest 1 TB, and that space can run out if it is also used for file data - been there, done that). Even NTFS allows 2^32 and is also fully dynamic.

The ext-family is the big exception. Theoretically, the limit is also 2^32, but it cannot allocate space for inodes dynamically, and thus uses much lower limits by default. Otherwise, each inode would consume 256 bytes, even if unused.
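
To put numbers on that, here is a small worked calculation in Python. The 256-byte inode size comes from the comment above; the 16384 bytes-per-inode ratio is an assumption used for illustration (check the `inode_ratio` setting in your own mke2fs configuration), and the helper names are made up for the example.

```python
INODE_SIZE = 256  # bytes per on-disk inode, as stated above
TIB = 2**40
GIB = 2**30


def inode_table_bytes(n_inodes):
    """Space reserved for n_inodes statically allocated inodes."""
    return n_inodes * INODE_SIZE


def inodes_for(fs_bytes, bytes_per_inode):
    """Number of inodes created for a given bytes-per-inode ratio."""
    return fs_bytes // bytes_per_inode


# Reserving the theoretical maximum of 2^32 inodes would cost 1 TiB.
full_table = inode_table_bytes(2**32)

# With an assumed ratio of one inode per 16384 bytes, a 1 TiB filesystem
# gets 2^26 inodes, whose table takes 16 GiB.
default_inodes = inodes_for(TIB, 16384)
default_table = inode_table_bytes(default_inodes)
```

Under these assumptions the trade-off is between 16 GiB of inode tables with about 67 million inodes, and a full terabyte reserved up front for inodes that may never be used.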

Posted Jun 5, 2017 11:55 UTC (Mon) by nix (subscriber, #2304)

> Remember that the possible error codes for syscalls were defined by POSIX, so simply adding an EOUTOFINODES would be non-compliant and could easily do more harm than good, because in practice, ENOSPC is a good fit for "out of inodes" and software might actually expect it to cover both cases

It might well do more harm than good, but the first part of your statement is just wrong. POSIX.1 2008 states (and all previous versions have similar wording):

> Implementations may support additional errors not included in this list, may generate errors included in this list under circumstances other than those described here, or may contain extensions or limitations that prevent some errors from occurring.
> The ERRORS section on each reference page specifies which error conditions shall be detected by all implementations ("shall fail") and which may be optionally detected by an implementation ("may fail"). If no error condition is detected, the action requested shall be successful. If an error condition is detected, the action requested may have been partially performed, unless otherwise stated.
> Implementations may generate error numbers listed here under circumstances other than those described, if and only if all those error conditions can always be treated identically to the error conditions as described in this volume of POSIX.1-2008. Implementations shall not generate a different error number from one required by this volume of POSIX.1-2008 for an error condition described in this volume of POSIX.1-2008, but may generate additional errors unless explicitly disallowed for a particular function.

So adding more errors is not only not noncompliant, it is both explicitly permitted and very common.

Posted Jun 5, 2017 16:15 UTC (Mon) by nybble41 (subscriber, #55106)

> So adding more errors is not only not noncompliant, it is both explicitly permitted and very common.

Yes, for *new* error conditions not specified by POSIX. However:

> Implementations shall not generate a different error number from one required by this volume of POSIX.1-2008 for an error condition described in this volume of POSIX.1-2008, ...

The error list for the open() and openat() system calls specifies ENOSPC as follows:

> [ENOSPC]
> The directory or file system that would contain the new file cannot be expanded, the file does not exist, and O_CREAT is specified.

So if "the filesystem ... cannot be expanded" is read to include the "out of inodes" condition (a reasonable interpretation IMHO) then POSIX requires open() to return ENOSPC for this condition, and not some other error code.

Posted Jun 3, 2017 8:31 UTC (Sat) by matthias (subscriber, #94967)

Like the other commenters, I agree that running out of inodes should not be the kernel's problem. However, the error reporting could be improved. Returning ENOSPC when the actual problem is running out of inodes is misleading. The user has to know that the error number is also used for reasons other than "No space left on device". Today, many users probably do not even know that they can run out of inodes. Even if they know this in theory, they have to remember it when seeing ENOSPC.

I would much prefer error reporting by exceptions. The type of the exception more or less corresponds to the error numbers and can be used by the program to determine how to react, but there is a string attached that can be passed up the call chain, which has meaningful information for the user. This way the program still gets the information contained in ENOSPC (actually most programs are fine reacting to running out of space and running out of inodes in the same way), but the user who sees the error message knows instantly where to search for the problem.

Adding type inheritance to the exceptions additionally allows the program to select how fine grained the error information should be. Some programs are fine seeing an IO exception. Others want to differentiate whether the error is running out of resources or a real problem and some might want to know the difference between running out of space and running out of inodes.
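
A minimal sketch of such a hierarchy, in Python, might look like this. All class names here are hypothetical and chosen only to mirror the three levels of detail described above; this is not an existing API.

```python
class StorageError(Exception):
    """Coarsest level: some I/O or storage problem."""


class ResourceExhausted(StorageError):
    """A resource ran out; nothing is actually broken."""


class OutOfSpace(ResourceExhausted):
    """No free data blocks left."""


class OutOfInodes(ResourceExhausted):
    """Free space remains, but no inodes are left."""


def classify(exc):
    """Show how a caller picks the granularity it cares about."""
    try:
        raise exc
    except OutOfInodes:
        return "inodes"       # fine-grained handler
    except ResourceExhausted:
        return "resource"     # any exhaustion, e.g. a cache evicting files
    except StorageError:
        return "io"           # anything else storage-related


# The attached message travels up the call chain for the user to read.
err = OutOfInodes("cannot create /var/lock/app.lock: filesystem has no free inodes")
```

A program that only catches `StorageError` still handles `OutOfInodes`, while the message string tells the user where to look.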

Posted Jun 4, 2017 1:42 UTC (Sun) by rossmohax (guest, #71829)

you don't need exceptions to have error inheritance.

Posted Jun 4, 2017 3:39 UTC (Sun) by Cyberax (✭ supporter ✭, #52523)

Exceptions imply some kind of a type system. I'd settle for something like: "error.filesystem.io.disk-space/required=1233/available=123" where I can use simple prefix matching to get more and more detailed error.
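
Prefix matching on a string like that can be sketched in a few lines of Python. The parsing rules below (dot-separated kind, then `/key=value` parameters) are an assumption inferred from the single example string above, and `parse_error` and `matches` are hypothetical names.

```python
def parse_error(s):
    """Split 'a.b.c/k1=v1/k2=v2' into ('a.b.c', {'k1': 'v1', 'k2': 'v2'})."""
    kind, *params = s.split("/")
    details = dict(p.split("=", 1) for p in params)
    return kind, details


def matches(s, prefix):
    """True if the error kind equals prefix or is nested below it."""
    kind, _ = parse_error(s)
    # Require a component boundary so 'error.file' does not match
    # 'error.filesystem'.
    return kind == prefix or kind.startswith(prefix + ".")


err = "error.filesystem.io.disk-space/required=1233/available=123"
kind, details = parse_error(err)
```

A coarse handler can test `matches(err, "error.filesystem")`, while a more specific one tests the full kind and reads `details["required"]`.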

Posted Jun 3, 2017 10:32 UTC (Sat) by itvirta (guest, #49997)

Now that you have learned about the issue of inodes running out, you know to add it to your monitoring.
It's very much the same as running out of disk space, which isn't that uncommon either when some logging
gets out of hand. Both can be checked with `df` (`df -i` for inodes).

Also, there's the possibility of distributing unrelated data on separate file systems, or using quotas to
protect the rest of the system from an application getting out of hand.