perhaps running out of inodes could be taken "more seriously"?
Posted Jun 3, 2017 0:36 UTC (Sat) by Richard_J_Neill (subscriber, #23093)
Parent article: Improved block-layer error handling
Posted Jun 3, 2017 4:18 UTC (Sat)
by k8to (guest, #15413)
It's unclear to me that the kernel should also log for each such failure. It might be so noisy as to cause more breakage. I would want the system to do something like log when this situation is near-occurring and when it has occurred in some throttled way, which suggests monitoring logic. Should that be implemented in-kernel or in userland?
Posted Jun 3, 2017 7:44 UTC (Sat)
by MarcB (guest, #101804)
Also, this used to be much more common in the past, when many filesystems allowed far fewer inodes by default. So perhaps some administrators have simply forgotten (or never learned) that inode exhaustion is a real thing.
And diagnosing this, once you are aware that it can happen, is no harder than diagnosing "out of space" (in practice it is even easier, as it is unlikely that large numbers of inodes are held by deleted but still open files).
Posted Jun 3, 2017 23:16 UTC (Sat)
by Richard_J_Neill (subscriber, #23093)
Also, while the sysadmin can add extra monitoring and debugging, surely the point of a reliable system is to minimise the chance of human error.
Anyway... in these days of LVM and resizeable volumes, why shouldn't the filesystem be able to automatically notice that it has lots of space but too few inodes, and automatically create more inodes as needed?
Posted Jun 4, 2017 1:39 UTC (Sun)
by rossmohax (guest, #71829)
Posted Jun 4, 2017 5:09 UTC (Sun)
by matthias (subscriber, #94967)
We once had the following problem after growing a filesystem. The standard at that time was to use only 32-bit inode numbers. After growing the filesystem, the 32-bit inode numbers were all in the already filled lower part of the filesystem.(*) Thus no new inodes could be created. It took a while to find that one, having only the unhelpful message "No space left on device." to go on. Luckily it was a 64-bit system, so we could just switch to 64-bit inode numbers. The other solution would have been to recreate the filesystem, which is not the quickest solution with a 56 TB filesystem.
That said the circumstances under which XFS runs out of inodes are very rare. So it would be very important to have meaningful error messages, to notice that one of these rare circumstances just happened.
(*) On fs creation, XFS usually chooses the number i such that all possible inodes have 32-bit numbers. After growing, this condition was no longer satisfied, as this number cannot be changed. On 32-bit systems, one would need to set this number i manually at fs creation time if one wants to keep the possibility of growing the filesystem.
Posted Jun 4, 2017 14:15 UTC (Sun)
by MarcB (guest, #101804)
If the software is some kind of cache, discarding the files that are least relevant is a proper course of action for both kinds of ENOSPC.
If the software can't freely discard or move data, all it can do is scream for help anyway.
Also, an ENOSPC due to lack of inodes will usually happen on open() while an ENOSPC due to lack of disk space will usually happen on write() or similar.
Of course, ideally filesystems would solve this problem completely. In fact, some do: btrfs has an upper limit of 2^64 inodes, as do XFS and ZFS (might be 2^48).
The ext-family is the big exception. Theoretically, the limit is also 2^32, but it cannot allocate space for inodes dynamically, and thus uses much lower limits by default. Otherwise, each inode would consume 256 bytes, even if unused.
Posted Jun 5, 2017 11:55 UTC (Mon)
by nix (subscriber, #2304)
The ERRORS section on each reference page specifies which error conditions shall be detected by all implementations ("shall fail") and which may be optionally detected by an implementation ("may fail"). If no error condition is detected, the action requested shall be successful. If an error condition is detected, the action requested may have been partially performed, unless otherwise stated.
Implementations may generate error numbers listed here under circumstances other than those described, if and only if all those error conditions can always be treated identically to the error conditions as described in this volume of POSIX.1-2008. Implementations shall not generate a different error number from one required by this volume of POSIX.1-2008 for an error condition described in this volume of POSIX.1-2008, but may generate additional errors unless explicitly disallowed for a particular function.
Posted Jun 5, 2017 16:15 UTC (Mon)
by nybble41 (subscriber, #55106)
Yes, for *new* error conditions not specified by POSIX. However:
> Implementations shall not generate a different error number from one required by this volume of POSIX.1-2008 for an error condition described in this volume of POSIX.1-2008, ...
The error list for the open() and openat() system calls specifies ENOSPC as follows:
> [ENOSPC]
So if "the filesystem ... cannot be expanded" is read to include the "out of inodes" condition (a reasonable interpretation IMHO) then POSIX requires open() to return ENOSPC for this condition, and not some other error code.
Posted Jun 3, 2017 8:31 UTC (Sat)
by matthias (subscriber, #94967)
I would much prefer error reporting by exceptions. The type of the exception more or less corresponds to the error number and can be used by the program to determine how to react, but there is also a string attached that can be passed up the call chain and that carries meaningful information for the user. This way the program still gets the information contained in ENOSPC (actually, most programs are fine reacting to running out of space and running out of inodes in the same way), but the user who sees the error message knows instantly where to search for the problem.
Adding type inheritance to the exceptions additionally allows the program to select how fine-grained the error information should be. Some programs are fine seeing an IO exception. Others want to differentiate between running out of resources and a real problem, and some might want to know the difference between running out of space and running out of inodes.
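To make the inheritance idea concrete, here is a minimal sketch in Python. All class and function names (`IOFailure`, `ResourceExhausted`, `OutOfSpace`, `OutOfInodes`, `create_file`) are hypothetical and invented for illustration; no existing kernel or libc interface looks like this.

```python
class IOFailure(Exception):
    """Hypothetical root of the hierarchy: any I/O failure."""


class ResourceExhausted(IOFailure):
    """Some resource ran out; freeing resources may help."""


class OutOfSpace(ResourceExhausted):
    """No free data blocks."""


class OutOfInodes(ResourceExhausted):
    """No free inodes."""


def create_file(name, free_inodes):
    # Stand-in for a real create call. The inode count is passed in so the
    # example runs without touching a filesystem.
    if free_inodes == 0:
        raise OutOfInodes(f"cannot create {name!r}: filesystem has no free inodes")
    return name


def coarse_caller():
    # Only cares that I/O failed; the attached string still reaches the user.
    try:
        create_file("report.txt", free_inodes=0)
    except IOFailure as exc:
        return f"I/O error: {exc}"


def fine_caller():
    # Wants to tell inode exhaustion apart from other resource problems.
    try:
        create_file("report.txt", free_inodes=0)
    except OutOfInodes:
        return "out of inodes"
    except ResourceExhausted:
        return "out of some other resource"
```

The coarse caller catches the base type and still passes the detailed message along, while the fine caller picks the most specific type it cares about.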
Posted Jun 4, 2017 1:42 UTC (Sun)
by rossmohax (guest, #71829)
Posted Jun 4, 2017 3:39 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
Posted Jun 3, 2017 10:32 UTC (Sat)
by itvirta (guest, #49997)
Also, there's the possibility of distributing unrelated data on separate file systems, or using quotas to
It is just another resource exhaustion that user space has to deal with - and perhaps even is dealing with, so nothing is actually wrong.
It can, and should, also be monitored just like free disk space.
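As a rough sketch of such monitoring in userland, the following Python reads both block and inode usage from `os.statvfs` (available on Unix-like systems). The function names and the 90% threshold are arbitrary assumptions; a real setup would more likely feed `df` and `df -i` style numbers into an existing monitoring system.

```python
import os


def usage_percent(path):
    """Return percentage of blocks and inodes in use for the filesystem at path."""
    st = os.statvfs(path)

    def pct(total, free):
        # Some filesystems report 0 total inodes; treat that as "not applicable".
        return None if total == 0 else 100.0 * (total - free) / total

    return {
        "blocks": pct(st.f_blocks, st.f_bfree),
        "inodes": pct(st.f_files, st.f_ffree),
    }


def check(path, threshold=90.0):
    """Return a list of warning strings for resources at or above threshold."""
    warnings = []
    for resource, used in usage_percent(path).items():
        if used is not None and used >= threshold:
            warnings.append(f"{path}: {resource} {used:.1f}% used")
    return warnings


if __name__ == "__main__":
    for line in check("/"):
        print(line)
```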
We are used to the abstraction of storage being "somewhere you can fill up with data"; the very existence of inodes should be no more the concern of the average programmer/sysadmin than the specifics of which pointer has which address... it should be "the computer's" problem, not "the operator's" problem. If the computer is going to break that rule, and do so rarely but catastrophically, the least it can do is to fail "noisily".
If the software is some kind of archival system, moving the oldest files to the next tier of storage will also help in both cases.
So applications could already translate this into proper error messages. It is common for the same error code to have different meanings for different syscalls, and developers should know this.
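One way an application might do that translation, sketched in Python under the assumption of a Unix-like system: on ENOSPC, ask `os.statvfs` whether free inodes or free blocks are at zero. The helper names `classify_enospc`, `describe_enospc` and `create_file` are made up for this example, and the check is only a heuristic (quotas or a racing process can make it guess wrong).

```python
import errno
import os


def classify_enospc(st):
    """Guess the cause of ENOSPC from a statvfs-like result."""
    # f_files == 0 means the filesystem does not report an inode total.
    if st.f_files > 0 and st.f_favail == 0:
        return "no free inodes left"
    if st.f_bavail == 0:
        return "no free data blocks left"
    return "cause unclear (quota or other limit?)"


def describe_enospc(path):
    """Classify ENOSPC for the filesystem containing path."""
    return classify_enospc(os.statvfs(path))


def create_file(path):
    """Create an empty file, turning ENOSPC into a more specific message."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_EXCL, 0o600)
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            parent = os.path.dirname(path) or "."
            reason = describe_enospc(parent)
            raise OSError(exc.errno, f"cannot create {path}: {reason}") from exc
        raise
    os.close(fd)
```

The program still sees ENOSPC as the error number, so existing handling keeps working; only the message shown to the user becomes more specific.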
btrfs is fully dynamic, i.e. any btrfs that is large enough to hold the inode information can in fact contain 2^64 inodes. XFS is dynamic enough in practice (make sure to use "inode64", though. Otherwise inodes can only be stored in the lowest 1 TB, and that space can run out if it is also used for file data - been there, done that). Even NTFS allows 2^32 and is also fully dynamic.
Remember that the possible error codes for syscalls were defined by POSIX, so simply adding an EOUTOFINODES would be non-compliant and could easily do more harm than good, because in practice, ENOSPC is a good fit for "out of inodes" and software might actually expect it to cover both cases.
It might well do more harm than good, but the first part of your statement is just wrong. POSIX.1-2008 states (and all previous versions have similar wording):
Implementations may support additional errors not included in this list, may generate errors included in this list under circumstances other than those described here, or may contain extensions or limitations that prevent some errors from occurring.
So adding more errors is not noncompliant; it is both explicitly permitted and very common.
> The directory or file system that would contain the new file cannot be expanded, the file does not exist, and O_CREAT is specified.
It's very much the same as running out of disk space, which isn't that uncommon with some logging
getting out of hand either. Both can be checked with `df`.
protect the rest of the system from an application getting out of hand.