From: Jim Meyering <jim-AT-meyering.net>
To: authors-AT-lwn.net
Subject: grep -i bug and two sort -u data loss bugs
Date: Mon, 20 Aug 2012 11:28:50 +0200
You may wish to inform readers about two unusual releases.
grep and coreutils are normally quite stable and reliable tools.
Sure, there are always bug fixes, but at least for these two packages,
bugs usually involve rarely-used corner cases, either involving odd
combinations of options or evolving conditions like new file system
types, new kernel behavior, race conditions, etc.
However, in the last week, we learned of bugs in two tools. These bugs
are not like the others, in that they are relatively serious. Each is
about two years old. They prompted the fixes that led to this morning's releases:
grep: release grep-2.14 (to fix grep -i '^$' false match)
coreutils: release coreutils-8.19 (to fix sort -u data loss bugs)
Here is some context for the grep bug:
Here is the discussion showing how the coreutils fixes evolved:
You can argue that using grep's -i option with a regexp like '^$'
is nonsensical, but I'll bet some use -i via GREP_OPTIONS or via
an alias or function.
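To illustrate how -i can reach grep without being typed, here is a minimal sketch of the kind of shell function meant above. The wrapper is hypothetical and not taken from any particular setup.

```shell
# Hypothetical wrapper of the kind sometimes found in shell startup files:
# every call to grep silently gains -i.
grep() {
    command grep -i "$@"
}

# This looks like a plain count of empty lines, but it actually runs
# grep -i -c '^$'. The input contains exactly one empty line.
printf 'a\n\nb\n' | grep -c '^$'
```

With a fixed grep the count is 1. The point is only that the user never wrote -i, yet the regexp '^$' is still matched case-insensitively.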
grep bogosity (prior to grep-2.14):
$ seq 9|LC_ALL=en_US.utf8 /bin/grep -in '^$';echo
$ seq 2|LC_ALL=en_US.utf8 /bin/grep -il '^$'
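To check whether an installed grep is affected, the first command above can be wrapped in a small test. This is a rough sketch, not part of the release announcement. The function name check_grep_i_empty is hypothetical, and it assumes that the grep found on PATH is the one to test and that the en_US.utf8 locale is installed. Without that locale the check may not exercise the same code path.

```shell
# Hypothetical helper: reports whether grep -i '^$' falsely matches
# non-empty lines. The input (1..9, one number per line) has no empty
# lines, so a correct grep prints nothing and exits with status 1.
check_grep_i_empty() {
    out=$(seq 9 | LC_ALL=en_US.utf8 grep -in '^$')
    status=$?   # exit status of grep, the last command in the pipeline
    if [ -z "$out" ] && [ "$status" -eq 1 ]; then
        echo "ok: no false match"
    else
        echo "affected: grep -i matched non-empty input"
    fi
}

check_grep_i_empty
```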
sort -u data loss (prior to coreutils-8.19):
$ (yes 7|head -11; echo 1)|/bin/sort --p=1 -S32b -u
It should print these two lines:
1
7
perl -e 'print "0\n"x5000 ."6\n"x6000 ."8\n"x3000 ."4\n"x8000 ."1\n"x2000'\
| sed 's/^/a /'| /bin/sort -k2,2 -u --p=1 -S1k
It would print a single line:
rather than the required five:
a 0
a 1
a 4
a 6
a 8
The sort examples specify small internal buffers to allow for small
inputs. With the default (typically much larger) buffer size, you would
need much larger inputs to trigger the bug. Similarly, GNU sort is
multi-threaded on a system with two or more processors, so the examples
use --p=1 (--parallel=1) to eliminate differences that arise when the
buffer size depends on the number of processes.
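The first sort example can be turned into a self-check by comparing sort -u against an independent reference. This is a minimal sketch, assuming GNU sort (for --parallel and -S) and awk are available. The name check_sort_u is hypothetical, and the awk-then-sort pipeline is just one way to compute the distinct lines without relying on sort -u.

```shell
# Hypothetical helper: compares sort -u, run with a deliberately tiny
# buffer and a single thread, against a reference that removes
# duplicates with awk before sorting (so it does not depend on sort -u).
check_sort_u() {
    input=$( (yes 7 | head -11; echo 1) )
    actual=$(printf '%s\n' "$input" | sort --parallel=1 -S32b -u)
    expected=$(printf '%s\n' "$input" | awk '!seen[$0]++' | sort)
    if [ "$actual" = "$expected" ]; then
        echo "ok: sort -u kept every distinct line"
    else
        echo "affected: sort -u output differs from the reference"
    fi
}

check_sort_u
```

Dropping -S32b from the sketch would make the input fit in the default buffer, which is why such a tiny input would then no longer reach the failing code path.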