From: Jim Meyering <jim-AT-meyering.net>
Subject: grep -i bug and two sort -u data loss bugs
Date: Mon, 20 Aug 2012 11:28:50 +0200
You may wish to inform readers about two unusual releases. grep and coreutils are normally quite stable and reliable tools. Sure, there are always bug fixes, but at least for these two packages, bugs usually involve rarely used corner cases: either odd combinations of options or evolving conditions like new file system types, new kernel behavior, race conditions, etc.

However, in the last week, we learned of bugs in two tools. These bugs are not like the others, in that they are relatively serious. Each is about two years old. They prompted the fixes that led to this morning's releases:

  grep: release grep-2.14 (to fix grep -i '^$' false match)
    https://savannah.gnu.org/forum/forum.php?forum_id=7338

  coreutils: release coreutils-8.19 (to fix sort -u data loss bugs)
    https://savannah.gnu.org/forum/forum.php?forum_id=7342

Here is some context for the grep bug:

  http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4639/foc...

Here is the discussion showing how the coreutils fixes evolved:

  http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/231...

You can argue that using grep's -i option with a regexp like '^$' is nonsensical, but I'll bet some use -i via GREP_OPTIONS or via an alias or function.

Quick demos:

grep bogosity (prior to grep-2.14):
==================================

  $ seq 9|LC_ALL=en_US.utf8 /bin/grep -in '^$';echo
  2:4:6:8:10:12:14:16:
  $ seq 2|LC_ALL=en_US.utf8 /bin/grep -il '^$'
  (standard input)

sort -u data loss (prior to coreutils-8.19):
===========================================

  $ (yes 7|head -11; echo 1)|/bin/sort --p=1 -S32b -u
  7

It should print these two lines:

  1
  7

  perl -e 'print "0\n"x5000 ."6\n"x6000 ."8\n"x3000 ."4\n"x8000 ."1\n"x2000'\
    | sed 's/^/a /'| /bin/sort -k2,2 -u --p=1 -S1k

It would print a single line:

  a 0

rather than the required five:

  a 0
  a 1
  a 4
  a 6
  a 8

-------------------------------
The sort examples specify small internal buffers to allow for small inputs.
With the default (typically much larger) buffer size, you would need much larger inputs to trigger the bugs. Also, GNU sort is multi-threaded on a system with two or more processors, and the buffer size it uses depends on the number of threads, so the examples use --p=1 (--parallel=1) to eliminate those differences.
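The following is a minimal sketch that is not part of the original announcement. It is a shell script that re-runs the demos from this message and compares the results against the expected outputs given above. It runs whatever grep and sort come first in $PATH, rather than /bin/grep and /bin/sort. It assumes GNU grep, GNU coreutils sort (for --parallel and -S), seq, and perl. The fail variable and the messages it prints are illustrative only. If the en_US.utf8 locale is not installed, grep falls back to the C locale, and the grep check may then not exercise the code path where the bug lived.

```shell
#!/bin/sh
# Hypothetical regression check built from the demos in this message.
# Assumes GNU grep, GNU coreutils sort (--parallel, -S), seq and perl.
fail=0

# grep: seq output has no empty lines, so -i '^$' must match nothing.
if seq 2 | LC_ALL=en_US.utf8 grep -il '^$' >/dev/null; then
  echo "grep -i: BUG, '^\$' matched non-empty input"
  fail=1
else
  echo "grep -i: ok"
fi

# sort -u on whole lines, tiny buffer, one thread: both lines must survive.
out=$( (yes 7 | head -11; echo 1) | sort --parallel=1 -S32b -u )
expected=$(printf '1\n7')
if [ "$out" = "$expected" ]; then
  echo "sort -u: ok"
else
  echo "sort -u: BUG, got: $out"
  fail=1
fi

# sort -k2,2 -u: one output line per distinct key (0, 1, 4, 6, 8).
out=$(perl -e 'print "0\n"x5000 ."6\n"x6000 ."8\n"x3000 ."4\n"x8000 ."1\n"x2000' \
  | sed 's/^/a /' | sort -k2,2 -u --parallel=1 -S1k)
expected=$(printf 'a 0\na 1\na 4\na 6\na 8')
if [ "$out" = "$expected" ]; then
  echo "sort -k2,2 -u: ok"
else
  echo "sort -k2,2 -u: BUG, got: $out"
  fail=1
fi

# Leave a nonzero status if any check failed.
[ "$fail" -eq 0 ]
```

With grep-2.14 and coreutils-8.19 or later, the script should print three "ok" lines and exit with status 0. On an affected version, it should instead print the output that was actually produced, which makes any data loss visible.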
Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.