
Bugfixes for grep and coreutils

From:  Jim Meyering <jim-AT-meyering.net>
To:  authors-AT-lwn.net
Subject:  grep -i bug and two sort -u data loss bugs
Date:  Mon, 20 Aug 2012 11:28:50 +0200
Message-ID:  <87boi5oqst.fsf@rho.meyering.net>

You may wish to inform readers about two unusual releases.
grep and coreutils are normally quite stable and reliable tools.
Sure, there are always bug fixes, but at least for these two packages,
bugs usually involve rarely-used corner cases, either involving odd
combinations of options or evolving conditions like new file system
types, new kernel behavior, race conditions, etc.

However, in the last week, we learned of bugs in two tools.  These bugs
are not like the others, in that they are relatively serious.  Each is
about two years old.  They prompted the fixes that led to this morning's
releases:

    grep: release grep-2.14 (to fix grep -i '^$' false match)
      https://savannah.gnu.org/forum/forum.php?forum_id=7338

    coreutils: release coreutils-8.19 (to fix sort -u data loss bugs)
      https://savannah.gnu.org/forum/forum.php?forum_id=7342


Here is some context for the grep bug:

    http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4639/foc...

Here is the discussion showing how the coreutils fixes evolved:

    http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/231...

You can argue that using grep's -i option with a regexp like '^$'
is nonsensical, but I'll bet some use -i via GREP_OPTIONS or via
an alias or function.
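
To illustrate how -i can end up applied without anyone typing it, here is a minimal sketch using a shell function. The wrapper name `g` is hypothetical, chosen only for this example; an alias, or GREP_OPTIONS=-i on grep versions that honor that variable, would have a similar effect.

```shell
# Hypothetical wrapper (the name "g" is an assumption for this sketch):
# every call made through it silently adds -i.
g() {
    command grep -i "$@"
}

# The caller never typed -i, yet it is in effect for this search.
# Count the empty lines in a two-line input that contains none.
empty_count=$(printf 'a\nb\n' | g -c '^$')
echo "empty lines found: $empty_count"
```

With grep-2.14 or later the count should be 0, since the input has no empty lines. An affected grep running in a UTF-8 locale could report false matches here, which is why the bug matters even to people who would never combine -i with '^$' on purpose.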

Quick demos:

  grep bogosity (prior to grep-2.14):
  ==================================

    $ seq 9|LC_ALL=en_US.utf8 /bin/grep -in '^$';echo
    2:4:6:8:10:12:14:16:
    $ seq 2|LC_ALL=en_US.utf8 /bin/grep -il '^$'
    (standard input)

  sort -u data loss (prior to coreutils-8.19):
  ===========================================

    $ (yes 7|head -11; echo 1)|/bin/sort --p=1 -S32b -u
    7

    It should print these two lines:
    1
    7

    $ perl -e 'print "0\n"x5000 ."6\n"x6000 ."8\n"x3000 ."4\n"x8000 ."1\n"x2000'\
        | sed 's/^/a /'| /bin/sort -k2,2 -u --p=1 -S1k

    It would print a single line:

      a 0

    rather than the required five:

      a 0
      a 1
      a 4
      a 6
      a 8

-------------------------------
The sort examples specify small internal buffers so that small inputs
suffice to trigger the bugs.  With the default (typically much larger)
buffer size, you would need much larger inputs.  Also, GNU sort is
multi-threaded on a system with two or more processors, so the examples
use --p=1 (--parallel=1) to eliminate differences that arise when the
buffer size depends on the number of threads.
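
The first sort demo can be turned into a small self-check. This is only a sketch, assuming GNU sort is the `sort` found on PATH; the helper name `check_sort` and the "ok" / "DATA LOSS" labels are invented for this example. It compares the actual output against the two lines the demo is expected to print.

```shell
# Hypothetical helper: print "ok" when the actual output equals the
# expected output, and "DATA LOSS" otherwise.
check_sort() {
    expected=$1
    actual=$2
    if [ "$actual" = "$expected" ]; then
        echo "ok"
    else
        echo "DATA LOSS"
    fi
}

# First demo from above: eleven 7s followed by a single 1 must reduce
# to exactly two unique lines, "1" and "7".
demo1=$( (yes 7 | head -11; echo 1) | sort --parallel=1 -S32b -u )
result1=$(check_sort "$(printf '1\n7')" "$demo1")
echo "sort -u demo: $result1"
```

The tiny -S32b buffer and --parallel=1 are kept from the demo for the reasons given above; without them a small input like this is not expected to exercise the buggy code path.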





Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds