grep-2.17 released

From:  Jim Meyering <jim-AT-meyering.net>
To:  info-gnu-AT-gnu.org
Subject:  grep-2.17 released [stable]
Date:  Mon, 17 Feb 2014 20:27:00 -0800
Message-ID:  <CA+8g5KG8XQCudrNHaVOeGzfOYEuN-U+kdczWfeuj31v18jvimw@mail.gmail.com>
Cc:  bug-grep-AT-gnu.org

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

This is to announce grep-2.17, a stable release.
This release is notable for its performance improvements:
we don't often see a 10x speed-up in a tool like grep.

There have been 19 commits by 8 people in the 7 weeks since 2.16.

See the NEWS below for a brief summary.

Thanks to everyone who has contributed!
The following people contributed changes to this release:

  Aharon Robbins (1)
  Benno Schulenberg (2)
  Jim Meyering (7)
  Mike Frysinger (1)
  Norihiro Tanaka (3)
  Paolo Bonzini (1)
  Paul Eggert (3)
  Pádraig Brady (1)

Jim [on behalf of the grep maintainers]
==================================================================

Here is the GNU grep home page:
    http://gnu.org/s/grep/

For a summary of changes and contributors, see:
  http://git.sv.gnu.org/gitweb/?p=grep.git;a=shortlog;h=v2.17
or run this command from a git-cloned grep directory:
  git shortlog v2.16..v2.17

To summarize the 26 gnulib-related changes, run these commands
from a git-cloned grep directory:
  git checkout v2.17
  git submodule summary v2.16

================================
Here are the compressed sources and a GPG detached signature[*]:
  http://ftp.gnu.org/gnu/grep/grep-2.17.tar.xz
  http://ftp.gnu.org/gnu/grep/grep-2.17.tar.xz.sig

Use a mirror for higher download bandwidth:
  http://ftpmirror.gnu.org/grep/grep-2.17.tar.xz
  http://ftpmirror.gnu.org/grep/grep-2.17.tar.xz.sig

[*] Use a .sig file to verify that the corresponding file (without the
.sig suffix) is intact.  First, be sure to download both the .sig file
and the corresponding tarball.  Then, run a command like this:

  gpg --verify grep-2.17.tar.xz.sig

If that command fails because you don't have the required public key,
then run this command to import it:

  gpg --keyserver keys.gnupg.net --recv-keys 7FD9FCCB000BEEEE

and rerun the 'gpg --verify' command.

This release was bootstrapped with the following tools:
  Autoconf 2.69.117-1717
  Automake 1.99a
  Gnulib v0.1-76-g497f4cd

================================
NEWS

* Noteworthy changes in release 2.17 (2014-02-17) [stable]

** Improvements

  grep -i in a multibyte locale is now typically 10 times faster
  for patterns that do not contain \ or [.

  grep (without -i) in a multibyte locale is now up to 7 times faster
  when processing many matched lines.

** Maintenance

  grep's --mmap option was disabled in March of 2010, and began to
  elicit a warning in January of 2012.  Now it is completely gone.

================================
also posted as:
  https://savannah.gnu.org/forum/forum.php?forum_id=7885
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (Darwin)

iQJ8BAEBCgBmBQJTAuA9XxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w
ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXQxNTVEM0ZDNTAwQzgzNDQ4NkQxRUVBNjc3
RkQ5RkNDQjAwMEJFRUVFAAoJEH/Z/MsAC+7uUXkP/jdZNd2wyNVJ3UARzmov8gYf
3BygFa50G5sixCKcvsjOIsFJvJ3oUGuR8enVzpf1ybATsmyUCdA/EYDgQ16xUToa
Aymd+MsoFqFi7yQBpiHuE2WkSUP3UcA1i9c+o0KPWc7gUu63TPhyl34b2Yuset9i
sUjruwS0trTJw9Q36uYbZMa4ekibgNEnvHza53qOlKf6aP8qGBCY7+rISdi79SP9
Rn3dLQl1tcqhjWgg+b6KaWbDbbrRd2nv8zLo5E1qq1VEjRWiwOiTZomR1bm1k5K9
5gDaeXqvErX5owtlDSCYlPq0BvvopLvxmPciN0xsM2QXu3mQSOJdOaaJXfJBTOn9
fF+j01y08EaFn7i9mcTaqW82Jl25310dH8hgmYQ+RISLn8f5rhR1Rgl0AZPuGX6A
fsaDeNivxqyiV4i84ewz/OtmydQHoF1nEdJjHpIXzCVU29shX02sbiFuWeqPRT39
45zrRJG4xjIG1mxk81WztpA/3UgUvxd+q20Knvjh2V9z5f1HzndJ8vCnQPe519tQ
6d3LHct7jdJ89S7XgdVuqGD2oM2ilZmi4p8MDSoD1OHeaqGeVcQfVq3rmYoLM4wD
eGO+WRCa8h+0UOjKsHYmwfeL1Jnrs8/JN1b9EZIVI1LQluNjH0YtoH4vyOh4dGmg
MO6PjrcSAEsDtRtX0G0Z
=9svb
-----END PGP SIGNATURE-----

_______________________________________________
GNU Announcement mailing list <info-gnu@gnu.org>
https://lists.gnu.org/mailman/listinfo/info-gnu




grep-2.17 released

Posted Feb 18, 2014 15:23 UTC (Tue) by njwhite (guest, #51848) [Link]

The interesting speedup commit is here:

http://git.savannah.gnu.org/gitweb/?p=grep.git;a=commit;h...

grep-2.17 released

Posted Feb 18, 2014 15:47 UTC (Tue) by gb (subscriber, #58328) [Link]

Oh well, yes, I finally got it. I was always wondering why in all hell 'grep -i' is 30 times slower than 'grep'. The UTF-8 locale is the reason...

grep-2.17 released

Posted Feb 18, 2014 21:41 UTC (Tue) by pixelpapst (subscriber, #55301) [Link]

Wow, I had always assumed that's what grep -i did internally. Happy to see them go down that route now, and hoping for smarter handling of the \ and [ cases too some time in the not-too-distant future.
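
For illustration only, and as a sketch of the general idea rather than the commit's actual code: a case-insensitive search for a plain pattern can be written as a case-sensitive search where each letter becomes a bracket expression listing both of its cases. The sample lines below are made up.

```shell
# Hypothetical sample input; any text would do.
sample=$(printf 'Foo\nbar\nFOO\nfood\n')

# Case-insensitive search done the usual way.
with_i=$(printf '%s\n' "$sample" | grep -i 'foo')

# The same search with every letter spelled out as a bracket expression,
# so that -i is not needed.
with_brackets=$(printf '%s\n' "$sample" | grep '[fF][oO][oO]')

printf '%s\n' "$with_brackets"
```

Both commands select the same lines here. Per the NEWS entry, the faster path applies only to patterns that contain neither \ nor [.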

grep-2.17 released

Posted Feb 18, 2014 16:11 UTC (Tue) by stressinduktion (subscriber, #46452) [Link]

I wonder when gnu grep will become multithreaded like e.g. git-grep already is.

grep-2.17 released

Posted Feb 18, 2014 17:32 UTC (Tue) by zlynx (subscriber, #2285) [Link]

Whenever somebody writes a patch for it.

I don't really see it being useful for scanning single files, but it could be great for recursive scans or wildcards.

You can get the same effect by using xargs with -n and -P. For example:
find -name '*.c' | xargs -n1 -P4 grep some_function

That would launch a grep for every file with a limit of 4 at a time.
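
A slightly more defensive variant of the same idea, as a sketch: the scratch directory and file names are hypothetical, and it assumes a find and xargs that support -print0, -0 and -P (GNU and BSD versions do).

```shell
# Scratch tree with hypothetical file names, one of them containing a space.
dir=$(mktemp -d)
printf 'int some_function(void);\n' > "$dir/a.c"
printf 'void other(void);\n' > "$dir/b.c"
printf 'some_function();\n' > "$dir/my file.c"

# -print0 / -0 pass names NUL-separated, so spaces in names are safe.
# -H makes grep print the file name even though each run sees one file.
# xargs exits with status 123 when any grep finds nothing, which is
# expected here, so that status is ignored.
# sort gives the output of the parallel greps a stable order.
matches=$(find "$dir" -name '*.c' -print0 |
  { xargs -0 -n1 -P4 grep -H some_function || true; } | sort)

printf '%s\n' "$matches"
rm -rf "$dir"
```

Without -H, a grep that is handed a single file prints only the matching line, so the output would not say which file it came from.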

grep-2.17 released

Posted Feb 18, 2014 18:11 UTC (Tue) by cebewee (guest, #94775) [Link]

Doesn't that give you a rather unpredictable order of the output, perhaps even mixing lines? I tend to always use git grep instead of grep almost everywhere; with the "--no-index" option it is perfectly usable outside of git repositories (and I get colors in less, if there is something to scroll) and so on.

grep-2.17 released

Posted Feb 18, 2014 20:01 UTC (Tue) by jtaylor (subscriber, #91739) [Link]

You can use GNU parallel instead of xargs to get the same ordering as a serial program:
http://www.gnu.org/software/parallel/

grep-2.17 released

Posted Feb 18, 2014 23:25 UTC (Tue) by stressinduktion (subscriber, #46452) [Link]

If you use multiple threads to traverse the directory tree in parallel with -R, yes, the order in which the files are traversed is unpredictable.

stdio primitives are protected against concurrent output to a FILE*. You have to use the unlocked_stdio primitives to get lines mixed up.

grep-2.17 released

Posted Feb 18, 2014 17:09 UTC (Tue) by woye (guest, #95561) [Link]

This is actually really embarrassing.

And the code in this change seems to be crap.

For example, I see no reason to not support "\" and "[" in regexps except the utter laziness of whoever was unfortunately put in charge of this software.

Also, it's not clear why the regexp engine is not doing this by default in the first place, requiring instead a hack that rewrites the regexp.

WTF?

How is it possible that such a crucial tool is maintained by incompetents?

grep-2.17 released

Posted Feb 18, 2014 17:19 UTC (Tue) by cortana (subscriber, #24596) [Link]

I assume your patches are forthcoming?

grep-2.17 released

Posted Feb 18, 2014 17:25 UTC (Tue) by juliank (subscriber, #45896) [Link]

You might not have noticed it, but the patch is from fb.com. They probably have better things to do than writing proper fixes.

grep-2.17 released

Posted Feb 18, 2014 20:33 UTC (Tue) by SEJeff (subscriber, #51588) [Link]

It was written by Jim Meyering. Do yourself a favor: google him and see what other stuff he has written. Or just look on ohloh:
http://www.ohloh.net/accounts/meyering

If you can do a better job, why not send patches yourself?

grep-2.17 released

Posted Feb 18, 2014 21:36 UTC (Tue) by juliank (subscriber, #45896) [Link]

Should I have put quotes around "proper"? Don't take this too seriously.

There's of course still the question why it is done this way, but I assume he had a good reason to do it the way he did.

grep-2.17 released

Posted Feb 18, 2014 17:27 UTC (Tue) by mpr22 (subscriber, #60784) [Link]

A free software project is maintained by the people who are willing to contribute their labour to that project. If you don't like the standard of the work they're doing, you have several options, of which "contribute your own labour" and "make it worth someone's while to contribute theirs" are generally accepted as being the most effective; "insult the developers in an LWN comment", however, is not numbered among the highly effective choices.

grep-2.17 released

Posted Feb 18, 2014 18:18 UTC (Tue) by shmerl (guest, #65921) [Link]

For regex grep I use pcregrep anyway - it's way more flexible.

grep-2.17 released

Posted Feb 23, 2014 2:25 UTC (Sun) by k8to (subscriber, #15413) [Link]

And the RE syntax is one we all know by now. Seconding.

grep-2.17 released

Posted Feb 25, 2014 12:20 UTC (Tue) by nix (subscriber, #2304) [Link]

Or you could use GNU grep, which gives you PCRE grep syntax *plus* others (look for grep -P).

Seriously, GNU grep is probably used thousands of times more often than pcregrep. Major speedups like this are definitely worth it.

grep-2.17 released

Posted Feb 27, 2014 5:03 UTC (Thu) by k8to (subscriber, #15413) [Link]

I do use -P, but I use grep seldom enough that I sometimes forget it is there. pcregrep makes a superior mnemonic.

That doesn't invalidate your point, of course.

grep-2.17 released

Posted Feb 18, 2014 21:50 UTC (Tue) by mvar (guest, #82051) [Link]

Please feel free to submit your superior code

grep-2.17 released

Posted Feb 18, 2014 21:53 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

It doesn't look too awful to me. Not the best, but certainly not the worst either. The worst parts that stand out to me are:

- everything is in main.c (though not something this branch should address)
- only one test case added

What about tests for things like ß -> SS and vice versa? That seems to need something like (ß|SS) rather than [ßSS].
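
The regex point can be sketched as follows, assuming a grep that supports -E and -x. This only illustrates bracket expressions versus alternation; it does not exercise grep's case folding.

```shell
# A bracket expression matches exactly one character, so writing "SS"
# inside it only lists "S" twice. -x requires the whole line to match.
bracket=$(printf 'S\nSS\n' | grep -xE '[ßSS]')

# An alternation can offer the two-character sequence "SS" as a unit.
alternation=$(printf 'S\nSS\n' | grep -xE '(ß|SS)')

printf 'bracket: %s\nalternation: %s\n' "$bracket" "$alternation"
```

The bracket form accepts the line "S" but not "SS", while the alternation accepts "SS" but not "S".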

grep-2.17 released

Posted Feb 19, 2014 9:55 UTC (Wed) by xtifr (subscriber, #143) [Link]

I think that "only one test case added" has to do with the fact that this is simply changing how an existing feature is implemented, rather than adding any new feature. In fact, one test almost seems excessive, unless their test coverage was previously lacking.

Interesting question about ß, though. That might be worth inquiring about further. (But again, such a test should really have been the subject of test coverage already, *despite* the fact that any such test would have succeeded with the old approach, but might, as you suggest, fail now.)

grep-2.17 released

Posted Feb 19, 2014 13:53 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

I looked at the test cases which mention "case" and this is indeed not tested. In fact, there are only 7 or so case-related tests altogether, which seems sparse for something with so many corner cases. Though if they leverage libicu, its test cases should help. I'll test once I'm in front of a terminal.

grep-2.17 released

Posted Feb 19, 2014 18:39 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

So after making a test, it fails on both sides of the commit in question, so it isn't a regression at least :) . I'll forward the test case to upstream; maybe there's something wrong with it, but it wouldn't hurt to have the bug fixed anyways. Further testing shows that 'ẞ' is matched, so it seems that alternate upper case variants aren't handled by the old or new code.

grep-2.17 released

Posted Feb 19, 2014 22:48 UTC (Wed) by pbonzini (subscriber, #60935) [Link]

Indeed, grep is not using Unicode-compliant regex.

(Source: I added those few caseless-matching testcases and wrote the slow version of the code; before my changes, it was slow _and_ failed all those cases...).

grep-2.17 released

Posted Feb 19, 2014 23:46 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

I wonder if ugrep would be an acceptable addition to the grep family.

grep-2.17 released

Posted Feb 20, 2014 1:05 UTC (Thu) by hummassa (subscriber, #307) [Link]

IIRC pcregrep is unicode-aware.

grep-2.17 released

Posted Feb 19, 2014 22:38 UTC (Wed) by wahern (subscriber, #37304) [Link]

I recently wrote a tool which transformed PCRE expressions to a Ragel-able DFA form. To support things like boundary assertions I had to do some serious transformations in order to get anchor points. Generating alternations gets complex when you already have alternations in the form of classes, let alone alternations of alternations.

As a first pass proof of concept I see no reason why the current code isn't sufficient, and no reason not to ship it. Transforming more complex expressions can require significantly more code, and in a mature project like grep it makes sense to stage this stuff over a longish period of time.

grep-2.17 released

Posted Feb 22, 2014 5:54 UTC (Sat) by sitaram (guest, #5959) [Link]

wonderful!

Hopefully this will eventually be as good as "ack" and "ag" then, and I will then have to revise my comments in https://github.com/sitaramc/ew#comparison-with-ag-and-ack


Copyright © 2014, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds