Shell Scripts

Posted May 5, 2010 13:54 UTC (Wed) by paulj (subscriber, #341)
In reply to: Shell Scripts by nescafe
Parent article: Poettering: Rethinking PID 1

It's a real shame people don't know how to use AWK properly. It's a fairly capable little language. One of the common abuses is piping grep to AWK, when AWK applies regexes itself to every line[1]. Basically, if we can assume the input tends not to be huge, or that most of the input will be acted on, then whenever you see:

grep XYZ | awk ... '{ ... }'

You'd be much better off with:

awk ... '/XYZ/ { ... }'
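
As a minimal sketch of that equivalence (the sample input and the variable names are made up for illustration), both forms below print the second field of every line containing XYZ:

```shell
# Hypothetical sample input: three lines, two of which contain XYZ.
input='a XYZ 1
b nothing 2
XYZ c 3'

# Two processes: grep filters, awk prints the second field.
with_grep=$(printf '%s\n' "$input" | grep XYZ | awk '{ print $2 }')

# One process: awk's own pattern does the filtering.
awk_only=$(printf '%s\n' "$input" | awk '/XYZ/ { print $2 }')

printf '%s\n' "$with_grep"
printf '%s\n' "$awk_only"
```

Both commands select the same lines, so the two printed results should be identical.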

E.g. your shell example could be done with:

find /path -name someglob | xargs awk '/^bar/ { print $2 }'

or using GNU find's built-in xargs-ish feature (when was that added?):

find /path -name someglob -exec awk '/^bar/ { print $2 }' {} +
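
A self-contained sketch of the `-exec ... {} +` form, run against a throwaway directory. The file names and contents are invented for the example, and one name deliberately contains a space:

```shell
# Throwaway directory with made-up sample files (one name contains a space).
dir=$(mktemp -d)
printf 'bar one\nfoo two\n' > "$dir/a.conf"
printf 'bar three\n' > "$dir/b c.conf"

# {} + hands awk as many file names as fit on one command line;
# find passes each name as its own argument, so the space is harmless.
result=$(find "$dir" -name '*.conf' -exec awk '/^bar/ { print $2 }' {} + | sort)
printf '%s\n' "$result"

rm -rf "$dir"
```

The output is sorted only because find does not guarantee the order in which it visits files.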

This is meant more for the peanut gallery than for you ;) - I was expecting there'd be a rush to offer more optimal one-liners, strangely there hasn't been. ;)

1. Though, as Padraig Brady has shown me, beyond a certain size of file, there is a benefit to using grep to pre-filter input if you're discarding a sufficient amount of that input, as grep is much faster at processing each line than AWK.



Shell Scripts

Posted May 5, 2010 14:05 UTC (Wed) by johill (subscriber, #25196) [Link]

sed is faster than grep even for just grepping, at least last I checked it was.

sed 's/foo/\0/;t;d'
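
For comparison, a sketch of the more commonly seen way to get grep-like behaviour from sed, using `-n` with the `p` command. It only checks that the two tools select the same lines and makes no claim about relative speed; the sample input is made up:

```shell
# Hypothetical sample input: two of the three lines contain foo.
input='foo 1
bar 2
a foo b'

# -n suppresses the default output; /foo/p prints only matching lines.
via_sed=$(printf '%s\n' "$input" | sed -n '/foo/p')
via_grep=$(printf '%s\n' "$input" | grep foo)

printf '%s\n' "$via_sed"
```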

Shell Scripts

Posted May 5, 2010 16:30 UTC (Wed) by martinfick (subscriber, #4455) [Link]

The beauty of running grep (or sed if you are so inclined) separately on large data sets is the inherent parallelism possible due to using unix pipes. This is a feature often overlooked by modern programming techniques: the creators of unix made an elegant, simple parallelism mechanism (much less error/deadlock prone than most others) long ago! With two cores, each one of those pipe commands can easily run in parallel.
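
A small sketch of such a pipeline: each stage below is a separate process, so on a multi-core machine the kernel is free to run them at the same time. The output of seq is just an arbitrary stand-in for a large data set, and the variable names are made up:

```shell
# Three processes connected by pipes: seq produces, grep filters, awk counts.
piped=$(seq 1 100000 | grep 7 | awk 'END { print NR }')

# Same count with awk doing the filtering itself (two processes).
awk_only=$(seq 1 100000 | awk '/7/ { n++ } END { print n }')

printf '%s %s\n' "$piped" "$awk_only"
```

Both counts should agree; which variant is faster depends on the input size and on how much of it is discarded.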

Shell Scripts

Posted May 5, 2010 14:46 UTC (Wed) by k8to (subscriber, #15413) [Link]

The exec switch was added around the SysV timeframe. Every find implementation has it.

Shell Scripts

Posted May 5, 2010 15:07 UTC (Wed) by paulj (subscriber, #341) [Link]

Look carefully: the exec has a + at the end instead of \;. I've since noticed the man page says it's a POSIX-specified feature and was added in 4.2.12. It seems FreeBSD has had the feature since at least FreeBSD 5.0 (judging by when it appears in the man pages).

Shell Scripts

Posted May 5, 2010 16:28 UTC (Wed) by fredi@lwn (guest, #65912) [Link]

Indeed, awk is sometimes better than the combination:

find | grep | xargs cut ...

or similar. Though the -exec in your last example is, from what I recall, slower than:

find /foo -name "$GLOB" -print0 | xargs -0 SOMECOMMAND

That is because with -exec you start another process for each entry found, while xargs passes all entries to the same process if they fit within the maximum command-line length. Hope I got the idea across, and sorry for my bad English.

Shell Scripts

Posted May 5, 2010 18:05 UTC (Wed) by paulj (subscriber, #341) [Link]

Yes, you're right about the standard 'find ... -exec ... {} \;'. That's why I said "xargs-ish" and used the (apparently) little-known 'find ... -exec ... {} +' form of the command. Note the + there carefully; I only discovered it today myself.
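
A rough way to see the difference, assuming a find and xargs that support `-print0` and `-0` (the GNU and BSD ones do): the inner `sh -c 'echo run'` prints one line per invocation, so counting lines counts the processes started. The directory and file names are made up:

```shell
# Throwaway directory with three made-up files (one name contains a space).
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c d"

# Each invocation of the inner sh prints one line, so the line count
# is the number of processes that find (or xargs) started.
per_file=$(find "$dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')
batched=$(find "$dir" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')
via_xargs=$(find "$dir" -type f -print0 | xargs -0 sh -c 'echo run' sh | wc -l | tr -d ' ')

printf '%s %s %s\n' "$per_file" "$batched" "$via_xargs"

rm -rf "$dir"
```

With only three files everything fits on one command line, so the `\;` form starts three processes while the `+` form and xargs each start one.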

Shell Scripts

Posted May 6, 2010 15:48 UTC (Thu) by fredi@lwn (guest, #65912) [Link]

I didn't know this one, really useful! Thanks for the hint!

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds