[Novalug] bash & grep question - best for optimizing?
Ross Patterson
RossPatterson at Comcast.Net
Tue Nov 14 18:12:54 EST 2006
At 17:42 11/14/2006, James Ewing Cottrell 3rd wrote:
>One could argue, and I will, that hacking -r into grep is an
>abomination. The tool for recursing is FIND.
For general-purpose tools, I agree. Of course, in that case, "ls" is
even more of an abomination than anyone who reads the man page
thinks. For the Only Correct Answer to the "show me my files"
question is, naturally, "find . -print".
For special-purpose tools, especially those that are intended (as
grep was) to process large volumes of "stuff", performance rules the
roost, and you do whatever you need to get it. Back in the day when
everything was a uniprocessor system, "grep -r" was faster than "find
| grep". Today, when many modern systems are at least hyperthreaded,
there might be a genuine performance benefit to splitting the task
across two processes. Then again, you said "find | xargs grep", not
"find | grep", because grep lacks a "take the list of files from
stdin" mode, and that means that you'll have potentially thousands of
processes (1 find, 1 xargs, many greps), so maybe it won't
perform. As always, Performance Analyst's Answer Number Three
applies: "It depends."
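
For anyone who wants to try the comparison, here is a minimal sketch of
the forms under discussion, run against a throwaway tree. The directory
layout, file names, and the pattern "needle" are made up for
illustration. Note that xargs packs as many file names as fit into each
grep command line, so the number of greps depends on the argument-length
limit and the number of files, not simply on the file count.

```shell
#!/bin/sh
# Hypothetical scratch tree so the commands have something to search.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf 'needle\n' > "$demo/a.txt"
printf 'hay\n'    > "$demo/sub/b.txt"
printf 'needle\n' > "$demo/sub/c.txt"

# One process: grep walks the tree itself (-r is an extension, not POSIX).
by_grep=$(grep -rl needle "$demo" | sort)

# find lists the names; xargs batches them into as few grep
# invocations as the argument-length limit allows.
by_xargs=$(find "$demo" -type f | xargs grep -l needle | sort)

# Same batching without xargs: find runs grep on groups of files.
by_exec=$(find "$demo" -type f -exec grep -l needle {} + | sort)

printf '%s\n' "$by_grep"
```

All three should list the same files. To see which is faster on a real
tree, prefix each pipeline with "time" and run it more than once, since
the first run mostly measures the disk cache.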
>Hacking every tool's features into every other tool violates one
>of the ancient UNIX Fundamentals that "A Tool does one thing and
>does it well".
Please tell me that you use the One True and Holy Editor: ed. It
does one thing and one thing only, and you don't need to be
polydactyl to use it.
>but a better way is to do something like:
>
> find . -type f | xargs grep -l pat |
> while read file
> do process "$file"; done
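
As written, the quoted pipeline splits file names on whitespace, so a
name like "with space.txt" reaches grep as two arguments. A sketch of a
variant that survives spaces, assuming a find and xargs that support
-print0 and -0 (GNU and BSD both do, but neither option is POSIX). The
"process" function and the file names are hypothetical stand-ins. Names
containing newlines would still break the read loop.

```shell
#!/bin/sh
# Hypothetical stand-in for the quoted "process" command.
process() { printf 'processing: %s\n' "$1"; }

# Throwaway files, one with a space in its name.
demo=$(mktemp -d)
printf 'pat\n' > "$demo/plain.txt"
printf 'pat\n' > "$demo/with space.txt"
printf 'no\n'  > "$demo/other.txt"

out=$(
  # NUL-separated names pass through xargs intact.
  find "$demo" -type f -print0 |
  xargs -0 grep -l pat |
  # IFS= and -r keep leading blanks and backslashes in each name.
  while IFS= read -r file
  do process "$file"; done | sort
)
printf '%s\n' "$out"
```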
Hey, while we're at it, what's with all this post-sh
stuff? "while"? "do"? Sheesh. Any problem that needs to be solved
should be solved using pre-existing filters in a pipeline without new code.
:-)
Ross