[Novalug] bash & grep question - best for optimizing?
Richard Rognlie
rrognlie at gamerz.net
Mon Nov 13 09:29:25 EST 2006
On Mon, Nov 13, 2006 at 09:08:40AM -0500, Nick Danger wrote:
>
> I have a large volume of files.(*) I would like to run a grep through
> them and then act on the files that match. Easy enough to do. The
> question I have is, which way is best?
>
> 1. This is a two layer deep hashed structure, and I have 4 patterns I
> want to match. I can either do a "grep -rl" at the top level, or cd into
> each hash (down 2 layers) and do a "grep " in that directory.
>
> 2. Should I do one grep for each pattern, or a single grep with multiple
> matches?
>
> There are anywhere from 200,000 to 250,000 files in there, so it's not
> exactly a speedy process, and any few minutes I can eke out of my shell
> script, I'd like to :-)
If any given directory is going to be large, I'd highly recommend you do
egrep -r
For portability, since not all versions of grep grok the -r flag (yes, I know
I'm dating myself, but...), I'd even go so far as to change the command to
find DIR -type f -print0 | xargs -0 egrep WHATEVER /dev/null
The /dev/null at the end of the egrep is important. Otherwise you run the
risk (however small) that xargs might choose to run egrep on a single file,
and you'd not see which file matched (if it did). /dev/null forces the command
to have at least 2 filenames, which in turn forces egrep to print the
name of the file that matched.
That is about as portable as it gets, provided your find and xargs
understand -print0 and -0 (the GNU and BSD versions do).
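
To see the /dev/null effect for yourself, here is a minimal sketch. The
temporary directory and the file name "only file" are made up for the demo,
and it uses "grep -E", which is the same thing as egrep:

```shell
#!/bin/sh
# Hypothetical demo tree; directory and file names are invented.
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
printf 'needle here\n' > "$demo/a/b/only file"

# One file argument: grep prints the matching line with no filename.
without=$(grep -E 'needle' "$demo/a/b/only file")

# Same file plus /dev/null: two arguments, so grep prefixes the filename.
with=$(find "$demo" -type f -print0 | xargs -0 grep -E 'needle' /dev/null)

printf 'without /dev/null: %s\n' "$without"
printf 'with /dev/null:    %s\n' "$with"

rm -rf "$demo"
```

The second line of output carries the path in front of the match; the first
does not.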
On the "1 grep or 2" front, remember: disk is slow, CPU is fast.
Every separate grep has to read all the files again, so it's almost always
faster to do one more complex egrep than it is to do two separate greps.
egrep '(pattern1|pattern2)'
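
Putting the two ideas together, a rough sketch of "one pass, then act on
the files that match" might look like the following. The demo tree, the
patterns alpha and beta, and the "would process" step are all placeholders,
and the loop assumes no filename contains a newline:

```shell
#!/bin/sh
# Hypothetical two-level tree standing in for the real hashed structure.
demo=$(mktemp -d)
mkdir -p "$demo/0/0" "$demo/0/1"
printf 'alpha\n' > "$demo/0/0/one"
printf 'beta\n'  > "$demo/0/1/two"
printf 'gamma\n' > "$demo/0/1/three"

# Single pass: every file is read once and checked against both patterns.
# -l prints only the names of matching files.
matches=$(find "$demo" -type f -print0 |
    xargs -0 grep -El '(alpha|beta)' /dev/null | sort)

# Act on each matching file; replace the printf with the real work.
count=0
while IFS= read -r f; do
    [ -n "$f" ] || continue        # skip the blank line when nothing matched
    count=$((count + 1))
    printf 'would process: %s\n' "$f"
done <<EOF
$matches
EOF

printf 'matched %d files\n' "$count"
rm -rf "$demo"
```

With four patterns, the alternation simply grows to
'(p1|p2|p3|p4)'; the files are still read only once.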
--
/ \__ | Richard Rognlie / Sendmail Ninja / Gamerz.NET Lackey
\__/ \ | http://www.gamerz.net/~rrognlie <rrognlie at gamerz.net>
/ \__/ | Creator of pbmserv at gamerz.net
\__/ | Helping reduce world productivity since 1994