Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!rutgers!modus!gear!am!alex
From: alex@am.sublink.org (Alex Martelli)
Newsgroups: comp.unix.questions
Subject: Re: grep
Keywords: grep, recursive
Message-ID: <1991Apr16.225408.649@am.sublink.org>
Date: 16 Apr 91 22:54:08 GMT
References: <1991Apr14.214414.9815@hellgate.utah.edu> <1991Apr15.042100.11727@aplcen.apl.jhu.edu>
Organization: Premiata Famiglia Martelli & Figli
Lines: 89

akbloom@aplcen.apl.jhu.edu (Keith Bloom) writes:
:mmoore%hellgate.utah.edu@cs.utah.edu (Michael Moore) writes:
:>	Does anyone know if there is an easy way to recursively search for a
:>pattern down the entire file tree of a directory?
:If your system has xargs, you could try:
:find . -name '*' -print | xargs grep pattern
:If you have a huge directory tree with thousands of files in it, this
:may not work.

Why not, pray?  xargs is supposed to chop its stdin into pieces that
are short enough that they can be passed as arguments to the target
command, here 'grep pattern'.

A slight improvement: use "grep pattern /dev/null" as the target
command; by making grep look into more than one file, it will print
the name of the file where the pattern is found (in the original, if
grep happened to be called with just one file, for example at the very
end of the search process, it might find lines and print them out
without identifying where they came from).

A second improvement: omit the -name "*"; all it's doing is not making
grep look into files whose names start with a dot; and why wouldn't
you want to grep inside .netrc, for example?

A third improvement: add a -type f flag; avoid grepping into
directories by mistake, and particularly avoid grepping into
device-files - grepping into /dev/tty, for example, can hang the
procedure until EOF is forced on the terminal...
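
Putting the three improvements together might look something like the
sketch below.  The function name rgrep and its optional directory
argument are my own illustration, not anything from the quoted
article, and it assumes file names free of blanks and newlines, since
plain xargs splits its input on them.

```shell
# rgrep: hypothetical helper name, used here for illustration only.
# Usage: rgrep PATTERN [DIR]    (DIR defaults to the current directory)
rgrep() {
    pattern=$1
    dir=${2:-.}
    # No -name '*' test, so files whose names start with a dot are included.
    # -type f keeps directories and device files out of grep's way.
    # /dev/null as an extra operand makes grep name the file on every
    # match, even when xargs hands it a single file.
    find "$dir" -type f -print | xargs grep -e "$pattern" /dev/null
}
```

Called as "rgrep pattern ." this should behave like the one-line
pipeline, just with the pattern and starting directory as parameters.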

There are many other things one might wish to do (for example, only
grep into files which are readable by you), but find does not support
them easily.  Unfortunately, some greps will just fail if ONE of their
target files is unreadable - and not even bother looking into the
other ones!

The best fix for this specific problem is probably to also attack
another desideratum - NOT grepping into non-text files.  The "file"
command, on many systems, will emit a description containing the
keyword "text" for a text file (in variations such as "English text",
"ascii text", etc), but not for non-text files (it will say "data", or
describe the type of executable, etc), and for non-readable files it
will say something like "cannot open for reading" [if you're unlucky
enough that your "file" command says, for example, "sh commands"
instead of "sh command text" for a shell script, you will have to get
a little more fancy in the following, but the basic idea still
applies].

So, we want to xargs the files emitted by find, first into file, then
remove all non-text ones, and finally grep on the remainder only; we
can both select for "text", and remove the descriptions, at one gulp
with, for example, sed.

	find . -type f -print | xargs file |
		sed -n '/:.*text/s/:.*//p' | xargs grep pattern /dev/null

This is still NOT perfect - filenames containing newlines will
typically give problems with any find ... -print | xargs (one should
use find ... -print0 and matching xargs -0, if lucky enough to have
them, for example GNU versions of find and xargs), and here the
further trip through file and sed will further mess things up if the
filename contains a colon (and is a text file, or has the string
"text" in the filename after the colon); one COULD get fancier, with a
sed expression to exclude lines with two or more colon characters, but
it's getting a bit late at night for me to figure out how to handle a
filename such as "joke: ascii text\nfooled you!" even with the
-print0 and -0...
there is a point of diminishing return where perl gets simpler than
this sort of thing...:-).

:If you don't have xargs, there's:
:
:find . -name '*' -print -exec grep pattern {} \;
:
:but this is more cumbersome, because it will print the names of all
:your files, whether they contain the pattern or not.  (I assume you
:want to know the name of the file that 'pattern' is in.)

You can omit the -print and just have /dev/null as an argument to grep
just after the pattern, as I suggested above.  It's still "more
cumbersome" in the sense of overloading your CPU, since a fork and
exec is done for each file, rather than processing them en masse via
xargs... still, my suggestions about removing the '-name *' and
inserting a '-type f' would also apply here.
-- 
Alex Martelli - (home snailmail:) v. Barontini 27, 40138 Bologna, ITALIA
Email: (work:) martelli@cadlab.sublink.org, (home:) alex@am.sublink.org
Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434;
Fax: ++39 (51) 366964 (work only), Fidonet: 332/401.3 (home only).