Path: utzoo!utgpu!cs.utexas.edu!sdd.hp.com!elroy.jpl.nasa.gov!lll-winken!uwm.edu!psuvax1!rutgers!cmcl2!kramden.acf.nyu.edu!brnstnd
From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: alt.sources.d
Subject: Re: "Simple" but non-portable == useless.
Message-ID: <24649:Jan3022:31:0191@kramden.acf.nyu.edu>
Date: 30 Jan 91 22:31:01 GMT
References: <11306:Jan2817:58:5291@kramden.acf.nyu.edu> <27A5C105.1019@tct.uucp> <1991Jan30.055113.26485@convex.com>
Organization: IR
Lines: 67

Tom and Chip now have packed 7-line and 10-line Perl scripts to achieve
the effect of a half-line shell script. This comes after Tom's 36-line
Perl script to accomplish the same thing as a 20-token, 100-character
shell script. (At the end of this article I develop, step by step, an
even simpler pipeline for the repeats-in-path problem.)

Folks, the tools available from the shell are simply more powerful than
Perl's function calls. They don't cover every situation, but when they
do the job cleanly, they provide much more compact solutions than Perl.

In article <1991Jan30.055113.26485@convex.com> tchrist@convex.COM (Tom Christiansen) writes:
> It'll still run much faster than the shell
> script without that. Try it -- the differnces are dramatic.

This is an outright lie. ls takes by far the bulk of the time in the
shell scripts I've posted; ls's file tree walk code has been heavily
optimized over the years; perl doesn't run faster than ls. I suspect Tom
is comparing ls *with* -Fl to his script *without* stat or -x, and
that's cheating.

> But I like the $files{$file,$dev,$ino} part (as it read originally)
> because I didn't want to count the same real file twice.

This, Chip, is what you missed. Tom's original 36-line Perl script does
this; my 20-token shell script does it too.

> because of the munge-until-done
> strategy,

A shell script is not munged. A shell script grows.

Take a sample problem: Find all names that correspond to two different
executables in $path (csh).
To build a shell script for this, imagine the data flowing from one
filter to another. You need to look at all executables in $path? Fine,
start with full information about every file in $path:

  ls -ilF $path

(add -L to follow symbolic links)

Now extract the executables. ls marks them with a trailing *:

  ls -ilF $path | sed -n 's/\*$//p'

Now eliminate repeated lines that talk about the same file. One utility
both brings lines together and eliminates duplicates:

  ls -ilF $path | sed -n 's/\*$//p' | sort -u

Now strip the extra information, leaving just the filenames of all
different executables in $path:

  ls -ilF $path | sed -n 's/\*$//p' | sort -u | sed 's/.* //'

Finally, sort into order and extract the repetitions:

  ls -ilF $path | sed -n 's/\*$//p' | sort -u | sed 's/.* //' | sort | uniq -d

Compare this simple script to Tom's original 36-line monster. Was this
really so hard to develop? Is ``munge until done'' really an accurate
description of data-flow programming? Is this script such a maintenance
nightmare? (To port it to sh, for instance, you replace $path with the
standard invocation `echo $PATH | tr : '\012'`. Is that so painful?)

This is just one of many strategies for attacking the problem. You build
shell scripts naturally out of the data available and out of what you
can transform that data into. I daresay there is no such simple strategy
in Perl.

---Dan
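
For reference, the sh port described above can be sketched as a small
function. The name dup_names and its optional argument are illustrative
assumptions, not part of the original pipeline. The sketch adds -L (the
symbolic-link option mentioned earlier) and assumes that no directory
or file name contains whitespace and that ls -F marks executables with
a trailing *.

```shell
# Sketch only: dup_names is a hypothetical wrapper around the pipeline.
# $1 is a colon-separated directory list; it defaults to $PATH.
dup_names() {
    # tr turns the colons into newlines so each directory becomes one
    # argument to ls; 2>/dev/null hides complaints about missing entries.
    ls -ilLF `echo "${1:-$PATH}" | tr : '\012'` 2>/dev/null |
        sed -n 's/\*$//p' |   # keep executables, drop the trailing *
        sort -u |             # identical lines mean the same real file
        sed 's/.* //' |       # strip everything but the file name
        sort |
        uniq -d               # names that still appear more than once
}
```

Calling dup_names with no argument checks the current $PATH; passing
something like /bin:/usr/bin restricts the check to those directories.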