Path: utzoo!utgpu!cs.utexas.edu!sdd.hp.com!elroy.jpl.nasa.gov!lll-winken!uwm.edu!psuvax1!rutgers!cmcl2!kramden.acf.nyu.edu!brnstnd
From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: alt.sources.d
Subject: Re: "Simple" but non-portable == useless.
Message-ID: <24649:Jan3022:31:0191@kramden.acf.nyu.edu>
Date: 30 Jan 91 22:31:01 GMT
References: <11306:Jan2817:58:5291@kramden.acf.nyu.edu> <27A5C105.1019@tct.uucp> <1991Jan30.055113.26485@convex.com>
Organization: IR
Lines: 67

Tom and Chip now have packed 7-line and 10-line Perl scripts to achieve
the effect of a half-line shell script. This comes after Tom's 36-line
Perl script to accomplish the same thing as a 20-token, 100-character
shell script. (At the end of this article I develop, step by step, an
even simpler pipeline for the repeats-in-path problem.)

Folks, the tools available from the shell are simply more powerful than
Perl's function calls. They don't cover every situation, but when they
do the job cleanly, they provide much more compact solutions than Perl.

In article <1991Jan30.055113.26485@convex.com> tchrist@convex.COM (Tom Christiansen) writes:
> It'll still run much faster than the shell
> script without that.  Try it -- the differences are dramatic.

This is an outright lie. ls takes by far the bulk of the time in the
shell scripts I've posted; ls's file tree walk code has been heavily
optimized over the years; perl doesn't run faster than ls. I suspect Tom
is comparing ls *with* -Fl to his script *without* stat or -x, and
that's cheating.

> But I like the $files{$file,$dev,$ino} part (as it read originally)
> because I didn't want to count the same real file twice.

This, Chip, is what you missed. Tom's original 36-line Perl script does
this; my 20-token shell script does it.

> because of the munge-until-done
> strategy,

A shell script is not munged. A shell script grows. Take a sample
problem: Find all names that correspond to two different executables in
$path (csh). To build a shell script for this, imagine the data flowing
from one filter to another. You need to look at all executables in
$path? Fine, start with full information about every file in $path:

  ls -ilF $path   (add -L to follow symbolic links)

Now extract the executables. ls marks them with a trailing *:

  ls -ilF $path | sed -n 's/\*$//p'
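
To watch this filter work without touching a real directory, here is a
minimal sketch on a made-up listing. The function name only_executables
and every line of the sample listing are hypothetical; only the sed
command comes from the pipeline above.

```shell
# Keep only lines ending in '*' (the mark ls -F puts on executables)
# and strip that mark. only_executables is a hypothetical name.
only_executables() {
  sed -n 's/\*$//p'
}

# Made-up listing: a directory header, a total line, one executable,
# one plain file, one subdirectory. Only the executable's line survives.
printf '%s\n' \
  '/usr/local/bin:' \
  'total 3' \
  '101 -rwxr-xr-x 1 root 1234 Jan 30 22:31 frob*' \
  '102 -rw-r--r-- 1 root   99 Jan 30 22:31 README' \
  '103 drwxr-xr-x 2 root  512 Jan 30 22:31 lib/' |
only_executables
```

The directory header and the "total" line drop out for free, since
neither ends in a *.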

Now eliminate repeated lines that talk about the same file. One utility
both brings lines together and eliminates duplicates:

  ls -ilF $path | sed -n 's/\*$//p' | sort -u

Now strip the extra information, leaving just the filenames of all
different executables in $path:

  ls -ilF $path | sed -n 's/\*$//p' | sort -u | sed 's/.* //'

Finally, sort into order and extract the repetitions:

  ls -ilF $path | sed -n 's/\*$//p' | sort -u | sed 's/.* //' | sort | uniq -d
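
The flow after ls can be checked on canned input. What follows is a
sketch, not a benchmark: the function name repeated_names and the sample
listing are invented for illustration, and the listing only approximates
what ls -ilF prints on any given system.

```shell
# Everything after ls in the pipeline above, reading a listing on stdin.
# repeated_names is a hypothetical name.
repeated_names() {
  sed -n 's/\*$//p' |   # executables only, mark stripped
  sort -u |             # identical lines (same inode, same name) collapse
  sed 's/.* //' |       # keep the last field, the filename
  sort | uniq -d        # names that still appear more than once
}

# Made-up listing of two directories. "frob" is one file hard-linked into
# both (inode 101 twice); "munge" is two different files (inodes 102, 205).
printf '%s\n' \
  '/usr/local/bin:' \
  '101 -rwxr-xr-x 2 root 1234 Jan 30 22:31 frob*' \
  '102 -rwxr-xr-x 1 root  800 Jan 30 22:31 munge*' \
  '' \
  '/usr/bin:' \
  '101 -rwxr-xr-x 2 root 1234 Jan 30 22:31 frob*' \
  '205 -rwxr-xr-x 1 root  950 Jan 29 10:02 munge*' \
  '206 -rwxr-xr-x 1 root  300 Jan 29 10:02 twiddle*' |
repeated_names
```

On this input only munge is reported. The two frob lines are identical,
so sort -u folds them into one before the names are compared.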

Compare this simple script to Tom's original 36-line monster. Was this
really so hard to develop? Is ``munge until done'' really an accurate
description of data-flow programming? Is this script such a maintenance
nightmare? (To port it to sh, for instance, you replace $path with the
standard invocation `echo $PATH | tr : '\012'`. Is that so painful?)
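
For the record, the sh port might look like the sketch below. The
function name path_repeats, its optional argument, and the 2>/dev/null
are my assumptions for illustration: the argument lets you try the thing
on a list other than $PATH, and the redirection merely hides complaints
about directories that don't exist.

```shell
# sh version: the directories come from a colon-separated list, $PATH by
# default. path_repeats and its optional argument are hypothetical.
path_repeats() {
  # The backquoted command is unquoted on purpose: the shell splits tr's
  # output into one argument per directory, so directory names containing
  # whitespace will break.
  ls -ilF `echo "${1-$PATH}" | tr : '\012'` 2>/dev/null |
  sed -n 's/\*$//p' | sort -u | sed 's/.* //' | sort | uniq -d
}

# Report names that correspond to two different executables in $PATH.
path_repeats
```

Call it as path_repeats /some/dir:/other/dir to check a list of your own.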

This is just one of many strategies for attacking the problem. You build
shell scripts naturally out of the data available and out of what you
can transform that data into. I daresay there is no such simple strategy
in Perl.

---Dan