Path: utzoo!utgpu!cs.utexas.edu!convex!news
From: tchrist@convex.COM (Tom Christiansen)
Newsgroups: alt.sources.d
Subject: Re: "Simple" but non-portable == useless.
Message-ID: <1991Jan30.055113.26485@convex.com>
Date: 30 Jan 91 05:51:13 GMT
References: <7628@sugar.hackercorp.com> <11306:Jan2817:58:5291@kramden.acf.nyu.edu> <27A5C105.1019@tct.uucp>
Sender: news@convex.com (news access account)
Reply-To: tchrist@convex.COM (Tom Christiansen)
Organization: CONVEX Software Development, Richardson, TX
Lines: 60
Nntp-Posting-Host: pixel.convex.com

From the keyboard of chip@tct.uucp (Chip Salzenberg):
:>Be serious, Karl. We're talking about seven simple commands in a
:>pipeline versus nearly 30 lines of perl to do the same job: namely, get
:>a list of all names where there are two different executables in PATH.
:
:So maybe Karl did it the long way.  Here's my version of pathdup in
:Perl, and it's only ten lines long:

[ clever perl script deleted ]

I think he was talking about my script, not Karl's.  About two-thirds of
those 30 lines were debugging, error messages, and added space for
legibility.  Since I've been criticized for verbosity for putting
in such features, they've been duly excised.

Chip's \0 stuff is sure the right way to go if you're that concerned.  Mine
may list things whose directories have spaces in them.  I'm not sure how
easy it is to discern files with \n's in them from separate entries in
Chip's output.
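
Chip's script isn't reproduced here, so the following is only a rough sketch of the general \0 idea in present-day Perl 5, not his code.  The helper names write_names and read_names are made up for illustration.  It shows that terminating each name with a NUL keeps a name containing a space or a newline as a single record when the list is read back.

```perl
use strict;
use warnings;

# write_names and read_names are hypothetical helpers for illustration;
# they are not taken from either script discussed in this thread.

# Print each name followed by a NUL byte instead of a newline.
sub write_names {
    my ($fh, @names) = @_;
    print {$fh} "$_\0" for @names;
}

# Read NUL-terminated records back into a list of names.
sub read_names {
    my ($fh) = @_;
    local $/ = "\0";      # the input record separator is now NUL
    my @names = <$fh>;
    chomp @names;         # chomp removes the trailing $/, i.e. the NUL
    return @names;
}

# Round trip through an in-memory filehandle.
my $buffer = '';
open(my $out, '>', \$buffer) or die "cannot open in-memory file: $!";
write_names($out, "plain", "has space", "has\nnewline");
close $out;

open(my $in, '<', \$buffer) or die "cannot open in-memory file: $!";
my @back = read_names($in);
close $in;

print scalar(@back), " names recovered\n";
```

A NUL works as a terminator because it cannot occur inside a Unix file name, whereas spaces and newlines can.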

Here are 7 lines (although I only see 5 lines of code), and it does run
pretty fast.  Notice I don't count files linked together -- if you
#comment out that code, the trailing ``|| $f{$_...'' part of line 4, the
user time is cut in half.  It'll still run much faster than the shell
script without that.  Try it -- the differences are dramatic.

1 for $d (split(/:/, $ENV{'PATH'})) {
2  next if $seen{$d}++ || $d =~ /^\.?$/ || !chdir($d) || !opendir(DIR,'.'); 
3  while (defined($_ = readdir(DIR))) {
4   $b{$_} .= " $d" unless /^\.{1,2}$/ || !(-x&&-f_) || $f{$_,(stat(_))[0,1]}++;
5  } 
6 } 
7 for (keys %b) { print $_, ":", $b{$_}, "\n" if rindex($b{$_},' '); }

But I like the $files{$file,$dev,$ino} part (as it read originally)
because I didn't want to count the same real file twice.  I've got a lot
linked in from /bin and /etc into /usr/bin and /usr/etc.  Detecting this
was one of the script's goals.  Doing that in a shell script isn't always
feasible.  What about hard links?
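
For anyone who finds the compressed version hard to follow, here is a rough sketch of the same device/inode idea spelled out in present-day Perl 5 (5.10 or later, for the // operator).  The sub name find_path_dups and its variable names are made up for illustration and are not from the original script.  Unlike the original it does not chdir, and it keeps the directories in arrays rather than a space-joined string, so it is an approximation rather than a line-for-line translation.

```perl
use strict;
use warnings;

# find_path_dups(@dirs): hypothetical helper, not from the original post.
# Returns a hash ref mapping each command name to an array ref of the
# directories that hold a distinct executable of that name.
sub find_path_dups {
    my @dirs = @_;
    my (%seen_dir, %seen_file, %where);

    for my $dir (@dirs) {
        # Skip empty entries, ".", and directories already visited.
        next if $dir eq '' || $dir eq '.' || $seen_dir{$dir}++;
        opendir(my $dh, $dir) or next;
        for my $name (readdir $dh) {
            next if $name eq '.' || $name eq '..';
            # stat follows symlinks, so a symlink resolves to its target.
            my ($dev, $ino) = stat("$dir/$name") or next;
            # "_" reuses the stat buffer filled in just above.
            next unless (-f _) && (-x _);
            # Same name, device, and inode: a link to a file already
            # counted, so it is not a second executable.
            next if $seen_file{"$name\0$dev\0$ino"}++;
            push @{ $where{$name} }, $dir;
        }
        closedir $dh;
    }

    # Keep only the names that turned up in more than one place.
    delete @where{ grep { @{ $where{$_} } < 2 } keys %where };
    return \%where;
}

my $dups = find_path_dups(split /:/, $ENV{PATH} // '');
print "$_: @{ $dups->{$_} }\n" for sort keys %$dups;
```

Keying the "already seen" hash on name plus device plus inode is what lets hard links and symlinks to one real file count once, while two genuinely different files with the same name still show up.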

That's one of the major problems with these dancing-bear shell-script
examples (wonderful phrase, Chip!): because of the munge-until-done
strategy, it's hard to get feedback between the passes without kluges.
Surely for some things the pipe and backquote stuff are great: here's an
old tcsh alias of mine from before I got into ksh shell functions:

    alias vall	"vi '+/\!:1' `grep -l \!:1 *.[^oa]`"

I wouldn't feel obliged to write this as a script.  But for the more
complicated tasks, such as here where we're looking at all the files in
your path, finding dups, and weeding out false positives, the speed hits
and kluginess become too much to tolerate.  Maybe different people have
different tolerance levels for gross kluges and slow code.  I'm less
tolerant of the latter than the former. :-)

--tom
--
"Hey, did you hear Stallman has replaced /vmunix with /vmunix.el?  Now
 he can finally have the whole O/S built-in to his editor like he
 always wanted!" --me (Tom Christiansen <tchrist@convex.com>)