Path: utzoo!utgpu!cs.utexas.edu!convex!news
From: tchrist@convex.COM (Tom Christiansen)
Newsgroups: alt.sources.d
Subject: Re: "Simple" but non-portable == useless.
Message-ID: <1991Jan30.055113.26485@convex.com>
Date: 30 Jan 91 05:51:13 GMT
References: <7628@sugar.hackercorp.com> <11306:Jan2817:58:5291@kramden.acf.nyu.edu> <27A5C105.1019@tct.uucp>
Sender: news@convex.com (news access account)
Reply-To: tchrist@convex.COM (Tom Christiansen)
Organization: CONVEX Software Development, Richardson, TX
Lines: 60
Nntp-Posting-Host: pixel.convex.com

From the keyboard of chip@tct.uucp (Chip Salzenberg):
:>Be serious, Karl.  We're talking about seven simple commands in a
:>pipeline versus nearly 30 lines of perl to do the same job: namely, get
:>a list of all names where there are two different executables in PATH.
:
:So maybe Karl did it the long way.  Here's my version of pathdup in
:Perl, and it's only ten lines long:

    [ clever perl script deleted ]

I think he was talking about my script, not Karl's.  About two-thirds
of those 30 lines were debugging, error messages, and added space for
legibility.  Since I've been criticized for verbosity for putting in
such features, they've been duly excised.

Chip's \0 stuff is sure the right way to go if you're that concerned.
Mine may list things whose directories have spaces in them.  I'm not
sure how easy it is to discern files with \n's in them from separate
entries in Chip's output.

Here are 7 lines (although I only see 5 lines of code), and it does
run pretty fast.  Notice I don't count files linked together -- if you
#comment out that code, the trailing ``|| $f{$_...'' part of line 4,
the user time is cut in half.  It'll still run much faster than the
shell script without that.  Try it -- the differences are dramatic.

    1  for $d (split(/:/, $ENV{'PATH'})) {
    2      next if $seen{$d}++ || $d =~ /^\.?$/ || !chdir($d) || !opendir(DIR,'.');
    3      while (defined($_ = readdir(DIR))) {
    4          $b{$_} .= " $d" unless /^\.{1,2}$/ || !(-x&&-f _) || $f{$_,(stat(_))[0,1]}++;
    5      }
    6  }
    7  for (keys %b) { print $_, ":", $b{$_}, "\n" if rindex($b{$_},' '); }

But I like the $files{$file,$dev,$ino} part (as it read originally)
because I didn't want to count the same real file twice.  I've got a
lot linked in from /bin and /etc into /usr/bin and /usr/etc.  Detecting
this was one of the script's goals.

Doing that in a shell script isn't always feasible.  What about hard
links?  That's one of the major problems with these dancing-bear
shell-script examples (wonderful phrase, Chip!): because of the
munge-until-done strategy, it's hard to get feedback between the
passes without kluges.

Surely for some things the pipe and backquote stuff are great: here's
an old tcsh alias of mine before I got into ksh shell functions:

    alias vall "vi '+/\!:1' `grep -l \!:1 *.[^oa]`"

I wouldn't feel obliged to write this as a script.  But for the more
complicated tasks, such as here where we're looking at all the files
in your path, finding dups, and weeding out false positives, the speed
hits and kluginess become too much to tolerate.

Maybe different people have different tolerance levels for gross kluges
and slow code.  I'm less tolerant of the latter than the former. :-)

--tom
--
"Hey, did you hear Stallman has replaced /vmunix with /vmunix.el?  Now
 he can finally have the whole O/S built-in to his editor like he
 always wanted!" --me (Tom Christiansen )
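
For anyone who finds the seven-line version above too dense, here is a
rough expansion of the same idea: walk the PATH directories, record each
executable regular file under its name, and skip any file whose
(name, device, inode) triple has already been seen, so that hard links
are not counted twice.  This is only a sketch, not the script as posted.
The sub name find_path_dups and all variable names are made up for
illustration, it assumes a Perl new enough for lexical filehandles and
the // operator, and it tests "$dir/$name" directly instead of doing the
chdir() that the original relies on.

```perl
#!/usr/bin/perl
# Expanded sketch of the seven-line script; find_path_dups and the
# variable names are hypothetical, not taken from the original post.
use strict;
use warnings;

# Return a hash mapping each command name to a reference to the list of
# directories that hold a distinct executable regular file of that name.
# A file is skipped when the same name, device and inode were seen before.
sub find_path_dups {
    my @dirs = @_;
    my (%seen_dir, %seen_file, %where);

    for my $dir (@dirs) {
        # Skip repeated entries, the empty entry and ".", as the original does.
        next if $seen_dir{$dir}++ || $dir =~ /^\.?$/;
        opendir(my $dh, $dir) or next;
        while (defined(my $name = readdir $dh)) {
            next if $name =~ /^\.{1,2}$/;
            my $path = "$dir/$name";
            next unless -x $path && -f _;      # executable regular file
            my ($dev, $ino) = (stat _)[0, 1];  # reuse the saved stat buffer
            # Same name, device and inode: a link to a file already counted.
            next if $seen_file{$name, $dev, $ino}++;
            push @{ $where{$name} }, $dir;
        }
        closedir $dh;
    }

    # Keep only the names that turned up in more than one directory.
    for my $name (keys %where) {
        delete $where{$name} if @{ $where{$name} } < 2;
    }
    return %where;
}

# Run against the real PATH only when executed directly as a script.
unless (caller) {
    my %dups = find_path_dups(split /:/, $ENV{PATH} // '');
    print "$_: @{ $dups{$_} }\n" for sort keys %dups;
}
```

Keeping the directories in a list rather than a space-joined string
sidesteps the directories-with-spaces caveat mentioned earlier, at the
cost of a few more lines.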