Path: utzoo!utgpu!watserv1!watmath!att!att!emory!wuarchive!sdd.hp.com!elroy.jpl.nasa.gov!jpl-devvax!lwall From: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) Newsgroups: comp.lang.perl Subject: Re: Help a perl apprentice Message-ID: <10080@jpl-devvax.JPL.NASA.GOV> Date: 23 Oct 90 18:13:32 GMT References: <18840001@hp-lsd.COS.HP.COM> Reply-To: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) Organization: Jet Propulsion Laboratory, Pasadena, CA Lines: 186 In article <18840001@hp-lsd.COS.HP.COM> tbc@hp-lsd.COS.HP.COM (Tim Chambers) writes: : I have just written my first perl script. Ok, so it's not as cryptic as TECO : (which I never did master) but it's still a pretty nonintuitive language to : me. Strange, seems pretty intuitive to me... :-) And no, I wasn't the person who wrote a Fortran compiler in TECO. : So I ask the perl wizards to help me get used to programming in this new : beasty. I'm enclosing the perl script, the sh script I started with, and an : example of the program's input. My questions are: : : 1. Why doesn't the "next unless" line I've commented out do what I want it : to? (Trust me, it doesn't on my HP 9000 Series 300 HP-UX implementation.) Probably because of a bug I just fixed, presuming you're on patchlevel 36 or 37. : 2. Although I realize this isn't a teaching center, would anyone care to : offer some advice on how to optimize my perl implementation of the : algorithm? The algorithm I use is (1) capture in an array, (2) sort the : array, (3) format and print the array elements. I found it clumsy to do : all in perl, but it still runs faster than the filters I use in the shell : script. Are there perl-isms I am missing that make the algorithm run more : efficiently? (Especially my by_free sorting subroutine.) I'll talk about your script point by point, and then give how I'd write it. : ############################################################################## : #!/usr/local/bin/perl : # $Header: mydf,v 1.3 90/10/08 17:17:19 tbc Exp $ : : sub by_free { : @a = split(' ', $a); : @b = split(' ', $b); You realize that this splits each line multiple times? This is most of your extra overhead. : $a[3] lt $b[3] ? 1 : $a[3] gt $b[3] ? -1 : 0; As of 36, you can just say $b[3] <=> $a[3]; : } : : open (BDF_PIPE, "bdf $* |"); A nit, but you want open (BDF_PIPE, "bdf @ARGV |"); or better, open (BDF_PIPE, "bdf @ARGV |") || die "Can't run bdf: $!\n"; : # output : print "Filesystem Free (Mb) %Used\n"; : print "------------------------------|---------|-----\n"; Maybe a hair more efficient and readable (your call) to say: print <;$i++) { In general, use of index variables (such as $i) is a warning sign that there's a better way to do it in Perl. : # why doesn't this work?! : # next unless (/^.*\%.*$/); The bug I mentioned. The optimizer thought it could do tail matching on /%$/. But it's silly to write the pattern that way anyway. Why not next unless /%/; But it's better not to reject lines based on something toward the end of the line. : next unless /^[^ ]+ /; This is better, in that it fails at the beginning of the line on the lines it fails on. But when it succeeds, why check more than one character? I'd also invert the logic: next if /^ /; : next if /Mounted/; Again, this is checking for something at the end. Why not this? next if /^Filesystem/; : @lines[$i] = $_; This also stores info you aren't going to use later. In my example below I'll turn this logic inside out. : }; : : close (BDF_PIPE); Not strictly necessary on an input pipe, but it doesn't hurt anything, and is good documentation. It IS necessary to close OUTPUT pipes when you're done with them. : @sortedlines = sort by_free @lines; This is basically an extra scratch variable, since you can insert the sort into the foreach. (Though it doesn't hurt, because when you insert the sort into the foreach, it ends up making an internal scratch variable anyway...) : foreach $line (@sortedlines) { : ($f1, $f2, $f3, $avail, $capacity, $dirname) = split(' ', $line); Yet another split. If you split on /\s+/, you could do the nfs filesystems too, since $f1 would just be null. I include the nfs filesystems in mine. : printf '%s', $dirname; printf is slower than print. Should be just print $dirname; : for ($i = 30; $i > length($dirname); $i--) { : printf ('.'); : } Again, use print, not printf. Better, don't use a loop at all: print '.' x (30 - length($dirname)); : printf "%10.1f%6s\n", $avail / 1000, $capacity; : }; : : print "\n"; Okay, here's my version: #!/usr/local/bin/perl open (BDF_PIPE, "bdf @ARGV |"); print <) { ($fs, $kbytes, $used, $avail, $capacity, $dirname) = split; next unless $kbytes > 0; # next unless $fs; # uncomment to delete nfs filesystems $line{$avail} = sprintf("%30.30s%10.1f%6s\n", $dirname . '.' x 30, $avail / 1000, $capacity); } sub revnum { $b <=> $a; } foreach $key (sort revnum keys(%line)) { print $line{$key}; } print "\n"; We split just once per line, using a split that will leave $fs null on the nfs lines. We find a single test that weeds out both the header and the 1st of each pair of nfs lines. We make an associative array keyed on the field we will sort on, formatting the output string right then and there. After we've finished the input, we just do a simple reverse numeric sort on the keys of the line array, and print out each value. Actually, we could scrap that loop too and use a slice like this: print @line{sort revnum keys(%line)}, "\n"; That *doesn't* make a scratch variable internally. Larry