Path: utzoo!utgpu!watserv1!watmath!att!att!emory!wuarchive!sdd.hp.com!elroy.jpl.nasa.gov!jpl-devvax!lwall
From: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Newsgroups: comp.lang.perl
Subject: Re: Help a perl apprentice
Message-ID: <10080@jpl-devvax.JPL.NASA.GOV>
Date: 23 Oct 90 18:13:32 GMT
References: <18840001@hp-lsd.COS.HP.COM>
Reply-To: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Organization: Jet Propulsion Laboratory, Pasadena, CA
Lines: 186

In article <18840001@hp-lsd.COS.HP.COM> tbc@hp-lsd.COS.HP.COM (Tim Chambers) writes:
: I have just written my first perl script.  Ok, so it's not as cryptic as TECO
: (which I never did master) but it's still a pretty nonintuitive language to
: me.

Strange, seems pretty intuitive to me...  :-)

And no, I wasn't the person who wrote a Fortran compiler in TECO.

: So I ask the perl wizards to help me get used to programming in this new
: beasty.  I'm enclosing the perl script, the sh script I started with, and an
: example of the program's input.  My questions are:
: 
: 1.  Why doesn't the "next unless" line I've commented out do what I want it
:     to?  (Trust me, it doesn't on my HP 9000 Series 300 HP-UX implementation.)

Probably because of a bug I just fixed, presuming you're on patchlevel 36 or 37.

: 2.  Although I realize this isn't a teaching center, would anyone care to
:     offer some advice on how to optimize my perl implementation of the
:     algorithm?  The algorithm I use is (1) capture in an array, (2) sort the
:     array, (3) format and print the array elements.  I found it clumsy to do
:     all in perl, but it still runs faster than the filters I use in the shell
:     script.  Are there perl-isms I am missing that make the algorithm run more
:     efficiently?  (Especially my by_free sorting subroutine.)

I'll talk about your script point by point, and then give how I'd write it.

: ##############################################################################
: #!/usr/local/bin/perl
: # $Header: mydf,v 1.3 90/10/08 17:17:19 tbc Exp $
: 
: sub by_free {
:     @a = split(' ', $a);
:     @b = split(' ', $b);

You realize that this splits each line multiple times?  This is most of
your extra overhead.

:     $a[3] lt $b[3] ? 1 : $a[3] gt $b[3] ? -1 : 0;

As of 36, you can just say

	$b[3] <=> $a[3];

: }
: 
: open (BDF_PIPE, "bdf $* |");

A nit, but you want

  open (BDF_PIPE, "bdf @ARGV |");

or better,

  open (BDF_PIPE, "bdf @ARGV |") || die "Can't run bdf: $!\n";

: # output
: print "Filesystem                     Free (Mb) %Used\n";
: print "------------------------------|---------|-----\n";

Maybe a hair more efficient and readable (your call) to say:

print <<EndHeader;
Filesystem                     Free (Mb) %Used
------------------------------|---------|-----
EndHeader

: for ($i=0;<BDF_PIPE>;$i++) {

In general, use of index variables (such as $i) is a warning sign that
there's a better way to do it in Perl.

: # why doesn't this work?!
: #    next unless (/^.*\%.*$/);

The bug I mentioned.  The optimizer thought it could do tail matching on /%$/.
But it's silly to write the pattern that way anyway.  Why not

	next unless /%/;

But it's better not to reject lines based on something toward the end
of the line.

:     next unless /^[^ ]+ /;

This is better, in that it fails at the beginning of the line on the lines
it fails on.  But when it succeeds, why check more than one character?
I'd also invert the logic:

	next if /^ /;

:     next if /Mounted/;

Again, this is checking for something at the end.  Why not this?

	next if /^Filesystem/;

:     @lines[$i] = $_;

This also stores info you aren't going to use later.  In my example
below I'll turn this logic inside out.

: };
: 
: close (BDF_PIPE);

Not strictly necessary on an input pipe, but it doesn't hurt anything, and
is good documentation.  It IS necessary to close OUTPUT pipes when you're done
with them.

: @sortedlines = sort by_free @lines;

This is basically an extra scratch variable, since you can insert the
sort into the foreach.  (Though it doesn't hurt, because when you insert
the sort into the foreach, it ends up making an internal scratch variable
anyway...)

: foreach $line (@sortedlines) {
:     ($f1, $f2, $f3, $avail, $capacity, $dirname) = split(' ', $line);

Yet another split.  If you split on /\s+/, you could do the nfs filesystems
too, since $f1 would just be null.  I include the nfs filesystems in mine.

:     printf '%s', $dirname;

printf is slower than print.  Should be just

	print $dirname;

:     for ($i = 30; $i > length($dirname); $i--) {
: 	printf ('.');
: 	}

Again, use print, not printf.  Better, don't use a loop at all:

	print '.' x (30 - length($dirname));

:     printf "%10.1f%6s\n", $avail / 1000, $capacity;
: };
: 
: print "\n";

Okay, here's my version:

#!/usr/local/bin/perl

open (BDF_PIPE, "bdf @ARGV |");

print <<End;
Filesystem                     Free (Mb) %Used
------------------------------|---------|-----
End

while (<BDF_PIPE>) {
    ($fs, $kbytes, $used, $avail, $capacity, $dirname) = split;
    next unless $kbytes > 0;
    # next unless $fs;		# uncomment to delete nfs filesystems
    $line{$avail} = sprintf("%30.30s%10.1f%6s\n",
			$dirname . '.' x 30,
			$avail / 1000, $capacity);
}

sub revnum { $b <=> $a; }

foreach $key (sort revnum keys(%line)) {
    print $line{$key};
}

print "\n";

We split just once per line, using a split that will leave $fs null on
the nfs lines.  We find a single test that weeds out both the header
and the 1st of each pair of nfs lines.  We make an associative array
keyed on the field we will sort on, formatting the output string
right then and there.

After we've finished the input, we just do a simple reverse numeric sort
on the keys of the line array, and print out each value.  Actually,
we could scrap that loop too and use a slice like this:

    print @line{sort revnum keys(%line)}, "\n";

That *doesn't* make a scratch variable internally.

Larry