Path: utzoo!mnetor!uunet!lll-winken!lll-tis!ames!elroy!devvax!lwall
From: lwall@devvax.JPL.NASA.GOV (Larry Wall)
Newsgroups: comp.sources.d
Subject: Re: How are YOU using perl?
Message-ID: <1860@devvax.JPL.NASA.GOV>
Date: 19 Apr 88 21:56:19 GMT
References: <235@ateng.UUCP> <42400005@uicsrd.csrd.uiuc.edu>
Reply-To: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Organization: Jet Propulsion Laboratory, Pasadena, CA.
Lines: 218

In article <42400005@uicsrd.csrd.uiuc.edu> mcdaniel@uicsrd.csrd.uiuc.edu writes:
: Here are some of my notes about perl, for what they're worth.  These
: are points that appeared unclear to me at some time or another.
:
: 1) "each(x)" has one state variable per array, not per "each" call.
: 2) On the other hand, each ".." has its own state variable.

These are made clearer in the 2.0 documentation.

: 3) eval 'X' takes about the same time as X.  (eval is the only way to
: simulate "first-class" file handles, associative arrays, et cetera.)

It's about the same time ignoring the time to parse X.

: 4) Lists are only one-dimensional.  Sigh.

Whaddya want, APL?

: 5) "@foo = (); print $#foo;" prints -1, even if $[ == 1.  Similarly,
: "@foo = (); $foo[$[] = 3; print $#foo;" always prints 0.  (So the
: statement on page 3 of the man page, about $#foo being the subscript
: of the last element, is false.)

Fixed in 2.0 ($#foo, not the manual).

: 6) On an Alliant FX/1 (a Motorola 68020-based machine), -1 % 2 == -1.
: I. e. "%" takes the sign of the dividend, not the divisor.  However,
: this is probably dependent on how the "%" operator is implemented in C
: on your particular computer, which in turn is probably dependent on
: the hardware "mod" instruction (if any).

True, the behavior of % is very machine dependent.  On lots of machines
the % operator simply doesn't work on negative numbers.  Perl is simply
reflecting the lack of rigor in the definition of C here.  Don't use %
on negative numbers.
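If negative operands can't be avoided, one way around the ambiguity is
to fold the result back into range by hand.  What follows is only a
minimal sketch in C, since that is where the looseness comes from.  The
name mod_floor is made up for illustration (it is not part of perl or
of any C library), and the sketch assumes a positive divisor:

```c
#include <assert.h>

/* Hypothetical helper: remainder of a divided by n, always in 0..n-1.
 * Assumes n > 0.  Works whether the compiler's % takes the sign of the
 * dividend or already returns a non-negative result. */
long mod_floor(long a, long n)
{
    long r;

    assert(n > 0);      /* the sketch only covers positive divisors */
    r = a % n;          /* may come back negative when a is negative */
    if (r < 0)
        r += n;         /* fold it back into 0..n-1 */
    return r;
}
```

With this, mod_floor(-1, 2) gives 1 on a machine where -1 % 2 is -1, and
also on one where % already returns 1.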
: 7) If you use a string in a numeric context, and the string contains
: non-numeric characters, it converts as many leading characters from
: the string as it can, and ignores the later ones.  E. g.
:       $a = '1e+:1'; $a = $a + 1; print $a;
:       $a = 'abc'; $a = $a + 1; print $a;
: print 2 and 1 respectively.

It'll get whatever atof() gets.

: 8) The conditionals below are ranked in order of increasing execution
: time with VERY approximate normalized execution times.
:       1       STMT if EXPR
:       1.02    STMT unless EXPR
:       1.2     if (EXPR) {STMT;}
:       1.25    if (EXPR) {STMT;} else {}
:       1.63    do {STMT;} unless EXPR
:       1.64    do {STMT;} if EXPR
:       1.8     EXPR && STMT
:       2       EXPR ? (STMT) : 0
: (where STMT is a simple assignment statement, and EXPR is a
: boolean expression).  Those last 2 lines surprised me a lot.  I was
: expecting them to be the fastest, or at least equivalent to if-then or
: if-then-else.

The last two can be as fast as the first two, but it depends on whether
EXPR is recognized by the optimizer:

        do foo() if $reg;               # optimized
        $reg && do foo();               # optimized to same as above, I believe
        $reg != 0 && do foo();          # not optimized

Registers as booleans, some range operators, some patterns and
substitutions are optimized to avoid calling the general expression
evaluator.

: 9) Timing for the following statements enclosed in a "sub":
:       1       if (EXPR) {VAR;} else {VAR}
:       1.74    EXPR ? VAR : VAR
: (EXPR is a boolean expression, and VAR is a variable reference,
: like $a.)

Why should putting them in a sub make much difference?  An expression is
an expression, and a statement a statement.  Adding in the calling
overhead merely reduces the ratio from 1:2 to 1:1.74.

: 10) For the following statements enclosed in "sub max":
:       1       $max = pop(@_);
:               while ($foo = pop(@_)) { $max = $foo if $max < $foo; }
:               $max;
:       1.21    $max = pop(@_);
:               for ($i = 0; ++$i <= $#_; ) {
:                   if ($max < ($foo = $_[$i])) { $max = $foo; }
:               }
:               $max;

Subscripting is not very efficient.  Popping an array is quite
efficient, even more so than shifting an array.

: 11) Neither "max", nor "min", nor "abs", nor "power" is builtin.
: They're easy enough to code, though.
: 12) In re assigning an array to variables, as in
:       ($a, $b, $c) = @foo;
: If @foo has more than 3 elements here, the extras will not be
: assigned.  If @foo has less than 3 elements, the missing ones are
: considered ''.  Following this rule, of course,
:       ($a) = @foo;
: assigns the first element of @foo to $a.  However,
:       $a = @foo;
: assigns the LAST element of @foo to $a.  @foo used in ANY scalar
: context refers to its last element.  (E.g. 2+@foo is 2 plus the last
: element of @foo.)  This makes sense if you think of the comma operator
: in perl or C:
:       a = (1, 2);
: should assign 2 to a.

That's how I was thinking of it.  Note that in perl 2.0, there are a few
more contexts in which @foo is interpreted as an array rather than a
scalar.  In particular, within lists:

        print 1,2,3,@foo,4,5,6;

This interpolates all elements of array foo as if they'd been specified
one by one.  Not extremely useful in print, but what about things like
kill, chmod or do?

        kill 9, @goners;
        chmod 0755, <*.[ch]>;
        do matchall($#foo,@foo,$#bar,@bar);

Those of you who have read this far have the reward of being notified of
the availability of perl 2.0 beta 1 via anonymous FTP from my machine,
jpl-devvax.jpl.nasa.gov (128.149.8.43).  Look in pub/perl.2.0.beta1/kits.
I'm not going to offer patches for the beta version, but I expect to
send off a real 2.0 in a week or two.  If you are interested in beta
testing 2.0, now is your big chance.

Let's see, what else is new in 2.0?

File globbing is hacked in, though it still calls sh underneath for now.
File globs are written in angle brackets, and return arrays in array
contexts, so you can say

        unlink <*.bak>;
        @files = <*>;

not to mention

        @input = <>;    # oh where, oh where has my memory gone!

Speaking of arrays, there's a way to iterate over normal arrays now,
such that you not only can read each element, but also modify it.

        foreach $elem (@elements) {
            $elem =~ s/foo/bar/;
        }

I considered using the "for (a in b)" syntax, but decided it was too
late to add a keyword like "in".  So I went with the csh syntax.  The
"each" and the "$elem" are optional, so you can also write the above as

        for (@elements) {
            s/foo/bar/;         # each elem is in $_
        }

How about

        for ((10,9,8,7,6,5,4,3,2,1,'BOOM!')) {
            print "$_\n";
            sleep(1);
        }

Several of the larger files are now split into smaller pieces for easier
compilation.

The @ary = (1); bug is now fixed.

Oh, here's a biggie: I scrapped the search routines I was using before
and inserted a heavily munged copy of Henry Spencer's regexp routines.
This means you can now say silly things like /(abc|def)*/.  I also added
\s to match whitespace and \d to match digits.

For setuid scripts, there was a need to be able to open pipes without
invoking the shell to interpret the pipe command.  To that end, opening
a pipe to or from the command "-" forks off a copy of your script so
that you can do the exec explicitly yourself, and not have to worry
about weeding out shell metacharacters:

        open(secure,"|-") || exec '/usr/bin/tr', '[a-z]', '[A-Z]';

It hooks up the opened filehandle with either the stdin or stdout of the
forked off script, whichever is appropriate.  You don't have to do an
exec, of course.

What else...hmm..oh yes.  You can now say

        do 'stat.pl';

which is similar to

        eval `cat stat.pl`;

except that it's more efficient and can scan -I directories.  Its
primary purpose is to slurp in subroutine libraries from, say,
/usr/local/lib/perl.  The -I directories come in as array @INC, so you
can modify them from within the program.

There are more file tests, including -t to see if, for instance, stdin
is a terminal.  File tests now behave in a more correct manner.  You can
do file tests on filehandles as well as filenames.

The $x = "...$x..."; bug is fixed.

eof can now be used on each file of the <> input for such silly purposes
as resetting the line numbers or appending to each file of an inplace
edit.

$#foo is now an lvalue.  You can preallocate or truncate arrays.

reset now resets arrays and associative arrays as well as string
variables.

Well, enough for starters.  I particularly want to hear from people with
machines I don't have access to.  So far I've only tested it on

        Vax, 4.3bsd (Xinu)
        Sun 3, 3.3.
        Masscomp 5600

See if you can break it for me.  Thanks.

Larry Wall
lwall@jpl-devvax.jpl.nasa.gov