Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!evax!utacfd!letni!mic!convex!convex.COM
From: tchrist@convex.COM (Tom Christiansen)
Newsgroups: comp.lang.perl
Subject: Re: Arrays and me :-(
Message-ID: <110684@convex.convex.com>
Date: 12 Dec 90 05:04:07 GMT
References: <29144@sequoia.execu.com>
Sender: usenet@convex.com
Reply-To: tchrist@convex.COM (Tom Christiansen)
Distribution: usa
Organization: CONVEX Software Development, Richardson, TX
Lines: 347

In article <29144@sequoia.execu.com> painter@sequoia.execu.com (Tom Painter) writes:
>
>I'd like to split a passwd file into a multi-dimensional array.  I'd
>like one dimension to be the relative line number in the file, and the
>other to be the field in the passwd file.  Such that the following
>would print the GCOS field from the 50th entry
>
>while () {
>    $i++;
>    @passwd[$i] = split(/:/);
>}
>printf "%s\n", $passwd[5,50];
>
>However, I end up with only the logname field in the array.  While I could
>list the parts out, I feel confident that someone has a clever answer.  Please
>suggest away.

Let's start out with the basics. Your very biggest mistake is that
perl doesn't really and truly have honest-to-goodness first-class
multidimensional arrays as you are trying to use them. You might
consult question #17 of the FAQ.

Here are some other things: Remember that perl arrays are by default
0-based, which makes the GCOS field index 4, not 5. Also, since you
have $. as the current line number, there's no need to keep $i. And
when you split into @passwd[$i], you're splitting into an array slice
of length one, which means that everything but the initial field of
split is discarded, leaving you the login, which you pronounce logname
and C programmers pronounce pw_name. Then you say $passwd[5,50],
which, probably to your surprise, is really merely asking for element
50, because that comma is the C comma operator. And don't waste time
using printf when a simple print will do nicely.
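To make those three points concrete, here is a small sketch. The
passwd line in it is made up for illustration, and element 50 is
filled in by hand rather than read from a file, so treat it as a demo
of the behaviour and nothing more.

```perl
# A made-up passwd line, for illustration only.
$line = "uucp:*:14:40:UNIX-to-UNIX Copy:/usr/spool/uucp:/bin/sh";

# Assigning a list to a one-element slice keeps only the first field.
@passwd[1] = split(/:/, $line);
print "slice of one: ", $passwd[1], "\n";

# In an ordinary array subscript the comma is the comma operator,
# so the index used below is just 50.
$passwd[50] = "fifty";
print "subscript 5,50: ", $passwd[5,50], "\n";

# Fields count from 0, so the GCOS field is index 4.
$gcos = (split(/:/, $line))[4];
print "gcos: ", $gcos, "\n";
```

Run under -w, perl is likely to complain about both the one-element
slice and the comma in the subscript, which is a fair hint that
neither does what it looks like.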
I'm going to show you a bunch of ways to do what you've said you want
to do. But you know, I can't help but question whether you really
want to do this. Why do you want to index by line number, of all
funky things, rather than uid or login or whatnot? To create an array
such as you'd like will be very slow. For example, here's my password
file:

    % wc /etc/passwd
        1850    4436  152509 /etc/passwd

Parsing out password files is considered the wrong way of doing
things. You really ought to be using getpwent. But we'll get to that
presently.

The first thing I'll do is a relatively literal translation of your
code into something that does more what you seem to want to do. I'm
going to use perl's multidimensional array emulation technique of
passing multiple subscripts to an associative array reference, as
opposed to an indexed array. Don't worry right now that it's just an
emulation. For what you're doing, this is quite convenient.

    # METHOD 1
    while (<PASSWD>) {
        split(/:/);
        for $fld ( 0..8 ) {
            $passwd{ $fld, $. } = $_[$fld];
        }
    }
    for $i (0..8) {
        printf "%d %s\n", $i, $passwd{$i,50};
    }

Notice I cannot do an aggregate (slice) assignment, because these
aren't really multidimensional arrays that you can take slices of (at
least not that way you can't.)

This method runs in this amount of time:

    11.030066 real        9.374684 user        1.020688 sys

and produces this output on my system:

    0 RSuucp
    1 *
    2 14
    3 40
    4 BTL research UNIX-to-UNIX Copy
    5 /mnt/null
    6 /usr/adm/admonish/No-Account

    7
    8

Yes, it's true, I didn't chop the newline from the shell. Anyway,
that's RRRREEEEAAAALLLLLLLLYYYY SSSSLLLLOOOOWWWW. There are two
reasons for this: it's a big password file, and those splits are
expensive multiplied 1850 times. You don't really need indices 7 and
8 yet, but you might later; wait and see.

Here's another way:

    # METHOD 2
    $passwd[1 + $.] = <PASSWD> until eof(PASSWD);
    split(/:/, $passwd[50]);
    for $i (0..8) {
        printf "%d %s\n", $i, $_[$i];
    }

This runs in this time and produces the same output:

    2.271481 real        0.654493 user        1.345908 sys

which is a lot better, albeit to my mind still a tad slow. The reason
it's so much faster is that I don't do all those splits or go creating
all those array elements. If you wanted, you could grab the gcos this
way:

    $gcos = (split(/:/, $passwd[50]))[4];

As you can see, delaying the split until you really need it is really
much better. In case you wonder about the 1+$. part, it's because the
$. doesn't get bumped until after the read, and I wanted $passwd[50]
to still be the same guy. I think I'd write a function &gcos like
this to cache the value for me so I never have to split more than
once:

    # GCOS 1
    sub gcos {
        unless (defined $gcos{$_[0]}) {
            $gcos{$_[0]} = (split(/:/, $passwd[$_[0]]))[4];
        }
        $gcos{$_[0]};
    }

I'm assuming that I'm calling it with 50 or whatnot, your line number.
(What a funny thing to do!) If you're going to be using small
integers, why don't you use this, which caches in an ordinary indexed
array instead of an associative one:

    # GCOS 2
    sub gcos {
        unless (defined $gcos[$_[0]]) {
            $gcos[$_[0]] = (split(/:/, $passwd[$_[0]]))[4];
        }
        $gcos[$_[0]];
    }

Now, remember I said that you don't really want to parse the password
file by hand? That's because you don't really know what it looks
like, for one thing. Consider for example NIS, nee the Yellow Plague,
and the way it handles +@netgroup and +foo::: entries and all.
Another good reason to use the getpw* routines is that you might well
have a hashed passwd file even if you're not using YP. Here's a
method that's much more portable, based on method #1.

    # METHOD 3
    setpwent;
    while (@_ = getpwent) {
        $i++;
        for $fld ( 0..8 ) {
            $passwd{ $fld, $i } = $_[$fld];
        }
    }
    endpwent;
    for $i (0..8) {
        printf "%d %s\n", $i, $passwd{$i,50};
    }

It runs in this time:

    11.062443 real        8.832377 user        1.186499 sys

That's not very much better than method 1.
One reason is that you have to iterate through the whole file anyway,
rather than just asking for the one value you need. You're also
making all those darn array values that you may not ever use anyway.

Now method 3 also produces different output:

    0 RSuucp
    1 *
    2 14
    3 40
    4 0
    5
    6 BTL research UNIX-to-UNIX Copy
    7 /mnt/null
    8 /usr/adm/admonish/No-Account

That's because the getpw* functions are defined in perl to return the
following list. (I told you I'd use indices 7 and 8.)

    ($name,$passwd,$uid,$gid,$quota,$comment,$gcos,$dir,$shell) = getpwent;

Now, we can speed this up a bit by just keeping the gcoses (sounds
like some kind of mental disorder, doesn't it?) with this code:

    # METHOD 4
    setpwent;
    0 while $gcos[++$i] = (getpwent)[6];
    endpwent;
    print "gcos[50] is ", $gcos[50], "\n";

Which finally runs in respectable time:

    0.642865 real        0.404234 user        0.126877 sys

And I feel much better about perl again.

On the other hand, why suck in the whole password file if all you
want is the gcos field for line (line??? I still can't see how that
makes sense) number 50? Let's assume you really want uid 50. Since
getpwuid hands back the same list as getpwent, there is nothing to
split; just take element 6. Code your gcos function this way:

    # GCOS 3
    sub gcos {
        unless (defined $gcos{$_[0]}) {
            $gcos{$_[0]} = (getpwuid($_[0]))[6];
        }
        $gcos{$_[0]};
    }

You might wish to index by login name instead. Just use getpwnam
where getpwuid is being used.

Now, for those people who feel they just can't live without real (or
at least realer) multidimensional arrays for whatever the reason, here
are a couple ways of coming closer to doing that. I still believe
that this is not what you really would like to do, but for the sake of
completeness, I'll show you anyway. Do consider whether you honestly
need the whole passwd file in memory all the time and already split up
into pieces. I remain dubious.

First, we'll invoke journeyman magic by constructing an array of array
names, and load and store into it through an eval.
Here's the code; it produces the same output as the first ones:

    # METHOD 5
    while (<PASSWD>) {
        split(/:/);
        eval "\@pass$. = \@_";
    }
    for $i (0..8) {
        printf "%d %s\n", $i, eval "\$pass" . 50 . '[$i]', "\n";
    }

It ran in this much time:

    10.885047 real        8.753669 user        0.842107 sys

Now, I could just print out $pass50[$i], but I wanted to show the
general case. I did try putting the split inside the eval:

    # METHOD 6
    while (<PASSWD>) {
        eval "\@pass$. = split(/:/)";
    }

But it ends up running more slowly this way.

    11.797876 real        9.447561 user        0.923269 sys

I've a strong suspicion that it's because the split regexp is getting
recompiled in each eval. Sadly, you can't trick it by putting a /:/
outside the loop and then using // as your regexp (which normally
means the last regexp and saves the recompilation) because split
interprets // to mean to split on the null string, i.e. a character at
a time.

Now if you caught all that, it's time to go into still heavier
wizardry, at least by most people's standards. We're going to use the
*foo type globbing notation to construct an array of array references.
This is actually a bit faster this way, and anyway, I'm a bit of a
(computer) speed freak. Here's the code:

    # METHOD 7
    while (<PASSWD>) {
        split(/:/);
        *passwd = "pass$.";
        @passwd = @_;
    }
    *passwd = 'pass' . 50;
    for $i (0..8) {
        printf "%d %s\n", $i, $passwd[$i];
    }

Which trims off a couple seconds:

    9.131857 real        7.367163 user        0.686693 sys

I can save myself some more time by storing the output of split
directly into the right entry.

    # METHOD 8
    while (<PASSWD>) {
        *passwd = "pass$.";
        @passwd = split(/:/);
    }
    *passwd = 'pass' . 50;
    for $i (0..8) {
        printf "%d %s\n", $i, $passwd[$i];
    }

This runs in this much time:

    8.237005 real        6.832681 user        0.655225 sys

Now, buried deep in the perl mannovel, Larry has written the following
rather ominous warning:

    Assignment to *name is currently recommended only inside a
    local().  You can actually assign to *name anywhere, but the
    previous referent of *name may be stranded forever.  This may
    or may not bother you.

Well, I'm not sure whether I should be bothered, since it runs fine
this way, but I dutifully made a local for it and tried again:

    # METHOD 9
    while (<PASSWD>) {
        local(*passwd) = "pass$.";
        @passwd = split(/:/);
    }
    local(*passwd) = 'pass' . 50;
    for $i (0..8) {
        printf "%d %s\n", $i, $passwd[$i];
    }

However, as is to be expected, this ran slower:

    9.130706 real        7.162618 user        0.756463 sys

That's because that apparent local declaration of *passwd is really a
run-time statement, and we need to build up 1850 versions of *passwd
before we exit that block.

I must close by reiterating my suggestion to only call getpw...() on
the thing you really want and not try to suck in the whole passwd file
at once.

I do hope these are enough suggestions for you. :-)

--tom
--
Tom Christiansen        tchrist@convex.com      convex!tchrist
"With a kernel dive, all things are possible, but it sure makes it hard
 to look at yourself in the mirror the next morning."  -me
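
As a postscript to that closing suggestion, here is roughly what
asking for just one entry by login name might look like. The names
gcos_by_login and %gcos_of are made up for this sketch, and it assumes
a system where perl's getpwnam is available and where an unknown login
should simply come back as an empty string.

```perl
# Look up one login's GCOS field on demand and remember the answer.
# gcos_by_login and %gcos_of are hypothetical names for this sketch.
sub gcos_by_login {
    local($login) = @_;
    unless (defined $gcos_of{$login}) {
        local(@pw) = getpwnam($login);      # empty list if no such login
        $gcos_of{$login} = @pw ? $pw[6] : '';
    }
    $gcos_of{$login};
}

print "root's gcos: ", &gcos_by_login('root'), "\n";
```

Nothing here reads the passwd file itself, so it should behave the
same whether the entries come from a flat file, a hashed one, or YP.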