Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!samsung!uunet!convex!usenet
From: tchrist@convex.COM (Tom Christiansen)
Newsgroups: comp.lang.perl
Subject: Re: excessive perl memory usage
Message-ID: <1991May02.234822.17266@convex.com>
Date: 2 May 91 23:48:22 GMT
References: <1991May2.212216.24563@batcomputer.tn.cornell.edu>
Sender: usenet@convex.com (news access account)
Reply-To: tchrist@convex.COM (Tom Christiansen)
Organization: CONVEX Software Development, Richardson, TX
Lines: 135
Nntp-Posting-Host: pixel.convex.com

From the keyboard of lijewski@theory.tn.cornell.edu (Mike Lijewski):
:
:Perl users,
:
:Appended is a script called 'governor' which I'm working on.  The
:intent is to monitor the usage of one of our frontend machines for
:heavy usage, with the intent of niceing or killing cpu bound
:processes which should be running on our backend machines.  While
:running, I've seen the perl process grow to roughly 10Mbytes on our
:IBM 3090 running AIX/370.  The version of perl is 3.44.  I would
:appreciate it if anyone could tell me why it is so memory
:inefficient.  A typical 'ps -ef' returns 150 or so lines on the
:machine.  Thanks.

You're doing a few things that really slow you down, and a few
things that really gobble up memory.

First, the memory.  You're using local()s inside of loops.  Each pass
allocates a fresh copy, and none of them is released until you finally
exit the enclosing scope.  For example:

    for ($i = 1; $i <= $#merged_pids; $i++) {    # loop through lines
        local(@line2) = split(/[ \t]+/, $merged_pids[$i]);

This will make $#merged_pids copies of @line2, which is itself going
to use up a bunch of memory.  Declare the local() outside the loop in
cases like this.

Another thing that you're doing that sucks up memory and cpu is a lot
of splitting.  Splitting is expensive.
Instead of the split line above, you could use:

    ($pid2, $time2) = $merged_pids[$i] =~ /^\s*\S+\s+(\d+)[^:]+(\d+:\d+)/;

This is also nice because it doesn't care how many fields or columns
over the times are.  This varies a lot on different machines.  (By the
way, to split without leading null fields, split on ' ' instead of on
/\s+/.)

The slowest thing of all is the way you're doing the sorting.  You
split each line many many many many times.  This will take nearly
forever.  You should use the "sort the indices" trick so that you only
pull out what you need once.

Here's a hacked up version of your program that runs fine on my
machine, pretty quickly, and without too much memory use.

--tom

#!/usr/local/bin/perl

$ps_opts = -f '/vmunix' ? 'axu' : '-ef';

sub sort_by_pid {
    local(@pids) = ();
    for (@_) {
        # pull out each line's pid exactly once
        /^\s*\S+\s+(\d+)/ && push(@pids, $1);
    }
    # sort the indices by pid, then slice the lines in that order
    return @_[sort _by_pid 0..$#pids];
}

sub _by_pid { $pids[$a] <=> $pids[$b]; }

#
# This subroutine calls ps, deletes all processes owned by root, and
# then sorts it by pid.
#
sub get_sorted_ps {
    open (PS, "ps $ps_opts |") || die "Couldn't open ps pipe: $!";
    @ps2 = <PS>;    # slurp up the ps output
    shift @ps2;     # chop off the header
    # Delete root processes and sort by pid.
    @ps2 = &sort_by_pid(grep(!/^ *root\b/, @ps2));
    close (PS);
}

#
# This subroutine finds those processes using "too much" CPU time.
#
sub find_bad_dudes {
    local(@merged_pids) = &sort_by_pid(@ps1, @ps2);    # merge old and new
    local($i, $min1, $min2, $sec1, $sec2, $cpu_rate,
          $pid1, $pid2, $time1, $time2);

    ($pid1, $time1) = $merged_pids[0] =~ /^\s*\S+\s+(\d+)[^:]+(\d+:\d+)/;

    for ($i = 1; $i <= $#merged_pids; $i++) {    # loop through lines
        ($pid2, $time2) = $merged_pids[$i] =~ /^\s*\S+\s+(\d+)[^:]+(\d+:\d+)/;

        # if pids are identical and time fields are different
        if ( ($pid1 == $pid2) && ($time1 ne $time2)) {
            # found a potential bad dude
            ($min1, $sec1) = $time1 =~ /(\d+):(\d+)/;
            ($min2, $sec2) = $time2 =~ /(\d+):(\d+)/;
            $cpu_rate = ((60 * $min2 + $sec2) - (60 * $min1 + $sec1))
                            / $sleep_interval;
            # make sure cpu rate is positive
            if ($cpu_rate < 0) { $cpu_rate = -$cpu_rate; }
            if ($cpu_rate > $cpu_threshold) {
                # we've found a cpu burner
                print "BURNER: ", $merged_pids[$i];
            }
        }
        ($pid1, $time1) = ($pid2, $time2);    # update last line
    }
}

#
# ********** main routine **********
#

#
# global variables
#
$sleep_interval = 10;    # how long we sleep between checking process statistics
$cpu_threshold = 0.01;   # what we consider an unreasonable amount of cpu usage
@ps1 = ();               # contains old "ps" output
@ps2 = ();               # contains new "ps" output

print "starting...\n";

&get_sorted_ps;          # get most recent process statistics

for (;;) {               # do forever
    sleep $sleep_interval;
    print "waking up...\n";
    @ps1 = @ps2;         # save previous process statistics
    &get_sorted_ps;      # get most recent process statistics
    &find_bad_dudes;
    #last;
}

exit(0);
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
	"So much mail, so little time."
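
In case the "sort the indices" trick is unfamiliar, here is a minimal
standalone sketch of it.  It is written for a current perl (it uses
"my", which the perl 3 script above does not), and the three sample
lines are made up for illustration; they are not real ps output from
any machine.  The point is only that the key is extracted once per
line, and the comparison routine never touches the lines themselves.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines in roughly "ps -ef" layout (invented data).
my @lines = (
    "  alice   310     1  0 09:12:01 ttyp1  0:07 vi notes\n",
    "    bob    42     1  0 08:00:13 ttyp0  1:30 cc big.c\n",
    "  carol   105    42  0 08:30:45 ttyp2  0:02 ls\n",
);

# Pull the sort key (the pid) out of each line exactly once.
# A line that does not match gets key 0 so the indices stay aligned.
my @pids = map { /^\s*\S+\s+(\d+)/ ? $1 : 0 } @lines;

# Sort the indices by key, then slice the lines in that order.
my @order  = sort { $pids[$a] <=> $pids[$b] } 0 .. $#pids;
my @sorted = @lines[@order];

print @sorted;
```

With n lines the sort makes on the order of n log n comparisons, so
doing the pattern match inside the comparison routine repeats it that
many times; doing it up front costs only n matches.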