Path: utzoo!attcan!uunet!dino!ux1.cso.uiuc.edu!uxa.cso.uiuc.edu!job00542
From: job00542@uxa.cso.uiuc.edu (James Bordner)
Newsgroups: comp.sys.sequent
Subject: Re: Timing parallel programs on Symmetry
Message-ID: <1990Jul3.200119.692@ux1.cso.uiuc.edu>
Date: 3 Jul 90 20:01:19 GMT
References: <63900005@uxa.cso.uiuc.edu> <63900006@uxa.cso.uiuc.edu> <112093@linus.mitre.org>
Sender: usenet@ux1.cso.uiuc.edu (News)
Organization: University of Illinois at Urbana
Lines: 70

The following are two partial outputs from the profiler 'gprof' for a
parallel Fortran program on a Sequent Balance. The first was obtained
when the load was relatively small; the second when other CPU intensive
parallel programs were running. Both are from the same program running
the same data; only the loads were different.

granularity: each sample hit covers 4 byte(s) for 0.02% of 52.95 seconds

 %time  cumsecs  seconds   calls  name
  10.3     5.44     5.44       9  _gradp_
   6.8     9.02     3.58          mcount
   6.1    12.24     3.22     133  _do50%hesp_
   5.6    15.20     2.96     133  _do56%hesp_
   5.5    18.09     2.89     133  _do53%hesp_
   5.4    20.93     2.84     133  _do33%hesp_
   5.3    23.71     2.78     133  _do36%hesp_
     :        :        :       :  :
     :        :        :       :  :

granularity: each sample hit covers 4 byte(s) for 0.02% of 63.36 seconds

 %time  cumsecs  seconds   calls  name
  20.5    13.02    13.02    2326  __m_join
   8.6    18.46     5.44       9  _gradp_
   5.5    21.94     3.48          mcount
   4.9    25.04     3.10     133  _do50%hesp_
   4.8    28.11     3.07     133  _do53%hesp_
   4.7    31.10     2.99     133  _do56%hesp_
   4.6    34.00     2.90     133  _do36%hesp_
     :        :        :       :  :
     :        :        :       :  :

The 'seconds' column differs significantly only in the m_join call.
M_join is (I assume) an internal function in Sequent's Parallel
Programming Library. (Functions in the P.P.L. generally start with 'm_';
I assume it's internal since it's not documented :-) A reasonable guess
is that it's the function where processes wait for other processes to
catch up to them at the end of parallel loops.
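
The comparison of the two 'seconds' columns can be checked mechanically.
What follows is only a minimal sketch in Python, not part of the profiled
program: the numbers are copied from the two listings, and the names
light_load, heavy_load and compare_profiles are made up for illustration.
Because both listings are truncated, a routine that appears in only one of
them is reported separately instead of being counted as zero seconds.

```python
# 'seconds' column of each gprof listing: routine name -> self seconds.
light_load = {
    "_gradp_": 5.44, "mcount": 3.58, "_do50%hesp_": 3.22,
    "_do56%hesp_": 2.96, "_do53%hesp_": 2.89, "_do33%hesp_": 2.84,
    "_do36%hesp_": 2.78,
}
heavy_load = {
    "__m_join": 13.02, "_gradp_": 5.44, "mcount": 3.48,
    "_do50%hesp_": 3.10, "_do53%hesp_": 3.07, "_do56%hesp_": 2.99,
    "_do36%hesp_": 2.90,
}

def compare_profiles(light, heavy):
    """Return (deltas for routines in both listings,
    names only in the light listing, names only in the heavy listing)."""
    common = {name: round(heavy[name] - light[name], 2)
              for name in light if name in heavy}
    only_light = sorted(set(light) - set(heavy))
    only_heavy = sorted(set(heavy) - set(light))
    return common, only_light, only_heavy

common, only_light, only_heavy = compare_profiles(light_load, heavy_load)

# Largest change first, for the routines shown in both listings.
for name, delta in sorted(common.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:14s} {delta:+.2f} s")
print("only in light-load listing:", only_light)
print("only in heavy-load listing:", only_heavy)
```

For the routines shown in both listings the change stays within a couple of
tenths of a second, while __m_join shows up only in the heavy-load excerpt.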
If the load is heavy, then running processes may spend a lot of time
waiting in m_join for processes which are not currently running (ready,
blocked, suspended: whatever the OS people call it when the processor is
currently working on someone else's process).

Possible solutions are to:

1) Install the gang scheduler /pub/gang.tar.Z from maddog.llnl.gov, which
   is accessible via anonymous ftp. We have it installed on a Symmetry
   but I haven't gotten it to work yet. It is designed to synchronize the
   running and ready states of processes in a parallel program,
   eliminating problems such as the one described in the last sentence
   of the previous paragraph.

2) Do timings at night when the load is small. This is what I'm doing
   until I get the gang scheduler to work.

3) Use the output from gprof to figure out the user time. One problem
   with this is that to get a profile, you have to link in profiling
   functions which themselves slow down the code (without the profiler
   functions linked in, the above program ran in about 45 seconds during
   small loads).

I hope this helps some-- let me know how things go.

-- James Bordner
   email:job00542@uxa.cso.uiuc.edu
         bordner@mcs.anl.gov