Path: utzoo!attcan!uunet!husc6!bloom-beacon!gatech!ukma!psuvm.bitnet!cunyvm!nyser!cmx!amax.npac.syr.edu!anand
From: anand@amax.npac.syr.edu (Anand Rangachari)
Newsgroups: comp.sys.encore
Subject: Re: Timing on the Multimax
Message-ID: <832@cmx.npac.syr.edu>
Date: 12 Nov 88 14:57:21 GMT
References: <3273@ucdavis.ucdavis.edu>
Sender: usenet@cmx.npac.syr.edu
Reply-To: anand@amax.npac.syr.edu (Anand Rangachari)
Organization: Northeast Parallel Architectures Center
Lines: 86

In article <3273@ucdavis.ucdavis.edu> finley@iris.ucdavis.edu (Curtis M. Finley) writes:
>I wrote a concurrent program using multitasking and ran the program
>several times varying the amount of work done in the concurrent tasks.
>The program reserves 35MB shared memory but doesn't use most of it. A
>plot of work vs system time shows a "staircase" effect. As more work
>is done, system time increases then levels off, increases then levels
>off, increases then levels off .... The increases are about 0.4 to
>0.6 seconds. The stepping occurs more when there are a large number
>of processes (20) and less when there are fewer (4). Stepping is less
>pronounced when only 3MB shared memory is reserved. One thought I had
>is that the step up is due to time required to keep track of memory
>for multiple processes at the time of a context switch. This argument
>is flawed since we have six processors and the stepping still occurs
>when the number of processes is less than six. Timings were done
>while the machine was practically idle and there should not be any
>context switches.

Amazing, I thought that I was the only one who had a staircase effect.
My program is a kernel for a concurrent language. The speedup is shown
below. Note: Speedup := T(1) / T(n) where n is the number of processes.

[Plot: speedup (vertical axis, levels 1, 2 and 3) against number of
processes (horizontal axis, 1 to 6), with two points plotted at each
speedup level.]

The funny thing is that our Multimax has APCs, in which each processor
has its own cache, which makes this even more mystifying. The effect is
most noticeable in programs that make frequent references to main
memory (i.e. the cache is not effective). It occurs all the way up to
18 processes (we have 18 processors), with no other users on the system.

>
>Does UMAX handle processes started with task_init as a unit? That is,
>are they time-sliced simultaneously or are they completely independent?
>Presumably they are independent but if they are not perhaps that could
>help explain the stepping described above.

This brings up another question about UMAX. I know that on a
single-processor computer running Unix, the CPU is interrupted at
regular intervals by the timer to allow time-slice scheduling. The
question then is: is each processor interrupted every 200 msec for
scheduling, or is just one processor interrupted each time? If the
latter were true, each processor on our 18-processor system would be
interrupted only once every 3.6 seconds (18 x 200 msec).

>
>My timing studies indicate that programs using multitasking have to do
>a tremendous amount of parallel computation to out perform programs
>with a single thread of control. This is mainly due to the large
>system overhead of starting the multitasking environment. Is threads
>significantly better in this respect?

This is what we have been studying for the last year while developing
our concurrent language. The speedup of your program depends on how
much time is wasted. On the Multimax, there are two ways that time can
be wasted:

  1. Waiting for references from main memory.
  2. Waiting on locks.

You can minimize the first by making your threads refer only to very
small segments of memory. The second can be minimized by careful
program design.
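
For anyone who wants to check the two bits of arithmetic used above, here
is a minimal sketch in Python. It only restates the speedup definition
T(1) / T(n) and the 18 x 200 msec interrupt interval. The function names
are hypothetical, not part of UMAX or any package mentioned here, the
timings passed to speedup() are made-up illustrative numbers, and the
interval formula assumes that exactly one processor is interrupted per
timer tick, in rotation.

```python
def speedup(t1, tn):
    """Speedup as defined above: T(1) / T(n).

    t1 is the run time with one process and tn the run time with n
    processes, both in the same unit.
    """
    if tn <= 0:
        raise ValueError("T(n) must be positive")
    return t1 / tn


def per_cpu_interrupt_interval_msec(tick_msec, ncpus):
    """Time between interrupts seen by any one processor.

    Assumption (not confirmed for UMAX): only one processor is
    interrupted on each timer tick, and the ticks rotate evenly over
    all ncpus processors.
    """
    return tick_msec * ncpus


if __name__ == "__main__":
    # 18 processors, one interrupted every 200 msec in rotation.
    print(per_cpu_interrupt_interval_msec(200, 18), "msec")
    # Illustrative timings only: 12 s with 1 process, 4 s with n processes.
    print(speedup(12.0, 4.0))
```

If every processor takes its own 200 msec interrupt instead, the interval
per processor is simply the tick length and the second function does not
apply.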

I have not measured how long it takes to start a thread in the supplied
package, but in our language it takes about 200 microseconds. If a
thread communicates with another thread no more than once a
millisecond, your program will show a nearly linear speedup.

R. Anand
Internet: anand@amax.npac.syr.edu
Bitnet: ranand@sunrise