Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!genrad!decvax!decwrl!amdcad!amdahl!mat
From: mat@amdahl.UUCP
Newsgroups: net.arch,net.unix
Subject: Re: ELXSI System 6400 .... Information needed
Message-ID: <3390@amdahl.UUCP>
Date: Sat, 21-Jun-86 00:49:18 EDT
Article-I.D.: amdahl.3390
Posted: Sat Jun 21 00:49:18 1986
Date-Received: Sun, 22-Jun-86 07:22:00 EDT
References: <203@cybavax.UUCP> <1946@calmasd.CALMA.UUCP> <120@portal.UUcp>
Distribution: net
Organization: Amdahl Corp, Sunnyvale CA
Lines: 38
Xref: watmath net.arch:3535 net.unix:8321

In article <120@portal.UUcp>, jel@portal.UUcp (John Little) writes:
> In article <1946@calmasd.CALMA.UUCP>, rfc@calmasd.CALMA.UUCP (Robert Clayton) writes:
> > a 10 processor test at Sandia Labs they got 10.1X the power of a single
> > processor.
>
> This is an interesting trick.  Does anyone have a clue about how they
> got a greater than linear speedup?  Was this a cpu benchmark or did
> it include i/o?

It was a fixed workload benchmark, running a variety of jobs as I recall.
It included I/O, but didn't measure anything but relative CPU throughput
capacity.  The greater than linear speedup occurs as a result of improved
locality of reference, reduced process switching, and better cache
performance.

The ELXSI machine uses a message based architecture, and has cached
process context (registers, etc.) for 16 processes per processor.  It is
very cheap to switch to a process that has a process slot, and very
expensive to switch to one that doesn't (process 0, the scheduler, must
be woken up to purge one process from its slot and set up the slot for
the new one before the new one can run).  A microcode dispatcher handles
the dispatching of "hot" processes that have a slot.  Anyway, the
existence of more process slots reduces the number of very costly swaps,
and, as a byproduct, reduces cache miss rate, etc.
Net result is that these savings more than offset any interprocessor
interference losses.  Since there is no memory sharing, this interference
is small.

It should be pointed out that the message based architecture induces a
very high process switch rate, which makes these effects quite different
from what would be observed in more traditional systems.  In a sense, the
superlinear speedup comes from reducing overheads that make the
uniprocessor system run "slower than it should."
-- 
Mike Taylor                        ...!{ihnp4,hplabs,amd,sun}!amdahl!mat

[ This may not reflect my opinion, let alone anyone else's. ]
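
For anyone who wants to play with the slot argument above, here is a toy
model, not a description of the real ELXSI scheduler or the Sandia
benchmark.  The only figure taken from the post is 16 process slots per
processor.  Everything else is an assumption made up for illustration:
the cost numbers, the even spread of processes over processors, the
uniformly random choice of the next process to run, and the linear
interference penalty.

```python
SLOTS_PER_CPU = 16  # from the post: cached context for 16 processes per processor


def hot_fraction(n_procs, n_cpus, slots=SLOTS_PER_CPU):
    """Fraction of switches that land on a process already holding a slot.

    Assumption: processes are spread evenly over the processors and the
    next process to run is picked uniformly at random.
    """
    procs_per_cpu = n_procs / n_cpus
    return min(1.0, slots / procs_per_cpu)


def throughput(n_cpus, n_procs, work=100.0, hot_cost=1.0,
               cold_cost=50.0, interference=0.002):
    """Useful work per unit time for the whole machine.

    work, hot_cost, cold_cost and interference are hypothetical numbers,
    chosen only so that a cold switch is far more expensive than a hot one.
    """
    hot = hot_fraction(n_procs, n_cpus)
    # Average switch overhead paid for each burst of useful work.
    switch_cost = hot * hot_cost + (1.0 - hot) * cold_cost
    per_cpu = work / (work + switch_cost)
    # Small linear penalty for each additional processor (assumed form).
    loss = 1.0 - interference * (n_cpus - 1)
    return n_cpus * per_cpu * loss


def speedup(n_cpus, n_procs, **costs):
    """Throughput relative to a single processor on the same workload."""
    return throughput(n_cpus, n_procs, **costs) / throughput(1, n_procs, **costs)


if __name__ == "__main__":
    for n_procs in (16, 20, 40):
        print(n_procs, "processes:", round(speedup(10, n_procs), 2))
```

With these made-up costs, a 20-process workload does not fit in one
processor's 16 slots, so the uniprocessor pays for cold switches and the
10-processor speedup comes out above 10.  A 16-process workload fits in
the slots of a single processor, the slot effect disappears, and the
speedup drops just below 10 because only the interference loss remains.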