Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!caip!topaz!ll-xn!nike!lll-crg!seismo!rochester!ritcv!cci632!rb
From: rb@cci632.UUCP (Rex Ballard)
Newsgroups: net.arch,net.unix
Subject: Re: ELXSI System 6400 .... Information needed
Message-ID: <131@cci632.UUCP>
Date: Thu, 26-Jun-86 11:16:32 EDT
Article-I.D.: cci632.131
Posted: Thu Jun 26 11:16:32 1986
Date-Received: Sat, 28-Jun-86 02:44:27 EDT
References: <203@cybavax.UUCP> <1946@calmasd.CALMA.UUCP> <120@portal.UUcp>
Reply-To: rb@ccird1.UUCP (Rex Ballard)
Distribution: net
Organization: CCI, Rochester Development, Rochester, NY
Lines: 61
Xref: watmath net.arch:3593 net.unix:8416
Summary: How to get 101%, don't count everything.

In article <120@portal.UUcp> jel@portal.UUcp (John Little) writes:
>In article <1946@calmasd.CALMA.UUCP>, rfc@calmasd.CALMA.UUCP (Robert Clayton) writes:
>> a 10 processor test at Sandia Labs they got 10.1X the power of a single
>> processor.
>
>This is an interesting trick. Does anyone have a clue about how they
>got a greater than linear speedup?

Sure, I've seen it several times in several different situations.
The secret is to not count anything other than "CPU" instruction
speed. In reality, there are probably DMA, MMU, and related
controllers that are not included in the MIPS figures. Caching,
asynchronous processing, and CPU time that would normally be spent
doing other things can also be contributing factors.

One really old trick is to use the MMU to do "string moves". This is
especially useful for "pipes" or their equivalents, where you know
that the original is no longer needed.

>Was this a cpu benchmark or did it include i/o?

Any multi-processor benchmark requires at least some I/O, even if it
is just inter-process "pipes". If the single-CPU timings were based
on Dhrystones, but the multi-processor timings were Prolog LIPS or
some similar arrangement, the CPU ratings may actually have been too
low.
Even if the exact same algorithm was used (DMA controllers,...), the
bus contention of DMA to/from the same processor vs. two different
processors would still lead to a small (1%) increase for two. From
my own experience, I'm surprised they only got 1% on 10 processors;
it should have been .3%/processor. Sequent, CCI, and several others
have often found performance increases on certain applications
(esp. the ones they were designed for).

>Can I program my single processor to emulate a
>multiprocessor configuration and get increased performance :-) ?

In a way, yes! By using an ACRTC rather than a "bit-mapped" graphics
display, an X.25 serial link instead of an RS-232 link ('rupts every
block instead of every character), and about 20 other "tricks", you
could actually get 200 times the performance of an equivalent
"CPU-only system". It wouldn't show up in the Dhrystones or
Whetstones, but it would be noticeable to the user.

A number of 68020 and 68010 boxes have "comm boards" that contain
additional processors, including 68008s, 80186s, and others, along
with DMA, local memory (for buffering), and individual lines. These
are usually not taken into consideration when Dhrystones are
compared. A 5/30 is nominally rated at 2 MIPS, but there are a
minimum of 2 additional 1 MIPS processors "hidden" in the controller
boards.

A Sun workstation isn't blindingly fast in Dhrystones, but for
graphics, it would beat a VAX 8600 (if the VAX ran bit-mapped).
A Cray X-MP will beat a 6/32 in number crunching any day, but a 6/32
does databases and file servers extremely well. It's simply a matter
of planning your system architecture for the type of work you intend
to do.

Just to be fair, what benchmarks did they use?
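
For anyone who wants to check the arithmetic behind the "101%" in the
summary line, here is a minimal sketch in Python. The function names
are made up for illustration. The last function assumes one possible
reading of ".3%/processor", namely a gain over linear that grows in
proportion to the processor count; the post does not spell that model
out, so treat it as a guess.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Ratio of single-processor run time to N-processor run time."""
    if t_single <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_single / t_parallel


def efficiency(speedup_factor: float, n_processors: int) -> float:
    """Speedup per processor: 1.0 is exactly linear, above 1.0 is superlinear."""
    if n_processors < 1:
        raise ValueError("need at least one processor")
    return speedup_factor / n_processors


def excess_percent(speedup_factor: float, n_processors: int) -> float:
    """Percent by which the speedup exceeds linear (negative if it falls short)."""
    return (efficiency(speedup_factor, n_processors) - 1.0) * 100.0


def speedup_with_gain(n_processors: int, gain_percent_per_processor: float) -> float:
    """Hypothetical model (an assumption, not from the benchmark itself):
    every processor contributes a fixed percentage gain over linear scaling."""
    gain = gain_percent_per_processor / 100.0 * n_processors
    return n_processors * (1.0 + gain)


if __name__ == "__main__":
    quoted_speedup, n = 10.1, 10  # the Sandia figure quoted in the thread
    print(f"efficiency:         {efficiency(quoted_speedup, n):.3f}")
    print(f"excess over linear: {excess_percent(quoted_speedup, n):.1f}%")
    print(f"at .3%/processor:   {speedup_with_gain(n, 0.3):.1f}X")
```

The quoted 10.1X on 10 processors works out to an efficiency of about
1.01, i.e. 1% over linear. None of this says where the extra 1% comes
from; it only makes the bookkeeping explicit.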