Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!ctrsol!ginosko!usc!bufo.usc.edu!vorbrueg
From: vorbrueg@bufo.usc.edu (Jan Vorbrueggen)
Newsgroups: comp.arch
Subject: Re: parallel systems
Message-ID: <20764@usc.edu>
Date: 24 Oct 89 03:26:50 GMT
Sender: news@usc.edu
Reply-To: vorbrueg@bufo.usc.edu (Jan Vorbrueggen)
Organization: University of Southern California, Los Angeles, CA
Lines: 36

In article <36597@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov (Eugene Brooks) writes:
>Given equivalent performance interconnect, which rarely occurs because the
>message passing machines tend to get short changed on the comm. hardware,
>I have found the "shared memory" systems to have much better communication
>performance. This is because the communication between processors is
>directly supported in the memory management hardware. In the message passing
>machines sending a message invokes a "kernel call" on both the sending and
>receiving ends. This system call overhead is much greater than the hardware
>latency itself, amounting to a factor 5 or more. One could try for complex
>hardware support of messaging, but a better solution is to just memory map it.
>
>Please note: I am not talking about the really horrible interrupt handling
>of message forwarding here. This only compounds a bad situation for kernel
>overhead.

Eugene, ever seen a transputer? The overhead for sending or receiving a
message is 19 cycles (about 630 ns on a 30 MHz part). The actual transfer
is done by a dedicated DMA engine at up to 1.7 Mbyte/s unidirectional or
2.4 Mbyte/s bidirectional per link. With 4 links per transputer this gives
9.6 Mbyte/s, close to what most memory interfaces will allow. Of course,
very short messages will limit your transfer rate; however, at 128 bytes
per message you see about 80% of the maximum rate. There is no system
call involved: the compiler simply generates the necessary instruction.

Message forwarding isn't so difficult either.
I've read of a system
requiring less than 10 us of overhead per through-route (presumably when
the destination link is free). No interrupt handling is involved; that
part is all done in hardware.

The next generation (promised for the start of 1991) will have 100 Mbit/s
per link and the possibility of hardware routing (a la wormhole routing).
The CPU will be faster by a factor of 4 or so, with memory bandwidth to
match.

--
Jan Vorbrueggen
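The transputer link figures in the post can be checked with a little
arithmetic. Below is a minimal Python sketch, assuming a simple model in
which every message pays a fixed cost and then moves at the link's peak
rate. That model, and the function names cycles_to_ns, link_efficiency
and implied_overhead_s, are illustrative assumptions, not anything taken
from transputer documentation. It also assumes the 80% figure refers to
the 1.7 Mbyte/s unidirectional rate.

```python
# Back-of-the-envelope checks of the transputer link figures quoted above.
# ASSUMPTION: a simple fixed-cost-per-message model; the post itself does
# not say how its 80% efficiency figure was derived.

def cycles_to_ns(cycles: int, clock_hz: float) -> float:
    """Time taken by `cycles` clock cycles, in nanoseconds."""
    return cycles / clock_hz * 1e9


def link_efficiency(msg_bytes: int, rate_bytes_per_s: float,
                    fixed_overhead_s: float) -> float:
    """Fraction of the peak link rate achieved when each message pays a fixed cost."""
    t_data = msg_bytes / rate_bytes_per_s
    return t_data / (t_data + fixed_overhead_s)


def implied_overhead_s(msg_bytes: int, rate_bytes_per_s: float,
                       efficiency: float) -> float:
    """Invert link_efficiency: the fixed per-message cost giving `efficiency`."""
    t_data = msg_bytes / rate_bytes_per_s
    return t_data * (1.0 / efficiency - 1.0)


if __name__ == "__main__":
    # 19 cycles on a 30 MHz part (the post rounds this to 630 ns).
    print(f"send/receive overhead: {cycles_to_ns(19, 30e6):.0f} ns")

    # Four links, each 2.4 Mbyte/s bidirectional.
    print(f"aggregate link rate:   {4 * 2.4:.1f} Mbyte/s")

    # Fixed cost that would give 80% of the 1.7 Mbyte/s unidirectional
    # rate at 128 bytes/message, under the assumed model.
    t_fixed = implied_overhead_s(128, 1.7e6, 0.80)
    print(f"implied fixed cost:    {t_fixed * 1e6:.1f} us per message")
    print(f"efficiency check:      {link_efficiency(128, 1.7e6, t_fixed):.2f}")
```

Under this assumed model the 80% figure corresponds to a fixed cost of
roughly 19 us per 128-byte message, so the 630 ns instruction overhead is
presumably only one part of the per-message cost in that measurement.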