Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!ucbvax!biology.cambridge.ac.uk!ARCR1
From: ARCR1@biology.cambridge.ac.uk (Andy Raine)
Newsgroups: comp.sys.transputer
Subject: i860s and the like
Message-ID: <8439.9008081138@prg.oxford.ac.uk>
Date: 8 Aug 90 11:39:00 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 52

Dear netters,

Following the exhibition at TA90 at Southampton, and following previous
discussion on the net, the following has occurred to me, and I offer it as
a topic for debate:

Meiko, Transtech, Microway etc. sell boards that have an Intel i860
processor interfaced to one or two t800s. INMOS have a TRAM which has a
vector coprocessor attached to a t800. All these boards offer what seems to
be impressive cpu performance (albeit, at the moment, only for calculations
that can be expressed in terms of vector processing libraries), but one
thing bothers me.

Consider: Meiko claim that their board can do a 1024x32bit complex FFT in
1.3 ms. INMOS reckon their board will do the same in < 2.0 ms. The data
required for this calculation is 1024x4x2 = 8 kBytes, which would take
(using Meiko's CStools communications) about 8 ms to transfer from one
transputer to another (using occam on a bare link would be faster, but
certainly no less than 4 ms). So if all 8 links of the Meiko board (it has
2 t800s) were saturated, and if communications could be overlapped
completely with calculations, then the vector processor would just about be
busy all the time. For the boards with only 4 links, then only 50%
utilisation of the coprocessor would be expected as a maximum. In realistic
cases, data just won't be available to the processor at these maximum rates.

So what should we conclude? Well, many problems are parallelisable, but
efficient algorithms depend on minimising the time spent in communicating
data.

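The link arithmetic above can be put into a few lines of Python as a rough sanity check. This is only a sketch under stated assumptions: every link carries whole data sets in parallel, communication overlaps perfectly with computation, and traffic for returning results is ignored. The function names are made up for illustration; the timings are the ones quoted above.

```python
# Rough model of how busy a vector/i860 node can be kept by its links.
# Assumptions (not from any vendor documentation): links deliver data sets
# in parallel, transfers overlap fully with computation, and the cost of
# sending results back is ignored.

DATA_BYTES = 1024 * 4 * 2  # 1024 complex points, 4 bytes each for re and im


def feed_time_ms(transfer_ms_per_link: float, n_links: int) -> float:
    """Average time between data set arrivals when all links are saturated."""
    return transfer_ms_per_link / n_links


def max_utilisation(compute_ms: float, transfer_ms_per_link: float,
                    n_links: int) -> float:
    """Upper bound on the fraction of time the coprocessor is computing."""
    return min(1.0, compute_ms / feed_time_ms(transfer_ms_per_link, n_links))


if __name__ == "__main__":
    print("data per FFT:", DATA_BYTES, "bytes")
    # 1.3 ms FFT, 8 ms per transfer (the CStools figure), 8 links vs 4 links
    for links in (8, 4):
        u = max_utilisation(1.3, 8.0, links)
        print(f"{links} links: utilisation <= {u:.2f}")
```

With these figures the 8-link case is compute-bound, while the 4-link case comes out at about 0.65, the same ballpark as the 50% estimate above once result traffic and less-than-perfect overlap are allowed for.
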
The t800 has been referred to as a 'medium grain' processor, meaning that
often a few tens of processors can be brought to bear efficiently on a
particular problem. The new vector/i860 boards are then 'coarse grain'
processors, and for the same problem, the maximum number that can be used
efficiently will be smaller.

I suggest that what is needed for a large number of scientific calculations
is a 'fine grain' processor. In other words, if the ratio of communication
speed to compute speed of the t800 is taken to be 1:1, then what is needed
is a processor where the ratio is 10:1. The i860 and vector boards achieve
a ratio of 1:10 (the wrong way!), and the H1 transputer maintains the t800
ratio at 1:1. If a manufacturer produced boards with a t800 coupled with
link driver hardware that ran at 10 times the speed of the t800's links,
then I would be able to use ten times as many processors, and get 10 times
the performance. What about it?

OK, that's all. I seem to have written quite a lot, but I don't want to
appear to be trying to force my point of view down other people's throats,
just to start a discussion. So what do people think?

Andy Raine