Path: utzoo!attcan!uunet!mcsun!ukc!inmos!roger@wraxall.inmos.co.uk From: roger@wraxall.inmos.co.uk (Roger Shepherd) Newsgroups: comp.sys.transputer Subject: Re: Re 64K Node Machine Message-ID: <4759@ganymede.inmos.co.uk> Date: 19 Mar 90 17:41:21 GMT References: <1828.9003152120@prg.oxford.ac.uk> <19312@cs.yale.edu> Sender: news@inmos.co.uk Reply-To: roger@inmos.co.uk (Roger Shepherd) Organization: INMOS Limited, Bristol, UK. Lines: 74 In article <19312@cs.yale.edu> zenith-steven@cs.yale.edu (Steven Ericsson Zenith) writes: >In article <1828.9003152120@prg.oxford.ac.uk>, >HALLAM@physics.oxford.ac.uk ("Phillip M. Hallam-Baker") writes: >> >> At Southampton there is a big(ish) array of 1260 T212's without external >> RAM (called `Deep Thought' as each board has 42 nodes...)..... > >> what can be done with a 64K NODE machine I don't know - surely the link >> speed would start to be a problem? If not that how about the physical cooling >> /mounting engineering type problems? Where to get the 256 Gigabytes of RAM to >> make it worth while? Sounds like a fun project! > > Now here's an interesting question. What would the characteristics of > such a machine be? Remember, there is only 2kbytes per node, fixed > configuration. I figure the only useful way to program such a machine > would be of the "load problem, compute, unload solution" variety, > perhaps with an exchange with nearest nieghbour in there somewhere. But > then I'm sure for most problems 2K is just not going to be enough. > Anyone know better? This is a very interesting questoion that Steve has raised. The first thing that I would note about such a machine is that 2k really is very little store. This much store very rapidly gets filled up with program. When the 1260 transputer node machine was residing at one of our staffers, Graham Cramp, programmed the machine to do primality testing of Mercenne numbers. This proved to be a difficult problem due to the lack of memory. Programs had to be written so as to minimise code size - roll up all your loops - try and build programs so as to deal with the general case even if it is faster to separate out the problem into disjoint special cases. From my experience working with Graham on this problem I would suggest the following (in addition to SEZ's suggestion) as plausible ideas. 1) Data base retreaval. The machine has a reasonable amount of memory (128 Mbyte). Give half of this to program (generous) this leaves 64 Mbyte. The search problem is compiled, either into t-code, or into something which will be interpreted by a (small) program sitting on each processor. This should work pretty well. The balence between search time and problem distribution seems to be reasonable; the processors can be organised into a 16 level binary treee so a problem needs to be distributed to only 16 nodes (say 0.2 mS?). The search time for a trivial linear search of 1Kbytes is about 1mS, and you need to get the answer back (0.2 mS). As you make the lookup more complex the communication cost should become proportionately smaller. 2) Functional decomposition. In one application run on the Southampton machine they have used 2 processors as a compute node, this allows them to run larger programs. I suspect that for some applications this would work very well. When considering machines like this you really do need to rethink your tradeoffs. For example, in an even more exotic case, we looked at using silicon compilation to build a machine for generating (small, less than 16-bit) prime numbers. We had an architecture which used a number of very simple processor which could perform division and which were handed out work by a controlling processor. The controller passed out odd numbers which were then tested for primality by the workers. The question is ``Should you store previously computed primes so that the workers divide candidates only by primes rather than by all odd numbers?''. For small primes the answer is NO; the density of the primes in the odd numbers is quite large for numbers of less than 16-bits; this has two implications, firstly, there is not that much computation saved by dividing only by prime numbers, and secondly, you need a large RAM to store those primes. It turns out that it is much more efficient to use that silicon area to build worker processors than to build a RAM. Of course, not many people see that sort of ecconomics! Roger Shepherd, INMOS Ltd JANET: roger@uk.co.inmos 1000 Aztec West UUCP: ukc!inmos!roger or uunet!inmos-c!roger Almondsbury INTERNET: roger@inmos.com +44 454 616616 ROW: roger@inmos.com OR roger@inmos.co.uk