Path: utzoo!attcan!uunet!mcsun!ukc!inmos!roger@wraxall.inmos.co.uk
From: roger@wraxall.inmos.co.uk (Roger Shepherd)
Newsgroups: comp.sys.transputer
Subject: Re: Re 64K Node Machine
Message-ID: <4759@ganymede.inmos.co.uk>
Date: 19 Mar 90 17:41:21 GMT
References: <1828.9003152120@prg.oxford.ac.uk> <19312@cs.yale.edu>
Sender: news@inmos.co.uk
Reply-To: roger@inmos.co.uk (Roger Shepherd)
Organization: INMOS Limited, Bristol, UK.
Lines: 74

In article <19312@cs.yale.edu> zenith-steven@cs.yale.edu (Steven
Ericsson Zenith) writes:

>In article <1828.9003152120@prg.oxford.ac.uk>,
>HALLAM@physics.oxford.ac.uk ("Phillip M. Hallam-Baker") writes:
>> 
>> At Southampton there is a big(ish) array of 1260 T212's without external
>> RAM (called `Deep Thought' as each board has 42 nodes...).....
>
>> what can be done with a 64K NODE machine I don't know - surely the link
>> speed would start to be a problem? If not that how about the physical cooling
>> /mounting engineering type problems? Where to get the 256 Gigabytes of RAM to
>> make it worth while? Sounds like a fun project! 
>
> Now here's an interesting question. What would the  characteristics of
> such a machine be? Remember, there is only 2kbytes per node, fixed
> configuration.  I figure the only useful way to program such a machine
> would be of the "load problem, compute, unload solution" variety,
> perhaps with an exchange with nearest nieghbour in there somewhere. But
> then I'm sure for most problems  2K is just not going to be enough.
> Anyone know better?

This is a very interesting questoion that Steve has raised. The first
thing  that I would note about such a machine is that 2k really is very
little store. This much store very rapidly gets filled up with program.
 When the 1260 transputer node machine was residing at one of our
staffers, Graham Cramp,  programmed the machine to do primality testing
of Mercenne numbers. This  proved to be a difficult problem due to the
lack of memory. Programs had to be written so as to minimise code size
- roll up all your loops - try and build programs so as to deal with
the  general case even if it is faster to separate out the problem into
disjoint special cases. From my experience working with Graham on this
problem I would suggest the following (in addition to SEZ's suggestion)
as plausible ideas.

1) Data base retreaval. The machine has a reasonable amount of memory
(128 Mbyte). Give half of this to program (generous) this leaves 64
Mbyte. The search problem is compiled, either into t-code, or into
something which will be interpreted by a (small) program sitting on
each processor. This should work pretty well. The balence between
search time and problem distribution seems to be reasonable; the
processors can be organised into a 16 level binary treee so a problem
needs to be distributed to only 16 nodes (say 0.2 mS?). The search time for
a trivial linear search of 1Kbytes is about 1mS, and you need to get the
answer back (0.2 mS). As you make the lookup more complex the communication
cost should become proportionately smaller.

2) Functional decomposition. In one application run on the Southampton 
machine they have used 2 processors as a compute node, this allows them
to run larger programs. I suspect that for some applications this would
work very well. 

When considering machines like this you really do need to rethink your
tradeoffs. For example, in an even more exotic case, we looked at using
silicon compilation to build a machine for generating (small, less than 
16-bit) prime numbers. We had an architecture which used a number of very
simple processor which could perform division and which were handed out
work by a controlling processor. The controller passed out odd numbers
which were then tested for primality by the workers. The question is 
``Should you store previously computed primes so that the workers divide
candidates only by primes rather than by all odd numbers?''.  For small
primes the answer is NO; the density of the primes in the odd numbers
is quite large for numbers of less than 16-bits; this has two implications,
firstly, there is not that much computation saved by dividing only by 
prime numbers, and secondly, you need a large RAM to store those primes. It
turns out that it is much more efficient to use that silicon area to 
build worker processors than to build a RAM. Of course, not many people
see that sort of ecconomics!


Roger Shepherd, INMOS Ltd   JANET:    roger@uk.co.inmos 
1000 Aztec West             UUCP:     ukc!inmos!roger or uunet!inmos-c!roger
Almondsbury                 INTERNET: roger@inmos.com
+44 454 616616              ROW:      roger@inmos.com OR roger@inmos.co.uk