Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!cmcl2!rutgers!nysernic!itsgw!batcomputer!andy
From: andy@batcomputer.tn.cornell.edu (Andy Pfiffer)
Newsgroups: comp.sys.atari.st,comp.sys.misc,comp.sys.amiga
Subject: (Really Transputers) Re: Atari Transputers
Message-ID: <2554@batcomputer.tn.cornell.edu>
Date: Sun, 4-Oct-87 20:55:12 EDT
Article-I.D.: batcompu.2554
Posted: Sun Oct 4 20:55:12 1987
Date-Received: Wed, 7-Oct-87 07:03:04 EDT
References: <8709181728.AA13664@ucbvax.Berkeley.EDU> <1623@gryphon.CTS.COM> <607@sbcs.UUCP> <1138@water.waterloo.edu> <396@nikhefh.UUCP>
Reply-To: andy@tcgould.tn.cornell.edu (Andy Pfiffer)
Followup-To: comp.sys.transputer
Organization: Cornell Theory Center, Cornell University, Ithaca NY
Lines: 158
Xref: mnetor comp.sys.atari.st:5465 comp.sys.misc:888 comp.sys.amiga:9090

In article <396@nikhefh.UUCP> gert@nikhefh.UUCP (Gert Poletiek) writes:
>I do not dare to compare ALL Risc type processors to the 68*** family of
>processors. I can however comment a bit on the T-series from Inmos.

The T-Series is an entire pseudo-hypercube (the multiplexed Transputer
links actually configure the machine as a ring of hypercubes) from
Floating Point Systems. The smallest module (a T-10, although FPS has
never sold *just* a T-10) contains:

    8 VPU boards   (1MB video RAM, T414-15, Weitek VPU)
    1 system board (1MB video RAM, T414-15, 71MB SCSI disk, 2 UARTS)

T-10s are packed two to a cabinet (a T-20). The machine can be
configured (theoretically, at least) with 16384 VPU boards and 2048
system boards. A T-20 has been observed to run at a sustainable 160 MFLOPS.

A "Transputer" is a single-chip CPU from Inmos, Ltd.

How do I know? Well, let's just say we have a T-Series and are in the
final stages before releasing our message-passing operating system for it
(Trillium). Trillium also happens to run very well on Suns, Vaxen, Goulds,
generic Transputer boxes, and nearly everything we've been able to get
our hands on.

>The T414 is comparable to the 68000 or 68010, and the T800 is comparable
>to the set 68020/68881.

Roughly. Well, sort of... It's the old apples vs. oranges problem...

>All transputers have a memory interface, 2 KiloByte onchip memory (4Kb for
>the T800) and 4 serial interfaces.

The serial interfaces, more commonly known as "links", use DMA
cycle-stealing to transmit/receive data. They are *very* simple to use
(buffer_ptr, link address, byte count, in/out) and the processor makes
*no* distinction between using "soft" channels in memory (use any
addressable word in memory as a rendezvous point for channel
communication between local processes) and external links.

>the micro code level. The serial links run at 20 MegaBits per second.

It also depends on where the data for that transfer is coming from. If
you are transferring bytes from internal memory (on-chip RAM), you will
indeed get about 20Mb/second, provided you count the 2 start bits,
8 data bits, 1 stop bit, and the 2 bit ACK from the other direction.
Link performance is very sensitive to external memory speed and to
communication in the other direction. In general, you can count on
8 to 10 Mb/second sustained if you have 3 or 4 processor cycle external
RAM and are communicating in both directions.

>The T800 is clocked at 20 MHz (and planned for next year? at 30 Mhz). The
>floating point unit operates at a speed of 1.5 MegaFlops. The main processor
>operates at a speed of 10 MIPS.

You'll find that 7.5 to 12.5 MIPS is about what you get for pure
execution out of on-chip RAM (code, data, and stack) on a 20MHz device.
Actual performance is usually less. On the T-Series, the external memory
takes 7 processor cycles to reference (approx. 50 nanoseconds/cycle) and
the MIPS rating on a per-node basis is substantially less (around 3 MIPS).

>The transputers are multi tasking processors. Multitasking is also supported
>by the hardware/micro code, resulting in a context switch that takes no more
>than 2 to 4 MicroSeconds.

Context switches are in fact very fast because only certain instructions
allow a context switch to a process running at the same priority (there
are two priority levels). The reason the switches are *so* fast is that
no context (other than the stack pointer) is saved! It's a simply elegant
(and everything that phrase implies) method for multi-tasking. It is in
fact quite easy to write an infinite loop that will not let other
processes run.

>For processes running concurrently on the same processor the same data
>transfer primitives as those used for the serial links can be used.

Again, simply elegant. This very point is what makes it easy to move a
process from one Transputer to another. That process doesn't have the
slightest idea whether or not it is using a link adaptor.

>... The instructions of the
>transputers are all the same length, making decoding easier and faster.

This isn't exactly true. Transputer instructions range in length from
1 byte to infinite bytes (on a Transputer of infinite word size).
Realistically, you won't find many instructions longer than 8 bytes, with
the large majority of sampled instructions from real programs being 1 or
2 bytes in length.

Decoding is easy because there are only 16 basic instructions, i.e. one
nybble's worth. The most common instructions (load a variable from the
local workspace, store a local variable, comparisons, branching, call,
etc.) fit in 1 byte if the operand is valued from 0 to 15 inclusive.
There are 2 other instructions, prefix and nfix, that load their value
into the operand register and shift it left by 4. That is how the same
code for a T212 (16 bit transputer) can run unmodified on a T414. In most
cases the reverse is also true (obviously constraining memory to the
minimum of the two processors).

>Also there are a lot less instructions than in the 68***.

When last I checked, our simulator had 99 of 102 T414 instructions
implemented.

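To make the prefix/nfix mechanism above concrete, here is a minimal Python
sketch (not Inmos code, and not taken from our simulator). It assumes the
function codes from Inmos's published instruction set (pfix = 2, nfix = 6,
ldc = 4); none of those values appear in this post, and the names encode
and decode are made up for illustration. encode builds the byte sequence
for a direct instruction with an arbitrary operand; decode replays the
operand-register logic for a chosen word size.

```python
# Assumed function codes (upper nibble of each instruction byte).
PFIX, NFIX, LDC = 0x2, 0x6, 0x4


def encode(fn, operand):
    """Return the bytes for direct function `fn` applied to `operand`."""
    if 0 <= operand < 16:
        # Small operands fit in the low nibble: a one-byte instruction.
        return bytes([(fn << 4) | operand])
    if operand >= 16:
        # Push the upper bits through one or more pfix bytes first.
        head = encode(PFIX, operand >> 4)
    else:
        # Negative operands start from the complemented upper bits via nfix.
        head = encode(NFIX, (~operand) >> 4)
    return head + bytes([(fn << 4) | (operand & 0xF)])


def decode(code, word_bits=32):
    """Replay the operand register over one instruction; return (fn, operand)."""
    mask = (1 << word_bits) - 1
    oreg = 0
    for byte in code:
        fn, data = byte >> 4, byte & 0xF
        oreg |= data                      # low nibble always lands in the register
        if fn == PFIX:
            oreg = (oreg << 4) & mask     # shift left by 4
        elif fn == NFIX:
            oreg = (~oreg << 4) & mask    # complement, then shift left by 4
        else:
            if oreg >> (word_bits - 1):   # reinterpret as a signed word
                oreg -= 1 << word_bits
            return fn, oreg
    raise ValueError("byte sequence ends before a non-prefix instruction")
```

With these assumptions, encode(LDC, 5) is the single byte 0x45, while
encode(LDC, 0x754) needs three bytes (0x27 0x25 0x44). The same bytes
decode to the same operand with word_bits=16 or word_bits=32 as long as
the value fits in the smaller word, which is the T212/T414 point made
above.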
For those wondering how you get 102 instructions from 1 nybble, let me
say that one of the basic instructions is an "indirect" instruction: it
operates on the value in the operand register. My best guess is that
there is an internal microcodeish jump table burned into an inaccessible
on-chip ROM...

Floating point instructions on the T800 (the IEEE floating point stuff is
actually a separate processor within the chip carrier and can operate
concurrently with the Transputer) are executed in much the same way.

>The following table is published in several Inmos publications on the
>transputer performance (note that this is not a MIPS/Flops rating, but a more
>or less 'real-life' rating):
>
>    processor            clock        Whetstones/second
>
>    Intel 80286/80287    8 MHz         300 000
>    Inmos T414           20 Mhz        663 000
>    NS 32332/32081       15 Mhz        728 000
>    MC 68020/68881       16/12 Mhz     755 000
>    Fairchild Clipper    33 Mhz       2220 000
>    Inmos T800           20 Mhz       4000 000
>    Inmos T800           30 Mhz       6000 000
>

Caveat Emptor! INMOS, like most manufacturers in my experience, tends to
oversell its estimates. I've seen this table too, and the Transputer
figures are gathered from the best possible cases (program written in
Occam; program, stack, and any data running from on-chip RAM or INMOS
fast static RAM; etc.).

If it helps, Jeff Mock from Pixar is quoting about 3200 Dhrystones for a
T414-20 with 3, maybe 4 cycle RAM (Hi Jeff, are you out there?) with the
same C compiler we use. Our values (on a per-processor scale) are
considerably slower due to the T-Series 7 cycle RAM and our operating
system overhead (still present, but *very* small). His benchmarks run
"bare" with no real operating system under them.

>Gert Poletiek
>NIKHEF-H, Dutch National Institute for Nuclear and High Energy Physics
> Kruislaan 409, P.O.Box 41882, 1009 DB Amsterdam, The Netherlands

In short, those who have worked at the lowest level on a Transputer have
found it difficult to move to anything else. We've been bewildered by
its simplicity and elegance.
INMOS did more than a few things right with this one...

Andy Pfiffer

ps: for those not receiving comp.sys.transputer, you are invited to join
our transputer mailing list (transputer-request@tcgould.tn.cornell.edu).
Apologies to those 45 or so stacked up requests to join; it's pretty
rough around here as we get ready for release 1.0 of Trillium...

-- 
Andy Pfiffer                                    andy@tcgould.tn.cornell.edu
Cornell Theory Center / Cornell U.              cornell!batcomputer!andy
Home of the first usable T-Series               (607) 255-8686
"...that's the way a Transputer works, right?"  Systems Group