Path: utzoo!utgpu!water!watmath!clyde!rutgers!ucla-cs!zen!ucbcad!pasteur!ucbvax!ucsfcgl!pixar!unicom!physh
From: physh@unicom.UUCP (Jon 'Quality in - Quantity out' Foreman)
Newsgroups: comp.arch
Subject: Performance increase - a suggestion
Summary: increase bandwidth ?= increase performance?
Keywords: bandwidth datapath 128
Message-ID: <235@unicom.UUCP>
Date: 14 Jan 88 12:55:15 GMT
Reply-To: physh@unicom.UUCP
Organization: Halcon Co. et al., via College of Marin in California.
Lines: 59

I don't think the real world really needs anything more expansive
than a 32 bit processor to get most jobs done.  Given that (even if
for argument's sake only), here are my ideas of a possible way to
increase performance for relatively little cost.

It seems to me that one way to get a further increase in performance,
without increasing clock speeds and thus memory and other chip costs,
would be to leave the processor at 32 bits and increase the external
data path width to, say, 128 bits.  Then it may be possible to get at
least one instruction per fetch.  Since most computers fetch (mostly)
one instruction per operation, this would seem to translate into a
3 times plus increase in performance.  (I'm probably wrong here, but
in which direction I have no idea.)

Data, as seen by the processor, would still look like 32 bits, but
with a single (and almost free) external data word cache, you could
also increase the performance of sequential data read operations (say
a sequential table lookup), because your one 32 bit fetch would net
you four 32 bit words which you can use later.  This would probably
not work as well when writing to memory, because of DMA, etc.

One additional advantage of such an approach would be that you could
increase the clock rate on the CPU and still have lots of memory
bandwidth.

Implementation of DMA devices may be tricky, and it may help to have
the DMA devices view the memory array as 8 or 16 bits wide.
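
As a rough illustration of the fetch-count argument above, here is a
minimal Python sketch.  It is a toy model, not a description of any
real machine: the function name count_bus_fetches is hypothetical, and
it assumes an ideal aligned memory, a single buffer holding the most
recently fetched bus-wide line (the "external data word cache"), reads
only, and no DMA or instruction/data interleaving.

```python
def count_bus_fetches(word_addresses, bus_bits=128, word_bits=32):
    """Count external bus fetches needed to read the given word addresses.

    Assumption (not from the post): a single-entry buffer keeps the most
    recently fetched bus-wide line, so further reads that fall inside the
    same line cost no extra bus fetch.
    """
    words_per_line = bus_bits // word_bits  # 4 words for a 128 bit bus
    buffered_line = None                    # which line the buffer holds
    fetches = 0
    for addr in word_addresses:
        line = addr // words_per_line       # bus-wide line holding this word
        if line != buffered_line:           # buffer miss: go out on the bus
            fetches += 1
            buffered_line = line
    return fetches


# Sequential table lookup over 64 consecutive 32 bit words.
sequential = list(range(64))
# 64 reads that each land in a different 128 bit line (stride of 4 words).
strided = list(range(0, 256, 4))

narrow_sequential = count_bus_fetches(sequential, bus_bits=32)
wide_sequential = count_bus_fetches(sequential, bus_bits=128)
wide_strided = count_bus_fetches(strided, bus_bits=128)

print("32 bit bus, sequential: ", narrow_sequential)
print("128 bit bus, sequential:", wide_sequential)
print("128 bit bus, strided:   ", wide_strided)
```

Under these assumptions the wide bus needs one quarter of the fetches
for purely sequential reads, which is the best case; the strided
pattern shows that when accesses do not fall in the same 128 bit line,
the wide path saves nothing in fetch count.  Real gains would depend on
how sequential the instruction and data streams actually are.
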
Also, I/O instructions (like on the 8086 family) would have to be
worked out somehow.

There is one other apparent bonus: you could fetch an IEEE temporary
real number in one fetch, but compared to compute time, this may not
really be all that wonderful.

There is at least one obvious downside, however.  Say one chooses a
128 bit data path; one would have to supply at least that number of
pins on the chip to handle it.  With control signals and all, you
could end up needing 200 pins on the chip.  One could also argue that
laying out such a thing on a circuit board would be a real drag, but I
tend not to get too excited about this, mostly because of the reality
of multilayer circuit boards.

--------------

Ok, gang.  I don't have the resources or know-how to implement such a
thing, so in the interests of learning, please feel free to pick me to
pieces.  I really hope this is a good idea, because I am already
feeling that my 16 MHz '286 box is too slow, and I really can't afford
a machine that uses 20 ns RAM. :-)

Has anyone experimented with this, or implemented it?  I at least
would like to know.

Jon Foreman
--
ucbvax!pixar!\            | For small letters  | ~~~~~~~\~~~   That's spelled
hoptoad!well!unicom!physh |  ( < 10K ) only:   |  Jon  }()      "physh" and
       ptsfa!/            | physh@ssbn.wlk.com |        /    pronounced "fish".