Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!zaphod.mps.ohio-state.edu!samsung!munnari.oz.au!bruce!labtam!scott
From: scott@labtam.labtam.oz (Scott Colwell)
Newsgroups: comp.arch
Subject: more registers for ix86, was: Let's pretend
Keywords: Intel, 586, windows
Message-ID: <5827@labtam.labtam.oz>
Date: 3 Jan 91 00:02:44 GMT
References: <3042@crdos1.crd.ge.COM> <1990Dec26.020034.4131@lpi.liant.com>
Organization: Labtam Australia, Melbourne, Australia
Lines: 45

rcg@lpi.liant.com (Rick Gorton) writes:
>> What features should be put into the CPU to improve performance and
>>reduce chip count?
>>
>SOME REGISTERS!!!!!

As has been pointed out, adding more registers to the ix86 would be
almost impossible (well, impractical anyway) due to the current
instruction encoding.

This is not really the only way of addressing the problem; current
compilers use stack frame variables after the scarce registers have
been allocated. On the 486, reg to reg operations take one clock,
cached memory to reg take two and reg to cached mem take three.
By improving this performance and making some changes to the cache
allocation scheme, cache can and does compensate for the lack of
general purpose registers.

It becomes interesting when you consider whether the extra byte of
offset from the start of the stack frame that is required in the
instruction sequence is significant, and whether the other proposed
scheme of another escape byte to allow access to new GP registers
is better.

As microprocessor performance continues to stretch memory bandwidth
and latency, keeping the code density reasonably high will always be a
worthy aim. (As long as the data bandwidth isn't stupidly high due to
lack of registers :-)

Some ideas on how to make the on chip cache work better for stack frames:

Allocate on writes for stack accesses. The 486 allocates on reads
only, which means that an automatic variable is guaranteed to be a
cache miss on the first read.
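
To make the allocate-on-write idea a little more concrete, here is a
rough sketch of a toy direct-mapped cache in Python. It is not a model
of the real 486 cache: the 16-byte line size, the 64-line cache, the
frame address and the names TinyCache and first_use_read_misses are
all assumptions made up for illustration. It only counts read misses
when a routine first writes its automatic variables and then reads
them back.

```python
LINE_BYTES = 16  # assumed line size for this sketch


class TinyCache:
    """Toy direct-mapped cache that tracks tags only (hypothetical model)."""

    def __init__(self, num_lines, allocate_on_write):
        self.num_lines = num_lines
        self.allocate_on_write = allocate_on_write
        self.tags = [None] * num_lines  # one tag per line, None means empty
        self.read_misses = 0

    def _lookup(self, addr):
        line_addr = addr // LINE_BYTES
        return line_addr % self.num_lines, line_addr

    def write(self, addr):
        index, line_addr = self._lookup(addr)
        # A write miss fills the line only under the allocate-on-write policy.
        if self.tags[index] != line_addr and self.allocate_on_write:
            self.tags[index] = line_addr

    def read(self, addr):
        index, line_addr = self._lookup(addr)
        if self.tags[index] != line_addr:
            self.read_misses += 1
            self.tags[index] = line_addr  # both policies allocate on a read miss


def first_use_read_misses(allocate_on_write, num_locals=8, frame_base=0x7000):
    """Write each 4-byte local once, read each once, return the read misses."""
    cache = TinyCache(num_lines=64, allocate_on_write=allocate_on_write)
    addrs = [frame_base - 4 * (i + 1) for i in range(num_locals)]
    for addr in addrs:
        cache.write(addr)  # initialise the automatic variable
    for addr in addrs:
        cache.read(addr)   # first read of the variable
    return cache.read_misses


if __name__ == "__main__":
    print("read-allocate only:", first_use_read_misses(False))
    print("allocate on write: ", first_use_read_misses(True))
```

With these assumed numbers, eight 4-byte locals span two 16-byte
lines, so the read-allocate-only policy takes two read misses and the
allocate-on-write policy takes none. The sketch ignores timing, write
buffers and anything already in the cache from earlier calls.
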
The cpu can tell a stack address from data and code, since it uses a
different segment reg. If this status is propagated as the code/data
status is, the cache could alter its behaviour for stack accesses.

As the on chip cache becomes larger, it may be worth the effort to
have a special cache for the stack. It would be acceptable for it to
not snoop _if_ it was guaranteed to only cache things on the stack.
Using virtual tags might be an acceptable trade-off if the miss
penalty was not too high after a flush. Filled from the on chip data
cache?

(someone will let me know if I have presumed too much here :-)

-- 
Scott Colwell
Senior Design Engineer
Labtam Australia        net: scott@labtam.oz.au
Melbourne, Australia    phone: +61-3-587-1444