Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!crdgw1!crdos1!davidsen
From: davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr)
Newsgroups: comp.arch
Subject: Re: RISC vs. CISC -- SPECmarks
Message-ID: <3401@crdos1.crd.ge.COM>
Date: 3 May 91 14:14:39 GMT
References: <11412@mentor.cc.purdue.edu> <1991May2.171755.18612@riacs.edu>
Reply-To: davidsen@crdos1.crd.ge.com (bill davidsen)
Organization: GE Corp R&D Center, Schenectady NY
Lines: 26

In article <1991May2.171755.18612@riacs.edu> lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) writes:

| I believe that vector instructions
| would actually prove to be *much* easier to implement than a CPU with 20+
| pending loads and stores, issuing five new instructions per CPU cycle...

  Consider for a moment a smart cache which does prefetch under
condition {X}. Perhaps as simple as prefetching the next row whenever
the last word (defined by the size of the current data fetch) is fetched
from a row and the previous word has also been accessed. This requires
an accessed flag as well as the usual dirty flag, but is not inherently
something hard to do in the cache control. This works nicely for
instruction fetches, too.

  The trigger could be the next-to-last word if latency requires.
Obviously there's a tradeoff between slowing the CPU and using memory
bandwidth to fetch data which are not used.

  A "preload cache" instruction, or a bit which switches the cache
into the above behavior, are other possibilities.

  A vector unit is an interesting coprocessor for a system, and could
be implemented to allow multiple units to be detected and used by the
CPU. That would make parallel vector processing flexibly extensible.
-- 
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"Most of the VAX instructions are in microcode, but halt and no-op are in
hardware for efficiency"
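
  The prefetch rule described above (fetch the next row when the
trigger word of a row is read and the word before it has already been
accessed) can be sketched as a toy software model. This is only an
illustration under assumptions that are not in the post: a row of 4
words, a per-word accessed flag, unlimited capacity with no eviction,
and read-only traffic. The names PrefetchCache, Row, WORDS_PER_ROW and
trigger_word are made up for the sketch.

```python
from dataclasses import dataclass, field

WORDS_PER_ROW = 4  # assumed row (cache line) size in words


@dataclass
class Row:
    # One accessed flag per word, kept alongside the usual dirty flag.
    accessed: list = field(default_factory=lambda: [False] * WORDS_PER_ROW)
    dirty: bool = False  # unused here: the sketch models reads only


class PrefetchCache:
    """Toy model: unlimited capacity, no eviction, reads only."""

    def __init__(self, trigger_word=WORDS_PER_ROW - 1):
        # trigger_word is the word index whose read may start a prefetch:
        # the last word by default, or an earlier one to hide latency.
        self.rows = {}
        self.trigger_word = trigger_word
        self.misses = 0
        self.prefetches = 0

    def read(self, word_addr):
        row_no, word = divmod(word_addr, WORDS_PER_ROW)
        if row_no not in self.rows:
            # Demand miss: the CPU would stall here for a memory fetch.
            self.misses += 1
            self.rows[row_no] = Row()
        row = self.rows[row_no]
        row.accessed[word] = True

        # Prefetch only when the trigger word is read AND the previous
        # word was already accessed, i.e. the access looks sequential.
        looks_sequential = word > 0 and row.accessed[word - 1]
        if (word == self.trigger_word and looks_sequential
                and row_no + 1 not in self.rows):
            self.rows[row_no + 1] = Row()
            self.prefetches += 1


# Sequential walk over 16 words: only the first row is a demand miss.
seq = PrefetchCache()
for addr in range(16):
    seq.read(addr)

# Stride-4 walk touching only the last word of each row: the previous
# word is never accessed, so no bandwidth is spent on prefetching.
strided = PrefetchCache()
for addr in (3, 7, 11, 15):
    strided.read(addr)
```

  In this model the sequential walk takes 1 miss and issues 4
prefetches, while the strided walk takes 4 misses and issues none,
which is the tradeoff mentioned above: the accessed flag keeps the
cache from spending memory bandwidth when the pattern does not look
sequential. Passing trigger_word=2 models the next-to-last-word
variant.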