Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!crdgw1!crdos1!davidsen
From: davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr)
Newsgroups: comp.arch
Subject: Re: RISC vs. CISC -- SPECmarks
Message-ID: <3401@crdos1.crd.ge.COM>
Date: 3 May 91 14:14:39 GMT
References: <TH_A6-F@xds13.ferranti.com> <11412@mentor.cc.purdue.edu> <MCCALPIN.91May2095930@pereland.cms.udel.edu> <1991May2.171755.18612@riacs.edu>
Reply-To: davidsen@crdos1.crd.ge.com (bill davidsen)
Organization: GE Corp R&D Center, Schenectady NY
Lines: 26

In article <1991May2.171755.18612@riacs.edu> lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) writes:

|                                       I believe that vector instructions
| would actually prove to be *much* easier to implement than a CPU with 20+
| pending loads and stores, issuing five new instructions per CPU cycle...

  Consider for a moment a smart cache which prefetches under some
condition {X}. The condition could be as simple as this: prefetch the
next row whenever the last word of a row (a word being the size of the
current data fetch) is fetched and the previous word in that row has
also been accessed. This requires an accessed flag as well as the usual
dirty flag, but is not inherently hard to do in the cache control
logic. It works nicely for instruction fetches, too. The trigger could
be the next-to-last word instead, if latency requires. Obviously
there's a tradeoff between slowing the CPU and using memory bandwidth
to fetch data which are never used.
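
  As a rough sketch of that prefetch rule (not a description of any
real cache), here is a toy model in Python. The row size, the per-word
accessed set standing in for the accessed flag, the lack of eviction and
writes, and every name in it (PrefetchCache, trigger_offset, and so on)
are assumptions made purely for illustration.

```python
WORDS_PER_ROW = 4  # assumed row size in words; purely illustrative


class PrefetchCache:
    """Toy read-only cache model: unbounded, no eviction, no dirty state."""

    def __init__(self, words_per_row=WORDS_PER_ROW, trigger_offset=None):
        self.words_per_row = words_per_row
        # Offset of the word whose fetch triggers the prefetch. Defaults to
        # the last word of the row; pass words_per_row - 2 to trigger on the
        # next-to-last word when memory latency calls for an earlier start.
        if trigger_offset is None:
            trigger_offset = words_per_row - 1
        self.trigger = trigger_offset
        self.rows = {}       # row number -> set of accessed word offsets
        self.misses = 0      # demand fetches (the CPU had to wait)
        self.prefetches = 0  # rows brought in ahead of demand

    def access(self, word_addr):
        """Record a read of one word and apply the prefetch rule."""
        row, offset = divmod(word_addr, self.words_per_row)
        if row not in self.rows:
            self.misses += 1
            self.rows[row] = set()
        accessed = self.rows[row]
        accessed.add(offset)
        # Condition {X}: the trigger word is being fetched and the word
        # just before it in the same row has already been accessed.
        if (offset == self.trigger
                and (offset - 1) in accessed
                and (row + 1) not in self.rows):
            self.prefetches += 1
            self.rows[row + 1] = set()


if __name__ == "__main__":
    seq = PrefetchCache()
    for addr in range(16):          # sequential scan of 16 words
        seq.access(addr)
    print(seq.misses, seq.prefetches)

    strided = PrefetchCache()
    for addr in (3, 7, 11, 15):     # touches only the last word of each row
        strided.access(addr)
    print(strided.misses, strided.prefetches)
```

  In this model the sequential scan takes one demand miss and issues
four prefetches, the last of which is never used; that wasted fetch is
the bandwidth side of the tradeoff. The strided walk never satisfies
the "previous word accessed" test, so it issues no prefetches and
misses on every row.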

  A "preload cache" instruction, or a bit which switches the cache into
the above behavior, is another possibility.

  A vector unit is an interesting coprocessor for a system, and could be
implemented to allow multiple units to be detected and used by the CPU.
That would make parallel vector processing flexibly extensible.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"