Path: utzoo!attcan!uunet!zaphod.mps.ohio-state.edu!usc!ucla-cs!fiji.cs.ucla.edu!marc
From: marc@fiji.cs.ucla.edu (Marc Tremblay)
Newsgroups: comp.arch
Subject: Re: load delays on SPARC
Message-ID: <1990Nov13.012526.3415@cs.ucla.edu>
Date: 13 Nov 90 01:25:26 GMT
References: <1990Nov12.232129.21399@Neon.Stanford.EDU>
Sender: news@cs.ucla.edu (Mr. News)
Organization: UCLA Computer Science Department
Lines: 32
Nntp-Posting-Host: fiji.cs.ucla.edu

In article <1990Nov12.232129.21399@Neon.Stanford.EDU> hoelzle@Neon.Stanford.EDU (Urs Hoelzle) writes:
>OK, here's something I've been wondering about for a long time: on the
>SPARC, loads take *always* at least 2 cycles, and stores take 3 (in
>all the Suns I know of). Even if the load's result isn't used by the
>next instruction. Why???

Remember that SPARC offers the flexibility of base register plus
index register memory addressing. This means that at some point
in the pipeline *two* registers must be read just to generate the
address. The register that contains the data must also be read,
which means that a store requires *three* register reads.
The register file of early implementations of the SPARC architecture
had only two read ports, i.e. two register reads per cycle.
The address can be generated during one cycle, and the register
containing the data can then be read in the following cycle.
The third cycle consists of actually storing the data off-chip
(there is no on-chip cache), hence the long latency.

For loads, early implementations of the SPARC (once again) had
a single set of address and data busses multiplexed between
instructions and data. This means that while the address of the
load goes out and the data comes back, no instruction can be
fetched. This results in an extra cycle of latency.

These implementation "features" can be overcome by dedicating
a bit more silicon and more pins to the processor chip.

_________________________________________________
Marc Tremblay
internet: marc@CS.UCLA.EDU
UUCP: ...!{uunet,ucbvax,rutgers}!cs.ucla.edu!marc
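
The cycle accounting in the post above can be sketched as a toy model.
This is only an illustration under stated assumptions, not a description
of any particular chip: the function names (store_cycles, load_cycles)
and their parameters are hypothetical, and the model counts nothing but
register-file read ports and the shared instruction/data bus mentioned
in the post.

```python
import math

# Assumption taken from the post: early SPARC register files could
# perform two register reads per cycle.
READ_PORTS = 2


def store_cycles(read_ports=READ_PORTS, address_regs=2, data_regs=1):
    """Toy cycle count for a store with base+index addressing.

    Register reads are spread over as many cycles as the read ports
    require, then one more cycle drives the data off-chip.
    """
    reg_reads = address_regs + data_regs              # 3 reads for reg+reg store
    read_cycles = math.ceil(reg_reads / read_ports)   # 2 cycles with 2 ports
    return read_cycles + 1                            # +1 for the off-chip write


def load_cycles(shared_bus=True):
    """Toy cycle count for a load.

    One cycle for the load itself, plus one cycle in which the shared
    bus carries the data access, so no instruction can be fetched.
    """
    return 1 + (1 if shared_bus else 0)


if __name__ == "__main__":
    print("store, 2 read ports:", store_cycles())
    print("load, shared bus:   ", load_cycles())
    print("load, separate bus: ", load_cycles(shared_bus=False))
```

With the defaults the model reproduces the figures from the question
(3 cycles for a store, 2 for a load). Changing read_ports or shared_bus
only shows how the count shifts inside this simplified model; it says
nothing about what a real redesign would cost or achieve.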