Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!sri-unix!sri-spam!ames!oliveb!pyramid!prls!mips!earl
From: earl@mips.UUCP
Newsgroups: comp.arch
Subject: Re: 32-bit CPUs ( NEC V70 ) and silly examples
Message-ID: <407@gumby.UUCP>
Date: Wed, 20-May-87 12:19:22 EDT
Article-I.D.: gumby.407
Posted: Wed May 20 12:19:22 1987
Date-Received: Fri, 22-May-87 00:45:59 EDT
References: <3810030@nucsrl.UUCP> <491@necis.UUCP> <3530@spool.WISC.EDU> <3962@cae780.TEK.COM>
Distribution: na
Lines: 25
Keywords: V60, V70, not so silly examples
Summary: but how fast is the instruction?

I agree with Scott Daniels and Ross Alexander that a->b->c and such
are definitely not silly examples.  I write such constructs
frequently.  But that does not necessarily mean it is a good idea to
add an instruction to implement them.  Perhaps someone with a data
sheet can post the cycle count for these instructions so we can
compare.

An R2000 will do a load of or a store to a->b->c in 2 - 4 cycles
depending on how well the load delays are scheduled (we typically
schedule 75% of these so say 2.5 cycles).  a->b->c->d takes 3 - 6 cycles (3.75).
I'm assuming a is in a register, which with the MIPS compiler is a
fairly safe assumption.

The ability to schedule the load delays is an excellent reason NOT to
provide such an addressing mode.  If you implement the mode, you'll
just find your microcode waiting all the time.  If you generate
separate instructions and let the compiler schedule them, then most of
the time you won't wait at all.

Note that I'm assuming that hardware can't take the output of the
cache, do an add to get the new address, perhaps translate it, and
feed it back to the cache in a single cycle.  If it took a single
cycle, I'd say the cycle time was artificially slow.  The R2000 takes
two cycles to do this, so loads have a delay of one cycle before the
result is usable.