Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!lll-winken!uunet!portal!cup.portal.com!bcase
From: bcase@cup.portal.com (Brian bcase Case)
Newsgroups: comp.arch
Subject: Re: How to use silicon (was Re: Intel/MIPS Dhrystone ratio)
Message-ID: <16156@cup.portal.com>
Date: 23 Mar 89 19:25:04 GMT
References: <37196@bbn.COM> <1989Mar16.190043.23227@utzoo.uucp> <24889@amdcad.AMD.COM> <355@bnr-fos.UUCP> <27600@apple.Apple.COM> <16080@cup.portal.com> <27711@apple.Apple.COM>
Organization: The Portal System (TM)
Lines: 38

>If auto-increment is frequent enough, then it can be done in addition to
>executing *any two* operations at once. The leverage really hits you- a
>10 inst. loop, including a couple of auto-incs. shrinks to 5 if you can
>average two instructions/cycle. At very little cost in hardware (I assert
>this as a hardware design type), maybe this shrinks to 4 insts., a 20%
>saving. Try to get 20% some other way- it's real tough! Your mileage may
>vary, of course.

Allen, you got me. (This is quite fair, since I said the same thing,
"try getting 20% some other way," about something I felt strongly about
in an internal report when Allen and I worked for the same company!)
You are quite right, if it really does make a 20% difference. However,
"your mileage may vary" is the right caveat: can that loop really be
executed at 2 inst. per cycle if some of the parallelism is taken away
by adding the autoinc? I don't know the answer; I am just in violent
agreement with you (and John Mashey): you must simulate and measure and
think, and then be able to predict the future :-).

>I did have something in mind for that hardware. I dispute the significant cost
>issue- it is roughly equivalent to register scoreboarding logic, and if you
>have that, the additional cost is small (again, I assert this in my capacity
>as a hardware design type that has gone through the exercise). I didn't
>conjecture that it might be used for something else, I know it can, and I
>know the kind of speedup it will give me, as well as the extra cost to use
>it for that something else. This is an exercise for the reader- Part A: what
>can an extra write port to a register file be used for (and what other hardware
>is required to make it useful)? Part B: Now, suppose this extra write port can
>be a read/write port?

Oh, if you already have the answer, and know what else it will speed
up, then I stand corrected. Maybe I am thinking about a different set
of implementation trade-offs. What is the answer to your exercise?
Does it have anything to do with loads/stores?

I don't mean to say that autoincrement is *absolutely* wrong; I don't
know all the possible implications for every
architectural-cross-implementation approach. But without proof that it
is good, I tend to be skeptical. (I guess you can tell! ;-) :-).
Will/can you say how it fits in and why it is very good to have?
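
For anyone who wants to play with the numbers in the quoted claim, here
is a rough sketch of the arithmetic. It assumes an idealized machine
with no stalls, where cycles per iteration are simply instructions
divided by the average issue rate. The function and variable names are
made up for illustration; the only figures taken from the post are the
10-instruction loop, the "couple" (taken here as 2) of auto-increments,
and the 2 inst./cycle average.

```python
def cycles_per_iteration(issued_insts, avg_ipc):
    """Idealized cycles for one loop iteration: no stalls, no branch cost."""
    return issued_insts / avg_ipc


LOOP_INSTS = 10      # loop body size from the quoted example
AUTOINC_INSTS = 2    # "a couple of auto-incs" (assumed to mean exactly 2)
AVG_IPC = 2.0        # average of two instructions per cycle

# Case 1: the auto-increments occupy ordinary issue slots.
baseline_cycles = cycles_per_iteration(LOOP_INSTS, AVG_IPC)

# Case 2: the auto-increments ride along for free, so only 8 instructions
# compete for issue slots, and the issue rate is assumed to stay at 2.
folded_insts = LOOP_INSTS - AUTOINC_INSTS
folded_cycles = cycles_per_iteration(folded_insts, AVG_IPC)

saving = (baseline_cycles - folded_cycles) / baseline_cycles

# The caveat: if folding removes parallelism, the remaining 8 instructions
# may not sustain 2 per cycle. This is the issue rate at which the folded
# loop takes exactly as long as the baseline, i.e. the saving vanishes.
break_even_ipc = folded_insts / baseline_cycles

print(baseline_cycles, folded_cycles, saving, break_even_ipc)
```

Under these assumptions the loop goes from 5 cycles to 4, the 20% in
the quote, and the saving is gone entirely if the remaining eight
instructions average only 1.6 per cycle. Whether a real loop stays near
2 or drops toward 1.6 is exactly what has to be simulated and measured.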