Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!lll-lcc!pyramid!prls!mips!mash
From: mash@mips.UUCP
Newsgroups: comp.arch,comp.lang.c
Subject: Re: String Processing Instruction
Message-ID: <232@winchester.mips.UUCP>
Date: Fri, 27-Mar-87 03:54:29 EST
Article-I.D.: winchest.232
Posted: Fri Mar 27 03:54:29 1987
Date-Received: Sat, 28-Mar-87 11:52:07 EST
References: <15292@amdcad.UUCP> <978@ames.UUCP> <15304@amdcad.UUCP>
Reply-To: mash@winchester.UUCP (John Mashey)
Distribution: na
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 96
Keywords: instruction set architectures, Am29000
Xref: utgpu comp.arch:676 comp.lang.c:1350

In article <15304@amdcad.UUCP> bcase@amdcad.UUCP (Brian Case) writes:
>...
>for maybe 40bytes/5Megabytes or .008% of all code (obviously a real rough
>guess).  But if it improves the running time of all the string-oriented
>system utilities (i.e. almost all utilities!!) by 15% to 20%, it seems
>worth it.  And the implementation cost was so small.  Also, there are

This is clearly true.

>some instructions that must be present just to administer the system,
>like return from interrupt, move-from-special-register, etc.  These
>are not generated by a compiler either.  Just to reiterate a point:  RISC
>is not reducing the instruction set but is improving performance.

Absolutely!  There are almost always "structural" instructions that are
hard to generate, but are needed.

>
>Ok, so you don't believe the above?  How about "It improved dhrystone
>a hell of lot."

Unfortunately, Dhrystone is an artificial benchmark.
I couldn't resist doing a quick test.  [We have some pretty neat profiling
tools for taking an already-compiled program and turning it into one that
does procedure and statement counts, then gives you instruction cycle
counts (which would typically be 60% of the total cycles, the other 40%
being cache misses, TLB-miss overhead, memory path interference, etc.).]
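
The step from instruction cycles to total cycles can be sketched in C
roughly as follows.  This is only an illustration under stated
assumptions: it takes the "typically 60%" instruction-cycle fraction
mentioned above at face value, and it assumes the remaining cycles (cache
misses, TLB misses, memory path interference) are unaffected by a faster
string routine.  The name full_cycle_saving and its signature are
hypothetical, as is the 10% share used in the note after the code.

```c
#include <assert.h>

/* Assumed fraction of total cycles that are instruction cycles
 * ("typically 60%" per the text above). */
#define INSTR_CYCLE_FRACTION 0.60

/*
 * Estimate the fraction of total ("fully-degraded") cycles saved when
 * routines accounting for `instr_share` of instruction cycles are made
 * faster by `speedup`.  All arguments are fractions in [0, 1].
 * Hypothetical helper, for illustration only.
 */
double full_cycle_saving(double instr_share, double speedup,
                         double instr_fraction)
{
    assert(instr_share >= 0.0 && instr_share <= 1.0);
    assert(speedup >= 0.0 && speedup <= 1.0);
    assert(instr_fraction >= 0.0 && instr_fraction <= 1.0);

    /* Instruction cycles saved, scaled down to a share of total cycles. */
    return instr_share * speedup * instr_fraction;
}
```

For example, full_cycle_saving(0.10, 0.20, INSTR_CYCLE_FRACTION) gives
0.012: routines taking a hypothetical 10% of instruction cycles, sped up
by 20% (the upper end of the claim in the quoted article), would save
about 1.2% of total cycles.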

OK: here's a quiz: I did some simple tests [ones we use in our standard
Performance Brief suite, which includes diff, grep, yacc, and nroff], plus
csh.  I won't pretend this is definitive, since I just whipped it up.
How much time would you guess these programs spend doing strlen, strcmp,
strcpy [the high-runners]?  [Just guess the instruction cycle %.]
Try 0-1%, 2-5%, 6-10%, 11-20%, 21%-up.

ANSWERS:

program                     strlen            strcmp            strcpy
                        % cycs  cy/call   % cycs  cy/call   % cycs  cy/call
diff                      .03%     -         0       -         0       -
grep                      none
yacc                      .04%     -       .59%      -         0       -
nroff                      0       -         0       -       <0.1%     -
csh                      1.71%    20       1.27%     9       1.84%    21
  % of total func calls  3.76%             6.11%             3.75%
Dhrystone                <.01%    19      16.94%   103      22.36%   136

Bottom-line:
1) Dhrystone isn't remotely representative of some common string-pushing
   programs in UNIX.
2) For most of these, the total is <1% of instruction cycles, hence <0.6%
   of fully-degraded cycles.  Maybe you can save 20% of this, or about .1%.
3) For csh: what's going on is that these routines are called very
   frequently, but for short strings: 3-6 characters; strcmp's obviously
   fail quickly [2nd or 3rd character].  I think the implication is that
   maybe you can get rid of 20% of the cycles, which would be a 1%
   instruction cycle saving, or <0.6% fully-degraded cycle saving for csh.
4) Given all of this, maybe what you get can be grossly estimated as
   about .3%, maybe.  [Again, this was something whipped up in half an
   hour, so hardly definitive.]
5) Note that Dhrystone spends a huge lot of its time copying and comparing
   long strings.  Hence, it's well worth a little extra setup time for
   Dhrystone to lessen the cost per loop.  [In fact, we only got around to
   installing an assembler strcpy when we noticed how much time was spent
   there in Dhrystone.]

Thus, I'd say it's still an open issue [for those of you who happen to be
designing computer architectures at this instant!]:
Dhrystone: says it's important, but is totally unrepresentative.
csh: says it might get you .6%
others: surprise you by using str* so little
general evidence: says it might be useful, but if you follow our 1% rule
    [and HP used something similar, as I recall], there is as yet
    insufficient evidence to include it.

If I were designing a system right now, I'd be tempted to do a lot of
data-gathering.
-- 
-john mashey	DISCLAIMER:
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086