Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!ames!jaw
From: jaw@ames.UUCP
Newsgroups: comp.arch,comp.lang.c
Subject: Re: String Processing Instruction
Message-ID: <1001@ames.UUCP>
Date: Thu, 26-Mar-87 21:06:22 EST
Article-I.D.: ames.1001
Posted: Thu Mar 26 21:06:22 1987
Date-Received: Sat, 28-Mar-87 07:17:26 EST
References: <15292@amdcad.UUCP>
Organization: NASA Ames Research Center, Moffett Field, CA
Lines: 32
Xref: utgpu comp.arch:671 comp.lang.c:1345

> [....] Since
> most C programs (especially utilities and other systems programs) do a lot of
> string processing, this one instruction is really worth the small
> implementation cost.  It often improves run times by 15% to 20% (just goes to
> show that the impact of processing C language strings has been long-ignored).

just curious which unix utilities use str(cpy|cmp|len) in their inner loops?
certainly, 'vn' comes to mind as devoting much cpu time to these functions.

is the 15-20% claimed due at all to either

	(a) significantly slow byte addressing to begin with (a la cray)?
	(b) in-line compilation of the string(3) stuff into the application?

if (a), then improving memory byte access speed in the architecture is a
more general solution with more payoff overall than the compare gate hack.
what is the risc chip cost for byte vs. word addressability, anyway?
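
to make the cost in (a) concrete, here is a rough sketch of what a single
byte fetch turns into on a machine that can only address whole words.
the 64-bit word size, the most-significant-byte-first packing, and the
name get_byte are all assumptions for illustration, not a description of
any particular compiler's code.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical helper: fetch byte i from characters packed eight per
 * 64-bit word, most significant byte first (an assumed layout).
 * every byte access costs a word load plus a shift and a truncation. */
unsigned char get_byte(const uint64_t *words, size_t i)
{
    uint64_t w = words[i >> 3];                      /* which word */
    unsigned shift = (unsigned)(56 - 8 * (i & 7));   /* which byte inside it */
    return (unsigned char)(w >> shift);              /* keep the low 8 bits */
}
```

with hardware byte addressing the whole function is one load, which is
the sense in which fixing (a) helps every byte access and not just the
string routines.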

if (b), then maybe function call speed is the culprit rather than the lack
of a specialized instruction.

at any rate,
for cray unix, buffer copy ops in the kernel were vastly improved when
re-written for words instead of bytes, even more so when vectorized
(the only place in the kernel with vectorization, i think).
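
a minimal sketch of the bytes-to-words rewrite, in portable c rather than
anything resembling the actual kernel code (which is not shown here).
the name word_copy and the choice of unsigned long as "the machine word"
are assumptions.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned long word_t;   /* assumed to match the natural word size */

/* hypothetical helper: copy n bytes, a word at a time where possible.
 * the regions must not overlap. */
void *word_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* bulk of the copy: one word per iteration.  the fixed-size memcpy
     * calls avoid alignment and aliasing trouble; compilers typically
     * reduce each to a single load or store. */
    while (n >= sizeof(word_t)) {
        word_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        s += sizeof w;
        d += sizeof w;
        n -= sizeof w;
    }

    /* leftover tail, byte by byte */
    while (n--)
        *d++ = *s++;

    return dst;
}
```

the word loop has no dependence between iterations, which is also what
makes this kind of copy a natural candidate for vectorization.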

of course, table lookup using only 2^16 locations would be a joke
software solution for super-intensive null-char-in-16-bit-smallword
compare code.  drastic, but saves a test the amd chip appears worried about.
personally, i'm a fan of branch folding ...
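
for the curious, the 2^16-entry table idea above might look roughly like
this.  it is only a sketch: the names has_nul, init_has_nul and strlen16
are made up, and it assumes the string lives in an array of 16-bit words
with a nul byte somewhere inside the array, so the scan never reads past
the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* one flag per possible 16-bit value: does either byte equal zero? */
static unsigned char has_nul[1u << 16];

/* hypothetical setup routine: fill the 64K-entry table once */
void init_has_nul(void)
{
    for (uint32_t v = 0; v < (1u << 16); v++)
        has_nul[v] = (v & 0x00ffu) == 0 || (v & 0xff00u) == 0;
}

/* hypothetical strlen over 16-bit words: one table lookup replaces the
 * two per-byte zero tests.  the caller guarantees a nul byte exists
 * within the array. */
size_t strlen16(const uint16_t *w)
{
    const uint16_t *p = w;

    while (!has_nul[*p])        /* two characters per iteration */
        p++;

    /* the terminator is in this word; look at the bytes in memory order
     * so the result does not depend on endianness */
    const unsigned char *b = (const unsigned char *)p;
    return (size_t)(p - w) * 2 + (b[0] != 0);
}
```

the table costs 64k bytes and a memory reference per lookup, which is
why it is hard to take seriously next to a one-cycle compare instruction.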

ames!jaw