Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!henry
From: henry@utzoo.UUCP (Henry Spencer)
Newsgroups: comp.arch,comp.lang.c
Subject: Re: String Handling -- Incompetence of run-time libraries
Message-ID: <7862@utzoo.UUCP>
Date: Fri, 3-Apr-87 12:23:20 EST
Article-I.D.: utzoo.7862
Posted: Fri Apr  3 12:23:20 1987
Date-Received: Fri, 3-Apr-87 12:23:20 EST
References: <15292@amdcad.UUCP> <978@ames.UUCP> <15694@sun.uucp> <1530@husc6.UUCP>, <15915@sun.uucp>
Organization: U of Toronto Zoology
Lines: 19
Keywords: instruction set architectures, strcpy

> Unrolling the loop, as you did, might be a bigger win, especially on
> the 68020 where even the unrolled loop would probably fit in the
> instruction cache.

Mmmm, not necessarily.  I had to study this sort of thing in another
context recently.  When the number of iterations is small -- remembering
that strcpy is often used for quite short strings -- an unrolled loop can
be slower than a simple one.  The extra instruction fetches on the first
pass through the unrolled loop, before its code is in the cache, can cost
more than the extra loop control in the simple loop.
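
To make the comparison concrete, here is a minimal sketch in C of the
two loop shapes being discussed.  The function names are hypothetical,
and this is not the code from the earlier articles.  Note that the
unrolled version still tests every byte for the terminator; all it
saves is three of every four loop-closing branches, and it pays for
that with a body roughly four times as long to fetch.

```c
#include <assert.h>
#include <stddef.h>

/* Simple loop: one byte copied and one loop-closing branch per byte. */
char *copy_simple(char *dst, const char *src)
{
    char *d = dst;

    assert(dst != NULL && src != NULL);
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/*
 * Unrolled by 4 (the factor is an arbitrary choice for illustration):
 * one loop-closing branch per four bytes, but about four times as much
 * code to fetch before the first trip through the loop completes.
 */
char *copy_unrolled4(char *dst, const char *src)
{
    char *d = dst;

    assert(dst != NULL && src != NULL);
    for (;;) {
        if ((*d++ = *src++) == '\0') break;
        if ((*d++ = *src++) == '\0') break;
        if ((*d++ = *src++) == '\0') break;
        if ((*d++ = *src++) == '\0') break;
    }
    return dst;
}
```

For a two- or three-byte string, copy_unrolled4 never reaches its
loop-closing branch at all, so the unrolling saves nothing there; what
remains is the cost of fetching the longer body.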

(When the number of iterations is large, there is an optimal degree of
unrolling.  For simple cache designs, the curve of "time taken for a long
copy" against the log of the unrolling factor is beautifully symmetrical,
with a single well-defined minimum.  I haven't analyzed the situation
for more complex caches, although I conjecture vaguely similar results.)
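
One toy model that produces a curve of this shape, offered as an
assumption for illustration and not as the analysis referred to above:
suppose a copy of n bytes with unrolling factor u costs a fixed amount
per byte, plus loop control once per n/u trips, plus a one-time fetch
cost proportional to u.  The variable part is then a/u + b*u, which is
symmetric in log(u) about u = sqrt(a/b).  All names and cost parameters
below are hypothetical.

```c
#include <assert.h>

/*
 * Toy cost model (an assumption, not measured data):
 *   n * c_byte       work that unrolling cannot remove
 *   (n / u) * c_loop loop control, paid once per trip
 *   u * c_fetch      first-pass instruction fetches, growing with code size
 */
double copy_time(double n, double c_byte, double c_loop, double c_fetch,
                 double u)
{
    assert(n >= 0.0 && u > 0.0);
    return n * c_byte + (n / u) * c_loop + u * c_fetch;
}

/* Scan power-of-two unrolling factors up to max_u; return the cheapest. */
int best_unroll_pow2(double n, double c_byte, double c_loop, double c_fetch,
                     int max_u)
{
    int u, best = 1;
    double t, best_t = copy_time(n, c_byte, c_loop, c_fetch, 1.0);

    assert(max_u >= 1 && max_u <= 1024);
    for (u = 2; u <= max_u; u *= 2) {
        t = copy_time(n, c_byte, c_loop, c_fetch, (double)u);
        if (t < best_t) {
            best_t = t;
            best = u;
        }
    }
    return best;
}
```

With n = 1024 and all three costs set to 1, the scan picks u = 32, and
the factors 16 and 64 (one step either side on a log scale) cost the
same as each other.  Real caches need not follow this model.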
-- 
"We must choose: the stars or	Henry Spencer @ U of Toronto Zoology
the dust.  Which shall it be?"	{allegra,ihnp4,decvax,pyramid}!utzoo!henry