Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!henry
From: henry@utzoo.UUCP (Henry Spencer)
Newsgroups: comp.arch,comp.lang.c
Subject: Re: String Handling -- Incompetence of run-time libraries
Message-ID: <7862@utzoo.UUCP>
Date: Fri, 3-Apr-87 12:23:20 EST
Article-I.D.: utzoo.7862
Posted: Fri Apr 3 12:23:20 1987
Date-Received: Fri, 3-Apr-87 12:23:20 EST
References: <15292@amdcad.UUCP> <978@ames.UUCP> <15694@sun.uucp> <1530@husc6.UUCP>, <15915@sun.uucp>
Organization: U of Toronto Zoology
Lines: 19
Keywords: instruction set architectures, strcpy

> Unrolling the loop, as you did, might be a bigger win, especially on
> the 68020 where even the unrolled loop would probably fit in the
> instruction cache.

Mmmm, not necessarily.  I had to study this sort of thing in another context
recently.  When the number of iterations is short -- remembering that strcpy
is often used for quite short strings -- an unrolled loop can be slower than
a simple one.  The extra instruction fetches on the first iteration of the
unrolled loop can cost more than the extra loop control in the simple loop.

(When the number of iterations is long, there is an optimal degree of
unrolling.  For simple cache designs, the curve of "time taken for a long
copy" against the log of the unrolling factor is beautifully symmetrical,
with a definite and specific minimum.  I haven't analyzed the situation for
more complex caches, although I conjecture vaguely similar results.)
-- 
"We must choose: the stars or	Henry Spencer @ U of Toronto Zoology
the dust.  Which shall it be?"	{allegra,ihnp4,decvax,pyramid}!utzoo!henry
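
For concreteness, here is a minimal C sketch of the two loop shapes under
discussion, plus a toy cost model.  Everything in it is an assumption made
for illustration: the names copy_simple, copy_unrolled4 and model_cycles are
hypothetical, and the model (one cycle per executed instruction, a fixed
extra penalty for the first fetch of each instruction in the loop, the whole
loop fetched on the first pass) is only one simple way to get the behaviour
described in the post, not necessarily the analysis behind it.

```c
#include <stddef.h>

/* Simple loop: one byte copied and one loop test per iteration. */
char *copy_simple(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/* Unrolled by four: one backward branch per four bytes, at the price of
 * a loop body roughly four times as large to fetch. */
char *copy_unrolled4(char *dst, const char *src)
{
    char *d = dst;
    for (;;) {
        if ((*d++ = *src++) == '\0') break;
        if ((*d++ = *src++) == '\0') break;
        if ((*d++ = *src++) == '\0') break;
        if ((*d++ = *src++) == '\0') break;
    }
    return dst;
}

/* Toy cost model (ASSUMPTION, not taken from the post).
 *   n      bytes copied, including the terminating NUL
 *   unroll unrolling factor (1 = simple loop)
 *   body   instructions executed per byte copied
 *   ctl    loop-control instructions executed per pass
 *   miss   extra cycles to fetch an instruction the first time
 * Executed instructions cost one cycle each; every instruction in the
 * loop pays the miss penalty once, on the first pass. */
long model_cycles(long n, long unroll, long body, long ctl, long miss)
{
    long passes = (n + unroll - 1) / unroll;   /* ceil(n / unroll) */
    long loop_size = unroll * body + ctl;      /* instructions in the loop */
    return n * body + passes * ctl + miss * loop_size;
}
```

With body = 2, ctl = 2 and miss = 4, a 3-byte copy costs 28 cycles in the
simple loop and 48 when unrolled by four, so the first-pass fetches dominate.
For long copies the variable part of the model is roughly n*ctl/u + miss*body*u
for unrolling factor u, which is smallest near u = sqrt(n*ctl/(miss*body)) and
is symmetrical about that point when plotted against log u.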