Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!utcsri!greg
From: greg@utcsri.UUCP
Newsgroups: comp.arch,comp.lang.c
Subject: Re: String Handling ( really fixed-length copy ).
Message-ID: <4558@utcsri.UUCP>
Date: Sun, 12-Apr-87 12:44:11 EST
Article-I.D.: utcsri.4558
Posted: Sun Apr 12 12:44:11 1987
Date-Received: Sun, 12-Apr-87 17:35:27 EST
References: <15292@amdcad.UUCP> <7897@utzoo.UUCP>
Reply-To: greg@utcsri.UUCP (Gregory Smith)
Organization: CSRI, University of Toronto
Lines: 27
Xref: utgpu comp.arch:837 comp.lang.c:1563
Summary: fixed-size block copy hack.

This string-op stuff gave me an idea. A run-time library could contain
a function called 'mov200words' that looks like this:

mov200words:	mov	(a0)+,(a1)+
		mov	(a0)+,(a1)+
		.....	200 mov's in all
		mov	(a0)+,(a1)+
		rts

Then if, say, a 64-word struct needed to be copied, the compiler would load
the pointers into a0 and a1 and then call mov200words+(200-64)*2 [ or
whatever; the *2 assumes each mov is a 2-byte instruction ], so that only the
last 64 mov's run. This would provide unrolled-loop speed with only one loop
unrolled in the whole executable. [ Call it more than once for >200 words ].
Presumably this would be faster than a loop on a PDP-11 or a 68000, but it
might lose on a machine with an instruction cache that could run a copy loop
entirely on-chip. A wizzo block copy instruction may or may not run faster
than the unrolled loop.
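
For the C-minded, here is a rough sketch of the same trick. C can't call into
the middle of a function, so a switch with fall-through cases stands in for
the computed entry point; that is a swapped-in technique, not what the
compiler would actually emit. The names UNROLL, entry_offset, mov_tail and
copy_words are made up for illustration, and the unroll is 8 instead of 200
just to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unroll factor; the post uses 200, 8 keeps the sketch short. */
enum { UNROLL = 8 };

/* Byte offset of the entry point into a run of `total` moves when only the
   last `n` should execute, assuming every move is `insn_bytes` bytes long. */
static unsigned entry_offset(unsigned total, unsigned n, unsigned insn_bytes)
{
    assert(n <= total);
    return (total - n) * insn_bytes;
}

/* Copy n words, 0 <= n <= UNROLL.  The switch selects the entry point, so
   exactly n moves run and there is no per-word loop test. */
static void mov_tail(uint16_t *dst, const uint16_t *src, unsigned n)
{
    assert(n <= UNROLL);
    switch (n) {
    case 8: *dst++ = *src++; /* fall through */
    case 7: *dst++ = *src++; /* fall through */
    case 6: *dst++ = *src++; /* fall through */
    case 5: *dst++ = *src++; /* fall through */
    case 4: *dst++ = *src++; /* fall through */
    case 3: *dst++ = *src++; /* fall through */
    case 2: *dst++ = *src++; /* fall through */
    case 1: *dst++ = *src++; /* fall through */
    case 0: break;
    }
}

/* Copy any number of words: full UNROLL-word passes first, then one
   partial pass (the "call it more than once" case). */
static void copy_words(uint16_t *dst, const uint16_t *src, size_t n)
{
    while (n > UNROLL) {
        mov_tail(dst, src, UNROLL);
        dst += UNROLL;
        src += UNROLL;
        n -= UNROLL;
    }
    mov_tail(dst, src, (unsigned)n);
}
```

With these, entry_offset(200, 64, 2) gives 272, the byte offset for the
64-word case, and copy_words(dst, src, 19) makes three mov_tail calls
(8 + 8 + 3 words).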

The only advantage I am claiming over other unrolled-loop techniques is that
almost nothing but payoff move operations gets executed in the above (the
overhead is just the call and the rts), whilst avoiding a large amount of
inline code at every place a copy is done.

Of course, this must have been done before :-)

-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...