Path: utzoo!utgpu!bnr-vpa!bnr-rsc!mark
From: mark@bnr-rsc.UUCP (Mark MacLean)
Newsgroups: comp.arch
Subject: Re: Explanation, please!
Message-ID: <760@bnr-rsc.UUCP>
Date: 6 Sep 88 16:30:43 GMT
References: <638@paris.ics.uci.edu> <189@bales.UUCP> <10329@tekecs.TEK.COM>
Reply-To: mark@bnr-rsc.UUCP (Mark MacLean)
Organization: Bell-Northern Research, Ottawa, Canada
Lines: 27

In article <10329@tekecs.TEK.COM> andrew@frip.gwd.tek.com (Andrew Klossner) writes:
>Several contributors have suggested that unrolling a byte-copy loop is
>a win. On some architectures it is, but on a good pipelined system it
>may not be. As an example, the program fragment
>
>	while (count--) {
>		to[i] = from[i];
>		++i;
>	}
>
>can be compiled to code on the M88k which copies memory as fast as a
>DMA controller could; the instructions to decrement, increment, and
>branch overlap with the data load/store requests.

If your loop has instructions to load, store, increment, decrement, and
branch, then presumably each iteration takes at least 5 clocks to perform
only 2 memory accesses. Wouldn't a DMA controller be able to perform a
memory access every clock cycle?

Is it not possible to unroll the loop into an inline stream of
instructions (if the length is small and known at compile time) to
produce an instruction sequence which could perform a memory access
every cycle? If not, why not? It would seem very un-RISCy if the 88000
were unable to do this.

Mark MacLean
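
To make the question concrete, here is a rough C sketch of the two forms being compared. The function names and the fixed length of 8 are made up for illustration; nothing here is taken from an actual M88k compiler. The rolled version is the quoted fragment wrapped in a function, and the unrolled version is the inline stream of loads and stores for a length known at compile time.

```c
#include <stddef.h>

/* Rolled form, as in the quoted fragment: every byte copied also pays
   for an index increment, a count decrement, and a branch. */
void copy_rolled(char *to, const char *from, size_t count)
{
    size_t i = 0;

    while (count--) {
        to[i] = from[i];
        ++i;
    }
}

/* Unrolled form for a length fixed at compile time (8 is a hypothetical
   choice). Only loads and stores remain; there is no loop counter,
   index update, or branch. */
void copy_unrolled8(char *to, const char *from)
{
    to[0] = from[0];
    to[1] = from[1];
    to[2] = from[2];
    to[3] = from[3];
    to[4] = from[4];
    to[5] = from[5];
    to[6] = from[6];
    to[7] = from[7];
}
```

Both functions copy the same bytes. Whether the unrolled stream actually sustains one memory access per cycle depends on the pipeline and the memory system, which is exactly what is being asked.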