Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!thunder.mcrcim.mcgill.edu!snorkelwacker.mit.edu!mintaka!olivea!decwrl!fernwood!portal!cup.portal.com!ts
From: ts@cup.portal.com (Tim W Smith)
Newsgroups: comp.arch
Subject: Re: Loop instructions
Message-ID: <41612@cup.portal.com>
Date: 24 Apr 91 10:01:33 GMT
References: <1991Apr16.152438.3445@waikato.ac.nz> <12739@pt.cs.cmu.edu>
  <1991Apr21.210031.16749@leland.Stanford.EDU> <12330@dog.ee.lbl.gov>
Organization: The Portal System (TM)
Lines: 20

Chris Torek says:
> However, it turns out that on the 68020 it is almost invariably faster
> to avoid DBcc anyway (bcopy, for instance, should be unrolled).  Score

If you unroll too far, don't you start to miss on the instruction
cache?  The optimum seems to be unrolled enough to lower loop overhead
but rolled enough to fit the loop in the cache.
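
For illustration, here is a rough C sketch of the shape of loop being
discussed: several longword moves per trip around the loop, so the
loop-control overhead is paid once per group rather than once per move.
The post is about move.l and dbra on the 68020, not C; the function
name copy_longs_unrolled8 and the choice of 8 moves per iteration are
assumptions made for the example, and a modern compiler may unroll or
re-roll this on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `nlongs` 32-bit words, eight per loop iteration.
 * Hypothetical helper for illustration only; the factor 8 is an
 * assumed unrolling, not a recommendation. */
void copy_longs_unrolled8(uint32_t *dst, const uint32_t *src, size_t nlongs)
{
    assert(dst != NULL && src != NULL);

    size_t blocks = nlongs / 8;   /* full unrolled iterations */
    size_t tail   = nlongs % 8;   /* leftover words, copied one at a time */

    while (blocks-- > 0) {
        /* Eight moves share a single loop test and branch. */
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst[4] = src[4];
        dst[5] = src[5];
        dst[6] = src[6];
        dst[7] = src[7];
        dst += 8;
        src += 8;
    }

    while (tail-- > 0)
        *dst++ = *src++;
}
```

Raising the 8 lowers the number of loop tests but makes the loop body
longer, which is the cache question raised above.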

I tried to calculate the proper amount of unrolling and came up with
about 11 move.l instructions per dbra.  This at first seemed somewhat
low, but larger amounts of unrolling, even ones that still fit in the
cache, can lose because the first time through the loop there are
more cache misses.
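
A toy model can make that trade-off concrete. The sketch below is not
the calculation referred to above (the post does not give it); it is
an assumed cost function in arbitrary units, where each loop iteration
pays loop_cost for the loop instruction and each instruction in the
loop body pays miss_cost once, on the first pass. The names toy_cost
and best_unroll and both cost parameters are hypothetical. The cost of
the moves themselves is the same for every unrolling, so it is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy cost of copying `nlongs` longwords with `unroll` moves per
 * iteration.  Assumed model, arbitrary units:
 *   - every iteration pays `loop_cost` for the loop instruction;
 *   - each of the (unroll + 1) instructions in the loop misses the
 *     instruction cache once, on the first pass, costing `miss_cost`. */
unsigned long toy_cost(size_t nlongs, size_t unroll,
                       unsigned long loop_cost, unsigned long miss_cost)
{
    assert(unroll > 0);
    size_t iterations = (nlongs + unroll - 1) / unroll;  /* round up */
    return (unsigned long)iterations * loop_cost
         + (unsigned long)(unroll + 1) * miss_cost;
}

/* Scan unrollings 1..max_unroll and return the cheapest under the toy
 * model.  Ties go to the smaller unrolling. */
size_t best_unroll(size_t nlongs, size_t max_unroll,
                   unsigned long loop_cost, unsigned long miss_cost)
{
    assert(max_unroll > 0);
    size_t best = 1;
    for (size_t u = 2; u <= max_unroll; u++) {
        if (toy_cost(nlongs, u, loop_cost, miss_cost) <
            toy_cost(nlongs, best, loop_cost, miss_cost))
            best = u;
    }
    return best;
}
```

In this model the loop overhead falls as nlongs/unroll while the
first-pass miss cost grows linearly with unroll, so the minimum sits
near sqrt(nlongs * loop_cost / miss_cost), well short of the largest
loop that fits in the cache. Real 68020 timings would need the actual
cycle counts, which this sketch does not attempt to supply.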

I don't really trust my calculations that much, but measurements
of a 1K copy routine I had to write for an application on my Mac II
showed that the best 2^N unrollings were 8 and 16.  Those are the
powers of two on either side of 11, which tends to support the
calculation.

					Tim Smith