Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!thunder.mcrcim.mcgill.edu!snorkelwacker.mit.edu!mintaka!olivea!decwrl!fernwood!portal!cup.portal.com!ts
From: ts@cup.portal.com (Tim W Smith)
Newsgroups: comp.arch
Subject: Re: Loop instructions
Message-ID: <41612@cup.portal.com>
Date: 24 Apr 91 10:01:33 GMT
References: <1991Apr16.152438.3445@waikato.ac.nz> <12739@pt.cs.cmu.edu> <1991Apr21.210031.16749@leland.Stanford.EDU> <12330@dog.ee.lbl.gov>
Organization: The Portal System (TM)
Lines: 20

Chris Torek says:
> However, it turns out that on the 68020 it is almost invariably faster
> to avoid DBcc anyway (bcopy, for instance, should be unrolled).

If you unroll too far, don't you start to miss on the instruction
cache?  The optimum seems to be unrolled enough to lower loop
overhead but rolled enough to fit the loop in the cache.

I tried to calculate the proper amount of unrolling and came up
with about 11 move.l instructions per dbra.  This at first seemed
somewhat low, but it seems that larger amounts of unrolling, while
still being able to fit in the cache, can lose because the first
time through the loop there are more cache misses.

I don't really trust my calculations that much, but measurements
of a 1K copy routine I had to write for an application on my Mac II
showed that the best 2^N unrollings were 8 and 16, which tends to
support the calculation of 11.

						Tim Smith
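
For illustration only, here is a rough C sketch of the kind of loop being discussed: a longword copy unrolled 8 times, with one counted loop branch per 8 moves. The function name copy_longs_unrolled8 and the unroll factor are assumptions chosen to match one of the measured 2^N values; this is not the routine from the post, and a C compiler will not necessarily emit move.l/dbra for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the original post): copy n 32-bit words
 * from src to dst, unrolled 8 times. Each pass through the main loop does
 * 8 moves for one loop test and branch, which is the overhead that
 * unrolling tries to amortize.
 */
void copy_longs_unrolled8(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    size_t blocks = n / 8;  /* number of full 8-word passes */
    size_t rest   = n % 8;  /* leftover words handled one at a time */

    while (blocks-- > 0) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst[4] = src[4];
        dst[5] = src[5];
        dst[6] = src[6];
        dst[7] = src[7];
        dst += 8;
        src += 8;
    }

    /* Rolled tail: at most 7 words, so its loop overhead is negligible. */
    while (rest-- > 0) {
        *dst++ = *src++;
    }
}
```

For a 1K copy (256 longwords) this makes 32 passes through the unrolled body. Raising the unroll factor lowers the branch count further but lengthens the loop body, so more of the first pass is spent fetching instructions that are not yet in the cache, which is the effect described above.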