Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!mnetor!uunet!husc6!bbn!rochester!cornell!uw-beaver!mit-eddie!rutgers!sri-spam!mordor!lll-lcc!ames!ucbcad!ucbvax!AI.AI.MIT.EDU!JBVB From: JBVB@AI.AI.MIT.EDU ("James B. VanBokkelen") Newsgroups: comp.protocols.tcp-ip Subject: How to unroll a loop ('D' if you know) Message-ID: <265600.871007.JBVB@AI.AI.MIT.EDU> Date: Wed, 7-Oct-87 01:41:51 EDT Article-I.D.: AI.265600.871007.JBVB Posted: Wed Oct 7 01:41:51 1987 Date-Received: Sat, 10-Oct-87 07:15:01 EDT Sender: daemon@ucbvax.BERKELEY.EDU Distribution: world Organization: The ARPA Internet Lines: 38 I got asked several times, so I'll reply to everyone. The easy-to-understand way is: ... remainder = count % UNROLLING_FACTOR; count = (count/UNROLLING_FACTOR) + 1; /* no fenceposts for me */ switch (remainder) { case 0: count--; /* optimize */ goto top; case 1: goto rem_1; ... /* up to UNROLLING_FACTOR - 1 */ } top: do_it; ... /* UNROLLING_FACTOR times in all */ rem_2: do_it; rem_1: do_it; if (count--) goto top; ... The way it is usually done (because speed freaks are rarely willing to put up with any real or supposed inefficiency in their compilers, and are coding the critical parts in assembler) is by figuring out how large each occurrence of "do_it" is in bytes, and multiplying that by the remainder, and either adding that to the program counter (if you can), or using it as an index in a JUMP instruction of some sort. If you have a CASE instruction, make sure it is really more efficient than coding it out. HLL users can take a look at how their compilers do 'case' or 'computed goto' - maybe it isn't losing much time. The smaller the loop test's elapsed time is relative to the time necessary to "do_it", the less unrolling buys you. If I coded it wrong, feel free to punish me publicly, but it probably doesn't belong on tcp-ip. If you are horrified by gotos, please don't tell me... jbvb