Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!mnetor!seismo!caip!clyde!burl!ulysses!mhuxr!mhuxt!houxm!hropus!ka From: ka@hropus.UUCP (Kenneth Almquist) Newsgroups: net.lang.c Subject: Generating code for the switch statement Message-ID: <610@hropus.UUCP> Date: Mon, 4-Aug-86 16:37:41 EDT Article-I.D.: hropus.610 Posted: Mon Aug 4 16:37:41 1986 Date-Received: Tue, 5-Aug-86 07:49:55 EDT References: <15093@ucbvax.BERKELEY.EDU> <2765@brl-smoke.ARPA> <15120@ucbvax.BERKELEY.EDU> Organization: Bell Labs, Holmdel, NJ Lines: 122 > As I understand it, a switch/case setup compiles exactly the same as > if (var == const1) {.....}; > else if (var == const2) {.....}; > else {default_action}; > anyway. A compiler should be able to do better than that. Generating good code for switches is one place where compiler writers can show they know something about searching. Using a branch table will produce faster and smaller code if the cases are consecutive integers. The only thing that may be non-obvious is the best way to test whether the expression being switched on is within range before accessing the branch table. A straightforward coding of this test requires two comparisons, but Ritchie's compiler for the PDP-11 generated only one comparison using the machine language equivalent of the following code: if ((unsigned)(value -= MIN) > (MAX - MIN) goto default; On the PDP-11, this generates one less instruction (or two less, if MIN is zero) than: if (value < MIN || value > MAX) goto default; (Learning hacks like these is one of the benefits of reading the compiler.) Since some people like to count starting at one, a possible enhancement would be to change MIN from 1 to 0 in this case by adding an extra entry to the beginning of the branch table. But real C programmer count from 0; wimps who count starting with 1 deserve to have their switch statements execute slower. :-) If the cases are not consecutive, the branch table must be filled with entries for the "missing" cases, making the table larger. The Ritchie compiler will generate a branch table if the number of entries in the branch table would be less than three times the number of cases. If the cases are mostly consecutive with a few outliers, a branch table could be augmented with a few specific tests for the outliers, but I don't know of any compiler that does this. There are two general methods for implementing tables which do not rely on the keys being nearly consecutive: hashing and the binary search. Hashing, which is what the Ritchie compiler uses, has the advantage that its performance is independent of the number of cases, just as with a branch table. The modulo instruction is used as the hash function; the C compiler tries a variety of values for the modulus and selects the best one which means that the time required to generate the hash table is proportional to the square of the number of cases. The UN*X VAX compiler, on the other hand, uses a binary search. A binary search executes in time proportional to the log of the number of cases. This makes it sound worse that the hash search; I presume someone discovered that most switch statements contained few enough statements that the binary search was superior. The binary search works best on a machine with a three way branch like that provided by the Fortran arithmentic IF statement. On a machine with condition codes like the VAX, each step of the binary search is implemented by a compare instruction followed by two branches on the condition codes. Several things may be done to improve the performance of the binary search. First, many machines execute a conditional branch instruction faster when no branch is taken, so that rather than testing the middle value in the table one can test a value somewhat to one side in order to decrease the chance that a branch will be taken. This is implemented in the VAX compiler, but some other possible improvements are not. The binary search could be abandoned in favor of the linear search at some level (e. g. when the number of possibilities has been reduced to 2 or 3). If there are a series of consecutive cases, this can be used to eliminate extra tests. (For example, if a number is known to be less than 5 it cannot be greater than 4, and if a number is greater than 2 and less than 4 then it must be equal to 3, but the VAX compiler will code to test for both these cases.) Finally, each comparison generated by the VAX compiler looks like: cmpl r0,$7 # compare switch value with 7 beql L1 # if equal, branch to the code for case 7 bgtr L8 # if greater, branch to L8 for the next cmpl. # The next cmpl instruction goes here. Since the branch to L8 is more likely to be executed that the branch to L1, the bgtr instruction should come first to minimize the number of instructions executed. For small cases, the good old linear search is used by both the Ritchie and the VAX compilers. Several things are done to speed up the linear search. First, the switch expression is copied into a register. This helps in general but loses when the switch instruction is a register variable or the switch contains only one case. (This is hard to fix because the compilers generate the code for a switch in two pieces; first they generate code to place the value to be switched on in r0, and then they generate the code to implement the switch. Another problem with this separation appears on the PDP-11, where it may be easier to compute an expression in r1 than in r0 due to a brain damaged multiply instruction.) Second, the compiler implements the search as a series of compares followed by branch if equal instructions. This minimizes the time required to reach the default case because in the default case none of the branches will be taken. In contrast, the code at the top of this article will probably be compiled into a series of branch if not equal instructions, all of which will branch if there is no match. The Ritchie compiler at one time actually generated a table of values and labels, and did a linear search on it using a loop. This saves space but takes more time. Furthermore, it doesn't even save space if there are 1 or 2 cases. (The Ritchie compiler has special handling of switch statements with no cases except the default.) The Ritchie compiler was changed to generate linear searches using a series of compare instructions quite a long time ago, but I don't know if Dennis was responsible for the change. I don't know why the VAX compiler uses with a linear search at all, rather than using the binary search. Probably this was a holdover from the Ritchie compiler, which must be prepared to generate a linear search because a linear search will outperform a hash search on a few elements. In general, then, a compiler should do pretty well by using a jump table where possible and a binary search otherwise. What is actually optimum must be determined by measuring various approaches on a specific machine. Kenneth Almquist ihnp4!houxm!hropus!ka (official name) ihnp4!opus!ka (shorter path)