Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!ames!mailrus!rutgers!iuvax!pur-ee!uiucdcs!uiucdcsb!kenny From: kenny@uiucdcsb.cs.uiuc.edu Newsgroups: comp.lang.c Subject: Re: Put your code... (was Re: gotos Message-ID: <165600039@uiucdcsb> Date: 25 Apr 88 23:23:00 GMT References: <2597@ttrdc.UUCP> Lines: 309 Nf-ID: #R:ttrdc.UUCP:2597:uiucdcsb:165600039:000:8092 Nf-From: uiucdcsb.cs.uiuc.edu!kenny Apr 25 18:23:00 1988 [Part 1: Double-exit loops and double decisions.] Before the advent of GO TO-less programming, many programmers recognized the advantages of straightforward control flow, and in fact, coded programs that created structured control structures using GO TO statements. With the advent of structured programming, these programs could be converted to use the Boehm-Jacopini [Boeh66] control structures; the conversion was usually straightforward. There was a common case, though, that defied conversion without either duplicating code or introducing extraneous variables. This case was where the result of executing conditional code was in turn used in the control condition of another conditional or looping context. In a study of Fortran programs, which addressed data flow analysis rather than control structures, Farrow, Kennedy and Zucconi [Farr76] identified two new control structures that address most of these cases. I propose looking at their control structures in the context of structuring `C' programs. They define their control structures in terms of program flow graphs; I will use GO TO programs to show the sort of analysis that can be done when a program is viewed in this light. Furthermore, to preserve the graph-theoretic nature of the arguments, I will sometimes insert needless GO TO statements to avoid considering the effects of `fallthrough.' If readers are unfamiliar with the fundamental techniques whereby flow graphs that already conform to Bohm-Jacopini control structures can be reduced to the corresponding control structures, I refer them to Appendix A at the end of this article. 1.1 The double decision. EXAMPLES 1-3 all use a control flow that cannot be reduced to the Boehm-Jacopini control-flow primitives without either introducing auxiliary variables or copying parts of the code. This limitation to the Boehm-Jacopini structures was first pointed out by Ashcroft and Manna [Ashc71], several years before Knuth's paper. In their analysis of programs, Farrow, Kennedy and Zucconi were able to introduce an additional control structure into their graphs; the new structure handles the Ashcroft-Manna counterexample, and in addition can handle EXAMPLES 1-3. The new structure is the _double decision_, corresponding to the program fragment: if (COND1) { action1; goto A; } action2; if (COND2) { action3; goto A; } action4; goto B; There is a trivial way to address this problem, introducing an auxiliary variable. We convert the program to the intermediate representation: if (COND1) { action1; flag = TRUE; } else { action2; if (COND2) { action3; flag = TRUE; } else { action4; flag = FALSE; } } if (flag) goto A; else goto B; and then reduce the concluding _if_ statement according to the techniques in Appendix A. I contend, without proof in this paper, that disciplined use of the _break_ statement can eliminate the auxiliary variable in all cases where (a) action1 and action3 are both null, or (b) action4 is null. I have a set of translation rules that will convert programs containing double decisions to conventional control structures; if there is sufficient interest, I may consider writing a paper on this. 1.2 The double-exit loop. A few programs (none of Knuth's fall into this category) cannot be reduced successfully with the double decision, but can by introducing another concept from Farrow, Kennedy, and Zucconi, the _double-exit loop._ The double-exit loop looks like the following: A: action1; if (COND1) { action2; goto B; } action3; if (COND2) { action4; goto C; } action5; goto A; Once again, there is a trivial translation to conventional `C' if we introduce an auxiliary Boolean variable: A: for (;;) { action1; if (COND1) { action2; flag = TRUE; break; } action3; if (COND2) { action4; flag = FALSE; break; } action5; } if (flag) goto B; else goto C; Once again, there are important special cases with null actions that allow the auxiliary variable to be eliminated by disciplined use of _break_ and _continue_. 1.3 Application to the examples. All of Knuth's examples use the double decision in a way that can avoid the use of auxiliary Booleans. Translated to idiomatic C (i.e., using 0-based indexing and the _for_ statement), Knuth's Example 1, with goto's, looks like: /* 1 */ for (i = 0; i < m; ++i) if (A[i] == x) goto found; /* not found */ i = m++; A[i] = x; B[i] = 0; found: ++B[i]; Knuth proposes, in Example 1a, eliminating the _goto_ with something like: /* 1a */ for (i = 0; i <= m && A[i] != x; ++i) /* empty loop body */ ; if (i > m) { i = m++; A [i] = x; B [i] = 1; } else ++B[i]; In this example, he points out that the test for (i <= m) in the inner loop is superfluous with proper data structures, and comes up with: /* 2 */ A[m] = x; for (i = 0; A[i] != x; ++i) /* empty loop body */ ; if (i > m) { i = m++; A [i] = x; B [i] = 1; } else ++B[i]; This last form is close to the way I'd code a linear search. It also has no _goto_s. EXAMPLE 3 does not admit gracefully of such restructuring. The example, with GO TO's, is /* 3 */ i = hash (x); while (A[i] != 0) { if (A[i] == x) goto found; if (--i < 0) i = m - 1; } A [i] = x; B [i] = 0; found: ++B[i]; Following Knuth's lead in interchanging the checks on A[i] (EXAMPLE 3B), we arrive at: /* 3b */ i = hash (x); while (A[i] != x) { if (A[i] == 0) goto not_found; if (--i < 0) i = m - 1; } goto found; not_found: A[i] = x; B[i] = 0; found: ++B[i]; But here we note that the `not_found' case can be hoisted into the lexical scope of the `while': i = hash (x); while (A[i] != x) { if (A[i] == 0) { A[i] = x; B[i] = 0; break; } if (--i < 0) i = m - 1; } ++ B[i]; This is close to the way I'd express it in `C'. I would, however, make a _for_ statement out of the first two lines; the _for_ statement would have an empty third part. Testing of this version (on a VAX, using 4.3BSD, and compiling all versions of the program with -O) shows that it's actually faster than any of Knuth's examples with the _goto_ statements, so in this case, at least, I contend that the arguments that structured programming degrades performance are specious. ------------------------------------------------------------------------ Appendix A. Reduction of flow graphs of already-structured programs. A.1 Sequencing. A program that contains a GOTO A, where the GOTO is the only way to reach label A, can have the GOTO eliminated in a trivial way, by reordering the code so that the statements at A immediately follow the GOTO. A.2 Conditionals. A program whose general structure is if (COND) goto A; else goto B; .... A: action1; goto C; .... B: action2; goto C; where there is no other way to reach the code contained in action1 and action2, can be replaced with if (COND) { action1; } else { action2; } A.3 Loops. The sequence of code: A: action1; if (COND) goto B; else goto C; .... B: action2; goto A; .... C: is a generalized one-entry one-exit loop. It can be converted mechanically to: A: for (;;) { action1; if (COND) break; action2; } There are two important special cases of this construct. If action1 is null, we can write A: while (!(COND)) { action2; } and if action2 is null, we can write A: do { action1; while (!(COND)); } Moreover, if action2 can be expressed as a single statement in C, we have the construct A: for (; !(COND); action2) { action1; } and, of course, straight-line code preceding A can be merged with the _for_ statement if desired. Which of these equivalent constructions is to be preferred depends on the context. These three reductions, applied successively, can be used to convert any flowgraph of a program that is already structured to the corresponding C program statements. Note that these reductions handle the common case of a `loop n and a half times,' where a loop exits from the middle; they avoid both copying code and the use of the GO TO by a disciplined use of the _break_ statement.