Relay-Version: version B 2.10.2 9/18/84; site lsuc.UUCP
Posting-Version: version B 2.10.1 6/24/83; site nrcaero.UUCP
Path: lsuc!pesnta!nrcaero!carl
From: carl@nrcaero.UUCP (Carl P. Swail)
Newsgroups: pe.cust.general
Subject: UNIX Clinic
Message-ID: <252@nrcaero.UUCP>
Date: 18 Feb 85 14:54:49 GMT
Date-Received: 18 Feb 85 20:29:00 GMT
Reply-To: carl@nrcaero.UUCP (Carl P. Swail)
Distribution: pe.cust
Organization: NRCC-Aeroacoustics, Ottawa, Ontario
Lines: 801

                              UNIX clinic

This column first appeared in the German quarterly unix/mail (Hanser
Verlag, Munich, Germany). It is copyrighted: copyright 1984 by Axel
T. Schreiner, Ulm, West Germany. It may be reproduced as long as the
copyright notice is included and reference is made to the original
publication.

The column attempts to discuss typical approaches to problem solving
using the UNIX* system. It emphasizes what the author considers to be
good programming practices and appropriate choice of tools.

                                /lib/cpp

This quarter's column deals with uses and abuses of the C
preprocessor. We demonstrate some techniques which can save a lot of
work (and even more errors). The discussion applies to programming in
C in general, and it assumes only very elementary prerequisites: C
programs are run through a preprocessor before they are handed to the
actual compiler. The preprocessor performs (parameterized) text
substitution (#define), inserts header files (#include), and can
exclude parts of the source from compilation (#if).

Since the preprocessor is independent of the actual compiler, and does
not know C at all, one can use it in particular to extend the C
language. Only one's taste limits one's imagination here...

Excluding text

Every programmer presumably writes occasional comments. Sometimes we
comment quite intentionally to exclude program parts from a
compilation.
Since comments in standard C may not be nested, a program part that
already contains comments cannot simply be enclosed in another
comment, and there is considerable temptation to stop commenting such
program parts at all. The following technique for text exclusion is
much more appropriate:

        #ifdef not_defined
        crash_the_system(NOW);  /* this definitely goes wrong */
        #endif not_defined

Of course, the name not_defined should really not be defined...

__________________________
*UNIX is a Trademark of Bell Laboratories.

Vector dimensions

In principle one can determine the size of a vector using the sizeof
operator. However, sizeof yields the size in bytes, not in elements.
The following macro determines the number of elements in an arbitrary
vector:

        #define DIM(x)  (sizeof (x) / sizeof ((x)[0]))

sizeof does not really need parentheses if it is used to determine
the size of an object rather than of a data type. One should, however,
always enclose macro parameters in parentheses. Then things work out
for a vector with more than one dimension, too:

        main()
        {
                struct { int a; char b; } v[10][20][30];

                printf("%d %d %d\n", DIM(v), DIM(v[1]), DIM(v[1][2]));
        }

The program produces the values 10, 20 and 30. Parentheses should not
be necessary in this use of sizeof, since a vector subscript has
higher precedence than sizeof. At least my copy of the Mark Williams
CP/M-86 C compiler does not seem to know this...

We can carry these ideas somewhat further. The last element of a
vector is

        #define LAST(x) ((x)[DIM(x)-1])

and the customary for loop is, for example,

        #define END(x)  ((x) + DIM(x)-1)

        int vector[10], * vp;
        ...
        for (vp = vector; vp <= END(vector); ++ vp)
                ...

sizeof is evaluated by the compiler, so it may be used in constant
expressions. This can be used to determine the length of constant
strings in an efficient and flexible fashion:

        #define STRLEN(s)       (sizeof s - 1)

        char buf[STRLEN("model") + 1];
        ...
        strcpy(buf, "model");

There is the danger, however, that STRLEN is mistaken for strlen()
and applied to other objects, i.e., to anything but a string
constant; for a character pointer, for example, it yields
sizeof (char *) - 1, whatever the pointer points to...

Trace

It is well known that a macro call is not recognized within a string
constant. Less well known, but more useful, is perhaps that a macro
parameter is recognized and replaced even within a string constant in
the replacement text of a macro definition. Rather than

        printf("variable = %d\n", variable);
        printf("formula = %f\n", formula);

we write

        #define SHOW(val,fmt)   fprintf(stderr,"SHOW: val = fmt\n",val)

        SHOW(variable, %d);
        SHOW(formula, %f);

The latter is easier to use and conveys more information, since val is
replaced in the format by the entire macro argument. A bit of caution
is required: if the % operator is used within val, there will be
problems with the format. This can be corrected as follows:

        #define SHOW(val,fmt)   fprintf(stderr,"%s = fmt\n", "val",val)

A macro can be defined without a replacement text. Uses of SHOW can
thus easily be eliminated from the compiled program altogether.
Alternatively we can specify a condition:

        #ifdef DEBUG
        char debugflag;
        #       define SHOW(val,fmt)     (debugflag && fprintf(...))
        #else ! DEBUG
        #       define SHOW(val,fmt)     /* null */
        #endif DEBUG

In this example SHOW is always used as a statement and not as an
expression. Using && rather than if has two advantages: this way we
do not have to use SHOW as a statement, and a use of SHOW does not
invite an unintentional else...

debugflag, by the way, should be used as a bit vector, e.g.:

        #define SHOW(level,val,fmt)     (debugflag & 1<

[...]

        = sizeof(char *));
                while (cp = (char *) yylex())
                        printf("%-.10s is \"%s\"\n",cp,yytext);
        }
        #       define token(x) (int) "x"
        #else ! TRACE
        #       include "y.tab.h"
        #       define token(x) x
        #endif TRACE
        %}

Normally TRACE is undefined, and the tokens, i.e., the values which
are to be returned to the parser, are defined in the file y.tab.h
generated by yacc as:

        #define NAME 257
        ...

These defined names are used directly in the source presented to lex
and are returned as the result of the function yylex().

If TRACE is defined, y.tab.h need not yet exist. In this case, i.e.,
in the debugging version, we want to return a string as the result of
yylex(), which is then printed by the main() program included in this
case. Analyzing the debugging output is most easily accomplished if
the output uses exactly those words which later will appear in
y.tab.h, i.e., which result from %token statements in the source
presented to yacc. We are using the fact that macro parameters are
replaced within strings in the replacement text of a macro: token(x)
either returns x itself (to be passed on to yacc), or a string "x"
for the purposes of TRACE.

__________________________
... to Compiler Construction by A. T. Schreiner and H. G. Friedman
Jr., to be published in January 1985 by Prentice-Hall.

The technique requires that a pointer to a character string can be
returned in place of an int value.

The remainder of the lex program is now quite obvious:

        %%
        [0-9]+                  return token(NUMBER);
        [a-z_A-Z][a-z_A-Z0-9]*  return word();
        [ \t\n]+                ;
        .                       return token(yytext[0]);
        %%

        struct reserved {
                char * text;
                int yylex;
        } reserved[] = {
                { "begin",      token(BEGIN) },
                { "end",        token(END) },
                { (char *) 0 }
        };

        int word()
        {       struct reserved * rp;

                for (rp = reserved; rp->text; ++ rp)
                        if (strcmp(yytext, rp->text) == 0)
                                return rp->yylex;
                return token(NAME);
        }

Yes, there should have been a binary search, but we are dealing only
with the principles...
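The remark about a binary search can be made concrete with the standard library's bsearch(). What follows is a minimal sketch under stated assumptions, not the column's code: the token values are made-up constants (in the column they come from y.tab.h; they are called T_BEGIN and T_END here only to stay clear of lex's own BEGIN macro and of the END() macro defined earlier), and this hypothetical word() takes the text as a parameter instead of reading yytext, so it can be tried outside a lex scanner.

```c
#include <stdlib.h>
#include <string.h>

/* DIM() as defined in the column: number of elements in a vector. */
#define DIM(x)  (sizeof (x) / sizeof ((x)[0]))

/* Hypothetical token values; in the column they come from y.tab.h. */
enum { NAME = 257, T_BEGIN, T_END };

struct reserved {
    const char *text;   /* reserved word */
    int yylex;          /* value to be returned to the parser */
};

/* Must stay sorted in strcmp() order, or bsearch() will miss entries.
   No terminating null entry is needed: DIM() supplies the length. */
static const struct reserved reserved[] = {
    { "begin", T_BEGIN },
    { "end",   T_END   },
};

/* Comparison function for bsearch(): the key is the word being looked
   up, the element is one table entry. */
static int cmp_reserved(const void *key, const void *elem)
{
    const struct reserved *rp = elem;

    return strcmp((const char *) key, rp->text);
}

/* Binary-search variant of word(): returns NAME for ordinary words. */
int word(const char *text)
{
    const struct reserved *rp = bsearch(text, reserved, DIM(reserved),
                                        sizeof reserved[0], cmp_reserved);

    return rp ? rp->yylex : NAME;
}
```

With this table, word("begin") yields T_BEGIN while word("beginning") yields NAME. Keeping the table sorted is the price of the faster lookup; for two reserved words the column's linear search is of course just as good.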
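Both SHOW and the TRACE version of token() depend on the UNIX preprocessor replacing macro parameters inside string constants. Standard (ANSI) C dropped that behavior: there the column's first SHOW would print the words val and fmt literally. A sketch of the same macros for an ANSI preprocessor, assuming nothing beyond the standard # (stringizing) operator and the concatenation of adjacent string literals; SHOW_BUF is a hypothetical variant added only so that the output can be inspected.

```c
#include <stdio.h>

/* ANSI counterpart of SHOW(val,fmt): #val supplies the expression text
   as the argument for %s, and #fmt is joined into the format string by
   concatenation of adjacent literals.  A % inside val is harmless here,
   because val never becomes part of the format itself. */
#define SHOW(val, fmt)  fprintf(stderr, "%s = " #fmt "\n", #val, (val))

/* Hypothetical variant that writes into the array buf instead of
   stderr; evaluates to snprintf()'s character count. */
#define SHOW_BUF(buf, val, fmt) \
    snprintf((buf), sizeof (buf), "%s = " #fmt "\n", #val, (val))

/* ANSI counterpart of the TRACE version of token(x), without the
   column's (int) cast: token(NUMBER) becomes "NUMBER". */
#define token(x)        #x
```

With int variable = 42, SHOW(variable % 5, %d) prints "variable % 5 = 2" on standard error, which is exactly the output the column's second SHOW produces with the older preprocessor.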

__________________________
Returning a character pointer in place of an int value is not
possible with all implementations of C; e.g., it is probably not
allowed on the 7300 systems. We guard against this portability
problem using assert().

                            /usr/src/main.c

Argument standards

Command arguments are always good for surprises. Sometimes several
options may be combined into one argument; sometimes each option must
be a separate argument; sometimes a parameter value follows as part of
the argument; sometimes it does not; all of the above; some of the
above...?

If one consults the sources of certain UNIX utilities, one learns to
appreciate the flexibility of C (or the infinite patience of the C
compiler?): everybody does his own thing, and most do it differently
in every program! However, it would be so simple to develop a
standard:

        #include <stdio.h>

        #define show(x) printf("x = %d\n", x)
        #define USAGE   fputs("cmd [-f] [-v #]\n", stderr), exit(1)

        main(argc, argv)
                int argc;
                char ** argv;
        {       int f = 0, v = 0;

                while (--argc > 0 && **++argv == '-')
                {   switch (*++*argv) {
                    case 0:                     /* - */
                        --*argv;
                        break;
                    case '-':
                        if (! (*argv)[1])       /* -- */
                        {   ++ argv, -- argc;
                            break;
                        }
                    default:
                        do
                        {   switch (**argv) {
                            case 'f':           /* -f */
                                ++ f;
                                continue;
                            case 'v':
                                if (*++*argv)
                                    ;           /* -v# */
                                else if (--argc > 0)
                                    ++argv;     /* -v # */
                                else
                                    break;
                                v = atoi(*argv);
                                *argv += strlen(*argv)-1;
                                continue;
                            }
                            USAGE;
                        } while (*++*argv);
                        continue;
                    }
                    break;
                }
                show(f), show(v), show(argc);
                if (argc)
                        puts(*argv);
        }

At show(), argc contains the number of arguments which have not yet
been processed, and *argv is the first one of these. This argument can
be a single - character: in some ancient (cat) and almost new (tar)
utilities this indicates that standard input or output is to be used
in place of a file argument.

Flags can be combined at will. If an option requires a value, it can
follow immediately (and then as the rest of the argument) or it can be
an argument of its own.
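The skeleton is easiest to trust after trying it on a few argument vectors. Here is a minimal sketch, assuming an ANSI C compiler and a hypothetical function parse_options() that is not part of the column: it contains the same loop, but returns -1 where the skeleton calls USAGE, and it reports f, v and the unprocessed arguments through pointers instead of printing them.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper around the column's option loop.  Returns the
   number of unprocessed arguments (like argc at show()), or -1 where
   the skeleton would call USAGE.  *rest receives the first unprocessed
   argument; it is meaningful only if the return value is positive. */
int parse_options(int argc, char **argv, int *f, int *v, char ***rest)
{
    *f = 0, *v = 0;
    while (--argc > 0 && **++argv == '-') {
        switch (*++*argv) {
        case 0:                         /* lone "-": not an option */
            --*argv;
            break;
        case '-':
            if (!(*argv)[1]) {          /* "--" ends the option list */
                ++argv, --argc;
                break;
            }
            /* "--x" falls through and is rejected below */
        default:
            do {
                switch (**argv) {
                case 'f':               /* -f, may be combined */
                    ++*f;
                    continue;
                case 'v':
                    if (*++*argv)
                        ;               /* -v#: value in this argument */
                    else if (--argc > 0)
                        ++argv;         /* -v #: value is the next one */
                    else
                        break;          /* -v without a value */
                    *v = atoi(*argv);
                    *argv += strlen(*argv) - 1;  /* skip to last char */
                    continue;
                }
                return -1;              /* unknown option or no value */
            } while (*++*argv);
            continue;
        }
        break;
    }
    *rest = argv;
    return argc;
}
```

For instance, the vector { "cmd", "-fv3", "file" } with argc 3 yields f == 1, v == 3 and one unprocessed argument, "file"; "-v" followed by "42" as a separate argument sets v to 42 just the same, and after "--" an argument such as "-f" is left unprocessed.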

Following a standard proposed in the "USENIX login", an option --
serves to terminate processing of the option list. Apart from that,
options must start with - and they must precede other arguments.
These rules, however, still do not cover all possibilities of pr...

The skeleton above is useful but anatomically somewhat terrifying.
The following incarnation is perhaps more attractive:

        #include <stdio.h>
        #include "main.h"

        #define show(x) printf("x = %d\n", x)
        #define USAGE   fputs("cmd [-f] [-v #]\n", stderr), exit(1)

        MAIN
        {       int f = 0, v = 0;

                OPT
                ARG 'f':        ++ f;
                ARG 'v':        PARM v = atoi(*argv); NEXTOPT
                OTHER           USAGE;
                ENDOPT
                show(f), show(v), show(argc);
                if (argc)
                        puts(*argv);
        }

The trick, of course, is concealed in the header file main.h: here the
macros MAIN, OPT, ARG, PARM, NEXTOPT, OTHER, and ENDOPT must be
defined using exactly those texts which were given explicitly in the
previous example:

        #define MAIN    main(argc, argv) \
                                int argc; \
                                char ** argv;

        #define OPT     while (--argc > 0 && **++argv == '-') \
                        {   switch (*++*argv) { \
                            case 0: \
                                --*argv; \
                                break; \
                            case '-': \
                                if (! (*argv)[1]) \
                                {   ++ argv, -- argc; \
                                    break; \
                                } \
                            default: \
                                do \
                                {   switch (**argv) {

        #define ARG     continue; \
                            case

        #define OTHER   continue; \
                            }

        #define ENDOPT  } while (*++*argv); \
                                continue; \
                            } \
                            break; \
                        }

        #define PARM    if (*++*argv); \
                        else if (--argc > 0) ++argv; else break;

        #define NEXTOPT *argv += strlen(*argv)-1;

The definitions are not exactly beautiful, especially if they need to
be compacted so that the C preprocessor accepts the lengthy
replacement texts, but they need to be developed only once to make
the argument standard available for all applications. An application
then is almost self-documenting:

MAIN    is the function header of the main program.

OPT     starts the loop during which the options are processed.

ENDOPT  completes this loop.

ARG     within the loop starts the processing of one option; the name
        of the option (a single character) enclosed in single quotes,
        and a colon, must follow.

PARM    follows the option specification if the option has a value
        parameter. The parameter itself is then available as *argv.

NEXTOPT is used in particular once such a parameter has been
        processed, to advance to the next command argument.

OTHER   must follow all options; following it, one specifies what
        should be done if an option could not be recognized. NEXTOPT
        may be specified in this case, too. The unknown option itself
        is **argv.

After the OPT ... ENDOPT loop, argc contains the number of command
arguments which have not yet been processed, and *argv is the first
such argument. Arbitrarily many (different) options can be specified
with ARG. pr would be implemented approximately as follows:

        MAIN
        {
                do
                {   OPT
                    ARG 'h':    PARM header = *argv; NEXTOPT
                    ARG 'w':    PARM width = atoi(*argv); NEXTOPT
                    ARG 'l':    PARM length = atoi(*argv); NEXTOPT
                    ARG 't':    tflag = 1;
                    ARG 's':    PARM delimiter = **argv; ++*argv; NEXTOPT
                    ARG 'm':    mflag = 1;
                    OTHER       if (isdigit(**argv))
                                    columns = atoi(*argv), NEXTOPT
                                else
                                    USAGE;
                    ENDOPT
                    if (argc)
                    {   if (**argv == '+')
                        {   PARM first_page = atoi(*argv);
                            continue;
                        }
                        dopr(*argv);
                    }
                    else
                        dopr("-");
                } while (argc > 1);
        }

There is a blemish: -columns must be specified as a single argument
(since - alone refers to standard input).

February 18, 1985

--
Carl Swail      Mail:   National Research Council of Canada
                        Building U-66, Montreal Road
                        Ottawa, Ontario, Canada K1A 0R6
                Phone:  (613) 998-3408
                USENET: {cornell,uw-beaver}!utcsrgv!dciem!nrcaero!carl
                        {allegra,decvax,duke,floyd,ihnp4,linus}!utzoo!dciem!nrcaero!carl