Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!uwm.edu!ux1.cso.uiuc.edu!usenet
From: olson@sax.cs.uiuc.edu (Robert Olson)
Newsgroups: comp.lang.c++
Subject: Perl script for generating C++ from lex/yacc
Message-ID: <1990Aug21.164917.12361@ux1.cso.uiuc.edu>
Date: 21 Aug 90 16:49:17 GMT
Sender: usenet@ux1.cso.uiuc.edu (News)
Organization: University of Illinois at Urbana
Lines: 2006

Well, due to the large response I'm posting the scripts here. Realize
that this is just a snapshot of my current work. I make no promises,
though I am interested in what you think and if it works for you.
If there is continued interest I may put the effort into making it
more usable.

Enjoy!

#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create the files:
#	Gen
# This archive created: Tue Aug 21 11:36:30 1990
export PATH; PATH=/bin:$PATH
if test ! -d 'Gen'
then
	echo shar: creating directory "'Gen'"
	mkdir 'Gen'
fi
echo shar: entering directory "'Gen'"
cd 'Gen'
if test ! -d 'src'
then
	echo shar: creating directory "'src'"
	mkdir 'src'
fi
echo shar: entering directory "'src'"
cd 'src'
echo shar: extracting "'Makefile'" '(923 characters)'
if test -f 'Makefile'
then
	echo shar: will not over-write existing file "'Makefile'"
else
cat << \SHAR_EOF > 'Makefile'
#
# Robert Olson, University of Illinois at Urbana
# rolson@uiuc.edu
# Copyright (c) 1990 by Robert Olson
# All rights reserved
# This program may not be sold, but may be distributed
# provided this header is included.
# # # $Author: olson $ # $Revision: 1.3 $ # $State: Exp $ # $Locker: olson $ # $Source: /home/reed/Pablo/Parser/cplus/Gen/src/RCS/Makefile,v $ # $Log: Makefile,v $ # Revision 1.3 90/08/21 11:21:14 olson # added the legalese # # Revision 1.2 90/08/17 14:10:02 olson # fixed make clean # # Revision 1.1 90/08/17 14:04:31 olson # First rev # CC = gcc CFLAGS = all: tokenize tokens.pl TOKENIZE_OBJ = lextoken.o tokenize.o tokenize: $(TOKENIZE_OBJ) $(CC) $(CFLAGS) -o $@ $(TOKENIZE_OBJ) $(LIBS) tokenize.o: tokens.h lextoken.o: tokens.h .PRECIOUS: lextoken.c tokens.h tokens.pl: lextoken.l maketokens.pl lextoken.l clean: rm -f *.o *~ \#* tokenize tokens.pl tokens.h SHAR_EOF if test 923 -ne "`wc -c < 'Makefile'`" then echo shar: error transmitting "'Makefile'" '(should have been 923 characters)' fi fi # end of overwriting check echo shar: extracting "'lextoken.l'" '(2609 characters)' if test -f 'lextoken.l' then echo shar: will not over-write existing file "'lextoken.l'" else cat << \SHAR_EOF > 'lextoken.l' %Start COMMENT %{ /* * * Robert Olson, University of Illinois at Urbana * rolson@uiuc.edu * Copyright (c) 1990 by Robert Olson * All rights reserved * This program may not be sold, but may be distributed * provided this header is included. * * $Author: olson $ * $Revision: 1.2 $ * $State: Exp $ * $Locker: olson $ * $Source: /home/reed/Pablo/Parser/cplus/Gen/src/RCS/lextoken.l,v $ * $Log: lextoken.l,v $ * Revision 1.2 90/08/21 11:21:16 olson * added the legalese * * Revision 1.1 90/08/17 14:04:32 olson * First rev * * */ #include "tokens.h" %} delim [ \t\n] ws {delim}+ digit [0-9] expon ([Ee][+\-]?{digit}+) %% [ \t\n] {} \"[^"]*\" { return STRINGCONST; } ('[^']')|('\\[a-z']') { return CHARCONST; } "/*" { BEGIN(COMMENT); } "*/" { BEGIN(INITIAL); } .|\n { } static { return STATIC; } typedef { return TYPEDEF; } extern { return EXTERN; } struct { return STRUCT; } union { return UNION; } #.*$ { return DIRECTIVE; } "!=" { return BANGEQUAL; } "!" 
{ return BANG; } "%" { return MOD; } "%=" { return MODEQUAL; } "&&" { return ANDAND; } "&=" { return ANDEQUAL; } "&" { return AND; } "(" { return LPAREN; } ")" { return RPAREN; } "*=" { return STAREQUAL; } "+=" { return PLUSEQUAL; } "+" { return PLUS; } "-=" { return MINUSEQUAL; } "->" { return POINTSTO; } "-" { return MINUS; } "." { return DOT; } "/" { return SLASH; } ":" { return COLON; } ";" { return SEMICOLON; } "<<=" { return LEFTSHIFTEQUAL; } "<<" { return LEFTSHIFT; } "<=" { return LESSEQUAL; } "<" { return LESSTHAN; } "=" { return EQUAL; } ">>=" { return RIGHTSHIFTEQUAL; } ">=" { return GREATEREQUAL; } ">>" { return RIGHTSHIFT; } ">" { return GREATERTHAN; } "?" { return QUESTIONMARK; } "[" { return LBRACKET; } "]" { return RBRACKET; } "^=" { return CARETEQUAL; } "," { return COMMA; } "^" { return CARET; } "{" { return LBRACE; } "|=" { return BAREQUAL; } "||" { return BARBAR; } "|" { return BAR; } "}" { return RBRACE; } [a-zA-Z_*$][a-zA-Z_*$0-9]+ { return WORD; } [+-]?(0|([1-9][0-9]*)) { return INTEGER; } [+-]?{digit}+(\.{digit}+)?{expon}? { return REAL; } . { return UNKNOWN; } %% SHAR_EOF if test 2609 -ne "`wc -c < 'lextoken.l'`" then echo shar: error transmitting "'lextoken.l'" '(should have been 2609 characters)' fi fi # end of overwriting check echo shar: extracting "'tokenize.pl'" '(1180 characters)' if test -f 'tokenize.pl' then echo shar: will not over-write existing file "'tokenize.pl'" else cat << \SHAR_EOF > 'tokenize.pl' # # Robert Olson, University of Illinois at Urbana # rolson@uiuc.edu # Copyright (c) 1990 by Robert Olson # All rights reserved # This program may not be sold, but may be distributed # provided this header is included. 
#
#
# $Author: olson $
# $Revision: 1.3 $
# $State: Exp $
# $Locker: olson $
# $Source: /home/reed/Pablo/Parser/cplus/Gen/src/RCS/tokenize.pl,v $
# $Log: tokenize.pl,v $
# Revision 1.3 90/08/21 11:21:20 olson
# added the legalese
#
# Revision 1.2 90/08/17 14:09:11 olson
# fixed headers
#
# Revision 1.1 90/08/17 14:04:35 olson
# First rev
#
#

sub init_tokenizer {
    local($input_file) = @_;

    print "input file is $input_file dir is $perlprog_directory\n";
    if (! -f $input_file) {
        die "Input file $input_file not found\n";
    }
    open(INPUT, "$perlprog_directory/tokenize $input_file|") || die "Cannot open pipe: $!";
    print "initialized tokenizer\n";
}

sub next_token {
    local(*token, *val) = @_;
    local($end);

    $token = <INPUT>;
    $end = !defined($token);
    if ($end) {
        print "got EOF\n";
        return 0;
    }
    $val = <INPUT>;
    chop $token;
    chop $val;
    return 1;
}

1;
SHAR_EOF
if test 1180 -ne "`wc -c < 'tokenize.pl'`"
then
	echo shar: error transmitting "'tokenize.pl'" '(should have been 1180 characters)'
fi
chmod +x 'tokenize.pl'
fi # end of overwriting check
echo shar: extracting "'tokenize.c'" '(1695 characters)'
if test -f 'tokenize.c'
then
	echo shar: will not over-write existing file "'tokenize.c'"
else
cat << \SHAR_EOF > 'tokenize.c'
/*
 * Robert Olson, University of Illinois at Urbana
 * rolson@uiuc.edu
 * Copyright (c) 1990 by Robert Olson
 * All rights reserved
 * This program may not be sold, but may be distributed
 * provided this header is included.
 *
 * $Author: olson $
 * $Revision: 1.3 $
 * $State: Exp $
 * $Locker: olson $
 * $Source: /home/reed/Pablo/Parser/cplus/Gen/src/RCS/tokenize.c,v $
 * $Log: tokenize.c,v $
 * Revision 1.3 90/08/21 11:21:19 olson
 * added the legalese
 *
 * Revision 1.2 90/08/17 14:09:11 olson
 * fixed headers
 *
 *
 */

#include "tokens.h"
#include <stdio.h>

extern int errno;
extern char *sys_errlist[];
extern char yytext[];
extern FILE *yyin, *yyout;

main(int argc, char **argv)
{
    int rc;
    char *inputFile;
    FILE *fp;
    char *pgm = argv[0];
    int withNames;

    argc--;
    argv++;
    withNames = 0;
    while (argc > 0) {
        if (**argv == '-') {
            if (strcmp(*argv, "-names") == 0) {
                withNames = 1;
            }
            else {
                fprintf(stderr, "Invalid flag %s. Usage: %s [-names] file.l\n",
                        *argv, pgm);
                exit(1);
            }
        }
        else
            break;
        argc--;
        argv++;
    }
    if (argc > 0) {
        inputFile = argv[0];
        fp = fopen(inputFile, "r");
        if (fp == NULL) {
            fprintf(stderr, "Cannot open %s: %s\n", inputFile, sys_errlist[errno]);
            exit(1);
        }
        yyin = fp;
    }
    setbuf(yyout, 0);
    if (withNames) {
        while ((rc = yylex()) > 0) {
            printf("%s\n%s\n", tokenNames[rc], yytext);
        }
    }
    else {
        while ((rc = yylex()) > 0) {
            printf("%d\n%s\n", rc, yytext);
        }
    }
}

yywrap()
{
}
SHAR_EOF
if test 1695 -ne "`wc -c < 'tokenize.c'`"
then
	echo shar: error transmitting "'tokenize.c'" '(should have been 1695 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'process.pl'" '(10764 characters)'
if test -f 'process.pl'
then
	echo shar: will not over-write existing file "'process.pl'"
else
cat << \SHAR_EOF > 'process.pl'
#
# Robert Olson, University of Illinois at Urbana
# rolson@uiuc.edu
# Copyright (c) 1990 by Robert Olson
# All rights reserved
# This program may not be sold, but may be distributed
# provided this header is included.
# # # $Author: olson $ # $Revision: 1.3 $ # $State: Exp $ # $Locker: olson $ # $Source: /home/reed/Pablo/Parser/cplus/Gen/src/RCS/process.pl,v $ # $Log: process.pl,v $ # Revision 1.3 90/08/21 11:21:18 olson # added the legalese # # Revision 1.2 90/08/17 14:09:08 olson # fixed headers # # Revision 1.1 90/08/17 14:04:34 olson # First rev # # # $phfile_name=""; $hfile_name=""; $ccfile_name=""; $base_name = ""; %valid_types = ("int", 1, "short", 1, "float", 1, "double", 1, "char", 1, "FILE", 1, "YYSTYPE", 1); $n_variables = 0; %all_variable_index = (); @all_variable_names = (); @all_variable_types = (); @all_variable_inits = (); @all_variable_extern_flags = (); @all_variable_static_flags = (); @all_variable_typedef_flags = (); $n_functions = 0; %all_function_index = (); @all_function_names = (); @all_function_types = (); @all_function_args = (); @all_function_bodies = (); sub is_valid_type { local($type) = @_; local($flag); $flag = $valid_types{$type}; if (defined($flag) && $flag) { # print "type $type is valid\n"; return 1; } else { # print "type $type is invalid\n"; return 0; } } sub add_function { local($nargs) = 0 + @_; die "Wrong number of args ($nargs) to add_function" if $nargs != 4; local($name, $type, $args, $body) = @_; local($function_no) = $n_functions++; $all_function_index{$name} = $function_no; $all_function_names[$function_no] = $name; $all_function_types[$function_no] = $type; $all_function_args[$function_no] = $args; $all_function_bodies[$function_no] = $body; return $function_no; } sub add_variable { local($nargs) = 0 + @_; die "Wrong number of args ($nargs) to add_variable" if $nargs != 5; local($name, $type, $init, $eflag, $sflag, $tflag) = @_; local($variable_no); local($varname); ($varname) = ($name =~ /(\w+)/); # print "determined varname=\"$varname\"\n"; $variable_no = $n_variables++; if (defined($old = $all_variable_index{$varname})) { print "var $name already defined, adding new\n"; } $all_variable_index{$varname} = $variable_no; 
$all_variable_names[$variable_no] = $name; $all_variable_types[$variable_no] = $type; $all_variable_extern_flags[$variable_no] = $eflag; $all_variable_static_flags[$variable_no] = $sflag; $all_variable_typedef_flags[$variable_no] = $tflag; $all_variable_inits[$variable_no] = $init; return $variable_no; } sub init_process { ($base_name) = @_; print "init_process, base is $base_name\n"; $phfile_name = "$base_name-private.h"; $hfile_name = "$base_name.h"; $ccfile_name = "$base_name.cc"; open(PHFILE, ">$phfile_name") || die "Cannot open $phfile_name: $!\n"; open(HFILE, ">$hfile_name") || die "Cannot open $hfile_name: $!\n"; open(CCFILE, ">$ccfile_name") || die "Cannot open $ccfile_name: $!\n"; select(PHFILE); $| = 1; select(HFILE); $| = 1; select(CCFILE); $| = 1; select(STDOUT); $| = 1; print "calling header writing rtns\n"; do write_ccfile_header(); do write_hfile_header(); } sub finish_process { local($i); print "function info:\n"; for ($i = 0; $i < $n_functions; $i++) { printf("Function %d: name=%s type=%s args=%s\n", $i, $all_function_names[$i], $all_function_types[$i], $all_function_args[$i]); # print " Body: $all_function_bodies[$i]\n"; } print "variable info:\n"; for ($i = 0; $i < $n_variables; $i++) { printf("Variable %d: name=%s type=%s eflag=%d sflag=%d tflag=%d\n", $i, $all_variable_names[$i], $all_variable_types[$i], $all_variable_extern_flags[$i], $all_variable_static_flags[$i], $all_variable_typedef_flags[$i]); # print " Init: $all_variable_inits[$i]\n"; } print "index contents:\n"; for $key (keys(%all_variable_index)) { print "key=$key value=$all_variable_index{$key}\n"; } print "variable info from index:\n"; for $i (values(%all_variable_index)) { printf("Variable %d: name=%s type=%s eflag=%d sflag=%d tflag=%d\n", $i, $all_variable_names[$i], $all_variable_types[$i], $all_variable_extern_flags[$i], $all_variable_static_flags[$i], $all_variable_typedef_flags[$i]); # print " Init: $all_variable_inits[$i]\n"; } do write_variables(); do write_functions(); do 
write_hfile_trailer(); close(PHFILE); close(HFILE); close(CCFILE); } sub write_functions { local($key, $idx); local($name, $type, $args, $body); local($write_body); for $idx (0 .. @all_function_names - 1) { $name = $all_function_names[$idx]; $type = $all_function_types[$idx]; $args = $all_function_args[$idx]; $body = $all_function_bodies[$idx]; print "got func $idx name $name type $type\n"; print "args=\"$args\"\n"; # print "body=\"$body\"\n"; if ($body ne "") { $body =~ s/;/;\n/g; $body =~ s/^\s*#\s*(ifdef|elif|if)/# \1 /; if (($ptrstuff) = ($name =~ /([*&]+)/)) { local($newname) = $name; $newname =~ s/[*&]+//; print HFILE "\t$type $ptrstuff $newname $args;\n"; print CCFILE "$type $ptrstuff $base_name :: $newname $args\n"; } else { print HFILE "\t$type $name $args;\n"; print CCFILE "$type $base_name :: $name $args\n"; } print CCFILE "$body\n"; } else { # Assume its an extern decl. print HFILE "\textern $type $name $args;\n"; } } } sub write_variables { local($key); local($idx); local($name, $type, $init, $eflag, $sflag, $tflag); $constructor = "\tlexer = lex;\n" if ($output_mode eq "yacc"); $destructor = ""; for $key (keys %all_variable_index) { print "writing info for $key\n"; $idx = $all_variable_index{$key}; $name = $all_variable_names[$idx]; $type = $all_variable_types[$idx]; $init = $all_variable_inits[$idx]; $eflag = $all_variable_extern_flags[$idx]; $sflag = $all_variable_static_flags[$idx]; $tflag = $all_variable_typedef_flags[$idx]; if ($tflag) { next; } if ($init eq "") { print HFILE "\t$type $name;\n"; } elsif ($init =~ /std(in|out)/) { local($newname) = $name; local($newinit) = $init; $newinit =~ s/[{}]//g; $newname =~ s/\*//; print HFILE "\t$type $name;\n"; $constructor .= "$newname = $newinit;\n"; } elsif (($init =~ /{/) || ($init =~ /"/)) # } keep indenter happy { local($newname) = $name; $newname =~ s/\*//; print HFILE "\tstatic $type $name;\n"; print CCFILE "$type $base_name::$newname = $init;\n"; } else { local($newname) = $name; $newname =~ 
s/\*//; print HFILE "\t$type $name;\n"; $constructor .= "\t$newname = $init;\n"; } } if ($output_mode eq "yacc") { print CCFILE "$base_name::$base_name($other_class_name *lex)\n{\n"; } else { print CCFILE "$base_name::$base_name()\n{\n"; } print CCFILE "$constructor"; print CCFILE "}\n"; print CCFILE "$base_name::~$base_name()\n{\n"; print CCFILE "$destructor"; print CCFILE "}\n"; } sub write_ccfile_header { print "writing ccfile header\n"; print CCFILE "#include \"$hfile_name\"\n"; if ($other_class_name ne "") { print CCFILE "#include \"$other_class_name.h\"\n"; } print CCFILE "#define yylex lexer->yylex\n" if ($output_mode eq "yacc"); print CCFILE "#define yylval (parser->yylval)\n" if ($output_mode eq "lex"); print CCFILE "\n"; } sub write_hfile_header { local($optional_include) = ""; if ($other_class_name ne "") { $optional_include = "#include \"$other_class_name.h\"\nclass $other_class_name;\n"; } print "writing hfile header\n"; print HFILE < 'generate-class' #!/usr/local/bin/perl # # Robert Olson, University of Illinois at Urbana # rolson@uiuc.edu # Copyright (c) 1990 by Robert Olson # All rights reserved # This program may not be sold, but may be distributed # provided this header is included. 
# # $Author: olson $ # $Revision: 1.3 $ # $State: Exp $ # $Locker: olson $ # $Source: /home/reed/Pablo/Parser/cplus/Gen/src/RCS/generate-class,v $ # $Log: generate-class,v $ # Revision 1.3 90/08/21 11:21:15 olson # added the legalese # # Revision 1.2 90/08/17 14:09:05 olson # fixed headers # # Revision 1.1 90/08/17 14:04:32 olson # First rev # $class_name = "Foo"; $input_file = ""; do 'getopts.pl' || die "cant do getopts.pl"; do process_args(); do "$perlprog_directory/process.pl" || die "cant do process.pl: $!"; do "$perlprog_directory/parser-rtns.pl" || die "cant do parser-rtns.pl"; do "$perlprog_directory/tokens.pl" || die "cant do tokens.pl"; do "$perlprog_directory/tokenize.pl" || die "cant do tokenize.pl"; do initialize(); do init_process($class_name); do main(); do finish_process(); exit; sub main { local(@funcs) = (); local(@funcbodies) = (); local(@rest); local($line); local($extern_flag, $static_flag); local($token, $token_value) = ("", ""); # print "before loop\n"; while (&next_token(*token, *token_value)) { $extern_flag = 0; $static_flag = 0; $typedef_flag = 0; # print "in loop, token=$token val=$token_value\n"; while ($token == &TYPEDEF || $token == &EXTERN || $token == &STATIC) { if ($token == &TYPEDEF) { $typedef_flag = 1; } if ($token == &EXTERN) { $extern_flag = 1; } if ($token == &STATIC) { $static_flag = 1; } &next_token(*token, *token_value); } if ($token == &DIRECTIVE) { # Process a preprocessor directive do process_directive($token_value); } elsif ($token == &STRUCT || $token == &UNION) { # Here: struct | union . 
[ aggrname ] [ { aggr body } ] varlist ; # where varlist is as above local(@result); local(@vars, @inits, @funcargs, @funcbodies); local($which) = $token_value; local($aggrname, $aggrdef); ($token, $token_value) = &read_aggr_def(*aggrname, *aggrdef); # print "done read_aggr_def, token_value=$token_value\n"; if ($token != &SEMICOLON) { &read_decl($token, $token_value, *vars, *inits, *funcargs, *funcbodies); } local($typedef_done) = 0; if ($typedef_flag && $aggr_name eq "") { if (@vars != 1) { die "Malformed typedef\n"; } do add_anonstruct_typedef($which, $aggrdef, $vars[0]); $typedef_done = 1; } if ($aggrdef ne "" && !$typedef_flag) { do process_struct_decl($which, $aggrname, $aggrdef, $typedef_flag); } do process_simple_decl("$which $aggrname", *vars, *inits, *funcargs, *funcbodies, $extern_flag, $static_flag, $typedef_flag, $typedef_done); } elsif ($token == &WORD) { if (&is_valid_type($token_value)) { # Here: typename . varlist ; # varlist: varname [= initializer] ... # # Or, its a function declaration/definition # # typename . funcname ( args ) , funcname ( args ) ... ; # or # typename . funcname ( args ) { body } local(@vars, @inits); local(@funcargs, @funcbodies); $type = $token_value; # print "got type=$type\n"; &next_token(*token, *token_value); &read_decl($token, $token_value, *vars, *inits, *funcargs, *funcbodies); do process_simple_decl($type, *vars, *inits, *funcargs, *funcbodies, $extern_flag, $static_flag, $typedef_flag); } else { # Here I just assume that this is an implicitly int-defined # typename. 
            # print "got word $token_value at toplevel\n";
            local(@vars, @inits);
            local(@funcargs, @funcbodies);

            &read_decl($token, $token_value, *vars, *inits, *funcargs, *funcbodies);
            do process_simple_decl("int", *vars, *inits, *funcargs, *funcbodies,
                                   $extern_flag, $static_flag, $typedef_flag);
            }
        }
        else {
            &syntax_error("toplevel", $token, $token_value);
        }
    }
}

sub process_args {
    local($name);

    do Getopts('c:m:i:d:');

    if ($opt_c ne "") {
        $class_name = $opt_c;
    }
    if ($opt_m eq "lex") {
        $output_mode = "lex";
    }
    elsif ($opt_m eq "yacc") {
        $output_mode = "yacc";
    }
    elsif ($opt_m ne "") {
        die "Unknown mode $opt_m: valid modes are \"lex\" and \"yacc\"\n";
    }
    if ($opt_d eq "") {
        $perlprog_directory = ".";
    }
    else {
        $perlprog_directory = $opt_d;
    }
    die "Cannot access perl prog directory $perlprog_directory: $!"
        if (! -d $perlprog_directory);
    if ($output_mode ne "") {
        if ($opt_i eq "") {
            die "for lex/yacc modes, must specify -i otherclassname\n";
        }
        $other_class_name = $opt_i;
    }
    $nargs = @ARGV;
    if ($nargs == 0) {
        $input_file = "";
    }
    elsif ($nargs == 1) {
        $input_file = $ARGV[0];
    }
    else {
        die "Usage: $0 [-c name] [-m lex|yacc] [-d perlprog-directory] [file]\n";
    }
}

sub initialize {
    do init_tokenizer($input_file);
}
SHAR_EOF
if test 5230 -ne "`wc -c < 'generate-class'`"
then
	echo shar: error transmitting "'generate-class'" '(should have been 5230 characters)'
fi
chmod +x 'generate-class'
fi # end of overwriting check
echo shar: extracting "'parser-rtns.pl'" '(5394 characters)'
if test -f 'parser-rtns.pl'
then
	echo shar: will not over-write existing file "'parser-rtns.pl'"
else
cat << \SHAR_EOF > 'parser-rtns.pl'
#
# Robert Olson, University of Illinois at Urbana
# rolson@uiuc.edu
# Copyright (c) 1990 by Robert Olson
# All rights reserved
# This program may not be sold, but may be distributed
# provided this header is included.
# # # $Author: olson $ # $Revision: 1.3 $ # $State: Exp $ # $Locker: olson $ # $Source: /home/reed/Pablo/Parser/cplus/Gen/src/RCS/parser-rtns.pl,v $ # $Log: parser-rtns.pl,v $ # Revision 1.3 90/08/21 11:21:17 olson # added the legalese # # Revision 1.2 90/08/17 14:09:07 olson # fixed headers # # Revision 1.1 90/08/17 14:04:33 olson # First rev # # # # This parses # aggr_def: [aggrname] [ { aggr_body} ] [ varlist ]; # varlist: var | var, varlist # var: name [ = initializer ] # do 'tokens.pl' ; sub read_aggr_def { local(*aggr_name, *aggr_def) = @_; local($token, $token_value); local($brace_depth, $done); local($name, $init); &next_token(*token, *token_value); if ($token == &WORD) { $aggr_name = $token_value; &next_token(*token, *token_value); } if ($token == &LBRACE) { # Parse an aggr definition $aggr_def = $token_value; for ($brace_depth = 1; $brace_depth > 0; ) { &next_token(*token, *token_value); $brace_depth++ if ($token == &LBRACE); $brace_depth-- if ($token == &RBRACE); $aggr_def = &concat_tokens($aggr_def, $token_value); } &next_token(*token, *token_value); } print "token=$token val=$token_value\n"; print "got aggr name \"$aggr_name\"\n"; print "got aggr def \"$aggr_def\"\n"; return ($token, $token_value); } # Here: typename . varlist ; # varlist: varname [= initializer] ... # # Or, its a function declaration/definition # # typename . funcname ( args ) , funcname ( args ) ... ; # or # typename . 
funcname ( args ) { body } sub read_decl { local($token, $token_value, *vars, *inits, *funcargs, *funcbodies) = @_; local($done); local($args, $body); local($brace_depth); @vars = (); @inits = (); @funcnames = (); @funcargs = (); @funcbodies = (); print "read_decl: token=$token val=$token_value\n"; return if ($token == &SEMICOLON); if ($token != &WORD) { &syntax_error("read_decl", $token, $token_value); } for ($done = 0; !$done; ) { $name = $token_value; $init = ""; $body = ""; $args = ""; &next_token(*token, *token_value); if ($token == &LBRACKET) { # An array variable $name = &concat_tokens($name, $token_value); local($mydone) = 0; while (!$mydone) { for ($brace_depth = 1; $brace_depth > 0; ) { &next_token(*token, *token_value); $brace_depth++ if ($token == &LBRACKET); $brace_depth-- if ($token == &RBRACKET); $name = &concat_tokens($name, $token_value); } &next_token(*token, *token_value); $mydone = ($token != &LBRACKET); } } if ($token == &EQUAL) { # got an initializer, read until next comma or semicolon, # or until matching braces if there are any &next_token(*token, *token_value); if ($token == &LBRACE) { $init = $token_value; for ($brace_depth = 1; $brace_depth > 0; ) { &next_token(*token, *token_value); $brace_depth++ if ($token == &LBRACE); $brace_depth-- if ($token == &RBRACE); $init = &concat_tokens($init, $token_value); } &next_token(*token, *token_value); } else { while ($token != &COMMA && $token != &SEMICOLON) { $init = &concat_tokens($init, $token_value); &next_token(*token, *token_value); } } } elsif ($token == &LPAREN) { # A function declaration... 
$args = $token_value; for ($brace_depth = 1; $brace_depth > 0; ) { &next_token(*token, *token_value); $brace_depth++ if ($token == &LPAREN); $brace_depth-- if ($token == &RPAREN); $args = &concat_tokens($args, $token_value); } &next_token(*token, *token_value); if ($token == &LBRACE) { # A function definition $body = $token_value; for ($brace_depth = 1; $brace_depth > 0; ) { &next_token(*token, *token_value); $brace_depth++ if ($token == &LBRACE); $brace_depth-- if ($token == &RBRACE); $body = &concat_tokens($body, $token_value); } push(vars, $name); push(inits, $init); push(funcbodies, $body); push(funcargs, $args); # We return here, since you cant have more than one defn. return; } } print "got var name=\"$name\" init=\"$init\" \n"; print "got args=\"$args\" body=\"$body\"\n"; push(vars, $name); push(inits, $init); push(funcbodies, $body); push(funcargs, $args); if ($token == &COMMA) { &next_token(*token, *token_value); } else { $done = 1; } } print "Exiting from read_decl\n"; } sub syntax_error { local($where, $token, $token_value) = @_; local($name) = &get_token_name($token); die "Syntax error in $where token = $name val=$token_value at input line $.\n"; } sub concat_tokens { local($first, $next) = @_; local($lastchar, $firstchar); $lastchar = substr($first, -1, 1); $firstchar = substr($next, 0, 1); if ($firstchar eq "#") { return $first . "\n" . $next . "\n"; } elsif (($lastchar =~ /\w/) && ($firstchar =~ /\w/)) { return $first . " " . $next; } else { return $first . 
$next; } } 1; SHAR_EOF if test 5394 -ne "`wc -c < 'parser-rtns.pl'`" then echo shar: error transmitting "'parser-rtns.pl'" '(should have been 5394 characters)' fi fi # end of overwriting check echo shar: extracting "'maketokens.pl'" '(1768 characters)' if test -f 'maketokens.pl' then echo shar: will not over-write existing file "'maketokens.pl'" else cat << \SHAR_EOF > 'maketokens.pl' #!/usr/local/bin/perl # # # Robert Olson, University of Illinois at Urbana # rolson@uiuc.edu # Copyright (c) 1990 by Robert Olson # All rights reserved # This program may not be sold, but may be distributed # provided this header is included. # # $Author: olson $ # $Revision: 1.3 $ # $State: Exp $ # $Locker: olson $ # $Source: /home/reed/Pablo/Parser/cplus/Gen/src/RCS/maketokens.pl,v $ # $Log: maketokens.pl,v $ # Revision 1.3 90/08/21 11:21:16 olson # added the legalese # # Revision 1.2 90/08/17 14:09:06 olson # fixed headers # # Revision 1.1 90/08/17 14:04:32 olson # First rev # die "Usage: $0 file\n" if (@ARGV != 1); $input_file = $ARGV[0]; open(INPUT, "<$input_file") || die "Cannot open input file $input_file: $!\n"; @names = (); open(HFILE, ">tokens.h") || die "cannot open tokens.h: $!\n"; open(PFILE, ">tokens.pl") || die "cannot open tokens.pl: $!\n"; print PFILE <) { if (($name) = /.*return\s*([A-Z]+).*$/) { push(names, $name); } } for ($i = 0; $i < @names; $i++) { $name = $names[$i]; print HFILE "#define $name $i\n"; print PFILE "sub $name { return $i; }\n"; } print HFILE "\n"; print HFILE "static char *tokenNames[] = {\n"; for ($i = 0; $i < @names; $i++) { $name = $names[$i]; print HFILE "\t\"$name\",\n"; print PFILE "\$token_names[$i] = \"$name\";\n"; } print HFILE "};\n"; print PFILE < 'parser.y' %{ /* * Robert Olson, University of Illinois at Urbana * rolson@uiuc.edu * Copyright (c) 1990 by Robert Olson * All rights reserved * This program may not be sold, but may be distributed * provided this header is included. 
 *
 * $Author: olson $
 * $Revision: 1.3 $
 * $State: Exp $
 * $Locker: $
 * $Source: /home/reed/Pablo/Parser/cplus/Gen/Example/RCS/parser.y,v $
 * $Log: parser.y,v $
 * Revision 1.3 90/08/21 11:20:49 olson
 * Added the legalese
 *
 * Revision 1.2 90/08/17 13:52:36 olson
 * fixed headers
 *
 * Revision 1.1 90/08/17 13:50:59 olson
 * First revision
 *
 */

#include <stdio.h>

%}

%token PLUS MINUS INTEGER SEMICOLON

%%

input:          input_line
        |       input_line input
        ;

input_line:     expr SEMICOLON
                { printf("got value %d\n", $1); }
        ;

expr:           plus_expr
                { process_plus($1); }
        |       minus_expr
                { process_minus($1); }
        |       num
        ;

plus_expr:      num PLUS expr
                { $$ = $1 + $3; }
        ;

minus_expr:     num MINUS expr
                { $$ = $1 - $3; }
        ;

num:            INTEGER
                { $$ = $1; }
        ;

%%

process_plus(int v)
{
    printf("plus got %d\n", v);
}

process_minus(int v)
{
    printf("minus got %d\n", v);
}

yywrap()
{
}

yyerror(char *s)
{
    fprintf(stderr, "Error %s\n", s);
}
SHAR_EOF
if test 1199 -ne "`wc -c < 'parser.y'`"
then
	echo shar: error transmitting "'parser.y'" '(should have been 1199 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'lex.l'" '(764 characters)'
if test -f 'lex.l'
then
	echo shar: will not over-write existing file "'lex.l'"
else
cat << \SHAR_EOF > 'lex.l'
%{
/*
 * Robert Olson, University of Illinois at Urbana
 * rolson@uiuc.edu
 * Copyright (c) 1990 by Robert Olson
 * All rights reserved
 * This program may not be sold, but may be distributed
 * provided this header is included.
* * $Author: olson $ * $Revision: 1.3 $ * $State: Exp $ * $Locker: $ * $Source: /home/reed/Pablo/Parser/cplus/Gen/Example/RCS/lex.l,v $ * $Log: lex.l,v $ * Revision 1.3 90/08/21 11:20:46 olson * Added the legalese * * Revision 1.2 90/08/17 13:52:29 olson * fixed headers * * Revision 1.1 90/08/17 13:50:58 olson * First revision * */ #include "parser.h" extern int yylval; %} %% ";" { return SEMICOLON; } "+" { return PLUS; } "-" { return MINUS; } [0-9]+ { yylval = atoi(yytext); return INTEGER; } SHAR_EOF if test 764 -ne "`wc -c < 'lex.l'`" then echo shar: error transmitting "'lex.l'" '(should have been 764 characters)' fi fi # end of overwriting check echo shar: extracting "'Makefile'" '(1756 characters)' if test -f 'Makefile' then echo shar: will not over-write existing file "'Makefile'" else cat << \SHAR_EOF > 'Makefile' # # Robert Olson, University of Illinois at Urbana # rolson@uiuc.edu # Copyright (c) 1990 by Robert Olson # All rights reserved # This program may not be sold, but may be distributed # provided this header is included. 
# # # $Author: olson $ # $Revision: 1.4 $ # $State: Exp $ # $Locker: $ # $Source: /home/reed/Pablo/Parser/cplus/Gen/Example/RCS/Makefile,v $ # $Log: Makefile,v $ # Revision 1.4 90/08/21 11:34:58 olson # fixed GENERATE defn # # Revision 1.3 90/08/21 11:20:48 olson # Added the legalese # # Revision 1.2 90/08/17 13:54:53 olson # fixed make clean # # Revision 1.1 90/08/17 13:50:59 olson # First revision # # GENERATE = ../src/generate-class -d ../src PROTOIZE = protoize C++ = g++ C++FLAGS = YACC = yacc YFLAGS = -d LEX = lex LFLAGS = PARSER = Parser LEXER = Lexer all: example $(LEXER).cc $(LEXER).h: lex.prot $(GENERATE) -m lex -i $(PARSER) -c $(LEXER) lex.prot $(PARSER).cc $(PARSER).h: parser.prot $(GENERATE) -m yacc -i $(LEXER) -c $(PARSER) parser.prot EXAMPLE_OBJ = main.o Parser.o Lexer.o .PRECIOUS: lex.c parser.c example: $(EXAMPLE_OBJ) $(C++) $(C++FLAGS) -o $@ $(EXAMPLE_OBJ) $(LIBS) #PARSER_H = parser.h main.o: Parser.h Lexer.h $(PARSER_H) Lexer.o: Lexer.h Parser.h $(PARSER_H) Parser.o: Parser.h Lexer.h $(PARSER_H) lex.prot: $(PARSER_H) parser.h: parser.c clean: rm -f a.out core *.o lex.c parser.c *.prot rm -f example y.tab.c y.tab.h *~ \#* *.save rm -f Lexer.cc Lexer.h Lexer-private.h rm -f Parser.cc Parser.h Parser-private.h .SUFFIXES: .prot .cc .cc.o: $(C++) $(C++FLAGS) -c $< .y.c: $(YACC) $(YFLAGS) $< -mv y.tab.h $*.h -sed '/#[ ]*line.*\.y/d' y.tab.c > $*.c .l.c: $(LEX) $(LFLAGS) -t $< > $@ .c.prot: $(PROTOIZE) -c "$(CFLAGS)" $< cp $< $@ SHAR_EOF if test 1756 -ne "`wc -c < 'Makefile'`" then echo shar: error transmitting "'Makefile'" '(should have been 1756 characters)' fi fi # end of overwriting check echo shar: extracting "'main.cc'" '(852 characters)' if test -f 'main.cc' then echo shar: will not over-write existing file "'main.cc'" else cat << \SHAR_EOF > 'main.cc' /* * Robert Olson, University of Illinois at Urbana * rolson@uiuc.edu * Copyright (c) 1990 by Robert Olson * All rights reserved * This program may not be sold, but may be distributed * provided 
this header is included.
 *
 * $Author: olson $
 * $Revision: 1.3 $
 * $State: Exp $
 * $Locker: $
 * $Source: /home/reed/Pablo/Parser/cplus/Gen/Example/RCS/main.cc,v $
 * $Log: main.cc,v $
 * Revision 1.3 90/08/21 11:20:47 olson
 * Added the legalese
 *
 * Revision 1.2 90/08/17 13:52:39 olson
 * fixed headers
 *
 * Revision 1.1 90/08/17 13:50:59 olson
 * First revision
 *
 */

#include "Parser.h"
#include "Lexer.h"

Parser *parser;
Lexer *lexer;

main()
{
    int rc;

    lexer = new Lexer;
    parser = new Parser(lexer);
    lexer->setParser(parser);
    rc = parser->yyparse();
    return rc;
}

int yywrap()
{
    return parser->yywrap();
}
SHAR_EOF
if test 852 -ne "`wc -c < 'main.cc'`"
then
	echo shar: error transmitting "'main.cc'" '(should have been 852 characters)'
fi
fi # end of overwriting check
echo shar: done with directory "'Example'"
cd ..
echo shar: extracting "'README'" '(1673 characters)'
if test -f 'README'
then
	echo shar: will not over-write existing file "'README'"
else
cat << \SHAR_EOF > 'README'

This is generate-class, a perl script for generating C++ classes from
lex and yacc output. It requires the perl program (written by Larry
Wall, and available from tut.cis.ohio-state.edu and your local
comp.sources.unix archives) and protoize. Protoize is a utility
distributed with the Gnu C compiler which converts K&R type
declarations to ANSI declarations. It also uses lex.

To install generate-class, go to the src subdirectory and do a make.
This builds the lex tokenizer that the script uses (I originally used
a perl tokenizer, but went to lex because it was more powerful and
easier to modify).

There is an example lex/yacc parser in Example that you can look at.

The arguments to generate-class are:

	-c name		Sets the class name of the generated code

	-m mode		Sets the mode for this run. Mode is either
			"lex" or "yacc" -- the generated classes have
			some dependencies on how they are going to be
			used, namely the lexer wants to have a pointer
			to the parser and vice versa.
    -i name       The class name of the related object (e.g. the Lexer
                  class will want to be given the name of the parser
                  object).

    -d pathname   The directory in which the auxiliary perl scripts
                  reside.

The script currently generates a lot of output -- this distribution is
just a snapshot of my code. This is currently the only documentation
as well, mainly since I just wrote this for my own use and thought
others might find it useful. At some point I plan to write up a little
more about it, but I don't have the time now and want to get it out
since there is quite a bit of interest.

Let me know what you think of the scripts.

--bob
8/21/90
rolson@uiuc.edu
SHAR_EOF
if test 1673 -ne "`wc -c < 'README'`"
then
	echo shar: error transmitting "'README'" '(should have been 1673 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'AboutPerl'" '(1976 characters)'
if test -f 'AboutPerl'
then
	echo shar: will not over-write existing file "'AboutPerl'"
else
cat << \SHAR_EOF > 'AboutPerl'
I have been asked what perl is. Larry Wall says it better than I in
the manpage for perl:

     Perl is an interpreted language optimized for scanning
     arbitrary text files, extracting information from those text
     files, and printing reports based on that information. It's
     also a good language for many system management tasks. The
     language is intended to be practical (easy to use, efficient,
     complete) rather than beautiful (tiny, elegant, minimal). It
     combines (in the author's opinion, anyway) some of the best
     features of C, sed, awk, and sh, so people familiar with
     those languages should have little difficulty with it.
     (Language historians will also note some vestiges of csh,
     Pascal, and even BASIC-PLUS.) Expression syntax corresponds
     quite closely to C expression syntax. Unlike most Unix
     utilities, perl does not arbitrarily limit the size of your
     data--if you've got the memory, perl can slurp in your whole
     file as a single string. Recursion is of unlimited depth.
     And the hash tables used by associative arrays grow as
     necessary to prevent degraded performance. Perl uses
     sophisticated pattern matching techniques to scan large
     amounts of data very quickly. Although optimized for
     scanning text, perl can also deal with binary data, and can
     make dbm files look like associative arrays (where dbm is
     available). Setuid perl scripts are safer than C programs
     through a dataflow tracing mechanism which prevents many
     stupid security holes. If you have a problem that would
     ordinarily use sed or awk or sh, but it exceeds their
     capabilities or must run a little faster, and you don't want
     to write the silly thing in C, then perl may be for you.
     There are also translators to turn your sed and awk scripts
     into perl scripts. OK, enough hype.
SHAR_EOF
if test 1976 -ne "`wc -c < 'AboutPerl'`"
then
	echo shar: error transmitting "'AboutPerl'" '(should have been 1976 characters)'
fi
fi # end of overwriting check
echo shar: done with directory "'Gen'"
cd ..
# End of shell archive
exit 0
--
Bob Olson    University of Illinois at Urbana/Champaign
Internet: rolson@uiuc.edu    UUCP: {uunet|convex|pur-ee}!uiucdcs!olson
"You can't win a game of chess with an action figure!"
AMA #522687    DoD #28