Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!ames!indri!uakari.primate.wisc.edu!larry!jwp
From: jwp@larry.sal.wisc.edu (Jeffrey W Percival)
Newsgroups: comp.unix.questions
Subject: lex/yacc questions from a novice...
Keywords: lex yacc
Message-ID: <711@larry.sal.wisc.edu>
Date: 22 Aug 89 16:41:14 GMT
Organization: Space Astronomy Lab, Madison WI
Lines: 81

I am trying to use lex and yacc to help me read a dense, long,
machine-produced listing of some crappy "special purpose" computer
language. I have a listing of the "rules" (grammar?) governing the
format of each line in the listing. I believe lex and yacc are the
right tools, because the set of rules I have seems to match the spirit
of the examples I read in the lex and yacc papers by Lesk and Schmidt
(Lex) and Johnson (yacc). For example:

	digit:   [0-9]
	integer: {DIGIT}+

and so on to the more complicated

	command definition: {command introducer} {statement}+ {command terminator}

My first question is how one trades off work between lex and yacc.
Should lex do more than just return characters? There are all sorts
of keywords in my language that a lexical analyzer could recognize,
returning just a token for each.

Along these lines, a problem I am having is getting the message
"too many definitions" from lex, when all I have are a few keywords
and ancillary definitions (lex file included below for illustration).
Is lex truly this limited in the number of definitions? Can I increase
this limit? Or am I using lex for too much, and not using yacc for
enough?

SMSHDR		"SMSHDR"
ENDSMS		"ENDSMS"
CP224		"CP224"
GROUP		"GROUP"
PRT		"PRT"
RTS		"RTS"
SAFING		"SAFING"
BEGINDATA	"BEGINDATA"
ENDDATA		"ENDDATA"
_IF		"_IF"
_ELSE		"_ELSE"
_ENDIF		"_ENDIF"
_MESSAGE	"_MESSAGE"
_SET		"_SET"
_DELETE		"_DELETE"
INCLUDE		"INCLUDE"
LETTER		[A-Za-z]
DIGIT		[0-9]
HEX_DIGIT	[0-9A-F]
OCT_DIGIT	[0-7]
BIN_DIGIT	[0-1]
SPECIAL		[_%#@]
STRING		({DIGIT}|{LETTER}|{SPECIAL})+
WORD		{LETTER}({DIGIT}|{LETTER}|{SPECIAL})*
OCT_MNEMONIC	("_"{STRING})|({WORD})
LABEL		{STRING}":"
LABEL_REF	"'"{STRING}"'"
TEXT_STRING	"'"[ -~]"'"
HEX_INT		'{HEX_DIGIT}+'X
OCT_INT		'{OCT_DIGIT}+'O
BIN_INT		'{BIN_DIGIT}+'B
U_INT		{DIGIT}+
S_INT		[+-]?{U_INT}
U_REAL		{U_INT}"."{U_INT}
S_REAL		[+-]?{U_REAL}
FLOAT		({S_REAL}|{S_INT})([ED]{S_INT})?
YY		{U_INT}"Y"
DD		{U_INT}"D"
HH		{U_INT}"H"
MM		{U_INT}"M"
SS		({U_INT}|{U_REAL})"S"
REL_TIME	[+-]?(({HH})?({MM})?({SS}))|(({HH})?({MM})({SS})?)|(({HH})({MM})?({SS})?)
UTC_TIME	{YY}?{DD}{REL_TIME}
DEL_TIME	({U_INT}C)|({REL_TIME})
ORB_REL_TIME	"ORB,"{U_INT}","{WORD}(","[+-]?{REL_TIME})?
ORB_TIME	"("{ORB_REL_TIME}")"
MFS_TIME	"("({UTC_TIME}|{ORB_REL_TIME})",MFSYNC"(","[+-]?{REL_TIME})?")"
SOI_OFFSET	[+-](({HEX_DIGIT}+"%X")|({U_INT})|({OCT_DIGIT}+"%O"))
SOI		"'"{WORD}({SOI_OFFSET})?"'"[ND]
EOL		"\n"
%%

-- 
Jeff Percival (jwp@larry.sal.wisc.edu)