Xref: utzoo comp.unix.wizards:7559 comp.lang.c:8875
Path: utzoo!mnetor!uunet!husc6!umb!rouilj
From: rouilj@umb.umb.edu (John P. Rouillard)
Newsgroups: comp.unix.wizards,comp.lang.c
Subject: Re: command line options
Message-ID: <625@umb.umb.edu>
Date: 4 Apr 88 19:55:57 GMT
References: <2414@zyx.UUCP> <738@srs.UUCP> <26423@cca.CCA.COM> <761@srs.UUCP> <26550@cca.CCA.COM> <6546@bellcore.bellcore.com>
Reply-To: rouilj@umb.UUCP (John P. Rouillard)
Organization: Dept of Math and CS, UMass Boston.
Lines: 174

The following structure allows a generic function to parse any
conceivable command line.

The structure would have the form:

struct command_entry {
    struct command_entry *next;  /* for a linked list of these babies */
    char *NAME;                  /* the full name of the option */
    char *ABBREV;                /* the shortest abbreviation for the option */
    char *ARG_TYPE;              /* the type of argument (string, char, int, float ...) */
    char *format_type;           /* Keyword = value, +keyword, -keyword ... */
    void *VARIABLE_addr;         /* the address of a variable to set */
    enum v_type VAR_type;        /* the type of the variable above */
    int *(*FUNCTION_addr)();     /* address of a function returning a pointer to int */
    enum f_type FUNCT_type;      /* the type the function actually returns */
    int (*Error_handler)();      /* your own personal error handler */
    /* add your favorite options here */
};

A possible entry would be (from the command make, augmented for show):

{
    NAME            "file",
    ABBREV          "f",
    ARG_TYPE        string,          /* i.e. char * */
    format_type     "-w",            /* "-" precedes the keyword f; "w" signifies
                                        whitespace between keyword and value */
    VARIABLE_addr   &makefile_name,
    VAR_type        String,
    FUNCTION_addr   NULL,            /* no function needed */
    FUNCT_type      NULL             /* the type of a nonexistent function */
}

This structure would allow:

a: A long name that can be abbreviated to the value in ABBREV.
b: Handling multi-character flags without values (e.g. "-las" in
   "ls -las"). Simply loop over each character and set the appropriate
   flag.
c: Whitespace elimination (i.e.
   -Kvalue) is easily done: the value up to the next whitespace
   character is scanned according to its type.
d: Setting a variable to an argument value or, if a function is
   specified, setting the variable to the pointer value returned by
   the function. (The variable at VARIABLE_addr is interpreted
   according to the value in VAR_type so appropriate casts can be
   made.)
e: The ability to handle special parsing of the command line via calls
   to a function that takes 1) the current argv location, 2) argc and
   3) the address of the command_entry list as arguments.
f: For values that occur more than once on the command line (e.g.
   multiple filenames), the function specified in the command_entry
   could create a list of the names (copying them if desired) and then
   have the variable in the command_entry point to the head of the
   list.
g: As an alternative to setting other variables, the values could be
   returned in the command_entry structure itself (maybe via a union
   in the struct??).
h: The ability to specify in the command entry an error routine
   specific to the particular option being parsed.
i: Because a function can be called to deal with funky parts of the
   command line, the function that parses the command line returns
   only when it has parsed the whole command line. This eliminates the
   problem of dealing with an unparsed remainder, because it is an
   error [probably fatal] not to parse the whole command line.
j: The command_entries could be created dynamically at runtime, or
   declared statically at compile time.
k: The driver for Options_please (the get_ops lookalike) would act
   similarly to an LR or LL parser driver with a parse table (the
   linked list of command_entries). The driver is easy to maintain,
   with all of the work actually done during the creation of the parse
   tables.

BUGS:

a: The data for the command_entries could take up a lot of space and
   may therefore be troublesome.

b: The second problem occurs because of the ambiguity in the command
   language. Please follow my description below:

   Assume we have defined a keyword Kval that can have an optional
   argument, and boolean keywords (flags either on or off) "u" and
   "e". How do we parse "-Kvalue"? Is it Kval with argument "ue", or
   is it Kval with no argument and the boolean flags "u" and "e"? If
   we allow eliding of whitespace between flag and value it is
   impossible to tell which is meant. By doing away with 'c' above we
   can then parse this as Kval with no argument (so "u" and "e" are
   flags).

   Another ambiguity arises if we decide on having an argument that
   can be abbreviated "K" (Kval needs all four letters) and other
   arguments "v", "a", and "l". Now how does the above string parse?
   The boolean "K", the boolean "v", no wait, those two letters are
   the prefix for Kval (ARRGH ;-[) (HELP LR GRAMMAR)

   Richard Harter also touched on this ambiguity problem in his
   article. This problem is inherent in features a and b above.

   One way around this is to make sure that you never use the letters
   K, v, a, and l :-).

   A second way around the problem is to make the order of the
   keywords in the list of command_entries significant and therefore
   impart a priority to the commands. In the above example, if Kval
   appeared before K (which it would have to do in order for Kval to
   be recognized at all), the interpretation of the flag Kval would
   occur first.

   A third way around it is to write the table such that no two
   command_entries have overlapping names.

   The fourth way is to write a function that handles this via
   look-ahead or whatever mechanism you devise. Basically you turn an
   NFA into a DFA by combining states. E.g. if a K was found, a
   function would be called that would try to determine whether the
   value was Kval or K followed by random characters.

   If you think this stuff was handled in The Dragon Book, you are
   right on the money.
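
The second workaround (table order as priority) can be sketched in C
roughly as follows. This is only an illustration under stated
assumptions, not the proposed Options_please driver: struct opt_entry,
match_first and split_arg are hypothetical names introduced here, the
entry is cut down to just the keyword, and argument values are ignored
entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down entry: only the keyword is kept. */
struct opt_entry {
    const char *name;
};

/* Return the index of the first entry whose name is a prefix of text,
 * or -1 if none matches. The scan stops at the first hit, so entries
 * earlier in the table have priority over later ones. */
static int match_first(const struct opt_entry *table, int n, const char *text)
{
    int i;
    for (i = 0; i < n; i++) {
        size_t len = strlen(table[i].name);
        /* Skip empty names so the caller always makes progress. */
        if (len > 0 && strncmp(text, table[i].name, len) == 0)
            return i;
    }
    return -1;
}

/* Split one "-xyz" argument into keywords from the table. The index of
 * each keyword found is stored in out[] (at most max of them).
 * Returns the number of keywords, or -1 if some text matches no entry
 * or out[] is too small. */
static int split_arg(const struct opt_entry *table, int n,
                     const char *arg, int *out, int max)
{
    const char *p;
    int count = 0;

    assert(arg != NULL && arg[0] == '-');
    p = arg + 1;                        /* skip the leading '-' */
    while (*p != '\0') {
        int i = match_first(table, n, p);
        if (i < 0 || count >= max)
            return -1;
        out[count++] = i;
        p += strlen(table[i].name);     /* continue after the keyword */
    }
    return count;
}
```

With the table ordered Kval, K, v, a, l, split_arg on "-Kval" reports
the single keyword Kval. With K, v, a, l listed before Kval, the same
string comes back as the four flags K, v, a and l, and the Kval entry
is never reached, which is why Kval has to come first.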

But note that the thing that causes all of the problems is allowing
names, and possibly having non-unique representations for every string
that can be generated. However, this facility seems to be the only way
to even attempt generality and still allow a way of working around the
problem.

PLEASE NOTE: this is only an idea and I would like feedback on it.
Please feel free to steal the idea and modify it as necessary.

Sorry it is so long, but I was trying to reply to everybody's favorite
must-haves.

What do I know? I am only a Physics major.
==========================================================================
The opinions expressed above are all mine and belong to nobody else.
To U-Mass I am just a number.

        E = M C**2: Not just an equation, a way of life.

John Rouillard          U.S. Snail: Physics Department
U-Mass Boston                       U-Mass Boston
Physics Major                       Harbor Campus
                                    Boston, MA 02125
UUCP: harvard!umb
      husc6!umb