Path: utzoo!attcan!uunet!mcvax!ukc!its63b!csnjr
From: csnjr@its63b.ed.ac.uk (Nick Rothwell)
Newsgroups: comp.lang.misc
Subject: Re: Concrete Syntax in ML (was: CAML Release 2.5)
Keywords: ML, CAML, parsing, lexing
Message-ID: <1323@its63b.ed.ac.uk>
Date: 13 May 88 11:34:05 GMT
References: <689@inria.UUCP>
Reply-To: nick%ed.lfcs@uk.ac.ucl.cs.nss
Organization: LFCS, University of Edinburgh
Lines: 78

From article <689@inria.UUCP>, by mauny@inria.UUCP (Michel Mauny):
> In CAML, the Yacc interface doesn't take care of the parser produced by
> yacc: it is only interested in parsing tables which are used by
> the CAML parser itself (and parsing tables of CAML's syntax are produced in
> this way).

That's quite a nice solution. I aim to have a single parsing "engine" both
for ML and for any concrete quotations. Different instances of this parser
will be used with different tables and lexers.

> We plan also to give a better syntax to a grammar definition and to
> typecheck (statically when possible) grammar definitions.
>
> - What about "lexer" (are they user-definable, hand-written or mechanically
> produced in lex-style)?

I knocked up a lexer generator a couple of days ago. Here (roughly) is the
interface:

    signature LEX_SPEC =
      sig
        type LexClass
        type Terminal
        type Value

        val whiteSpace: LexClass
        val follows: LexClass -> LexClass lib.set
        val lexClass: string -> LexClass

        val eof: Terminal
        val VOID: Value
        val makeTerminal: string -> Terminal * Value
      end

You specify some LexClass type, give a function from characters to
LexClasses, and say which LexClasses can "stick" to others (eg, LexClass =
LETTER | DIGIT, "a" is a LETTER, "0" is a DIGIT, and LETTER can be followed
by {LETTER,DIGIT}). What you get back is a (side-effecting) lexing function

    val lex: (unit -> string) -> (unit -> Terminal * Value)

This takes a string "generator" as argument (which may read from a file,
or whatever), and returns a terminal (and value) generator.
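The following is a minimal OCaml sketch of how a lexer could be generated from a LEX_SPEC-style description. It is not the author's implementation. It assumes the following, none of which the post specifies:

- The character generator returns one character per call as a string, and "" at end of input.
- Whitespace-class characters are skipped between tokens.
- A token keeps growing while the next character's class is in `follows` of the previous character's class.

The OCaml names are adapted from the signature (`LexClass` becomes `lex_class`, `VOID` becomes `void`), and a plain list stands in for `lib.set`. `MakeLexer`, `Spec`, `string_source` and `tokens_of` are hypothetical names.

```ocaml
(* Sketch only. Assumptions: the generator yields one character per call
   and "" at end of input; a token grows while the next character's class
   is in [follows] of the previous character's class. *)
module type LEX_SPEC = sig
  type lex_class
  type terminal
  type value
  val white_space : lex_class
  val follows : lex_class -> lex_class list   (* stands in for "lib.set" *)
  val lex_class : string -> lex_class
  val eof : terminal
  val void : value
  val make_terminal : string -> terminal * value
end

(* Hypothetical functor: turns a spec into a side-effecting lexer. *)
module MakeLexer (S : LEX_SPEC) = struct
  let lex (next_char : unit -> string) : unit -> S.terminal * S.value =
    (* One character of lookahead, shared between calls. *)
    let pending = ref None in
    let peek () =
      match !pending with
      | Some c -> c
      | None -> let c = next_char () in pending := Some c; c
    in
    let advance () = pending := None in
    let rec skip_white () =
      let c = peek () in
      if c <> "" && S.lex_class c = S.white_space then (advance (); skip_white ())
    in
    fun () ->
      skip_white ();
      let first = peek () in
      if first = "" then (S.eof, S.void)
      else begin
        advance ();
        let buf = Buffer.create 16 in
        Buffer.add_string buf first;
        (* Extend the token while the next character may "stick". *)
        let rec grow prev_class =
          let c = peek () in
          if c <> "" then begin
            let k = S.lex_class c in
            if List.mem k (S.follows prev_class) then begin
              advance ();
              Buffer.add_string buf c;
              grow k
            end
          end
        in
        grow (S.lex_class first);
        S.make_terminal (Buffer.contents buf)
      end
end

(* Hypothetical spec: LETTER may be followed by LETTER or DIGIT. *)
module Spec = struct
  type lex_class = Letter | Digit | Space | Other
  type terminal = Ident | Number | Sym | Eof
  type value = Str of string | Void
  let white_space = Space
  let follows = function
    | Letter -> [Letter; Digit]
    | Digit -> [Digit]
    | Space | Other -> []
  let lex_class s =
    match s.[0] with
    | 'a' .. 'z' | 'A' .. 'Z' -> Letter
    | '0' .. '9' -> Digit
    | ' ' | '\t' | '\n' -> Space
    | _ -> Other
  let eof = Eof
  let void = Void
  let make_terminal s =
    match lex_class s with
    | Letter -> (Ident, Str s)
    | Digit -> (Number, Str s)
    | _ -> (Sym, Str s)
end

(* A character generator reading from an in-memory string. *)
let string_source (s : string) : unit -> string =
  let i = ref 0 in
  fun () ->
    if !i >= String.length s then ""
    else (let c = String.make 1 s.[!i] in incr i; c)

module L = MakeLexer (Spec)

(* Collect tokens until end of input, for trying the sketch out. *)
let tokens_of (s : string) : (Spec.terminal * Spec.value) list =
  let next = L.lex (string_source s) in
  let rec go acc =
    match next () with
    | (Spec.Eof, _) -> List.rev acc
    | tok -> go (tok :: acc)
  in
  go []
```

Under these rules, `tokens_of "abc1 42x"` produces three tokens: an identifier `abc1`, a number `42`, and an identifier `x`. The `x` is split off because a DIGIT is only followed by DIGITs.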
This isn't as powerful as lex, but serves my purposes for the present.

> - Do you typecheck grammars?

Yes, they're ML data structures, so ML typechecks them. The problem with
this is that every production has to have the same type of attribute, so
attribute functions range over some "union" datatype, and tend to have
rather complex arguments. I have ideas for making this easier.

> - Do you have a complete antiquotation mechanism (with escapes to the
> meta-language)?

We're implementing a "brute-force" approach in the first instance. The ML
lexer will be able to associate any token (such as "<<") with a parser (of
type instream -> string). When it sees a "<<", it calls the parser on its
current input stream, waits to get the text back, and then carries on
lexing from this. Hence, the concrete syntax parser must eventually return
ML source text.

I don't see any reason why a user-generated lexer for the concrete syntax
shouldn't be able to do something similar, and call the ML parser to get
back to the meta-language. My thoughts are a little unclear on this at the
moment.

I'll write a report on all of this in a month or two, when my ideas are
more settled, and send you a copy, if you're interested.

        Nick.
-- 
Nick Rothwell,  Laboratory for Foundations of Computer Science, Edinburgh.
                nick%lfcs.ed.ac.uk@nss.cs.ucl.ac.uk
                !mcvax!ukc!lfcs!nick
~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~
...while the builders of the cages sleep with bullets, bars and stone,
they do not see your road to freedom that you build with flesh and bone.