Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!wuarchive!cs.utexas.edu!think!ames!uhccux!munnari.oz.au!mudla!ok From: ok@mudla.cs.mu.OZ.AU (Richard O'Keefe) Newsgroups: comp.lang.prolog Subject: Re: More fun with WG17 Message-ID: <2759@munnari.oz.au> Date: 20 Nov 89 14:21:42 GMT References: <2609@munnari.oz.au> <696@sce.carleton.ca> <24257@cup.portal.com> Sender: news@cs.mu.oz.au Lines: 103 In article <24257@cup.portal.com>, pgl@cup.portal.com (Peter G Ludemann) writes: > Two questions to Richard in OZ: > Last time I checked - > - blanks are not significant in ISO but they are > signficant in "Edinburgh" [for example, to determine > the meaning of comma in not (a,b) and not(a,b)] Unfortunately, I've packed the documents away in a couple of boxes to go back to Auckland, so I can't check the right answer. However, the claim "blanks are not significant in" WG17 Prolog is outrageously false, always has been, and always SHOULD have been. If you want a language where blanks are not significant, go use Fortran. Blanks are significant in English: light housekeeper \== lighthouse keeper a nice boat \== an iceboat Blanks are significant in Pascal. Blanks are significant in C. They are significant in COBOL, Ada, PL/I, Lisp, Pop, ML, you name it. (Think about x+++y in C. It's legal. And take a dekko at "preprocessor numbers" in the C standard.) This is one of the things which has changed quite often in BSI/WG17 Prolog. N40 does make blanks significant after integers, where they weren't before: 0x1 used to be the token 0 followed by the token x1. In document N40, this is a single hexadecimal number (1). (Why in the name of sanity did WG17 copy hexadecimal numbers from C when EXISTING Prolog systems allow any base from 2 (which I quite often use) to 36 (which is cute rather than useful)?) Similarly, given the declaration :- op(600, xfx, e). the character sequence 3e-22 unambiguously stood for e(3,-22) in Common Prolog, but in WG17 Prolog it stands for 3.0e-22, so you have to add spaces: 3 e -22. (I suspect that WG17 tokenisers won't be smart enough to treat 3e -22 or 3e- 22 as 3 e -22.) Again, what WG17 have done is to borrow syntax from C rather than from existing Prologs. WG17 _have_ done some things with operators, for example [this,is,an,example] is no longer legal. By the way, 'not' is not a predefined operator in WG17 Prolog and has no assigned meaning. (At least they haven't insisted on breaking NU Prolog and Quintus Prolog, where not/1 is sound (or, in Quintus's case, apologetic) negation.) The WG17 negation-as-failure form is 'fail_if'. Without N40 to check, I turned to some notes I made on how WG17 would break LPA Prolog. (That's Logic Programming Associates.) My summary was At a rough guess, for me, WG17/LPA syntax differences would show up about one per 70 lines, excluding if->then;else. (It is not just Quintus Prolog that WG17 would break, far from it!) But one difference I did _not_ list was the "not(" "not (" case. > - the reference grammars (C&M, Quintus, BSI) all > are ambiguous (at least, I couldn't figure out a > way to make them LR(k) --- which is a pretty good test > of ambiguity, I think) I was chagrined just now to discover that section 2-8 of the Quintus Prolog Reference Manual is explicitly headed "Formal Syntax". Of course it's only an INFORMAL formal syntax. It is there to help human readers, and is not runnable. Things like unit-clause --> head {where head is not otherwise a sentence} are clearly not part of a "REFERENCE grammar". (I just noticed that the grammar in 2-8-2 doesn't mention if->then;else at all. That will give you a clue as to how formal it is.) The grammar in section 2-8-3 is in rather better shape: the comments about operators and "(" are actually handled in the tokeniser: "p(" and "p (" are actually different sequences of tokens. Trying to make a grammar LR(k) is a pretty lousy way of testing whether an extensible operator-precedence language is ambiguous. We start with the simple fact that Prolog has 1201 priority levels (0..1200). LR(k) grammars can't handle parameters, so you essentially have to write out 1201 copies of the guts of the grammar. I admire the tenacity of anyone who did that. We then observe that the grammars you tend to see are mostly descended from the one in the DEC-10 Prolog manual, which didn't _try_ to be precise, so it fails to point out that "p(" and "p (" are actually two different token sequences which can be parsed unambiguously. We next note the distinction between a grammar being ambiguous and a language being ambiguous. No Prolog term is assigned more than one parse by the public-domain parser: the language is unambiguous. (But it is not LR(k) for any finite k.) It's worth reflecting that many programming languages aren't LR(k). Algol wasn't, Pascal isn't, C isn't, Modula-2 isn't, Ada isn't. You can get an LR(k) grammar for those languages using a very simple technique: you lie. I'm afraid I don't know which grammar Ludeman means by "the [BSI] reference grammar". There were several, and the one in WG17's N40 document is different again (and except for the excessive borrowing from C, it's a big improvement on earlier grammars). I'm quite sure that WG17 *intend* their grammar to be locally unambiguous: there are precisely two places where a Prolog parser has to make a choice": atom/prefix (which they've blocked by ruling that no operator can be taken as an atom unless it has parens directly around it) and infix/postfix (which they've blocked by making it illegal for an atom to be both an infix and a postfix operator). The grammar isn't suitable for LR parsing, but why should it be? It's an extensible operator-precedence language, and that's a heck of a lot SIMPLER to parse than an LR language.