Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!samsung!uunet!mcsun!ukc!keele!nott-cs!piaggio!anw
From: anw@maths.nott.ac.uk (Dr A. N. Walker)
Newsgroups: comp.lang.misc
Subject: Re: syn-taxed [was Re: Complexity of syntax]
Message-ID: <1991Jan9.170421.29111@maths.nott.ac.uk>
Date: 9 Jan 91 17:04:21 GMT
References: <19876@yunexus.YorkU.CA> <23484:Jan619:01:5391@kramden.acf.nyu.edu> <1991Jan7.202423.23420@agate.berkeley.edu> <14305:Jan800:43:4391@kramden.acf.nyu.edu>
Reply-To: anw@maths.nott.ac.uk (Dr A. N. Walker)
Organization: Maths Dept., Nott'm Univ., UK.
Lines: 76

In article <14305:Jan800:43:4391@kramden.acf.nyu.edu>
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>[...]					      It really is possible to
>define a language feature in such a way that it cannot be turned into
>syntax.
>
[ example of "a%b + a/b*b = a" expressed as a switch deleted ]
>
> [...]				    Our syntactic model explodes. *This*
>is what makes the integer operations semantics and not syntax: no single
>syntactic definition suffices for arbitrarily large integers.

	But this depends on what you mean by a syntactic definition.
A two-level grammar is perfectly capable of dealing with arbitrarily
large integers (and so are other models of syntactic structure).  But
it starts to look like a programming language -- in other words, at one
extreme, a syntax is merely a program that when fed your program as input
will halt and either print "OK" or print "KO", and the *real* problem is
bootstrapping the syntax program ('cos we would like to write it in a
sensible language, and we can't until we know enough of the syntax of
the sensible language).  At the other extreme, we all agree that syntax
is (say) BNF, and then we find, as you say, that some things can't be
expressed.

>	 I'd say that a feature is semantic if the most powerful *fixed*
>grammar for the language can't express the feature.

	Explain "fixed".  A finite two-level grammar can express all
features of C (for example), including all those normally thought of as
semantic (such as the undesirability of following null pointers, or
indexing off the end of an array).  The only reason why this is more
interesting than the fact that a finite C program (such as a very
careful interpreting load-and-go compiler) can do the same thing is that
the two-level grammar *itself* has rather a simple syntax, so that most
people would [after some study] agree whether or not a 2LG is syntactically
correct, and it is probably rather easy to automate this.  (I don't know
whether anyone has tried.)

	To give a flavour of what a 2LG would look like for Dan's problem,
we would have an upper level which is very like traditional BNF, thus

		INT:	DIGIT | INT DIGIT
		DIGIT:  zero | one | two | ...

On the lower level, the higher level constructs can appear *inside the
LHS*, so that a finite collection of rules can represent an unbounded
collection of actual rules.  For example, we might have things like

	expression INT1 minus INT2: expression dec INT1 minus dec INT2.
	expression zero minus INT: negative INT.
	expression INT minus zero: INT.
	ANYTHING dec INT one: ANYTHING INT zero.
	ANYTHING dec INT two: ANYTHING INT one.
	ANYTHING dec INT zero: ANYTHING borrow dec INT.
	ANYTHING borrow INT: ANYTHING INT nine.
	...

In the lower level, occurrences of higher constructs such as INT must
be replaced identically on both sides, so, for example, the third rule
generates things like "expression one seven minus zero: one seven.", but
not "expression zero zero four three minus zero: six nine.".  Note that
you will also need some clever-ish rules to get rid of leading zeros, and
if you want to prevent chasing down blind alleys, you will need predicates
on the lines of

	expression INT1 minus INT2: where positive INT1 and positive INT2,
		expression dec INT1 minus dec INT2.

	After a page or three of such rules, we can have every expectation
that "expression one seven mod three plus one seven div three times three"
will eventually yield "one seven".  Full details are left to the reader.
[:-)]

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk