Xref: utzoo sci.lang:2603 comp.compilers:262 Path: utzoo!attcan!uunet!husc6!spdcc!ima!compilers-sender From: smryan@garth.uucp (Steven Ryan) Newsgroups: sci.lang,comp.compilers Subject: Re: Shallow Parsing Message-ID: <1118@ima.ISC.COM> Date: 17 Jun 88 19:08:37 GMT References: <1114@ima.ISC.COM> Sender: compilers-sender@ima.ISC.COM Reply-To: smryan@garth.uucp (Steven Ryan) Organization: INTERGRAPH (APD) -- Palo Alto, CA Lines: 72 Approved: compilers@ima.UUCP In article <1114@ima.ISC.COM> Bart Massey writes: >I don't believe that the above argument about time is correct. My guess is >that a human parsing *sentences of text* is much like a compiler parsing >*sequences of statements*... Certainly such a behavior will occur in >"linear time" on the number of sequences parsed. In fact, this is probably >part of the reason why languages have statements and texts have sentences. Yes, I'm thinking that we divide the stream into sentence-like entities. In spoken language we use pauses to signify a break (and catch our breaths) and variety of gestures and eye movements. In written language we endmarks and paragraphs. Programming languages use begin/end/if/... keywords that break out the overall structures. Dividing the stream could be done in context free, deterministic fashion. It may be the contents of each sentence are parsed differently. >the semantics of the language which will determine how parse time grows with >*program* length. But I'm not sure what this tells me -- in particular, >both humans and compilers seem to do quite well at recovering information >about distant tokens from memory in effectively *constant time* for the I don't know about humans, but most compilers use a hash table for the symbol table. In this case the time to identify one symbol out of n distinct symbols is O(n/m) for a hash table size m. If m is larger than n, it is constant time. However if n is large enough, the hash table linearly decreases search time, but the search down any hash entry chain remains n (usually) or log n (for a sophisticated table). >Does it pay to recognize this idiom seperately from other for() loops? Why >or why not? Also, I'm wonderring about the tradeoff of determinstic versus nondeterministic grammars or languages or parsing. For those unfamilar with these terms, if a parser merrily trips through string left to right, given everything it is seen so far, looking at the current symbol at maybe k symbols to the right, does it know exactly which production it is in? If this true for a fixed k, the parser is deterministic. If it requires arbritrarily large k (the lookahead), it is nondeterminstic. Or it might be ambiguous. (Deterministic langauges are not inherently ambiguous: each has an LR(1) grammar and LR(1) grammars are not ambiguous.) >[I always thought that computers use unambiguous context free grammars because >we know how to parse them fast, and because they seem empirically to be >convenient for humans to try to express things like computer programs. I >also think that something like the current parsing methods is probably >time-optimal for conventional computer architectures. I suspect it has more to do parse generators: a parser can be automatically generated for an LR(1) grammar, but not so for a nondeterministical language. (Personal remark:) I think Wirth has crippled a generation of programming language by stripping them of anything that makes life tough for a compiler writer (my occupation if you haven't already guessed). > Ned Irons looked at >context-free parsing on small numbers of parallel processors and decided that I first thought about this with respect to CDC Cyber 205 and ETA-10x vector machines. If somebody can find a way to vectorise parsing, I think it will open up a vast field of applications which currently appear to be scalar or sequential. Thank you for your support, sm ryan [Using Earley's algorithm you can parse ambigious grammars without trouble, although it is much slower than LR or LALR for the grammars that the latter two can handle. -John] [From smryan@garth.uucp (Steven Ryan)] -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbn}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request