Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!cs.utexas.edu!usc!apple!snorkelwacker!spdcc!ima!esegue!compilers-sender From: hankd@dynamo.ecn.purdue.edu (Hank Dietz) Newsgroups: comp.compilers Subject: Re: Regular expressions & finite automata. Summary: Ambiguity on state merging Keywords: lex, DFA Message-ID: <9008242218.AA25179@dynamo.ecn.purdue.edu> Date: 24 Aug 90 22:18:18 GMT References: <9008231529.AA26343@sn1987a.compass.com> Sender: compilers-sender@esegue.segue.boston.ma.us Reply-To: hankd@dynamo.ecn.purdue.edu (Hank Dietz) Organization: Purdue University Engineering Computer Network Lines: 52 Approved: compilers@esegue.segue.boston.ma.us In article <9008231529.AA26343@sn1987a.compass.com> worley@compass.com (Dale Worley) writes: > > [There are many equivalent DFAs that a regular expression can translate to, > and vice versa. After all, any regular expression can be written in > infinitely many equivalent ways, e.g. a(b|c) vs. ab|ac, or to be really > pedantic, x vs. (x|x) vs. (x|x|x) ... -John] > >However, all DFAs for a given regular language can be derived from the >minimal DFA (the one with the fewest states that accepts that >language) by turning single states into sets of equivalent states. First, even without state merging, the DFAs generated for (x|x|x) and x are IDENTICAL. Still, state equivalence and merging are important concepts. Which reminds me of a strange and maybe useful thing I did a while back... I'm wondering if others think it useful also.... Basically, if someone gives you a DFA, it is possible to perform merging of equivalent states to try to reduce it to the minimal form. Two states A and B are equivalent if they have the same accept action and, for each transition from A to C labeled L, there is a corrseponding transition from B also labeled L and going to a state D| D is equivalent to C. That's the "standard" definition and, for example, it will correctly reduce the DFA for (x*|x) into the DFA for x*. Nothing tricky... I've implemented it and it works fine. Now, my little tweak was to recognize that, provided error detection is not needed, states can be considered equivalent if they don't have conflicting exit arcs and the accept action is either the same or is the error accept action. I call this "unsafe" state merging. The result is often a much simpler recognizer, but without error detection ability. For example, the lex-like specification: a*b { ACTION1 } aaaac { ACTION2 } would generate a DFA which actually corresponded to: a*b { ACTION1 } a*c { ACTION2 } I figured that this might be useful because it effectively reduces the patterns to the minimal trailing components which distinguish them. That clearly results in fewer states for the DFA... although my experiments show that it doesn't save all that many states for more complex patterns. So, what's the verdict? Is this "unsafe" state merging idea useful? Has it been done before? (If not, is it worth publishing?) -hankd@ecn.purdue.edu -- Send compilers articles to compilers@esegue.segue.boston.ma.us {ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.