Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Posting-Version: version B 2.10 5/3/83; site ucf-cs.UUCP Path: utzoo!linus!decvax!duke!tim@ucf-cs.UUCP (Tim Curry) From: tim@ucf-cs.UUCP Newsgroups: net.unix-wizards Subject: Re: egrep string too big Message-ID: <1078@ucf-cs.UUCP> Date: Thu, 17-Nov-83 00:09:25 EST Article-I.D.: ucf-cs.1078 Posted: Thu Nov 17 00:09:25 1983 Date-Received: Sun, 20-Nov-83 19:32:38 EST References: <991@utah-gr.UUCP> Organization: University of Central Florida Lines: 23 egrep allows full regular expression recognition and the search method requires a deterministic search string. When you ask for the pattern "a.c", you are really asking for "a(a+b+c+...+z+A+B+C+...+any ascii char)c" adding a question mark (?) after a don't care character (.) makes it even worse by now looking for the regular expression "a.?c" you actually search for: + a(a+b+c+...+z+A+B+C+...+any ascii character)c The four character short hand really represents a non-deterministic regular expression of 130 characters plus the operators. In this example, the non-determinism is only represented by what to do when a "c" is encountered. In the example ".?.", you introduce non-determinism in all the 128 characters of ascii. That expression must now be converted from non-deterministic to deterministic which has a potential exponential explosion in the number of states required. I don't know right off the top of my head what the growth is for this example (if you spent the time you could calculate it). After knowing this, it shouldn't suprise you that the pattern "^.........?.?.?..$" (or whatever it was) was considered too big for egrep to handle. -- Tim Curry USENET: decvax!ucf-cs!tim ARPANET: tim.ucf-cs@rand-relay