Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Posting-Version: version B 2.10.2 9/18/84; site rochester.UUCP Path: utzoo!watmath!clyde!cbosgd!gatech!seismo!rochester!ken From: ken@rochester.UUCP (Ipse dixit) Newsgroups: net.lang Subject: Re: Comments on this program please... Message-ID: <13666@rochester.UUCP> Date: Wed, 4-Dec-85 08:38:41 EST Article-I.D.: rocheste.13666 Posted: Wed Dec 4 08:38:41 1985 Date-Received: Fri, 6-Dec-85 06:19:08 EST References: <2389@ukma.UUCP> <7839@ucla-cs.ARPA> Reply-To: ken@rochester.UUCP (Ipse dixit) Organization: Sans Serif Lines: 20 In article <7839@ucla-cs.ARPA> jimc@ucla-cs.UUCP (Jim Carter) writes: >(Bug: if the pattern is ROUTE and the text is ROUROUTE, the pattern will >be missed. This is an easy fix. But if the pattern is TINTINABULATION >and the text is TINTINTINABULATION, even with the above fix the pattern >will be missed. It could be seen only with long and expensive pattern >checking -- or does someone see a smart way to do it? Yes, in the chapter on string matching in Design and Analysis of Computer Algorithms (Aho, Hopcroft and Ullman?) you can see how to construct a finite state machine to recognize a substring anywhere in the input word. Basically, when a failure to match occurs, the machine goes back to the last state which it could be in. In your second example, when it fails to match the third T it goes back to the state it would have been in after seeing TINT. All the states can be computed in advance and the algorithm runs in linear time. Ken -- UUCP: ..!{allegra,decvax,seismo}!rochester!ken ARPA: ken@rochester.arpa USnail: Dept. of Comp. Sci., U. of Rochester, NY 14627. Voice: Ken!