Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Posting-Version: version B 2.10.1 6/24/83; site ttrdc.UUCP Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxn!ihnp4!mgnetp!ltuxa!ttrdc!levy From: levy@ttrdc.UUCP (Daniel R. Levy) Newsgroups: net.lang Subject: Re: Searches for string in a stream Message-ID: <624@ttrdc.UUCP> Date: Fri, 6-Dec-85 20:24:31 EST Article-I.D.: ttrdc.624 Posted: Fri Dec 6 20:24:31 1985 Date-Received: Sat, 7-Dec-85 20:51:23 EST References: <2389@ukma.UUCP> <7839@ucla-cs.ARPA> <13666@rochester.UUCP> Organization: AT&T, Computer Systems Division, Skokie, IL Lines: 86 In article <13666@rochester.UUCP>, ken@rochester.UUCP (Ipse dixit) writes: >In article <7839@ucla-cs.ARPA> jimc@ucla-cs.UUCP (Jim Carter) writes: >>(Bug: if the pattern is ROUTE and the text is ROUROUTE, the pattern will >>be missed. This is an easy fix. But if the pattern is TINTINABULATION >>and the text is TINTINTINABULATION, even with the above fix the pattern >>will be missed. It could be seen only with long and expensive pattern >>checking -- or does someone see a smart way to do it? > >Yes, in the chapter on string matching in Design and Analysis of Computer >Algorithms (Aho, Hopcroft and Ullman?) you can see how to construct a >finite state machine to recognize a substring anywhere in the input ^^^^^ >word. Basically, when a failure to match occurs, the machine goes back ^^^^ >to the last state which it could be in. In your second example, when >it fails to match the third T it goes back to the state it would have >been in after seeing TINT. All the states can be computed in advance and the >algorithm runs in linear time. > Ken Input WORD? The input to this example was a stream of characters, NOT a series of "words", if I can remember correctly. Does this algorithm deal with STREAMS, too? (I.e., inability to back up, no internal buf- fering.) Perhaps a bit dirtily, a buffer (big enough to hold the longest possible pattern which is to be matched) might be used, filling it and flushing it, watching for the first character of the pattern to be matched and doing a more complete check as necessary. This would be a worst case linear time algorithm, too. Example: Pattern to be matched: ABABC Stream: BABABABCABDABEBCDEFGHIJKLMNABABABABCDEFG ----- ----- Buffer: BABAB; Stream: ABCABDABEBCDEFGHIJKLMNABABABABCDEFG Flush out: B come to A; shift and fill; Buffer: ABABA; Stream: BCABDABEBCDEFGHIJKLMNABABABABCDEFG ABABA != ABABC Flush out: A,B come to A; shift and fill; Buffer: ABABC; Stream: ABDABEBCDEFGHIJKLMNABABABABCDEFG ABABC == ABABC Output pattern with newline: '\n'ABABC Fill buffer Buffer: ABDAB; Stream: EBCDEFGHIJKLMNABABABABCDEFG Buffer begins with A; ABDAB != ABABC Flush out: A,B,D come to A; shift and fill; Buffer: ABEBC; Stream: DEFGHIJKLMNABABABABCDEFG ABEBC != ABABC Flush out: A,B,E,B,C Fill buffer Buffer: DEFGH; Stream: IJKLMNABABABABCDEFG Flush out: D,E,F,G,H Fill buffer Buffer: IJKLM; Stream: NABABABABCDEFG Flush out: I,J,K,L,M Fill buffer Buffer: NABAB; Stream: ABABCDEFG Flush out: N Come to A; shift and fill; Buffer: ABABA; Stream: BABCDEFG ABABA != ABABC Flush out: A,B Come to A; shift and fill; Buffer: ABABA; Stream: BCDEFG ABABA != ABABC Flush out: A,B Come to A; shift and fill; Buffer: ABABC; Stream: DEFG ABABC == ABABC Output pattern with newline: '\n'ABABC Fill buffer Buffer: DEFG_ [EOF flag set]; Stream: empty Flush out: D,E,F,G (For good measure: output '\n') -- ------------------------------- Disclaimer: The views contained herein are | dan levy | yvel nad | my own and are not at all those of my em- | an engihacker @ | ployer or the administrator of any computer | at&t computer systems division | upon which I may hack. | skokie, illinois | -------------------------------- Path: ..!ihnp4!ttrdc!levy