Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!lll-crg!caip!clyde!burl!ulysses!mhuxr!mhuxt!houxm!mtuxo!mtune!codas!bsdpkh!latham
From: latham@bsdpkh.UUCP (Ken Latham)
Newsgroups: net.unix
Subject: Re: regexp(3) A Clear Explanation (Medium length)
Message-ID: <238@bsdpkh.UUCP>
Date: Tue, 26-Aug-86 21:04:41 EDT
Article-I.D.: bsdpkh.238
Posted: Tue Aug 26 21:04:41 1986
Date-Received: Wed, 27-Aug-86 20:44:18 EDT
References: <516@chinet.UUCP>
Reply-To: latham@bsdpkh.UUCP (Ken Latham)
Distribution: net
Organization: AT&T-IS (SDSS), Orlando Fl.
Lines: 96
Keywords: re, subject, _loc1

Dr. Megabyte (megabyte@chinet.UUCP) writes:
>I've pored over my manual and looked at regcmp(1), regcmp(3), and
>regexp(3), and I'm still not sure how to use these functions. Could someone
>send me some clear info on how to use these functions along with some examples?
>
>For the record: I am running Zeus 3.21 which is SYS III port to those of you
>who are fortunate to have never heard of it.

I am not familiar with Zeus and am only quasi-familiar with sys3.
What follows is a sys5 explanation which, if memory serves me,
should cover it.

1. regcmp(3) - a function which translates regular expressions
   ( a variant of ed(1) style ) to an internal form. Its arguments
   are one or more strings, which are concatenated, and the list is
   ended with a null pointer. The char pointer returned is the
   address of a ( non-null-terminated ) string that represents the
   regular expression. This 'compiled' regular expression can be
   interpreted by regex(3). If the returned pointer is NULL the
   expression was rejected; regcmp does not say where, so you will
   have to 'walk' through the regular expression by hand and
   determine where the syntax error is.

2. regcmp(1) - a user level command that will compile files of
   regular expressions into either data files containing the
   compiled expressions or into C files declaring data structures
   containing same.

3. regex(3) - the compiled regular expression interpreter, which
   parses the subject string to determine if it is in fact a member
   of the language described by the compiled regular expression.
   It returns a pointer to the first character in the subject
   string at which the match stopped, or NULL if no match was found
   at all. Usually the character pointed to is the '\0' which
   terminated the subject string. It may well be something other
   than '\0'; whether that is acceptable is program dependent.

   A global variable 'loc1' ( according to the manual ) points to
   the position at which the match started in the subject string.
   This is usually the start of the subject string, but may vary
   with the application.

   The ACTUAL NAME of 'loc1' may be different than advertised!!
   On sys5 it is '__loc1'. You can do a 'nm' on libPW.a to
   determine the name for your version.

EX.
    extern char *__loc1;    /* check the name on your system, see above */
    char *compex, *badchar, *regcmp(), *regex();
    char *subject = "A_long_identifier_name";
        .
        .
    compex = regcmp( "[a-zA-Z][_a-zA-Z0-9]*", (char *)0 );
    if ( compex == NULL )
        .. some error routine to say that the RE is BAD !
        .
        .
    badchar = regex( compex, subject );
    if ( badchar != NULL && *badchar == '\0' && __loc1 == subject )
    {
        ...then HOORAH, it was COMPLETE match!!!
    }
    else
    {
        ... BOO HISSS, only a partial or no match was made.
            badchar == NULL means no match at all. Otherwise you
            may want to accept some partial matches, in which case
            you can look at what stopped the match before the
            string terminator ('\0'). Look at *badchar.
    }
        .
        .

NOTE: both "[a-zA-Z][_a-zA-Z0-9]*" and "A_long_identifier_name"
      could just as easily be variables that are pointers to
      strings !!! It is much more useful when used on variables :-).

Some side notes:

      If it is the regular expressions and not the actual calls that
      give you problems, then you need to buy a text book on the
      subject and get familiar with them.

      If you are familiar with REs, then note that the (...)$n
      notation utilized in regex(3) is an added extension to normal
      REs.

      The other arguments ret0, ret1 ..., ret9 in regex(3) are there
      simply to provide pointers to regions where the (...)$n
      extractions should be copied. A subexpression surrounded by
      (...)$1 will extract the substring of the subject string which
      matches the portion of the regular expression enclosed in
      (...)$1.
      The ret0 pointer must hold the address of a preallocated area
      large enough to hold the longest possible substring.

That should just about do it! Hope that helps.
Sorry if you found this long-winded, but I wanted to be complete.

Ken Latham, AT&T-IS (via AGS Inc.), Orlando, FL
uucp: ihnp4!codas!bsdpkh!latham
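
For readers whose system has no libPW: the two ideas above (testing
for a COMPLETE match, and copying a parenthesised substring into a
preallocated area) can be sketched with the POSIX regcomp(3) and
regexec(3) calls from <regex.h>. This is a different library from
regcmp(3)/regex(3), swapped in only because it is widely available.
The helper names full_match and extract_first are made up for
illustration, and POSIX reports match positions as regmatch_t
offsets rather than through __loc1, (...)$n and ret0. A minimal
sketch under those assumptions, not a drop-in replacement:

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 if `subject` matches `pattern` from its first
 * character through its terminating '\0', 0 if it does not, and -1 if
 * the pattern itself fails to compile. */
int full_match(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m[1];
    int ok;

    assert(pattern != NULL && subject != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;                          /* the RE is bad */

    ok = regexec(&re, subject, 1, m, 0) == 0
         && m[0].rm_so == 0                 /* match began at the start */
         && subject[m[0].rm_eo] == '\0';    /* and stopped at the terminator */

    regfree(&re);
    return ok;
}

/* Hypothetical helper: copy the text matched by the first parenthesised
 * subexpression into the caller's preallocated buffer `out` of `outsz`
 * bytes. Returns 1 on success, 0 if the pattern is bad, nothing matched,
 * or the buffer is too small for the substring plus its '\0'. */
int extract_first(const char *pattern, const char *subject,
                  char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[2];                        /* m[0] whole match, m[1] group 1 */
    int ok = 0;

    assert(pattern != NULL && subject != NULL && out != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, subject, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < outsz) {                  /* room for text and '\0' */
            memcpy(out, subject + m[1].rm_so, len);
            out[len] = '\0';
            ok = 1;
        }
    }

    regfree(&re);
    return ok;
}
```

With these, full_match("[a-zA-Z][_a-zA-Z0-9]*", "A_long_identifier_name")
returns 1, while a subject of "9lives" gives 0 because the match
starts at offset 1 instead of 0. extract_first refuses to copy when
the buffer is too small, which is the same "large enough area"
concern noted for ret0.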