Path: utzoo!attcan!uunet!lll-winken!elroy.jpl.nasa.gov!jpl-devvax!lwall
From: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Newsgroups: comp.lang.perl
Subject: Re: Changing the first character of a string.
Message-ID: <8586@jpl-devvax.JPL.NASA.GOV>
Date: 3 Jul 90 21:27:41 GMT
References: <1990Jul3.144552.5407@uvaarpa.Virginia.EDU>
Reply-To: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Organization: Jet Propulsion Laboratory, Pasadena, CA
Lines: 50

In article <1990Jul3.144552.5407@uvaarpa.Virginia.EDU> worley@compass.com writes:
:
:    From: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
:
:        $flags =~ s/^[^\0]?/U/;
:
: Hmmm...  Why is NUL treated specially?

You'll have to ask the designers of C that question.  Perl isn't treating
it at all specially here--it's just relying on the convention that normal
text never contains it.  /[^\0]/ is just an idiom for matching any textual
character including newline, which /./ specifically won't match.

: Also, this illustrates one thing I don't like about regexps -- people
: write code which depends on the order in which the alternatives are
: matched.  For instance, in the regexp above, the case where [^\0]?
: matches the null string can always match, so it implicitly depends on
: the fact that the non-null match is tried first.  On the other hand,
: it's hard (impossible?) to write a regexp which matches in only the
: right way without some way to specify context for the match (shades of
: \: and \;!!!).

The longest-match-first principle is a long-standing tradition.  It seems
more useful (or at least, less confusing) to have things be self-fitting
than to have to take external measures to make them fit.

One interesting exception to this rule is that alternatives (in a typical
backtracking regexp package, anyway) are matched left to right, even if
the left alternative is shorter:

    $_ = 'abccccc';
    /(ab|abc)c*/;
    print $1;

will print "ab".  In general, this doesn't get in your way.

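The three behaviors discussed so far can be checked with a short script.
This is only a sketch, written in present-day Perl 5 syntax ("use strict"
and "my" are assumptions that postdate this thread), and the variable
names are illustrative:

```perl
use strict;
use warnings;

# /./ does not match a newline, but /[^\0]/ does.
my $nl = "\n";
my $dot_matches    = ($nl =~ /./)     ? 1 : 0;
my $notnul_matches = ($nl =~ /[^\0]/) ? 1 : 0;

# Replace the first character, whatever it is (even a newline)...
my $flags = "\nrwx";
(my $changed = $flags) =~ s/^[^\0]?/U/;

# ...or insert 'U' when the string is empty, since ? also allows
# the null match once the non-null match has been tried and failed.
my $empty = '';
(my $filled = $empty) =~ s/^[^\0]?/U/;

# Alternatives are tried left to right, so the shorter "ab" wins
# even though "abc" would also have led to a successful match.
my $text = 'abccccc';
my ($alt) = $text =~ /(ab|abc)c*/;

print "dot=$dot_matches notnul=$notnul_matches\n";
print "changed=$changed filled=$filled alt=$alt\n";
```

Swapping the alternatives to /(abc|ab)c*/ should capture "abc" instead,
which is the order dependence the quoted article is worried about.
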
The place where longest-first makes problems is when you're scanning for
opening and closing delimiters when there might be more than one pair on
the line.  In this case, you have to find some way of restricting the *
from matching everything from the first opening delimiter to the final
closing delimiter.  With a single character delimiter, you can do it easily
(neglecting backslashes for the moment):

    /"[^"]*"/

But what about multi-character delimiters such as C comments?  The easiest
way in Perl is to first translate the multi-character delimiters to single
characters that don't otherwise occur, and then do the above trick.

Larry
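
The translate-then-match trick for C comments might look roughly like the
following.  This is a sketch in present-day Perl 5 syntax, not code from
the post; it assumes the bytes \x01 and \x02 never occur in the input, and
it ignores comment delimiters that appear inside string literals:

```perl
use strict;
use warnings;

my $code = "a = 1; /* first */ b = 2; /* second */ c = 3;";

# Naive greedy version: .* runs from the first opener to the LAST closer,
# so the code between the two comments is swallowed as well.
(my $greedy = $code) =~ s#/\*.*\*/##;

# Step 1: translate the two-character delimiters to single characters.
# Assumption: \x01 and \x02 do not otherwise occur in $code.
(my $work = $code) =~ s#/\*#\x01#g;
$work =~ s#\*/#\x02#g;

# Step 2: the single-character trick: opener, any non-closers, closer.
$work =~ s/\x01[^\x02]*\x02//g;

# Step 3: translate any unmatched stray delimiters back.
$work =~ s#\x01#/*#g;
$work =~ s#\x02#*/#g;

print "greedy: $greedy\n";
print "fixed:  $work\n";
```

Because [^\x02] also matches newline, the same substitution should cope
with comments that span several lines when the whole file is in one string.
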