Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!rice!uw-beaver!fluke!inc
From: inc@tc.fluke.COM (Gary Benson)
Newsgroups: comp.emacs
Subject: Re: ALL CAPS-->Mixed case converter?
Message-ID: <1990Sep11.185353.26919@tc.fluke.COM>
Date: 11 Sep 90 18:53:53 GMT
References: <3245@nems.dt.navy.mil>
Organization: John Fluke Mfg. Co., Inc., Everett, WA
Lines: 102

In article mdb@ESD.3Com.COM (Mark D. Baushke) writes:

MarkB: On 1 Sep 90 13:06:34 GMT, in article <3245@nems.dt.navy.mil> posted
MarkB: to comp.emacs, science@nems.dt.navy.mil (Mark Zimmermann) writes:

MarkZ: does anybody have (a pointer to) an Emacs way to convert ALL CAPITAL
MarkZ: LETTER TEXT into nice mixed-case text? There are obviously many
MarkZ: degrees of sophistication in doing this; I'd like to be able to give
MarkZ: the system a file of words to be capitalized and/or words to be
MarkZ: lowercased and/or words to be all caps (NASA, USA, etc.), to
MarkZ: customize, and it should try to find and capitalize the first word of
MarkZ: each sentence. Any suggestions?? Is this better done in C or awk
MarkZ: than in Emacs?? Tnx for help! - ^z - science@nems.dt.navy.mil

MarkB: The perl script after my .signature was posted to comp.lang.perl some
MarkB: time ago. With a little work it should be possible to add an exception
MarkB: list for some words like NASA, USA, etc. to be output in all caps.

#!/usr/bin/perl
# This program copies its input to STDOUT, converting uppercase characters
# to lowercase, except the first letter of each sentence and the word 'I'.
# DEFINITION: a 'sentence' begins with an alphanumeric and ends at the end
# of the input file or at the first terminator (period, question mark, or
# exclamation point) which is followed by white space.

$/ = "\177";    # do not split the input into lines (assumes no DEL chars in the input)
$_ = <>;        # read the entire input
s/(\w)(([^.?!I]+|[.?!]+\S|\BI|I\B)+)/($a=$2)=~tr|A-Z|a-z|,$1.$a/eg;
print;          # print the results

-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_

We had a similar need, but with a slightly different twist. We needed to
convert from:

    A LINE THAT LOOKED LIKE THIS

to:

    A Line That Looked Like This

Here's how we did the exception list, followed by the subroutine that
actually performed the substitution:

# SPECIALS
# words to translate by lookup - the left string must be lower-case;
# the right string is the replacement, i.e. any case (or even a different word).
$specials{"a"} = "a";
$specials{"an"} = "an";
$specials{"and"} = "and";
$specials{"as"} = "as";
$specials{"ascii"} = "ASCII";
$specials{"at"} = "at";
$specials{"but"} = "but";
$specials{"by"} = "by";
$specials{"for"} = "for";
$specials{"from"} = "from";
$specials{"in"} = "in";
$specials{"is"} = "is";
$specials{"it"} = "it";
$specials{"nor"} = "nor";
$specials{"of"} = "of";
$specials{"on"} = "on";
$specials{"onto"} = "onto";
$specials{"or"} = "or";
$specials{"out"} = "out";
$specials{"so"} = "so";
$specials{"to"} = "to";
$specials{"the"} = "the";
$specials{"with"} = "with";
$specials{"yet"} = "yet";

# INITIAL CAPITALS SUBROUTINE
# takes 1 argument, a string; returns the string with initial caps, except
# that words which appear in %specials are translated by lookup.
sub initcaps {
    local($out,$in,$word,$lcw,$found);
    $in = $_[0];
    $out = '';
    while (length($in) > 0) {
        $in =~ s/^( *)([^ ]*)//;
        $out .= $1;                       # transfer the white space before words
        $word = $2;                       # set up the next word
        ($lcw = $word) =~ tr/A-Z/a-z/;    # make a lower-case version
        $found = $specials{$lcw};         # is the word on the special list?
        if ($found && $out !~ /^ *$/) {   # exception word, but not first in line
            $word = $found;               # exception words as specified
        } elsif ($word =~ /^[a-z]*$/i) {  # word is only alphabetics?
            $word =~ tr/a-z/A-Z/;         # make an upper-case version
            # keep the upper-case first letter, take the rest from the lower-case version
            $word = substr($word,0,1) . substr($lcw,1,999);
        }
        $out .= $word;
    }
    $out;
}

--
Gary Benson  -=[ S M I L E R ]=-  -_-_-_-inc@fluke.com_-_-_-_-_-_-_-_-_-_-

Many a bum show has been saved by the flag.  -George M. Cohan
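Neither posted script reads the file of words Mark Zimmermann asked for. Below is a minimal sketch of how the exception-list idea might be combined with sentence capitalization, written for modern Perl 5 rather than the Perl of 1990. The file format (one word per line, spelled exactly as it should be output), the script name, and the names load_exceptions, sentence_case and %keep are all hypothetical. As in the first script, a sentence is taken to start after a period, question mark, or exclamation point followed by white space; unlike it, only a letter (not a digit) gets capitalized.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical exception table: lower-case key => exact output form.
my %keep;

# Read exception words from a filehandle, one per line, spelled the way they
# should be output (e.g. "NASA", "USA"). The file format is an assumption.
sub load_exceptions {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        $line =~ s/^\s+|\s+$//g;      # trim surrounding white space
        $keep{lc $line} = $line;
    }
}

# Lower-case everything, restore words found in %keep, then capitalize the
# first letter of the text and the first letter after ., ? or ! plus white space.
sub sentence_case {
    my ($text) = @_;
    $text = lc $text;
    $text =~ s/\b([a-z]+)\b/exists $keep{$1} ? $keep{$1} : $1/ge;
    $text =~ s/(^\s*|[.?!]\s+)([a-z])/$1\u$2/g;
    return $text;
}

# Hypothetical driver: exception words from the file named by the first
# argument, text to convert on STDIN. Skipped when no arguments are given.
if (@ARGV) {
    my $words = shift @ARGV;
    open(my $fh, '<', $words) or die "cannot open $words: $!\n";
    load_exceptions($fh);
    close $fh;
    local $/;                         # slurp mode: read all of STDIN at once
    my $text = <STDIN>;
    print sentence_case(defined $text ? $text : '');
}
```

It might be run as: perl caps.pl words.txt < upper.txt (both file names hypothetical). Because the listed words are restored before the sentence-initial pass, an all-caps entry such as NASA is left as is at the start of a sentence, while an ordinary first word still gets its capital.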