Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!rice!uw-beaver!fluke!inc
From: inc@tc.fluke.COM (Gary Benson)
Newsgroups: comp.emacs
Subject: Re: ALL CAPS-->Mixed case converter?
Message-ID: <1990Sep11.185353.26919@tc.fluke.COM>
Date: 11 Sep 90 18:53:53 GMT
References: <3245@nems.dt.navy.mil> <MDB.90Sep1111521@kosciusko.ESD.3Com.COM>
Organization: John Fluke Mfg. Co., Inc., Everett, WA
Lines: 102

In article <MDB.90Sep1111521@kosciusko.ESD.3Com.COM> mdb@ESD.3Com.COM (Mark D. Baushke) writes:

MarkB: On 1 Sep 90 13:06:34 GMT, in article <3245@nems.dt.navy.mil> posted
MarkB: to comp.emacs, science@nems.dt.navy.mil (Mark Zimmermann) writes:

MarkZ:  does anybody have (a pointer to) an Emacs way to convert ALL CAPITAL
MarkZ:  LETTER TEXT into nice mixed-case text?  There are obviously many
MarkZ:  degrees of sophistication in doing this; I'd like to be able to give
MarkZ:  the system a file of words to be capitalized and/or words to be
MarkZ:  lowercased and/or words to be all caps (NASA, USA, etc.), to
MarkZ:  customize, and it should try to find and capitalize the first word of
MarkZ:  each sentence.  Any suggestions??  Is this better done in C or awk
MarkZ:  than in Emacs??  Tnx for help! - ^z - science@nems.dt.navy.mil

MarkB: The perl script after my .signature was posted to comp.lang.perl some
MarkB: time ago. With a little work it should be possible to add an exception
MarkB: list for some words like NASA, USA, etc. to be output in all caps.

#!/usr/bin/perl

# This program copies its input to STDOUT, converting uppercase characters
# to lowercase, except the first letter of each sentence and the word 'I'.

# DEFINITION: a 'sentence' begins with an alphanumeric and ends at the end
# of the input file or at the first terminator (period, question mark, or 
# exclamation point) which is followed by white space.

    $/ = "\177";        # do not split the input into lines
    $_ = <>;            # read the entire input

    s/(\w)(([^.?!I]+|[.?!]+\S|\BI|I\B)+)/($a=$2)=~tr|A-Z|a-z|,$1.$a/eg;

    print;              # print the results

-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
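
The exception list MarkB mentions could be bolted on along the following
lines. This is only a sketch and not the posted one-liner: it lower-cases
everything first and then fixes things up, and it assumes Perl 5.10 or
later. The %keep table and the name sentence_case are made up for the
illustration.

```perl
use strict;
use warnings;

# Hypothetical exception table: lower-case key => spelling to restore.
my %keep = (nasa => 'NASA', usa => 'USA');

sub sentence_case {
    my ($text) = @_;
    $text = lc $text;                       # start from all lower case

    # Capitalize the first letter of the input and of each sentence. As in
    # the script above, a sentence ends at a run of . ? or ! that is
    # followed by white space.
    $text =~ s/(^\s*|[.?!]+\s+)([a-z])/$1\u$2/g;

    # The pronoun "I" is always upper case.
    $text =~ s/\bi\b/I/g;

    # Put the exception words back, whatever case they ended up in.
    $text =~ s{\b([A-Za-z]+)\b}{ $keep{lc $1} // $1 }ge;

    return $text;
}

print sentence_case("NASA LAUNCHED A ROCKET.  I WATCHED IT FROM THE USA!  DID YOU?"), "\n";
# prints: NASA launched a rocket.  I watched it from the USA!  Did you?
```

One known rough edge: the pronoun rule also capitalizes a lone "i" in
abbreviations such as "i.e.", so such cases would need their own entries
or a tighter pattern.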

We had a similar need, but with a slightly different twist. We needed to convert from:

    A LINE THAT LOOKED LIKE THIS

to:

    A Line That Looked Like This

Here's how we did the exception list, followed by the subroutine that
actually performed the substitution:

# SPECIALS
# words to translate by lookup - the left string must be lower-case;
# the right string is the replacement, i.e. any case (or even a different word).
$specials{"a"} = "a";
$specials{"an"} = "an";
$specials{"and"} = "and";
$specials{"as"} = "as";
$specials{"ascii"} = "ASCII";
$specials{"at"} = "at";
$specials{"but"} = "but";
$specials{"by"} = "by";
$specials{"for"} = "for";
$specials{"from"} = "from";
$specials{"in"} = "in";
$specials{"is"} = "is";
$specials{"it"} = "it";
$specials{"nor"} = "nor";
$specials{"of"} = "of";
$specials{"on"} = "on";
$specials{"onto"} = "onto";
$specials{"or"} = "or";
$specials{"out"} = "out";
$specials{"so"} = "so";
$specials{"to"} = "to";
$specials{"the"} = "the";
$specials{"with"} = "with";
$specials{"yet"} = "yet";
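
Since MarkZ asked about giving the system a file of words, the table
could also be filled from a word list instead of being typed in. What
follows is a sketch under assumptions of my own: Perl 5, one word per
line spelled the way it should come out, and "#" starting a comment. The
name load_specials is hypothetical, and the demo reads from an in-memory
string so it runs without a real file.

```perl
use strict;
use warnings;

# Hypothetical loader: reads one exception per line from a file handle and
# returns a hash of lower-case word => spelling as written in the list.
sub load_specials {
    my ($fh) = @_;
    my %specials;
    while (my $line = <$fh>) {
        $line =~ s/#.*//;            # drop comments (assumed convention)
        $line =~ s/^\s+|\s+$//g;     # trim surrounding white space
        next unless length $line;    # skip blank lines
        $specials{lc $line} = $line;
    }
    return %specials;
}

# Demo input held in a string; a real list would be opened by file name.
my $list = "and\nof\nASCII\nNASA  # agency\n";
open my $fh, '<', \$list or die "cannot open in-memory list: $!";
my %specials = load_specials($fh);
close $fh;

print join(', ', map { "$_ => $specials{$_}" } sort keys %specials), "\n";
# prints: and => and, ascii => ASCII, nasa => NASA, of => of
```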

# INITIAL CAPITALS SUBROUTINE
# Takes 1 argument, a string, and returns it with initial caps, except that
# words which appear in %specials are translated by lookup.
sub initcaps {
    local($out,$in,$word,$lcw,$found);
    $in = $_[0];
    $out='';
    while (length($in) > 0) {
	$in =~ s/^( *)([^ ]*)//;
	$out .= $1;			# transfer the white space before words
	$word = $2;			# setup the next word
	($lcw=$word) =~ tr/A-Z/a-z/;	# make a lower-case version
	$found = $specials{$lcw};	# is the word on the special list?
	if ($found && $out !~ /^ *$/) {	# exception word but not first in line
	    $word = $found;		# exception words as specified.
	} elsif ($word =~ /^[a-z]*$/i){ # word is only alphabetics?
	    $word =~ tr/a-z/A-Z/;	# make an upper-case version
	    $word = substr($word,0,1) . substr($lcw,1,999); # combine lc version
	    }
	$out .= $word;
	}
    $out;
    }
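
For anyone who wants to try it, here is roughly how the routine behaves
when called. This is a self-contained restatement in Perl 5 (my, lc and
ucfirst are substitutions of mine, not the code above), with the table
cut down to a handful of entries for the demonstration.

```perl
use strict;
use warnings;

# Cut-down copy of the exception table (only what the demo needs).
my %specials = (
    'a'     => 'a',
    'of'    => 'of',
    'the'   => 'the',
    'to'    => 'to',
    'ascii' => 'ASCII',
);

# Same logic as the subroutine above: peel off leading blanks and one word
# at a time, look the word up, otherwise give it an initial capital.
sub initcaps {
    my ($in) = @_;
    my $out = '';
    while (length $in) {
        $in =~ s/^( *)([^ ]*)//;
        $out .= $1;                          # white space before the word
        my $word  = $2;
        my $lcw   = lc $word;
        my $found = $specials{$lcw};
        if ($found && $out !~ /^ *$/) {      # exception, but not first word
            $word = $found;
        } elsif ($word =~ /^[a-z]*$/i) {     # purely alphabetic word
            $word = ucfirst $lcw;
        }
        $out .= $word;
    }
    return $out;
}

print initcaps("A GUIDE TO THE ASCII TABLE"), "\n";
# prints: A Guide to the ASCII Table
print initcaps("THE END OF THE LINE"), "\n";
# prints: The End of the Line
print initcaps("HELLO, WORLD"), "\n";
# prints: HELLO, World
```

The last call shows a limitation worth knowing about: a word with
punctuation attached fails the alphabetics-only test and passes through
unchanged.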
-- 
Gary Benson    -=[ S M I L E R ]=-   -_-_-_-inc@fluke.com_-_-_-_-_-_-_-_-_-_-

Many a bum show has been saved by the flag.   -George M. Cohan