Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!cs.utexas.edu!uunet!yale!cmcl2!lanl!dph
From: dph@lanl.gov (David Huelsbeck)
Newsgroups: comp.unix.questions
Subject: Lower->Upper in AWK (was: Re: cascading pipes in awk)
Message-ID: <13921@lanl.gov>
Date: 24 May 89 18:21:23 GMT
References: <818@manta.NOSC.MIL>
Distribution: usa
Organization: Los Alamos National Laboratory
Lines: 184

From article <818@manta.NOSC.MIL>, by psm@manta.NOSC.MIL (Scot Mcintosh):
>
> Unfortunately, I only want to uppercase a few selected portions of the
> text my awk program is reading (my original posting contained a
> very simplified example, so this wasn't obvious).  There just doesn't
> seem to be a way to have a filter program in the middle of two groups
> of awk statements.

I'm afraid you're right.  Perhaps nawk or gawk would help you, but I
really don't know enough about either one to say.  However, you can,
somewhat painfully, translate lower to upper or rot13 or whatever in
plain old awk.

Here is my solution to this problem, along with a summary of the
solutions I received from other awkers when I posted asking for a
better way.  Sorry for the length, but I felt that every different
solution showed a unique and interesting approach that might be useful
in solving other sorts of problems in awk.

----------------------------------------------------------------------

BEGIN {
    # Lookup table: lower-case letter -> upper-case letter
    cap["a"] = "A"; cap["b"] = "B"; cap["c"] = "C"; cap["d"] = "D"
    cap["e"] = "E"; cap["f"] = "F"; cap["g"] = "G"; cap["h"] = "H"
    cap["i"] = "I"; cap["j"] = "J"; cap["k"] = "K"; cap["l"] = "L"
    cap["m"] = "M"; cap["n"] = "N"; cap["o"] = "O"; cap["p"] = "P"
    cap["q"] = "Q"; cap["r"] = "R"; cap["s"] = "S"; cap["t"] = "T"
    cap["u"] = "U"; cap["v"] = "V"; cap["w"] = "W"; cap["x"] = "X"
    cap["y"] = "Y"; cap["z"] = "Z"
}

{
    # Only lines whose first field contains a lower-case letter are printed
    if ($1 ~ /[a-z]+/) {
        new = ""
        last = length($1)
        for (char = 1; char <= last; ++char) {
            cur = substr($1, char, 1)
            if (cur ~ /[a-z]/) {
                new = new cap[cur]
            } else {
                new = new cur
            }
        }
        print new
    }
}

--------------------------------------------------------------------

From jjm%atavax.decnet@afwl-vax.arpa Mon Mar 21 15:23:18 1988
Date: 21 Mar 88 14:57:00 MST
From: "ATAVAX::JJM"
Subject: AWK answer
To: "dph"
Status: R

Here is the answer.

BEGIN {
    CAP = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    LOW = "abcdefghijklmnopqrstuvwxyz"
}
{
    new = ""
    for (i = 1; i < length($1) + 1; i++)
        if (index(LOW, substr($1, i, 1)) != 0)
            new = new substr(CAP, index(LOW, substr($1, i, 1)), 1)
        else
            new = new substr($1, i, 1)
    $1 = new
    print $0
}

Please note that things could be sped up with some vars (like
x = substr($1,i,1), etc.).  Please let me know how this works for you.

John McDermott
Applied Technology Associates
505/247-8371
Albuquerque

-----------------------------------------------------------------------

From ima!ima.ISC.COM!marc@harvard.harvard.edu Tue Mar 22 06:33:35 1988
Date: Tue, 22 Mar 88 08:30:03 EST
From: marc@ima.isc.com (Marc Evans)
Message-Id: <8803221330.AA17525@ima.ISC.COM>
To: dph@LANL.GOV
Subject: Re: ATTN: AWK GURUS!!! (lower to upper conversion)

It appears to me that in your examples, there is a specific argument
that you are interested in making the conversion on (e.g. $1).
Therefore, saying that the '| tr \[a-z\] \[A-Z\]' mechanism will not
work is too narrow-sighted.
If this is indeed the case, try the following:

BEGIN {...}
($1 ~ /[a-z]+/) { print $1 | "tr '[a-z]' '[A-Z]'" }
(rest of patterns) { ... }
END {...}

In theory, I believe that you should be able to express your rules in
the pattern section, such that the hierarchy of the patterns catches
your special needs before the patterns below them.  Remember, multiple
patterns can be matched unless the 'next' directive is used (or
something similar).

I hope that this may help? 8-)

-------------------------------------------------------------------------------
Marc Evans
{decvax,inhp4,bbn,harvard}!ima!symetrx!marc
Symmetrix 11 Market Square, Ipswich, MA (617) 356-7811
-------------------------------------------------------------------------------

Date: Tue, 22 Mar 88 16:56:16 EST
From: Dick St.Peters
Posted-Date: Tue, 22 Mar 88 16:56:16 EST
Subject: Re: ATTN: AWK GURUS!!! (lower to upper conversion)

This ain't real elegant but is offered for consideration.  It's shorter
than your version but undoubtedly slower too.  Dreaming it up was fun.
--
Dick St.Peters
GE Corporate R&D, Schenectady, NY
stpeters@ge-crd.arpa
uunet!steinmetz!stpeters

{
    if ($1 ~ /[a-z]+/) {
        new = ""
        last = length($1)
        for (char = 1; char <= last; ++char) {
            cur = substr($1, char, 1)
            if (cur ~ /[a-z]/) {
                # substr() positions start at 1, so scan 1..26
                for (i = 1; i <= 26; i++) {
                    tmp = substr("abcdefghijklmnopqrstuvwxyz", i, 1)
                    if (tmp == cur) {
                        break;
                    }
                }
                new = new substr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", i, 1)
            } else {
                new = new cur
            }
        }
        print new
    }
}

-----------------------------------------------------------------------------

From: bzs@bu-cs.BU.EDU (Barry Shein)
Subject: Re: ATTN: AWK GURUS!!! (lower to upper conversion)
Date: 22 Mar 88 06:58:55 GMT

> Convert possibly mixed-case strings to upper-case.
> (not counting case-less chars like digits)

The attached works under 4.3bsd as you required.

        -Barry Shein, Boston University

BEGIN {
    upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    lower = "abcdefghijklmnopqrstuvwxyz";
}
{
    out = "";
    for (i = 1; i <= length($1); i++) {
        # c is the current character; replace it if it is lower-case
        if ((cpos = index(lower, c = substr($1, i, 1))) > 0)
            c = substr(upper, cpos, 1);
        out = out c;
    }
    print out;
}

-------------------------------------------------------------------------

From: sjmz@otter.hple.hp.com (Stefek Zaba)
Subject: Re: ATTN: AWK GURUS!!! (lower to upper conversion)
Date: 22 Mar 88 14:54:46 GMT

Make no apology!  Your lookup table is a perfectly neat solution given
the weird constraints you've acquired.  Personally I'd even avoid the
"if lc-alpha" test, and construct a table with the full 128 (sorry,
non-USASCII users!) characters, using an awk FOR with printf %c, and
then overwrite the 26 elements of interest.  This avoids the "if" in
your inner loop (though maybe awk table lookup is slow enough to make
the "if" test a win in efficiency, if not clarity).

Keep at it - awk's clearly Turing-complete!  (**Please**, no
TM's-in-awk!!!)
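
For what it's worth, here is a rough sketch of the full-table idea
Stefek describes; it is not code from his message.  It assumes an awk
whose sprintf accepts %c with a numeric argument (I have not checked
every old awk for this).  The shell wrapper name upcase_first and the
array name map are made up for illustration.  Code 0 is skipped, and
any byte outside 7-bit ASCII has no table entry, so it is silently
dropped.

```sh
# upcase_first (hypothetical name): print the first field of each input
# line, upper-cased through a full ASCII lookup table, with no "if" in
# the per-character loop.
upcase_first() {
    awk '
    BEGIN {
        # Identity table for ASCII codes 1..127 (assumes sprintf %c works)
        for (i = 1; i < 128; i++) {
            ch = sprintf("%c", i)
            map[ch] = ch
        }
        # Overwrite the 26 entries of interest
        lower = "abcdefghijklmnopqrstuvwxyz"
        upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        for (i = 1; i <= 26; i++)
            map[substr(lower, i, 1)] = substr(upper, i, 1)
    }
    {
        out = ""
        n = length($1)
        for (i = 1; i <= n; i++)
            out = out map[substr($1, i, 1)]
        print out
    }
    '
}
```

With that defined, echo 'hello world' | upcase_first prints HELLO.
Whether the table lookup actually beats the "if" test is something you
would have to time on your own awk.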