Xref: utzoo comp.unix.questions:18442 comp.unix.wizards:19770 comp.sources.wanted:9782
Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uunet!ncrlnk!ncrcce!ncrons!johnson
From: johnson@ncrons.StPaul.NCR.COM (Wayne D. Johnson)
Newsgroups: comp.unix.questions,comp.unix.wizards,comp.sources.wanted
Subject: Re: Datafile conversion with AWK !
Keywords: datafile conversion awk sed
Message-ID: <220@ncrons.StPaul.NCR.COM>
Date: 14 Dec 89 20:34:21 GMT
References: <539@csoftec.csf.com>
Reply-To: johnson@ncrons.StPaul.NCR.COM (Wayne D. Johnson)
Organization: NCR Comten, St Paul
Lines: 69

In article <539@csoftec.csf.com> root@csoftec.csf.com (Cliff Manis (cmanis@csoftec)) writes:
>(here is the ASCII datafile I have)
>
>AA1|name1|4you|ABC Co|3|56|a|bbb|c|d|eeee|fff|g|1|
>AA1|name1|4you|5th ST||||||||||2|
>AA1|name1|4you|Go4it, TX||||||||||3|
>ZZ1|name2|5ght|SED Co.|4|88|b|ccc|c|d|eee2|ff2|h|1|
>JJ1|name3|6ghi|AWK Inc.|4|98|c|ddd|e|f|gggg|ff3|i|1|
>JJ1|name3|6ghi|2nd St||||||||||2|
>JJ1|name3|6ghi|POB 34||||||||||3|
>JJ1|name3|6ghi|Town, USA||||||||||4|
>BB1|name4|7mob|B H Co|2|56|d|eee|f|g|hhhh|ii4|j|1|
>BB1|name4|7mob|Athens, TN||||||||||2|
>

Interesting little bugger...

The first problem is to identify the records. The easiest way to do that
is with awk. Try search patterns like:

    /\|1\|$/
    /\|2\|$/

This will separate the line types so you can process them. The \|1\| part
matches the |1| in the line, and the $ guarantees that it will only match
when |1| is the last thing on the line. The backslashes matter: a bare |
means "or" in an awk regular expression, so /1|$/ would match every line.

The processing should be something like:

    /\|1\|$/ { addr1=$4; f1=$1; f2=$2; f3=$3...}
    /\|2\|$/ { addr2=$4; ...}
    /\|3\|$/ { addr3=$4; ...}
    /\|4\|$/ { addr4=$4; ...}

This will save off the contents of the address(?) field from line types 1
through 4; you will need a rule for each line type.

This allows you to process all the lines, but how about outputting them?
You first need to detect when the line read is part of a new record.
I like to use something like:

    NR == 1 { old = $1 }

This will set the variable old to the first field of the first line.

    $1 != old {
        print f1, f2, f3, addr1, addr2, addr3, addr4 ...
        addr1=""
        addr2=""
        addr3=""
        addr4=""
        old=$1
    }
    END { print f1, f2, f3, addr1, addr2, addr3, addr4 ... }

This will detect when you have read the first line of the next record.
When this happens, you print out the information you have gathered from
the last record and set your old variable to the new key. Put these two
rules ahead of the per-line-type rules in the script; otherwise line 1 of
the new record overwrites f1, f2, f3 and addr1 before the old record has
been printed. END cannot be combined with another pattern using ||, so the
final record gets its own END rule. The commas in the print statement are
what make awk put OFS between the fields.

Lastly, you need to define your field separator character as |, so you
put FS="|" and OFS="|" on the awk command line, ahead of the data file
name. (awk's variables are FS and OFS; IFS belongs to the shell.)

I've been pretty general and brief about this whole thing. If you need
more help or have a question about anything, just email me.

Disclaimer: I have not run any of this code, but it should work. Note that
the code listed has some generalities to it (i.e. "..."), which means
additional code could be placed there. I wasn't sure what your knowledge
of awk was, so I tried to be pretty simple.

-- 
Wayne Johnson                 |  Is a baby's life worth more than the right to
NCR Comten, Inc.              |  make a choice?  Babies are people too.
Roseville MN 55113            +-----------------------------------------------------
(Voice) 612-638-7665           (E-MAIL) W.Johnson@StPaul.NCR.COM
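
For anyone who wants to try the approach described in this post, here is one way it might be put together as a complete script. It is only a sketch under several assumptions that are not in the post: the function name join_records and the helper emit_record are made up for illustration, the output layout (three key fields followed by up to four address lines, one output line per key) is a guess at what the original poster wanted, and the sequence number is read from field 14 directly instead of being matched with a regular expression. The fields the post leaves as "..." are not carried over.

```shell
#!/bin/sh
# Sketch only: join_records and emit_record are hypothetical names,
# and the output layout is an assumption about the desired result.

join_records() {
    awk -F'|' -v OFS='|' '
    # Print the record gathered so far, then clear the address slots.
    function emit_record() {
        print f1, f2, f3, addr[1], addr[2], addr[3], addr[4]
        for (i = 1; i <= 4; i++) addr[i] = ""
    }
    # A change in the first field means a new record has started,
    # so print the old one before anything is overwritten.
    NR > 1 && $1 != old { emit_record() }
    # Remember the key fields and file the address text ($4) under
    # the sequence number found in field 14.
    {
        old = $1; f1 = $1; f2 = $2; f3 = $3
        addr[$14] = $4
    }
    # The last record has no following line to trigger the print.
    END { if (NR > 0) emit_record() }
    ' "$@"
}
```

Called as `join_records datafile` (or with the data on standard input), this should print one |-separated line per key, with empty trailing fields where a record has fewer than four address lines.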