Xref: utzoo comp.unix.questions:18442 comp.unix.wizards:19770 comp.sources.wanted:9782
Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uunet!ncrlnk!ncrcce!ncrons!johnson
From: johnson@ncrons.StPaul.NCR.COM (Wayne D. Johnson)
Newsgroups: comp.unix.questions,comp.unix.wizards,comp.sources.wanted
Subject: Re: Datafile conversion with AWK !
Keywords: datafile conversion awk sed
Message-ID: <220@ncrons.StPaul.NCR.COM>
Date: 14 Dec 89 20:34:21 GMT
References: <539@csoftec.csf.com>
Reply-To: johnson@ncrons.StPaul.NCR.COM (Wayne D. Johnson)
Organization: NCR Comten, St Paul
Lines: 69

In article <539@csoftec.csf.com> root@csoftec.csf.com (Cliff Manis (cmanis@csoftec)) writes:
>(here is the ASCII datafile I have)
>
>AA1|name1|4you|ABC Co|3|56|a|bbb|c|d|eeee|fff|g|1|
>AA1|name1|4you|5th ST||||||||||2|
>AA1|name1|4you|Go4it, TX||||||||||3|
>ZZ1|name2|5ght|SED Co.|4|88|b|ccc|c|d|eee2|ff2|h|1|
>JJ1|name3|6ghi|AWK Inc.|4|98|c|ddd|e|f|gggg|ff3|i|1|
>JJ1|name3|6ghi|2nd St||||||||||2|
>JJ1|name3|6ghi|POB 34||||||||||3|
>JJ1|name3|6ghi|Town, USA||||||||||4|
>BB1|name4|7mob|B H Co|2|56|d|eee|f|g|hhhh|ii4|j|1|
>BB1|name4|7mob|Athens, TN||||||||||2|
>
Interesting little bugger...

The first problem is to identify the records.  The easiest way to do that is 
with awk.  Try search patterns like:
/1\|$/
/2\|$/
This will separate the records so you can process them.  The 1\| part matches the
1| in the record (the backslash is needed because a bare | means "or" in an awk
regular expression) and the $ guarantees that it will only match when the 1| is
the last thing on the line.
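
A quick way to try a pattern like this on a few lines of the sample data is
sketched below.  The variable names (data, first_lines, unescaped) are just made
up for the illustration.  It counts the lines matched with the bar escaped, and
again with a bare |, which awk reads as "1 or end of line" and so matches
every line.

```shell
# Three lines taken from the quoted data file (one record).
data='AA1|name1|4you|ABC Co|3|56|a|bbb|c|d|eeee|fff|g|1|
AA1|name1|4you|5th ST||||||||||2|
AA1|name1|4you|Go4it, TX||||||||||3|'

# Escaped bar: matches only lines that end in a literal "1|".
first_lines=$(printf '%s\n' "$data" | awk '/1\|$/ { n++ } END { print n + 0 }')

# Bare bar: "1" OR end-of-line, so every line matches.
unescaped=$(printf '%s\n' "$data" | awk '/1|$/ { n++ } END { print n + 0 }')

echo "escaped: $first_lines  unescaped: $unescaped"
```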

The processing should be something like:
/1\|$/ { addr1=$4; f1=$1; f2=$2; f3=$3...}
/2\|$/ { addr2=$4; ...}
/3\|$/ { addr3=$4; ...}
/4\|$/ { addr4=$4; ...}
This will save off the contents of the address(?) field from lines 1 through 4 of
each record; you will need a rule like this for each line type.
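
Here is a rough sketch of that gathering step run on just the JJ1 record from
the sample data.  It assumes the field separator is set with -F'|', only saves
the first three fields, and joins the result with ";" purely so the saved
values are easy to see; none of that is meant as the final output format.

```shell
# The four lines of the JJ1 record from the quoted data file.
data='JJ1|name3|6ghi|AWK Inc.|4|98|c|ddd|e|f|gggg|ff3|i|1|
JJ1|name3|6ghi|2nd St||||||||||2|
JJ1|name3|6ghi|POB 34||||||||||3|
JJ1|name3|6ghi|Town, USA||||||||||4|'

gathered=$(printf '%s\n' "$data" | awk -F'|' '
/1\|$/ { addr1 = $4; f1 = $1; f2 = $2; f3 = $3 }   # first line: key fields too
/2\|$/ { addr2 = $4 }                              # later lines: address only
/3\|$/ { addr3 = $4 }
/4\|$/ { addr4 = $4 }
END    { print f1 ";" f2 ";" f3 ";" addr1 ";" addr2 ";" addr3 ";" addr4 }
')

echo "$gathered"
```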

This allows you to process all the records, but how about outputting them?  You
first need to detect when the line read is part of a new record.  I like to use
something like:

NR == 1 { old = $1}

This will set the variable old to the first field of the first line.

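$1 != old { 
	print f1, f2, f3, addr1, addr2, addr3, addr4 ...
	addr1=""
	addr2=""
	addr3=""
	addr4=""
	old=$1
}
END { print f1, f2, f3, addr1, addr2, addr3, addr4 ... }
This will detect when you have read the first line of the next record.  When
this happens, you print out the information you have gathered from the last
record and set your old variable to the new key.  Put this rule ahead of the
rules that save the fields, so the old record is printed before its fields are
overwritten.  END can't be combined with another pattern, so the last record
gets its own END rule.  The commas in the print are what put OFS between the
fields.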
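
Putting the pieces together, a minimal sketch of the whole thing might look
like the script below.  Some of this is my own guesswork: the emit() helper is
a made-up name, the output layout (three key fields followed by up to four
address lines) is only one possible reading of what is wanted, the bars in the
patterns are escaped so they match literally, and the end-of-input case is
handled in its own END rule.

```shell
# A subset of the quoted data file: records of 3, 1 and 2 lines.
data='AA1|name1|4you|ABC Co|3|56|a|bbb|c|d|eeee|fff|g|1|
AA1|name1|4you|5th ST||||||||||2|
AA1|name1|4you|Go4it, TX||||||||||3|
ZZ1|name2|5ght|SED Co.|4|88|b|ccc|c|d|eee2|ff2|h|1|
BB1|name4|7mob|B H Co|2|56|d|eee|f|g|hhhh|ii4|j|1|
BB1|name4|7mob|Athens, TN||||||||||2|'

merged=$(printf '%s\n' "$data" | awk '
BEGIN { FS = OFS = "|" }

# Hypothetical helper: print the gathered record, then clear the addresses.
function emit() {
    print f1, f2, f3, addr1, addr2, addr3, addr4
    addr1 = addr2 = addr3 = addr4 = ""
}

NR == 1   { old = $1 }            # remember the first key
$1 != old { emit(); old = $1 }    # key changed: flush the previous record

/1\|$/ { f1 = $1; f2 = $2; f3 = $3; addr1 = $4 }
/2\|$/ { addr2 = $4 }
/3\|$/ { addr3 = $4 }
/4\|$/ { addr4 = $4 }

END { if (NR > 0) emit() }        # flush the final record
')

echo "$merged"
```

The key-change rule sits above the gathering rules on purpose: when the first
line of a new record arrives, the old record has to go out before f1, f2, f3
and addr1 are overwritten.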

Lastly, you need to define your field separator character as |, so you put
FS="|" and OFS="|" on the awk command line, after the program and before the
file name.  (awk's variable is FS; IFS belongs to the shell.)
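
As a small check of the separator settings, something like the following
should do.  It assumes an awk that takes the standard -F and -v options
(setting FS and OFS in a BEGIN block works too), and the variable names are
made up.  Note that the trailing | on each line gives awk one extra, empty
field at the end.

```shell
line='AA1|name1|4you|ABC Co|3|56|a|bbb|c|d|eeee|fff|g|1|'

# -F sets the input separator (FS); -v OFS sets the output separator.
# The commas in print are what cause OFS to be inserted.
fields=$(printf '%s\n' "$line" | awk -F'|' -v OFS='|' '{ print $1, $4, NF }')

echo "$fields"
```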

I've been pretty general and brief about this whole thing; if you need more
help or have a question about anything, just email me.  

Disclaimer: I have not run any of this code, but it should work.  Note that 
the code listed has some generalities in it (i.e. "..."), which mean additional
code could be placed there.

I wasn't sure what your knowledge of awk was, so I tried to be pretty simple.

-- 
Wayne Johnson         |  Is a baby's life worth more than the right to 
NCR Comten, Inc.      |  make a choice?  Babies are people too.
Roseville MN 55113    +-----------------------------------------------------
(Voice) 612-638-7665   (E-MAIL) W.Johnson@StPaul.NCR.COM