Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!uflorida!haven!umbc3!tron!moran
From: moran@tron.UUCP (Harvey R Moran)
Newsgroups: comp.unix.questions
Subject: Re: Shell Database Management (?)
Summary: a wrapper for grep to anchor searches at column positions
Keywords: shell database
Message-ID: <438@tron.UUCP>
Date: 1 Sep 89 11:29:55 GMT
References: <10596@dasys1.UUCP>
Reply-To: moran@tron.UMD.EDU (Harvey R Moran)
Distribution: usa
Organization: Westinghouse Electric Corporation
Lines: 157

In article <10596@dasys1.UUCP> parsnips@dasys1.UUCP (David Parsons) writes:
>I would like to use a Bourne shell script to extract records from a simple
>database of fixed-length fields terminated with a new-line character.
>
>I've tried R'ing TFM to no avail.  
>
>The problem... the database consists of addresses... positions 99 and 100
>in each record contain a two-position abbreviation for the state.  It's easy
>to get cut to read those two characters, and grep to identify the state I
>want to extract, but how the ^#$&! do you then copy the ENTIRE record 
>thus identified to another file???  Using grep alone is no good because 
>the abbreviation appears in various other places in the record...
>
>-- 
>David Parsons
>Big Electric Cat Public UNIX
>..!cmcl2!{ccnysci,cucard,hombre}!dasys1!parsnips

Mail from here to to dasys1 bounces.  Well, someone else might find it
useful, so ...
# -------- snip -------- snip -------- snip -------- snip --------
#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  agrep.sh agrep.c test.data
# Wrapped by moran@tron on Fri Sep  1 07:11:18 1989
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'agrep.sh' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'agrep.sh'\"
else
echo shar: Extracting \"'agrep.sh'\" \(1850 characters\)
sed "s/^X//" >'agrep.sh' <<'END_OF_FILE'
X#!/bin/sh
X# agrep.sh -- anchored grep
X# 
X# Usage:    agrep beginning column match_pattern [file_name_or_list]
X#
X# Intended usage is searching fixed length ascii records for matches at a
X# particular column position.  Written in response to a posted example
X# desire to search for a match with 2 character state designators
X# in columns 99 and 100 of records containing (USA) mailing addresses.
X#
X# This uses brute force to generate a match pattern anchored
X# at the column of interest.  If the records get long enough, this technique
X# will probably die a horrible death by exceeding a pattern buffer length
X# allocation in grep.
X#
X# There are also potential pitfalls associated with full regular
X# expressions because the calculation of APAT below has "$2" in it
X# which exposes $2 to possible unintended wildcard expansion by the shell.
X# If *I* needed this capability, I would re-write "agrep" in a few lines
X# of equivalent C code.  A (garbiginous) sample is in this shar as "agrep.c"
X#
X# Harvey Moran  moran@tron.UUCP or moran@tron.UUCP@umbc3.UMBC.EDU 9/1/89
X
Xcase $# in
X0|1|2) echo "Usage:  $0 beginning_column match_pattern [file_name_or_list]"
X       echo "Acceptable range of beginning_column is 1 to 300"
X       exit 1
X       ;;
Xesac
XDOTS50=".................................................." # 50 dots in a row
XDOTS300="${DOTS50}${DOTS50}${DOTS50}${DOTS50}${DOTS50}${DOTS50}"
X
Xif [ $1 -lt 1 -o $1 -gt 300 ]
Xthen
X       echo "Usage:  $0 beginning_column match_pattern file_name_or_list"
X       echo "Acceptable range of beginning_column is 1 to 300"
X       exit 1
Xfi
XAPAT="^"`echo "${DOTS300}" | cut -c1-$1`"$2"
Xshift
Xshift
X
X# NEWEXP is now the original match_pattern prefixed with a left anchor
X# and a number of dots (match any character) to position
X# the original match_pattern at the column of interest.
X
Xecho grep "\"${APAT}\"" "$@"
END_OF_FILE
if test 1850 -ne `wc -c <'agrep.sh'`; then
    echo shar: \"'agrep.sh'\" unpacked with wrong size!
fi
chmod +x 'agrep.sh'
# end of 'agrep.sh'
fi
if test -f 'agrep.c' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'agrep.c'\"
else
echo shar: Extracting \"'agrep.c'\" \(785 characters\)
sed "s/^X//" >'agrep.c' <<'END_OF_FILE'
X
X/*
X * agrep.c -- garbiginous version of "anchored grep"
X * Yes, I know this is not portable, but it is supposed to be a clue
X * rather than a program and it "works"
X * (given correct arguments) under Ultrix 3.1
X *
X * Making this trash into a program is left as an exercise to "the student".
X *
X * Usage:
X *    agrep anchor_column_number pattern [files]
X */
X
Xmain(ac, av, envp)	/* Yeah, I know envp is not portable */
Xint ac;
Xchar *av[], *envp[];
X{
X   static char *dots50 = "..................................................";
X    static char dots300[1024] = "^";
X    int i;
X    for (i = 0; i < 6; ++i )
X       (void) strcat(dots300, dots50);
X	  dots300[atoi(av[1])] = '\0';
X    strcat(dots300, av[2]);
X    av[1] = "agrep";
X    av[2] = dots300;
X    execve("/bin/grep", av+1, envp);
X}
END_OF_FILE
if test 785 -ne `wc -c <'agrep.c'`; then
    echo shar: \"'agrep.c'\" unpacked with wrong size!
fi
# end of 'agrep.c'
fi
if test -f 'test.data' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'test.data'\"
else
echo shar: Extracting \"'test.data'\" \(244 characters\)
sed "s/^X//" >'test.data' <<'END_OF_FILE'
X-------------------- test as: agrep 10 MD test.data, and variations thereof
X123456789VA234567890
XabcdefghiMD2zyxwvuts
XabcdefghiNH2zyxwvuts
X12MD56789NJ2zyxwvuts
XabcdefghiMD2zyxwvuts
X123MD6789NH2zyxwvuts
XabcdefghiCA23MD67890
XMD3456789FL2zyxwvuts
END_OF_FILE
if test 244 -ne `wc -c <'test.data'`; then
    echo shar: \"'test.data'\" unpacked with wrong size!
fi
# end of 'test.data'
fi
echo shar: End of shell archive.
exit 0

-- 
# Harvey Moran                  #    moran%tron.UUCP@umbc3.UMBC.EDU    #
# Westinghouse Electric Corp.   #    ...!netsys!tron!moran             #
# Electronic Systems Group      #    ...!{wb3ffv,netsys}!hrmhpc!harvey #
# Baltimore, Md.