Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!uflorida!haven!umbc3!tron!moran
From: moran@tron.UUCP (Harvey R Moran)
Newsgroups: comp.unix.questions
Subject: Re: Shell Database Management (?)
Summary: a wrapper for grep to anchor searches at column positions
Keywords: shell database
Message-ID: <438@tron.UUCP>
Date: 1 Sep 89 11:29:55 GMT
References: <10596@dasys1.UUCP>
Reply-To: moran@tron.UMD.EDU (Harvey R Moran)
Distribution: usa
Organization: Westinghouse Electric Corporation
Lines: 157

In article <10596@dasys1.UUCP> parsnips@dasys1.UUCP (David Parsons) writes:
>I would like to use a Bourne shell script to extract records from a simple
>database of fixed-length fields terminated with a new-line character.
>
>I've tried R'ing TFM to no avail.
>
>The problem... the database consists of addresses... positions 99 and 100
>in each record contain a two-position abbreviation for the state.  It's easy
>to get cut to read those two characters, and grep to identify the state I
>want to extract, but how the ^#$&! do you then copy the ENTIRE record
>thus identified to another file???  Using grep alone is no good because
>the abbreviation appears in various other places in the record...
>
>--
>David Parsons
>Big Electric Cat Public UNIX
>..!cmcl2!{ccnysci,cucard,hombre}!dasys1!parsnips

Mail from here to dasys1 bounces.  Well, someone else might find it
useful, so ...

# -------- snip -------- snip -------- snip -------- snip --------
#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file".
if test -f 'agrep.sh' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'agrep.sh'\"
else
echo shar: Extracting \"'agrep.sh'\" \(1850 characters\)
sed "s/^X//" >'agrep.sh' <<'END_OF_FILE'
X#!/bin/sh
X# agrep.sh -- anchored grep
X#
X# Usage: agrep beginning_column match_pattern [file_name_or_list]
X#
X# Intended usage is searching fixed length ascii records for matches at a
X# particular column position.
X# Written in response to a posted example
X# desire to search for a match with 2 character state designators
X# in columns 99 and 100 of records containing (USA) mailing addresses.
X#
X# This uses brute force to generate a match pattern anchored
X# at the column of interest.  If the records get long enough, this technique
X# will probably die a horrible death by exceeding a pattern buffer length
X# allocation in grep.
X#
X# There are also potential pitfalls associated with full regular
X# expressions: metacharacters in match_pattern must be quoted on the
X# command line, or the shell may expand them as wildcards before agrep
X# ever sees them.
X# If *I* needed this capability, I would re-write "agrep" in a few lines
X# of equivalent C code.  A (garbiginous) sample is in this shar as "agrep.c"
X#
X# Harvey Moran  moran@tron.UUCP or moran@tron.UUCP@umbc3.UMBC.EDU  9/1/89
X
Xcase $# in
X0|1)    echo "Usage: $0 beginning_column match_pattern [file_name_or_list]"
X        echo "Acceptable range of beginning_column is 1 to 300"
X        exit 1
X        ;;
Xesac
XDOTS50=".................................................." # 50 dots in a row
XDOTS300="${DOTS50}${DOTS50}${DOTS50}${DOTS50}${DOTS50}${DOTS50}"
X
Xif [ "$1" -lt 1 -o "$1" -gt 300 ]
Xthen
X        echo "Usage: $0 beginning_column match_pattern [file_name_or_list]"
X        echo "Acceptable range of beginning_column is 1 to 300"
X        exit 1
Xfi
X# Keep the "^" plus (beginning_column - 1) dots, so that the pattern
X# itself starts exactly at beginning_column.
XAPAT=`echo "^${DOTS300}" | cut -c1-$1`"$2"
Xshift
Xshift
X
X# APAT is now the original match_pattern prefixed with a left anchor
X# and a number of dots (match any character) to position
X# the original match_pattern at the column of interest.
X
Xgrep "${APAT}" "$@"
END_OF_FILE
if test 1850 -ne `wc -c <'agrep.sh'`; then
    echo shar: \"'agrep.sh'\" unpacked with wrong size!
fi
chmod +x 'agrep.sh'
# end of 'agrep.sh'
fi
if test -f 'agrep.c' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'agrep.c'\"
else
echo shar: Extracting \"'agrep.c'\" \(785 characters\)
sed "s/^X//" >'agrep.c' <<'END_OF_FILE'
X
X/*
X * agrep.c -- garbiginous version of "anchored grep"
X * Yes, I know this is not portable, but it is supposed to be a clue
X * rather than a program and it "works"
X * (given correct arguments) under Ultrix 3.1
X *
X * Making this trash into a program is left as an exercise to "the student".
X *
X * Usage:
X *      agrep anchor_column_number pattern [files]
X */
X
X#include <stdio.h>
X#include <stdlib.h>
X#include <string.h>
X#include <unistd.h>
X
Xint main(int ac, char *av[], char *envp[]) /* Yeah, I know envp is not portable */
X{
X    static char *dots50 = "..................................................";
X    static char dots300[1024] = "^";
X    int i, col;
X
X    if (ac < 3 || (col = atoi(av[1])) < 1 || col > 300) {
X        fprintf(stderr, "Usage: %s anchor_column_number pattern [files]\n", av[0]);
X        return 1;
X    }
X    for (i = 0; i < 6; ++i)
X        (void) strcat(dots300, dots50);
X    dots300[col] = '\0';        /* leaves "^" plus col-1 dots */
X    /* append the pattern without running off the end of the buffer */
X    strncat(dots300, av[2], sizeof dots300 - strlen(dots300) - 1);
X    av[1] = "agrep";
X    av[2] = dots300;
X    execve("/bin/grep", av + 1, envp);
X    perror("/bin/grep");        /* only reached if execve fails */
X    return 1;
X}
END_OF_FILE
if test 785 -ne `wc -c <'agrep.c'`; then
    echo shar: \"'agrep.c'\" unpacked with wrong size!
fi
# end of 'agrep.c'
fi
if test -f 'test.data' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'test.data'\"
else
echo shar: Extracting \"'test.data'\" \(244 characters\)
sed "s/^X//" >'test.data' <<'END_OF_FILE'
X-------------------- test as: agrep 10 MD test.data, and variations thereof
X123456789VA234567890
XabcdefghiMD2zyxwvuts
XabcdefghiNH2zyxwvuts
X12MD56789NJ2zyxwvuts
XabcdefghiMD2zyxwvuts
X123MD6789NH2zyxwvuts
XabcdefghiCA23MD67890
XMD3456789FL2zyxwvuts
END_OF_FILE
if test 244 -ne `wc -c <'test.data'`; then
    echo shar: \"'test.data'\" unpacked with wrong size!
fi
# end of 'test.data'
fi
echo shar: End of shell archive.
exit 0
--
#    Harvey Moran                  #   moran%tron.UUCP@umbc3.UMBC.EDU      #
#    Westinghouse Electric Corp.   #   ...!netsys!tron!moran               #
#    Electronic Systems Group      #   ...!{wb3ffv,netsys}!hrmhpc!harvey   #
#    Baltimore, Md.
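
The row of dots in front of the pattern can also be written as a repeat
count, which keeps the pattern short however far out the column is.  This
is only a sketch under assumptions: it needs a grep that understands the
POSIX interval notation \{n\} (older greps may not), and "colgrep" is a
made-up name that is not part of the archive above.

```shell
#!/bin/sh
# colgrep (hypothetical name): print every record in which the pattern
# begins exactly at the given column.
# Assumption: grep accepts the POSIX basic-regex interval \{n\}.
#
# Usage: colgrep beginning_column match_pattern [file ...]
colgrep() {
    col=$1
    pat=$2
    shift 2
    # "^" followed by (col - 1) arbitrary characters anchors the
    # pattern so that it starts at column col.
    grep "^.\{$((col - 1))\}${pat}" "$@"
}
```

For the original question, something like "colgrep 99 MD addresses > md.records"
would copy each matching record whole (both file names are placeholders).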