Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!ncar!noao!arizona!dave
From: dave@cs.arizona.edu (Dave Schaumann)
Newsgroups: comp.sys.amiga.misc
Subject: Re: Making AB20's FILES.Z more usable (UNIX'ers please read)
Keywords: UNIX AB20 FTP
Message-ID: <1453@caslon.cs.arizona.edu>
Date: 25 Apr 91 21:00:10 GMT
References: <amuser.672554857@cutmcvax>
Organization: U of Arizona CS Dept, Tucson
Lines: 59

In article <amuser.672554857@cutmcvax> amuser@cutmcvax.cs.curtin.edu.au (Bill Sharp-Smith AUG) writes:
>To save time trying to find files in AB20, could some UNIX programmer
>come up with a shell script that takes the FILES file and processes it.
>I would like a script that strips out all files uploaded before 1/1/91.
>
>However, more complex ones could produce lists of all files containing
>the letters xyz in their filename, sort them in directory order, remove
>all non-Amiga files etc. etc.... Who can come up with the best one ?


Why not learn to use awk and/or sed?  Then you could write your own scripts
to massage FILES.Z any old way your heart desires.  The first time I looked
at FILES.Z, I thought "Boy, this sure is a big file, lots of stuff I don't
care about, and I'm not too keen on the order the file names are listed".
So I whipped up a quick script to strip off unwanted data, sort by file name,
and compress some names.

Here is my script (in the file named "fix"):
awk -f f.awk $1 | sed -f f.sed | sort +2

This runs the awk code in "f.awk" on the file named by the first command
line parameter and pipes the result to sed, which applies the commands in
"f.sed".  That output is piped to sort, which sorts on the 2nd column
(counting from 0), which at this point holds the file names.
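
On newer systems the "+2" form may no longer be accepted by sort; POSIX
replaced it with -k, which counts fields from 1 instead of 0.  A small sketch
of the same sort in that notation, run on three lines in the shape f.awk
prints.  The names, sizes, and first column are invented for illustration:

```shell
# Hypothetical sample in the "col  size  name" shape produced by f.awk;
# these entries are made up, not taken from the real FILES listing.
sample='Apr    12345  zoo.lzh
Mar      678  arc.lzh
Jan     9012  more.lzh'

# "sort -k 3" keys on the third field (counting from 1) through end of line,
# the same key the older "sort +2" selects (counting from 0).
sorted=$(printf '%s\n' "$sample" | sort -k 3)
printf '%s\n' "$sorted"
```

If that works on your system, the last stage of "fix" could be written as
"sort -k 3" with no other change.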

f.awk contains:
	{ printf "%s  %7d  %s\n", $2, $4, $5; }

This simply strips out columns 1, 3, and 6 onward, which (if I remember right)
contain file permissions and file owner.  Who cares about that stuff?
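
The same one-liner can be narrowed with an awk pattern, for instance to keep
only entries whose name contains some string, as the original request asked.
A rough sketch, assuming (as f.awk does) that the file name is in column 5.
The sample lines are invented, and "xyz" is just the placeholder string from
the request:

```shell
# Invented lines with the name in column 5, matching the layout f.awk assumes;
# the real FILES format may differ.
sample='-rw-r--r-- Apr owner 12345 amiga/xyzterm.lzh
-rw-r--r-- Mar owner 678 amiga/other.lzh'

# The pattern before the braces selects lines; the action is f.awk's printf.
matches=$(printf '%s\n' "$sample" |
    awk '$5 ~ /xyz/ { printf "%s  %7d  %s\n", $2, $4, $5; }')
printf '%s\n' "$matches"
```

Only the first sample line should come through, since only its name
contains "xyz".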

and f.sed contains
s/comp\.sources\.amiga/c.src.a/
s/comp\.sources\.misc/c.src.m/
s/comp\.sources\.unix/c.src.u/
s/comp\.binaries\.amiga/c.bin.a/

This just shortens a few oft-repeated strings.

So all I have to do is uncompress FILES.Z, and then type "fix FILES >xff"
(xff stands for Xanth Fixed Files).  I then generally use lharc to archive
it, and uncompress it whenever I want to search for something.  This works
well, since I've gotten a compression ratio as good as 5:1 on this file:

 PACKED    SIZE  RATIO  CRC       STAMP       NAME
------- ------- ------ ---- ----------------- -------------
  58166  285131  20.3% 7831 Apr 21 18:38 1991 xff
------- ------- ------ ---- ----------------- -------------
  58166  285131  20.3%      Apr 21 18:41 1991    1 file  


Disclaimer:  This code works for me.  It's possible that the sed stuff could
	be combined into the awk commands, but I haven't had the time to learn
	much more awk than what you see above.
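
For what it's worth, here is one way the sed substitutions might be folded
into the awk program, using awk's sub() function.  Treat it as a sketch: it
assumes an awk that has sub() (the "new" awk, nawk, or gawk), the file name
"f2.awk" is made up, and the input line is invented rather than taken from
FILES:

```shell
# Write a combined program to a hypothetical file, f2.awk.
cat > f2.awk <<'EOF'
{
    name = $5
    # Dots are escaped so they match only a literal "."
    sub(/comp\.sources\.amiga/,  "c.src.a", name)
    sub(/comp\.sources\.misc/,   "c.src.m", name)
    sub(/comp\.sources\.unix/,   "c.src.u", name)
    sub(/comp\.binaries\.amiga/, "c.bin.a", name)
    printf "%s  %7d  %s\n", $2, $4, name
}
EOF

# Invented input line with the name in column 5.
out=$(echo '-rw-r--r-- Apr owner 4321 comp.sources.amiga/v91/foo.Z' |
    awk -f f2.awk)
echo "$out"
```

One difference from the sed version: this only rewrites the name column,
where sed looks at the whole line.  With a program like this the sed stage
could be dropped from the pipeline.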


-- 
Dave Schaumann      | We're so sorry, Uncle Albert.  But the kettle's
dave@cs.arizona.edu | on the boil, and we're so *easly* called away...