Path: utzoo!attcan!uunet!lll-winken!ames!mailrus!tut.cis.ohio-state.edu!rutgers!njin!princeton!notecnirp!nr
From: nr@notecnirp.Princeton.EDU (Norman Ramsey)
Newsgroups: comp.binaries.ibm.pc.d
Subject: Is uncompression faster than disk I/O?
Keywords: compress zoo arc I/O speed
Message-ID: <14227@princeton.Princeton.EDU>
Date: 16 Jan 89 21:03:44 GMT
Sender: news@princeton.Princeton.EDU
Reply-To: nr@princeton.Princeton.EDU (Norman Ramsey)
Organization: Dept. of Computer Science, Princeton University
Lines: 35


Someone suggested to me that it might pay off to store my data files
in compressed format and uncompress them when I am ready to use
them.  The claim was that reading the smaller compressed file and
uncompressing it takes less time than reading the uncompressed file
from disk.  So here's the $64 question: has anybody substantiated this
claim for the IBM PC, XT, AT, or PS/2?  (Remember that computation and
I/O speeds differ on these machines, so your mileage may vary.)
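One way to make the claim testable is a simple cost model.  As
assumptions: a file of S bytes compresses to a fraction r of its size,
the disk delivers D bytes/s, the decompressor produces U bytes of
output per second, reading and uncompressing do not overlap, and seek
time is ignored.  Then compressed storage wins when
r*S/D + S/U < S/D, which simplifies to U > D / (1 - r).  The function
names below are hypothetical; this is a sketch of the arithmetic, not a
measurement.

```c
/* Hypothetical cost model: reading and decompressing are assumed to run
 * one after the other (no overlap), and seek time is ignored. */

/* Seconds to read `size` bytes stored uncompressed at `disk_rate` bytes/s. */
double raw_read_time(double size, double disk_rate)
{
    return size / disk_rate;
}

/* Seconds to read the compressed image (ratio * size bytes) and then
 * decompress it at `unpack_rate` bytes of output per second. */
double compressed_read_time(double size, double ratio,
                            double disk_rate, double unpack_rate)
{
    return ratio * size / disk_rate + size / unpack_rate;
}

/* Slowest decompressor that still breaks even:
 * ratio/D + 1/U < 1/D  <=>  U > D / (1 - ratio), for 0 <= ratio < 1. */
double breakeven_unpack_rate(double ratio, double disk_rate)
{
    return disk_rate / (1.0 - ratio);
}
```

For example, with a hypothetical ratio r = 0.5 the decompressor must
turn out more than twice the disk's transfer rate; at an assumed
D = 100,000 bytes/s that means more than 200,000 bytes/s.  Real figures
for each machine would still have to be measured.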

In particular, how hard would it be to adapt the zoo source to provide
the following calls?
	f = zopen(file, pathname, "r")
		open member pathname in the zoo archive file, read-only
	zread(f, ...)
		like fread, but reads uncompressed data from the archive member
	zgets(f, ...)
		like fgets, but on compressed data
	zgetc(f, ...)
		like fgetc, but on compressed data
	zeof(f, ...)
		like feof; you get the idea...
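A rough sketch of how such calls could sit on top of a decompressor:
keep the decoder's state in a handle and hand out uncompressed bytes on
demand.  Everything here is hypothetical.  The compressed source is an
in-memory buffer instead of an archive member (so zopen_mem replaces
zopen's file/pathname lookup), and the codec is a trivial run-length
encoding of (count, byte) pairs standing in for zoo's real LZW
decoder.  Note that this zeof reports "no more data" up front, unlike
stdio's feof, which is set only after a read fails.

```c
#include <stddef.h>
#include <stdio.h>   /* EOF */

/* Hypothetical decompressing reader.  Stand-in codec: the compressed data
 * is a sequence of (count, byte) pairs, i.e. simple run-length encoding.
 * A real zoo reader would run zoo's own decoder here instead. */
typedef struct {
    const unsigned char *src;   /* compressed bytes */
    size_t len, pos;            /* total length and read position in src */
    unsigned run;               /* bytes left in the current run */
    int value;                  /* byte value of the current run */
} ZFILE;

/* Hypothetical: open an in-memory compressed image for reading. */
void zopen_mem(ZFILE *f, const unsigned char *src, size_t len)
{
    f->src = src; f->len = len; f->pos = 0; f->run = 0; f->value = 0;
}

/* Decode pairs until a non-empty run is available; 0 means end of data. */
static int zfill(ZFILE *f)
{
    while (f->run == 0) {
        if (f->len - f->pos < 2)
            return 0;
        f->run = f->src[f->pos];
        f->value = f->src[f->pos + 1];
        f->pos += 2;
    }
    return 1;
}

/* Like fgetc: next uncompressed byte, or EOF when the input is exhausted. */
int zgetc(ZFILE *f)
{
    if (!zfill(f))
        return EOF;
    f->run--;
    return f->value;
}

/* Nonzero when no uncompressed data remains. */
int zeof(ZFILE *f)
{
    return !zfill(f);
}

/* Like fread with byte-sized items: returns the number of bytes stored. */
size_t zread(void *buf, size_t n, ZFILE *f)
{
    unsigned char *out = buf;
    size_t i = 0;
    int c;
    while (i < n && (c = zgetc(f)) != EOF)
        out[i++] = (unsigned char)c;
    return i;
}

/* Like fgets: read at most size-1 bytes, stopping after a newline. */
char *zgets(char *s, int size, ZFILE *f)
{
    int i = 0, c;
    if (size <= 0)
        return NULL;
    while (i < size - 1 && (c = zgetc(f)) != EOF) {
        s[i++] = (char)c;
        if (c == '\n')
            break;
    }
    s[i] = '\0';
    return (i == 0 && size > 1) ? NULL : s;
}
```

For instance, the bytes {3,'a', 1,'b', 1,'\n', 2,'c'} decode to
"aaab\ncc", so a first zgets returns "aaab\n" and two zgetc calls then
return 'c' before zeof reports the end.  Because callers only see
zgetc/zgets/zread, a different decoder could replace the run-length
stand-in without changing them.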

If I had this facility, I might actually be able to speed up my
programs and make them use less disk space at the same time (provided
the claim is true).  So, has anybody built a library like this?

My immediate need is a fast Boyer-Moore grep for files in a zoo
archive.  Does anybody have one?
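The search half of such a grep is small.  Below is a sketch of
Boyer-Moore-Horspool, a simplified Boyer-Moore that keeps only the
bad-character shift table, run over a buffer of already uncompressed
text (for example, one filled by something like the zread call proposed
above).  The function name bmh_search is hypothetical, and matches that
straddle two buffers are not handled.

```c
#include <limits.h>
#include <stddef.h>

/* Boyer-Moore-Horspool search.  Returns the offset of the first match of
 * pat (length m) in text (length n), or -1 if there is none. */
long bmh_search(const unsigned char *text, size_t n,
                const unsigned char *pat, size_t m)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i, k;

    if (m == 0)
        return 0;
    if (m > n)
        return -1;

    /* Bad-character table: how far to slide the window, keyed on the text
     * byte aligned with the last pattern position. */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = m;
    for (i = 0; i + 1 < m; i++)
        shift[pat[i]] = m - 1 - i;

    for (k = 0; k + m <= n; k += shift[text[k + m - 1]]) {
        i = m;
        while (i > 0 && text[k + i - 1] == pat[i - 1])  /* right to left */
            i--;
        if (i == 0)
            return (long)k;
    }
    return -1;
}
```

Searching "abracadabra" for "cad" returns 4: the first window ends on
'r', which does not occur in the pattern, so the search skips three
positions at once.  Those long skips on mismatches are why Boyer-Moore
style searches examine fewer characters than a naive scan.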

Does anybody who has messed with the zoo source (or Rahul, of course)
have any idea how hard it would be to twiddle zoo to do something like
this?  Obviously the space overhead would be low, since sez is just a
few Kbytes extra...

Norman Ramsey
nr@princeton.edu