Path: utzoo!attcan!uunet!bionet!snorkelwacker!usc!sdd.hp.com!samsung!zaphod.mps.ohio-state.edu!rpi!sci.ccny.cuny.edu!phri!news
From: roy@alanine.phri.nyu.edu (Roy Smith)
Newsgroups: bionet.molbio.genbank
Subject: Why do the various index files have different formats?
Message-ID: <1990Sep27.170344.2052@phri.nyu.edu>
Date: 27 Sep 90 17:03:44 GMT
Sender: news@phri.nyu.edu (News System)
Organization: Public Health Research Institute, New York City
Lines: 25


	I'm working on some software to do keyword searches on GenBank using
the distributed index (.idx) files.  Everything was going fine, until
somebody asked me to use my program to figure out what locus contained a
certain accession number; it was at that point that I realized that the
gbacc.idx file uses a different format from the other .idx files.  Reading
the docs, I see the gene index uses the same format as the acc index.

	Why?  I can see no advantage that the gbacc.idx format has over the
other format, and it has the big disadvantage that it is different (i.e.
programs that search the index files have to know which file they are
searching and adjust their parsing accordingly).  It seems to me that this
is just wanton lossage.  Am I missing something?
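
	To make the complaint concrete, here is a rough sketch of the kind of
special-casing I mean.  The two line layouts are made up purely for
illustration (they are NOT the real index formats), and the function names
and the ODD_FORMAT table are hypothetical; the only point is that the search
code has to be told which file it is reading before it can parse a line.

```python
# Sketch only: both line layouts are invented for illustration and are
# NOT the actual GenBank index formats.

def parse_standard(line):
    """Hypothetical 'standard' layout: KEY<TAB>LOCUS1 LOCUS2 ..."""
    key, _, rest = line.rstrip("\n").partition("\t")
    return key, rest.split()

def parse_odd(line):
    """Hypothetical 'other' layout: key in the first 12 columns, one locus after."""
    return line[:12].strip(), [line[12:].strip()]

# Files that need the second parser.  The gene index would be listed here
# too; its file name is not given above, so it is left out.
ODD_FORMAT = {"gbacc.idx"}

def lookup(index_name, lines, wanted):
    """Return the loci listed under `wanted` in the named index."""
    # The file name decides the parser: this is the special case.
    parse = parse_odd if index_name in ODD_FORMAT else parse_standard
    hits = []
    for line in lines:
        key, loci = parse(line)
        if key == wanted:
            hits.extend(loci)
    return hits
```

	With a single format, ODD_FORMAT and the second parser would simply
go away, and lookup() would not need to know the file name at all.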

	It's certainly far too late to do anything about it now without
breaking a lot of existing software, but it sure is irritating.  Hopefully,
any additional index files that are invented in the future will stick to the
"standard" format (i.e. the one that gbkey.idx uses).  At least that way,
software developers will only have to special-case a finite (and fixed)
number of indices (currently 2).
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane?  Did you say arcane?  It wouldn't be Unix if it wasn't arcane!"