Path: utzoo!attcan!uunet!bionet!snorkelwacker!usc!sdd.hp.com!samsung!zaphod.mps.ohio-state.edu!rpi!sci.ccny.cuny.edu!phri!news
From: roy@alanine.phri.nyu.edu (Roy Smith)
Newsgroups: bionet.molbio.genbank
Subject: Why do the various index files have different formats?
Message-ID: <1990Sep27.170344.2052@phri.nyu.edu>
Date: 27 Sep 90 17:03:44 GMT
Sender: news@phri.nyu.edu (News System)
Organization: Public Health Research Institute, New York City
Lines: 25

I'm working on some software to do keyword searches on GenBank using the
distributed index (.idx) files. Everything was going fine until somebody
asked me to use my program to figure out which locus contained a certain
accession number; it was at that point that I realized that the gbacc.idx
file uses a different format from the other .idx files.

Reading the docs, I see the gene index uses the same format as the
accession index. Why? I can see no advantage that the gbacc.idx format has
over the other format, and it has the big disadvantage that it is different
(i.e. programs that search the index files have to know which file they are
searching and adjust their parsing accordingly). It seems to me that this
is just wanton lossage. Am I missing something?

It's certainly far too late to do anything about it now without breaking a
lot of existing software, but it sure is irritating. Hopefully, any
additional index files that are invented in the future will stick to the
"standard" format (i.e. the one that gbkey.idx uses). At least that way,
software developers will only have to special-case a finite (and fixed)
number of indices (currently 2).
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane? Did you say arcane? It wouldn't be Unix if it wasn't arcane!"