Path: utzoo!attcan!uunet!bywater!scifi!acheron!phri!roy
From: roy@phri.nyu.edu (Roy Smith)
Newsgroups: bionet.molbio.genbank,bionet.molbio.pir
Subject: Re: GenBank gets big and PIR format has problems!
Message-ID: <1990Apr10.032155.22233@phri.nyu.edu>
Date: 10 Apr 90 03:21:55 GMT
References: <6588@wehi.dn.mu.oz>
Sender: news@phri.nyu.edu (News System)
Organization: Public Health Research Institute, New York City
Lines: 28

TONY@wehi.dn.mu.oz (Tony Kyne, Walter and Eliza Hall Institute) writes:
> is a new generally agreed format about to emerge that will facilitate
> more dynamic updating now that we have weekly ftp updates and USENET daily
> updates. The current PIR format more or less requires a complete database
> reload (or part thereof) every day or week as the case maybe.

The need to rebuild the database each time it is updated is a problem
which has not escaped our attention. I can guess how Ross Smith and Dave
Kristofferson (my partners in crime on the daily updates experiment) would
answer your question, but I'll let them speak for themselves.

As for my part, what we have done is to keep an essentially complete,
separate database just for the daily updates. That keeps the index file
rebuilds to a manageable size. We currently have a mishmosh of all the
updates in one file, but we envision moving to something like a 3-tier
system. For each division of the database (viral, bacterial, etc.), we see
having 3 files. The first two are the full and new-sequences files as they
come off the GB tape. The third holds those daily updates that belong to
that division. Only that last (presumably fairly small) file needs to be
rebuilt often.

If I understand it right, file 2 is a subset of file 1. So, if somebody
wants to search the entire database, they need only search files 1 and 3.
If they want to search just the new stuff since the last major release
(i.e. run in keeping-up-with-the-Joneses mode), they need to search files
2 and 3.
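
To make the file selection concrete, here is a rough sketch of how the 3-tier scheme might pick files for each kind of search. It is only an illustration: the Division class, the ".full", ".new" and ".daily" file names, and the "all"/"new" mode names are all made up for this example and are not part of any existing software or GenBank convention.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Division:
    """One division of the database (viral, bacterial, ...)."""
    name: str

    # The file-name suffixes below are hypothetical; the post does not
    # specify any naming scheme.
    @property
    def full(self) -> str:
        # File 1: the full division file as it comes off the release tape.
        return f"{self.name}.full"

    @property
    def new(self) -> str:
        # File 2: new sequences since the last release (a subset of file 1).
        return f"{self.name}.new"

    @property
    def daily(self) -> str:
        # File 3: daily updates that belong to this division.
        return f"{self.name}.daily"


def files_to_search(division: Division, mode: str) -> List[str]:
    """Return the files a search has to cover for the given mode."""
    if mode == "all":
        # File 2 is a subset of file 1, so files 1 and 3 cover everything.
        return [division.full, division.daily]
    if mode == "new":
        # Only material since the last major release: files 2 and 3.
        return [division.new, division.daily]
    raise ValueError(f"unknown search mode: {mode!r}")


def files_to_reindex_daily(division: Division) -> List[str]:
    """Only the small daily-updates file needs its index rebuilt often."""
    return [division.daily]
```

For example, files_to_search(Division("viral"), "new") would return the new-sequences file and the daily-updates file for the viral division, while the daily index rebuild would touch only the daily-updates file.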
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Don't Worry, Be Happy"