Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!samsung!usc!apple!bionet!GENBANK.BIO.NET!kristoff
From: kristoff@GENBANK.BIO.NET (Dave Kristofferson)
Newsgroups: bionet.molbio.genbank
Subject: Re: Suggestion for keywords in genbank
Message-ID:
Date: 1 Feb 91 05:21:23 GMT
Sender: kristoff@genbank.bio.net
Lines: 29

Roy,

The utility of the GenBank keywords has been problematic for many
years. The standardization of vocabulary for such a complex subject
as ours is not a trivial task, but I acknowledge from my own
experience that examples of suboptimal keyword choices are not hard
to find in the database.

Please note that the index file provided with the database merely
compiles what is on the KEYWORDS line in the flat files and does not
attempt any additional classifications. The more astute could always
try utilities such as grep on this file. On GOS we have surmounted
this problem through the use of IRX, which basically indexes every
word in the database and makes keyword searches trivial.

The National Library of Medicine has developed (with considerable
effort) a standard terminology called MeSH (Medical Subject
Headings). However, at this stage it would require much, much more
effort and money to rework all of the GenBank keyword entries than to
simply adopt the IRX approach and invert the database for keyword
searches.

Our colleagues at LANL are now working on the RDBMS version of the
database, and perhaps they can elaborate on how keywords are treated
in the relational format.

				Sincerely,

				Dave Kristofferson
				GenBank Manager

				kristoff@genbank.bio.net
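
[Illustrative note, not part of the original post.] The idea of
"inverting the database" mentioned above can be sketched in a few
lines of Python. This is only a minimal sketch under stated
assumptions: the post does not describe how IRX works internally, so
the functions build_inverted_index and search, the tokenization rule,
and the two sample records (HYPO0001, HYPO0002) are all hypothetical.
The sketch assumes flat-file records that begin with a LOCUS line and
end with a line starting with "//".

```python
import re
from collections import defaultdict

# Hypothetical sample records in a simplified flat-file layout.
# The locus names and contents are invented for illustration only.
SAMPLE = """\
LOCUS       HYPO0001     120 bp    DNA
DEFINITION  Hypothetical example record one.
KEYWORDS    globin; beta-globin gene.
//
LOCUS       HYPO0002     300 bp    DNA
DEFINITION  Hypothetical example record two.
KEYWORDS    globin; pseudogene.
//
"""


def build_inverted_index(flat_file_text):
    """Map each lowercased word to the set of LOCUS names containing it."""
    index = defaultdict(set)
    locus = None
    for line in flat_file_text.splitlines():
        if line.startswith("LOCUS"):
            # The second whitespace-separated field is the locus name.
            parts = line.split()
            locus = parts[1] if len(parts) > 1 else None
        elif line.startswith("//"):
            # End of record: stop attributing words to this locus.
            locus = None
            continue
        if locus is None:
            continue
        # Index every alphanumeric word on the line, not just KEYWORDS.
        for word in re.findall(r"[A-Za-z0-9]+", line):
            index[word.lower()].add(locus)
    return index


def search(index, *words):
    """Return the LOCUS names of records that contain every query word."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()


index = build_inverted_index(SAMPLE)
both = search(index, "globin")
only_second = search(index, "globin", "pseudogene")
```

Because every word of every record is indexed, a search no longer
depends on what happens to appear on the KEYWORDS line; a term found
only in the DEFINITION line is found just as easily.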