Path: utzoo!utgpu!watserv1!watmath!att!pacbell.com!ucsd!sdd.hp.com!zaphod.mps.ohio-state.edu!rpi!sci.ccny.cuny.edu!phri!news
From: roy@alanine.phri.nyu.edu (Roy Smith)
Newsgroups: bionet.molbio.genbank
Subject: Suggestion for keywords in genbank
Message-ID: <1991Jan31.222051.16861@phri.nyu.edu>
Date: 31 Jan 91 22:20:51 GMT
Sender: news@phri.nyu.edu (News System)
Organization: Public Health Research Institute, New York City
Lines: 32

Twice in the last couple of days, the same thing has happened to me. J. Random biologist walks into my office and wants to find an entry in genbank. I try hard to extract some useful keywords. In this case, I got from said JRB the name of a gene, FemA. Unfortunately, FEMA isn't a keyword that the entry is indexed under, but "FEMA PROTEIN" is, which we only discovered by trial and error.

Obviously, a query of FEMA should match the "FEMA PROTEIN" keyword supplied by the submitter, but what's the best way to make that work? One strategy is to have the searching program (be it IRX or anything else) be smart enough to do partial matches. Another would be to have the database maintainers/indexers be smart enough to realize that while PROTEIN by itself would not make a very good keyword, FEMA by itself would, and turn the submitter's "FEMA PROTEIN" into "FEMA PROTEIN, FEMA".

As a programmer who writes data base searching software, I'd prefer the latter solution, since it makes my life easier at the expense of somebody else's effort. I imagine the database maintainers feel just the other way.

I'd be interested to hear comments from other people about what is the best way to generate good keyword/keyphrase indices for genbank.
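The two strategies described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how GenBank, IG, or IRX actually index entries: the GENERIC_WORDS list, the entry name ENTRY_A, and all function names are hypothetical. `expand_keywords` does the indexer-side rewrite (turning "FEMA PROTEIN" into "FEMA PROTEIN, FEMA"), while `partial_match` does the search-side whole-word matching instead.

```python
from collections import defaultdict

# Hypothetical list of words too generic to be useful keywords on their own.
# This is an illustrative assumption, not an actual GenBank convention.
GENERIC_WORDS = {"PROTEIN", "GENE", "MRNA", "DNA"}


def expand_keywords(phrases):
    """Indexer side: keep each keyphrase and add its informative component words."""
    expanded = set()
    for phrase in phrases:
        phrase = phrase.strip().upper()
        expanded.add(phrase)                  # keep the submitter's full keyphrase
        for word in phrase.split():
            if word not in GENERIC_WORDS:     # skip words like PROTEIN on their own
                expanded.add(word)
    return expanded


def build_index(entries):
    """Map each expanded keyword to the set of entry names indexed under it.

    `entries` maps a (hypothetical) entry name to its submitter-supplied keyphrases.
    """
    index = defaultdict(set)
    for name, phrases in entries.items():
        for keyword in expand_keywords(phrases):
            index[keyword].add(name)
    return index


def lookup(index, query):
    """Exact, case-insensitive lookup against the expanded index."""
    return sorted(index.get(query.strip().upper(), set()))


def partial_match(keyphrases, query):
    """Search side: match any keyphrase that contains the query as a whole word."""
    q = query.strip().upper()
    return sorted(p for p in keyphrases if q in p.upper().split())


index = build_index({"ENTRY_A": ["FEMA PROTEIN"]})
print(lookup(index, "FemA"))
print(partial_match(["FEMA PROTEIN", "FEMB PROTEIN"], "fema"))
```

Expanding at index time keeps the search program a plain exact-match lookup, but someone has to maintain the list of words too generic to index alone, which is exactly the trade-off between searcher and maintainer effort discussed in the post.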
I suppose I could always just take the keyword index IG provides and re-work it to split keyphrases into their component words, but after my flame last week about personalized reformatting of files on the distribution tapes, I'd probably just end up getting into a shouting match with myself :-)
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane? Did you say arcane? It wouldn't be Unix if it wasn't arcane!"