Path: utzoo!utgpu!water!watmath!uunet!ig!daemon
From: MJB1@VMS-SUPP.CAM.AC.UK
Newsgroups: bionet.molbio.seqnet
Subject: SEQNET Bulletin
Message-ID: <5016@ig.ig.com>
Date: 5 Feb 88 18:28:22 GMT
Sender: daemon@presto.ig.com
Lines: 181

Bulletin_# 72   ATTIMONELLI%VAXBA0.INFNET   5 Feb 88   BBOARDS on Nucleic Acid colle

To: SEQNET
Subject: BBOARDS on Nucleic Acid collections
Date: Thu, 4 Feb 88 19:30 N
Message-id: <4495>

To the managers of the GenBank and EMBL collections, and to all Databank users:

This is a note on release 54 of GenBank, which we have recently examined. In this release GenBank has changed the FEATURE table format and has announced that it is moving toward a common format together with EMBL and the DNA Data Bank of Japan. We are glad to see at least an intention to standardize the format of the Nucleic Acid Banks, but we want to point out a few important issues which we hope both the scientific community and the Bank managers will take into account.

As reported in our paper [ACNUC - a portable retrieval system for... (CABIOS, vol. 1(3), 1985, pp. 167-172)], we adopted the GenBank collection to generate the ACNUC database. We preferred the GenBank format to the EMBL one mainly because of the organization of its FEATURE and SITES tables.
In fact, we found the SITE keys very useful: they marked the start and stop of a region (e.g. -> and <-) and the boundary between two regions (e.g. /). Moreover, the distinction GenBank drew between the FEATURES table and the SITES table allowed us to easily select the regions that ACNUC extracts as SUBSEQUENCES. In other words, this organization made it possible to extract specific fragments of a GenBank locus directly through ACNUC. This is one of the most useful facilities of ACNUC, and it makes the software more flexible and powerful. The great advantages of the old GenBank organization have also been stressed by several researchers (see for example [Nussinov, R. et al., Biochimica et Biophysica Acta 866 (1986), 109-119]).

It is therefore a pity that precisely these useful keys have been abolished. Moreover, in our opinion the present transitional structure of GenBank is ill-defined and of little use. Of course we do not know GenBank's future developments and goals, but we would like to stress that with this new format the scientific community has lost a very important tool.

We also wish to point out several inconsistencies between what is announced in the release notes and the actual content of the entry files.
In particular, in the Primate entry file we noted that:

a) several EMBL sequences have been converted into GenBank format in a rudimentary way (the EMBL feature tables have simply been relegated to the COMMENT field);
b) feature keys such as pept.psi, matp.psi, mRNA.psi, sigp.psi and mRNA+IVS are not listed among the feature key names (section 3.5.7.1 of the release notes);
c) the announced replacement of the key "variation" with the key "variant" has not been applied, and this has produced an incorrect tabulation of the 'from' and 'to/span' fields.

The following examples illustrate the situation.

1) Partial feature table of GenBank entry HUMHBB

    pept.psi   45741  45831  pseudo-hbp, exon 1 [62]
               45953  46175  pseudo-hbp, exon 2 [62]
               47030  47157  pseudo-hbp, exon 3 [62]
    mRNA.psi   45688  47425  pseudo-hbp mRNA [62]
    mRNA+IVS   19289  21098  hbe mRNA (alt.) [19],[40],[52]
    mRNA+IVS   19504  21098  hbe mRNA (alt.) [19],[40],[52]
    mRNA+IVS   19506  21098  hbe mRNA (alt.) [19],[40],[52]
    rpt        66817  66827  Alu flank repeat 5' copy [49],[63]
    rpt        66828  67094  Alu family repeat [49],[63]
    variation  17864  17866  cag in clone lambda-epsilon; g in ph 1.8 [24]
    revision   18641  18646  aatata in [34]; gatgtg in [19]
    refnumbr   19120  19120  numbered 1 in [19]
    refnumbr   19560  19560  numbered 1 in [67]; zero used
    variation  32761  32762  ag in [26]; ga in [25]
    variation  33204  33204  a in [26]; g in [25]
    variation  46596  46597  aa in [62]; a in [63]
    variation  46851  46853  aca in [62]; a in [63]
    variation  47186  47208  ggtccactatgtttgtacctatg in [62]; g in [63]
    variation  47341  47341  t in [62]; tt in [63]
    refnumbr   50768  50768  numbered 1 in [45],[54]

2) EMBL entry PTAGGLOG converted into GenBank entry CHPAGGLOG

    LOCUS       CHPAGGLOG    1815 bp ds-DNA            pre-entry 12/31/87
    DEFINITION  Chimpanzee fetal A-gamma-globin gene.
    ACCESSION   X03110
    KEYWORDS    A-gamma-globin; direct repeat; gamma-globin; tandem repeat.
    SOURCE      chimpanzee (Pan troglodytes).
      ORGANISM  Pan troglodytes
                Eukaryota; Metazoa; Chordata; Vertebrata; Tetrapoda; Mammalia;
                Eutheria; Primates; Anthropoidea; Hominoidea; Ponginae; Ponginae.
    REFERENCE   1  (bases 1 to 1815; enum. 1 to 1815)
      AUTHORS   Slightom,J.L., Chang,L.-Y.E., Koop,B.F. and Goodman,M.
      TITLE     Chimpanzee fetal G-gamma and A-gamma globin gene nucleotide
                sequences provide further evidence of gene conversions in
                hominine evolution
      JOURNAL   Mol Biol Evol 2, 370-389 (1985)
    COMMENT     Data kindly reviewed (07-JUL-1986) by Slightom J.L.
                EMBL features not translated to GenBank features:
                key      from    to  description
                PRM        24    28  put. TATA-box
                TRANSCR    55  1647  put. primary transcript
                CAP        55    55  put. cap site
                MSG        55   199  put. exon 1
                IVS       200   321  intron I
                IVS       545  1431  intron iI
                RPT      1123  1162  TG(14) repeat (hot spot sequence
                MSG      1431  1647  put. exon 3
                SITE     1621  1626  put. polyadenylation signal
                POLYA    1647  1647  put. polyadenylation site
    FEATURES    from  to/span  description
         pept    108    199    A-gamma-globin (aa 1-31) (199 is 2nd base in codon)
                 322    544    A-gamma-globin (aa 32-105) (322 is 3rd base in codon)
                1432   1560    A-gamma-globin (aa 106-147)
    BASE COUNT    471 a    357 c    474 g    513 t
    ORIGIN

We agree that this is an intermediate format, but we believe it would have been more correct to keep distributing the collection in the old format until the conversion was complete. We cannot use release 54 to update our ACNUC database. Fortunately, our package MERGE (in press in the NAR special issue, Jan 1988) includes the program TRANSFORM, which converts the EMBL format into the "old" GenBank format, so for the moment we prefer to use only the EMBL collection. We hope that GenBank can quickly carry out a revision of the data, checking every structural part of the collection.

We would like to stress another important point. Many Italian research units have adopted our database and software (ACNUC and GLORIA), which are distributed through the Italian network.
This shows both the responsibility borne by the Bank management and how important it is for users (researchers and software developers) to rely on a structure that can easily be manipulated by automatic procedures. In this context we welcome changes, but only if they provide an improvement.

Marcella Attimonelli
BioComputing Unit Manager
Bari (Italy)
>>>>>>>>>>>>>>>>>>>>>>>>>>> attimonelli@vaxba0.infnet
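The subsequence extraction the letter describes (pulling a specific fragment of a locus out by its feature-table coordinates), and the codon bookkeeping in the CHPAGGLOG pept feature quoted above, can be illustrated with a small sketch. It assumes 1-based, inclusive from/to positions as in the quoted tables and that the first pept segment starts on the first base of codon 1; the function names are hypothetical and this is not ACNUC's actual code. The three segments add up to 444 bases, three more than the 441 needed for aa 1-147 (presumably the stop codon), and positions 199 and 322 land on the 2nd and 3rd bases of codon 31, matching the entry's own annotations.

```python
# Sketch of extracting a spliced fragment of a locus from feature-table
# coordinates. Assumes 1-based, inclusive (from, to) positions as in the
# feature tables quoted in the letter. Function names are hypothetical;
# this is not ACNUC's actual implementation.
from typing import List, Tuple

Segment = Tuple[int, int]  # (from, to), 1-based and inclusive


def extract_subsequence(sequence: str, segments: List[Segment]) -> str:
    """Concatenate the bases covered by each (from, to) segment, in order."""
    parts = []
    for start, end in segments:
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(f"segment {start}-{end} outside 1..{len(sequence)}")
        # 1-based inclusive (start, end) maps to the Python slice [start-1:end].
        parts.append(sequence[start - 1:end])
    return "".join(parts)


def spliced_length(segments: List[Segment]) -> int:
    """Total number of bases covered by the segments."""
    return sum(end - start + 1 for start, end in segments)


def codon_position(segments: List[Segment], pos: int) -> Tuple[int, int]:
    """Return (codon number, base within codon) for a locus position.

    Assumes the first segment starts at the first base of codon 1.
    """
    offset = 0  # bases contributed by the segments already passed
    for start, end in segments:
        if start <= pos <= end:
            index = offset + (pos - start)  # 0-based index in the joined region
            return index // 3 + 1, index % 3 + 1
        offset += end - start + 1
    raise ValueError(f"position {pos} is not inside any segment")


# The three 'pept' segments of entry CHPAGGLOG (A-gamma-globin).
CHPAGGLOG_PEPT = [(108, 199), (322, 544), (1432, 1560)]

if __name__ == "__main__":
    print(spliced_length(CHPAGGLOG_PEPT))        # 444 bases = 148 codons
    print(codon_position(CHPAGGLOG_PEPT, 199))   # (31, 2): 2nd base of codon 31
    print(codon_position(CHPAGGLOG_PEPT, 322))   # (31, 3): 3rd base of codon 31
    print(codon_position(CHPAGGLOG_PEPT, 1432))  # (106, 1): first base of codon 106
    print(extract_subsequence("ACGTACGTAC", [(1, 3), (7, 10)]))  # ACGGTAC
```

Working in the coordinate space of the joined segments is what makes annotations such as "(199 is 2nd base in codon)" mechanically checkable; the same bookkeeping applies to any multi-segment region extracted as a subsequence.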