Path: utzoo!utgpu!watmath!uunet!bionet!cgecmu51.bitnet!BAIROCH From: BAIROCH@cgecmu51.bitnet Newsgroups: bionet.molbio.swiss-prot Subject: SWISS-PROT release 10. Message-ID: <8904071509.AA26849@kth.se> Date: 7 Apr 89 15:28:00 GMT Sender: daemon@NET.BIO.NET Lines: 259 ----------------------------------------------------------------------------- SWISS-PROT bulletin board Concern : SWISS-PROT release 10. Category: Release notes. Date : March 5, 1989 ----------------------------------------------------------------------------- 1. Introduction 1.1 Evolution Release 10.0 of SWISS-PROT contains 10008 sequence entries, comprising 2'952'613 amino acids abstracted from 9920 references. This represents an increase of 15% over release 9.0. The recent growth of the data bank is summarised below: Release Date Number of entries Nb of amino acids 3.0 11/86 4160 969 641 4.0 04/87 4387 1 036 010 5.0 09/87 5205 1 327 683 6.0 01/88 6102 1 653 982 7.0 04/88 6821 1 885 771 8.0 08/88 7724 2 224 465 9.0 11/88 8702 2 498 140 10.0 03/89 10008 2 952 613 1.2 Source of data Release 10.0 has been updated using protein sequence data from release 19.0 of the PIR (Protein Identification Resource) protein data bank, as well as translation of nucleotide sequence data from release 18.0 of the EMBL nucleotide sequence Data Library. As an indication to the source of the sequence data in the SWISS-PROT data bank we list here the statistics concerning the DR (Databank Reference) pointer lines: Entries with pointer(s) to only PIR entri(es): 3009 Entries with pointer(s) to only EMBL entri(es): 3383 Entries with pointer(s) to both EMBL and PIR entri(es): 2973 Entries with no pointers lines (entered in house): 643 2. Description of the changes made to SWISS-PROT since release 9 2.1 Sequences and annotations Some 1306 new sequences have been added since the last release, the sequence data of 159 existing entries has been updated and the annotations of 1699 entries have been revised. In particular we have used reviews articles to update the annotations of the following groups or families of proteins: Aerolysin type toxins Asparaginases / Glutaminases Aspartyl proteases ATP-binding proteins 'active transport' family Bacterial and fungi ribonucleases Bowman-Birk serine protease inhibitors Cadherins Calcitonins Clathrin light chains Crystallins beta and gamma Cytosine-specific DNA methylases Colony stimulating factors Glucagon / GIP / secretin / VIP family E1-E2 ATPases Crystaline entomocidal toxin proteins Fos/jun proteins family. Galactose-1-phosphate uridyl transferase Glutathione S-transferase Herpes and Varicella Zooster viruses proteins Int-1 family Interferons alpha and beta Interferons induced proteins Kazal serine protease inhibitors Lipoxygenases LysR bacterial activator proteins family Manganese and Iron superoxide dismutases Myb family proteins Myc family proteins Nicotinic acetylcholine receptors Pancreatic hormone / Neuropeptide Y family Pancreatic ribonucleases Peptidyl-prolyl cis-trans isomerase (ppiase) (cyclophilin) Platelet factor 4 family. Shiga/Ricin ribosomal inactivating toxins Somatotropin, prolactin and related hormones Squash family of serine protease inhibitors Thymidylate synthase TNF alpha and beta Topoisomerases type II Tropomyosins 2.2 New format for the date (DT) line type The format of the DT line has been changed and is now: DT DD-MMM-YYYY (REL. XX, COMMENT) where DD is the day, MMM the month, YYYY the year, and XX the SWISS-PROT release number. The comment portion of the line indicates the action taken on That date. There are always three DT lines in each entry, each of them is associated with a specific comment: - The first DT line indicates when the entry first appeared in the data bank. The associated comment is "CREATED". - The second DT line indicates when the sequence data was last modified. The associated comment is "LAST SEQUENCE UPDATE". - The third DT line indicates when any data other then the sequence was last modified. The associated comment is "LAST ANNOTATION UPDATE". Example of a block of DT lines: DT 01-JAN-1988 (REL. 06, CREATED) DT 01-AUG-1988 (REL. 08, LAST SEQUENCE UPDATE) DT 01-MAR-1989 (REL. 10, LAST ANNOTATION UPDATE) 2.3 Extension of the taxonomic classification in the OC lines In previous releases of SWISS-PROT the OC (Organism Classification) lines only contained the first node of the taxonomic tree (PROKARYOTA, EUKARYOTA or VIRIDAE). Starting with release 10 we are implementing a full taxonomic classification. In release 10, 164 different taxonomic nodes have been defined. The list of these nodes is available in the SPECLIST.TXT document file. 2.4 New topic for the comments (CC) line type As of release 10 we have added a new 'topic' for the comments (CC) line-type: COFACTOR, which is used to describe enzyme cofactor(s). Example of its usage: CC -!- COFACTOR: REQUIRES PYRIDOXAL PHOSPHATE. 2.5 New feature key A new feature key has been introduced in this release: TRANSIT, which describes the extent of a transit peptide (mitochondrial or chloroplastic). Examples of TRANSIT key feature lines: FT TRANSIT 1 25 MITOCHONDRION. FT TRANSIT 1 42 CHLOROPLAST. 3. THE NEXT RELEASE SWISS-PROT release 11.0 will be available in June 1989. 4. WE NEED YOUR HELP ! We welcome any feedback from our users. We especially would appreciate that you notify us if you find that sequences belonging to your field of expertise are missing from the data bank. We also would like to be notified about annotations to be updated, as for example if the function of a protein has been clarified or if new post- translational information has become available. APPENDIX A: SOME STATISTICS A.1 Amino acid composition COMPOSITION IN PERCENT FOR THE COMPLETE DATA BANK Ala (A) 7.77 Gln (Q) 4.11 Leu (L) 9.08 Ser (S) 7.00 Arg (R) 5.23 Glu (E) 6.15 Lys (K) 5.81 Thr (T) 5.84 Asn (N) 4.36 Gly (G) 7.30 Met (M) 2.26 Trp (W) 1.35 Asp (D) 5.21 His (H) 2.29 Phe (F) 3.94 Tyr (Y) 3.22 Cys (C) 1.89 Ile (I) 5.30 Pro (P) 5.21 Val (V) 6.51 Asx (B) 0.01 Glx (Z) 0.01 Xaa (X) 0.03 CLASSIFICATION OF THE AMINO ACIDS BY THEIR FREQUENCY Leu, Ala, Gly, Ser, Val, Glu, Thr, Lys, Ile, Arg, Asp, Pro, Asn, Gln, Phe, Tyr, His, Met, Cys, Trp A.2 Repartition of the sequences by their organism of origin Total number of species represented in this release of the data bank: 1590 Species represented 1x: 741 2x: 291 3x: 163 4x: 99 5x: 58 6x: 45 7x: 27 8x: 28 9x: 30 10x: 13 11- 20x: 43 21-100x: 42 >100x: 10 TABLE OF THE MOST REPRESENTED SPECIES 918: HUMAN 173: DROME 71: BPT4 59: VACCV 838: ECOLI 143: RABIT 70: HSV11 57: WHEAT 497: MOUSE 126: PIG 69: TOBAC 54: BPT7 414: RAT 93: XENLA 67: VZVD 53: SOYBN 324: YEAST 84: BACSU 62: LAMBD 53: SHEEP 282: BOVIN 80: SALTY 60: MARPO 176: CHICK 74: MAIZE 59: EPBAR A.3 Repartition of the sequences by size From To Number From To Number 1- 50 582 1001-1100 70 51- 100 1291 1101-1200 46 101- 150 2043 1201-1300 35 151- 200 1066 1301-1400 24 201- 250 805 1401-1500 17 251- 300 662 1501-1600 8 301- 350 588 1601-1700 11 351- 400 549 1701-1800 9 401- 450 420 1801-1900 8 451- 500 461 1901-2000 5 501- 550 369 >2000 49 551- 600 216 601- 60 168 Currently the two largest sequences are: 651- 700 114 APB$HUMAN 4563 a.a. 701- 750 99 APOA$HUMAN 4548 a.a. 751- 800 70 801- 850 66 851- 900 85 901- 950 37 951-1000 35