Path: utzoo!utgpu!water!watmath!uunet!ig!daemon
From: BIORELAY@BIO.CAM.AC.UK
Newsgroups: bionet.molbio.seqnet
Subject: SEQNET Bulletin RELAY ONLY: reply to SEQNET@UK.AC.CAM.BIO
Message-ID: <6380@ig.ig.com>
Date: 24 May 88 17:49:28 GMT
Sender: daemon@presto.ig.com
Lines: 314

From:    SEQNET@UK.AC.CAM.PHX 24-MAY-1988 11:16
To:    SEQNET
Subj:


Date: Tue, 24 May 88 11:15:05 BST
From: SEQNET@UK.AC.CAM.PHX
To:   seqnet@UK.AC.CAM.BIO
Message-ID: <9E900D0A61660C30@UK.AC.CAM.PHX>

(Message number 2)
Accepted:  11:13:07 24 May 88
Submitted: 16:16:20 21 May 88
IPMessageId: 9E8C8AC803CD4E90
From: MA11
To: seqnet

                   Drosophila Codon Table

                      Version 4.0

                   Michael Ashburner,
                 Department of Genetics,
                 University of Cambridge,
                    Cambridge, England.

              Telephone 44-(0)223-333969
              Electronic mail:ma11@uk.ac.cam.phx

                       May 18 1988

These Tables are supplied with the understanding that they can be freely used
for research, although if quoted in any publication a suitable acknowledgement
(e.g. Michael Ashburner, personal communication) would be appreciated.

I will automatically post new versions on the SEQNET and BIONET Bulletin
Boards. These will generally be compiled whenever enough new data warrant
the work. I am very happy to include new sequences that have not yet made
the Sequence Data Banks, if these can be sent to me by electronic mail
with sufficient data for the coding sequences to be extracted. If anyone
should need the files of coding sequences that have been used to generate
these tables please send me a message.


Two series of Tables are included: one for "host" genes and one for ORFs
carried by transposable elements (TE). Each series comprises a codon table, a
base composition table and a list of the sequences used to compile them.

Most of these sequences are taken from the EMBL, GENBANK or DAYHOFF
Libraries; some, however, have been privately communicated to me. All
sequences have been checked to confirm that they translate, but many are
incomplete. Hence, for example, the number of sequences is greater than the
number of TER (termination) codons.

The latest versions of the databanks used are EMBL V15.0 and GENBANK V55.0.
//
Table 1A: Codons of "host" genes:
     TTT       477     TCT       341     TAT       521     TGT       323
     TTC      1270     TCC      1071     TAC      1187     TGC       899
     TTA       147     TCA       291     TAA        61     TGA        18
     TTG       687     TCG       831     TAG        28     TGG       560

     CTT       362     CCT       371     CAT       521     CGT       552
     CTC       605     CCC      1201     CAC       910     CGC       927
     CTA       298     CCA       631     CAA       598     CGA       344
     CTG      1986     CCG       814     CAG      1877     CGG       316

     ATT       787     ACT       437     AAT       912     AGT       404
     ATC      1417     ACC      1418     AAC      1442     AGC       909
     ATA       282     ACA       399     AAA       593     AGA       209
     ATG      1304     ACG       672     AAG      2304     AGG       259

     GTT       577     GCT       864     GAT      1414     GGT      1034
     GTC       897     GCC      2164     GAC      1333     GGC      1805
     GTA       226     GCA       522     GAA       770     GGA      1233
     GTG      1527     GCG       660     GAG      2491     GGG       205
Total = 52495
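
The stated total can be checked directly against the 64 entries. The following
is a minimal Python sketch, not part of the original Tables: it parses the
codon/count pairs exactly as laid out above and sums them. The names
`parse_codon_table` and `per_thousand` are illustrative assumptions.

```python
import re

# Table 1A, copied verbatim from the bulletin.
TABLE_1A = """
     TTT       477     TCT       341     TAT       521     TGT       323
     TTC      1270     TCC      1071     TAC      1187     TGC       899
     TTA       147     TCA       291     TAA        61     TGA        18
     TTG       687     TCG       831     TAG        28     TGG       560

     CTT       362     CCT       371     CAT       521     CGT       552
     CTC       605     CCC      1201     CAC       910     CGC       927
     CTA       298     CCA       631     CAA       598     CGA       344
     CTG      1986     CCG       814     CAG      1877     CGG       316

     ATT       787     ACT       437     AAT       912     AGT       404
     ATC      1417     ACC      1418     AAC      1442     AGC       909
     ATA       282     ACA       399     AAA       593     AGA       209
     ATG      1304     ACG       672     AAG      2304     AGG       259

     GTT       577     GCT       864     GAT      1414     GGT      1034
     GTC       897     GCC      2164     GAC      1333     GGC      1805
     GTA       226     GCA       522     GAA       770     GGA      1233
     GTG      1527     GCG       660     GAG      2491     GGG       205
"""


def parse_codon_table(text):
    """Return {codon: count} from whitespace-separated codon/count pairs."""
    pairs = re.findall(r"\b([ACGT]{3})\s+(\d+)\b", text)
    return {codon: int(n) for codon, n in pairs}


counts = parse_codon_table(TABLE_1A)
total = sum(counts.values())


def per_thousand(codon):
    """Usage of one codon per 1000 codons (an illustrative scaling)."""
    return 1000.0 * counts[codon] / total


print(len(counts), total)  # 64 52495
```

The same function should work unchanged on Table 2A, since both tables share
this layout.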
//
Table 1A in Staden format:
     ===========================================
     F TTT 477. S TCT 341. Y TAT 521. C TGT 323.
     F TTC1270. S TCC1071. Y TAC1187. C TGC 899.
     L TTA 147. S TCA 291. * TAA  61. * TGA  18.
     L TTG 687. S TCG 831. * TAG  28. W TGG 560.
     ===========================================
     L CTT 362. P CCT 371. H CAT 521. R CGT 552.
     L CTC 605. P CCC1201. H CAC 910. R CGC 927.
     L CTA 298. P CCA 631. Q CAA 598. R CGA 344.
     L CTG1986. P CCG 814. Q CAG1877. R CGG 316.
     ===========================================
     I ATT 787. T ACT 437. N AAT 912. S AGT 404.
     I ATC1417. T ACC1418. N AAC1442. S AGC 909.
     I ATA 282. T ACA 399. K AAA 593. R AGA 209.
     M ATG1304. T ACG 672. K AAG2304. R AGG 259.
     ===========================================
     V GTT 577. A GCT 864. D GAT1414. G GGT1034.
     V GTC 897. A GCC2164. D GAC1333. G GGC1805.
     V GTA 226. A GCA 522. E GAA 770. G GGA1233.
     V GTG1527. A GCG 660. E GAG2491. G GGG 205.
     ===========================================
 TOTAL CODONS=    52495.
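
For readers who want to load the Staden-format listing into their own
programs, here is a hedged Python sketch. It assumes only what is visible in
the table above: each entry is a one-letter amino acid code (or `*` for a
stop), a codon, and a count ending in a full stop, with the count sometimes
touching the codon (e.g. `TTC1270.`). The function names are illustrative, and
only the first block of the table is embedded, so the L and S families are
incomplete in this example.

```python
import re
from collections import defaultdict

# First block of Table 1A in Staden format, copied verbatim.
STADEN_BLOCK = """\
     F TTT 477. S TCT 341. Y TAT 521. C TGT 323.
     F TTC1270. S TCC1071. Y TAC1187. C TGC 899.
     L TTA 147. S TCA 291. * TAA  61. * TGA  18.
     L TTG 687. S TCG 831. * TAG  28. W TGG 560.
"""

# amino acid, one space, codon, optional padding, count, full stop
ENTRY = re.compile(r"([A-Z*]) ([ACGT]{3})\s*(\d+)\.")


def parse_staden(text):
    """Return {codon: (amino_acid, count)}; '*' marks a stop codon."""
    return {codon: (aa, int(n)) for aa, codon, n in ENTRY.findall(text)}


def synonymous_fractions(table):
    """Return {amino_acid: {codon: fraction of that amino acid's codons}}."""
    by_aa = defaultdict(dict)
    for codon, (aa, n) in table.items():
        by_aa[aa][codon] = n
    fractions = {}
    for aa, codons in by_aa.items():
        family_total = sum(codons.values())
        fractions[aa] = {
            codon: (n / family_total if family_total else 0.0)
            for codon, n in codons.items()
        }
    return fractions


table = parse_staden(STADEN_BLOCK)
fractions = synonymous_fractions(table)
print(table["TTC"])           # ('F', 1270)
print(sorted(fractions["Y"]))  # ['TAC', 'TAT']
```

Dividing each count by the total for its amino acid gives the relative use of
synonymous codons, which is often easier to compare between tables than the
raw counts.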
//
Table 1B: Base composition of "host" genes:
T = 31458    C = 44456    Y = 0    Pyrimidine = 75914
A = 37333    G = 44242    R = 0    Purine = 81575
N = 8        Nucleotides = 157497
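
The figures in Table 1B can be cross-checked with simple arithmetic. The
sketch below (Python, variable names are illustrative) recomputes the
pyrimidine, purine and nucleotide totals and compares the nucleotide count
with three times the codon total of Table 1A. The two differ by 12
nucleotides, i.e. four codons; the bulletin does not say why. One possible
explanation, which is an assumption here and not a statement from the source,
is that codons containing an ambiguous N were left out of the codon table.

```python
# Values copied from Table 1B; Y, R and N are ambiguity codes.
host = {"T": 31458, "C": 44456, "Y": 0, "A": 37333, "G": 44242, "R": 0, "N": 8}
TOTAL_CODONS = 52495  # from Table 1A

pyrimidine = host["T"] + host["C"] + host["Y"]
purine = host["A"] + host["G"] + host["R"]
nucleotides = pyrimidine + purine + host["N"]

# G+C fraction over the unambiguous bases only.
unambiguous = host["T"] + host["C"] + host["A"] + host["G"]
gc_fraction = (host["G"] + host["C"]) / unambiguous

# Nucleotides not accounted for by the codon table.
excess = nucleotides - 3 * TOTAL_CODONS

print(pyrimidine, purine, nucleotides)  # 75914 81575 157497
print(excess)                           # 12
```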
//
Table 1C: "Host" gene sequences used for Tables 1A and 1B:

The number after each name is the number of codons (excluding TER but
including the N-terminal Met); if this number is bracketed then the coding
sequence is incomplete.

                      [EMBL/GENBANK Accession numbers]
M14643;                      alpha-tubulin-1, 450
M14644;                      alpha-tubulin-2, 449
M14645;                      alpha-tubulin-3, 450
M14646;                      alpha-tubulin-4, 462
M16922;                      beta-tubulin-2, 446
M16922;                      beta-tubulin-3, 448
X05893;                      acetyl cholinesterase, 649
X06384;Y00212;               actin 5C, [137]
K00670;K00671;               actin 42A, [308]
J01064;                      actin 79B, 376
K00674;K00675;               actin 87E, [93]
J01065;                      actin 88F, 376
Z00030;                      alcohol dehydrogenase, 256
Z00030;                      3' orf to Adh, [145]
X04695;                      a-methyl-dopa resistant (amd), 510
X04569;                      amylase-1, 494
X03788-X03791;               Antp, 378
M14549;                      bicoid, [71]
X04896;                      bsg25D, 741
M14131;                      C1A9 nuclear protein, 161
K01042;                      c-ash, [275]
X05939;                      c-myb (13E), 697
K01960;                      c-ras1 (85D), 189
M10759;M10803;M10804;        c-ras2 (64B), 195
X02200;                      c-ras3 (62B), 182
M11917;                      c-src (64B), 552
M16599;                      c-src4 (28C), 590
Y00133;                      calmodulin, [128]
M16534;J03452;               casein-hydrolase-alpha-chain, 336
M16534;J03452;               casein-hydrolase-beta-chain, 215
X03062;                      caudal, [197]
M13219;                      choline acetyl transferase, [728]
X02947;                      chorion gene s15-1, 115
X02497;                      chorion gene s18-1, 272
X02947;                      chorion gene s19-1, 373
X05245;                      chorion gene s36, 286
X05245;                      chorion gene s38, 306
V00200;                      collagen-like gene fragments, [589]
J02727;                      collagen-IV, [711]
X05144;                      crumbs (EGF-like at 95F), [293]
X01761;                      cytochrome c gene DC3, 105
X01760;                      cytochrome c gene DC4, 108
X05136;                      Deformed, 590
X05140;                      Delta, [200]
X04426;                      dopa decarboxylase, 511
M14978-M14982;               dunce, 362
X04521;                      eip28/29, 255
X04024;                      eip40, 393
M11744;                      elongation factor (48D), 463
M10017;                      engrailed, 552
M15961;                      esterase-6, 548
X05138;                      even-skipped, 376
X00854;K01951;               fushi tarazu, 413
M11254;                      Gapdh-1, 332
M11255;                      Gapdh-2, 332
J02527;K02461;               glycinimide ribotide transformylase (GART), 1353
M13786;                      Gpdh [exon 3], [40]
J01085;                      heat shock cognate 70C [exon 1], [68]
K01296;K01297;               heat shock cognate 87D [exons 1 & 2], [70]
J02569;                      heat shock cognate 88E, [104]
X04073;                      Histone H1, 256
Dayhoff;                     Histone H2A, [122]
Dayhoff;                     Histone H2B, [118]
Dayhoff;                     Histone H3, [122]
Dayhoff;                     Histone H4, [72]
V00209;                      hsp22, 174
V00210;                      hsp23, 186
V00211;                      hsp26, 208
V00212;                      hsp27, 213
V00213;V00214;               hsp70 [87A], [347]
J01104;J01105;               hsp70 [87C], 641
X03810;                      hsp82, 717
Y00274;                      hunchback, 757
M13568;                      Insulin-like receptor protein-1, [1095]
M14778;                      Insulin-like receptor protein-2, [300]
X05273;                      invected, 576
X03414;                      Kruppel, 466
X04227;                      l(2)37Cc, 326
X05426;                      lethal(2)giant larva, 1160
V00202;                      larval cuticle protein-1 [44D], 130
V00203;                      larval cuticle protein-2 [44D], 126
V00203;                      larval cuticle protein-3 [44D], 112
V00204;                      larval visceral protein-D [44D], 508
V00204;                      larval visceral protein-H [44D], 522
V00204;                      larval visceral protein-L [44D], 505
X03872;                      LSP1-alpha, [70]
X03873;                      LSP1-beta, [100]
X03874;                      LSP1-gamma, [105]
X03758;                      metallothionein  (Mtn), 40
Y00831;                      mst(3)gl-9 sperm protein, 56
J02788;                      myosin-heavy chain, 269
M10125;                      myosin-light chain, 155
X04016;                      nicotinic acetylcholine receptor (AChR), 521
M11664;                      Notch, 2703
Y00043;                      opsin R7 specific, 383
K02315;                      opsin, ninaE, 373
M12896;                      opsin at 91D, 373
M15762;                      pen#9b, 365
M11969;                      period, 1127
Y00402;                      Phosphoenolpyruvate carboxykinase, 647
M14548;                      paired, 613
X05076;Y00042;               protein kinase C, 639
J02527;K02461;               pupal cuticle protein (Gart), 184
X05016;                      ribosomal protein rpA1, 113
X00848;                      ribosomal protein rp49, 133
X05709;                      RNA polymerase II-140, 1123
M11798;                      RNA polymerase II-215, [470]
Y00308;                      rosy, 1335
X04813;                      rudimentary, 2356
X01918;                      Sgs3, 307
J01135;J01136;               Sgs4, [141]
X04269;                      Sgs5, 163
X01918;                      Sgs7, 74
X01918;                      Sgs8, 75
Y00288;                      snail, 390
X04513;                      snake, 430
X03121;                      serendipity-alpha, 530
X03121;                      serendipity-beta, 351
X03121;                      serendipity-delta, 430
Y00367;                      superoxide dismutase, 213
K03277;                      tropomyosin I, T-isoform, [198]
M15466;                      tropomyosin II, 285
X02989;                      trypsin-like enzyme, alpha-chain, 256
X05723;Y00206;               Ubx, 389
X01802;                      vitelline membrane protein, [96]
X02974;                      white, 541
Chia;                        yellow, 696
V00248;                      yolk protein-1, 459
J01157;                      yolk protein-2, 459
M15898;                      yolk protein-3, 420
Y00049;                      zeste, 575
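
The entries above follow a regular enough layout to be read by program: a
source field (accession numbers separated by semicolons, or a name such as
"Dayhoff"), the gene name, and a codon count that is bracketed when the coding
sequence is incomplete. The following Python sketch is one way to parse a
single entry under those assumptions; `parse_entry` and the dictionary keys
are hypothetical names chosen for illustration.

```python
import re

# source field, whitespace, gene name, last comma, optional '[', count
ENTRY = re.compile(r"^(\S+)\s+(.*),\s*(\[?)(\d+)\]?\s*$")


def parse_entry(line):
    """Split one Table 1C line into source, name, codon count and status."""
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError("unrecognised entry: %r" % line)
    source, name, bracket, codons = match.groups()
    return {
        # Drop the empty string left by a trailing semicolon.
        "accessions": [a for a in source.split(";") if a],
        "name": name,
        "codons": int(codons),
        # A bracketed count marks an incomplete coding sequence.
        "complete": bracket == "",
    }


sample = [
    "X06384;Y00212;               actin 5C, [137]",
    "K02315;                      opsin, ninaE, 373",
    "K01296;K01297;               heat shock cognate 87D [exons 1 & 2], [70]",
]
entries = [parse_entry(line) for line in sample]
for entry in entries:
    print(entry["name"], entry["codons"], entry["complete"])
```

The gene name is matched greedily up to the last comma, so names that contain
commas or square brackets of their own (e.g. "opsin, ninaE") are kept intact.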
//
Table 2A: Codons of TE genes:
     TTT       366     TCT       129     TAT       264     TGT       108
     TTC       200     TCC       120     TAC       230     TGC       107
     TTA       351     TCA       197     TAA         1     TGA         1
     TTG       195     TCG        74     TAG         0     TGG       108

     CTT       216     CCT       112     CAT       187     CGT        64
     CTC       104     CCC       104     CAC       165     CGC        38
     CTA       199     CCA       271     CAA       396     CGA        99
     CTG       105     CCG        52     CAG       160     CGG        22

     ATT       463     ACT       205     AAT       620     AGT       180
     ATC       175     ACC       171     AAC       403     AGC       145
     ATA       447     ACA       374     AAA       888     AGA       260
     ATG       199     ACG        64     AAG       282     AGG        83

     GTT       181     GCT       160     GAT       330     GGT       130
     GTC       106     GCC       129     GAC       305     GGC       107
     GTA       188     GCA       222     GAA       566     GGA       148
     GTG       113     GCG        63     GAG       227     GGG        39
Total = 12718
//
Table 2A in Staden format:
     ===========================================
     F TTT 366. S TCT 129. Y TAT 264. C TGT 108.
     F TTC 200. S TCC 120. Y TAC 230. C TGC 107.
     L TTA 351. S TCA 197. * TAA   1. * TGA   1.
     L TTG 195. S TCG  74. * TAG   0. W TGG 108.
     ===========================================
     L CTT 216. P CCT 112. H CAT 187. R CGT  64.
     L CTC 104. P CCC 104. H CAC 165. R CGC  38.
     L CTA 199. P CCA 271. Q CAA 396. R CGA  99.
     L CTG 105. P CCG  52. Q CAG 160. R CGG  22.
     ===========================================
     I ATT 463. T ACT 205. N AAT 620. S AGT 180.
     I ATC 175. T ACC 171. N AAC 403. S AGC 145.
     I ATA 447. T ACA 374. K AAA 888. R AGA 260.
     M ATG 199. T ACG  64. K AAG 282. R AGG  83.
     ===========================================
     V GTT 181. A GCT 160. D GAT 330. G GGT 130.
     V GTC 106. A GCC 129. D GAC 305. G GGC 107.
     V GTA 188. A GCA 222. E GAA 566. G GGA 148.
     V GTG 113. A GCG  63. E GAG 227. G GGG  39.
     ===========================================
 TOTAL CODONS=    12718.
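
One simple way to compare the two series is the fraction of codons ending in
G or C. This measure is not used in the bulletin itself; it is offered here
only as an illustrative sketch. The Python below rebuilds both codon tables
from their rows (first base by block, second base by column, third base by
row within the block, all in the order T, C, A, G) and computes that fraction
for each. Stop codons are included in the totals.

```python
BASES = "TCAG"

# Counts copied row by row from Table 1A ("host" genes).
HOST_ROWS = [
    [477, 341, 521, 323], [1270, 1071, 1187, 899],
    [147, 291, 61, 18], [687, 831, 28, 560],
    [362, 371, 521, 552], [605, 1201, 910, 927],
    [298, 631, 598, 344], [1986, 814, 1877, 316],
    [787, 437, 912, 404], [1417, 1418, 1442, 909],
    [282, 399, 593, 209], [1304, 672, 2304, 259],
    [577, 864, 1414, 1034], [897, 2164, 1333, 1805],
    [226, 522, 770, 1233], [1527, 660, 2491, 205],
]

# Counts copied row by row from Table 2A (TE genes).
TE_ROWS = [
    [366, 129, 264, 108], [200, 120, 230, 107],
    [351, 197, 1, 1], [195, 74, 0, 108],
    [216, 112, 187, 64], [104, 104, 165, 38],
    [199, 271, 396, 99], [105, 52, 160, 22],
    [463, 205, 620, 180], [175, 171, 403, 145],
    [447, 374, 888, 260], [199, 64, 282, 83],
    [181, 160, 330, 130], [106, 129, 305, 107],
    [188, 222, 566, 148], [113, 63, 227, 39],
]


def codon_counts(rows):
    """Map the 16 x 4 table layout onto {codon: count}."""
    counts = {}
    for r, row in enumerate(rows):
        first, third = BASES[r // 4], BASES[r % 4]
        for c, n in enumerate(row):
            counts[first + BASES[c] + third] = n
    return counts


def third_position_gc(counts):
    """Fraction of all codons whose third base is G or C."""
    total = sum(counts.values())
    gc3 = sum(n for codon, n in counts.items() if codon[2] in "GC")
    return gc3 / total


host_counts = codon_counts(HOST_ROWS)
te_counts = codon_counts(TE_ROWS)
host_gc3 = third_position_gc(host_counts)
te_gc3 = third_position_gc(te_counts)
print(round(host_gc3, 3), round(te_gc3, 3))
```

On these data the "host" genes end in G or C far more often than the TE ORFs
do, which matches the impression given by the raw counts (compare, for
example, CTG with TTA in the two tables).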
//
Table 2B: Base composition of TE genes:
T = 9774     C = 7350    Y = 0    Pyrimidine = 17124
A = 14591    G = 6439    R = 0    Purine = 21030
N = 0        Nucleotides = 38154
//
Table 2C: TE genes used for Tables 2A and 2B:

                  [EMBL/GENBANK Accession numbers]
X01472;                      17.6 element
X03431;                      297 element
X04132;X03733;               412 element
X02599;                      copia element [Saigo]
V00246;                      FB4
X03734;                      gypsy element
X01748;                      HB1
X04705;                      hobo
Finnegan;                    I element
O'Hare;                      P element
X01747;                      transposon HB2
X02600;                      virus like particle RNA (VLP H-RNA)
//