Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!lll-lcc!ptsfa!well!msudoc!umich!itivax!m-net!michael
From: michael@m-net.UUCP
Newsgroups: sci.bio
Subject: Re: question - DNA's information
Message-ID: <1146@m-net.UUCP>
Date: Mon, 6-Apr-87 02:43:23 EST
Article-I.D.: m-net.1146
Posted: Mon Apr 6 02:43:23 1987
Date-Received: Sat, 11-Apr-87 06:21:53 EST
References: <11189@teknowledge-vaxc.ARPA> <978@aecom.UUCP> <3310@udenva.UUCP> <1534@husc6.UUCP>
Organization: M-NET, Ann Arbor, MI
Lines: 45
Summary: Yet another answer - what did you mean by that question?

In article <11189@teknowledge-vaxc.ARPA>, rburns@teknowledge-vaxc.ARPA (Randy Burns) writes:
> I was wondering roughly how many 'bytes' of information are contained
> within human chromosomes?

and got a large number of replies (some with novel definitions of bits and bytes).

Some of the replies assumed that only the information eventually translated into protein is significant, and that repeated sequences, introns, the third base of some three-letter codons, and so on, could be ignored. I must take issue with that.

Introns may yet prove to have functions beyond their own excision. (Negative feedback on protein production is an obvious candidate.) Some of the regions near protein-coding sections regulate expression. The third base of the codons for the hypervariable part of antibodies is significant (affecting the potential antibodies after DNA editing), and may have other effects in other genes (e.g. affecting the likelihood of cancer by changing the probability that a growth regulator will mutate into an "always-grow" form). Repeating sequences may function in DNA repair mechanisms.

Because of these and other possible functions of DNA, I would not exclude any information from consideration, and would answer with the number of 8-bit bytes required to specify the reconstruction of the actual DNA found in an individual.
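
As a rough illustration of what a byte-level specification of a DNA sequence could look like, here is a minimal Python sketch. It assumes a plain 2-bit code per base, four bases to a byte. The names BASE_TO_BITS, bytes_needed, pack, and unpack are hypothetical, and the bit assignment is arbitrary; this is not a standard file format.

```python
# Hypothetical 2-bit code for each base; which base gets which pattern is arbitrary.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def bytes_needed(n_bases):
    """Number of 8-bit bytes needed at 2 bits per base, rounded up."""
    return (2 * n_bases + 7) // 8


def pack(seq):
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray(bytes_needed(len(seq)))
    for i, base in enumerate(seq):
        shift = 6 - 2 * (i % 4)  # the first base of each group goes in the top two bits
        out[i // 4] |= BASE_TO_BITS[base] << shift
    return bytes(out)


def unpack(data, n_bases):
    """Recover the first n_bases bases from packed bytes."""
    return "".join(
        BITS_TO_BASE[(data[i // 4] >> (6 - 2 * (i % 4))) & 0b11]
        for i in range(n_bases)
    )
```

The sequence length has to be stored separately, since a final partly filled byte is padded with zero bits that would otherwise decode as extra A's.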

At 3*10^9 base pairs, and two bits per base pair, that's 6*10^9 bits, or 3/4 * 10^9 bytes: about 750 megabytes. (Specifying cutting points to separate the chromosomes, and other artifacts like the arbitrary choice of direction of the chromosomes and their order in the representation, gain or lose so few extra bits that they're lost in the one-significant-digit estimate of the number of base pairs.) We're starting to see disk drives about that big.

But this number really represents the maximum data that the genome could hold. Some of the redundancies could be absorbed by data-compression codings, squeezing down (but not quite eliminating) the data representing repeating regions, but not affecting third-base redundancies. Sorry, I can't put a number on that.

"I've got code in my node." | UUCP:  ...!ihnp4!itivax!node!michael
                            | AUDIO: (313) 973-8787
Michael McClary             | SNAIL: 2091 Chalmers, Ann Arbor MI 48104
(If you want to be sure I see it, MAIL to the address in my SIGNATURE!)