Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!husc6!yale!cmcl2!philabs!aecom!werner
From: werner@aecom.UUCP
Newsgroups: sci.bio
Subject: Re: question
Message-ID: <978@aecom.UUCP>
Date: Sat, 28-Mar-87 01:25:16 EST
Article-I.D.: aecom.978
Posted: Sat Mar 28 01:25:16 1987
Date-Received: Sun, 29-Mar-87 07:26:14 EST
References: <11189@teknowledge-vaxc.ARPA>
Organization: Albert Einstein Coll. of Med., NY
Lines: 28

In article <11189@teknowledge-vaxc.ARPA>, rburns@teknowledge-vaxc.ARPA (Randy Burns) writes:
> I was wondering roughly how many 'bytes' of information are contained
> within human chromosomes?

The human genome contains 3 * 10^9 base pairs, which is 1000 times as
much as that of Escherichia coli, and about 300 times the total of all
published sequences to date (*). Much of that is repeated DNA: satellite
DNA, interspersed repeats, or moderately repeated gene families (like
ribosomal RNA).

Hence, if a byte is a base pair, that's your answer: 3 * 10^9 bytes.
Only two bits are required to specify a base, though, so a 'byte' could
actually hold a tetranucleotide, which would cut the figure to
7.5 * 10^8 bytes. In practice, most sequences are stored as letters
(ATCG), one base per byte.

Similarly, if "information" is the key word here, only about 10-20% of
the genome encodes information, so that brings the total storage
requirement down from 3000 Mbp to 300-600 Mbp, maybe even less.

(*) Latest release of Genbank contains 10,913 sequences from 13,774
publications, totalling 10,961,365 base pairs.

--
Craig Werner (MD/PhD '91)
!philabs!aecom!werner
(1935-14E Eastchester Rd., Bronx NY 10461, 212-931-2517)
"Viruses do to cells what Groucho did to Freedonia."
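
For anyone who wants to check the arithmetic in the article above, here is a minimal Python sketch. It is only an illustration: the names GENOME_BP, bytes_needed, and pack_2bit are made up for this example, and the A/C/G/T to 0/1/2/3 mapping is an arbitrary assumption, not any database's actual storage format.

```python
# Rough storage arithmetic for the figures quoted in the article.
# All names here are hypothetical, chosen for illustration only.

GENOME_BP = 3 * 10**9  # base pairs in the human genome, as stated above


def bytes_needed(base_pairs, bits_per_base):
    """Bytes required to store base_pairs bases, rounding up to whole bytes."""
    return (base_pairs * bits_per_base + 7) // 8


# One letter per base (8 bits) versus the two-bit minimum.
as_letters = bytes_needed(GENOME_BP, 8)
as_two_bit = bytes_needed(GENOME_BP, 2)

# The 10-20% of the genome said to encode information, in base pairs.
coding_low_bp = GENOME_BP * 10 // 100
coding_high_bp = GENOME_BP * 20 // 100

# Assumed (arbitrary) two-bit code for each base.
TWO_BIT_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_2bit(seq):
    """Pack a string of A/C/G/T into bytes, one tetranucleotide per byte.

    The first base occupies the two highest bits. A final partial group
    is padded with zero bits, so the caller must keep the sequence length
    separately in order to decode it unambiguously.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | TWO_BIT_CODE[base]
        byte <<= 2 * (4 - len(chunk))  # pad a short final group
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    print("as letters:", as_letters, "bytes")
    print("two bits per base:", as_two_bit, "bytes")
    print("coding fraction:", coding_low_bp, "to", coding_high_bp, "bp")
    print("ACGT packs to:", pack_2bit("ACGT").hex())
```

Note that a pure two-bit code has no room for ambiguous bases or unknown positions, which is one practical reason letter storage stays attractive despite the fourfold cost.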