Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!udel!haven!ncifcrf!fcs260c2!toms
From: toms@fcs260c2.ncifcrf.gov (Tom Schneider)
Newsgroups: sci.bio
Subject: Re: Alu locations
Keywords: alu sequences
Message-ID: <1968@fcs280s.ncifcrf.gov>
Date: 7 Dec 90 00:11:33 GMT
References: <1965@fcs280s.ncifcrf.gov> <70131@bu.edu.bu.edu>
Sender: news@ncifcrf.gov
Organization: NCI Supercomputer Facility, Frederick, MD
Lines: 49

In article <70131@bu.edu.bu.edu> colby@bu-bio.UUCP (Chris Colby) writes:
>In article <1965@fcs280s.ncifcrf.gov> toms@fcs260c2.ncifcrf.gov (Tom Schneider) writes:
>>Hi everyone!
>>
>>A friend of mine, Doug Halverson, has cloned several pieces of human DNA and
>>has found that they don't all have Alu sequences in them. His question is, in
>>human genes that have been mapped or sequenced so far, have stretches of DNA
>>as long as 10 to 20 kb been found to be Alu free?

> Is this a trick question? Why don't you just do a restriction
>digestion with the enzyme Alu1 (for which the Alu site was named I'm
>led to believe). Look and see if you get any DNA larger than 20 kB.

No, it isn't a trick question. I presume that you mean AluI, not Alu1.
AluI cuts at AGCT, and so appears very frequently in the genome. I find
3542 sites in 683804 bases of human sequence, which is one every 193.06
bases. In equiprobable random sequence one would expect to find it every
256 bases, so perhaps the higher observed frequency reflects some bias in
the sequences I looked at. If we performed the experiment you suggest, we
would get a lovely smear, mostly at the bottom of the gel in small
fragments.

The restriction enzyme AluI comes from the bacterium Arthrobacter luteus,
hence the name. According to Darnell, the Alu sequences were so named
because SOME of them have AluI sites; conversely, other sequences also have
AluI sites, so the name is not a very good one.
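For anyone who wants to repeat this kind of count on their own sequences, here is a minimal Python sketch of the arithmetic above. It is not the program used for the numbers in this post; the function names and the short test sequence are made up for illustration, and the expected spacing assumes all four bases are equally likely and independent.

```python
def count_sites(sequence, site="AGCT"):
    """Count occurrences of a recognition site on one strand of a sequence.

    AGCT is the AluI recognition site. The search is case-insensitive and
    steps forward one base at a time, so overlapping matches would be counted.
    """
    sequence = sequence.upper()
    site = site.upper()
    count = 0
    start = sequence.find(site)
    while start != -1:
        count += 1
        start = sequence.find(site, start + 1)
    return count


def expected_spacing(site_length, alphabet_size=4):
    """Mean spacing of a site in equiprobable random sequence.

    Assumption: each base is independent with probability 1/alphabet_size,
    so a site of length n is expected once every alphabet_size**n bases.
    """
    return alphabet_size ** site_length


def observed_spacing(total_bases, site_count):
    """Average number of bases per site actually found."""
    return total_bases / site_count


if __name__ == "__main__":
    toy = "ttAGCTggcaAGCTaagctt"  # made-up sequence, three AGCT sites
    print(count_sites(toy))
    print(expected_spacing(4))                        # 4**4 bases per site
    print(round(observed_spacing(683804, 3542), 2))   # figures quoted above
```

Since AGCT is its own reverse complement, counting on one strand is enough for this particular enzyme; a non-palindromic site would need both strands searched.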
> Of course since an Alu sequence is 300 bp long (I just opened
>my copy of Watson and looked it up) Alu1 will cut at some sites that
>are not Alu sequences. This is because Alu1 has a 4bp recognition
>sequence. Watson sez: Alu sequences are present in more than a million
>copies and represent 3 - 6 percent of the genome. Any 5000 bp segment
>will probably contain one because they are widely distributed.

If you look back at the original posting, you will see that I made a
similar calculation. The point of the posting was that this is a GUESS,
not knowledge of the actual distribution in the genome. If Alu sequences
happen to come in bunches, then they will not appear as you predict. The
question is: what is the actual frequency in known sequences?

Silly me, I see here in Darnell (p. 432) that "If human DNA is used to make
a set of genomic clones with an average length of 20 kb, more than 90
percent of all randomly chosen clones contain an intermediate repeat
sequence." Looks like that ties it up!

>Chris Colby
>email: colby@bu-bio.bu.edu

  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland 21702-1201
  toms@ncifcrf.gov