Path: utzoo!utgpu!water!watmath!uunet!ig!daemon
From: BIORELAY@BIO.CAM.AC.UK
Newsgroups: bionet.molbio.seqnet
Subject: SEQNET Bulletin RELAY ONLY: reply to SEQNET@UK.AC.CAM.BIO
Message-ID: <5754@ig.ig.com>
Date: 5 Apr 88 11:58:05 GMT
Sender: daemon@presto.ig.com
Lines: 128

From: BIORELAY%BIO.CAM.AC.UK@CUNYVM.CUNY.EDU

From: SEQNET@UK.AC.CAM.PHX 5-APR-1988 11:32
To: SEQNET
Subj:

Date: Tue, 05 Apr 88 11:31:59 BST
From: SEQNET@UK.AC.CAM.PHX
To: seqnet@UK.AC.CAM.BIO
Message-ID: <9E527571CB3E3A50@UK.AC.CAM.PHX>

(Message number 2)

Accepted: 11:30:37 05 Apr 88
Submitted: 18:24:00 31 Mar 88
IPMessageId: -unspecified-
From: PFEIFFER@EARN.DM0MPB51
To: SEQNET@UK.AC.CAMBRIDGE.PHOENIX
Subject: restriction enzyme database
Via: UK.AC.RL.EARN; Thu, 31 Mar 88 17:27:54 BST
Received: from UKACRL by UK.AC.RL.IB (Mailer X1.25) with BSMTP id 8722; Thu, 31 Mar 88 17:27:54 BS
Received: from DM0MPB51.BITNET (PFEIFFER) by UKACRL.BITNET (Mailer X1.25) with BSMTP id 8721; Thu, 31 Mar 88 17:27:53
X-Original-To: g--seqnet, g--george, g--roberts, mewes, PFEIFFER

30-MAR-1988

Dear Rich Roberts!

I am sorry that my restriction enzyme lists were announced in Bionet as better than yours. This was never my intention. I do not systematically collect information on restriction enzymes myself; I try to combine your data and Kessler's with some generally available information (publications and enzyme catalogues) and make the result available to users of the PIR and UWGCG packages. My lists are therefore secondary information and cannot be proposed as a substitute for the primary data collection. A separate notice to this effect will be sent to Seqnet and will thus be distributed through Bionet. I hope that this makes things clear between us and for the scientific community.

As David George pointed out, many secondary distributors step in to fill a perceived need. This is exactly what happened to me. Here in Martinsried, we use the PIR and UWGCG software and found that the enzyme lists were inadequate.
Thus, I prepared a local update and made it available to the scientific community so that other scientists do not need to do the same.

I do not see a reason why you cannot prepare the restriction enzyme list in the format required by the PIR and UWGCG software yourself. Your database output is very flexible, as you say. Please look at their formats, prepare the output and send it to both groups. If you do, there is no need for me to continue redistributing the data. To my knowledge, you have not offered this in the past.

David made a strong case for a standardized format for your restriction enzyme database as well. You will remember that I made a similar proposal some time ago. Let me explain why I still feel this is a necessary topic. The reason is our bad experience with the sequence databases. PIR and UWGCG had inconsistent formats some years ago (luckily solved by now). At that time, the sequence databases were distributed together with the software. As a result, the different software packages accessed different database releases. This situation was very confusing and unsatisfactory. We therefore decided to get a copy directly from the primary database and to reformat it locally. If you were to prepare the restriction enzyme files in the different formats yourself, we would run into the same problem again. It therefore seems better to keep an updated release of your database locally in a standard format, and to have software for transformation. This software would be specific to each software package and could be distributed with it. Each institute could then generate the required lists locally and have the same enzymes available for all its program packages.

As you can see, the lack of standards causes a lot of confusion. I wonder why Bionet is still not using the IUPAC nomenclature for ambiguous nucleotide symbols. This standard is now several years old and should definitely be adopted by all sequence analysis software packages.
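
To illustrate what adopting the IUPAC ambiguity symbols means for software, here is a minimal sketch in Python (a language chosen only for illustration; it is not used by PIR, UWGCG or Bionet). It expands a recognition site written with the IUPAC symbols into a search pattern and reports where it occurs in a sequence. The function names site_to_regex and find_sites and the example sequence are hypothetical; only the symbol table follows the IUPAC recommendation.

```python
import re

# IUPAC symbols for incompletely specified nucleotides (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def site_to_regex(site):
    """Translate a recognition site in IUPAC symbols into a regex string."""
    parts = []
    for symbol in site.upper():
        bases = IUPAC[symbol]  # raises KeyError for a non-IUPAC symbol
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_sites(site, sequence):
    """Return 0-based start positions of every occurrence of the site.

    A lookahead is used so that overlapping occurrences are also found.
    Only the given strand is searched.
    """
    pattern = re.compile("(?=" + site_to_regex(site) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical example: the site GTYRAC in a made-up sequence.
print(site_to_regex("GTYRAC"))
print(find_sites("GTYRAC", "AAGTCGACTTGTTAACGG"))
```

A package that stores its enzyme list with these symbols needs only one such table, whereas a private set of ambiguity symbols forces every list to be translated before it can be shared.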

Lack of standardization, especially of data formats, is a severe problem in all existing biological databases. As David has already pointed out, the PIR, MIPS and JIPID protein sequence databases try to avoid this problem by close cooperation.

During the last months, I could not get hold of your lists from you directly, but received more or less updated versions that were floating around. EMBL, for instance, still seems to distribute a list from the beginning of 1987. Only a few days ago I received the list you recently sent to Kessler at Boehringer, which seems to be a very recent version. I have not yet managed to compare the data and to update my lists.

How far are you with the Oracle database system? We are still in the process of implementation, because my major task is to participate in the organization of the MIPS protein sequence database, and restriction enzymes are only a minor activity.

It would be nice if you and Kessler could agree on a format for data exchange, so that comparison of the data by computer becomes possible. You stated that Kessler's data are unreliable and incomplete. To what extent do you collect data on methylases and on the sensitivity of restriction enzymes to methylation? This is a major point in Kessler's lists. His system discriminates between inhibitory, tolerated, required and undocumented for each restriction enzyme. He also discriminates between different types of modification (positions 4 and 5 of cytosine are currently known, but more types of modification will probably be detected in the future).

I am completely confused by your abbreviations of commercial companies. The enzyme list received via Kessler has L = PL-Pharmacia-LKB and P = Promega, while the list received today via David George has P = Pharmacia P-L and R = Promega. My intention is to use the same abbreviations as you, but this is difficult if you are inconsistent with yourself. After talking to Kessler, I have adopted the abbreviations you sent to him, and have extended this list as necessary.
Here is my current version. Please comment if you want any changes. Which of your abbreviations will be used in the future?

Source Abbreviations

A = Amersham
B = BRL
C = Cambridge Biotechnology Laboratories
D = Biores (NL)
E = Stratagene (European distributor is Genofit)
F = Genofit
G = Anglian Biotechnology
I = International Biotechnologies Inc
K = Takara
L = PL-Pharmacia-LKB
M = Boehringer Mannheim
N = New England Biolabs
O = Toyobo (USB in USA, Medac in FRG)
P = Promega Biotec
R = Brisco (does this exist any longer?)
S = Sigma
T = Atomergic Chemetals
U = USB (see O = Toyobo)
V = Serva
W = Worthington
X = New York Biolabs
Y = Chemical Dynamics

I am sorry that my lists caused confusion and hope that this letter and the note in Seqnet/Bionet will settle things.

Friedhelm Pfeiffer