Path: utzoo!utgpu!water!watmath!uunet!ig!daemon
From: BIORELAY@BIO.CAM.AC.UK
Newsgroups: bionet.molbio.seqnet
Subject: SEQNET Bulletin RELAY ONLY: reply to SEQNET@UK.AC.CAM.BIO
Message-ID: <5754@ig.ig.com>
Date: 5 Apr 88 11:58:05 GMT
Sender: daemon@presto.ig.com
Lines: 128

From: BIORELAY%BIO.CAM.AC.UK@CUNYVM.CUNY.EDU

From:    SEQNET@UK.AC.CAM.PHX  5-APR-1988 11:32
To:    SEQNET
Subj:


Date: Tue, 05 Apr 88 11:31:59 BST
From: SEQNET@UK.AC.CAM.PHX
To:   seqnet@UK.AC.CAM.BIO
Message-ID: <9E527571CB3E3A50@UK.AC.CAM.PHX>

(Message number 2)
Accepted:  11:30:37 05 Apr 88
Submitted: 18:24:00 31 Mar 88
IPMessageId: -unspecified-
From: PFEIFFER@EARN.DM0MPB51
To: SEQNET@UK.AC.CAMBRIDGE.PHOENIX
Subject: restriction enzyme database

Via:           UK.AC.RL.EARN; Thu, 31 Mar 88 17:27:54 BST
Received:
          from UKACRL by UK.AC.RL.IB (Mailer X1.25) with BSMTP id 8722; Thu, 31
               Mar 88 17:27:54 BS
Received:
           from DM0MPB51.BITNET (PFEIFFER) by UKACRL.BITNET (Mailer X1.25) with
               BSMTP id 8721; Thu, 31 Mar 88 17:27:53
X-Original-To: g--seqnet, g--george, g--roberts, mewes, PFEIFFER

                                         30-MAR-1988
Dear Rich Roberts!
I am sorry that my restriction enzyme lists were announced in Bionet as
better than yours. This has never been my intention. To be clear, I do not
systematically collect information on restriction enzymes myself, but try to
combine your data and Kessler's with some generally available information (publications and
enzyme catalogues) and make them available to users of the PIR and UWGCG
package. Thus, my lists are secondary information and cannot be proposed as a
substitute for the primary data collection. A separate notice of this will be
sent to Seqnet and thus will be distributed through Bionet. I hope that this
makes things clear between us and for the scientific community.
As David George pointed out, many secondary distributors step in to fill a
perceived need. This is exactly what happened to me. Here in Martinsried, we
use the PIR and UWGCG software and detected that the enzyme lists were
inadequate. Thus, I prepared a local update and made it available to the
scientific community so that other scientists do not need to do the same.
I do not see a reason why you cannot prepare the restriction enzyme list in
the format required by the PIR and UWGCG software yourself. Your database
output is very flexible as you say. Please look at their format, prepare the
output and send it to both groups. If you do, there is no need for me to
continue with my redistribution of data. To my knowledge, you did not offer
this in the past.
David made a strong point for a standardized format also for your restriction
enzyme database. You will remember that I made a similar proposal some time
ago. Let me explain why I still feel this to be a necessary topic. It stems from
our bad past experience with the sequence databases.
PIR and UWGCG had inconsistent formats some years ago (luckily solved by now).
At that time, the sequence databases were distributed together with the
software. Due to this, different database releases were accessed by the
different software packages. This situation was very confusing and
unsatisfactory. Therefore we decided to get a copy directly from the primary
database and to reformat it locally.
If you prepared the restriction enzyme files in the different formats
yourself, we would run into the same problem again. Therefore, it seems better
to have an updated release of your database in a standard format locally, and
to have software for transformation. This software would be specific to the
software package and could be distributed with it. Thus each institute could
generate the required lists locally and have the same enzymes available for
all their program packages.
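
The arrangement proposed above (one master list in a standard format, plus a small package-specific transformation step) might be sketched as follows. This is only an illustration: the record fields, the names `package_a` and `package_b`, and both output layouts are hypothetical and do not reproduce the actual PIR or UWGCG enzyme file formats.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Enzyme:
    """One record of the (hypothetical) standard master list."""
    name: str
    site: str  # recognition sequence, written 5'->3'
    cut: int   # number of bases before the cut on the strand shown


def to_package_a(e: Enzyme) -> str:
    # Hypothetical layout: padded name, then the site with "'" marking the cut.
    return f"{e.name:<10}{e.site[:e.cut]}'{e.site[e.cut:]}"


def to_package_b(e: Enzyme) -> str:
    # Hypothetical layout: semicolon-separated fields.
    return f"{e.name};{e.site};{e.cut}"


# One writer per software package; each would be distributed with its package.
WRITERS: Dict[str, Callable[[Enzyme], str]] = {
    "package_a": to_package_a,
    "package_b": to_package_b,
}


def export(enzymes: Iterable[Enzyme], package: str) -> List[str]:
    """Render the same master records in the layout one package expects."""
    writer = WRITERS[package]
    return [writer(e) for e in enzymes]


MASTER = [Enzyme("EcoRI", "GAATTC", 1), Enzyme("BamHI", "GGATCC", 1)]

if __name__ == "__main__":
    for package in WRITERS:
        print(package, export(MASTER, package))
```

Because every package reads from the same `MASTER` records, all locally generated lists contain the same enzymes from the same release, which is the point of the proposal.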
As you can see, the lack of standards causes a lot of confusion. I wonder why
Bionet is still not using the IUPAC nomenclature for ambiguous nucleotide
symbols. This standard is now several years old and should definitely be
adopted by all sequence analysis software packages. Lack of standardization,
especially concerning data format, is a severe problem in all existing
biological databases. As David pointed out already, the PIR, MIPS and JIPID
protein sequence databases try to avoid this problem by close cooperation.
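
For illustration, the IUPAC (NC-IUB) one-letter codes for ambiguous nucleotides can be held in a small table and used to search a sequence for a degenerate recognition site. The sketch below is a minimal example, not taken from any of the packages mentioned; the function names are made up for this note, and HincII (GTYRAC) is used only as a familiar degenerate site.

```python
import re

# IUPAC (NC-IUB) nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def site_to_regex(site: str) -> str:
    """Translate a recognition site with ambiguity codes into a regex."""
    parts = []
    for symbol in site.upper():
        bases = IUPAC[symbol]  # raises KeyError on a non-IUPAC symbol
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def find_sites(site: str, sequence: str) -> list:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # The lookahead lets the search report overlapping occurrences.
    pattern = re.compile(f"(?=({site_to_regex(site)}))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(site_to_regex("GTYRAC"))
    print(find_sites("GTYRAC", "AAGTTAACGGGTCGACTT"))
```

The search covers only the strand as written; a real tool would also have to handle the complementary strand for non-palindromic sites.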
Over the last few months I could not obtain your lists from you directly, but
received more or less updated versions floating around. Thus, EMBL still seems
to distribute a list from the beginning of 1987. I received the list you recently
sent to Kessler at Boehringer, which seems to be a very recent version, only a
few days ago. I have not yet managed to compare the data and to update my
lists.
How far are you with the Oracle database system? We are still in the process
of implementation, because my major task is to participate in the organization
of the MIPS protein sequence database, and restriction enzymes are only a
minor activity. It would be nice if you and Kessler could agree on a format
for data exchange, so that comparison of the data by computer becomes
possible.
You stated that Kessler's data are unreliable and incomplete. To what extent do
you collect data on methylases and on the sensitivity of restriction enzymes
to methylation? This is a major point in Kessler's lists. His system
discriminates between inhibitory, tolerated, required and undocumented for RE.
He also discriminates between different types of modification (pos 4 and 5 of
cytosine are currently known, but probably more types of modification will be
detected in the future).
I am completely confused with your abbreviations of commercial companies. The
enzyme list received via Kessler has L = PL-Pharmacia-LKB and P = Promega,
while the list received today via David George has P = Pharmacia P-L and R =
Promega. My intention is to use the same abbreviation as you, but this is
difficult if you are inconsistent with yourself. After talking to Kessler,
I have adopted the abbreviations you sent to him, and have extended this list
as necessary. Here is my current version. Please comment if you want any
changes. Which of your abbreviations will be used in the future?
Source Abbreviations
A = Amersham
B = BRL
C = Cambridge Biotechnology Laboratories
D = Biores (NL)
E = Stratagene (European distributor is Genofit)
F = Genofit
G = Anglian Biotechnology
I = International Biotechnologies Inc
K = Takara
L = PL-Pharmacia-LKB
M = Boehringer Mannheim
N = New England Biolabs
O = Toyobo (USB in USA, Medac in FRG)
P = Promega Biotec
R = Brisco (does this exist any longer?)
S = Sigma
T = Atomergic Chemetals
U = USB  (see O = Toyobo)
V = Serva
W = Worthington
X = New York Biolabs
Y = Chemical Dynamics
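
As an illustration of why the conflicting codes are troublesome for computer comparison, the sketch below converts a supplier-code string from the list received via David George into the scheme above. Only the two differences stated in this letter are mapped (P = Pharmacia P-L becomes L, R = Promega becomes P); treating all other letters as identical in both lists is an assumption, and the function name is made up for this note.

```python
# Differences stated in the letter between the list received via David George
# and the scheme adopted above. All other letters are ASSUMED to agree.
GEORGE_TO_ADOPTED = str.maketrans({
    "P": "L",  # Pharmacia P-L: "P" in the George list, "L" in the adopted list
    "R": "P",  # Promega:       "R" in the George list, "P" in the adopted list
})


def convert_sources(codes: str) -> str:
    """Translate a string of one-letter supplier codes in a single pass.

    A single-pass translation matters here: replacing P with L and then
    R with P one after the other would be safe in that order, but the
    reverse order would turn Promega into Pharmacia.
    """
    return codes.translate(GEORGE_TO_ADOPTED)


if __name__ == "__main__":
    print(convert_sources("BNPR"))
```

Note that the converted "P" (Promega) cannot be told apart from an unconverted "P" (Pharmacia) afterwards, so each list has to be labelled with the scheme it uses.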
I am sorry that my lists caused confusion and hope that this letter and the
note in Seqnet/Bionet will settle things.
Friedhelm Pfeiffer