Path: utzoo!utgpu!water!watmath!uunet!ig!daemon
From: MJB1@VMS-SUPP.CAM.AC.UK
Newsgroups: bionet.molbio.seqnet
Subject: SEQNET Bulletin
Message-ID: <4731@ig.ig.com>
Date: 15 Jan 88 18:34:11 GMT
Sender: daemon@presto.ig.com
Lines: 330

From: MJB1@VMS-SUPP.CAM.AC.UK

Bulletin_# 59  C.RAWLINGS 15-JAN-1988 IUSC Workshop Expert Systems
From: Janet"C.RAWLINGS@UK.AC.CRC" <C.RAWLINGS@UK.AC.CRC> 15-JAN-1988 17:13
To: MJB1
Subject: IUSC Workshop Expert Systems


Date:  15-JAN-1988 17:06:38 GMT
From:  POST@UK.AC.CRC
To:  MJB1@UK.AC.CAM.VMS-SUPP
Sender: Janet"C.RAWLINGS@UK.AC.CRC" <C.RAWLINGS@UK.AC.CRC>
Reply-to: C.RAWLINGS@UK.AC.CRC
Subject: IUSC Workshop Handout

Received: from ICRF20 (not validated) by CRC; Fri 15 Jan 88 15:04:40-GMT
Date: Thu 14 Jan 88 14:16:30-GMT
From: Chris Rawlings <C-RAWLINGS%ICRF20@ICRF20>
Subject: IUSC Workshop Handout
To: mjb1%uk.ac.cam.vms-supp%JANET@CRC
Message-ID: <12366536977.32.C-RAWLINGS@ICRF20>


                      Expert Systems in Molecular Biology


                  IUSC Workshop on Molecular Biology Software
                   University of Cambridge 5-6 January 1988

                                 C.J. Rawlings

                         Imperial Cancer Research Fund
                                 P.O. Box 123
                             Lincoln's Inn Fields
                                London WC2A 3PX
                        Janet: C.RAWLINGS@UK.AC.MRC-CRC


EXPERT SYSTEMS
------ -------

Expert systems are programs designed to capture the skills of a  specialist  so
that  his  or  her  expertise  may be applied to a problem by a non-specialist.
Expert systems use IF-THEN rules as a  representation  of  the  problem-solving
skills  of  the specialist.  These rules are executed by an interpreter (called
the inference engine) that in many systems engages the user  in  some  sort  of
dialogue.   When  sufficient  information  has been obtained from the user, the
expert system will proffer its opinion.
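
As a minimal sketch of the idea (not the rule language of any actual shell),
the Python fragment below holds a few IF-THEN rules as data and uses a small
forward-chaining loop as the inference engine.  The rule names, the facts and
the function forward_chain are invented for illustration, and a real system
would normally obtain the starting facts through a dialogue with the user.

```python
# Hypothetical rules for illustration only; not taken from any real system.
# Each rule: (name, IF all of these facts hold, THEN conclude this fact).
RULES = [
    ("r1", {"sample is DNA", "fragment is short"}, "use polyacrylamide gel"),
    ("r2", {"sample is DNA", "fragment is long"}, "use agarose gel"),
    ("r3", {"use agarose gel", "fragment is very long"}, "use pulsed-field gel"),
]


def forward_chain(facts, rules):
    """Fire rules repeatedly until no rule can add a new fact."""
    known = set(facts)
    fired = []
    changed = True
    while changed:
        changed = False
        for name, conditions, conclusion in rules:
            # A rule fires when all of its conditions are already known.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                fired.append(name)
                changed = True
    return known, fired


# Facts that a dialogue with the user might have produced (assumed here).
answers = {"sample is DNA", "fragment is long", "fragment is very long"}
known, fired = forward_chain(answers, RULES)
print(fired)
```

Because the rules are held as data rather than as program text, the expertise
can be altered without changing the interpreter.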


SHELLS
------

Most expert systems are now built in programs called expert system shells.  The
shell is an expert system with no expertise.  It provides the rule language for
capturing the decision rules and the inference engine that drives the consulta-
tion  and  generates  conclusions  from  the  information provided by the user.
Shell commands  also  allow the user to interrogate the rules in the system and
support  simple  explanations  of  the  line  of  reasoning being followed (see
below).  Many of the commercial expert system shells are highly engineered  and
their  proponents  claim  that  it is possible for someone to start building an
expert system after only a couple of hours' training.

EXPLANATION - The How and the Why
-----------   --- --- --- --- ---

An important feature of true expert systems that distinguishes them from other
approaches to computer-assisted decision-making is that the rule-based
representation of the specialist's knowledge affords the possibility of
generating explanations as to why a particular question is being asked or how a
particular conclusion has been reached.

Although  relatively  crude,  when  used   appropriately,   these   explanation
facilities can make the program and the decisions it makes more accountable and
more intelligible to the non-specialist user.
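
One way such explanations might be produced is sketched below in Python,
assuming a simple backward-chaining interpreter.  The rules, the answers and
the names prove, why_log and how_log are all hypothetical.  The "why" of a
question is taken to be the chain of rules being pursued when the question is
asked, and the "how" of a conclusion is the rule and conditions that
established it.

```python
# Hypothetical rule base: conclusion -> (rule name, conditions that must hold).
RULES = {
    "reagent is degraded": ("R1", ["reagent is old", "stored at room temperature"]),
    "repeat with fresh reagent": ("R2", ["reagent is degraded", "control also failed"]),
}


def prove(goal, answers, why_log, how_log, stack=()):
    """Backward-chain on `goal`; return True if it can be established."""
    if goal not in RULES:
        # A leaf fact: this is where a real system would question the user.
        # WHY: record the rules currently being pursued that need this fact.
        why_log.append((goal, list(stack)))
        return answers.get(goal, False)
    name, conditions = RULES[goal]
    if all(prove(c, answers, why_log, how_log, stack + (name,)) for c in conditions):
        # HOW: record which rule and conditions produced the conclusion.
        how_log.append(f"{goal} by {name} because " + " and ".join(conditions))
        return True
    return False


# Answers the user is assumed to have given during the consultation.
answers = {
    "reagent is old": True,
    "stored at room temperature": True,
    "control also failed": True,
}
why_log, how_log = [], []
result = prove("repeat with fresh reagent", answers, why_log, how_log)
for line in how_log:
    print(line)
```

Each entry in why_log pairs a question with the rules that prompted it, so the
first question here ("reagent is old") is justified by rule R1, which is being
tried on behalf of rule R2.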

KNOWLEDGE BASED SYSTEMS
--------- ----- -------

Expert systems are an example of the class of artificial intelligence  programs
called  knowledge  based systems. This style of programming emphasizes declara-
tive representations of human expertise but does not necessarily  restrict  the
representation  language  to   IF-THEN type rules and simple logical sentences.
Knowledge based systems are often large LISP or Prolog programs that use a
particular set of AI techniques to realize a level of competence at least
equivalent to that of the human specialists who normally perform the task.
The support tools for developing knowledge based systems are generally
referred to as
toolkits rather than shells, since they provide a range of  representation  and
reasoning  methods  from  which  the developer may choose to fit the particular
application.  Where the toolkit does not support a  specialist  requirement  of
the  task  domain,  the  developer  has access to the underlying implementation
language (usually LISP, sometimes Prolog) to extend the toolkit.


KNOWLEDGE BASED SYSTEMS RESEARCH IN MOLECULAR BIOLOGY
--------- ----- ------- -------- -- --------- -------

Most of the existing research into the use of knowledge based methods in
molecular biology has used knowledge based systems rather than the more
restrictive expert systems.  The topics that have been addressed include the
derivation of restriction maps from restriction fragment data,[1] the automatic
design and debugging of gene cloning experiments[2,3,4,5] (research that has
led to the development of the commercial system STRATEGENE[6] from
IntelliGenetics), advising on the optimal sequencing strategy for the
Maxam-Gilbert method,[7] simulation of gene expression and control,[8,9,10]
solving the three-dimensional structure of proteins from NMR data,[11,12] and
representing and reasoning about protein topology.[13]


EXPERT SYSTEMS IN MOLECULAR BIOLOGY
------ ------- -- --------- -------

It is generally agreed that expert system techniques are well suited to the
development of programs that either provide advice on a specialist topic or
solve classification problems such as those needed for fault diagnosis.
However, today's expert system shells provide neither the computational power
nor the representation techniques required for molecular sequence data
analysis.  Nevertheless, there are other important and hitherto relatively
neglected areas of computer assistance for molecular biologists that could be
developed using present day expert systems.


In the Laboratory
-- --- ----------

As powerful computers become standard equipment in  molecular  biology  labora-
tories it will be possible to extend their use beyond the more obvious tasks of
data capture, storage and analysis and manuscript  preparation.   It  would  be
practical  to consider the development of expert systems to assist with a range
of laboratory-related tasks.  For example:


+    Advisory Expert Systems

     Expert systems could be used to provide advice on topics such as the
     selection of the best or alternative reagents (e.g. in buffers) or
     techniques to meet a particular experimental design constraint such as
     cost, time or availability of reagents.  A commercially available example
     of such a program is Beckman's SPIN-PRO expert system, which advises on
     aspects of preparative ultracentrifugation.  MAXAMIZE[7] is a knowledge
     based system that advises on the best strategy for Maxam-Gilbert
     sequencing.

     One form that expert systems of this kind might take is an expert
     laboratory notebook, where general knowledge and advice about techniques,
     reagents etc. could be mixed with the preferences that hold in the
     individual laboratory.  Thus, as well as providing supporting advice to
     existing members of the laboratory, the system(s) could be used to guide
     the newcomer in the ways of the lab.


+    Debugging Experimental Techniques

     For particularly complex experiments, or new techniques, or where  techni-
     cal expertise is limited to (typically) one member of a laboratory, expert
     systems could be developed to help diagnose  and  rectify  faults  in  the
     methods or reagents being used.


+    Transferring Expertise

     As the techniques of molecular biology are applied in more and more
     laboratories, the availability of skilled personnel can often be a
     problem.  Expert systems could be used to complement written descriptions
     of methods or as part of computer aided instruction (CAI) systems intended
     to transfer expertise out from the innovating laboratories to the rest of
     the community.


Assisting Data Analysis
--------- ---- --------

Although present day expert systems are inadequate for most molecular  sequence
analyses, they could be used to augment existing analysis software.  These sys-
tems would probably require the more sophisticated representation techniques of
knowledge  based  system  development  tools,  rather than simple expert system
shells.


+    Selecting the Best Analysis Methods

     An important part of the  expertise of a sequence analysis  specialist  is
     translating  the  biological  question raised by some data into terms that
     can be solved using the algorithms and programs  available  on  the  local
     computer  system.  This involves knowing which techniques to apply  to the
     data (i.e. which programs to run) in  what  order  and  how  to  interpret
     and/or  modify  the results of one analysis before applying the next.  For
     [...] task can be daunting  and  it
     is  often the case that anyone with particular skills in sequence analysis
     gets inundated with requests from colleagues to assist or to perform  ana-
     lyses  on their behalf.  The role of sequence analysis advisor is one that
     is well suited to implementation using an expert system.  This problem  is
     largely  equivalent  to providing intelligent assistance for a statistical
     analysis package and the GLIMPSE project at Imperial College has  recently
     successfully used an expert system to develop a front-end to GLIM.

+    Making Better Use of Resources

     A potentially important factor in selecting the best data  analysis  stra-
     tegy is to minimize the computing resources required.  This issue could be
     separated from the scientific requirements  of  determining  the  analysis
     strategy or it could be an integrated part of it.

+    Tuning an Algorithm

     Making optimum use of the more complex sequence analysis programs, such
     as those that perform sequence alignment and protein structure prediction,
     and correctly interpreting their results requires some understanding by
     the user of the theoretical underpinnings of the algorithm employed and
     occasionally of the way it is implemented as a program.  Such skills are
     not  yet  common  amongst  laboratory  scientists  and therefore it is not
     unheard of for a scientist to abdicate all judgement to the results  of  a
     computer  program  without  understanding  its behaviour.  More often than
     not,  the behaviour of these types of programs is controlled by a  set  of
     numerical  parameters  that  tune the algorithm.  It is also the case that
     tuning can profoundly alter the results of an analysis.  Without  a  clear
     understanding  of  how  each  parameter affects the behaviour of the algo-
     rithm, the user cannot use the method properly.  Tuning sequence  analysis
     algorithms   requires knowledge and expertise and could be supported using
     an expert system or knowledge based  approach.

+    Intelligent Front Ends

     Some programs are notoriously difficult to  use  or  require  considerable
     experience  before meaningful results can be generated. It is the function
     of a front-end program to insulate the user from the idiosyncrasies of the
     offending software.  Intelligent Front Ends (IFEs) use expert system tech-
     niques to represent the knowledge required to run a program as well as the
     skills that  a specialist would apply when using the program.  The GLIMPSE
     expert system front-end to the statistical package GLIM is a good  example
     of an IFE.

     Whilst there are likely to be a number of sequence analysis programs  that
     might  qualify  for  an  IFE to help the novice or occasional user, an IFE
     might also help a molecular biologist  use  a  piece  of  general  purpose
     software  (e.g.  a  database management system) by tailoring the system to
     her likely requirements.  An IFE such as this could be of arbitrary
     sophistication, from simply supporting familiar terminology to a fully
     interactive system that solicits the user's requirements in order to
     configure a database system.
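
The effect of tuning noted under "Tuning an Algorithm" can be seen even in a
toy case.  The Python sketch below is a textbook global alignment
(Needleman-Wunsch) with a linear gap penalty.  It is not the algorithm of any
particular package, and the function name align, the sequences and the
parameter values are arbitrary choices made for illustration.

```python
def align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,    # gap in b
                score[i][j - 1] + gap,    # gap in a
            )
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# The same two sequences, aligned with a cheap and an expensive gap penalty.
cheap = align("ACGTT", "CGTTA", gap=-1)
costly = align("ACGTT", "CGTTA", gap=-5)
print(cheap)
print(costly)
```

With gap=-1 the best alignment opens two gaps in order to line up CGTT
(ACGTT- against -CGTTA, score 2).  With gap=-5 the same sequences are aligned
without gaps and only one position matches (score -3).  Nothing but a single
parameter has changed.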


FUTURE POSSIBILITIES FOR KNOWLEDGE BASED SYSTEMS
------ ------------- --- --------- ----- -------

Although research into knowledge based methods for molecular biology is
restricted to relatively few centres, the interest is increasing.  Areas that
have not yet been extensively studied, but where research has begun, include
predicting protein structure from amino acid sequence[14] and protein modelling.

A consideration of developments in  the  way  that  scientists  are  using  the
rapidly  growing  DNA  sequence data libraries and the expectation that routine
exhaustive similarity searches will soon be possible using non-Von Neumann com-
puter architectures reveals that a problem will arise in sifting and interpret-
ing the results.   As the data libraries grow ever larger, the use of a  thres-
hold  value  to select interesting alignments based on a similarity metric will
become less practical since there is a conflict between selecting  a  realistic
number of hits for further analysis (increasing the threshold) and reducing the
threshold to include potentially interesting, but marginally significant align-
ments.   This  problem arises because the statistical significance of an align-
ment does not always predict biological significance.   It  should  however  be
possible  to  employ knowledge based techniques to allow a lowered threshold to
admit a large number of potentially significant alignments  with  a  subsequent
intelligent  filtering and partial interpretation of the results before presen-
tation to the scientist.
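
A rough sketch of this two-stage selection is given below in Python.  It
assumes that each hit carries a similarity score and some annotation keywords
from the library entry.  The class Hit, the function knowledge_filter, the
thresholds and the data are all hypothetical, and the single "shared keyword"
rule merely stands in for whatever biological knowledge a real system would
apply.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str             # library entry (made-up identifiers below)
    score: float          # similarity score on an assumed arbitrary scale
    keywords: frozenset   # annotation keywords of the library entry


def knowledge_filter(hits, high, low, query_keywords):
    """Keep strong hits outright; keep marginal hits only if a rule supports them.

    Returns a list of (hit, reason) pairs so that each selection is explained.
    """
    kept = []
    for h in hits:
        shared = h.keywords & query_keywords
        if h.score >= high:
            kept.append((h, "score alone"))
        elif h.score >= low and shared:
            # Stand-in knowledge rule: a marginal alignment is worth reporting
            # when the entry shares annotation with the query.
            kept.append((h, "marginal score, shared annotation: "
                            + ", ".join(sorted(shared))))
    return kept


hits = [
    Hit("entry_A", 90, frozenset({"kinase"})),
    Hit("entry_B", 42, frozenset({"kinase", "membrane"})),
    Hit("entry_C", 41, frozenset({"ribosomal"})),
    Hit("entry_D", 10, frozenset({"kinase"})),
]
kept = knowledge_filter(hits, high=80, low=40, query_keywords={"kinase"})
for hit, reason in kept:
    print(hit.name, "-", reason)
```

A threshold of 80 alone would report only entry_A, and a threshold of 40 alone
would also report entry_C.  The filter admits the marginal entry_B because a
rule supports it, and records the reason for presentation to the scientist.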


References
----------


1.   Stefik, M., "Inferring DNA Structures from Segmentation Data,"  Artificial
     Intelligence, vol. 11, pp. 85-114, 1978.

2.   Stefik, M., "Planning  with  Constraints  [MOLGEN:  Part  1],"  Artificial
     Intelligence, vol. 16, pp. 111-140, 1981.

3.   Stefik, M., "Planning and meta-planning   [MOLGEN:  Part  2],"  Artificial
     Intelligence, vol. 16, pp. 141-169, 1981.

4.   Friedland, P., Kedes, L., Brutlag, D.L., Iwasaki, Y., Bach, R., "GENESIS:
     a Knowledge Based Genetic Engineering Simulation System for Representation
     of Genetic Data and Experiment Planning," Nucleic Acids Research, vol.
     10, pp. 323-340, 1982.

5.   Bach, R., Iwasaki, Y., Friedland, P., "Intelligent Computational
     Assistance for Experiment Design," Nucleic Acids Research, vol. 12,
     pp. 11-29, 1984.

6.   Abarbanel, R.M., Bonura, T., Smith, D.H., "STRATEGENE, A Cloning  Worksta-
     tion  and  Librarian,"  Proceedings  of  AI  Biomed  1986, pp. 1-17, CRIM,
     Montpellier, France, 1986.

7.   Bach, R., Friedland, P.,  Brutlag,  D.L.,  Kedes,  L.,  "MAXAMIZE:  A  DNA
     Sequencing  Strategy  Advisor,"  Nucleic Acids Research, vol. 10, pp. 295-
     304, 1982.

8.   Meyers,  S.,  Friedland,  P.,  "Knowledge-based  Simulation   of   Genetic
     Regulation  in Bacteriophage lambda," Nucleic Acids Research, vol. 12, pp.
     1-9, 1984.

9.   Koton, P.A., Towards a Problem Solving System for Molecular Genetics,  MIT
     Laboratory of Computer Science; Technical Report MIT/LCS/TR-338, 1985.

10.  Sabey Weld, D., Switching Between Discrete and Continuous  Process  Models
     to  Predict  Genetic  Activity,  MIT  Artificial  Intelligence Laboratory;
     Technical Report 793.

11.  Hayes-Roth, B., Buchanan, B.G., Lichtarge, O., Hewett, M., Altman, R.,
     Brinkley, J., Cornelius, C., Duncan, B., Jardetzky, O., "PROTEAN: Deriving
     Protein Structure from Constraints," Proceedings of American Association
     of Artificial Intelligence, vol. 5, pp. 904-909, 1986.

12.  Freyman, F., "PROTO: An Approach for Determining Protein  Structures  from
     NMR  Data:  An Exercise in Large Scale Interdependent Constraint Satisfac-
     tion," Proceedings of AI Biomed  1986,  pp.  122-143,  CRIM,  Montpellier,
     France, 1986.

13.  Rawlings, C.J., Taylor, W.R., Nyakairu, J., Fox,  J.,  Sternberg,  M.J.E.,
     "Reasoning  about  protein  topology  using the logic programming language
     PROLOG," Journal of Molecular Graphics, vol. 3, pp. 151-157, 1985.

14.  Rawlings, C.J., Analysis and Prediction of Protein Structure using Artifi-
     cial  Intelligence,  Proceedings:   4th European Seminar in Computer Aided
     Molecular Design, IBC Press, 1987.
-------