Path: utzoo!utgpu!water!watmath!uunet!ig!daemon
From: MJB1@VMS-SUPP.CAM.AC.UK
Newsgroups: bionet.molbio.seqnet
Subject: SEQNET Bulletin
Message-ID: <4731@ig.ig.com>
Date: 15 Jan 88 18:34:11 GMT
Sender: daemon@presto.ig.com
Lines: 330

From: MJB1@VMS-SUPP.CAM.AC.UK

Bulletin_# 59   C.RAWLINGS   15-JAN-1988   IUSC Workshop Expert Systems

From: Janet"C.RAWLINGS@UK.AC.CRC" 15-JAN-1988 17:13
To: MJB1
Subject: IUSC Workshop Expert Systems

Date: 15-JAN-1988 17:06:38 GMT
From: POST@UK.AC.CRC
To: MJB1@UK.AC.CAM.VMS-SUPP
Sender: Janet"C.RAWLINGS@UK.AC.CRC"
Reply-to: C.RAWLINGS@UK.AC.CRC
Subject: IUSC Workshop Handout
Received: from ICRF20 (not validated) by CRC; Fri 15 Jan 88 15:04:40-GMT
Date: Thu 14 Jan 88 14:16:30-GMT
From: Chris Rawlings
Subject: IUSC Workshop Handout
To: mjb1%uk.ac.cam.vms-supp%JANET@CRC
Message-ID: <12366536977.32.C-RAWLINGS@ICRF20>


                  Expert Systems in Molecular Biology

              IUSC Workshop on Molecular Biology Software
                        University of Cambridge
                           5-6 January 1988

                             C.J. Rawlings
                     Imperial Cancer Research Fund
                   P.O. Box 123 Lincoln's Inn Fields
                            London WC2A 3PX
                    Janet: C.RAWLINGS@UK.AC.MRC-CRC


EXPERT SYSTEMS
------ -------

Expert systems are programs designed to capture the skills of a specialist
so that his or her expertise may be applied to a problem by a
non-specialist. Expert systems use IF-THEN rules as a representation of the
problem-solving skills of the specialist. These rules are executed by an
interpreter (called the inference engine) that in many systems engages the
user in some sort of dialogue. When sufficient information has been obtained
from the user, the expert system will proffer its opinion.

SHELLS
------

Most expert systems are now built in programs called expert system shells.
The shell is an expert system with no expertise. It provides the rule
language for capturing the decision rules and the inference engine that
drives the consultation and generates conclusions from the information
provided by the user.
Shell commands also allow the user to interrogate the rules in the system
and support simple explanations of the line of reasoning being followed (see
below). Many of the commercial expert system shells are highly engineered
and their proponents claim that it is possible for someone to start building
an expert system after only a couple of hours' training.

EXPLANATION - The How and the Why
----------- - --- --- --- --- ---

An important feature of true expert systems that distinguishes them from
other approaches to computer assisted decision-making is that the rule-based
representation of the specialist's knowledge affords the possibility of
generating explanations as to why a particular question is being asked or
how a particular conclusion has been reached. Although relatively crude,
these explanation facilities, when used appropriately, can make the program
and the decisions it makes more accountable and more intelligible to the
non-specialist user.

KNOWLEDGE BASED SYSTEMS
--------- ----- -------

Expert systems are an example of the class of artificial intelligence
programs called knowledge based systems. This style of programming
emphasizes declarative representations of human expertise but does not
necessarily restrict the representation language to IF-THEN type rules and
simple logical sentences. Knowledge based systems are often large LISP or
Prolog programs that use a particular set of AI techniques to realize a
level of competence at least equivalent to that of the human specialists who
normally perform the task. The support tools for developing knowledge based
systems are generally referred to as toolkits rather than shells, since they
provide a range of representation and reasoning methods from which the
developer may choose to fit the particular application.
Where the toolkit does not support a specialist requirement of the task
domain, the developer has access to the underlying implementation language
(usually LISP, sometimes Prolog) to extend the toolkit.

KNOWLEDGE BASED SYSTEMS RESEARCH IN MOLECULAR BIOLOGY
--------- ----- ------- -------- -- --------- -------

Most of the existing research into the use of knowledge based methods in
molecular biology has used knowledge based systems rather than the more
restrictive expert systems. The topics that have been addressed include the
derivation of restriction maps from restriction fragment data,[1] the
automatic design and debugging of gene cloning experiments,[2,3,4,5] (this
research has led to the development of the commercial system from
IntelliGenetics called STRATEGENE[6]) advising on optimal sequencing
strategy for the Maxam-Gilbert method,[7] simulation of gene expression and
control,[8,9,10] solving the three dimensional structure of proteins from
NMR data,[11,12] and representing and reasoning about protein topology.[13]

EXPERT SYSTEMS IN MOLECULAR BIOLOGY
------ ------- -- --------- -------

It is generally agreed that expert systems techniques are well suited to the
development of programs that either provide advice on a specialist topic or
solve classification problems such as those needed for fault diagnosis.
However, it is also the case that today's expert system shells provide
neither the computational power nor the representation techniques required
for molecular sequence data analysis. Nevertheless, there are other
important and hitherto relatively neglected areas of computer assistance for
molecular biologists that could be developed using present day expert
systems.

In the Laboratory
-- --- ----------

As powerful computers become standard equipment in molecular biology
laboratories it will be possible to extend their use beyond the more obvious
tasks of data capture, storage and analysis and manuscript preparation.
It would be practical to consider the development of expert systems to
assist with a range of laboratory-related tasks. For example:

+  Advisory Expert Systems

   Expert systems could be used to provide advice on topics such as the
   selection of the best or alternative reagents (e.g. in buffers) or
   technique to meet a particular experimental design constraint such as
   cost, time or availability of reagents. A commercially available example
   of such a program is Beckman's SPIN-PRO expert system that advises on
   aspects of preparative ultracentrifugation. MAXAMIZE[7] is a knowledge
   based system that advises on the best strategy for Maxam-Gilbert
   sequencing.

   One form that expert systems of this kind might take is an expert
   laboratory notebook, where general knowledge and advice about techniques,
   reagents etc. could be mixed with the preferences that hold in the
   individual laboratory. Therefore, as well as providing supporting advice
   to existing members of the laboratory, the system(s) could be used to
   guide the newcomer in the ways of the lab.

+  Debugging Experimental Techniques

   For particularly complex experiments, or new techniques, or where
   technical expertise is limited to (typically) one member of a laboratory,
   expert systems could be developed to help diagnose and rectify faults in
   the methods or reagents being used.

+  Transferring Expertise

   As the techniques of molecular biology become applied in more and more
   laboratories, the availability of skilled personnel can often be a
   problem. Expert systems could be used to complement written descriptions
   of methods or as part of computer aided instruction (CAI) systems
   intended to transfer expertise out from the innovating laboratories to
   the rest of the community.

Assisting Data Analysis
--------- ---- --------

Although present day expert systems are inadequate for most molecular
sequence analyses, they could be used to augment existing analysis software.
These systems would probably require the more sophisticated representation
techniques of knowledge based system development tools, rather than simple
expert system shells.

+  Selecting the Best Analysis Methods

   An important part of the expertise of a sequence analysis specialist is
   translating the biological question raised by some data into terms that
   can be solved using the algorithms and programs available on the local
   computer system. This involves knowing which techniques to apply to the
   data (i.e. which programs to run), in what order, and how to interpret
   and/or modify the results of one analysis before applying the next. For
   the non-specialist this task can be daunting and it is often the case
   that anyone with particular skills in sequence analysis gets inundated
   with requests from colleagues to assist or to perform analyses on their
   behalf. The role of sequence analysis advisor is one that is well suited
   to implementation using an expert system. This problem is largely
   equivalent to providing intelligent assistance for a statistical analysis
   package and the GLIMPSE project at Imperial College has recently
   successfully used an expert system to develop a front-end to GLIM.

+  Making Better Use of Resources

   A potentially important factor in selecting the best data analysis
   strategy is to minimize the computing resources required. This issue
   could be separated from the scientific requirements of determining the
   analysis strategy or it could be an integrated part of it.

+  Tuning an Algorithm

   Making optimum use of, and correctly interpreting the results of, the
   more complex sequence analysis programs such as those that perform
   sequence alignment and protein structure prediction requires some
   understanding by the user of the theoretical underpinnings of the
   algorithm employed and occasionally of the way it is implemented as a
   program.
   Such skills are not yet common amongst laboratory scientists and
   therefore it is not unheard of for a scientist to abdicate all judgement
   to the results of a computer program without understanding its behaviour.

   More often than not, the behaviour of these types of programs is
   controlled by a set of numerical parameters that tune the algorithm. It
   is also the case that tuning can profoundly alter the results of an
   analysis. Without a clear understanding of how each parameter affects the
   behaviour of the algorithm, the user cannot use the method properly.
   Tuning sequence analysis algorithms requires knowledge and expertise and
   could be supported using an expert system or knowledge based approach.

+  Intelligent Front Ends

   Some programs are notoriously difficult to use or require considerable
   experience before meaningful results can be generated. It is the function
   of a front-end program to insulate the user from the idiosyncrasies of
   the offending software. Intelligent Front Ends (IFEs) use expert system
   techniques to represent the knowledge required to run a program as well
   as the skills that a specialist would apply when using the program. The
   GLIMPSE expert system front-end to the statistical package GLIM is a good
   example of an IFE.

   Whilst there are likely to be a number of sequence analysis programs that
   might qualify for an IFE to help the novice or occasional user, an IFE
   might also help a molecular biologist use a piece of general purpose
   software (e.g. a database management system) by tailoring the system to
   her likely requirements. An IFE such as this could be of any arbitrary
   sophistication, from simply supporting familiar terminology to a fully
   interactive system that solicits the user's requirements in order to
   configure a database system.
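
As an illustration only, the IF-THEN rules, inference engine and "how"
explanations described in this handout can be sketched in a few lines of
Python. This is a minimal sketch of a forward-chaining interpreter, not the
design of any particular shell or of the systems cited here. The rule names
(R1, R2), the facts and the conclusions are hypothetical and were invented
for the example; they are not advice on sequence analysis.

```python
# Minimal sketch of an IF-THEN rule interpreter with a "how" explanation.
# All rule names, facts and conclusions below are hypothetical illustrations.
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    conditions: frozenset  # facts that must all hold (the IF part)
    conclusion: str        # fact asserted when the rule fires (the THEN part)


RULES = [
    Rule("R1", frozenset({"sequence is protein", "goal is find relatives"}),
         "run similarity search"),
    Rule("R2", frozenset({"run similarity search", "library is large"}),
         "use high score threshold"),
]


def infer(facts, rules):
    """Forward-chain until no rule adds a new fact.

    Returns the final fact set and a map from each derived fact to the
    rule that produced it, which is what the "how" explanation reads.
    """
    known = set(facts)
    derived_by = {}
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conditions <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                derived_by[rule.conclusion] = rule
                changed = True
    return known, derived_by


def explain_how(fact, derived_by, depth=0):
    """Return indented lines tracing how a fact was established."""
    pad = "  " * depth
    rule = derived_by.get(fact)
    if rule is None:
        return [f"{pad}{fact}: supplied by the user"]
    lines = [f"{pad}{fact}: concluded by rule {rule.name}"]
    for cond in sorted(rule.conditions):
        lines.extend(explain_how(cond, derived_by, depth + 1))
    return lines


if __name__ == "__main__":
    facts = {"sequence is protein", "goal is find relatives",
             "library is large"}
    known, derived_by = infer(facts, RULES)
    print("\n".join(explain_how("use high score threshold", derived_by)))
```

Run as a script, the sketch traces the final recommendation back through the
rules that fired to the facts the user supplied. Recording which rule
produced each conclusion is the design choice that makes the explanation
possible: the same rule representation serves both the reasoning and the
account of it.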

FUTURE POSSIBILITIES FOR KNOWLEDGE BASED SYSTEMS
------ ------------- --- --------- ----- -------

Although research into knowledge based methods for molecular biology is
restricted to relatively few centres, the interest is increasing. Areas that
have not yet been extensively studied, but where research has begun, include
predicting protein structure from amino acid sequence[14] and protein
modelling.

Scientists are making increasing use of the rapidly growing DNA sequence
data libraries, and routine exhaustive similarity searches are expected to
become possible soon on non-von Neumann computer architectures. Together
these developments will create a problem in sifting and interpreting the
results. As the data libraries grow ever larger, the use of a threshold
value to select interesting alignments based on a similarity metric will
become less practical, since there is a conflict between selecting a
realistic number of hits for further analysis (increasing the threshold) and
reducing the threshold to include potentially interesting, but marginally
significant alignments. This problem arises because the statistical
significance of an alignment does not always predict biological
significance. It should however be possible to employ knowledge based
techniques to allow a lowered threshold to admit a large number of
potentially significant alignments, with a subsequent intelligent filtering
and partial interpretation of the results before presentation to the
scientist.

References
----------

1.  Stefik, M., "Inferring DNA Structures from Segmentation Data,"
    Artificial Intelligence, vol. 11, pp. 85-114, 1978.

2.  Stefik, M., "Planning with Constraints [MOLGEN: Part 1]," Artificial
    Intelligence, vol. 16, pp. 111-140, 1981.

3.  Stefik, M., "Planning and meta-planning [MOLGEN: Part 2]," Artificial
    Intelligence, vol. 16, pp. 141-169, 1981.

4.  Friedland, P., Kedes, L., Brutlag, D.L., Iwasaki, Y., Bach, R.,
    "GENESIS: a Knowledge Based Genetic Engineering Simulation System for
    Representation of Genetic Data and Experiment Planning," Nucleic Acids
    Research, vol. 10, pp. 323-340, 1982.

5.  Bach, R., Iwasaki, Y., Friedland, P., "Intelligent computational
    Assistance for Experiment Design," Nucleic Acids Research, vol. 12,
    pp. 11-29, 1984.

6.  Abarbanel, R.M., Bonura, T., Smith, D.H., "STRATEGENE, A Cloning
    Workstation and Librarian," Proceedings of AI Biomed 1986, pp. 1-17,
    CRIM, Montpellier, France, 1986.

7.  Bach, R., Friedland, P., Brutlag, D.L., Kedes, L., "MAXAMIZE: A DNA
    Sequencing Strategy Advisor," Nucleic Acids Research, vol. 10,
    pp. 295-304, 1982.

8.  Meyers, S., Friedland, P., "Knowledge-based Simulation of Genetic
    Regulation in Bacteriophage lambda," Nucleic Acids Research, vol. 12,
    pp. 1-9, 1984.

9.  Koton, P.A., Towards a Problem Solving System for Molecular Genetics,
    MIT Laboratory of Computer Science; Technical Report MIT/LCS/TR-338,
    1985.

10. Sabey Weld, D., Switching Between Discrete and Continuous Process Models
    to Predict Genetic Activity, MIT Artificial Intelligence Laboratory;
    Technical Report 793.

11. Hayes-Roth, B., Buchanan, B.G., Lichtarge, O., Hewett, M., Altman, R.,
    Brinkley, J., Cornelius, C., Duncan, B., Jardetsky, O., "PROTEAN:
    Deriving Protein Structure from Constraints," Proceedings of American
    Association of Artificial Intelligence, vol. 5, pp. 904-909, 1986.

12. Freyman, F., "PROTO: An Approach for Determining Protein Structures from
    NMR Data: An Exercise in Large Scale Interdependent Constraint
    Satisfaction," Proceedings of AI Biomed 1986, pp. 122-143, CRIM,
    Montpellier, France, 1986.

13. Rawlings, C.J., Taylor, W.R., Nyakairu, J., Fox, J., Sternberg, M.J.E.,
    "Reasoning about protein topology using the logic programming language
    PROLOG," Journal of Molecular Graphics, vol. 3, pp. 151-157, 1985.

14. Rawlings, C.J., Analysis and Prediction of Protein Structure using
    Artificial Intelligence, Proceedings: 4th European Seminar in Computer
    Aided Molecular Design, IBC Press, 1987.

-------