Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uwm.edu!csd4.csd.uwm.edu!markh From: markh@csd4.csd.uwm.edu (Mark William Hopkins) Newsgroups: comp.ai.neural-nets Subject: Connectionist Finite State Machines -- description of an architecture Message-ID: <7982@uwm.edu> Date: 1 Dec 90 00:22:09 GMT Sender: news@uwm.edu Organization: University of Wisconsin - Milwaukee Lines: 102 This is a description of a rather simple architecture that can be used to train a neural net to be a finite state machine using only backpropagation. (0) BASIC DESCRIPTION The net consists of 4 layers: an input layer, an output layer, a hidden layer and a "hidden input" layer. The hidden input layer is used to help represent the transition function of the finite state machine. The hidden input layer and input layer are both used to determine the activations of the cells in the hidden layer. The hidden input layer receives its activations from the hidden layer itself WITH A DELAY of one time step. (1) FINITE STATE MACHINES For our purposes, a Finite State Machine (FSM) can be defined as an automata with the following components: * A set of STATES (S). * A set of INPUT signals (I). * A set of OUTPUT signals (O). and * A TRANSITION function T: IxS -> SxO. which may be decomposed into a * State Transition Function Ts: IxS -> S and * Output Function To: IxS -> O. with T(i, s) = (Ts(i, s), To(i, s)). We are not considering what start states or terminating states might be in this application of FSM theory. Two Finite State Machines are considered equivalent here if they produce the same output, given the same history and the same input. The "history" should be considered as the left-context if the input is thought of as being arranged in a linear sequence. This context may be infinite, since we're not necessarily considering start states here. Actually for an N-state FSM I believe you only need to consider contexts consisting of the previous N inputs in this definition. The task at hand if to train a neural net to coverge to a FSM that is equivalent to the one it is being presented with. This equivalent machine may or may not have the same set of states, the intent of the definition given above was to allow for the possibility that the state sets may be different. (2) THE FUNCTION OF THE LAYERS. Intuitively, the two hidden layers are supposed to represent both the current state and previous state of the FSM. The hidden layer effectively collects the output of the State Transition function using the previous state represented in the hidden input layer and the input represented in the input layer, and the output layer collects the outputs derived from the input and hidden input layers. The weights between ALL the layers are variable, so the state set itself IS BEING LEARNED during training. (3) ARCHITECTURE. This is a standard layered backpropagation neural net, with one minor modification. (a) External inputs are collected into the input layer. (b) The current state, represented by the set of activations of the hidden layer during the previous step are 'shifted' into the hidden input layer. (c) Weighted sums of the activations in the hidden input layer and input layer are passed to the hidden layer. A sigmoid function such as 1/(1 + exp(-x)) is applied to derive the activations of the hidden layer. (d) The same process is applied to the hidden input layer and input layer to derive the activations of the output layer. An alternative to (d), would be to simply pass the hidden layer's activations to the output layer, but this will not be guaranteed to produce valid learning behavior though it is a simpler architecture. After presentation of the training output backpropagation is applied just as it would be for any layered neural net. (4) TRAINING SEQUENCE Training a FSM differs slightly from training a layered backprop. neural net. The order in which inputs and outputs are presented to the neural net does matter. Outputs should faithfully reflect the output that the FSM being presented to the neural net would produce with history taken into account. (5) EXAMPLE A flip-flop can be considered as a 2-state FSM with the following components: INPUTS: (P, Q, N) OUTPUTS: (R, S) STATES: (0, 1) TRANSITION FUNCTION: Ts(P, s) = 0, Ts(Q, s) = 1, Ts(N, s) = s for s = 0, 1. To(P, s) = To(N, 0) = R To(Q, s) = To(N, 1) = S A valid training sequence would accurately reflect the fact that outputs from previous time steps are preserved when the input is N: (P, R), (N, R), (Q, S), (N, S), (N, S), (P, R), (Q, S), ... The way in which this condition would be violated is if the next item in the training sequence were (N, R). A demo neural net was written as per (3) with the simplification mentioned after (d) and successfully trained to follow this FSM.