Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uwm.edu!csd4.csd.uwm.edu!markh
From: markh@csd4.csd.uwm.edu (Mark William Hopkins)
Newsgroups: comp.ai.neural-nets
Subject: Connectionist Finite State Machines -- description of an architecture
Message-ID: <7982@uwm.edu>
Date: 1 Dec 90 00:22:09 GMT
Sender: news@uwm.edu
Organization: University of Wisconsin - Milwaukee
Lines: 102


   This is a description of a rather simple architecture that can be used to
train a neural net to be a finite state machine using only backpropagation.

(0) BASIC DESCRIPTION
   The net consists of 4 layers: an input layer, an output layer, a hidden
layer and a "hidden input" layer.  The hidden input layer is used to help
represent the transition function of the finite state machine.

   The hidden input layer and input layer are both used to determine the
activations of the cells in the hidden layer.  The hidden input layer receives
its activations from the hidden layer itself WITH A DELAY of one time step.

(1) FINITE STATE MACHINES
   For our purposes, a Finite State Machine (FSM) can be defined as an
automata with the following components:

	      * A set of STATES (S).
	      * A set of INPUT signals (I).
	      * A set of OUTPUT signals (O).
and           * A TRANSITION function T: IxS -> SxO.
		which may be decomposed into a
		* State Transition Function Ts: IxS -> S
           and  * Output Function To: IxS -> O.
		with T(i, s) = (Ts(i, s), To(i, s)).

We are not considering what start states or terminating states might be in this
application of FSM theory.

   Two Finite State Machines are considered equivalent here if they produce
the same output, given the same history and the same input.  The "history"
should be considered as the left-context if the input is thought of as being
arranged in a linear sequence.  This context may be infinite, since we're
not necessarily considering start states here.  Actually for an N-state FSM
I believe you only need to consider contexts consisting of the previous N
inputs in this definition.

   The task at hand if to train a neural net to coverge to a FSM that is
equivalent to the one it is being presented with.  This equivalent machine may
or may not have the same set of states, the intent of the definition given
above was to allow for the possibility that the state sets may be different.

(2) THE FUNCTION OF THE LAYERS.
   Intuitively, the two hidden layers are supposed to represent both the
current state and previous state of the FSM.  The hidden layer effectively
collects the output of the State Transition function using the previous
state represented in the hidden input layer and the input represented in
the input layer, and the output layer collects the outputs derived from the
input and hidden input layers.

   The weights between ALL the layers are variable, so the state set itself
IS BEING LEARNED during training.

(3) ARCHITECTURE.
   This is a standard layered backpropagation neural net, with one minor
modification.  
   (a) External inputs are collected into the input layer.
   (b) The current state, represented by the set of activations of the hidden
       layer during the previous step are 'shifted' into the hidden input layer.
   (c) Weighted sums of the activations in the hidden input layer and input
       layer are passed to the hidden layer.  A sigmoid function such as
       1/(1 + exp(-x)) is applied to derive the activations of the hidden
       layer.
   (d) The same process is applied to the hidden input layer and input layer to
       derive the activations of the output layer.

An alternative to (d), would be to simply pass the hidden layer's activations
to the output layer, but this will not be guaranteed to produce valid learning
behavior though it is a simpler architecture.

   After presentation of the training output backpropagation is applied just
as it would be for any layered neural net.

(4) TRAINING SEQUENCE
    Training a FSM differs slightly from training a layered backprop.
neural net.  The order in which inputs and outputs are presented to the neural
net does matter.  Outputs should faithfully reflect the output that the
FSM being presented to the neural net would produce with history taken into
account.

(5) EXAMPLE
   A flip-flop can be considered as a 2-state FSM with the following
components:
	     INPUTS: (P, Q, N)
	     OUTPUTS: (R, S)
	     STATES: (0, 1)
	     TRANSITION FUNCTION:
	        Ts(P, s) = 0, Ts(Q, s) = 1, Ts(N, s) = s for s = 0, 1.
	        To(P, s) = To(N, 0) = R
	        To(Q, s) = To(N, 1) = S

A valid training sequence would accurately reflect the fact that outputs from
previous time steps are preserved when the input is N:

       (P, R), (N, R), (Q, S), (N, S), (N, S), (P, R), (Q, S), ...

The way in which this condition would be violated is if the next item in the
training sequence were
			    (N, R).

   A demo neural net was written as per (3) with the simplification mentioned
after (d) and successfully trained to follow this FSM.