Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!wuarchive!uunet!aplcen!jhunix!ins_atge
From: ins_atge@jhunix.HCF.JHU.EDU (Thomas G Edwards)
Newsgroups: comp.ai.neural-nets
Subject: Re: Help for RTRL?
Message-ID: <6119@jhunix.HCF.JHU.EDU>
Date: 16 Aug 90 19:38:09 GMT
References: <1243.26cac1c4@waikato.ac.nz>
Reply-To: ins_atge@jhunix.UUCP (Thomas G Edwards)
Organization: The Johns Hopkins University - HCF
Lines: 37

In article <1243.26cac1c4@waikato.ac.nz> coms2146@waikato.ac.nz (Alistair Veitch, University of Waikato, New Zealand) writes:
>Has anybody out there worked with Williams and Zipsers "Real-time recurrent
>learning algorithm"? [Connection Science, Vol 1, No 1].

I haven't actually implemented this algorithm, but I have heard
that it is important to use the "Teacher Forcing" method
they discuss to learn difficult problems.
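
For what it's worth, here is a rough sketch of the teacher forcing idea as
I understand it: when a unit has a desired value at some time step, that
desired value (rather than the unit's actual output) is fed back into the
network at the next step. This is only the forward pass of a small fully
recurrent network; the RTRL weight update is omitted, and the names
(step, run, the weight layout, the dict of targets) are my own assumptions
for illustration, not anything taken from the paper.

```python
import math

def step(weights, prev_outputs, external_inputs):
    """One forward step of a small fully recurrent network (assumed layout).

    Each row of `weights` belongs to one unit and covers all previous
    unit outputs, then the external inputs, then a constant bias of 1.0.
    """
    z = list(prev_outputs) + list(external_inputs) + [1.0]
    return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, z))))
            for row in weights]

def run(weights, inputs, targets, teacher_forcing=True):
    """Run the network over a sequence and return the actual outputs.

    targets[t] maps unit index -> desired value at step t (may be empty).
    With teacher forcing, a unit that has a target feeds the target value,
    not its own output, back into the next step.
    """
    fed_back = [0.0] * len(weights)
    outputs = []
    for x_t, d_t in zip(inputs, targets):
        y = step(weights, fed_back, x_t)
        outputs.append(y)
        fed_back = list(y)
        if teacher_forcing:
            for k, d in d_t.items():
                fed_back[k] = d  # replace actual output with desired value
    return outputs

# Hypothetical two-unit, one-input network; unit 0 has a target at step 0.
weights = [[1.0, 0.0, 0.0, 0.0],
           [0.5, 0.0, 0.0, 0.0]]
inputs = [[0.0], [0.0]]
targets = [{0: 1.0}, {}]
forced = run(weights, inputs, targets, teacher_forcing=True)
free = run(weights, inputs, targets, teacher_forcing=False)
```

The first step is the same in both runs; from the second step on they
differ, because the forced run sees the desired value 1.0 from unit 0
where the free run sees the actual output 0.5.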

You might also want to look at J. Schmidhuber, "Making the World
Differentiable:  On using supervised learning fully-recurrent
networks for dynamic reinforcement learning and planning in non-stationary
environments", FKI Report 125-90, Technische Universitaet Muenchen,
1990.  In it, a pole-balancer is trained by reinforcement learning (i.e.
pain is applied when the pole is dropped).

For an explanation of why gradient-descent methods alone will probably
not give you reasonable temporal learning, see J. Schmidhuber, "Towards
compositional learning with dynamic neural networks",
FKI Report 129-90, TUM, April 1990.

He explains that gradient-descent-only methods must take into
account everything learned during all past time steps when dealing with
a new problem.  For "toy" temporal learning problems, this is not
a big impediment.  For "serious" temporal learning problems,
dynamic neural systems must develop ways of breaking goals down
into subgoals, most of which have already been learned and only some of
which need to be learned by gradient-descent.  In this way, only small
problems are trained by gradient-descent, and the system combines them
so that the network-of-networks can solve real problems by
"divide-and-conquer" methods.

Research in this area is very fresh, and I think that in about a year
there will be a move away from naive implementations of gradient-descent
learning, in both stationary and temporal learning, and towards
connectionist compositional learning (Cascade-Correlation
is a simple example of this).

-Thomas Edwards