Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!wuarchive!uunet!aplcen!jhunix!ins_atge
From: ins_atge@jhunix.HCF.JHU.EDU (Thomas G Edwards)
Newsgroups: comp.ai.neural-nets
Subject: Re: Help for RTRL?
Message-ID: <6119@jhunix.HCF.JHU.EDU>
Date: 16 Aug 90 19:38:09 GMT
References: <1243.26cac1c4@waikato.ac.nz>
Reply-To: ins_atge@jhunix.UUCP (Thomas G Edwards)
Organization: The Johns Hopkins University - HCF
Lines: 37

In article <1243.26cac1c4@waikato.ac.nz> coms2146@waikato.ac.nz (Alistair Veitch, University of Waikato, New Zealand) writes:
>Has anybody out there worked with Williams and Zipser's "Real-time recurrent
>learning algorithm"? [Connection Science, Vol 1, No 1].

I haven't actually implemented this algorithm, but I have heard that it is important to use the "Teacher Forcing" method they discuss in order to learn difficult problems.

You might also want to look at J. Schmidhuber, "Making the World Differentiable: On using supervised learning fully-recurrent networks for dynamic reinforcement learning and planning in non-stationary environments", FKI Report 125-90, Technische Universität München, 1990. In it, a pole-balancer is trained by reinforcement learning (i.e., a "pain" signal is applied when the pole is dropped).

For an explanation of why gradient-descent methods alone will probably not give you reasonable temporal learning, see J. Schmidhuber, "Towards compositional learning with dynamic neural networks", FKI Report 129-90, TUM, April 1990. He explains that purely gradient-descent methods must take into account everything learned during all past time steps when dealing with a new problem. For "toy" temporal learning problems, this is not a big impediment. For "serious" temporal learning problems, dynamic neural systems must develop methods of breaking goals down into subgoals, most of which have already been learned and only some of which need to be learned by gradient descent.
In this way, only small problems are trained by gradient descent, and the system combines their solutions so that the network-of-networks can solve real problems by "divide-and-conquer" methods.

Research in this area is very fresh, and I think that within about a year there will be a move away from naive gradient-descent learning, in both stationary and temporal problems, and towards connectionist compositional learning (Cascade-Correlation is a simple example of this).

-Thomas Edwards
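The RTRL algorithm from the original question, together with the teacher forcing mentioned above, can be sketched roughly as follows. This is a minimal, non-authoritative Python sketch of the usual statement of the Williams and Zipser scheme, assuming logistic units where every unit sees all external inputs, a bias line, and every unit's previous output. The sensitivities p[k][i][j] = dy_k/dw_ij are carried forward in time, weights are updated at every step ("real time"), and when teacher forcing is on, each unit with a target is fed its desired value on the next step and its sensitivities are reset to zero. The class name RTRLNet, the method names, the learning rate of 0.5, and the +/-0.5 weight initialisation are hypothetical choices for illustration, not taken from the paper.

```python
import math
import random


def logistic(s):
    return 1.0 / (1.0 + math.exp(-s))


class RTRLNet:
    """Hypothetical sketch of a fully recurrent net trained by RTRL.

    p[k][i][j] holds the sensitivity dy_k / dw_ij of unit k's output
    with respect to the weight from input line j into unit i.
    """

    def __init__(self, n_units, n_inputs, lr=0.5, seed=0):
        rng = random.Random(seed)
        self.n = n_units
        self.m = n_inputs + 1              # external inputs plus a bias line
        self.cols = self.m + self.n        # z(t) = [x(t), 1, y(t)]
        # Assumed initialisation: uniform in [-0.5, 0.5].
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(self.cols)]
                  for _ in range(self.n)]
        self.lr = lr
        self.reset()

    def reset(self):
        """Zero unit outputs and all sensitivities (start of a new sequence)."""
        self.y = [0.0] * self.n
        self.p = [[[0.0] * self.cols for _ in range(self.n)]
                  for _ in range(self.n)]

    def step(self, x, targets, teacher_forcing=True):
        """Advance one time step on input list x.

        targets maps unit index -> desired output for this step.
        Returns the squared error summed over the target units.
        """
        n, m, cols = self.n, self.m, self.cols
        z = list(x) + [1.0] + self.y
        s = [sum(self.w[k][l] * z[l] for l in range(cols)) for k in range(n)]
        y_new = [logistic(sk) for sk in s]
        fprime = [yk * (1.0 - yk) for yk in y_new]   # logistic derivative

        # Sensitivity recursion:
        # p_new[k][i][j] = f'(s_k) * (sum_l w[k][m+l] * p[l][i][j] + [i == k] * z[j])
        p_new = [[[0.0] * cols for _ in range(n)] for _ in range(n)]
        for k in range(n):
            for i in range(n):
                for j in range(cols):
                    acc = sum(self.w[k][m + l] * self.p[l][i][j] for l in range(n))
                    if i == k:
                        acc += z[j]
                    p_new[k][i][j] = fprime[k] * acc

        # Errors only on units that have a target at this step.
        err = {k: d - y_new[k] for k, d in targets.items()}

        # Real-time weight update: dw_ij = lr * sum_k e_k * p_new[k][i][j].
        for i in range(n):
            for j in range(cols):
                g = sum(e * p_new[k][i][j] for k, e in err.items())
                self.w[i][j] += self.lr * g

        # Teacher forcing: feed back the desired value instead of the actual
        # output, and zero that unit's sensitivities, since the forced value
        # does not depend on the weights.
        if teacher_forcing:
            for k, d in targets.items():
                y_new[k] = d
                p_new[k] = [[0.0] * cols for _ in range(n)]

        self.y = y_new
        self.p = p_new
        return sum(e * e for e in err.values())
```

For example, calling net.step([x], {0: d}) once per time step trains unit 0 toward target d, and net.reset() starts a new sequence. Each step touches all n x n x (n + inputs + 1) sensitivities and sums over n units, so the cost grows roughly as n^4 per time step; the sketch is only meant for small networks.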