Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!hplabs!hp-sdd!ucsdhub!calmasd!wlp
From: wlp@calmasd.Prime.COM (Walter L. Peterson, Jr.)
Newsgroups: comp.ai.neural-nets
Subject: Re: PDP programs not working for large nets?
Summary: PDP software porting (longish)
Keywords: Neural Nets, PDP, pa
Message-ID: <216@calmasd.Prime.COM>
Date: 23 Feb 89 16:52:26 GMT
References: <539@tekno.chalmers.se> <2730@usceast.UUCP>
Organization: Prime-Calma, San Diego R&D, Object and Data Management Group
Lines: 115

In article <2730@usceast.UUCP>, fenimore@usceast.UUCP (Fred Fenimore) writes:
> 
>   [stuff deleted] ... 
> part of the course, we were to implement some type of project using one
> of the simulators availible.  What we found with ours was that if you 
> use BP or PA, then you cannot use the block commands in the .net file.
> We tried it on a Vax 11/725 and a Apollo.  Both machines gave either
> a segmentation fault or out of memory error. We spent some time looking
> in the various files to see if we could find the error and to confirm
> that it was a real bug in the code or what.  The semester ended with
> no results so I gave up and coded the project in C. 
>  ... [stuff deleted]

Since there have been several questions about porting the PDP code
lately, I'll post this rather than e-mailing it.

The obvious first question is: are you certain that you declared the
block(s) correctly?  Getting things out of order could cause the
program to attempt to allocate 0 bytes.  For example, if you take
XOR.NET and give it:
                %r 2 2 2 0
                %r 4 1 2 2
rather than :
                %r 2 2 0 2
                %r 4 1 2 2  

the incorrect definition of the sending level of the first block will
cause the system to attempt to allocate 0 nodes for the sending level
and you will get a run-time error such as you describe.  If your network
definitions are correct, then there are several other possibilities;
these should be checked anyway, since the PDP code *IS* sensitive to
compiler and system differences.

First off - the block network definitions DO work. The XOR.NET and
XOR2.NET files that are distributed with the PDP software use them for
BP and there are other network definition files that use them also.  I
have made networks with over 100 nodes in 4 layers (in, out, 2 hidden),
using the block notation and have found no bugs *in the code that 
reads or utilizes* this type of definition.

Note the asterisks above; the emphasis indicates that I did not find
bugs in THAT part of the code; I *DID* find problems elsewhere.  When I
began using the PDP code I found numerous, albeit minor, problems when
I compiled, linked and ran it using TURBO-C V2.0 under MS-DOS V3.1.

The problem which you found seems to be the same as, or close to,
one of the ones which I encountered.  My first attempt to run the
BP program after having re-compiled and relinked it under TURBO-C gave
me the "no memory" error.  As I was using the XOR.* files that
come with the code and had not yet made any mods to the code, I knew
that something was not porting correctly.  After a bit I found that
the PDP code's "shells" around calloc, malloc and realloc allowed an
input parameter of 0 to slip through; if you try to calloc 0 bytes,
calloc returns NULL (at least under TURBO-C), and the code *was*
testing for that, so it reported "no memory" even though plenty was
available.  Having fixed that, I was at least able to get started.
( Note: this error happened soon after the copyright notice was
displayed, before any display comes up on the screen; did yours do
the same? ).
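
To make the zero-size problem concrete, here is a minimal sketch of one
way such a "shell" could be guarded.  The name safe_calloc is made up
for illustration and is not what the PDP sources call their wrapper,
and this is not necessarily the fix I used; it only shows the idea of
never letting a zero-sized request reach calloc, so that NULL can only
mean a genuine allocation failure.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper (the name is an assumption, not from the PDP code).
 * A zero-sized request is bumped to one byte, so a NULL return from
 * calloc can only mean that memory really ran out. */
void *safe_calloc(size_t count, size_t size)
{
    void *p;

    if (count == 0 || size == 0) {
        count = 1;
        size = 1;
    }

    p = calloc(count, size);
    if (p == NULL) {
        fprintf(stderr, "no memory\n");
        exit(1);
    }
    return p;
}
```

A caller would use it exactly like calloc, e.g.
safe_calloc(nunits, sizeof(float)), and free the result as usual.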

*THEN* I hit the *real* problem.  I started getting Floating Point
errors. In a program that uses floats for darn near everything, that
was real fun to track down :-).  ( I need to acknowledge some VERY
helpful hints from Walter Bright and Eric Raymond ).  The actual
problem with the PDP code when ported to compilers and systems other
than the one on which it was written ( SUN UNIX ? ) is in the casting
of floats to doubles and doubles to floats.  The culprits are at the
points where there are calls to exp(x) and pow(y, x). I don't have
the code here and I don't remember offhand in what functions these
occur, but you can use grep to find them.  The solution is relatively
straightforward.  In those functions the return value is computed in
the return statement; change that.  Add a local variable that is
declared as double, do the computations outside of the return
statement, BEING VERY CAREFUL ABOUT USING PROPER CASTING. Assign the
result to the local variable and then return the local variable.
For example:

           ...
           double foo;
        
            ...

           foo = exp( < some expression > );
  
            ...


           return(foo);

This simple expedient should solve your problems.  Also in the
functions that use the pow(y, x)  [ that is, y raised to the x ], y is
ALWAYS 10, so if your C library provides it, you might want to change
this to pow10(x).  

These casting problems can get nasty and can cause problems that are
not easy to track down; however, once you get them fixed the code runs
just fine.  I have been able to make some rather extensive
modifications to the BP code, having gone so far as converting it to
use Scott Fahlman's "Quick-Prop" ( see "Proc. of the 1988 Connectionist
Models Summer School", Morgan Kaufmann, NY, 1988 ).

If you have the time, it might also be helpful to convert the code
from the "old" K&R style to ANSI-C with function prototypes, but that
is really not necessary. If you have a LOT of time and you are using
TURBO-C or some other system which provides good screen IO routines,
you might want to get rid of the CURSES emulation stuff.  That will
eliminate some unnecessary function calls and for long runs of large
models that might help to speed things up.

Good Luck,..


-- 
Walt Peterson.  Prime - Calma San Diego R&D (Object and Data Management Group)
"The opinions expressed here are my own and do not necessarily reflect those
of Prime, Calma, or anyone else."
...{ucbvax|decvax}!sdcsvax!calmasd!wlp