Xref: utzoo comp.ai:5024 comp.ai.neural-nets:1078
Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!wuarchive!gem.mps.ohio-state.edu!ctrsol!sdsu!ucsd!ogccse!orstcs!tgd
From: tgd@orstcs.CS.ORST.EDU (Tom Dietterich)
Newsgroups: comp.ai,comp.ai.neural-nets
Subject: Re: Backpropagation applications
Summary: NETtalk error rates
Keywords: Neural Networks, Efficient Learning
Message-ID: <13659@orstcs.CS.ORST.EDU>
Date: 9 Nov 89 06:10:26 GMT
References: <1690@cod.NOSC.MIL> <77404@linus.UUCP>
Organization: Oregon State University, Corvallis
Lines: 43


Your accuracy claims for NETtalk are greatly exaggerated.  I have
replicated the NETtalk study using the same training data: in this
case, I trained on 1000 words chosen at random from the 20000-word
dictionary provided by Sejnowski.

After running back propagation for 30 epochs using the parameters
given in Sejnowski and Rosenberg (1986), I obtain the following
results.  Testing is performed on a randomly chosen test set of 1000
words.


                      WORDS  LETTERS  PHONEME  STRESS  BITS
-----------------------------------------------------------
BP            TRAIN:   65.3     94.0     97.0    96.4  99.5
              TEST :   14.9     71.6     81.8    81.4  96.7

Numbers give the percentage correct.

Explanation:
  TRAIN: performance on the training set
  TEST: performance on the test set
  BITS: average performance on the 26 output bits of the network
  STRESS: performance on the 5 stress bits
  PHONEME: performance on the 21 phoneme bits
  LETTERS: performance on whole letters (i.e., all 26 bits must be
correct)
  WORDS: performance on whole words (i.e., every letter in the word
must be correct)
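
To make the relationship between these measures concrete, here is a
minimal sketch of how they could be computed.  It is not the code used
for the results above.  It assumes the outputs have already been turned
into 0/1 bit vectors, that the phoneme bits come first and the stress
bits last (an ordering I am assuming for illustration), and that a
letter counts as correct only when all of its bits match.  The function
name and data layout are hypothetical.

```python
def accuracy_levels(words, n_phoneme_bits=21):
    """Percent correct at the bit, stress, phoneme, letter and word level.

    words: list of words; each word is a list of (predicted, target)
    pairs, one per letter, where both are equal-length 0/1 sequences.
    Assumption: the first n_phoneme_bits are phoneme bits and the
    remaining bits are stress bits.
    """
    bit_hits = bit_total = 0
    phoneme_hits = stress_hits = letter_hits = letter_total = 0
    word_hits = 0
    for letters in words:
        word_ok = True
        for predicted, target in letters:
            matches = [p == t for p, t in zip(predicted, target)]
            bit_hits += sum(matches)
            bit_total += len(matches)
            phoneme_ok = all(matches[:n_phoneme_bits])
            stress_ok = all(matches[n_phoneme_bits:])
            phoneme_hits += phoneme_ok
            stress_hits += stress_ok
            # A letter is correct only if every one of its bits is correct.
            letter_ok = phoneme_ok and stress_ok
            letter_hits += letter_ok
            letter_total += 1
            word_ok = word_ok and letter_ok
        # A word is correct only if every one of its letters is correct.
        word_hits += word_ok
    return {
        "words": 100.0 * word_hits / len(words),
        "letters": 100.0 * letter_hits / letter_total,
        "phoneme": 100.0 * phoneme_hits / letter_total,
        "stress": 100.0 * stress_hits / letter_total,
        "bits": 100.0 * bit_hits / bit_total,
    }
```

Because one wrong bit spoils a letter and one wrong letter spoils a
word, the word-level figure can be far lower than the bit-level figure,
as it is in the table.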

The NETtalk network has 120 hidden units, 203 input units (which code,
very sparsely, a 7-letter window), and 26 output units (which code in a
distributed fashion the 54 phonemes and 6 stresses).  The 26 output
bits are mapped to the nearest phoneme/stress combination that was
observed in the training data.  (That is, a pass was made over the
training data to find all phoneme/stress pairs appearing in the data.
Decoding considers only those pairs.  Ties are broken in favor of the
phoneme/stress pair that appeared more frequently.)  This decoding
scheme is superior to decoding to the nearest syntactically legal
phoneme/stress pair.
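
The decoding step can be sketched roughly as follows.  This is an
illustration only, not the code I ran.  It assumes squared Euclidean
distance as the meaning of "nearest" (the distance measure is not
stated above), and the names build_codebook, decode and encode are
hypothetical; encode stands for whatever function maps a
phoneme/stress pair to its target bit vector.

```python
from collections import Counter


def build_codebook(training_pairs, encode):
    """One pass over the training data: collect every observed
    (phoneme, stress) pair with its bit code and its frequency."""
    counts = Counter(training_pairs)
    return [(pair, encode(pair), n) for pair, n in counts.items()]


def decode(outputs, codebook):
    """Map the network's output vector to the nearest observed pair.

    Assumption: "nearest" means smallest squared Euclidean distance.
    Ties are broken in favor of the more frequent pair.
    """
    def key(entry):
        pair, code, count = entry
        dist = sum((o - c) ** 2 for o, c in zip(outputs, code))
        return (dist, -count)  # smaller distance first, then higher count

    return min(codebook, key=key)[0]
```

Restricting the codebook to pairs seen in training means the decoder
can never output a combination that did not occur in the training data,
which is the difference from decoding to the nearest syntactically
legal pair.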


--Tom Dietterich