Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!aplcen!haven!ncifcrf!lhc!usenet From: usenet@nlm.nih.gov (usenet news poster) Newsgroups: comp.ai.neural-nets Subject: Re: Really smart systems Keywords: Complexity, interesting. Message-ID: <1990Aug17.211911.5619@nlm.nih.gov> Date: 17 Aug 90 21:19:11 GMT References: <3430007@hpwrce.HP.COM> <650892116.17086@minster.york.ac.uk> Reply-To: states@tech.NLM.NIH.GOV (David States) Organization: National Library of Medicine, Bethesda, Md. Lines: 70 r> russell@uk.ac.york.minster (russell) writes: km> in respone to kingsley@hpwrce.HP.COM (Kingsley Morse): km> I'll post some km> computational complexity figures for some common algorithms. If you add to km> (correct?) these, please do. The terminology that I propose is that km> linear computational complexity is better than polynomial which is km> better than exponential. Furthermore, the algorithms' scalability must be km> measured with respect to learning, recall, patterns, and dimensions... r> Well, if you want tuppeny-worth's thrown in, here's mine. Backprop has r> NO guarenteed convergence, therefore to quote computational complexity r> is misleading. It may be possible to give an "average case" complexity r> figure, but this is so dependent on the initial weight settings as to r> be not too much use. Tesauro and Janssens report empirical results r> between learning time and predicate order (q) of patterns. The net has r> q inputs, 2q nodes in the first layer, fully connected, and 1 output r> node. The task set is the parity function on the q bits. Leaning r> times in b-p scale as approx. 4^q. The task set is of size 2^q so the r> training time is about 2^q. Conclusion is that empirically learning r> time is of exponential order. r> r> The perceptron scales as exponential in input size (Hampson and Volper 1986). r> r> (See Neural Network Design and the Complexity of Learning by Judd for r> more.....I aint read it all yet, but it's interesting so far.....) r> r> I confess to not understanding quite what "Cart" means in the above r> table - I can make an educated guess, however..... From a separate communiction with km: CART means "Classification and Regression Tree". It is similar to ID3, and here's how it works. The training vectors are used to "grow" a decision tree. The decision tree can be used catagorize new inputs. OK, let me add a couple more cents to the pot. Classifications need to consider both the computational complexity and the storage requirements. Algorithms like nearest neighbor, CART? and ADAM? require that the complete training set be stored for recall while a perceptron needs storage of the order #inputs+#outputs, and a single hidden layer fully connected neural net needs (#inputs+#output)*#hidden_nodes. A second consideration is will the algorithm generalize? Ie., will it make use of information from more than one input pattern to formulate an output. So at the risk of grossly misquoting and being flamed horribly let me reorder the classification in terms of Np=#patterns, Ni=#inputs, No=#outputs, and Nh=#hidden nodes: Algorithm Learning time Recall time Storage Generalizes ------------------------------------------------------------------------------ Perceptron No*(x^Ni) No*Ni No*Ni limited Neural Net (1 hidden layer) Nh*(Ni+No) Nh*(Ni+No) yes Backprop x^(Nh*(Ni+No)) ? Conjugate gradient (Nh*(Ni+No))^3 - locally Monte Carlo x^(Nh*(Ni+No)) ? Nearest neighbor (Ni+No)*Np Ni*Np Np*(Ni+No) no Class & Reg. Tree Ni*(Np log Np) Ni*log Np Np*(Ni+No) no (CART) + (Ni+No)*Np ? + Ni*log Np Adaptive Memory Np? ? Np*(Ni+No)? no? (ADAM) David States