Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!hplabs!hpcc05!hpwrce!kingsley From: kingsley@hpwrce.HP.COM (Kingsley Morse) Newsgroups: comp.ai Subject: Re: Re: The AI Breakthough -- What It Will Be Like !!! Message-ID: <1870006@hpwrce.HP.COM> Date: 2 Nov 90 18:32:11 GMT References: Organization: HP Western Response Center Lines: 96 OPPORTUNITY If CLASSIFICATION ALGORITHMs that don't use a lot of CPU on large problems exist, they could be applied widely in challenging areas such as speech recognition, forecasting the stock market, forecasting weather, artificial intelligence, pattern recognition, neural nets, etc..... Perhaps you're working on one. See the end of this posting for a detailed description of this type of problem. PROPOSAL I propose we use this entry on the net as a grass roots "CLEARINGHOUSE" for people interested in such algorithms. For example, If you are working with an algorithm which classifies a LOT of data, you can tell others about it here; the algorithm's name and computational complexity for large problems would be helpful. I've started the list, and I'm hoping people will contribute in kind. A natural consequence of enough contributions will be a comprehensive list that specifies the state of the art in classification. PS: I've speculated on some of the complexity measures, please feel free to correct them. Algorithm CPU time needed name Pre-processing Classification (P = number of training patterns C = number of classes (bins) D = number of dimensions in each pattern * = multiplication ** = exponentiation x = a data dependent constant) ========================================================================= Multiple regression P*D**3 D CART (PlnP)*D*(C**x) (x**D)*(lnP) ID3 (PlnP)*D (x**D)*(lnP) nearest neighbor P*D P*D FACT (PlnP)*D*(C**x) (x**D)*(lnP) FIRM (PlnP)*D*(C**x) (x**D)*(lnP) Back propagation P**x D Comments: FACT runs > 100 faster than CART. FIRM is faster than CART or FACT. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Detailed Problem Description Can you suggest a fast algorithm for the following classification problem? I have millions of patterns, and the class that each belongs to. I want to use these patterns (and their corresponding classes) to predict which class new patterns belong to. For example, say we already have the following historical data: Pattern historical patterns, ('Dimensions' or historical Class (or example) 'Keys') (or 'bin') number number =============================================================================== 1 K(1,1) K(1,2) K(1,3) ..... K(1,D) C(4) 2 K(2,1) K(2,2) K(2,3) ..... K(2,D) C(7) 3 K(3,1) K(3,2) K(3,3) ..... K(3,D) C(2) . . . . . . . . . . . . . . . . . . . . . . . . P K(P,1) K(P,2) K(P,3) ..... K(P,D) C(15) I would like to use this historical data to classify new patterns whose class is unknown. I've used class numbers of 4,7,2 and 15 as examples only. Many patterns will belong to the same class. I expect to have millions of historical patterns to start with (say "P" = O(1E6)), hundreds of dimensions in each pattern (say "D" = O(1E2)) and hundreds of classes (say C = O(1E2)). I'm looking for a FAST algorithm which uses this data to classify new patterns whose class is unknown. The algorithm will probably have two parts, each with it's own computational complexity. 1.) Pre-processing the historical data once to facilitate classifying subsequent new patterns. 2.) Actually classifying new patterns