Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!hplabs!hpcc05!hpwrce!kingsley
From: kingsley@hpwrce.HP.COM (Kingsley Morse)
Newsgroups: comp.ai
Subject: Re: Re: The AI Breakthough -- What It Will Be Like !!!
Message-ID: <1870006@hpwrce.HP.COM>
Date: 2 Nov 90 18:32:11 GMT
References: <ERICH.90Oct31191951@near.cs.caltech.edu>
Organization: HP Western Response Center
Lines: 96


                              OPPORTUNITY

If CLASSIFICATION ALGORITHMs that don't use a lot of CPU on large problems 
exist, they could be applied widely in challenging areas such as speech 
recognition, forecasting the stock market, forecasting weather, artificial 
intelligence, pattern recognition, neural nets, etc..... Perhaps you're working
on one. See the end of this posting for a detailed description of this type 
of problem.


                               PROPOSAL

I propose we use this entry on the net as a grass roots "CLEARINGHOUSE" for 
people interested in such algorithms. For example, If you are working with an 
algorithm which classifies a LOT of data, you can tell others about it here;
the algorithm's name and computational complexity for large problems would 
be helpful. 

I've started the list, and I'm hoping people will contribute in kind. A 
natural consequence of enough contributions will be a comprehensive list 
that specifies the state of the art in classification.

PS: I've speculated on some of the complexity measures, please feel free
to correct them.


Algorithm                                      CPU time needed
  name                               Pre-processing            Classification

                                     (P = number of training patterns
                                      C = number of classes (bins)
                                      D = number of dimensions in each pattern
                                      * = multiplication
                                      ** = exponentiation
                                      x = a data dependent constant)
=========================================================================
Multiple regression                        P*D**3                     D
                                    
CART                                       (PlnP)*D*(C**x)         (x**D)*(lnP)

ID3                                        (PlnP)*D                (x**D)*(lnP)

nearest neighbor                           P*D                       P*D
                                
FACT                                       (PlnP)*D*(C**x)         (x**D)*(lnP)

FIRM                                       (PlnP)*D*(C**x)         (x**D)*(lnP)

Back propagation                           P**x                        D


Comments: 

FACT runs > 100 faster than CART.
FIRM is faster than CART or FACT.


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

                     Detailed Problem Description

Can you suggest a fast algorithm for the following classification problem?
I have millions of patterns, and the class that each belongs to. I want to 
use these patterns (and their corresponding classes) to predict which class
new patterns belong to. For example, say we already have the following 
historical data:

Pattern           historical patterns, ('Dimensions' or        historical Class
(or example)      'Keys')                                      (or 'bin') number
number
===============================================================================
 1                K(1,1)     K(1,2)   K(1,3) ..... K(1,D)      C(4)
 2                K(2,1)     K(2,2)   K(2,3) ..... K(2,D)      C(7)
 3                K(3,1)     K(3,2)   K(3,3) ..... K(3,D)      C(2)
 .                  .          .        .            .          . 
 .                  .          .        .            .          . 
 .                  .          .        .            .          . 
 .                  .          .        .            .          . 
 P                K(P,1)     K(P,2)   K(P,3) ..... K(P,D)      C(15)

I would like to use this historical data to classify new patterns whose class
is unknown. I've used class numbers of 4,7,2 and 15 as examples only. Many  
patterns will belong to the same class. I expect to have millions of 
historical patterns to start with (say "P" = O(1E6)), hundreds of dimensions
in each pattern (say "D" = O(1E2)) and hundreds of classes (say C = O(1E2)).

I'm looking for a FAST algorithm which uses this data to classify new patterns 
whose class is unknown. The algorithm will probably have two parts, each with
it's own computational complexity.

1.) Pre-processing the historical data once to facilitate classifying 
    subsequent new patterns.  
             
2.) Actually classifying new patterns