Xref: utzoo comp.ai:2389 comp.lang.prolog:1332
Path: utzoo!utgpu!water!watmath!clyde!att!rutgers!mailrus!cornell!uw-beaver!teknowledge-vaxc!sri-unix!quintus!ok
From: ok@quintus.uucp (Richard A. O'Keefe)
Newsgroups: comp.ai,comp.lang.prolog
Subject: Re: Concept Learning & ID3 (Quinlan) - in prolog
Message-ID: <543@quintus.UUCP>
Date: 16 Oct 88 01:39:59 GMT
References: <395@uiag.UUCP> <39500@aero.ARPA>
Sender: news@quintus.UUCP
Reply-To: ok@quintus.UUCP (Richard A. O'Keefe)
Organization: Quintus Computer Systems, Inc.
Lines: 58

In article <39500@aero.ARPA> mcguire@aero.aerospace.org (Rod McGuire) writes:
>In article <395@uiag.UUCP> gerrit@uiag.UUCP (Cap Gerrit) writes:
>>Does anybody out there in the world has an implementation of
>>the ID3 algorithm of Quinlan


>Prolog is one of the worst possible languages in which to write the
>ID3 algorithm. I don't know if it can be written to be efficient.
>The problem is that one needs to count up statistics in parallel and
>then somewhat randomly access these statistics.  While in prolog it
>is possible to kludge up arrays with mutable elements, 

There is not the slightest need for mutable arrays in an implementation
of ID3 or similar algorithms, and Prolog is not at all a poor choice.
The Iterative Dichotomiser involves two kinds of steps:

    (1) making a sweep through the training set collecting a random
	sample of *incorrectly predicted* examples to add to the
	"window" (this is the "Iterative" part)
    (2) doing a kind of back-to-front radix sort on the contents of
	the "window" to turn it into a decision tree (this is the
	"Dichotomiser" part)

>Since the ID3 algorithm is usually applied to a moderate amount of
>data (let's say, to construct a tree discriminating 10 classes from
>analysis of 5000 attribute vectors each with 20 attributes that can

If you are working with tiny training sets like that, you don't need
ID3.  Quinlan's innovation was the "windowing" technique -- forming
decision trees is nothing new, you will even find a tiny Fortran
implementation in Algorithm AS 165, JRSS -- which permitted him to
work with training sets having millions of examples.  The "windowing"
idea can be applied to many induction schemes:

    {Initialise}
	set the window to a random sample of N1 examples.
    {Induce}
	induce a "rule" from the current window.
    {Evaluate}
	make a pass through the complete training set,
	adding a random sample of N2 examples which are incorrectly
	classified (if fewer than N2 misclassifications, take all).
	If performance was adequate, stop.
    {Iterate}
	Go back to {Induce}.

Two references:
	"Discovering rules by induction from large collections of examples",
	Quinlan, J.R.
	(in) Expert Systems in the micro-electronic age, (Ed. D. Michie)
	Edinburgh University Press, 1979

	"Learning efficient classification procedures",
	Quinlan, J.R.
	(in) Machine learning: an artificial intelligence approach
	Eds Michalski, Carbonell, & Mitchell
	Tioga press, 1983
The algorithm descriptions in these articles are quite clear.