Xref: utzoo comp.ai:2407 comp.lang.prolog:1342
Path: utzoo!attcan!uunet!pyrdc!pyrnj!rutgers!orstcs!mist!tgd
From: tgd@mist.cs.orst.edu (Tom Dietterich)
Newsgroups: comp.ai,comp.lang.prolog
Subject: Re: Concept Learning & ID3 (Quinlan) - in prolog
Message-ID: <6934@orstcs.CS.ORST.EDU>
Date: 18 Oct 88 16:27:45 GMT
References: <395@uiag.UUCP> <39500@aero.ARPA> <543@quintus.UUCP>
Sender: usenet@orstcs.CS.ORST.EDU
Reply-To: tgd@mist.UUCP (Tom Dietterich)
Organization: Oregon State University - CS - Corvallis, Oregon
Lines: 31

There is evidence that the windowing feature of ID3 does not provide
much benefit. For details, consult the following paper:

   Wirth, J. and Catlett, J. (1988). Experiments on the costs and
   benefits of windowing in ID3. In Proceedings of the Fifth
   International Conference on Machine Learning, Ann Arbor, MI,
   pp. 87--99. Available from Morgan Kaufmann, Inc., Los Altos, CA.

Here is the abstract:

   "Quinlan's machine learning system ID3 uses a method called
   windowing to deal economically with large training sets. This
   paper describes a series of experiments performed to investigate
   the merits of this technique. In nearly every experiment, the use
   of windowing considerably increased the CPU requirements of ID3,
   but produced no significant benefits. We conclude that in noisy
   domains (where ID3 is now commonly used), windowing should be
   avoided."

The paper reports several studies involving training sets as large as
20,000 examples. The authors state that if you have enough physical
memory to store all the examples, it is best to avoid windowing.
Windowing seems to work best on noise-free training sets with many
redundant features. Such training sets turn out to be rather uncommon,
although the domains in which ID3 was originally developed had these
properties.

--Tom Dietterich
tgd@cs.orst.edu
Department of Computer Science
Oregon State University
Corvallis, OR 97331
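For readers who have not seen the windowing loop the abstract refers to: ID3 grows a tree from a random subset of the training set (the "window"), classifies the examples outside the window with that tree, adds a selection of the misclassified ones to the window, and repeats until no outside example is misclassified. The Python below is only a minimal sketch of that loop under my own assumptions (categorical attributes stored in dicts, plain information gain, no pruning). The names `id3_windowed`, `window_size`, and `max_added` are hypothetical, and this is not Quinlan's code or the setup Wirth and Catlett measured.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Node:
    attr: str          # attribute tested at this node
    branches: dict     # attribute value -> subtree (Node) or class label
    majority: object   # fallback label for attribute values unseen in training


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def best_attribute(examples, attributes):
    """Attribute with the highest information gain (lowest expected entropy after the split)."""
    def remainder(attr):
        groups = {}
        for x, y in examples:
            groups.setdefault(x[attr], []).append(y)
        return sum(len(ys) / len(examples) * entropy(ys) for ys in groups.values())
    return min(attributes, key=remainder)


def id3(examples, attributes):
    """Plain ID3 on (dict-of-attributes, label) pairs; no pruning, no windowing."""
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority
    attr = best_attribute(examples, attributes)
    rest = [a for a in attributes if a != attr]
    branches = {}
    for value in {x[attr] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[attr] == value]
        branches[value] = id3(subset, rest)
    return Node(attr, branches, majority)


def classify(tree, x):
    """Walk the tree from the root to a leaf label."""
    while isinstance(tree, Node):
        if x[tree.attr] not in tree.branches:
            return tree.majority
        tree = tree.branches[x[tree.attr]]
    return tree


def id3_windowed(examples, attributes, window_size, max_added, seed=0):
    """Hypothetical windowing loop around id3().

    Assumes 1 <= window_size <= len(examples) and max_added >= 1.
    Returns the final tree and the final window size.
    """
    rng = random.Random(seed)
    in_window = set(rng.sample(range(len(examples)), window_size))
    while True:
        tree = id3([examples[i] for i in sorted(in_window)], attributes)
        # Exceptions: examples outside the window that the current tree gets wrong.
        exceptions = [i for i in range(len(examples))
                      if i not in in_window
                      and classify(tree, examples[i][0]) != examples[i][1]]
        if not exceptions:
            return tree, len(in_window)
        # Add a bounded selection of exceptions (here simply the first max_added),
        # then rebuild the tree from scratch on the enlarged window.
        in_window.update(exceptions[:max_added])
```

For example, on a noise-free set where the class is `a AND b` over three binary attributes, `id3_windowed(rows, ["a", "b", "c"], window_size=10, max_added=5)` stops once the tree built from the window is correct on every outside example. On noisy data, mislabeled examples outside the window keep turning up as exceptions, so the window tends to grow toward the whole training set while the tree is rebuilt on every pass. That is one plausible intuition for the extra CPU cost the abstract reports, not the paper's own analysis.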