Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!columbia!caip!im4u!milano!begeman
From: begeman@milano.UUCP
Newsgroups: net.cog-eng,net.research
Subject: Re: Utterances
Message-ID: <2134@milano.UUCP>
Date: Fri, 22-Aug-86 15:00:40 EDT
Article-I.D.: milano.2134
Posted: Fri Aug 22 15:00:40 1986
Date-Received: Fri, 22-Aug-86 22:17:22 EDT
References: <639@bcsaic.UUCP>
Sender: begeman@milano.UUCP
Distribution: net
Organization: MCC, Austin, TX
Lines: 64
Summary: A statistical note: Shakespeare and butterflies
Xref: mnetor net.cog-eng:262 net.research:409

In article <639@bcsaic.UUCP>, douglas@bcsaic.UUCP (douglas schuler) writes:
> I have a very simple question that I'd like an answer to.
>
> What percentage of the utterances made in the world have been made before?
>

An interesting, though slightly ambiguous, question. What follows is a
report of a statistical technique used to answer similar questions,
such as:

	How many words did Shakespeare know, but not use in his works?
	How many species of butterfly exist on this island that I did
	not capture?

The January 24, 1986 issue of Science magazine (p. 335) has an article
titled "Shakespeare's New Poem: An Ode to Statistics" which spells
this out:

In the 1940s, a biologist was collecting butterflies in Malaysia and
noticed that he caught members of some species dozens of times, some
species several times, and others just once. He told a statistician,
Sir Ronald Fisher, which species he had seen (and how many times he
had seen each), and then asked: how many species are there that he did
*not* see? Assuming that the butterflies were captured at random, in
proportion to how many of each species there are, the question is,
surprisingly, answerable.

The same technique was recently applied to test the authenticity of
an alleged Shakespearean poem discovered in Oxford in 1985.
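
For anyone who wants to play with the butterfly calculation, here is a
minimal sketch in Python. It is not taken from the Science article. It
uses the alternating-series estimate usually credited to Good and
Toulmin, which also appears in the Efron and Thisted paper: if n_k
species were seen exactly k times, then a further sample t times as
large as the first is expected to turn up about
n_1*t - n_2*t^2 + n_3*t^3 - ... new species. The function names and the
toy catch are my own inventions for illustration, and the series is
only well behaved for t no larger than 1.

```python
from collections import Counter


def frequency_of_frequencies(sample):
    """Map k -> number of distinct species seen exactly k times."""
    species_counts = Counter(sample)          # species -> times seen
    return Counter(species_counts.values())   # k -> how many such species


def expected_new_species(sample, t=1.0):
    """Estimate the number of new species in a further sample that is
    t times as large as the original (hypothetical helper, t <= 1)."""
    n = frequency_of_frequencies(sample)
    # Alternating series: + n_1*t - n_2*t^2 + n_3*t^3 - ...
    return sum((-1) ** (k + 1) * (t ** k) * n_k for k, n_k in n.items())


# Toy catch (made-up data): one species seen three times, two species
# seen twice, four species seen once.
catch = ["a"] * 3 + ["b"] * 2 + ["c"] * 2 + ["d", "e", "f", "g"]
print(expected_new_species(catch))  # 4 - 2 + 1 = 3.0
```

The same bookkeeping works for words: replace the list of butterflies
with the list of words in a body of text, and the species seen once
become the words used once.
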
Fortunately for the researchers, all of Shakespeare's works have been
put into machine-readable form. Using a modified "butterfly"
technique, the researchers predicted how many words in a new work of
a given length would never have been used before, used just once
before, used twice before, and so on. (The new poem, by the way, fell
into the Shakespearean pattern beautifully - it is now considered to
be an authentic "find".)

For more details on the technique, please check out the original
article in Science. There was a follow-up letter and clarification in
the March 21, 1986 issue.

Now as for how many utterances in the world have been made before,
well, if we start counting....

References:

R.A. Fisher, A.S. Corbet, C.B. Williams, "The relation between the
number of species and the number of individuals in a random sample of
an animal population," J. Anim. Ecol. 12, 42 (1943).

B. Efron and R. Thisted, "Estimating the number of unseen species:
How many words did Shakespeare know?" Biometrika 63, 435 (1976).

"Shakespeare's New Poem: An Ode to Statistics," Science 231, 335
(1986).

-- 
Of all the things I've lost, I miss my mind the most.

Michael L. Begeman
Microelectronics and Computer Technology Corp
Software Technology Program
Austin (where the sun always shines) Texas

uucp: {ihnp4, seismo, harvard, gatech, pyramid}!ut-sally!im4u!milano!begeman
arpa: begeman@mcc.ARPA