Path: utzoo!mnetor!uunet!husc6!yale!decvax!ucbvax!CS.ROCHESTER.EDU!nl-kr-request
From: nl-kr-request@CS.ROCHESTER.EDU (NL-KR Moderator Brad Miller)
Newsgroups: comp.ai.nlang-know-rep
Subject: NL-KR Digest Volume 4 No. 32
Message-ID: <8803250007.AA15203@castor.cs.rochester.edu>
Date: 24 Mar 88 21:03:00 GMT
Sender: usenet@ucbvax.BERKELEY.EDU
Reply-To: nl-kr@cs.rochester.edu
Organization: University of Rochester, Department of Computer Science
Lines: 652
Approved: nl-kr@cs.rochester.edu

NL-KR Digest             (3/24/88 16:02:28)            Volume 4 Number 32

Today's Topics:
        Re: Linguistic Theories
        What is a grammar (for)
        What are grammars (for)? -- Filters
        left-associative generation
        I need words
        UPSID
        pro-drop
        English "located" pro-drop

Submissions: NL-KR@CS.ROCHESTER.EDU
Requests, policy: NL-KR-REQUEST@CS.ROCHESTER.EDU
----------------------------------------------------------------------

Date: Fri, 11 Mar 88 03:30 EST
From: Celso Alvarez
Subject: Re: Linguistic Theories

In article <67@dogie.edu> edwards@dogie.macc.wisc.edu ( Mark Edwards) writes:
> I think I could argue that the US as a whole has less culture than
>japan, if only on a certain level. For instance, I would address my
>older brother as "oniisan" (older brother), my older sister as "oneesan"
>(older sister). I would probably almost never use their real names.

It means that the unmarked usage of address terms according to
Japanese communicative conventions is the signaling of "positional"
vs. "personal" roles. This characterizes their interactions as what
other cultures would regard as "formal" (cf. formal events in many
societies where participants are addressed by their situated roles as
"president", "chairperson", etc.).

(LINES DELETED)

> Perhaps this just shows that there are more interpersonal relationships in
>japan than there is here.

I wouldn't say "more", but *different* types of social relationships --
or, rather, different ways of marking them in conversation by evoking
different interactional roles.

C.A.
(sp202-ad@garnet.berkeley.edu.UUCP)

------------------------------

Date: Wed, 23 Mar 88 14:23 EST
From: William L. Rupp
Subject: Re: Linguistic Theories

->In article <3630@killer.UUCP> elg@killer.UUCP (Eric Green) writes:
->>
->>I also seem to vaguely recall that some time in the 15th? century, the King of
->>Spain called together a bunch of scholars, and standardized the Spanish
->>language to great extent. They couldn't straighten out things like soy es est,
->>apparently, but they did standardize the grammar for new words. And the
->>spelling. Ah yes the spelling. Spanish spelling is almost perfectly
->>phonetical. Ya say it the way it spells, neat!

No, that is not true. It is a misconception that Spanish is a phonetic
language. It is true that you do not run into such things as the 'gh'
of 'tough' versus the 'gh' of 'ghost' versus the 'gh' of 'thought.'
That is all to the good, but does not change the fact that all
languages have dialects whose distinctive pronunciations are not
reflected in the written language.

Consider the words 'greasy' and 'idea' in English. In the North and
West, the 's' of 'greasy' is unvoiced. Head south, however, and you
will hear a voiced 's' (greazy). The word 'idea' ends in an 'r' sound
when spoken by many New Englanders, but not by speakers in most other
areas of the country. And what about "Earl bought some oil for his
car" when spoken by a New Yorker? Wouldn't that sound more like "Oil
bought some earl for his car"?

Well, the same thing happens in Spanish, and in all languages. A
standard spelling (orthography) must serve to represent all
pronunciations given to words by the various speakers of the language.

This may not have been quite what you had in mind, but I think it is
valuable to understand how languages work. What it amounts to is this:
a language is phonetic as long as you do not consider different
dialects to be legitimate pronunciations of your language.
However much one may prefer his or her speech to that of people across
the country, few of us would say they are not speaking the same
language. As long as that is true, no written language is truly
phonetic, with the exception of the phonetic symbols used by linguists.

The antiquated spelling of English is particularly annoying, I admit.
The point is, every language is going to have antiquated spelling
eventually because spoken languages change, whereas written languages
stay the same.

By the way, that makes me think of the Spanish Academy issue you
brought up. A language that has had some degree of regularity imposed
upon it in the written area may be worse off in the long run because
the written language is not allowed to change with changing speech
patterns. That is most obvious in English.

By the way, my credentials for commenting in this area, such as they
are: B.A. (Spanish major, German minor), graduate work in Spanish at
U.C.L.A. (two years), 17 years secondary teacher, Spanish, German,
English, etc.

Bill
======================================================================
I speak for myself, and not on behalf of any other person or organization
..........................How's that, Gary?
======================================================================

------------------------------

Date: Fri, 11 Mar 88 11:25 EST
From: Bruce E. Nevin
Subject: What is a grammar (for)

I apologize for the long delay responding. Other obligations interfere
with timeliness and will continue to do so.

NL-KR Digest      (2/29/88 23:51:29)      Volume 4 Number 22
From: John Nerbonne

JN> to speak of a formalism as (over)generating is a category
JN> error. The formalism doesn't generate at all. . . . It just allows
JN> the formulation of rules and grammars whose job is to generate.

Sorry that my use of synecdoche was confusing. This is not at all an
important point in my message, so I didn't give it the care and
precision that perhaps I ought to have.
(The relation of form to information is far more important than the
status of the term "generative", for example.)

JN> Furthermore, it is trivial to avoid overgeneration within any
JN> formalism; just be very cautious in what you include in the
JN> grammar. It's getting things just right that's hard.

"Trivial but very hard." A curiously equivocal understatement!

What I had in mind was that language-like mathematical systems of the
sort enumerated all have characteristics (or generate structures that
have characteristics) that natural languages lack. As a simple example,
PSG rewrite rules can easily and quite naturally generate strings of
the form

        p(n) p(n-1) . . . p(2) p(1) q(1) q(2) . . . q(n-1) q(n)

but structures of this sort (to my knowledge) never occur in natural
languages. There is something inherently disparate about such formal
systems and natural language. (An otherwise intelligent reviewer once
objected that the "respectively" construction in English is an example
of {S -> pSq, S -> pq}, but a moment's thought should demonstrate that
it is not.)

You have given examples of individual sentences that are marginal for
one reason or other. I am talking about structures (corresponding to
like-formed sets of sentences or more-or-less-near-sentences or
non-sentences).

The alternative paradigm to which I have referred is concerned not with
language-like formal systems that may have some relation to natural
language, but rather with the structure of natural language as a
mathematical object, in terms of sets and mappings and the like. If
this seems unclear or unreasonable to you the fault I am sure must be
mine and I urge you to read at least one of the books I have cited.

JN> . . . Chomsky introduced the notion "generative" . . . as referring
JN> to any attempt to describe the syntax of natural language in a
JN> way complete enough to define precisely what is in the language.
By this definition, Gross's lexicon-grammar is a generative grammar,
and Gross argues that TG work is not complete enough to qualify. Of
course the Lexicon-Grammar of Gross is not an instance of Generative
Grammar. Hence the observation that the (capitalized) term has become
a mere trademark. You have not denied this.

JN> He seems to have assumed that this must be a generating device,
JN> or that it might as well be (which assumption has since been
JN> discredited, of course, but not too easily).

Seems to have? I thought this assumption was pretty unequivocal. Could
you elaborate?

BN> The Constructive Grammar of Harris (construction-reduction grammar,
BN> composition-reduction grammar, operator-argument grammar) is an example
BN> of [...] a mathematical theory of precisely those relations and
BN> operations that suffice for language. It does not overgenerate.

JN> I can only take this to mean (a) that Con G doesn't attempt a generative
JN> characterization at all, in which case the news that it doesn't
JN> overgenerate is tautological, and not tidings of comfort or joy;

How do you get from the above to the notion that CG is not an "attempt
to describe the syntax of natural language in a way complete enough to
define precisely what is in the language"? Or are you now using a
different definition of "generative"? (And what rhetorical purpose can
you have substituting "Con" for "C"? Please explain.)

Again, I am referring to generation of structures. At the "growing
edge" of language change there are unavoidably sentences and sets of
sentences (structures) that are more or less marginal. Some are on
their way out of the language, some on their way in, some of currently
indeterminate status. I do not consider this "overgeneration" in the
same sense.
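
[Illustration, not part of the original message.] The structural kind
of overgeneration at issue can be made concrete with the rule pair
{S -> pSq, S -> pq} cited earlier in this message. The following is a
minimal sketch in Python, assuming we number each p/q pair by nesting
depth so the mirror-image dependencies are visible; the function name
`derive` and the numbering scheme are illustrative assumptions, not
anything proposed in the discussion.

```python
def derive(n):
    """Derive a string with n nested p/q pairs using the two rules
    S -> p S q and S -> p q, numbering pairs from the innermost outward."""
    if n < 1:
        raise ValueError("n must be at least 1")
    form = ["S"]  # sentential form, starting from the start symbol
    for depth in range(n, 0, -1):
        i = form.index("S")  # there is always exactly one S to rewrite
        if depth > 1:
            # S -> p S q (recursive rule, keeps one S in the middle)
            form[i:i + 1] = [f"p{depth}", "S", f"q{depth}"]
        else:
            # S -> p q (terminating rule)
            form[i:i + 1] = ["p1", "q1"]
    return form


if __name__ == "__main__":
    print(" ".join(derive(3)))
```

For n = 3 the derivation yields p3 p2 p1 q1 q2 q3: each p is paired
with the q at the same depth, so the dependencies nest rather than
cross.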
All of this business of marginally acceptable sentences becomes much
more regular and tidy in sublanguage grammar, and also somewhat more
regular when you limit the grammar to a single regional dialect and
social dialect, so the relation between contiguous grammars ("transfer
grammar") is the key to forging some kind of description of the
"language as a whole". This is also the key to understanding language
change. To my knowledge, there is no work on sublanguage grammar, next
to none on dialect, and nothing coherent on language change in the
various flavors of Generative Grammar. Please correct me if I'm wrong.
(Maybe Kiparsky's done something recent that does more than polish a
synchronic axe with isolated diachronic examples.)

"A grammar that is generative in Chomsky's sense (a complete
characterization of what is in the language)" is not possible when one
assumes that a language as a whole comprises a set (infinite) of
sentences that is synchronically characterizable. That assumption is
contrary to fact. A synchronic grammar of a sublanguage may be
possible. The relations between sublanguages are not well understood,
and attempts to make a grammar of the language as a whole ignoring
sublanguage can only founder in the complexities of those relations.

JN> or (b) that it can provide a generative characterization of English
JN> syntax. Since this would be a solution (even though not a unique solution)
JN> to all the problems of descriptive syntax, my confidence in the
JN> scholarly community is such that I think would have heard more of
JN> the details by now. I just don't believe (b). Con G [sic] must be a
JN> different game, so to speak.

Come, now, this is a blatant instance of the kind of argument that
folks have been objecting to--some of us claiming that it is an
offensive characteristic of Generativist argumentation, others
defensively denying that they do any such thing. "Nobody I respect
believes that, so it must not be true."
(The medieval schoolmen had some Aristotelian term for it, ad hominem
is not the one. Maybe ad verecundiam?)

The problem is that the Generative community is provincial and
self-involved and ignores things outside their own closed system of
bibliographical citations. I have given you a number of references to
examine, and I assure you they are accessible and acceptable as
literature for the "scholarly community" of linguistics.

Frawley's review of GEMP in _Language_ 60.1:180 (1988) displays a
number of profound misconstruals. For instance, the claim that Harris
"relies essentially on a first-order predicate logic consisting of
operators and arguments for a notational system to capture the
fundamental lexical collocations of the language" is an absurd
inversion--the system arrived at turns out to have some superficial
resemblances to predicate calculus, but the latter was not applied as a
formal system to explain the linguistic data, and to suppose it was is
to presume the approach exemplified by Generative Grammar, starting
with a more or less well-understood formal system as a framework and
trying to make the facts of language fit into it.

Bruce Nevin
bn@cch.bbn.com

------------------------------

Date: Fri, 11 Mar 88 12:46 EST
From: John Nerbonne
Subject: What are Grammars for?

In reply to:
  Date: Wed, 2 Mar 88 11:55 EST
  From: Rick Wojcik
  Subject: What are grammars (for)?

RW> Generative grammars do not reduce the problem of multiple parses. You
RW> don't get multiple parses in the first place without grammars.

There was a time, not too long ago, when the use of grammar in parsing
(in natural language processing) was regarded as controversial.
Parsing, e.g. as conceived by Riesbeck and by Wilks, consisted of
assigning a semantics to strings, and multiple parses occurred when
more than one semantics was assigned. (There was even a time, in
pre-ALGOL days, when the use of grammars in parsing (in compiling) was
unknown.)
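
[Illustration, not part of the original message.] For readers who want
"multiple parses" in the grammar-based sense made concrete, here is a
minimal sketch in Python that counts the distinct parse trees a toy
context-free grammar assigns to a string, using a CYK-style chart. The
grammar, the lexicon, and the example sentence are all assumptions
chosen for illustration (the classic prepositional-phrase attachment
case); none of them comes from the discussion itself.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (parent, left child, right child).
# Hypothetical rules, chosen only to show PP-attachment ambiguity.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP modifies the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP modifies the noun phrase
    ("PP", "P", "NP"),
]

# Toy lexicon: word -> categories it may belong to.
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(words, start="S"):
    """Return how many distinct parse trees the toy grammar assigns."""
    n = len(words)
    chart = defaultdict(int)  # (i, j, symbol) -> number of trees over words[i:j]
    for i, word in enumerate(words):
        for symbol in LEXICON.get(word, []):
            chart[i, i + 1, symbol] += 1
    for width in range(2, n + 1):          # widen spans bottom-up
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):      # every split point of the span
                for parent, left, right in BINARY_RULES:
                    chart[i, j, parent] += chart[i, k, left] * chart[k, j, right]
    return chart[0, n, start]


if __name__ == "__main__":
    print(count_parses("I saw the man with the telescope".split()))
```

Under these assumed rules the example sentence receives two parses, one
attaching the prepositional phrase to the verb phrase and one to the
noun phrase. Deleting either attachment rule leaves a single parse,
which is one simple sense in which the contents of a grammar determine
how much ambiguity is postulated.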
I realize there's an equivocation in 'parsing' that is in play here.
Parsing may be conceived as basically semantic analysis, in which case
the use of grammar is an orthogonal issue; or it may be thought of as
grammatical analysis, in which case some grammar is at least implicit
in the procedures.

The point, however, is that natural language expressions are ambiguous,
and that the information in grammars may be used to reduce the degree
of ambiguity one would otherwise postulate. This is useful in building
natural language understanding systems.

If it's now regarded as inconceivable that one might obtain multiple
parses except in using grammar, I suggest that that's because it is now
regarded as nearly inconceivable not to use the grammar-based approach.
We all do.

RW> [...] In fact,
RW> generative linguistic theory is not designed to explain how grammars are
RW> used in language understanding. It is left up to the psychologist or
RW> computer scientist to address the issue.

This is correct, but beside the point. Computers were designed as
numerical calculators, but turned out to be splendid information
storage devices. Generative grammars were proposed as standards of
precision in linguistic analysis, but turn out to be useful in language
processing.

--John Nerbonne
nerbonne@hplabs.hp.com

------------------------------

Date: Fri, 11 Mar 88 20:17 EST
From: HESTVIK%BRANDEIS.BITNET@MITVMA.MIT.EDU
Subject: what grammar is for (final version of reply to Rick Wojcik)

From: BINAH::HESTVIK "Arild Hestvik, Brandeis University" 11-MAR-1988 20:04
To: Orig_To! nl-kr@cs.rochester.edu, HESTVIK
Subj: what grammar is for (reply to Rick Wojcek)

Rick Wojcik replied to me:

Arild Hestvik (2/25) writes:
AH>
AH> be of no use in understanding natural language. We want the grammar to tell
AH> us *why* a string is grammatical or ungrammatical, i.e. the grammar should
AH> give a structural description (an analysis) of both well-formed AND
AH> ill-formed expressions.
RW>One can use a generative grammar to give structural analyses to parts of
RW>ill-formed strings. This is one possible use of a chart parser in NLP
RW>systems. But having a lot of well-formed pieces does not tell you how
RW>to render the expression interpretable. There are so many reasons why a
RW>string could be ungrammatical that there is really no hope of building
RW>an automated string-repair device into a grammar. Given the way in
RW>which we currently define grammars, it is self-contradictory to talk
RW>about grammars that analyze ill-formedness. You would have to develop a
RW>concept of well-formed ill-formedness.

You have misunderstood. As has been pointed out repeatedly by Chomsky,
the notion "grammar" is ambiguous between the two meanings: (i) the
theory about the grammar, and (ii) the grammar itself. The THEORY of
the grammar will tell us WHY something is illformed or wellformed, just
as much as a theory of physics will tell us why, if you drop a stone,
it falls to the ground and doesn't instead go flying up in the sky,
which certainly is imaginable. The action of not falling to the ground
is comparable to say, an ill-formed sentence from the point of view of
theoretical linguistics.

Of course, there could be many reasons why the stone would not fall to
the ground! For example, someone might blow water at the stone with a
garden hose. Certainly it would be absurd to require the theory of
physics to explain that fact. Similarly, there may be many reasons why
a sentence is perceived as ungrammatical; maybe someone threw a
firecracker at the moment I said it and you missed a word. But the
sentence 'How did you wonder whether Bill fixed the car', with the
intended reading that 'how' is a question about the manner of fixing,
is ill-formed for a very specific reason from the point of view of
theoretical linguistics. These are the *kinds* of ill-formed sentences
we want the grammar to tell us something about.
We want to know this because apparently similar sentences, like 'How
did you say that John fixed the car', are perfectly fine with 'how'
modifying 'fix'. That is, something illformed is only interesting from
the point of view of a theoretical issue. Of course, the sentence 'John
John John John' is ill-formed for very many reasons (too numerous and
unknown to be listed), but the point is that for whatever reason, it is
not interesting from the point of view of current research. It simply
doesn't tell us anything about the questions we are interested in.

It appears that Rick Wojcik thinks that the main interest of linguists
is empirical coverage (i.e. to account for any possible string of words
you might care to put together). However, that would be very misleading
(at least for part of the field). Rather, the main interest is to try
to understand the very nature of grammars under (ii) above, namely the
psychologically represented mechanism that underlies e.g. language
acquisition and language processing.

Arild Hestvik
Dept. of Psychology
Brandeis University
[hestvik@brandeis.bitnet]

------------------------------

Date: Tue, 15 Mar 88 15:54 EST
From: Stephan Busemann
Subject: What are grammars (for)? -- Filters

In reply to John Nerbonne's article concerning filters in
NL-KR Digest 4 (24):

1. Metagrammar, grammar, and formalism

JN> The concept of "generation" in GPSG is twofold due to its scheme
JN> of first generating a grammar, then having the grammar generate the
JN> object language.
JN> Schematically: Metagrammar ==> Grammar ==> Language
JN> In any case, the GPSG concepts refer to metagrammatical devices used
JN> in specifying grammars, not ones used directly for language
JN> generation.

I don't think so. As belonging to the metagrammar I consider metarules
or notational requisites like H (for Head, whatever category this may
be in a rule). You can expand metarules to get a set of ID rules, which
is part of the grammar.
Similarly, you can replace a symbol like H by every possible Head
category, thereby extending your set of ID rules, and get rid of the
meta stuff. This would cover the first '==>' in JN's scheme. What is
certainly not intended in this first step is to apply Feature
Cooccurrence Restrictions (FCRs), Feature Instantiation Principles
(FIPs) and so on; this would require enumerating the admissible
syntactic structures of a language!

I find it useful to introduce the notion of formalism as opposed to
grammar. The formalism does not depend on properties of some particular
language while a grammar (the set of ID rules, LP statements, FCRs)
specifies the language-particular properties. The formalism comprises
e.g. the FIPs and the definitions of the nature of FCRs, ID rules,
local trees etc. (This distinction is often referred to as universal
vs. language-specific part of grammar.) From a computational point of
view, the formalism represents the machinery to 'run a grammar'. It is
depicted by the second '==>' in JN's schema. What I considered to be
filters therefore applies to the grammar.

2. GPSG has no filters

JN> A filter in grammar is a device that allows an otherwise legitimate
JN> derivation to abort. You run the rules, get a structure, then check
JN> it against the filter. The filter throws some things out.

I agree: Such things should not happen in GPSG (and at least don't
happen with the examples under consideration). However, there ARE
things thrown out, but these are no derivations. Rather they are items
that follow the definitions (of a category or a local tree) but do not
belong to the relations specified by FCRs or FIPs.

3. GPSG-based processing requires filtering

JN> SB says that Feature Cooccurrence Restrictions etc. are nothing but
JN> filters, but the analogy is poor. A FCR simply restricts the
JN> categories in a language so that we know e.g. that NP[subcat -]
JN> is an available category and NP[subcat +] is not.
JN> In a standard
JN> CFG specification, this is achieved via the provision of nonterminal
JN> symbols, not via filters. Similarly, it seems to me for linear
JN> precedence principles and feature instantiation principles.

This seems to bring the descriptive aspect to the fore. From a
computational point of view, however, we cannot simply have some FCR
decide upon each category whether or not it is legal. Before this can
happen, we have to generate (write down) every category. Unfortunately
this turns out to be expensive (Ristad gives a lower bound of 10^774
for the grammar in Gazdar et al. 85; see Procs. 24th ACL p.31). Most of
the categories will then be thrown out by virtue of the FCRs. It is
important to note that this depends on the way FCRs are supposed to
work and not on the formulation of some particular FCR (so it does not
depend on language-particular things)! This process I called 'filter'.

The whole point can be clarified by distinguishing descriptive and
procedural aspects of a grammar formalism.

Looking at the descriptive aspects of filters and GPSG devices, we note
that the former are (refer to a) part of the grammar while the latter
are not; consequently, the former abort derivations while the latter
prevent derivations from being generated. This is one of JN's points
(as I understood him).

Looking from a procedural (computational) point of view, filters as
well as GPSG devices decide about the wellformedness of some input.
They behave in an identical way. This was the basis for my analogy.

What I find most interesting in this discussion is that it's obviously
necessary to bear both sides of the coin in mind, the descriptive as
well as the procedural one.

Stephan Busemann
busemann@db0tui11.bitnet

------------------------------

Date: Mon, 14 Mar 88 12:00 EST
From: a.e. mossberg
Subject: left-associative generation

I would appreciate references to work on left-associative grammar used
in text generation, including summaries of work in progress.
I would also be interested in any further work after Hausser 1986 in
left associative analysis.

Andrew Mossberg
aem@miavax.miami.edu
Univ. of Miami Dept of Math and Computer Science

------------------------------

Date: Tue, 22 Mar 88 14:29 EST
From: Fridrik Skulason
Subject: I need words

Since there is no comp.data.wanted group ...

Can anyone help me with the following:

What I need are lists of ~100,000 words in various languages, English,
German, and the Scandinavian languages.

The English list should be easy, since some versions of 'spell' come
with an uncompressed, public-domain list, but does anyone have such
lists in the other languages?

I already have such a list for Icelandic, but I really need it for the
other languages.

--
Fridrik Skulason        University of Iceland
UUCP  frisk@rhi.uucp    BIX  frisk

This line intentionally left blank ...................

------------------------------

Date: Mon, 21 Mar 88 23:55 EST
From: Mark William Hopkins
Subject: UPSID

Does anyone have any information as to where I can get an on-line copy
of the UCLA Phonological Inventory (UPSID)? or the Stanford inventory?

------------------------------

Date: Sun, 20 Mar 88 19:41 EST
From: kathryn henniss
Subject: pro-drop

Re: prodrop

Just to hammer one more nail in the coffin of the recent conjecture
that pro-drop is necessarily linked to verbal agreement morphology...

Malayalam (a Dravidian language) has subject-drop, object-drop,
indirect-object drop and even preposition-phrase drop; basically, any
argument of a verb may be omitted. The interpretation of these null
arguments may be definite (when there is a discourse context from which
the reference of the omitted arguments may be inferred) or indefinite
(in the absence of such a context).

And Malayalam has no morphological agreement whatsoever.

I suspect that there is a difference between subject pro-drop and
non-subject pro-drop (i.e.
there may be something behind the null-subject parameter of GB, much as
I hate to admit it), since there are LOTS of languages with subject
pro-drop, and not so many with non-subject pro-drop. Furthermore, there
is an implicational relationship between the two: no languages
systematically allow non-subjects to be dropped, while requiring overt
subjects; yet all languages with non-subject pro-drop also have subject
pro-drop.

Kathryn Henniss

--
+-----------------------------------------------------------------+
| Kathryn Henniss                   Department of Linguistics     |
| henniss@csli.stanford.edu         Stanford University           |
+-----------------------------------------------------------------+

------------------------------

Date: Tue, 22 Mar 88 04:56 EST
From: Rod McGuire
Subject: English "located" pro-drop

In this note I take "pro-drop" to mean omission of an NP because there
is enough context to reconstruct its semantic presence (as opposed to
being some formal feature defined in somebody's pet theory). In many
cases a pro-dropped sentence can have the missing pronouns reinserted
and be a perfectly acceptable sentence, but not always, as we shall
see.

In article <2935@csli.STANFORD.EDU> henniss@csli.UUCP (kathryn henniss) writes:
>Just to hammer one more nail in the coffin of the recent
>conjecture that pro-drop is necessarily linked to verbal
>agreement morphology...
.....
>I suspect that there is a difference between subject pro-drop
>and non-subject pro-drop (i.e. there may be something behind
>the null-subject parameter of GB, much as I hate to admit it),
>since there are LOTS of languages with subject pro-drop, and
>not so many with non-subject pro-drop. Furthermore, there is
>an implicational relationship between the two: no languages
>systematically allow non-subjects to be dropped, while requiring
>overt subjects; yet all languages with non-subject pro-drop also
>have subject pro-drop.
Even though proper English does not allow subject pro-drop, it occurs
quite often in a context that I will call "located" NPs. These are
cases where enough spatial context has been established so that stating
a relation can imply the object of the relation - for example, objects
of prepositions:

    Fred got too close to the edge of the cliff and fell off [the cliff].
    A bus pulled up and I got on [the bus].

With transfer verbs (taking an object and destination) the dropped NP
can create an orphan preposition that behaves suspiciously like a
particle (i.e. it is attracted to the verb):

    Here's the barbecue grill. Can I put on a burger for you?
    (= put a burger on it/the grill)

It seems weird to consider such constructions as containing lexical
particles, in the same sense that "up" in "throw up" (= vomit) is. Yet,
they do not allow reinsertion of pronouns in the attracted form:

    *Can I put on it a burger for you?     [Can your theory handle this?]

This example brings up the peculiarities of "dative movement" (which
happen with verbs describing transfer of possession). It is well known
that certain verbs allow an indirect object to appear directly after
the verb (often prefer it if IO is a pronoun), while other possession
transfer verbs do not permit this:

    John gave Mary a book
    John gave a book to Mary
    Fred returned the book to the library.
    *Fred returned the library the book.

What never seems to be mentioned in discussions of dative movement is
that the verbs that do not allow IO attraction, do (as far as I know)
allow the IO to be dropped. And those that do don't:

    Fred took out a book from the library, and returned it the next day.
    *Mary asked John for the book so John gave it.

My pet theory is that

1) verbs such as "return" are semantically rich in that they establish
an abstract spatial framework (Schank's scripts, Fillmore's frames)
that locates the recipient of the transfer.
It is impossible to have a "return" event without having a "borrow"
event, and knowing who an object was borrowed from tells you who it
will be returned to. Verbs such as "give" tell you next to nothing
about the recipient.

2) Dative attraction to a verb usually happens in situations where the
IO is "given" information. For verbs that locate their IO, since the IO
is given it can be omitted. Why this is required is beyond me. Maybe
languages like to have nice neat patterns, or maybe linguists like to
find them when they are only partially there.

------------------------------

End of NL-KR Digest
*******************