Path: utzoo!mnetor!uunet!husc6!hao!oddjob!gargoyle!ihnp4!homxb!mtuxo!mtune!codas!novavax!murphy!dcornutt
From: dcornutt@murphy.UUCP (Dave Cornutt)
Newsgroups: comp.unix.wizards
Subject: Re: globbing in the shell (Was Re: more rm insanity)
Message-ID: <766@murphy.UUCP>
Date: 7 Dec 87 22:14:28 GMT
References: <1257@boulder.Colorado.EDU> <6840002@hpcllmv.HP.COM> <9555@mimsy.UUCP> <12441@think.UUCP>
Organization: Gould CSD, Fort Lauderdale, FL
Lines: 150
Summary: I don't wanna glob no file names, no sir!

In article <12441@think.UUCP>, barmar@think.COM (Barry Margolin) writes:
> In article <6774@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) ) writes:
> >In fact that is a key "win" of UNIX over OSes that make applications deal with
> >globbing.
>
> Ahah, now you've hit one of my favorite complaints about Unix.
>
> I do NOT think it is such a win that wildcard expansion is done by the
> shell, at least not when it is done in the haphazard style that Unix
> shells use. It assumes that all commands take filenames as arguments,
> and that any argument with wildcard characters is supposed to be a
> filename.
>
> A very common counterexample is grep. Its first argument will often
> contain wildcard characters, for example
>
>     grep "foo.*bar"

The problem here is that there are so many punctuation characters that are special to the shell that you have to get in the habit of quoting the pattern anyway, just to be safe. I do agree with you that the "No match" is confusing to novice users. But then, they shouldn't be using C shell anyway.

> I wonder how many new users get screwed when they forget to quote the
> first argument and it says "No match" so they assume that none of the
> files contain the pattern (I think the Bourne shell "solved" this
> problem by making unmatched tokens expand into themselves, but the C
> shell just aborts the line).

If you set "nonomatch", it behaves like the Bourne shell. This is documented; it's been in there at least since 4.2.
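To make the two no-match behaviors concrete, here is a rough Python model of how a shell expands a single word against a directory listing. The names `expand_word` and its `nonomatch` flag are hypothetical (the flag only mirrors the csh variable), and the model is deliberately simplified. It ignores quoting, directory components and brace expansion. It also decides word by word, whereas a real csh reports "No match" only when none of the patterns on the command line matched anything.

```python
import fnmatch

WILDCARDS = "*?["


def expand_word(word, names, nonomatch=True):
    """Expand one word roughly the way a Unix shell would, against `names`.

    Simplified model (an assumption, not any shell's actual code): only
    *, ? and [...] are special, `names` stands in for the entries of the
    current directory, and entries starting with '.' match only when the
    pattern itself starts with '.'.
    """
    if not any(c in word for c in WILDCARDS):
        return [word]  # no metacharacters: passed through untouched
    matches = sorted(
        name for name in names
        if fnmatch.fnmatchcase(name, word)
        and (word.startswith(".") or not name.startswith("."))
    )
    if matches:
        return matches
    if nonomatch:
        return [word]  # sh, or csh with nonomatch set: keep the word as typed
    raise ValueError("No match")  # default csh: report it and run nothing
```

When no file matches, `expand_word("foo.*bar", ["main.c"])` returns `["foo.*bar"]`, so grep still receives its pattern intact. With `nonomatch=False`, the same call raises `ValueError("No match")`. That is the csh surprise Barry describes.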
> Other commands may want to take
> wildcards, although not necessarily to match filenames; for example
>
>     who bar*
>
> should list all the logged-in users whose names begin with "bar"
> (equivalent to "who | grep '^bar'").
>
> It should be up to the command to decide the appropriate context for
> treating arguments as pathnames and performing wildcard expansion.

I've worked on a couple of systems that did this. It sounds great in principle, but if every command interprets the wild card characters differently, you will start to go nuts trying to remember what a wild card character does with a particular command. Take VMS BACKUP, for instance. The wild card means different things depending on whether it's in the first arg or the second, or in a SELECT option, or what direction the data is going in, the phase of the moon, etc. It winds up being a very difficult-to-remember syntax. On the other hand, with C shell, if I put an asterisk in an arg, I *know* what it will do.

> This way, a command that knows it is dangerous, such as rm, can check
> whether it was called with a wildcard and perhaps be more careful. On
> Multics, the delete command does exactly this, querying "Are you sure
> you want to 'delete **' in ?" unless -force is specified.

Barf! Gag! If I say 'rm *', I *mean* 'rm *'! Multics isn't exactly my model of an easy-to-use system.

This is a little off the subject, though. I don't want to start the rm wars up again. I will just say that if you want a safe rm, there are plenty of ways to get one on Unix. On the other hand, if you want an "unsafe" rm in Multics, there is no convenient way to do it. I hate inflexible systems, especially ones that try to second-guess me.

> Globbing in the shell also severely limits the syntax of commands; I
> will admit that this could be seen as a benefit, because it forces
> conformity, but sometimes a minor syntax change can be useful.
> For example, there's no way to write a version of the cp or mv commands
> that takes an alternating list of source and destination pathnames,
> where the source pathnames are permitted to have wildcards. You also
> can't do something like Multics's
>
>     rename foo.** foo.bar.==
>
> (the == is replaced by whatever the ** matched) without writing a
> complicated script that used grep and sed on the output of ls.
>
> Finally, even when an argument is a pathname, it is sometimes not
> allowed to be multiple files. For example, diff takes pathnames, but
> it requires exactly two of them, and ar allows only one archive
> pathname to be specified. On Multics, a command with a syntax like
> this can check whether the argument contains wildcards and complain.
> Diff can check that it received exactly two pathnames, but it won't
> know whether this is simply because one wildcard happened to match
> exactly two files (maybe this was intentional on the user's part, but
> maybe it wasn't), and ar will simply treat the extra arguments as
> member files.

You mean if I have two files named "foo.1xxx" and "foo.2yyy" and I want to diff them, you won't let me type "diff foo.*"? But I *want* to be able to do this!

> So does this mean that globbing MUST be done by the commands
> themselves? Well, yes and no. This is how it is done on Multics,
> although the actual matching is done by a system call for filenames
> (for efficiency, since Multics directories are not directly readable
> by user-mode code, so it saves lots of data copying) and by a library
> subroutine for non-filenames.

VMS has a setup like this. It's an enormous pain in the ass. You have to call this RMS routine and give it the pattern, then keep calling another routine to get the names. And they aren't easy to use; there are all kinds of parameter blocks and things you have to set up.
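Barry's Multics example, `rename foo.** foo.bar.==`, works roughly like this: turn the star name into a pattern that captures what `**` matched, then substitute that text for `==`. The sketch below approximates that in Python. It is not Multics's actual star and equal convention, which is defined in terms of period-separated components and is richer than this. The functions `star_to_regex` and `equal_rename` are hypothetical names, and the sketch only computes the old-to-new mapping without renaming anything.

```python
import re


def star_to_regex(star_name):
    """Translate a simplified star name into a compiled regular expression.

    Assumed rules for this sketch: '**' matches any run of characters
    (including '.'), '*' matches a run within one '.'-separated component,
    '?' matches one non-'.' character. The first '**' is captured as 'dstar'.
    """
    pieces = []
    captured = False
    for tok in re.split(r"(\*\*|\*|\?)", star_name):
        if tok == "**":
            pieces.append(".*" if captured else "(?P<dstar>.*)")
            captured = True
        elif tok == "*":
            pieces.append(r"[^.]*")
        elif tok == "?":
            pieces.append(r"[^.]")
        else:
            pieces.append(re.escape(tok))  # literal text, '.' included
    return re.compile("".join(pieces))


def equal_rename(star_name, equal_name, entries):
    """Map each entry matching star_name to equal_name with '==' filled in."""
    pattern = star_to_regex(star_name)
    renames = {}
    for entry in entries:
        m = pattern.fullmatch(entry)
        if m is None:
            continue  # entry does not match the star name
        dstar = m.groupdict().get("dstar") or ""
        renames[entry] = equal_name.replace("==", dstar)
    return renames
```

For example, `equal_rename("foo.**", "foo.bar.==", ["foo.c", "foo.tar.gz", "bar.c"])` maps `foo.c` to `foo.bar.c` and `foo.tar.gz` to `foo.bar.tar.gz`. The sketch also illustrates Barry's point. The command has to see `foo.**` unexpanded, but a Unix shell would already have replaced it (or left it alone) before the command ran.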
> Some more modern systems allow commands
> to provide information to the command processor that tell it how to do
> the automatic parsing; in this case, this data would specify which
> arguments are pathnames that allow wildcards, and the command
> processor would automatically perform the expansion in the right
> cases.

Again, VMS has just such a setup, and again, it's a pain in the ass. First of all, since the command parser has to know what the commands are and what their syntax is, you have to load in a bunch of tables in order to set up new commands. (You can't just write a program and run it -- the only way to run something without setting up a syntax table is to use a RUN command, which does not allow parameter passing. And I challenge the notion that any system in which it is necessary to use a RUN command qualifies as a modern system.)

The other thing is that, while you complain that sh and csh enforce a rigid command syntax, VMS DCL enforces an even more rigid one -- because so many things are special to the parser, and *there is no way to bypass them.* And the available syntax for command parsing (a programming language in itself) will never quite do exactly what you want.

All in all, I think that it's not the way to go, because it removes flexibility and doesn't give you anything in return. Even the convenience of not having to parse the args yourself in your program is offset by the inconvenience of having to write a syntax table and load it into the DCL, and the gyrations that you have to go through to access the parsed parameters in the program.

P.S.: In the column recently, I have seen a lot of talk about the virtues and failings of "the" UNIX user interface. My question is: what's this "the" stuff? Last time I checked, there was sh (Bourne shell, old and new), csh, ksh, tcsh, msh, and all manner of screen-oriented shell front ends. If you don't like any of these, you can install your own. That's one of the great virtues of UNIX!
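The table-driven scheme in the last quoted paragraph lets each command tell the command processor which arguments are wildcard pathnames. It can be modeled roughly as below. Everything here is invented for illustration: `SYNTAX_TABLE`, `parse_command` and the argument kinds are hypothetical, and the sketch does not show how VMS DCL command definitions actually look.

```python
import fnmatch

WILDCARDS = "*?["

# Hypothetical syntax table: for each command, the kinds of its leading
# arguments, plus the kind for any remaining ones (None = no more allowed).
#   "word"      - passed through untouched (e.g. a grep pattern)
#   "path"      - exactly one pathname; wildcards are rejected
#   "wild_path" - a pathname the command processor expands
SYNTAX_TABLE = {
    "grep": (["word"], "wild_path"),
    "diff": (["path", "path"], None),
}


def parse_command(command, args, names):
    """Expand `args` for `command` using SYNTAX_TABLE and directory entries `names`."""
    fixed, rest = SYNTAX_TABLE[command]
    if rest is None and len(args) != len(fixed):
        raise ValueError(f"{command}: expected {len(fixed)} arguments")
    expanded = []
    for i, arg in enumerate(args):
        kind = fixed[i] if i < len(fixed) else rest
        has_wild = any(c in arg for c in WILDCARDS)
        if kind == "wild_path" and has_wild:
            matches = sorted(n for n in names if fnmatch.fnmatchcase(n, arg))
            if not matches:
                raise ValueError(f"{command}: no files match {arg}")
            expanded.extend(matches)
        elif kind == "path" and has_wild:
            raise ValueError(f"{command}: wildcards not allowed in {arg}")
        else:
            expanded.append(arg)  # literal word, or a path with no wildcards
    return expanded
```

Here `grep`'s pattern passes through unglobbed, which is what Barry wants. The `diff` entry shows the trade-off Dave objects to: `diff foo.*` is rejected even when the pattern matches exactly the two files he means.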
---
Dave Cornutt, Gould Computer Systems, Ft. Lauderdale, FL
[Ignore header, mail to these addresses]
UUCP: ...!{sun,pur-ee,brl-bmd,uunet,bcopen,rb-dc1}!gould!dcornutt
 or ...!{ucf-cs,allegra,codas,hcx1}!novavax!gould!dcornutt
ARPA: dcornutt@gswd-vms.arpa

"The opinions expressed herein are not necessarily those of my employer, not necessarily mine, and probably not necessary."