Xref: utzoo comp.arch:21375 comp.os.misc:1647
Path: utzoo!news-server.csri.toronto.edu!rutgers!cmcl2!kramden.acf.nyu.edu!brnstnd
From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: comp.arch,comp.os.misc
Subject: Re: Globbing
Message-ID: <5946:Mar1122:11:0691@kramden.acf.nyu.edu>
Date: 11 Mar 91 22:11:06 GMT
References: <19217@cbmvax.commodore.com> <5573:Feb2307:19:4491@kramden.acf.nyu.edu> <00085@meph.UUCP>
Followup-To: comp.os.misc
Organization: IR
Lines: 127

In article <00085@meph.UUCP> gsarff@meph.UUCP writes:
> >Name one thing that you could accomplish by moving globbing into
> >programs---that you couldn't accomplish at least as easily by modifying
> >the shell. After all, you're complaining about the user interface, and
> >the shell is the program responsible for that interface.
> Ok, one thing, modifying the shell to know about all the argument
> types/usages of all the utilities you are going to run from it.

This has nothing to do with globbing. (The easiest way to do this under
current UNIXen is to have getopt() or parseargs() or your pet
argument-processing library recognize some switch, like -, to report
what it knows about the arguments recognized by the program. Then the
shell can do the rest. Even this would be simpler if the shell did all
argument processing to begin with, but it's too late for that change.)

> >Here are some disadvantages: 1. Programs (such as shell scripts) often
> >invoke other programs, even with (gasp) arguments. As is, it suffices to
> >use an occasional -- to turn off all argument processing. With globbing
> >in every program, this would become much harder.
> Really?

Yes, really. There are lots of examples of programs that exec other
programs, from /bin/nice on up, not to mention shell scripts. If they
don't glob their arguments, they're being inconsistent. If they do glob
their arguments, then they have to quote them again for the sub-program.
This is inefficient and IMNSFHO stupid.

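A minimal sketch of the point about programs that exec other programs,
assuming a POSIX sh and a mktemp that supports -d. The names wrap and
show_args are hypothetical, made up for illustration; wrap stands in for
anything with a syntax like that of /bin/nice. Because the shell has
already globbed, the wrapper forwards its arguments untouched and never
has to quote anything again for the sub-program.

```shell
#!/bin/sh
# Hypothetical wrapper in the style of /bin/nice: it runs another
# program and hands over its arguments exactly as it received them.
wrap() {
    "$@"
}

# Hypothetical helper: print each argument on its own line, in brackets,
# so the argument boundaries are visible.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Scratch directory, created purely for this illustration
# (mktemp -d is widely available but not required by POSIX).
demo_dir=$(mktemp -d) || exit 1
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch a.c b.c

# The shell globs *.c once; wrap only ever sees the resulting names.
wrap show_args *.c      # prints [a.c] then [b.c]

# Quoting once is enough to get a literal pattern through the wrapper.
wrap show_args '*.c'    # prints [*.c]
```

If wrap globbed its own arguments, the second call would need another
layer of quoting to survive the trip to show_args.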
> WMCS:
> wscan *.c include*.h
> UNIX:
> grep include\*.h *.c
> Which is easier, or more intuitive?

*.c is globbed the same way in both examples; the difference between
wscan's include*.h and grep's 'include.*\.h' is just that grep has a
more powerful pattern-matching syntax. This pattern-matching has nothing
to do with globbing. Globbing is a certain type of pattern-matching
*upon existing files*.

> I have to remember to escape the *.h
> field in UNIX.

Obviously if the pattern-matcher and globber recognize the same
characters, then you have to do *something* to say whether you're trying
to glob or to pattern-match. You may believe that it's better to pass
this information positionally than explicitly. In either case it's the
shell's problem.

> And what about the case where there are a _LOT_ of files in the
> directory.

I and many others have been pushing for utilities that understand
(null-terminated) lists of filenames passed through a descriptor. Then
as long as echo * (or echo0 *) works, you can pass arbitrarily many
filenames to any program. You can already do this with find, of course,
though its syntax is more powerful and hence less concise.

> Which is easier now? Oh, the UNIX way, I should have thought of that and
> used "find" or written a shell script on the command line and suffered the
> process creation overhead as the thing loaded and ran grep 24,000 times,
> silly me.

It makes sense to me to say ``find every file in the current directory
and its subdirectories, and print the null-terminated list on output;
have the matcher read the null-terminated list from its input and search
for a pattern in each file in that list.''

    find . -print0 | match -i0 pattern

Hardly inefficient. Current systems don't have this, but xargs does the
job well enough.

You want a more concise syntax? Fine. Put it into your shell. That's
what shells are for. Different shells have different levels of support
for different types of globbing.
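
A rough sketch of the null-terminated pipeline described above, using
xargs in place of the matcher. It assumes a find that accepts -print0
and an xargs that accepts -0 (the GNU and BSD versions do); the function
name search_tree and the sample files are made up for illustration. grep
is started once per batch of filenames, not once per file.

```shell
#!/bin/sh
# Hypothetical helper: list the files under directory $2 that contain a
# match for the grep regular expression $1.
# Assumes find -print0 and xargs -0 are available.
search_tree() {
    # Filenames travel as a null-terminated list, so spaces in names are
    # harmless. /dev/null is appended so grep always has a file operand,
    # even if xargs runs it with an empty list.
    find "$2" -type f -print0 | xargs -0 grep -l -e "$1" /dev/null
}

# Scratch tree, created purely for this illustration
# (mktemp -d is widely available but not required by POSIX).
tree=$(mktemp -d) || exit 1
trap 'rm -rf "$tree"' EXIT
printf '#include <stdio.h>\n' > "$tree/has include.c"
printf 'int x;\n' > "$tree/plain.c"

# Prints the path of "has include.c" only.
search_tree 'include.*\.h' "$tree"
```

The pattern is quoted once, for the shell, and nothing downstream globs
it again.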
In any case there is absolutely no reason to stick the globbing logic
into applications.

> find / -name \* -exec grep include\*.h \{\} \;

That seems an awfully complex way to write

    find / -exec grep 'include.h' '{}' \;

Oh, by the way: Should find glob its arguments or not? Well? Should it
pass the globbed arguments to grep or not? Should it quote the results
of its globbing?

> >3. Programmers shouldn't be forced to manually handle
> >standard conventions just to write a conventional program. Ever heard of
> >modularity?
> Oh, but programmers and users should be forced to remember which arguments
> need to be escaped and which don't,

They don't. You quote everything that you don't want your shell to
interpret. Done.

> and remember that they can't put too many files in one
> directory or all the unix utilities that use shell globbing will not work in
> that directory?

I agree that it is a problem that so many utilities refuse to take file
lists from a descriptor. This is a good reason to make those utilities
work better. This is not a reason to take globbing out of the shell.
echo * works perfectly in every csh I've seen and the newer sh's.

(For many applications it would make even more sense to have one stream
encode not only the file names but their contents. This would solve
problems like grep'ing through compressed files without making a
specialized grep that understands compression. The streams could be in
tar or cpio format, but those formats are both too complex and too
restricted for general use. See my forthcoming article in
comp.unix.shell.)

> And this seems reasonable to you?

Yes.

> >4. The system is slow enough as is without every application scanning its
> >arguments multiple times and opening up one directory after another.
> Either the shell scans the directory or the utility does, how can one be
> slower than the other?

Again consider the case of applications with a syntax like that of
/bin/nice. Do they scan their arguments or not?

---Dan