Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!uwm.edu!cs.utexas.edu!hellgate.utah.edu!uplherc!wicat!meph!gsarff
From: gsarff@meph.UUCP (Gary Sarff)
Newsgroups: comp.arch
Subject: Re: Globbing
Message-ID: <00085@meph.UUCP>
Date: 8 Mar 91 18:24:58 GMT
References: <1991Feb18.152347.28521@dgbt.doc.ca> <474@bria> <19217@cbmvax.commodore.com> <5573:Feb2307:19:4491@kramden.acf.nyu.edu>
Reply-To: gsarff@meph.UUCP
Organization: WICAT Systems Inc., Orem Utah
Lines: 123

Sorry for the diatribe, but this posting struck a sensitive spot for me.

In article <5573:Feb2307:19:4491@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <19217@cbmvax.commodore.com> daveh@cbmvax.commodore.com (Dave Haynie) writes:
>> Unless all possible commands fit into the 
>> 	command [flags] arg1..argN globbed_filesystem_arg
>> model, you're pretty much in trouble if you only have shell globbing.
>
>Why? You didn't provide any justification for this statement.
>
>> Program driven globbing doesn't force inconsistency, and certainly shell
>> globbing doesn't force consistency, as UNIX is more than happy to prove to
>> anyone using it.
>
>Why? You didn't provide any justification for this statement.
>
>Name one thing that you could accomplish by moving globbing into
>programs---that you couldn't accomplish at least as easily by modifying
>the shell. After all, you're complaining about the user interface, and
>the shell is the program responsible for that interface.

OK, one thing: you would have to modify the shell to know the argument
types and usages of every utility you are going to run from it.  Somebody
has to type all that information in so the shell will know; it isn't a mind
reader.  And the information may have to be entered more than once for the
different shells (csh, sh, ksh, etc.) _AND_ updated every time some user
wants to add a program, again possibly multiple times for the different
flavours of shell on the system.  Is this easier than _not_ adding code to
the shell and _not_ entering argument information, but doing it _ONCE_ when
the program/utility is written?

>
>Here are some disadvantages: 1. Programs (such as shell scripts) often
>invoke other programs, even with (gasp) arguments. As is, it suffices to
>use an occasional -- to turn off all argument processing. With globbing
>in every program, this would become much harder. 

Really?  (As an aside, from looking at many shell scripts, the escaping of
shell metacharacters is hardly "occasional", and it is one reason some
people hold the opinion voiced by, I believe, Jim Giles, who started this by
saying that shell scripts looked like line noise to him.)

Let's take an example.  The OS I develop on at work (called WMCS) has a
pattern searching utility similar to grep, and all of our OS utilities call
a library routine to parse their command lines.  Suppose we want to scan all
of the .c source files in a directory for the string include*.h  (include
followed by some possibly empty sequence, followed by ".h").  We have the
following command lines:

Example 1:
WMCS:
    wscan *.c include*.h
UNIX:
    grep include\*.h *.c

    Which is easier, or more intuitive?  I have to remember to escape the *.h
    field in UNIX.  Or I could go to the trouble of turning the globbing off
    for this one command and then turning it back on.  
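
A minimal sketch of the UNIX-side choices just described, assuming a POSIX
sh with mktemp available.  The helper show_args and the scratch files are
made up for illustration; they only print what the shell hands to a program.

```shell
#!/bin/sh
# Hypothetical helper: print each argument it receives on its own line,
# so we can see exactly what the shell passed to the program.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Scratch directory holding two files that the glob include*.h matches.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
: > include_a.h
: > include_b.h

# Unquoted: the shell expands the pattern before the program runs.
unquoted=$(show_args include*.h)

# Backslash-escaped or quoted: the program sees the literal pattern.
escaped=$(show_args include\*.h)
quoted=$(show_args 'include*.h')

# Globbing switched off and back on around one command (sh spelling;
# csh uses "set noglob" and "unset noglob").
set -f
noglob=$(show_args include*.h)
set +f

printf 'unquoted:\n%s\nescaped: %s\nquoted:  %s\nnoglob:  %s\n' \
    "$unquoted" "$escaped" "$quoted" "$noglob"
```

With globbing switched off, a file argument such as *.c is not expanded
either, so that route only helps when the program expands file patterns
itself.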

Example 2:

    And what about the case where there are a _LOT_ of files in the
    directory.  A customer sent in a WREN VII 1.2 Gigabyte hard drive a few
    weeks back and wanted us to rescue the data off of it and prepare a new
    drive to send back to him with his data intact.  One directory on that
    drive contained over 24,000 files!  (Big! application) My command still
    works (even delivered the files in alphabetical order!) The best unix
    could do was print "Command line too long", because I had used * for file
    wildcarding.  A lot of help that is, it didn't even scan my files!  

Which is easier now?  Oh, the UNIX way, I should have thought of that and
used "find" or written a shell script on the command line and suffered the
process creation overhead as the thing loaded and ran grep 24,000 times,
silly me.  Which looks easier now?
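
A hedged sketch of the find route mentioned above, assuming an xargs that
batches its input onto as few command lines as will fit (POSIX xargs does;
the post does not say what the system at hand offered).  The function name
scan_tree and the scratch files are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: list the .c files under directory $1 that contain
# a match for grep pattern $2.  find walks the tree itself, and xargs packs
# many file names onto each grep command line, so the argument list stays
# under the system limit and grep starts once per batch, not once per file.
# Caveat: plain xargs splits on whitespace, so file names containing
# spaces or newlines would need extra care.
scan_tree() {
    find "$1" -name '*.c' -print | xargs grep -l -- "$2"
}

# Small demonstration on a scratch directory.
scratch=$(mktemp -d) || exit 1
printf '#include <stdio.h>\n' > "$scratch/a.c"
printf 'int x;\n' > "$scratch/b.c"

# The pattern is a grep regular expression, so "include, anything, .h"
# is spelled include.*\.h here.
hits=$(scan_tree "$scratch" 'include.*\.h')
printf '%s\n' "$hits"

# The limit behind the "too long" error, where getconf reports it.
getconf ARG_MAX 2>/dev/null || echo "ARG_MAX not reported"
```

This still costs the user a pipeline and two quoted patterns, which is the
usability point at issue; it only addresses the per-file process creation.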

Example 3:

    And what about this case, I want to do the same scan on the entire disk,
    so for the two OS's we get the following:

WMCS:
    wscan /*/*.c include*.h
UNIX:
    (probably something like a shell loop, or using find)
    find / -name \*.c -exec grep include\*.h \{\} \;

Now which is easier?  Five backslashes for UNIX, the perfect environment for
developers?  Bah! 

Or of course remember to turn the shell globbing off for the find and then
turn it back on.  Again Bah!
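
For comparison, a small sketch of the same kind of find command spelled with
backslashes and with single quotes, assuming a POSIX sh.  It runs on a
scratch tree rather than on /, matches only *.c files, and writes the
pattern as a grep regular expression; the file names are made up.

```shell
#!/bin/sh
# Build a small scratch tree instead of scanning the whole disk.
scratch=$(mktemp -d) || exit 1
mkdir "$scratch/src" || exit 1
printf '#include <stdio.h>\n' > "$scratch/src/a.c"
printf '#include <stdio.h>\n' > "$scratch/src/notes.txt"

# Backslash spelling: every character the shell would interpret is escaped.
with_backslashes=$(find "$scratch" -name \*.c \
    -exec grep -l include.\*\\.h \{\} \;)

# Single-quote spelling: in sh the braces need no escaping, so the only
# backslash left for the shell is the one protecting the semicolon.
with_quotes=$(find "$scratch" -name '*.c' \
    -exec grep -l 'include.*\.h' {} \;)

printf 'backslashes: %s\nquotes:      %s\n' "$with_backslashes" "$with_quotes"
```

Both spellings hand find and grep identical arguments; the user still has to
know which characters the shell would otherwise claim.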

>3. Programmers shouldn't be forced to manually handle
>standard conventions just to write a conventional program. Ever heard of
>modularity?

Oh, but programmers and users should be forced to remember which arguments
need to be escaped and which don't, or turn off globbing and turn it back on,
or have to write shell scripts to hide the backslashes and the quotes from
the light of day, and remember that they can't put too many files in one
directory or all the unix utilities that use shell globbing will not work in
that directory?

And this seems reasonable to you?

Every time I have asked which seems easier above, I meant for the user.
Many of us here in comp.arch have "users" of our computers, our OS's, our
software, and in turn we ourselves are users of software and OS's to get our
work done.  I for one have more important things to do, like improving the
kernel and utilities, than to spend time remembering what should be quoted
and what should not.

>4. The system is slow enough as is without every application  scanning its
>arguments multiple times and opening up one directory after another.

Either the shell scans the directory or the utility does; how can one be
slower than the other?

---------------------------------------------------------------------------
Do memory page swapping to floppies?, I said, yes we can do that, but you 
haven't lived until you see our machine do swapping over a 1200 Baud modem
line, and keep on ticking.
     ..gsarff@meph.UUCP