Path: utzoo!dptcdc!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!hoptoad!tim
From: tim@hoptoad.uucp (Tim Maroney)
Newsgroups: comp.sys.mac.programmer
Subject: Re: Reading Between the Lines
Message-ID: <7021@hoptoad.uucp>
Date: 16 Apr 89 22:16:07 GMT
References: <451@biar.UUCP> <28839@apple.Apple.COM> <4012@ece-csc.UUCP> <6987@hoptoad.uucp> <4015@ece-csc.UUCP> <7015@hoptoad.uucp> <2551@cps3xx.UUCP>
Reply-To: tim@hoptoad.UUCP (Tim Maroney)
Distribution: na
Organization: Eclectic Software, San Francisco
Lines: 101

In article <2551@cps3xx.UUCP> rang@cpswh.cps.msu.edu (Anton Rang) writes:
>1. Why should an OS provide newline support when high-level languages
>   also provide it? To make life easier for the developer of a HLL.
>   Also, suppose that a program uses both C and Pascal, using both
>   fgets() and readln(). If the OS provides the newline support then
>   you don't have (much) duplication of code in the support libraries.

Could be true of Pascal, but not of C. C's "stdio" buffered i/o library
does a lot more than just read lines. Most C compilers use code licensed
from AT&T Bell Labs for at least some part of stdio, and this assumes an
underlying OS file system is being used for block-structured reads. It
would actually be considerably harder (and less efficient) to use the OS
to do line-oriented reads. So, the OS might make it easier for a Pascal
implementer to write readln, but it wouldn't help a C implementer, nor
would it reduce functional overlap in library code between a program
incorporating both C and Pascal.

>2. Using individual read calls is slow; why use them? Well, they're
>   probably always slower than doing stuff at a very low level--I can
>   write my own disk I/O routines and read stuff faster by totally
>   bypassing the file manager.

And break over LANs, other external file systems, new system releases,
etc.

>   Just as one answer, maybe there's a
>   reason I don't want to allocate a big fixed-size buffer for
>   reading this file--after all, the smallest size which would make
>   sense for a buffer is a disk block. Maybe I'm trying to conserve
>   memory in an INIT; maybe I need to read the file without worrying
>   about running out of memory in the process.

First, you allocate the buffer before you do any reading at all, so
there's no chance you can run out in the middle of the operation.
Second, you just get the biggest buffer you can given the current memory
space limitations. If there's enough for the whole file, go for it; if
there's only 512 bytes in the largest buffer you can allocate, use that
instead. (Though if you're that low on storage, you probably won't be
able to read in the file anyway....)

>3. Why do stuff inefficiently during development which we'd make more
>   efficient for a production program anyway? Perhaps I'm porting a
>   program from another operating system.

To the Mac? Maybe as an MPW Tool, but everyone who's tried to do this
kind of porting on a real application has wound up with awfully ugly
results. There's a real philosophical difference between prompt driven
software (the computer telling the user what to do) and event driven
software (the user telling the computer what to do). I can see porting
specific libraries without user interfaces to the Mac, e.g., a B-tree
database package for developers, but forget about porting ordinary
programs.

>4. A bit more complex. Joseph Hall claims that reading as much as
>   possible on each read call isn't necessarily the key to speed.
>   Tim says it's speculation. One point here--if allocating a 32K
>   buffer to read a text file quickly means swapping out 32K of code
>   from somewhere, this might be true. A procedure which counts the
>   number of lines in a text file may well find that using a huge
>   buffer is overkill.

I have to admit -- I never swap out code.
I use too many function pointers and segment unloading seems like an
anachronism from the 128K Mac days. Now everybody gets a chance to take
shots at me for not using this great feature of the Mac. One more
point -- 32K is hardly a huge buffer on a megabyte machine.

>5. A final note (of my own). Tim says that "if you're reading a line
>   at a time on any machine, it's likely you're taking a performance
>   hit." Just to make things a little more complicated, I'd just
>   like to say that there are systems which do NOT require any
>   specific character to mark the end of a line--if you say writeln()
>   it writes out your data, whether it contains ^M or ^J or whatever.
>   On these systems, reading data block-by-block and trying to figure
>   out the end of a line is either near-impossible or just plain slow.

Er, good point. You're right. It's been so long since I've done any VMS
programming that I forgot about line-structured files. Of course, the
VMS people at DEC finally got around to implementing byte-stream files a
few years ago, and everyone treated this as a great step forward....

>6. Tim says "And writing a loop to turn blocks into lines on your own
>   is so easy that a first-semester programmer could do it."
>   Probably true. But writing an *efficient* loop probably means
>   using assembly language, at least until some decent optimizing
>   compilers are widely available on the Mac.

First, MPW C 3.0 is supposedly a pretty smart optimizer. Second, I
don't agree. Any good compiler can create reasonably good code for a
simple loop of this kind. With an old C compiler, you may have to use
register declarations, but there's no reason a compiler can't produce
code as good as assembler for a "for" loop. (I refuse to use register
declarations in 1989; the techniques of register optimization have been
well understood for more than a dozen years now, and a compiler that
doesn't use them is brain damaged. I'm only using LSC now because my
client preferred it.)
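
For illustration only, here is a minimal sketch of the kind of loop under
discussion in points 2, 4 and 6: grab the biggest buffer that will fit,
read whole blocks, and scan them for line ends. Nothing here comes from
any real product. The names alloc_read_buffer and count_lines are
hypothetical, the halving strategy is just one assumed way to "get the
biggest buffer you can", and portable stdio fread stands in for direct
File Manager calls. The end-of-line character is a parameter, since the
Mac uses ^M and Unix uses ^J.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MIN_BLOCK 512  /* one disk block: smallest buffer worth using */

/* Hypothetical helper: ask for `want` bytes and halve the request until
   malloc succeeds or we are down to a single disk block.
   Stores the size actually obtained in *got (0 on failure). */
static char *alloc_read_buffer(size_t want, size_t *got)
{
    char *buf;

    assert(got != NULL);
    if (want < MIN_BLOCK)
        want = MIN_BLOCK;
    for (;;) {
        buf = malloc(want);
        if (buf != NULL || want == MIN_BLOCK)
            break;
        want /= 2;
        if (want < MIN_BLOCK)
            want = MIN_BLOCK;
    }
    *got = (buf != NULL) ? want : 0;
    return buf;
}

/* Count the lines in fp by reading block-sized chunks and scanning for
   the end-of-line character `eol`.  The buffer is allocated before any
   reading starts, so memory cannot run out partway through.  A final
   line with no terminator still counts.  Returns -1 on failure. */
long count_lines(FILE *fp, int eol, size_t want)
{
    size_t size, n, i;
    long lines = 0;
    int partial = 0;            /* nonzero while inside an unterminated line */
    char *buf;

    assert(fp != NULL);
    buf = alloc_read_buffer(want, &size);
    if (buf == NULL)
        return -1;

    while ((n = fread(buf, 1, size, fp)) > 0) {
        for (i = 0; i < n; i++) {
            if (buf[i] == eol) {
                lines++;
                partial = 0;
            } else {
                partial = 1;
            }
        }
    }
    free(buf);
    if (ferror(fp))
        return -1;
    return lines + partial;
}
```

A caller might pass 32768 for `want` and '\r' for `eol` on a Mac text
file. The inner loop is a plain "for" over a char array, which is the
sort of thing a reasonable optimizer should handle without register
declarations or assembler.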
--
Tim Maroney, Consultant, Eclectic Software, sun!hoptoad!tim
"Next prefers its X and T capitalized. We'd prefer our name in lights
 in Vegas." -- Louis Trager, San Francisco Examiner