Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!cs.utexas.edu!rutgers!netnews.upenn.edu!eecae!cps3xx!rang
From: rang@cpsin3.cps.msu.edu (Anton Rang)
Newsgroups: comp.sys.mac.programmer
Subject: Re: Reading Between the Lines
Summary: There are reasons to have newline support in the OS.
Keywords: newline, OS support, reading lines from files
Message-ID: <2551@cps3xx.UUCP>
Date: 16 Apr 89 00:22:57 GMT
References: <451@biar.UUCP> <28839@apple.Apple.COM> <4012@ece-csc.UUCP> <6987@hoptoad.uucp> <4015@ece-csc.UUCP> <7015@hoptoad.uucp>
Sender: usenet@cps3xx.UUCP
Reply-To: rang@cpswh.cps.msu.edu (Anton Rang)
Distribution: na
Organization: Michigan State University, Computer Science Dept.
Lines: 58
In-reply-to: tim@hoptoad.uucp's message of 15 Apr 89 19:38:34 GMT

In article <7015@hoptoad.uucp> tim@hoptoad.uucp (Tim Maroney) wrote
lots of stuff in reply to article <4015@ece-csc.UUCP> by jnh@ece-csc.UUCP 
(Joseph Nathan Hall).  I've deleted the articles to save space....

1.  Why should an OS provide newline support when high-level languages
    also provide it?  To make life easier for the developer of a HLL.
    Also, suppose that a program uses both C and Pascal, using both
    fgets() and readln().  If the OS provides the newline support then
    you don't have (much) duplication of code in the support libraries.

2.  Using individual read calls is slow; why use them?  Well, they're
    probably always slower than doing stuff at a very low level--I can
    write my own disk I/O routines and read stuff faster by totally
    bypassing the file manager.  Just as one answer, maybe there's a
    reason I don't want to allocate a big fixed-size buffer for
    reading this file--after all, the smallest size which would make
    sense for a buffer is a disk block.  Maybe I'm trying to conserve
    memory in an INIT; maybe I need to read the file without worrying
    about running out of memory in the process.

3.  Why do stuff inefficiently during development which we'd make more
    efficient for a production program anyway?  Perhaps I'm porting a
    program from another operating system.  Maybe the newline
    character is different (gasp!)--I might not want to worry about
    fixing this up yet.  As Tim pointed out, there isn't really
    anything to complain about here if you're using C or Pascal anyway.

4.  A bit more complex.  Joseph Hall claims that reading as much as
    possible on each read call isn't necessarily the key to speed.
    Tim says it's speculation.  One point here--if allocating a 32K
    buffer to read a text file quickly means swapping out 32K of code
    from somewhere, this might be true.  A procedure which counts the
    number of lines in a text file may well find that using a huge
    buffer is overkill.

5.  A final note (of my own).  Tim says that "if you're reading a line
    at a time on any machine, it's likely you're taking a performance
    hit."  To make things a little more complicated, I'd just
    like to say that there are systems which do NOT require any
    specific character to mark the end of a line--if you say writeln()
    it writes out your data, whether it contains ^M or ^J or whatever.
    On these systems, reading data block-by-block and trying to figure
    out the end of a line is either near-impossible or just plain slow.
    [Quibble, quibble.]

6.  Tim says "And writing a loop to turn blocks into lines on your own
    is so easy that a first-semester programmer could do it."
    Probably true.  But writing an *efficient* loop probably means
    using assembly language, at least until some decent optimizing
    compilers are widely available on the Mac.
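
To make points 4 and 6 concrete, here is a rough sketch in portable C of
the kind of loop being argued about: it counts the lines in a file using
one small fixed buffer rather than a huge one.  The name count_lines, the
512-byte buffer size, and the use of stdio instead of File Manager calls
are all my own assumptions for illustration; I make no claim about how
fast this is compared with hand-written assembly.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from any Mac library): count the lines in fp.
   `newline` is the terminator: '\r' on the Mac, '\n' on UNIX. */
long count_lines(FILE *fp, char newline)
{
    char buf[512];      /* assumed: one disk block, not a tuned value */
    size_t n;
    long lines = 0;
    int nl = (unsigned char)newline;
    int last = nl;      /* so that an empty file counts as zero lines */

    /* Read one block at a time; the buffer never grows. */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        const char *p = buf;
        const char *end = buf + n;

        /* memchr finds each terminator within the current block. */
        while ((p = memchr(p, nl, (size_t)(end - p))) != NULL) {
            lines++;
            p++;        /* resume the scan just past the terminator */
        }
        last = (unsigned char)buf[n - 1];
    }

    /* A final line with no terminator still counts as a line. */
    if (last != nl)
        lines++;
    return lines;
}
```

Called as count_lines(fp, '\r') on a file opened in binary mode, this
needs 512 bytes of stack however long the file is.  Of course, it only
works on systems where a specific character ends each line (see point 5).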

I apologize (a little) for using net bandwidth on this.  It probably
doesn't really belong in this group....

+---------------------------+------------------------+----------------------+
| Anton Rang (grad student) | "VMS Forever!"         | "Do worry...be SAD!" |
| Michigan State University | rang@cpswh.cps.msu.edu |                      |
+---------------------------+------------------------+----------------------+