Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!ll-xn!ames!oliveb!sun!guy
From: guy@sun.uucp (Guy Harris)
Newsgroups: comp.lang.c
Subject: Re: Distorting fseek semantics
Message-ID: <27734@sun.uucp>
Date: Thu, 10-Sep-87 22:01:41 EDT
Article-I.D.: sun.27734
Posted: Thu Sep 10 22:01:41 1987
Date-Received: Sat, 12-Sep-87 15:43:17 EDT
References: <493@its63b.ed.ac.uk> <6061@brl-smoke.ARPA> <8560@utzoo.UUCP> <1129@bsu-cs.UUCP>
Organization: Sun Microsystems, Inc. - Mtn View, CA
Lines: 109

> I realize I'm in the minority, but ANSI did something wrong here.  ANSI
> is supposed to be standardizing an existing language.
...
> No such justification exists for crippling the beautiful and simple
> semantics of fseek that have been in use for many years.

Make that "have been in use on *some* systems for many years".  I agree,
the UNIX semantics of "fseek" are wonderful and beautiful and all that
irrelevant Mom-and-apple-pie stuff, but they aren't always implementable
on non-UNIX systems.

> ANSI had a simple choice: (a) Leave fseek as it is,

Here is "fseek" "as it is", from the document "A New Input-Output
Package", by D. M. Ritchie, Bell Laboratories, Murray Hill, New Jersey
07974:

	fseek(ioptr, offset, ptrname) FILE *ioptr; long offset

	The location of the next byte in the stream named by "ioptr" is
	adjusted.  "Offset" is a long integer.  If "ptrname" is 0, the
	offset is measured from the beginning of the file; if "ptrname"
	is 1, the offset is measured from the current read or write
	pointer; if "ptrname" is 2, the offset is measured from the end
	of the file.  The routine accounts properly for any buffering.
	(When this routine is used on non-Unix systems, the offset must
	be a value returned from "ftell" and the ptrname must be 0).
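The restricted usage in that parenthetical can be sketched roughly as
follows.  This is only an illustration, not anything from the cited
document: the function name "reread_second_line" and the scratch file it
takes are hypothetical, and it is written with ANSI-style prototypes and
the ANSI name SEEK_SET for a "ptrname" of 0.  The only offset it ever
hands to "fseek" is one that "ftell" returned earlier on the same stream.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical demo, not from the cited document.  Writes three lines to
 * a scratch text file, remembers where the second line starts, reads it,
 * seeks back, and reads it again.  Returns 1 if both reads match, 0 if
 * they differ, -1 on an I/O error. */
int reread_second_line(const char *path)
{
    char first[64], again[64];
    long pos;
    int result = -1;
    FILE *f = fopen(path, "w+");    /* text stream, open for update */

    if (f == NULL)
        return -1;

    if (fputs("line one\nline two\nline three\n", f) != EOF) {
        rewind(f);                  /* reposition before switching to reads */
        if (fgets(first, sizeof first, f) != NULL      /* skip "line one" */
            && (pos = ftell(f)) != -1L                 /* opaque position */
            && fgets(first, sizeof first, f) != NULL   /* read "line two" */
            && fseek(f, pos, SEEK_SET) == 0            /* SEEK_SET is 0 */
            && fgets(again, sizeof again, f) != NULL)  /* read it again */
            result = (strcmp(first, again) == 0);
    }

    fclose(f);
    remove(path);                   /* clean up the scratch file */
    return result;
}
```

Note that the sketch never does arithmetic on "pos"; it treats the value
purely as a cookie to be handed back, which is all the parenthetical
promises on a text stream.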

The only differences between DMR's description and what appears in the
August 3, 1987 ANSI C draft are that:

	1) DMR's description didn't mention the possibility of an
	   "offset" of 0 being used as a portable "rewind" function;
	   perhaps the intent was that "rewind" be used for this,
	   because the cited document does not state that "rewind(f)"
	   is equivalent to "fseek(f, 0L, 0)".

	2) DMR's description doesn't allow for the "offset" being a
	   byte ordinal number on binary files - but his description
	   didn't even *mention* binary files; it didn't describe the
	   "b" flag to "fopen".

So ANSI *did* leave "fseek" as it is *in descriptions of it as a C
language routine*; they didn't "change 'fseek' so (vendors with OSes
where it can't act as a generalized seek) would not have to work so
hard".  They didn't give the description of "fseek" *as a UNIX library
routine*, but X3J11 is not a UNIX interface standard!

Actually, given point 2) there, you could argue that they made it *more*
like the UNIX "fseek" than Dennis' paper did.  You *do* have the ability
to deal with the file as an ordered sequence of bytes; however, to do so
you must open the file as a binary file, which means you won't see
UNIX-style lines unless the native OS implements them.  (For instance,
such a file could be treated in a record-oriented OS as a sequence of
512-byte records.)

As such, you *can* port programs of the sort you're used to writing on
UNIX to those other systems *as long as you use the "b" option to
"fopen" and as long as you're willing to accept that these files may be
in a private format comprehensible only to other C programs or programs
that know about this format*.  You just can't be guaranteed to do this
sort of thing on *text* files.

> The portability argument is a red herring.  ANSI is free to add an
> appendix that describes a weaker fseek, in which one cannot directly to
> go where one has not sequentially gone before, that nonconforming C
> implementations can provide.
> Software developers who really want to
> support all systems, including the ones whose developers refuse to fix
> their punched-card-based designs, could restrict themselves to this
> weaker specification.  The rest of us would be able to write programs
> as we've been writing them for a decade without being accused of not
> conforming to ANSI specs.

If this were done, there would be a lot fewer compliant implementations
out there, so people who were interested in writing code that was not
just standard-conforming but *in practice* portable would conform to the
*de facto* standard formed by replacing the standard's "fseek" by the
one described in this appendix.  In effect, this would mean that the
*de facto* ANSI C standard, as opposed to the *de jure* ANSI C standard,
would not include a UNIX-flavored "fseek".  What has this bought you?

> C compilers for, UNIX, MS-DOS, AmigaDOS, Macintosh, CP/M, Minix, OS/2,
> and numerous other systems support a generalized fseek.

UNIX and Minix are red herrings here; those systems implement
UNIX-compatible I/O.  If any of the other operating systems on that list
stores lines UNIX-style, with a single end-of-line character,
implementing UNIX-style "fseek" isn't difficult, as the translation
between native and C lines does not change the number of bytes in a
record.  (I infer from the Lightspeed C manual that the Macintosh puts
CR rather than LF at the end of the line, so C implementations on the
Mac can provide UNIX-style "fseek".)

I have no idea how UNIX-like the "fseek" on MS-DOS C implementations
really is.
It would seem that the file position would have to be one of two things.
It could be the ordinal number of the current byte in the underlying
file - in which case, were you to use UNIX-style "fseek"s, you could
conceivably confuse the heck out of the I/O library by putting the file
pointer on the LF of a CR/LF pair.  Or it could be translated to what
the byte offset would have been, had MS-DOS used UNIX-style line formats
- in which case, seeks could end up being quite expensive or require an
auxiliary data structure to do the mapping.  Even if you have this
auxiliary data structure, you would either have to keep it around in
permanent storage for all text files, which seems a bit tacky (and
doesn't solve the problem of text files created before this auxiliary
data structure was introduced), or would have to construct it as needed,
which could get expensive.

> Even VAX/VMS, which is heavily into record-based I/O, supports stream-LF
> files that allow the original fseek semantics to be preserved.

Which doesn't help you if you feed a non-stream-LF file to a C program
as an input text file; if you can't do that, there is a strong
disincentive to write text-processing applications in C.
--
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)