Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!cmcl2!rutgers!iuvax!bsu-cs!dhesi
From: dhesi@bsu-cs.UUCP (Rahul Dhesi)
Newsgroups: comp.lang.c
Subject: Re: Distorting fseek semantics
Message-ID: <1134@bsu-cs.UUCP>
Date: Fri, 11-Sep-87 17:00:48 EDT
Article-I.D.: bsu-cs.1134
Posted: Fri Sep 11 17:00:48 1987
Date-Received: Sat, 12-Sep-87 19:22:11 EDT
References: <493@its63b.ed.ac.uk> <6061@brl-smoke.ARPA> <8560@utzoo.UUCP> <1129@bsu-cs.UUCP> <27734@sun.uucp>
Reply-To: dhesi@bsu-cs.UUCP (Rahul Dhesi)
Organization: CS Dept, Ball St U, Muncie, Indiana
Lines: 98

In article <27734@sun.uucp> guy@sun.uucp (Guy Harris) writes:

[first major point summarized here in my words]: Dennis Ritchie's
description of fseek includes an exception for non-UNIX systems, and
ANSI's description of fseek largely conforms to that exception.

I can't argue this on legalistic grounds, but when vendors have
implemented C on non-UNIX systems, they have always** used the UNIX
implementation as a de facto standard. A vendor whose version of C is
different from that under UNIX faces a competitive pressure to conform.
When a user wants to know why a C implementation differs from the UNIX
way, it's probably not going to be effective for a vendor to point out
the exception that Ritchie made for non-UNIX systems.

But now, the standard to model implementations after will not be UNIX
but the ANSI standard. To the extent that the ANSI standard weakens
the power of the C standard library, the user will lose.

For example, the mail delivery agent smail uses a binary search on a
sorted text file containing mail paths. Unless I'm missing something,
such a binary search will be impossible in a C implementation that
conforms to the ANSI standard and goes no further.
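
For concreteness, the kind of binary search at issue might look like
the sketch below. It is not smail's actual code: the names lookup and
keycmp, the tab-terminated key field, and the assumption that the file
is sorted in strcmp order are all made up for illustration. It depends
on fseek accepting an arbitrary byte offset on a text stream, which is
what the UNIX-style fseek allows and what ANSI does not guarantee for
text streams.

```c
#include <stdio.h>
#include <string.h>

/* Compare the first field of a line (up to a tab or newline) with key.
 * Hypothetical helper; assumes the file is sorted in this same order. */
static int keycmp(const char *line, const char *key)
{
    size_t n = strcspn(line, "\t\n");
    size_t k = strlen(key);
    int c = strncmp(line, key, n < k ? n : k);

    if (c != 0)
        return c;
    return (n > k) - (n < k);   /* shorter field sorts first */
}

/* Look up key in a sorted text file.  Returns 1 with the matching line
 * in buf, or 0 if there is none.  Assumes every line fits in buf. */
int lookup(FILE *fp, const char *key, char *buf, int buflen)
{
    long lo = 0, hi, mid;
    int ch, c;

    if (fseek(fp, 0L, SEEK_END) != 0 || (hi = ftell(fp)) < 0)
        return 0;

    /* lo is always the start of a line, and every line before it sorts
     * below key; every line starting after hi sorts at or above key. */
    while (lo < hi) {
        mid = lo + (hi - lo) / 2;
        if (fseek(fp, mid, SEEK_SET) != 0)  /* arbitrary byte offset */
            return 0;
        /* Discard the rest of whatever line mid landed in. */
        while ((ch = getc(fp)) != EOF && ch != '\n')
            ;
        if (ch == EOF || fgets(buf, buflen, fp) == NULL
            || keycmp(buf, key) >= 0)
            hi = mid;
        else
            lo = ftell(fp);     /* start of the following line */
    }

    /* Short linear scan from lo settles the remaining candidates. */
    if (fseek(fp, lo, SEEK_SET) != 0)
        return 0;
    while (fgets(buf, buflen, fp) != NULL) {
        c = keycmp(buf, key);
        if (c == 0)
            return 1;
        if (c > 0)
            break;
    }
    return 0;
}
```

Called as lookup(fp, "somehost", buf, sizeof buf), it returns 1 with
the matching line left in buf. The seek to mid is the step that needs
the generalized fseek; everything else is ordinary stdio.
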
>If this were done, there would be a lot fewer compliant implementations out
>there, so people who were interested in writing not just standard-conforming
>but code that was *in practice* portable, would conform to the *de facto*
>standard formed by replacing the standard's "fseek" by the one described in
>this appendix.  In effect, this would mean that the *de facto* ANSI C
>standard, as opposed to the *de jure* ANSI C standard, would not include a
>UNIX-flavored "fseek".  What has this bought you?

Those conforming to the de facto fseek would still continue to try to
make it into the de jure fseek. It's a competitive advantage for a
vendor to be able to claim full compliance with an ANSI standard. In
the long run, it would be more likely that most vendors would offer the
UNIX-style fseek. Users would win.

It's quite possible that, had ANSI C existed some years ago, DEC would
have managed to conform to it without having to introduce stream-LF
files. Users in general would have been losers.

>I have no idea how UNIX-like the "fseek" on MS-DOS C implementations really
>is.  It would seem that the file position would either have to be the ordinal
>number of the current byte in the underlying file - in which case, were you to
>use UNIX-style "fseek"s, you could conceivably confuse the heck out of the I/O
>library by putting the file pointer on the LF of a CR/LF pair - or would have
>to be translated to what the byte offset would have been, had MS-DOS used
>UNIX-style line formats - in which case, seeks could end up being quite
>expensive or require an auxiliary data structure to do the mapping.

Confession: I exaggerated about MSDOS. On the compilers I've tried
you can fseek, but you get to fseek to the nth byte, where the nth byte
is the same byte that you would get if you opened the file as a binary
file. I think most implementations of stdio under MSDOS simply ignore
all CR characters on a read, so no confusion will result after a
generalized fseek.
Note that a binary search on a text file will still work, which cannot
be said for ANSI's more restrictive fseek.

>> Even VAX/VMS, which is heavily into record-based I/O, supports stream-LF
>> files that allow the original fseek semantics to be preserved.
>
>Which doesn't help you if you feed a non-stream-LF file to a C program as an
>input text file; if you can't do that, there is a strong disincentive to write
>text-processing applications in C.

Not really. One can still sequentially read any VMS text file. The
output from the application can be in stream-LF format. Because of the
competitive pressure to conform to UNIX conventions, DEC has modified
most (perhaps all) of its utilities that normally use text files to
also accept stream-LF format. VMS will even load and execute a file in
stream-LF format if it has the same data bytes as a standard executable
file of 512-byte fixed-length records. (I couldn't believe my eyes
when I saw this.)

DEC is getting a little closer to embracing the UNIX model, and is no
worse off for it. This would likely not have happened if the standard
to aspire to had been ANSI C rather than the de facto standard of the
UNIX implementation.

I believe at one time ANSI actually allowed a binary file to return
more characters on a read than had ever been written to it. That such
bizarre behavior could even be considered, let alone included in the
draft, shows how much pressure there must be on ANSI.

SUMMARY: The weakened fseek in ANSI C will lead to fewer vendors being
pressured into providing the more flexible UNIX-style fseek, without a
compensating gain in portability. Users will lose.
---
**The only exception I know of, where a vendor did not use the UNIX
standard as a model, was one that had "putfmt" instead of "printf", and
a lot of other unusual functions. I think it was from Whitesmiths. I
believe it has been changed since then.
-- 
Rahul Dhesi         UUCP:  !{iuvax,pur-ee,uunet}!bsu-cs!dhesi