Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site brl-tgr.ARPA
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!genrad!panda!talcott!harvard!seismo!brl-tgr!gwyn
From: gwyn@brl-tgr.ARPA (Doug Gwyn )
Newsgroups: net.bugs,net.unix
Subject: Re: Fix to sed (what's a text file?)
Message-ID: <2308@brl-tgr.ARPA>
Date: Mon, 21-Oct-85 20:01:24 EDT
Article-I.D.: brl-tgr.2308
Posted: Mon Oct 21 20:01:24 1985
Date-Received: Wed, 23-Oct-85 05:02:18 EDT
References: <23@pixel.UUCP> <2235@brl-tgr.ARPA> <2333@flame.warwick.UUCP>
Organization: Ballistic Research Lab
Lines: 42
Xref: watmath net.bugs:706 net.unix:5982

> >Many UNIX text-file utilities will discard a (necessarily final)
> >text line that does not end in a newline.  Quite simply, such a
> >file is not a proper UNIX text file.
>
> Who says? Where's the definition of a 'proper' UNIX text file?

The problem is, there are several interpretations of such a file,
depending on the utility involved.  Perhaps there should be a
well-defined standard interpretation, but there isn't currently.

"A file of text consists simply of a string of characters, with
lines demarcated by the newline character."
	-- from "The UNIX Time-Sharing System" by Ritchie & Thompson

"text file, ASCII file -- a file, the bytes of which are
understood to be in ASCII code"
	-- from "Glossary" in "UNIX Time-Sharing System
	   Programmer's Manual", 8th Ed.

"A text stream is an ordered sequence of bytes composed into
lines, each line consisting of zero or more characters plus a
terminating new-line character. ... The sequentially last
character read in from a text stream will, however, always be
sequentially the last character that was earlier written out to
the text stream, if that character was a new-line."
	-- from ANSI X3J11/85-045

My personal choice would be similar to Ritchie & Thompson, where
newlines delimit (NOT "terminate") text lines, so that the last
character in a text file would not need to be a newline.  However,
this raises the question of what utilities should do with the null
line at the end of every text file that DOES end with a newline;
this will still be utility-dependent (and should be documented
whenever it is handled differently from other text lines in the
file).

X3J11/85-045 botched it anyhow, since they intended that ALL UNIX
files qualify as "text streams" under stdio (vs. "binary streams",
which have to be handled differently on some non-UNIX OSes).

So, how do we establish a standard interpretation for
non-newline-terminated UNIX text files?

(Discussion should move to net.unix.)
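
To make the "delimit" vs. "terminate" distinction concrete, here is a
minimal sketch in C.  It assumes the whole file is already in a memory
buffer, and the function names are hypothetical, chosen only for
illustration; it is not how any particular utility is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Newline as TERMINATOR: a line exists only once its '\n' is seen, so
 * trailing bytes with no final newline are not counted as a line. */
size_t count_terminated_lines(const char *buf, size_t len)
{
    size_t i, n = 0;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        if (buf[i] == '\n')
            n++;
    return n;
}

/* Newline as DELIMITER: newlines separate lines, so N newlines give
 * N + 1 lines; the last one is empty when the buffer ends in '\n'. */
size_t count_delimited_lines(const char *buf, size_t len)
{
    return count_terminated_lines(buf, len) + 1;
}

/* Number of bytes after the last newline, i.e. the length of the
 * partial final line that a "terminator" utility might discard. */
size_t unterminated_tail(const char *buf, size_t len)
{
    size_t i = len;

    assert(buf != NULL || len == 0);
    while (i > 0 && buf[i - 1] != '\n')
        i--;
    return len - i;
}
```

For the three bytes "a\nb" the terminator reading sees one line plus a
one-byte tail, while the delimiter reading sees two lines.  For the
four bytes "a\nb\n" the terminator reading sees two lines and no tail,
while the delimiter reading sees three, the third being the null line
whose treatment is the open question.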