Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!NUSVM.BITNET!GBOPOLY1
From: GBOPOLY1@NUSVM.BITNET (fclim)
Newsgroups: comp.sys.apollo
Subject: Re: Unix eof detection and long filenames under tar.
Message-ID: <8905250248.AA24921@umix.cc.umich.edu>
Date: 25 May 89 02:46:58 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 205
X-Unparsable-Date: Thu, 25 May 89 10:45:29 SST

Hi,

This is a pretty long reply.  My apologies to others.

In article <8905230413.AA23343@umix.cc.umich.edu>,
CORNELLC.cit.cornell.edu:Jacques_Gelinas@CMR001.BITNET writes
> First question: tar and very-very-long-filenames
>[stuff deleted]
> Can I get these files out of the tape ?
> (How did prof. Mackay get them on the tape ?)
>[stuff deleted]
>% tar xf /dev/rct8
>tar: ./tex82/README.WRITE-WHITE - cannot create
>tar: ./DVIware/laser-setters/dvi2adobe_fonts/
>     StoneInformal-SemiboldItalic.tfm - cannot create
> (10 similar lines deleted)

The maximum filename length on Domain/IX at SR9.7 is 32; see the
MAXNAMLEN macro in /usr/include/sys/dir.h.  A file whose name is longer
than that can't be created.  Even though strlen("README.WRITE-WHITE")
is 18, which is < 32, the length of the name actually stored in the
Domain VTOC (their version of the UNIX inode) is > 32.  Aegis at SR9.7
is case-insensitive (eg /COM/SH is no different from /cOm/Sh), but Unix
has always been case-sensitive.  Their workaround is to map each
upper-case char to a ':' followed by the lower-case char.  Eg
README.WRITE-WHITE is stored in the VTOC as
        :r:e:a:d:m:e.:w:r:i:t:e-:w:h:i:t:e
which is 34 chars long, and that is why this file can't be extracted.

There are two ways to extract those files:
(1) hack John Gilmore's pd tar so that, when the file name length is
    > MAXNAMLEN, it prompts for a new filename or truncates the
    filename.
(2) get SR10 and use BSD4.3.  MAXNAMLEN should be 255 there (me think).
    Furthermore, Aegis at SR10 is case-sensitive, so the ':' mapping
    is no longer needed.

Prof MacKay probably created the tape on a Sun box, which has MAXNAMLEN
set at 255 (me think again).
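To make the ':' mapping described above concrete, here is a minimal
sketch in C that computes how long a name becomes once every upper-case
letter is stored as ':' plus its lower-case form.  The helper names
stored_name_length() and fits_sr97() are made up for illustration, the
limit of 32 is simply the figure quoted above, and it is an assumption
that upper-case letters are the only characters that get expanded.

```c
#include <ctype.h>
#include <stddef.h>

/* Limit quoted in the post for Domain/IX at SR9.7 (taken on trust). */
#define SR97_MAXNAMLEN 32

/* Hypothetical helper: length of a name after each upper-case letter
 * has been rewritten as ':' followed by the lower-case letter.
 * Assumption: no other character is expanded. */
static size_t stored_name_length(const char *name)
{
    size_t len = 0;

    for (; *name != '\0'; name++)
        len += isupper((unsigned char)*name) ? 2 : 1;
    return len;
}

/* Hypothetical helper: nonzero if the stored form fits in the limit. */
static int fits_sr97(const char *name)
{
    return stored_name_length(name) <= SR97_MAXNAMLEN;
}
```

With these, stored_name_length("README.WRITE-WHITE") comes out as 34
(18 chars typed plus 16 upper-case letters), so fits_sr97() rejects the
name, while the all-lower-case spelling of the same name would fit.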
>(also shows that BSD4.2 at SR9.7 is compatible with other systems)

What'll you say now?

> Second question: Paranoia and (text) eof
>[stuff deleted]
> Can this be simplified on Apollo BSD4.2 systems ?
>[stuff deleted]
>testeof(iop)
>FILE *iop;
>{ register int c;
>  if (feof(iop))
>     return(TRUE);
>  else { /* check to see if next is EOF */
>     c = getc(iop);
>     if (c == EOF)
>        return(TRUE);
>     else {
>        (void) ungetc(c,iop);
>        return(FALSE);
>} } }

The simplest way is to delete all but the else body, i.e. keep only the
getc()/ungetc() test and drop the feof() call.  Hence, if the file has
n bytes, then there will be n fewer tests.  This should work for
Domain/IX at SR9.7 and most probably for BSD4.3 at SR10, or at least
when /lib/clib becomes ANSI-compatible.  However, Harbison and Steele
in "C: A Reference Manual" sez that feof() should be used to check for
EOF.

>The 2nd ed. of the K.R. white book ...

The 2nd ed. describes the ANSI definition of C and its standard
library.  Domain/C and /lib/clib are not ANSI-compatible at SR9.7.  I
suggest you refer to the manuals provided by Apollo.

> Third question: eof and binary files.
>[stuff deleted]
> Could someone explain to me the line
>   "fgetc returns EOF: Error 0" ?
> Why is the first use of fgetc different ?
> (By permuting the calls to getc and fgetc, you
>  can get other results. This looks weird.)
>[stuff deleted]
>% cat fgetc.c==getc.c
>/* ------ is fgetc "like" getc ? -------- */
> main (){
># include <stdio.h>
>  FILE * datf ;
>  int c ;
>
>  datf = fopen("fgetc.dat","w+") ;
>
># define BYTE 0377
>  printf( "BYTE = %o, (int)(char)BYTE = %o\n",BYTE,(int)(char)BYTE);
>  if(fputc( BYTE, datf)==EOF ) perror("fputc returns EOF") ;
>  if( putc( BYTE, datf)==EOF ) perror(" putc returns EOF") ;
>  c = fputc(BYTE, datf ) ; printf("fputc: c = %o\n", c ) ;
>  c =  putc(BYTE, datf ) ; printf(" putc: c = %o\n", c ) ;
>
>  fseek( datf, 0L, 0) ;
>
>  if( (c = fgetc(datf)) == EOF ) perror("fgetc returns EOF") ;
>  printf("fgetc: c = %o\n", c) ;
>  if( (c =  getc(datf)) == EOF ) perror(" getc returns EOF") ;
>  printf(" getc: c = %o\n", c) ;
>  if( (c = fgetc(datf)) == EOF ) perror("fgetc returns EOF") ;
>  printf("fgetc: c = %o\n", c) ;
>  if( (c =  getc(datf)) == EOF ) perror(" getc returns EOF") ;
>  printf(" getc: c = %o\n", c) ;
>
>  if(fclose(datf)) perror("fclose") ;
>  system( "od -b fgetc.dat ; rm -i fgetc.dat" ) ;
> }
>% cc !*
>cc fgetc.c==getc.c
>
>% a.out
>BYTE = 377, (int)(char)BYTE = 37777777777
>fputc returns EOF: Error 0
> putc returns EOF: Error 0
>fputc: c = 37777777777
> putc: c = 37777777777
>fgetc: c = 377
> getc: c = 377
>fgetc returns EOF: Error 0
>fgetc: c = 37777777777
> getc: c = 377
>0000000 377 377 377 377
>0000004

Fgetc() is broken.

Getc() is a macro defined in /usr/include/stdio.h as
        #define getc(p) (--(p)->_cnt >= 0 ? *(p)->_ptr++ & 0377 : _filbuf(p))

Fgetc() and getc() are among the buffered I/O routines.  p->_base
points to the buffer and p->_ptr points to the next byte to be read in.
Normally, getc() will return an int with a value equal to the byte
masked with 0377.  In effect, this returns an unsigned char.  When the
buffer is empty, an (undocumented) routine _filbuf() is called to fill
the buffer.  After filling, _filbuf() also returns the next byte as an
unsigned char if there is a next byte.  Otherwise, when the end-of-file
has been reached, _filbuf() returns EOF, which is -1, i.e. 0xffffffff
or 0377...7 as a 32-bit int.
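To see why the "& 0377" in the macro matters, here is a small sketch in
C contrasting a masked byte with a sign-extended one.  The function
names are made up for illustration, and sign_extended_byte() is only a
model of promoting a signed 8-bit char to int on a two's complement
machine; it is not the actual Domain/IX library code.

```c
#include <stdio.h>

/* Same masking as the getc() macro quoted above: result is 0..255. */
static int masked_byte(unsigned char b)
{
    return b & 0377;
}

/* Model (an assumption, not library code) of promoting a signed
 * 8-bit char to int: values 128..255 come out as -128..-1. */
static int sign_extended_byte(unsigned char b)
{
    return b >= 128 ? (int)b - 256 : (int)b;
}

/* Nonzero if a value returned to the caller would be taken for EOF. */
static int same_as_eof(int c)
{
    return c == EOF;
}
```

masked_byte(0377) is 255 and can never compare equal to EOF, since EOF
is negative.  sign_extended_byte(0377) is -1, so on any system where
EOF is -1 a caller cannot tell that data byte from end-of-file.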
Fgetc() is similar to getc() but it is a function and not a macro.  The
first time it's called, it returns the value of _filbuf(), which is an
unsigned char since eof has not been reached.  The next time fgetc() is
called, it should return an unsigned char or EOF (this is the ANSI
definition of fgetc()).  However, Domain/IX SR9.7 fgetc() returns the
next char promoted to an int.  In Domain/C, when a char is promoted to
an int, the signed-ness is preserved.  Therefore, 377 (a char -1) is
promoted to 377...7 (an int -1).  This int value is indistinguishable
from the EOF value, -1.

No error had occurred.  This is indicated by perror()'s output:
"Error 0".

Fgetc() does work consistently except when it needs to call _filbuf()
to fill the buffer.  Normally, it will return the next byte promoted to
an int; or, when _filbuf() is called, it returns the next byte as an
unsigned char.  To illustrate this, let's
        cat fgetc.dat fgetc.dat fgetc.dat fgetc.dat > foo
Foo has 12 bytes of 377.  When we run
        f = fopen("foo", "r");
        for (i = 0; i < 12; i++)
            printf("%o\n", fgetc(f));
we'll get
        377
        377...7  \
        377...7  |__ 11 times
        ...      |
        377...7  /

By default, I/O is buffered with a 1024-byte buffer.  We can change
this by
        char buf[4];
        f = fopen("foo", "r");
        (void) setbuffer(f, buf, sizeof(buf));
        for (i = 0; i < 12; i++)
            printf("%o\n", fgetc(f));
Now, we'll get
        377      \
        377...7  |__ pattern
        377...7  |
        377...7  /
        377
        377...7
        377...7
        377...7
with the pattern appearing three times in all.  Here, we have a 4-byte
buffer, so _filbuf() is called every 4 bytes.

> Last question: default cc flags
>All the machines we have are DN3000 or DN4000. Why is it necessary
>to specify the -M3000 flag for the cc compiler? The RT/11 operating
>system permitted me -in 1979- to customize the compilers by setting
>some switches (like the number of lines per page for listings).
>Can this be done also at installation time for the Apollo system?

Don't know why -M3000 is needed.  You can edit the Makefile and add
-M3000 to CFLAGS.

Hope this helps.
:-)

fclim          --- gbopoly1 % nusvm.bitnet @ cunyvm.cuny.edu
computer centre
singapore polytechnic
dover road
singapore 0513.