Path: utzoo!utgpu!water!watmath!clyde!rutgers!mit-eddie!uw-beaver!tektronix!decvax!ucbvax!HNYKUN53.BITNET!SCHOMAKE
From: SCHOMAKE@HNYKUN53.BITNET (Lambert Schomaker)
Newsgroups: comp.os.vms
Subject: FORT vs. zero-length strings
Message-ID: <8802220641.AA17291@ucbvax.Berkeley.EDU>
Date: 15 Feb 88 11:23:00 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 63

To understand the problem one first has to know how character strings
are passed in Fortran-77. Unlike in many other languages, they are not
passed by mere reference (i.e., "pointer" or "address"). In f77, a
character argument is in fact a reference to a special table, the
descriptor. In this table we can find, among other things, the pointer
to the location where the data is actually stored.

   Character argument (points to)-----> Descriptor:

      [class...][dtype...][length..........]
      [address.............................] (points to)------> [string.....]

The class field (DEC/VMS DSC$B_CLASS) is 8 bits, the type field
(DSC$B_DTYPE) is 8 bits, and the length field (DSC$W_MAXSTRLEN) is
16 bits, which explains the maximum string size of 64 Kbytes. In fact
the character string is just one of the data types that such a
descriptor can point to. To my knowledge, VAX Fortran-77 only uses
descriptors for strings, though.

Now the important part: the length is the DECLARED length. What the
designers (ANSI?) forgot is that in practice you need the USED length
most of the time. We are missing a "DSC$W_USDSTRLEN" field.

      CHARACTER*132 STR
      STR='FOO'

In most applications the trailing 129 blanks are a nuisance.

There are several solutions to this problem. The dirtiest I have ever
seen is falling back to NULL termination, STR='foo'//CHAR(0), and using
INDEX(STR,CHAR(0))-1 to find the USED string length.

Another solution is to get hold of the used length as soon as possible.
In a constant assignment the programmer knows that 'foo' has three
characters. When reading a string one should use the Q format:

      READ(*,'(Q,A)') LS,STR
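
To make the NUL-termination trick above concrete, here is a minimal
sketch in fixed-form Fortran-77. The program name USEDLN is made up for
the example. It deliberately avoids the Q format, which is an extension
and not part of the f77 standard, so it should compile elsewhere too.

```fortran
      PROGRAM USEDLN
C     Hypothetical demo: declared length versus used length.
      CHARACTER*132 STR
      INTEGER LS
C     Append a NUL; the rest of STR is padded with blanks as usual.
      STR = 'foo'//CHAR(0)
C     The position of the NUL, minus one, is the used length.
      LS = INDEX(STR, CHAR(0)) - 1
      WRITE(*,*) 'declared length: ', LEN(STR)
      WRITE(*,*) 'used length:     ', LS
      END
```

LEN(STR) reports the declared 132, while LS comes out as 3. The price
is that every string carries a byte that is not data, and INDEX returns
0 (so LS becomes -1) if the NUL was ever overwritten or truncated away.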

The next step is to pass the obtained string to a subroutine in the
following way:

      CALL MAKE_LOWER_CASE(STR(:LS))

This way we make sure the trailing blanks do not require any
processing. In the subroutine, LEN(STR) will return the value that LS
had in the caller.

When concatenating strings, we explicitly keep track of the current
string size by updating a separate integer variable alongside the
string:

      STR='foo'//SUBS(2:K)
      LS=3+K-2+1

If this is unwanted or boring, use a function LENU(str) which scans a
string backward until a non-blank is found and returns the used size.
Too bad if you meant some blanks to be there at the tail.

About zero-length strings. Using the above rules, a zero-length string
is a string passed to a subroutine as STR(I1:I2) where I2=I1-1 and
I1.GT.0 (otherwise you get ugly memory access violations).

      CHARACTER*5 STR
      LS=0
      CALL SUB(STR(:LS))
      .
      SUBROUTINE SUB(S)
      CHARACTER*(*) S
      WRITE(*,*) LEN(S)
      RETURN
      END

Here the WRITE will print a length of zero.

It is all a bit kludgy. Nevertheless, I wouldn't want to go back to the
old days of FORTRAN-IV string handling. At least f77 allows character
functions.

Lambert Schomaker, SCHOMAKER@HNYKUN53.BITNET

PS Does anybody know if the forthcoming F8x is different in this
respect?
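
For completeness, a minimal sketch of the LENU function suggested
above. It assumes nothing beyond the description in the text (scan
backward until a non-blank is found); the driver program TLENU and the
choice to return 0 for an all-blank string are my own assumptions.

```fortran
      PROGRAM TLENU
C     Hypothetical driver for the LENU sketch below.
      CHARACTER*132 STR
      INTEGER LENU
      STR = 'FOO'
      WRITE(*,*) LENU(STR)
C     An all-blank string has a used length of zero.
      STR = ' '
      WRITE(*,*) LENU(STR)
      END

      INTEGER FUNCTION LENU(S)
C     Return the used length of S: the position of the last
C     non-blank character, or 0 if S contains only blanks.
      CHARACTER*(*) S
      INTEGER I
      DO 10 I = LEN(S), 1, -1
         IF (S(I:I) .NE. ' ') THEN
            LENU = I
            RETURN
         END IF
   10 CONTINUE
      LENU = 0
      RETURN
      END
```

The first call yields 3 and the second 0. As noted above, trailing
blanks that were meant to be part of the string are lost with this
approach, so it only suits data where the tail is never significant.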