Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!longway!std-unix
From: std-unix@longway.TIC.COM (Moderator, John S. Quarterman)
Newsgroups: comp.std.unix
Subject: Re: Query about <dirent.h>
Message-ID: <450@longway.TIC.COM>
Date: 1 Dec 89 01:32:16 GMT
References: <448@longway.TIC.COM>
Reply-To: Doug Gwyn <uunet!brl.mil!gwyn>
Organization: Ballistic Research Lab (BRL), APG, MD.
Lines: 115
Approved: jsq@longway.tic.com (Moderator, John S. Quarterman)

From: Doug Gwyn <uunet!smoke.brl.mil!gwyn>

In article <448@longway.TIC.COM> dmr@research.att.com (Dennis Ritchie) writes:
>I wish Gwyn et. al had sounded a bit more embarrassed about using
>`char d_name[1]' in struct dirent.

Here is the line in question taken directly from my PD dirent implementation:
	char		d_name[1];	/* name of file */	/* non-ANSI */
You will note that I'm well aware that a trick is being used here.

I don't like such tricks either.  The problem is, the alternatives
were all worse:
	char	*d_name;	/* programs need to know whether d_name
				   specifies an array or not, due to a
				   generic C botch in using array names;
				   P1003 used to be ambiguous about this
				   but finally required it to be an array */
	char	d_name[HUGE_NUMBER];	/* valid, but wastes a lot of space */
	char	d_name[0];	/* worse than [1] according to ANSI C */
	char	d_name[];	/* almost certain to cause a diagnostic */

>There is no such type as char[], and `char d_name[]' may not appear
>in a structure, and if the declaration is `char d_name[1]' then
>you may not refer to d_name[i] when i>1.

Certainly it is unportable usage, i.e. not guaranteed to work
by the C language specification.  However, there is a large body
of existing C code (typically implementing network protocols) that
relies on exactly this trick, precisely because there is no really
good alternative.  I have yet to hear of a production UNIX system
where this trick fails.  (Perhaps 10th Edition UNIX is one?)

Probably what I really should have done was to parameterize the "1":
	char	d_name[1+_DNAME_LEN];	/* _DNAME_LEN=0 if you can,
					   _DNAME_LEN=PATH_MAX otherwise */
That would allow the dirent package installer a quick solution for
C environments that are fussier about this than the typical UNIX ones.
I may do this for future releases of my package.

>I don't have the POSIX wording at hand, but if it forbids
>`struct dirent d = *readdir(dp)' then it is flaky.

It says:
	The readdir() function returns a pointer to an object of type
	struct dirent that includes the member:

		Member	Member
		 Type	 Name		Description
		______	______	________________________
		char[]	d_name	Null-terminated filename

	The character array d_name is of unspecified size, but the
	number of bytes preceding the terminating null character
	shall not exceed {NAME_MAX}.

I believe my implementation meets these specifications, taken
literally.

At one time, the description of readdir() contained a warning about
copying struct dirents, but by the time of the final Std 1003.1 the
entire section had been rewritten and this got lost in the shuffle.
I think some other unwanted changes were introduced too, but at the
moment I can't recall what they were.  (We also have to keep beating
down attempts to legislate support for seekdir() and telldir().)

Anyway, the whole point of the words "unspecified size" really was to
permit implementations to use the [1] trick so they could allocate
a relatively small struct_dirent+secret_extension if the C compiler
permitted it.  Otherwise either NAME_MAX+1 or some other defined
implementation constant would have been specified in Std 1003.1
(as for c_cc[NCCS]).
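
To make the struct_dirent+secret_extension idea concrete, here is a minimal sketch of how an implementation might allocate such an entry. The names my_dirent and my_dirent_new are hypothetical and are not taken from any real dirent package, and the sketch relies on exactly the non-ANSI assumption discussed above: the name's storage runs past the declared end of the struct.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct dirent; not from any real package. */
struct my_dirent {
	long		d_ino;		/* illustrative member only */
	char		d_name[1];	/* name of file */	/* non-ANSI */
};

/*
 * Allocate an entry just big enough for `name'.  The bytes past the
 * declared end of the struct are the "secret extension".  Returns NULL
 * if malloc() fails; the caller releases the entry with free().
 */
struct my_dirent *
my_dirent_new(long ino, const char *name)
{
	size_t len = strlen(name);
	struct my_dirent *ep;

	/*
	 * sizeof *ep already counts one byte of d_name, which pays for
	 * the terminating null; the extra `len' bytes hold the name.
	 */
	ep = malloc(sizeof *ep + len);
	if (ep == NULL)
		return NULL;
	ep->d_ino = ino;
	/* Address the name from the base of the allocation. */
	memcpy((char *)ep + offsetof(struct my_dirent, d_name), name, len + 1);
	return ep;
}
```

A plain assignment such as `struct my_dirent d = *ep;` copies only sizeof(struct my_dirent) bytes, so whatever part of the name lives in the extension is left behind. That is the reason a literal copy of such a struct is of little use.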

I would have preferred char *d_name; however, that would be as hard
for an application to copy as a struct_dirent+secret_extension.

Certainly
	char	d_name[1+PATH_MAX];	/* use actual value for PATH_MAX */
is a legal and portable declaration for d_name that meets the POSIX
specs.  I happen not to like it because PATH_MAX is potentially
unbounded in an ideal networked universe, and always allocating big
chunks of space of which a tiny portion is used bothers me more than
this particular well-known implementation-specific cheat.

My advice for applications using dirent facilities is NOT to assume
that a literal copy of the struct dirent is good for anything.  If
you need to keep the entry string around, you should allocate storage
for it based on its strlen().  (Since the other members of a struct
dirent are unspecified, you can't use them anyway in a POSIX-portable
application.)
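
The advice above might look like the following in application code. This is only a sketch: save_entry_name is a hypothetical helper name, and error handling is left to the caller.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a malloc()ed copy of the entry's name, or NULL if malloc()
 * fails.  Only d_name is examined, since it is the only member the
 * standard promises.  The caller releases the copy with free().
 */
char *
save_entry_name(const struct dirent *dp)
{
	size_t len = strlen(dp->d_name);
	char *copy = malloc(len + 1);	/* +1 for the terminating null */

	if (copy != NULL)
		memcpy(copy, dp->d_name, len + 1);
	return copy;
}
```

The caller passes the pointer returned by readdir(), after checking that it is not NULL, and keeps the returned string rather than the struct dirent itself.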

There are numerous related issues with IEEE Std 1003.1 that we could
get into.  For example, it is not stated whether or not it is safe
for an application to use a copy of a struct dirent or of several other
system data structures where the struct has a different address from
the one that the allocator (e.g. readdir()) assigned.  (Presumably an
implementation could depend on the object residing in a known place.)
Also, since there are no constraints on other struct dirent member
names, the traditional practice of using d_* for these is unsafe;
instead the "always reserved for the implementation" name space must
be used to avoid problems like
	#define d_namlen 42
	#include <dirent.h>
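
To spell out the failure: if the header declared a member spelled d_namlen, the application's macro would rewrite that declaration into a syntax error. A sketch of the safe form follows; sketch_dirent and __d_namlen are hypothetical names standing in for whatever a real implementation would choose.

```c
/* Application code; the text above argues this definition is permitted. */
#define d_namlen 42

/*
 * What a hypothetical implementation header might then declare.
 * A member spelled `d_namlen' here would expand to `unsigned short 42;'
 * and fail to compile.  Names beginning with two underscores are
 * reserved for the implementation, so a conforming application cannot
 * define a macro that collides with them.
 */
struct sketch_dirent {
	unsigned short	__d_namlen;	/* reserved-namespace member */
	char		d_name[1];	/* name of file */	/* non-ANSI */
};
```
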
I don't know if there's much point in going into such problems in
more detail.  My personal feeling is that 1003.1 serves ONE useful
purpose:  By specifying it in OS procurements (in ADDITION to more
useful specs such as ANSI/ISO C and SVID), one can obtain portable
interfaces for some otherwise problematic areas such as reliable
signals and terminal modes.  I wish I could say the same about other
1003.* standards-in-progress, but I cannot.  1003.2 in particular
seems to be legislating an utterly horrible environment instead of
specifying cleanly the UNIX utility subset of interest to portable
applications.  You can bet I'm not going to include it in procurement
specifications.

Volume-Number: Volume 17, Number 78