Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!longway!std-unix
From: std-unix@longway.TIC.COM (Moderator, John S. Quarterman)
Newsgroups: comp.std.unix
Subject: Re: Query about
Message-ID: <450@longway.TIC.COM>
Date: 1 Dec 89 01:32:16 GMT
References: <448@longway.TIC.COM>
Reply-To: Doug Gwyn
Organization: Ballistic Research Lab (BRL), APG, MD.
Lines: 115
Approved: jsq@longway.tic.com (Moderator, John S. Quarterman)

From: Doug Gwyn

In article <448@longway.TIC.COM> dmr@research.att.com (Dennis Ritchie) writes:
>I wish Gwyn et. al had sounded a bit more embarrassed about using
>`char d_name[1]' in struct dirent.

Here is the line in question taken directly from my PD dirent
implementation:

	char	d_name[1];	/* name of file */	/* non-ANSI */

You will note that I'm well aware that a trick is being used here.
I don't like such tricks either.  The problem is, the alternatives
were all worse:

	char	*d_name;	/* programs need to know whether d_name
				   specifies an array or not, due to a
				   generic C botch in using array names;
				   P1003 used to be ambiguous about this
				   but finally required it to be an array */
	char	d_name[HUGE_NUMBER];	/* valid, but wastes a lot of space */
	char	d_name[0];	/* worse than [1] according to ANSI C */
	char	d_name[];	/* almost certain to cause a diagnostic */

>There is no such type as char[], and `char d_name[]' may not appear
>in a structure, and if the declaration is `char d_name[1]' then
>you may not refer to d_name[i] when i>1.

Certainly it is unportable usage, i.e. not guaranteed to work by the
C language specification.  However, there is a large body of existing
C code (typically implementing network protocols) that relies on
exactly this trick, precisely because there is no really good
alternative.  I have yet to hear of a production UNIX system where
this trick fails.  (Perhaps 10th Edition UNIX is one?)

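For readers who have not met the trick, here is a minimal sketch of it.
Everything in it is an assumption made for illustration: struct
my_dirent, its d_ino member, and the helpers entry_name() and
make_entry() are hypothetical and are not taken from the PD dirent
package or from any real <dirent.h>.  The name is reached through
offsetof() from the start of the allocation, which is a cautious way
of spelling e->d_name.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct dirent (illustration only). */
struct my_dirent {
	long	d_ino;		/* illustrative member only */
	char	d_name[1];	/* really longer: the "[1] trick" */
};

/* Address of the name, computed from the start of the allocation
   rather than by indexing d_name past its declared bound. */
static char *
entry_name(struct my_dirent *e)
{
	return (char *)e + offsetof(struct my_dirent, d_name);
}

/* Allocate the struct plus a "secret extension" just large enough
   for this name.  The [1] already pays for the terminating '\0'.
   Returns NULL if malloc() fails; the caller frees the result. */
static struct my_dirent *
make_entry(long ino, const char *name)
{
	size_t len = strlen(name);
	struct my_dirent *e;

	e = malloc(sizeof(struct my_dirent) + len);
	if (e == NULL)
		return NULL;
	e->d_ino = ino;
	memcpy(entry_name(e), name, len + 1);	/* includes the '\0' */
	return e;
}
```

Note that a structure assignment such as `struct my_dirent copy = *e;'
copies only sizeof(struct my_dirent) bytes, so the extension holding
the rest of the name is left behind.
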
Probably what I really should have done was to parameterize the "1":

	char	d_name[1+_DNAME_LEN];	/* _DNAME_LEN=0 if you can,
					   _DNAME_LEN=PATH_MAX otherwise */

That would allow the dirent package installer a quick solution for C
environments that are fussier about this than the typical UNIX ones.
I may do this for future releases of my package.

>I don't have the POSIX wording at hand, but if it forbids
>`struct dirent d = *readdir(dp)' then it is flaky.

It says:

	The readdir() function returns a pointer to an object of type
	struct dirent that includes the member:

	Member		Member
	Type		Name		Description
	______		______		________________________
	char[]		d_name		Null-terminated filename

	The character array d_name is of unspecified size, but the
	number of bytes preceding the terminating null character
	shall not exceed {NAME_MAX}.

I believe my implementation meets these specifications, taken literally.

At one time, the description of readdir() contained a warning about
copying struct dirents, but by the time of the final Std 1003.1 the
entire section had been rewritten and this got lost in the shuffle.
I think some other unwanted changes were introduced too, but at the
moment I can't recall what they were.  (We also have to keep beating
down attempts to legislate support for seekdir() and telldir().)

Anyway, the whole point of the words "unspecified size" really was to
permit implementations to use the [1] trick so they could allocate a
relatively small struct_dirent+secret_extension if the C compiler
permitted it.  Otherwise either NAME_MAX+1 or some other defined
implementation constant would have been specified in Std 1003.1 (as
for c_cc[NCCS]).

I would have preferred char*d_name; however, that would be as hard
for an application to copy as a struct_dirent+secret_extension.

Certainly

	char	d_name[1+PATH_MAX];	/* use actual value for PATH_MAX */

is a legal and portable declaration for d_name that meets the POSIX specs.
I happen not to like it because PATH_MAX is potentially unbounded in
an ideal networked universe, and always allocating big chunks of space
of which a tiny portion is used bothers me more than this particular
well-known implementation-specific cheat.

My advice for applications using dirent facilities is NOT to assume
that a literal copy of the struct dirent is good for anything.  If you
need to keep the entry string around, you should allocate storage for
it based on its strlen().  (Since the other members of a struct dirent
are unspecified, you can't use them anyway in a POSIX-portable
application.)

There are numerous related issues with IEEE Std 1003.1 that we could
get into.  For example, it is not stated whether or not it is safe for
an application to use a copy of a struct dirent or of several other
system data structures where the struct has a different address from
the one that the allocator (e.g. readdir()) assigned.  (Presumably an
implementation could depend on the object residing in a known place.)
Also, since there are no constraints on other struct dirent member
names, the traditional practice of using d_* for these is unsafe;
instead the "always reserved for the implementation" name space must
be used to avoid problems like

	#define	d_namlen	42
	#include

I don't know if there's much point in going into such problems in
more detail.

My personal feeling is that 1003.1 serves ONE useful purpose: By
specifying it in OS procurements (in ADDITION to more useful specs
such as ANSI/ISO C and SVID), one can obtain portable interfaces for
some otherwise problematic areas such as reliable signals and terminal
modes.

I wish I could say the same about other 1003.* standards-in-progress,
but I cannot.  1003.2 in particular seems to be legislating an utterly
horrible environment instead of specifying cleanly the UNIX utility
subset of interest to portable applications.  You can bet I'm not
going to include it in procurement specifications.

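Returning to the advice about keeping only the entry string: the
following is a minimal sketch, assuming a POSIX <dirent.h>.  The
helper names save_entry_name() and first_entry_name() are
hypothetical, invented here for illustration; only opendir(),
readdir(), closedir() and the d_name member come from the standard.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc()ed copy of the entry's name,
   sized by its strlen(), or NULL if malloc() fails.  The caller
   frees the result.  No other struct dirent member is touched. */
static char *
save_entry_name(const struct dirent *dp)
{
	size_t len = strlen(dp->d_name);
	char *copy = malloc(len + 1);

	if (copy != NULL)
		memcpy(copy, dp->d_name, len + 1);	/* includes the '\0' */
	return copy;
}

/* Hypothetical helper: return a saved copy of the first name that
   readdir() reports for `path', or NULL on any failure. */
static char *
first_entry_name(const char *path)
{
	DIR *dirp = opendir(path);
	struct dirent *dp;
	char *name = NULL;

	if (dirp == NULL)
		return NULL;
	if ((dp = readdir(dirp)) != NULL)
		name = save_entry_name(dp);	/* copy before closedir() */
	(void) closedir(dirp);
	return name;
}
```

The copy is made before closedir() because the object readdir()
returned need not remain valid afterward; the saved string is the
only thing the application keeps.
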
Volume-Number: Volume 17, Number 78