Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84 SMI; site sun.uucp
Path: utzoo!linus!decvax!decwrl!sun!guy
From: guy@sun.uucp (Guy Harris)
Newsgroups: net.lang.c
Subject: Re: Uses of "short" ?
Message-ID: <2883@sun.uucp>
Date: Sat, 12-Oct-85 19:08:36 EDT
Article-I.D.: sun.2883
Posted: Sat Oct 12 19:08:36 1985
Date-Received: Tue, 15-Oct-85 06:40:11 EDT
References: <486@houxh.UUCP> <2600017@ccvaxa>
Organization: Sun Microsystems, Inc.
Lines: 147

> > I don't want them to have a pretty good idea when it's going to violate
> > that default assumption on a particular machine.  I want them to have a
> > pretty good idea when it's going to violate that default assumption on
> > a 16-bit-"int" machine;

> Well, I can see how that would make life easier for you, but it's not
> really my problem.  The project I work on would have saved a lot of
> time if the code we're porting hadn't been written for a system using
> memory-mapped files, but I don't curse the authors for writing for the
> environment they had.

One can write:

	int	size_of_UNIX_file;

or one can write

	long	size_of_UNIX_file;

The former is incorrect, and the latter is correct.  The two are equivalent
on 32-bit machines, so there is NO reason to write the former rather than
the latter on a 32-bit machine.  If one can write code for a more general
environment with NO extra effort other than a little thought, one should
curse an author who didn't make that extra effort.
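
As a minimal sketch of the same point (the function name and the
record-based layout are made up for illustration, not taken from any real
program), the same care applies to arithmetic that produces a file size.
Done in "long", it costs nothing on a 32-bit machine and stays correct where
"int" is 16 bits:

```c
/*
 * Hypothetical helper: size in bytes of a file holding nrec records of
 * reclen bytes each.  Both operands are converted to "long" before the
 * multiply, so the product is never computed in a possibly 16-bit "int".
 */
long
total_bytes(int nrec, int reclen)
{
	long size_of_file;

	size_of_file = (long)nrec * (long)reclen;
	return size_of_file;
}
```

With 300 records of 300 bytes each this yields 90000, which does not fit in
a 16-bit "int".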

> > (Consider all the postings that say "our news system truncates items
> > with more than 64KB, so could you please repost XXX" for an example of
> > why it is a bad practice.)

> What has that to do with anything?  Somebody failed to anticipate
> future needs and used a short when she should have used a long.

The code in question uses an "int" where it should have used a "long".
Using a "short" would have been *more* acceptable; the documentation for
this system says

	<items> can be up to 65535 bytes long (2^32 bytes in 4.1c BSD),

Since the only *real* constraint on the size of items is the amount of disk
space available and the time taken to transmit the items, neither of which
is significantly affected by the width of a processor's ALU and registers,
the system should not make the maximum item size depend on that width.
The right fix would have been to use "long" instead of "int"; however, if
the cost of converting item databases on PDP-11s would have been too high,
using "short" would have been acceptable.  The ideal would have been to do
something like

	#ifdef BACKCOMPAT
	typedef unsigned int itemsz_t;
	#else
	typedef unsigned long itemsz_t;
	#endif

and *not* restrict items to 65535 bytes by default; if it's really too much
trouble for a site to convert its database, then they can build a version
which is backwards-compatible with older versions.
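
A sketch of how code might then be written against such a type (the typedef
is repeated here so the fragment stands alone; ITEMSZ_MAX and item_grow are
hypothetical names, not part of the system under discussion).  Nothing in it
depends on the width of "int" unless BACKCOMPAT is defined:

```c
#ifdef BACKCOMPAT
typedef unsigned int itemsz_t;
#else
typedef unsigned long itemsz_t;
#endif

/* Largest value an itemsz_t can hold (all bits set). */
#define ITEMSZ_MAX	((itemsz_t)~(itemsz_t)0)

/*
 * Hypothetical helper: add nbytes to the item length *len.
 * Returns 1 on success, or 0 (leaving *len alone) if the new
 * length would not fit in an itemsz_t.
 */
int
item_grow(itemsz_t *len, unsigned long nbytes)
{
	if (nbytes > (unsigned long)(ITEMSZ_MAX - *len))
		return 0;
	*len += (itemsz_t)nbytes;
	return 1;
}
```

Built without BACKCOMPAT, growing a 65000-byte item by 1000 bytes succeeds
on any machine, since "unsigned long" holds at least 32 bits.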

> There are people who rely on two-digit year codes, too.

Yes, but how many of them rely on two-digit year codes on 16-bit machines
and four-digit year codes on 32-bit machines?  Not planning for future needs
may be regarded as a misfortune; having a system like the aforementioned
meet future needs or not depending on the width of a machine's registers
looks like carelessness.  (Sorry, Oscar, but I didn't think you'd mind...)

There are cases where the difference between a 16-bit machine and a 32-bit
machine *is* relevant; an example would be a program which did FFTs of large
data sets.  I have no problem with

	1) the program being written very differently for a PDP-11, which
	would have to do overlaying, or provide a software virtual memory
	system, or perform some other technique to do the FFTing on disk,
	and for a VAX, where you could (assuming you could keep the entire
	data set in *physical* memory) write it in a more straightforward
	fashion (although, if it *didn't* all fit in physical memory,
	it would have to use some techniques similar to the PDP-11
	techniques to avoid thrashing)

or

	2) saying "this program needs a machine with a large address
	space".

> Portability is one of many factors to be considered in setting local
> coding standards.  I have spent a lot of time recently understanding
> code written for a very different environment and converting it to C.
> It had lots of size and byte-ordering problems.  That's the breaks.
> It's not the authors' fault that I had different requirements than they.

In many of these cases, there is little if any gain to be had by writing
software in a non-portable fashion.  Under those circumstances, it *is* the
authors' fault that they did something one way when they could have done it
another way with little or no extra effort.  In the case of byte ordering,
it takes more effort to write something so that data is portable between
machines.  If it's a question of a program which *doesn't* try to exchange
data between machines and *still* fails on machines with a different byte
order than the machine for which it was written, there'd better have been a
significant performance improvement gained by not writing it portably.  And
in the case of using "long" vs. "int", there is NOTHING to be gained from
using "int" instead of "long" on a 32-bit machine (on a truly 32-bit
machine, "long"s and "int"s will both be 32-bit quantities unless the
implementor of C was totally out to lunch), so it SHOULD NOT BE DONE.
Period.
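
On the byte-ordering point, a minimal sketch of what the portable version
can look like (put32 and get32 are hypothetical names, and storing the most
significant byte first is an assumption made for illustration).  The value
is moved a byte at a time with shifts, so the bytes come out the same
whatever the byte order of the machine running the code:

```c
/* Store the low 32 bits of v in buf[0..3], most significant byte first. */
void
put32(unsigned char *buf, unsigned long v)
{
	buf[0] = (unsigned char)((v >> 24) & 0xff);
	buf[1] = (unsigned char)((v >> 16) & 0xff);
	buf[2] = (unsigned char)((v >> 8) & 0xff);
	buf[3] = (unsigned char)(v & 0xff);
}

/* Rebuild the value stored by put32(). */
unsigned long
get32(const unsigned char *buf)
{
	return ((unsigned long)buf[0] << 24) |
	       ((unsigned long)buf[1] << 16) |
	       ((unsigned long)buf[2] << 8) |
	        (unsigned long)buf[3];
}
```

Note that the value is carried in an "unsigned long", not an "int", for the
reason given above: it needs 32 bits on every machine.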

> But it is not our business to produce code that runs on PDP-11s,
> let alone (as you requested in a previous posting) code that runs
> efficiently on PDP-11s.

I made no such request, but I'll let that pass.  If you can get code that
runs on PDP-11s with no effort other than getting people to use C properly,
it *is* your business to get them to so use C and write portable code
whenever possible.  If your system permits code to reference location 0 (or
whatever location a null pointer points to, assuming it doesn't have a
special "invalid pointer" bit pattern), it *is* your business not to write
code which dereferences null pointers - such code is NOT valid C.
Programmer X can get away with writing code like that, if they have such a
system; programmers Y, Z, and W, who work for a company whose system does
not let code get away with dereferencing null pointers, have every right to
stick it to programmer X when their company's customers stick it to them
because "your machine is broken and won't run this program".

Saying "programmer X is not at fault" is blaming the victim, not the
perpetrator.
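
For a sense of what the valid-C alternative costs, a minimal sketch
(safe_len is a hypothetical name): test the pointer before using it, instead
of counting on location 0 being readable.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: length of the string s, treating a null
 * pointer as an empty string instead of dereferencing it.
 */
size_t
safe_len(const char *s)
{
	if (s == NULL)
		return 0;
	return strlen(s);
}
```

This behaves the same on a system that maps location 0 and on one that
faults on it.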

> I have no objection to the principle that we should try, other things
> being equal, to write portable code.  But the FIRST consideration of
> good professional practice is to write code that is clear,
> maintainable, and efficient in the environment for which we are paid
> to produce it.  It is not bad practice to put that environment first.

If all other things are not equal, or close to it, I have no objection to
unportable code.  The trouble is that people don't even seem to try to write
portable code when they *are* equal.  It *is* bad practice to blindly assume
that the environment you're writing for is the only interesting environment.
Some minimum amount of thought should be given to portability, even if
portability concerns are rejected.  Can you absolutely guarantee that the
people who paid you to write that code won't ever try to build it in a
different environment?  If not, by writing non-portable code you may end up
costing them *more* money in the long run; it's more expensive to
retroactively fix non-portable code than to write it portably in the first
place.

If somebody says that, now that ANSI C finally "defines 'int's as 16-bit
quantities", they'll start thinking about when it's appropriate to use
"long" and when it's appropriate to use "int", they haven't given the proper
minimum amount of thought to portability.

	Guy Harris