Xref: utzoo comp.unix.questions:7153 comp.unix.wizards:8663
Path: utzoo!attcan!uunet!lll-winken!lll-tis!ames!pasteur!ucbvax!ulysses!dgk
From: dgk@ulysses.homer.nj.att.com (David Korn[eww])
Newsgroups: comp.unix.questions,comp.unix.wizards
Subject: Re: KSH portability
Summary: Portability is a complex issue...
Message-ID: <10309@ulysses.homer.nj.att.com>
Date: 20 May 88 16:15:51 GMT
References: <295@cmtl01.UUCP> <12142@tut.cis.ohio-state.edu> <631@vsi.UUCP> <341@alice.marlow.reuters.co.uk>
Organization: AT&T Bell Laboratories, Murray Hill
Lines: 171


There have been some recent articles concerning the portability of ksh.
Since I wrote ksh, let me comment about portability in general and
about ksh specifically.

Portability is a complex issue; enough to write a book about.  In fact,
Mark Horton is writing a book about portability called "How to Write
Portable Software in C".  The book should be published by Prentice Hall
sometime next year.

I have worked on many UNIX and UNIX look-alike systems during the last
twelve years.  One of the most annoying things to me is how difficult
it is to port software to each of these environments.  Ironically,
low-level languages such as C tend to be more portable than
higher-level languages such as shell.  Makefiles tend to be the least
portable of all.

There are two distinct considerations concerning portability.  One
concern is how portable the code is across different systems.  The
second is how portable the code is with respect to the various
compilers or interpreters that it may run with.  The first is a
design goal, which I refer to as design portability; the second is an
implementation issue, which I refer to as implementation portability.
To write a portable program, you must design it to be as system
independent as possible and then implement it in a way that the code
ports to as many environments as possible.  The importance of
design portability is that if the program becomes universally available,
then users will not have to be concerned with implementation
portability issues when they use the program.

To achieve a very high degree of implementation portability in C
requires doing things in less than ideal ways.  For example, some
implementations of C allow external names of no more than six
case-insensitive characters.  To be portable, you have to constrain the
name space for external routines.  You also can't use many of the
recent (though not very recent) features of the language.  I have found
problems with using void, structure assignment, non-unique structure
names, and enumerations.  There are some C pre-processors that allow
#ifdef but do not allow #if constructs.
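
A minimal sketch of code written under these constraints follows.  It
is an illustration only, not code from ksh: the macro NOVOID, the C_
constants, the pt_ member names, and the routine ptcopy are all
hypothetical.  The function uses a modern prototype so that current
compilers accept it; a compiler of the period would have needed the
old-style definition.

```c
/* NOVOID is a hypothetical macro that a build script would define
 * for compilers that lack void.  Only #ifdef is used, never #if. */
#ifdef NOVOID
#define VOID int
#else
#define VOID void
#endif

/* Plain #defines stand in for: enum color { C_RED, C_GREEN }; */
#define C_RED	0
#define C_GREEN	1

/* Member names carry a prefix in case the compiler keeps all
 * structure member names in a single name space. */
struct point {
	int pt_x;
	int pt_y;
	int pt_color;
};

/* The external name is six characters and stays unique when case
 * is ignored.  Members are copied one at a time instead of using
 * structure assignment (*dst = *src). */
VOID ptcopy(struct point *dst, struct point *src)
{
	dst->pt_x = src->pt_x;
	dst->pt_y = src->pt_y;
	dst->pt_color = src->pt_color;
}
```

A caller would fill in one struct point and pass both addresses, as in
ptcopy(&b, &a), instead of writing b = a.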

To write portable shell scripts, you cannot use shell functions,
# for comment, a colon within expansions (for example ${name:-bar}),
pattern classes which use ! for negation, and a number of other features
that we often take for granted.  The Bourne shell was not designed to be
system portable and relies on the underlying system for carrying out
basic tasks.  For example, the echo command is not part of the Bourne
shell on some systems, and since its behavior differs from one system
to another, any script that uses echo may not be portable.
Another prime example is test.

One of the goals of ksh is to be able to write shell scripts that
are portable across environments.  The full benefit of this can
only be realized when ksh is readily available everywhere.  To meet the
design goal, the shell had to have enough built-in capability that
useful scripts could be written without relying on the host environment.
This is one reason why test is a built-in command and why the print
built-in was added. The echo command, while widely used, varies on
different systems.  I know of at least four variants around.
For ksh to become readily available, its code and makefiles must
port easily to other environments.

I decided to base the shell I/O on standard I/O several years ago.
This is a decision I have frequently regretted because it has caused
many portability problems and been the source of many of the
bugs that have been reported.  I chose to use standard I/O
for two reasons.  First of all, I wanted ksh to port to non-UNIX
systems.  Secondly, I wanted to make it easy to add built-in commands.
I did not feel that it should be necessary to rewrite a command to make
it a built-in command.  I envision being able to add built-ins at run time.

When I changed the code to use standard I/O, there was no clear
description of what was the interface and what was private
to the implementation.  There were, and still are, holes in the
specification of this interface that make it necessary to
muck around with the interface in order for it to be used by the
shell.  Let me list a few of the problems that I encountered:
1.	How do I find the stream, given the file number?  This
	is required in order to implement the construct 3<&4.
2.	How can I use a stream after a fork()?  How do the
	parent and child synchronize?  For example, the program
	read line;cat < file
	should behead one line of the file.  Does the C program
	main()
	{
		char buffer[80];
		gets(buffer);
		system("cat");	
	}
	work correctly on your system when the input comes from a file?
3.	What is the state of stdio after a longjmp()?
4.	Can I duplicate a stream? How can I move a stream to another
	file descriptor?
5.	Is it legal to use the same buffer for more than one stream?
	Copying buffers each time the shell forks can be expensive.
	Why not use one buffer for all the output?
6.	How can I create a stream from a string?  After all, I can
	print to a string, why shouldn't I be able to read from one?
	I should be able to implement eval by calling the parser with
	input from a stream corresponding to a string.
7.	How do I hook the stdio library to the shell editing code?
8.	How can I tell whether a buffer has been written or not?  How
	can I tell what the last character written was?
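
To make problem 2 concrete, here is a minimal sketch of one way to
resynchronize on a POSIX-like system.  It is an illustration, not what
ksh does.  The routine name fsyncpos is hypothetical, and the sketch
assumes that the input is a seekable file and that ftell() returns a
byte offset, which holds on POSIX systems but is not guaranteed by
stdio in general.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Move the offset of the file descriptor under fp back to the
 * stream's logical read position, so that a child process reading
 * the descriptor starts where the parent's stdio reads stopped.
 * Returns 0 on success, -1 if the position cannot be determined
 * or the descriptor cannot seek (a pipe or a terminal). */
int fsyncpos(FILE *fp)
{
	long pos;

	pos = ftell(fp);		/* logical position in the stream */
	if (pos < 0)
		return -1;
	if (lseek(fileno(fp), (off_t)pos, SEEK_SET) == (off_t)-1)
		return -1;
	return 0;
}
```

In the gets()/system("cat") program above, calling fsyncpos(stdin)
between the two calls would let cat see everything after the first
line when the input is a file.  It does nothing for pipes, and the
parent should not go on reading from the stream afterwards without
accounting for the data the child consumed.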

To use standard I/O within ksh, I had to make some decisions.
Some of the decisions that I made were wise ones, some were not.
At one point I wrote a stdio library that conformed with an early
ANSI C draft and added extensions so that it could be used by ksh
without mucking with its internals.  I decided not to use it because
it is not .o compatible and thus might conflict with a routine
that happened to get loaded in by a weak dependency.  Fortunately,
I was able to get ksh working with the native stdio on most
configurations.  The list of operating systems that ksh runs on
is quite impressive.  It has even been ported to some rather
non-UNIX-like targets such as OS-9.

One reason that ksh has been so widely ported is that the makefiles
automatically configure in most instances.  My experience is that
most people who are responsible for bringing up software do not
know how to configure it.  If the software does not get built on the
first try, it often does not get built at all.  I know that I have trouble
figuring out how to answer questions when I get software to install.
Often, a month or so after using some software package, it dawns on me that
the reason that some feature isn't working is because I misunderstood
a question that was asked when I built the software.

One of the comments about ksh is that while externally it presents
a portable interface, internally, the code is hard to follow because
of all the conditional compilation.  Overall, this is a valid
criticism and one that I have been trying to rectify for the next
release.  In the past, there were two basic flavors of UNIX, BSD and
System V.  The way ksh configures itself is to look at its environment
and see what files are there and then to deduce what flavor of UNIX
is running.  For example, it looks for /vmunix on BSD and /unix on
System V.
Because of the number of hybrid systems, this strategy no longer
works.  POSIX and ANSI haven't helped.  The result is to have even
more variants.  The next version of ksh uses a different strategy, which
I summarize here:
1.	Use POSIX 1003.1 as the standard in writing the code.  Define
	macros as needed to map POSIX into each implementation.
2.	Generate an include file that defines the feature variants.
	For example, ksh needs to know whether signals are automatically
	reset when caught.  At compile time a test program is run
	that tests this feature and then appends a define constant to
	this file indicating whether signals get reset or not.
3.	Use a shell script to build ksh, not make.  The shell script
	uses only features that were in V7 Bourne shell and is
	highly portable.  Dependency checking is needed to maintain
	a tool, not to build it initially.  I use nmake (4th generation
	make) to maintain ksh since it handles dependency checking and
	conditional compilation automatically.
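
The feature test in step 2 might look something like the following
sketch.  It is an assumption about the general shape of such a test,
not the program that ksh uses: the routine names sigreset and putfeat
and the constant SIG_RESET are hypothetical.  It catches a signal once
and then asks signal() what the disposition was afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

/* Handler that does nothing; only its address matters. */
static void catcher(int sig)
{
	(void)sig;
}

/* Returns 1 if the handler is reset to the default once the signal
 * has been caught, 0 if it stays installed, -1 on error. */
int sigreset(void)
{
	void (*prev)(int);

	if (signal(SIGUSR1, catcher) == SIG_ERR)
		return -1;
	if (raise(SIGUSR1) != 0)	/* deliver the signal to ourselves */
		return -1;
	/* signal() returns the disposition that was in effect. */
	prev = signal(SIGUSR1, SIG_DFL);
	if (prev == SIG_ERR)
		return -1;
	return prev != catcher;
}

/* Append the result to the generated include file as a define
 * constant.  Returns 0 on success, -1 on failure. */
int putfeat(FILE *out)
{
	int r;

	r = sigreset();
	if (r < 0)
		return -1;
	return fprintf(out, "#define SIG_RESET %d\n", r) < 0 ? -1 : 0;
}
```

A build script would compile a small program around putfeat(), run it
with the include file opened for appending, and let the rest of the
source test SIG_RESET instead of guessing from the flavor of UNIX.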


Some of the comments about ksh refer to bugs, such as referencing through
location zero. These are bugs that were discovered after ksh was put
into the UNIX Toolchest over two years ago.  As I become aware of bugs,
I fix them.  However, the UNIX Toolchest is for non-supported software
and therefore bug fixes do not find their way into Toolchest.

To achieve maximum benefit from the design portability of ksh requires
it to be available on all machines.  This would eliminate users having
to be concerned about implementation portability when writing shell
scripts.  I have done as much as I can to achieve this.  I have gone
to great lengths to make ksh port easily to new systems.  I distribute
ksh throughout AT&T.  I am nearly finished writing a book that specifies
the ksh language.

However, I have no control over the release of ksh outside of AT&T.
Personally, I think that ksh should come with the UNIX system,
just as HyperCard comes with Macintosh systems.

David Korn
ulysses!dgk