Xref: utzoo comp.unix.questions:7153 comp.unix.wizards:8663
Path: utzoo!attcan!uunet!lll-winken!lll-tis!ames!pasteur!ucbvax!ulysses!dgk
From: dgk@ulysses.homer.nj.att.com (David Korn[eww])
Newsgroups: comp.unix.questions,comp.unix.wizards
Subject: Re: KSH portability
Summary: Portability is a complex issue...
Message-ID: <10309@ulysses.homer.nj.att.com>
Date: 20 May 88 16:15:51 GMT
References: <295@cmtl01.UUCP> <12142@tut.cis.ohio-state.edu> <631@vsi.UUCP> <341@alice.marlow.reuters.co.uk>
Organization: AT&T Bell Laboratories, Murray Hill
Lines: 171

There have been some recent articles concerning the portability of ksh. Since I wrote ksh, let me comment on portability in general and on ksh specifically.

Portability is a complex issue; enough to write a book about. In fact, Mark Horton is writing a book about portability called "How to Write Portable Software in C". The book should be published by Prentice Hall sometime next year.

I have worked on many UNIX and UNIX look-alike systems during the last twelve years. One of the most annoying things to me is how difficult it is to port software to each of these environments. Ironically, low-level languages such as C tend to be more portable than higher-level languages such as shell. Makefiles tend to be the least portable of all.

There are two distinct considerations concerning portability. One is how portable the code is across different systems. The second is how portable the code is with respect to the various compilers or interpreters that it may run with. The first is a design goal, which I refer to as design portability; the second is an implementation issue, which I refer to as implementation portability. To write a portable program, you must design it to be as system independent as possible and then implement it in a way that the code ports to as many environments as possible.
The importance of design portability is that if the program becomes universally available, users will not have to be concerned with implementation portability issues when they use it.

Achieving a very high degree of implementation portability in C requires doing things in less than ideal ways. For example, some implementations of C distinguish external names by only their first six characters, ignoring case. To be portable, you have to constrain the name space for external routines. You also can't use many of the recent (though not very recent) features of the language. I have found problems with using void, structure assignment, non-unique structure names, and enumerations. There are some C preprocessors that allow #ifdef but do not allow #if constructs.

To write portable shell scripts, you cannot use shell functions, # for comments, a colon within expansions (for example ${name:-bar}), pattern classes that use ! for negation, and a number of other features that we often take for granted. The Bourne shell was not designed to be system portable and relies on the underlying system for carrying out basic tasks. For example, on some systems the echo command is not part of the Bourne shell, and since its behavior differs incompatibly between systems, any script that uses echo may not be portable. Another prime example is test.

One of the goals of ksh is to make it possible to write shell scripts that are portable across environments. The full benefit of this can only be realized when ksh is readily available everywhere. To meet this design goal, the shell had to have enough built-in capability that useful scripts could be written without relying on the host environment. This is one reason why test is a built-in command and why the print built-in was added. The echo command, while widely used, varies on different systems; I know of at least four variants. To become readily available, the code and makefiles for ksh must port easily to other environments.
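Returning to the C constraint on external names mentioned earlier: the rule is easy to trip over because names that look distinct in the source can collapse into one symbol at link time. The following is a minimal sketch, assuming a linker that keeps only the first six characters of an external name and folds upper and lower case together; externals_collide is a hypothetical helper, not part of any compiler or of ksh.

```c
#include <ctype.h>
#include <stddef.h>

/* Number of significant characters assumed for external names.
   Case is also assumed to be ignored by the linker. */
#define EXTERNAL_SIGNIFICANT 6

/* Return 1 if a linker that keeps only the first six characters of an
   external name, folded to one case, would treat a and b as the same
   symbol; return 0 otherwise. */
int externals_collide(const char *a, const char *b)
{
    size_t i;

    for (i = 0; i < EXTERNAL_SIGNIFICANT; i++) {
        int ca = tolower((unsigned char)a[i]);
        int cb = tolower((unsigned char)b[i]);

        if (ca != cb)
            return 0;   /* names differ within the significant prefix */
        if (ca == '\0')
            return 1;   /* both names ended together: identical */
    }
    return 1;           /* first six characters match, ignoring case */
}
```

For example, two hypothetical routines named sh_readline and sh_readchar would collide, since both reduce to "sh_rea"; a program meant to port to such a system could not define both as external routines.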
I decided to base the shell I/O on standard I/O several years ago. This is a decision I have frequently regretted, because it has caused many portability problems and has been the source of many of the bugs that have been reported. I chose to use standard I/O for two reasons. First, I wanted ksh to port to non-UNIX systems. Second, I wanted to make it easy to add built-in commands. I did not feel that it should be necessary to rewrite a command to make it a built-in command. I envision being able to add built-ins at run time.

When I changed the code to use standard I/O, there was no clear description of what was the interface and what was private to the implementation. There were, and still are, holes in the specification of this interface that make it necessary to muck around with the internals for it to be usable by the shell. Let me list a few of the problems that I encountered:

1. How do I find the stream, given the file number? This is required in order to implement the construct 3<&4.

2. How can I use a stream after a fork()? How do the parent and child synchronize? For example, the program

       { read line; cat; } < file

   should behead one line of the file. Does the C program

       #include <stdio.h>
       #include <stdlib.h>

       int main(void)
       {
           char buffer[80];

           fgets(buffer, sizeof buffer, stdin);  /* read one line */
           system("cat");                        /* copy the rest */
           return 0;
       }

   work correctly on your system when the input comes from a file?

3. What is the state of stdio after a longjmp()?

4. Can I duplicate a stream? How can I move a stream to another file descriptor?

5. Is it legal to use the same buffer for more than one stream? Copying buffers each time the shell forks can be expensive. Why not use one buffer for all the output?

6. How can I create a stream from a string? After all, I can print to a string, so why shouldn't I be able to read from one? I should be able to implement eval by calling the parser with input from a stream corresponding to a string.

7. How do I hook the stdio library to the shell editing code?

8. How can I tell whether a buffer has been written or not?
   How can I tell what the last character written was?

To use standard I/O within ksh, I had to make some decisions. Some of the decisions that I made were wise ones; some were not. At one point I wrote a stdio library that conformed to an early ANSI C draft and added extensions so that it could be used by ksh without mucking with its internals. I decided not to use it because it is not .o compatible and thus might conflict with a routine that happened to get loaded in by a weak dependency. Fortunately, I was able to get ksh working with the native stdio on most configurations.

The list of operating systems that ksh runs on is quite impressive. It has even been ported to some rather non-UNIX-like targets such as OS-9. One reason that ksh has been so widely ported is that the makefiles configure automatically in most instances. My experience is that most people who are responsible for bringing up software do not know how to configure it. If the software does not build on the first try, it often does not get built at all. I know that I have trouble figuring out how to answer questions when I get software to install. Often, a month or so after using some software package, it dawns on me that some feature isn't working because I misunderstood a question that was asked when I built the software.

One of the comments about ksh is that while externally it presents a portable interface, internally the code is hard to follow because of all the conditional compilation. Overall, this is a valid criticism, and one that I have been trying to rectify for the next release. In the past, there were two basic flavors of UNIX, BSD and System V. The way ksh configures itself is to look at its environment, see what files are there, and then deduce what flavor of UNIX is running: for example, /vmunix exists on BSD and /unix on System V. Because of the number of hybrid systems, this strategy no longer works. POSIX and ANSI C haven't helped.
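The file-probing approach to configuration can be sketched in a few lines of C. This is a minimal illustration, assuming only that the POSIX access() call with F_OK is available; guess_flavor and enum unix_flavor are hypothetical names, not ksh's actual configuration code. The extra FLAVOR_HYBRID outcome shows where the heuristic breaks down.

```c
#include <unistd.h>

/* Possible outcomes of the file-probing heuristic. */
enum unix_flavor { FLAVOR_UNKNOWN, FLAVOR_BSD, FLAVOR_SYSV, FLAVOR_HYBRID };

/* Guess the UNIX flavor from which kernel image is present:
   /vmunix suggests BSD, /unix suggests System V. */
enum unix_flavor guess_flavor(void)
{
    int bsd = access("/vmunix", F_OK) == 0;
    int sysv = access("/unix", F_OK) == 0;

    if (bsd && sysv)
        return FLAVOR_HYBRID;   /* both present: the guess is meaningless */
    if (bsd)
        return FLAVOR_BSD;
    if (sysv)
        return FLAVOR_SYSV;
    return FLAVOR_UNKNOWN;      /* neither present: no information */
}
```

Even when the sketch returns FLAVOR_BSD or FLAVOR_SYSV, the answer only names a family; on a hybrid system it says nothing about which individual features, such as signal semantics, are actually present.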
Standardization has only added to the number of variants. The next version of ksh uses a different strategy, which I summarize here:

1. Use POSIX 1003.1 as the standard in writing the code. Define macros as needed to map POSIX into each implementation.

2. Generate an include file that defines the feature variants. For example, ksh needs to know whether signals are automatically reset when caught. At compile time a test program is run that tests this feature and then appends a define constant to this file indicating whether signals get reset or not.

3. Use a shell script to build ksh, not make. The shell script uses only features that were in the V7 Bourne shell and is highly portable. Dependency checking is needed to maintain a tool, not to build it initially. I use nmake (4th generation make) to maintain ksh since it handles dependency checking and conditional compilation automatically.

Some of the comments about ksh refer to bugs, such as dereferencing location zero. These are bugs that were discovered after ksh was put into the UNIX Toolchest over two years ago. As I become aware of bugs, I fix them. However, the UNIX Toolchest is for unsupported software, and therefore bug fixes do not find their way into the Toolchest.

Achieving maximum benefit from the design portability of ksh requires it to be available on all machines. This would free users from having to be concerned about implementation portability when writing shell scripts. I have done as much as I can to achieve this. I have gone to great lengths to make ksh port easily to new systems. I distribute ksh throughout AT&T. I am nearly finished writing a book that specifies the ksh language. However, I have no control over the release of ksh outside of AT&T. Personally, I think that ksh should come with the UNIX system, just as HyperCard comes with Macintosh systems.

David Korn
ulysses!dgk