Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!uupsi!sunic!news.funet.fi!hydra!cc.helsinki.fi!wirzenius
From: wirzenius@cc.helsinki.fi (Lars Wirzenius)
Newsgroups: comp.lang.c
Subject: Re: isalpha in ctype.h
Message-ID: <1991Mar21.004208.5641@cc.helsinki.fi>
Date: 21 Mar 91 00:42:07 GMT
References: <1991Mar20.112543.5515@ericsson.se>
Organization: University of Helsinki
Lines: 53

In article <1991Mar20.112543.5515@ericsson.se>, etxnisj@eos8c21.ericsson.se (Niklas Sjovall) writes:
> #define _U 01
> #define _L 02
> extern char _ctype_[];
> #define isalpha(c) ((_ctype_+1)[c]&(_U|_L))
>
> It's the part (_ctype_+1)[c] i don't understand. Could there be any
> segmentation errors using this?

Since isalpha is a library function (and a common one at that), there
shouldn't be any errors if you use it correctly, i.e. only give it valid
arguments. In this case, the arguments have to be valid characters
(values representable as unsigned char) or the value of EOF (as defined
in <stdio.h>).

The way this is (or seems to be) implemented by Sun: _ctype_ is an
array, which is subscripted with the character argument (henceforth
referred to as c), and each element of the array is a collection of
flags that identify various characteristics of the character, such as
whether it is a letter or not.

As long as you only need to test real characters, you can simply use
_ctype_[c]. However, isalpha should handle the value of EOF also. We
could first test whether c == EOF, and use _ctype_ only if it isn't, but
that requires evaluating c twice in the macro, which isn't good, because
of possible side effects (isalpha(getchar()) is quite reasonable
sometimes).

What we do instead is define EOF as -1 (we can do that, since we're
writing the whole library), and arrange things so that EOF's flags come
at the beginning of the array (_ctype_[0]), followed by the real
characters' flags, each at an index one greater than the numeric value
of the character. This means that we can write _ctype_[c+1] to access
the flags for character c; EOF is -1, so its flags come at
_ctype_[-1+1], i.e.
_ctype_[0].

Another way to write the expression is to use pointer arithmetic. This
is what Sun has done. The value of the name of an array, _ctype_,
becomes in value contexts a pointer to the first element of the array,
&_ctype_[0]. If we add 1 to this pointer, we get a pointer to the next
element, &_ctype_[1]. This pointer is then subscripted with the
character argument, since now the flags for character c are at offset c.
The flags for EOF are at index -1, which in this case is a valid index,
since it is still inside the real array, _ctype_.

However, subscripting _ctype_ itself with -1 (i.e. _ctype_[-1]) is quite
illegal, and can very well result in a segmentation error; the same
happens if you call isalpha(-2). Exactly what happens depends on the
system; I believe 'undefined behaviour' is the phrase used in the ANSI
standard for C (there have been many nice suggestions for this
behaviour, ranging from mailing a complaint to Dennis Ritchie, to
launching a nuclear attack; segmentation errors and system crashes are
more normal ones (I hope :-)).

-- 
Lars Wirzenius    wirzenius@cc.helsinki.fi
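
To make the offset trick described above concrete, here is a minimal
sketch of such a table. This is not Sun's actual source: the names
my_ctype, MY_U, MY_L, MY_ISALPHA and my_ctype_init are made up for
illustration, only the 26 ASCII letters of each case are flagged, and
the code assumes EOF is -1 (which it checks), as the scheme requires.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>   /* EOF */

#define MY_U 01      /* hypothetical flag: upper-case letter */
#define MY_L 02      /* hypothetical flag: lower-case letter */

/* Slot 0 holds the flags for EOF; slot c+1 holds the flags for
   character c, for c in 0..UCHAR_MAX. */
static unsigned char my_ctype[1 + UCHAR_MAX + 1];

/* Same shape as the quoted macro: (my_ctype + 1) points at slot 1,
   so subscripting it with c reaches slot c+1, and with -1 reaches
   slot 0. The argument c is evaluated exactly once. */
#define MY_ISALPHA(c) ((my_ctype + 1)[c] & (MY_U | MY_L))

static void my_ctype_init(void)
{
    const char *p;

    /* The whole scheme relies on EOF landing on slot 0. */
    assert(EOF == -1);

    /* Static storage starts out zeroed, so EOF and every non-letter
       already have no flags set; only the letters need marking. */
    for (p = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; *p != '\0'; p++)
        my_ctype[(unsigned char)*p + 1] |= MY_U;
    for (p = "abcdefghijklmnopqrstuvwxyz"; *p != '\0'; p++)
        my_ctype[(unsigned char)*p + 1] |= MY_L;
}
```

After my_ctype_init() has run, MY_ISALPHA(getchar()) is safe for every
value getchar() can return, because each of them maps to a slot inside
the array. MY_ISALPHA(-2) would still index before the start of the
array and is just as undefined as described above.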