Newsgroups: comp.std.c Path: utzoo!sq!msb From: msb@sq.sq.com (Mark Brader) Subject: Re: qsort(), bsearch(), and rude comparison functions Message-ID: <1990Oct9.062455.21197@sq.sq.com> Summary: comparison function is required to be well-behaved Organization: SoftQuad Inc., Toronto, Canada References: <1990Oct6.013443.5364@twinsun.com> Date: Tue, 9 Oct 90 06:24:55 GMT Lines: 119 Paul Eggert (eggert@ata.twinsun.com) writes: > Suppose I pass to qsort() a comparison function that isn't a total order. > May qsort() dump core? I think this is a Yes. 4.10.5.2 says ... # ... a comparison function pointed to by compar, which is called with # two arguments that point to the objects being compared. The function # shall return an integer less than, equal to, or greater than zero if # the first argument is considered to be respectively less than, equal # to, or greater than the second. As there is no special language to say otherwise, I think that the terms "less than", "equal to", and "greater than" must be taken to have their ordinary meanings, including the property that "less than" is transitive. The general rule in 4.1.6 then applies: # If an argument to a function has an invalid value ... the behavior # is undefined. A pointer to a function that doesn't have the required property is clearly an invalid value, so the behavior is undefined and it may indeed dump core. > While we're on the subject, what happens if the comparison function has side > effects, benign or otherwise? Nothing prohibits this. > How are the sequence points defined here? For > example, suppose I qsort() an N-item array using a comparison function that > increments a global counter starting at zero; must the counter be at least > N-1 when qsort() returns? The general rules for sequence points apply. From 3.3.2.2: # ... there is a sequence point before the actual call [to a function]. And from 3.6: # A full expression is an expression that is not part of another # expression. ... The end of a full expression is a sequence point. Thus, if the comparison function does "++count;" or "bar = foo * count++;" then there is a sequence point before the call to the comparison function and another at the end of the "++count;" or "bar = foo * count++;". This means that each of the incrementings of "count" is properly separated, so no problem there. On the other hand, there is no guarantee that the library function qsort() actually calls the comparison function. The following, while no doubt silly, is a permissible implementation as far as I can tell. (No, I haven't tested the code, and there may be minor errors. This is just to give an idea of how a sort function might not call its comparison function.) { static int (*prev_compar) (const void *,const void *); static void *prev_result; static size_t prev_nmemb, prev_size; if (prev_result && compar == prev_compar && (compar == strcmp || compar == memcmp) && nmemb == prev_nmemb && size == prev_size && memcmp (base, prev_result, size * nmemb) == 0) return; /* it's already sorted */ prev_compar = compar; prev_nmemb = nmemb; prev_size = size; free (prev_result); /* ANSI C, no NULL check needed */ prev_result = malloc (size * nmemb); ... /* the actual sort goes here */ if (prev_result) memcpy (prev_result, base, size * nmemb); } The reason for the check that compar is either memcmp() or strcmp() is that if it was a user-written function then qsort() couldn't know whether its behavior was conditional on some global variable. But this does seem to mean that the count is not guaranteed to be at least N-1. > Similar questions apply to bsearch(). Similar answers apply, with one important exception. The comparison function is no longer required to induce a full ordering on the data, but only a partial ordering. 4.10.5.1 says: # The comparison function pointed to by compar is called with two # arguments that point to the key object and to an array element, # in that order. The function shall return an integer less than, # equal to, or greater than zero if the key object is considered, # respectively, to be less than, to match, or to be greater than # the array element. The array shall consist of: all the elements # that compare less than, all the elements that compare equal to, # and all the elements that compare greater than the key object, # in that order. Thus if K is the key, and A < K and K < B, and A, K, and B occur in the array in that order, then it is irrelevant if the comparison function would indicate that B < A. The Standard imposes requirements on "(*compar) (K, A)" and "(*compar) (K, B)" but not on "(*compar) (A, B)". Footnote 129 is attached to the above text and says that "In practice, the entire array is sorted according to the comparison function." Footnotes are not, of course, part of the actual standard. -- Mark Brader "Actually, $150, to an educational institution, Toronto turns out to be about the same as a lower amount." utzoo!sq!msb, msb@sq.com -- Mark Horton This article is in the public domain.