Path: utzoo!attcan!uunet!snorkelwacker!apple!agate!ucbvax!ulysses!ggs
From: ggs@ulysses.att.com (Griff Smith)
Newsgroups: comp.bugs.sys5
Subject: Re: Sort bug causes data loss
Summary: it may be a feature
Keywords: bug, sort
Message-ID: <13761@ulysses.att.com>
Date: 18 Sep 90 20:20:56 GMT
References: <2675@crdos1.crd.ge.COM>
Organization: AT&T Bell Laboratories, Murray Hill
Lines: 81

In article <2675@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
>
> I have discovered what appears to be a serious bug in the sort
> routine used in several SysV variants including Stellar. Since it
> causes silent loss of data I am cross posting a bit more than I usually
> do.
>
[deleted some details to save space, followed by test script...]
>
> sort -nu <x$$.tmp
> 1: a
> 3: b
> 2: c
> 1: a
> 10: x
> XX
>
> Of course someone may tell me it's supposed to work that way, and that
> the BSD version is broken.

I suspect this may be the case. The system V manual page says this
about the -u option:

     -u   Unique: suppress all but one in each set of lines
          having equal keys.

This doesn't agree with the code, though. The real behavior matches
what I find in the BSD manual page:

     u    Suppress all but one in each set of equal lines.
          Ignored bytes and bytes outside keys do not participate
          in this comparison.

The next clue is from the System V manual page again:

     -n   An initial numeric string, consisting of optional
          blanks, optional minus sign, and zero or more digits
          with optional decimal point, is sorted by arithmetic
          value. The -n option implies the -b option (see
          below). Note that the -b option is only effective when
          restricted sort key specifications are in effect.

The tricky point is that a numeric comparison stops as soon as it finds
a non-numeric character.
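
To make the "stops at the first non-numeric character" point concrete,
here is a rough sketch in Python. It is only a model of the behavior
described above, not the actual sort source: the function name
numeric_prefix and the skip_blanks switch are made up for illustration,
and the sample lines assume some leading blanks, since the exact
whitespace of the original test file is not shown here.

```python
DIGITS = "0123456789"

def numeric_prefix(line, skip_blanks):
    """Return the initial numeric string of `line`, or "" if there is none.

    skip_blanks=False is an assumed model of the case where -b is not in
    effect, so a leading blank ends the scan before any digit is seen.
    """
    i = 0
    if skip_blanks:
        # Model of -b: step over leading blanks before scanning.
        while i < len(line) and line[i] in " \t":
            i += 1
    start = i
    if i < len(line) and line[i] == "-":      # optional minus sign
        i += 1
    while i < len(line) and line[i] in DIGITS:  # zero or more digits
        i += 1
    if i < len(line) and line[i] == ".":      # optional decimal point
        i += 1
        while i < len(line) and line[i] in DIGITS:
            i += 1
    # The scan stopped at the first character that cannot extend the number.
    return line[start:i]

# Hypothetical sample lines (leading whitespace is an assumption).
for sample in ["  1: a", " 10: x"]:
    print(repr(sample),
          repr(numeric_prefix(sample, skip_blanks=False)),
          repr(numeric_prefix(sample, skip_blanks=True)))
```

Without blank skipping both sample lines give an empty numeric string;
with it they give 1 and 10.
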
Since your test file has leading blanks, and you didn't specify a sort
key, the numeric comparison stops when it sees the leading blank in each
record; the test file appears to contain five empty records as seen by
the numeric comparison code. Furthermore, the -u option suppresses the
following escape clause in the manual page:

     When there are multiple sort keys, later keys are compared
     only after all earlier keys compare equal. Lines that
     otherwise compare equal are ordered with all bytes
     significant.

Translation: if a numeric comparison, or a set of keyed comparisons,
shows that two records match, `sort' then compares both records as
simple text to determine whether the records are really identical. This
`tie breaking' test is suppressed if the -u option is enabled. Since
all five of your test lines appear to be identical, the -u option
deletes all but one of them.

I think the command you want to use is

	sort -nu +0

This forces a trip through the key finder, which activates the code
that strips leading blanks.

> --
> bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
> VMS is a text-only adventure game. If you win you can use unix.

Flames, counter arguments, cheerfully accepted. I didn't write the
rules, I just work here.
--
Griff Smith AT&T (Bell Laboratories), Murray Hill
Phone: 1-201-582-7736
UUCP: {most AT&T sites}!ulysses!ggs
Internet: ggs@ulysses.att.com
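
Addendum: the interaction between -n and -u described in the post can
be modeled in a few lines of Python. This is a sketch under stated
assumptions, not the real implementation: numeric_key and sort_n are
hypothetical names, skip_blanks stands in for "-b is in effect" (as
with the +0 key), the test data assumes leading blanks, and which line
of an equal set a real sort keeps may differ from this model.

```python
import re

def numeric_key(line, skip_blanks):
    """Value of the initial numeric string; 0.0 if there is none."""
    s = line.lstrip(" \t") if skip_blanks else line
    # Optional minus sign, digits, optional decimal point, digits.
    text = re.match(r"-?[0-9]*\.?[0-9]*", s).group()
    try:
        return float(text)
    except ValueError:          # "", "-", "." and similar: no usable number
        return 0.0

def sort_n(lines, unique, skip_blanks):
    """Assumed model of `sort -n` (unique=False) and `sort -nu` (unique=True)."""
    def key(line):
        return numeric_key(line, skip_blanks)

    if not unique:
        # Tie break: lines with equal keys are ordered by their full text.
        return sorted(lines, key=lambda line: (key(line), line))

    # With -u the tie break is skipped, so equal keys mean "same record".
    kept = []
    for line in sorted(lines, key=key):
        if not kept or key(kept[-1]) != key(line):
            kept.append(line)
    return kept

# Hypothetical test data; the leading whitespace is an assumption.
data = ["  1: a", "  3: b", "  2: c", "  1: a", " 10: x"]

print(sort_n(data, unique=True, skip_blanks=False))  # no key given
print(sort_n(data, unique=True, skip_blanks=True))   # like adding +0
```

In this model the first call keeps a single line, because every key is
zero, while the second keeps one line for each distinct number.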