Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!cwjcc!gatech!ncsuvx!lll-winken!lll-lcc!pyramid!infmx!aland From: aland@infmx.UUCP (Dr. Scump) Newsgroups: comp.databases Subject: Re: need HELP with index problem Summary: index strategy for indexing fields with many duplicates Keywords: index informix Message-ID: <715@infmx.UUCP> Date: 28 Dec 88 05:55:01 GMT References: <8973@cit-vax.Caltech.Edu> Organization: Informix Software Inc., Menlo Park, CA. Lines: 104 In article <8973@cit-vax.Caltech.Edu>, citron@cit-vax.Caltech.Edu (Mark C. Citron) writes: > Hope someone can understand what is going wrong here. I have a > table in a database that has three indices: > a composite unique index made up of date, name1 and slotnumber > an index of name2 > an index of unitnumber > > there are about 50,000 rows in the table. In about 90% of the rows the > unitnumber is 0. In time these will be changed to other numbers (the > slots will be "filled"). I am assuming that you query on this field often, looking for the "exceptions" (e.g. those rows with "filled" slots). If you do *not* search on this field often, then having an index on it is counter- productive. > Now here is the problem: When I update the unitnumber field > and that field already contains a number OTHER THAN 0 > the update goes very fast. GREAT! Well, that's nice to hear :-] > Updating the unitnumber field to 0 (independent of what it was) also > goes very fast (less than a sec). Now I'm confused. Well, maybe I'm not. Lessee... in other words, as long as the field's value *before the update* was other than 0, it goes fast. OK, makes sense... > HOWEVER, when I update the unitnumber field > and that field already contains a 0, the update takes forever (about 30 > sec). There is a lot of disk accessing. Yikes! 30 seconds is forever? I'm older than I thought! :-] Seriously, though, this makes sense. What happens when there is a lot of duplication within a field is that those "identical" index entries (since you had just the one field in that index) must be traversed sequentially, since it can't identify whether or not other WHERE clause criteria (if any) are satisfied just by looking at that particular index. When you run your update, the proper row(s) must be identified first. *That* is what takes so much time; since you have an index on that exact field already, the optimizer will use that index first to qualify the needed row(s). The high degree of disk usage that you see is the traversal of these identical entries in the B+ index tree, checking the other search criteria for each of these rows which have unitnumber = 0. > This is an Informix database. I might guess that the problem is > that although unitnumber is indexed, there are so many rows with > the same unit number that Informix has trouble locating the > record to update. Is this correct? How do I get around this? > This performance is awful! Exactly. Not so much that there is "trouble locating the record" as "we have to check every record qualified by this value in the index." Same effect. To avoid this duplicate search, the best bet is to make that index a composite index also. I would guess that this update you run probably uses the same criteria each time, just with different values; you probably search for something like: .. WHERE unitnumber = 0 AND slotnumber = 32768 or .. WHERE unitnumber = 0 AND date = today or whatever. The solution to this is simple: create that index for unitnumber as a *composite* index, tacking whatever other field is your most frequent "additional" search criteria after the unitnumber. For example, if the first criterion above is what you use, create a composite of unitnumber and slotnumber, thus: CREATE INDEX unitnslot ON tablename(unitnumber, slotnumber) or, if the second criterion above is what you use, create a composite of unitnumber and date: CREATE INDEX unitndate ON tablename(unitnumber, curdate) *instead of* the singleton index on unitnumber. Now, there will be very few duplicates (or none at all) in the index. Also, for a compound condition search (e.g. WHERE unitnumber = 0 AND slotnumber = 32768), the composite index will be used directly to find what should be only a few rows (and therefore find them quickly). If a search is performed on unitnumber alone, it will still be fast, since the optimizer can use composite indexes for searches on the first column in the index as easily as if it was a singleton index on that column. (Of course, if you search on unitnumber=0 alone, it will be slower because you will be qualifying some 45,000 rows). By the way, there is a limit of 64K occurrences of the same value in a C-ISAM index (in the current and past releases, anyway). If you had a column with, say, the value 0 in 70,000 of the rows and had a singleton index on that column, you would have problems with that index. (BCHECK and CHECK TABLE would complain). > Thanks for any (ANY!) suggestions. Write or post. > citron@csvax.caltech.edu > citron@cit-vax.UUCP You're welcome. -- Alan S. Denney | Informix Software, Inc. | {pyramid|uunet}!infmx!aland Disclaimer: These opinions are mine alone. If I am caught or killed, the secretary will disavow any knowledge of my actions. Santos' 4th Law: "Anything worth fighting for is worth fighting *dirty* for"