Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!shadooby!samsung!uunet!dev!dgis!jkrueger From: jkrueger@dgis.dtic.dla.mil (Jon) Newsgroups: comp.databases Subject: Re: Simple SQL question Keywords: SQL Message-ID: <670@dgis.dtic.dla.mil> Date: 23 Nov 89 04:51:38 GMT References: <98813@ti-csl.csc.ti.com> <1989Nov19.200834.266@mrspoc.Transact.COM> <1217@hpuviea.UUCP> <926@anasaz.UUCP> Organization: Defense Technical Information Center (DTIC), Alexandria VA Lines: 72 john@anasaz.UUCP (John Moore) writes: [For a db like +------+ | Frod | Location Name Birthdate +------+----------+------+------------+ | Atlanta | Doe | 1 Sep 1960 | | Boston | Ray | 2 Sep 1960 | +----------+------+------------+ ] >Query: Give me the first ten names by date of birth where > location is one of a set of locations which is a small subset > of all the locations. select * from frod where location = "Boston" or location = "Hartford" order by birthdate; Alternatively: select * from frod f, locations l where f.location = l.location order by birthdate; No cursors necessary or desirable. No 3GL necessary. The problem is trivial. This is exactly what DBMS are designed to do. Most are quite good at it. >Does someone have an optimizer which would generate a GOOD strategy for: > SELECT Name, Location > FROM Frod > WHERE Location in ( "Phoenix", "Atlanta", "Denver", "Seattle" ) > AND Day_of_Birth >= 12/31/47 > ORDER BY Day_of_Birth ; Yes. Most of us do. This is a simple single-table query. The optimization possibilities are small and amount to deciding whether to select first on location or birth date. If exactly one of them has an index, it's a good candidate. All commercial products can do this much. If both or neither have indices, or as an additional check, we could examine data distributions in each column and predict which would return fewer rows. In this example, if locations were evenly distributed among cities but only 10% of the birth dates were after 12/31/47, birth date would be a better candidate. To my knowledge, only INGRES's query optimizer does this. After that, we could check integrities defined on the columns. If location draws from a list of valid locations, and "Phoenix" isn't on the list, we can skip checking for it. To my knowledge, no commercial product does this. Probably none supports integrities in a sufficiently well-defined manner to make this optimization safe anyway. For instance, consider integrities maintained by triggers. In summary, three optimizations is about it, and they all boil down to deciding which of two columns to select on first. After that, execution speed is a near-linear function of disk-memory bandwidth. If the table size is small with respect to memory size, caching can change this to a very-near-linear function of processor speed. >Definition of good: performance >= what could be achieved by doing it >by hand using low level operations. Will you add in the labor costs of maintaining the low level optimizations? If not, long will your hand crafted solution compete on existing hardware with the portable solution as it continues to run on faster hardware? -- Jon -- Jonathan Krueger jkrueger@dtic.dla.mil uunet!dgis!jkrueger Isn't it interesting that the first thing you do with your color bitmapped window system on a network is emulate an ASR33?