Thursday 4 June 2009

Catch all queries and indexing

Amongst many others, Gail Shaw has blogged on the subject of catch all queries.  On many occasions I have needed to do something similar and found performance to be pretty dire on large tables.  Typically I would use dynamic SQL to generate the query but there are occasions where that is not possible. After a wee bit of poking around and trial and error , I think I am now straight in my mind of when SQLServer will index and when it will scan.  Consider this proc

Drop procedure test
go
Create procedure
test @ProductId integer,
@ReferenceOrderID integer
with recompile
as
select
*
from [Production].[TransactionHistory]
where (@ProductId is null or @productid = ProductID)
and (
@ReferenceOrderID is null or @ReferenceOrderID = ReferenceOrderID)

This will always create an index scan ,try these calls

exec test NULL, 61197 
exec test 790 , 61197
exec test 790 , NULL
However the Procedure can be re-written to the logically similar
drop proc test
go
create proc
test @ProductId integer,
@ReferenceOrderID integer
as
select
*
from [Production].[TransactionHistory]
where (@productid is not null and @productid = ProductID)
or (
@ReferenceOrderID is not null and @ReferenceOrderID = ReferenceOrderID)


Note no recompile is needed.  Now re-execute the procedure.  All being well you should notice index seeks and a much lower IO Cost.  Before we celebrate to much , there are some problems.  Firstly, if any one column that is referenced is not indexed then a scan will be reintroduced.  Try

drop proc test
go
create proc
test @ProductId integer,
@ReferenceOrderID integer,
@Quantity integer = NULL
with recompile
as
select
*
from [Production].[TransactionHistory]
where (@productid is not null and @productid = ProductID)
or (
@ReferenceOrderID is not null and @ReferenceOrderID = ReferenceOrderID)
or (
@quantity is not null and @quantity = Quantity)
Secondly an OR condition is used so it is now a union of Transactions which use either the given product of ReferenceOrderId.  So,  in most cases I would recommend two queries ,  the first to do the index seeks and dump the results to a temp table and the second to filter that based upon the non-indexed columns and to ensure that that all the required consitions are met.   The only assumption here is that the first pass has matched some rows.  This may seem long winded but if you are unable to use dynamic SQL and table scanning massive tables is out of the question then your options are rather limited.

No comments:

Post a Comment