Friday 29 May 2009

Optimal Query Plans With Unbalanced Data Loads

Sql Server is great at producing and caching efficient query plans.  However these plans may not be the most efficient based upon the parameters passed into subsequent calls off a stored procedure.

In this article I am assuming that you have read both of Erland Sommarskog’s articles The Curse and Blessings of Dynamic SQL and Dynamic Search Conditions in T-SQL.

First off,  in AdventureWorks we need to create a few indexes help help prove the point.

Create Index IdxLastName on person.Contact(LastName)
Create Index IdxContactId on sales.SalesOrderHeader(ContactID)
Create Index IdxOrderDate on sales.SalesOrderHeader(OrderDate)

Then if you execute :


Select * from sales.SalesOrderHeader  SOH ,
person.Contact CON
where SOH.ContactID = CON.ContactId
and Con.LastName = 'Smith'
and SOH.OrderDate between '2002-09-01' and '2002-09-02'

and


Select * from sales.SalesOrderHeader  SOH ,
person.Contact CON
where SOH.ContactID = CON.ContactId
and Con.LastName = 'Smith'
and SOH.OrderDate between '2002-09-01' and '2005-09-02'

You will notice the stark, and unsurprising,  difference in query plans.  In the first query its quicker to find all those order made on the 1st or 2nd of September  2002 and filter that list for those made by ‘Smith’s.  In the second it has found the orders made by ‘Smiths’ and filter those order for those made between 01 September 2002 and 02 September 2005.


Naturally if we did :


declare @LastName nvarchar(50),
@OrderDateLo smalldateTime,
@OrderDateHi smalldatetime

select
@LastName = 'Smith'
select @OrderDateLo = '2002-09-01'
select @OrderDateHi = '2002-09-01'
Select * from sales.SalesOrderHeader SOH ,
person.Contact CON
where SOH.ContactID = CON.ContactId
and Con.LastName = @LastName
and SOH.OrderDate between @OrderDateLo and @OrderDateHi

followed by


declare @LastName nvarchar(50),
@OrderDateLo smalldateTime,
@OrderDateHi smalldatetime

select
@LastName = 'Smith'
select @OrderDateLo = '2002-09-01'
select @OrderDateHi = '2005-09-01'
Select * from sales.SalesOrderHeader SOH ,
person.Contact CON
where SOH.ContactID = CON.ContactId
and Con.LastName = @LastName
and SOH.OrderDate between @OrderDateLo and @OrderDateHi

Then SQL Server will use the same (non optimal in the second case) cached query plan.


This could have quite an impact on your system.  We can , although not completely, resolve this inefficiency with one or two little tricks.   When deciding if to re-use a cached plan or create a new one SqlServer compares the entire SQL Statement , comments and all.  This means that :


Select * from person.Contact CON Where Con.LastName = 'Smith' /* 1 */

and


Select * from person.Contact CON Where Con.LastName = 'Smith' /* 2 */

will have different cache plans. Verify this yourself by checking sys.dm_exec_cached_plans.  We can use this to our advantage by generating comments based upon the input data to create different query plans.


Using a simplified search_orders procedure where we are only interested in search on last name and an order date range such as this :


Create Procedure search_orders @LastName    nvarchar(50) = NULL,
@OrderDateLo smalldateTime =NULL,
@OrderDateHi smalldatetime =NULL
as
Declare
@Sql nvarchar(max)

Select @Sql = 'Select * from sales.SalesOrderHeader SOH , person.Contact CON where SOH.ContactID = CON.ContactId '
if(@LastName is not null) Select @Sql = @Sql +' and CON.LastName = @LastName '
if(@OrderDateLo is not null)Select @Sql = @Sql +' and SOH.OrderDate >= @OrderDateLo '
if(@OrderDateHi is not null) Select @Sql = @Sql +' and SOH.OrderDate <= @OrderDateHi '
declare @DaysDiff integer
Select
@DaysDiff = datediff(dd,coalesce(@OrderDateLo,'2001-07-01 00:00:00.000'),
coalesce(@OrderDateHi,getdate()))
Select @Sql = @Sql + case when @DaysDiff <=2 then ' /* DR:1 */'
when @DaysDiff <=7 then ' /* DR:2 */'
when @DaysDiff <=30 then ' /* DR:3 */'
else ' /* DR:4 */' end

exec
sp_executesql @stmt = @Sql,
@params = N'@LastName nvarchar(50),@OrderDateLo smalldatetime,@OrderDateHi smalldatetime',
@LastName = @LastName,
@OrderDateLo = @OrderDateLo,
@OrderDateHi = @OrderDateHi

Note how a comment is added to the dynamic sql string depending on how many days are to be searched.


If we execute search_orders with varying parameters,  the maximum number of query plans we will end up with is 4.  However they will be optimized(ish) to account for possible date range sizes.    Again check sys.dm_exec_cached_plans to confirm this for yourself.


Taking this one step further if you are querying multiple tables for multiple wide or narrow ranges,  you could use a ratio of the count of resolved rows in each table as the comment.  The con is you would obviously have the extra processing of the counting the rows,  but at least you would not end up using an appalling query plan.

No comments:

Post a Comment