Thursday, February 12, 2009

Scalar Functions vs. Table Valued Functions vs. Inline Code

****Note: This article has been re-posted. It seems that both of my CPU time graphs below were showing the execution time – the first was supposed to show the CPU time. Thanks very much to Kent Waldrop for pointing this out.*****

A couple of weeks ago, Kent Waldrop emailed me with some (extensive) testing he had been doing on “reverse patindex”. He had read my article on the topic, and cringed when he saw the scalar user-defined function (UDF) that I produced.

Rightfully so – his testing showed that the same logic done up as a table valued function using CROSS APPLY performed considerably better. That was a bit of a surprise to me, so I dug a little deeper.

A few minutes of searching brought me to Adam Machanic’s article, Scalar functions, inlining, and performance: An entertaining title for a boring post. In this article, Adam contrasts the performance of a scalar UDF against that of a table-valued UDF for a common purpose – retrieving the MAX OrderQty from AdventureWorks.Sales.SalesOrderDetail for a given ProductID. He goes on to explain that the optimizer doesn’t expand scalar functions nearly as well as it does table valued functions, hence the difference in performance.

This sent me thinking – what if the statement that you’re building into the UDF really shouldn’t benefit from expansion? Would the scalar UDF still fare poorly when compared to the table valued function?

To test, I created a table and populated it with about 580,000 rows of pseudo random strings:

USE tempdb

GO

--Create a table to hold our test data

CREATE TABLE FunctionTest

(

TEXTCol NVARCHAR(200)

)



--Populate FunctionTest with a 580000ish pseudo-random strings

--(strings with our without padding on either end)

INSERT FunctionTest(TEXTCol)

SELECT REPLICATE(' ', CAST(RAND() AS decimal(1,0))) +
CAST(NEWID() AS VARCHAR(50)) +

REPLICATE(' ', CAST(RAND() AS decimal(1,0)))

FROM (SELECT sv.number * sv2.number AS number

FROM master..spt_values sv
CROSS JOIN master..spt_values sv2

WHERE sv2.number <= 100

AND sv.[type] = 'P') AS a


I then created a scalar function and a user defined function. I kept it simple – all that I wanted the functions to do was to trim both leading and trailing spaces from a string. How many people couldn’t figure out what they were doing wrong the first time they tried to use Trim(SomeColumn) in SQL Server? There must be a trim in SQL Server…..right? Of course, there is no Trim() operator in SQL Server – instead, we have Left Trim and Right Trim, or LTrim and RTrim, respectively. The functions I created below just implemented a simple trim – that is to say, they applied both a left trim and a right trim to the string that was input. Before you tell me that there is a left trim and a right trim for a good reason, and I’m ruining the optimizer’s ability to use an index by left trimming – don’t worry. This is just an example, designed as a test model and nothing else:

--Create a scalar function to combine LTrim and RTrim

CREATE FUNCTION dbo.ScalarTrim(@StrToTrim NVARCHAR(200))

RETURNS NVARCHAR (200)

AS

BEGIN

RETURN
LTRIM(RTRIM(@StrToTrim))

END

GO

--Create a table valued function to combine LTrim and RTrim

CREATE FUNCTION dbo.TVFTrim(@StrToTrim NVARCHAR(200))

RETURNS TABLE

AS

RETURN
(SELECT LTRIM(RTRIM(@StrToTrim)) AS Trimmed

)

GO

I wanted to compare three different ways of querying the FunctionTest table that we created above – an inline query, the scalar function, and the table-valued function:

--Inline query

SELECT LTRIM(RTRIM(TEXTCol))

FROM FunctionTest



--Scalar Function

SELECT dbo.ScalarTrim(TEXTCol)

FROM FunctionTest



--Table-valued function

SELECT tvf.Trimmed

FROM FunctionTest ft

CROSS APPLY dbo.TVFTrim(ft.TEXTCol) AS tvf

I started with the obvious – check the IO to make sure that nothing was screaming “inefficient”. So I SET STATISTICS IO ON, and ran all three queries through. The total subtree cost for all three plans was the same, but the scalar function differed from the other two in one respect. All three queries performed more or less the same operations:

SELECT LTrim(RTrim(TextCol)) FROM FunctionTest
--Compute Scalar(DEFINE:([Expr1004]=ltrim(rtrim([tempdb].[dbo].[FunctionTest].[TextCol]))))
--Table Scan(OBJECT:([tempdb].[dbo].[FunctionTest]))


SELECT dbo.ScalarTrim(TextCol) FROM FunctionTest
--Compute Scalar(DEFINE:([Expr1004]=[tempdb].[dbo].[ScalarTrim]([tempdb].[dbo].[FunctionTest].[TextCol])))
--Table Scan(OBJECT:([tempdb].[dbo].[FunctionTest]))


SELECT tvf.Trimmed FROM FunctionTest ft CROSS APPLY dbo.TVFTrim(ft.TextCol) AS tvf
--Compute Scalar(DEFINE:([Expr1003]=ltrim(rtrim([tempdb].[dbo].[FunctionTest].[TextCol] as [ft].[TextCol]))))
--Table Scan(OBJECT:([tempdb].[dbo].[FunctionTest] AS [ft]))

But on the scalar query, the second operation (compute scalar) had a Rows value of 583680 (the number of rows in FunctionTest) and an EstimateRows value of 583680, as opposed to the other two operations, which had a Rows value of 0 and an EstimateRows value of 583680. There was clearly a discrepancy – the scalar function was returning the result set consistently 1-2 seconds after the other queries.

So I had a look at their Actual Execution Plans for all three operations. They largely looked the same. The Compute Scalar step for the scalar function did still have a Rows value of 583680, and the other two didn’t have a Rows value at all. Hmmmm…curiouser and curiouser

I SET STATISTICS TIME ON and ran them through again (in reality, I ran them through both the IO and time tests about five times each). Here’s what I saw:

image

Wow! That’s quite the difference in CPU time! The TVF and System Functions performed more or less identically, but the Scalar Function’s CPU time was roughly ten times higher! The execution time, consistent with the time it took to return the result sets, was more even

image


Why in the heck was the CPU time on the Scalar Function so much longer? To be honest, I don’t know. As much as I scraped through my standby references – Inside SQL Server 2005: The Storage Engine and Query Tuning And Optimization, I couldn’t find anything. They might be there – I just can’t find them ;-). Same went for BOL. The TVF vs. Scalar Function this does seem to be fairly well known, but it would be nice to know why the discrepancy exists. Inquiring minds and all...

Hopefully, somebody reading this does know and will enlighten us all! Whatever the case, it just reinforces the time-honored secret to SQL success….test, test, and….I forget the last one.

6 comments:

Kent Waldrop said...

Aaron, I thought I had left comments earlier but apparently my save of the comments failed. I didn't realize when I responded to your previous article how compelling the CPU difference might be. I think the CPU difference is eye popping. Nice work.

Kent Waldrop

Aaron Alton said...

Strange - I did get an alert for your previous comment as well. Maybe blogger is having issues.

Thanks very much for the feedback.

John Hunter said...

Wow that was quite an eye opener I am going to have to go back and check all my scalar functions now. Did you ever get to the root cause of the problem?

Phil Nash said...

I found your blog post after seeing the same issue myself and trying to research it.
You might be interested in the stackoverflow question I posted as a result (which links back to this post):

http://stackoverflow.com/questions/1231862/metaprogramming-with-stored-procedures

Arjun said...

Wow, good to know this. Thanks Aaron.

Bohdan Hotskyy said...

Not reproducable on:
Microsoft SQL Server 2012 (SP1) - 11.0.3000.0 (X64)
Oct 19 2012 13:38:57
Copyright (c) Microsoft Corporation
Developer Edition (64-bit) on Windows NT 6.0 (Build 6002: Service Pack 2)