****Note: This article has been re-posted. It seems that both of my CPU time graphs below were showing the execution time – the first was supposed to show the CPU time. Thanks very much to Kent Waldrop for pointing this out.*****
A couple of weeks ago, Kent Waldrop emailed me with some (extensive) testing he had been doing on “reverse patindex”. He had read my article on the topic, and cringed when he saw the scalar user-defined function (UDF) that I produced.
Rightfully so – his testing showed that the same logic done up as a table valued function using CROSS APPLY performed considerably better. That was a bit of a surprise to me, so I dug a little deeper.
A few minutes of searching brought me to Adam Machanic’s article, Scalar functions, inlining, and performance: An entertaining title for a boring post. In this article, Adam contrasts the performance of a scalar UDF against that of a table-valued UDF for a common purpose – retrieving the MAX OrderQty from AdventureWorks.Sales.SalesOrderDetail for a given ProductID. He goes on to explain that the optimizer doesn’t expand scalar functions nearly as well as it does table valued functions, hence the difference in performance.
This sent me thinking – what if the statement that you’re building into the UDF really shouldn’t benefit from expansion? Would the scalar UDF still fare poorly when compared to the table valued function?
To test, I created a table and populated it with about 580,000 rows of pseudo random strings:
--Create a table to hold our test data
CREATE TABLE FunctionTest
--Populate FunctionTest with a 580000ish pseudo-random strings
--(strings with our without padding on either end)
SELECT REPLICATE(' ', CAST(RAND() AS decimal(1,0))) +
CAST(NEWID() AS VARCHAR(50)) +
REPLICATE(' ', CAST(RAND() AS decimal(1,0)))
FROM (SELECT sv.number * sv2.number AS number
FROM master..spt_values sv
CROSS JOIN master..spt_values sv2
WHERE sv2.number <= 100
AND sv.[type] = 'P') AS a
I then created a scalar function and a user defined function. I kept it simple – all that I wanted the functions to do was to trim both leading and trailing spaces from a string. How many people couldn’t figure out what they were doing wrong the first time they tried to use Trim(SomeColumn) in SQL Server? There must be a trim in SQL Server…..right? Of course, there is no Trim() operator in SQL Server – instead, we have Left Trim and Right Trim, or LTrim and RTrim, respectively. The functions I created below just implemented a simple trim – that is to say, they applied both a left trim and a right trim to the string that was input. Before you tell me that there is a left trim and a right trim for a good reason, and I’m ruining the optimizer’s ability to use an index by left trimming – don’t worry. This is just an example, designed as a test model and nothing else:
--Create a scalar function to combine LTrim and RTrim
CREATE FUNCTION dbo.ScalarTrim(@StrToTrim NVARCHAR(200))
RETURNS NVARCHAR (200)
--Create a table valued function to combine LTrim and RTrim
CREATE FUNCTION dbo.TVFTrim(@StrToTrim NVARCHAR(200))
RETURN (SELECT LTRIM(RTRIM(@StrToTrim)) AS Trimmed
I wanted to compare three different ways of querying the FunctionTest table that we created above – an inline query, the scalar function, and the table-valued function:
FROM FunctionTest ft
CROSS APPLY dbo.TVFTrim(ft.TEXTCol) AS tvf
I started with the obvious – check the IO to make sure that nothing was screaming “inefficient”. So I SET STATISTICS IO ON, and ran all three queries through. The total subtree cost for all three plans was the same, but the scalar function differed from the other two in one respect. All three queries performed more or less the same operations:
SELECT LTrim(RTrim(TextCol)) FROM FunctionTest
SELECT dbo.ScalarTrim(TextCol) FROM FunctionTest
SELECT tvf.Trimmed FROM FunctionTest ft CROSS APPLY dbo.TVFTrim(ft.TextCol) AS tvf
--Compute Scalar(DEFINE:([Expr1003]=ltrim(rtrim([tempdb].[dbo].[FunctionTest].[TextCol] as [ft].[TextCol]))))
--Table Scan(OBJECT:([tempdb].[dbo].[FunctionTest] AS [ft]))
But on the scalar query, the second operation (compute scalar) had a Rows value of 583680 (the number of rows in FunctionTest) and an EstimateRows value of 583680, as opposed to the other two operations, which had a Rows value of 0 and an EstimateRows value of 583680. There was clearly a discrepancy – the scalar function was returning the result set consistently 1-2 seconds after the other queries.
So I had a look at their Actual Execution Plans for all three operations. They largely looked the same. The Compute Scalar step for the scalar function did still have a Rows value of 583680, and the other two didn’t have a Rows value at all. Hmmmm…curiouser and curiouser
I SET STATISTICS TIME ON and ran them through again (in reality, I ran them through both the IO and time tests about five times each). Here’s what I saw:
Wow! That’s quite the difference in CPU time! The TVF and System Functions performed more or less identically, but the Scalar Function’s CPU time was roughly ten times higher! The execution time, consistent with the time it took to return the result sets, was more even
Why in the heck was the CPU time on the Scalar Function so much longer? To be honest, I don’t know. As much as I scraped through my standby references – Inside SQL Server 2005: The Storage Engine and Query Tuning And Optimization, I couldn’t find anything. They might be there – I just can’t find them ;-). Same went for BOL. The TVF vs. Scalar Function this does seem to be fairly well known, but it would be nice to know why the discrepancy exists. Inquiring minds and all...
Hopefully, somebody reading this does know and will enlighten us all! Whatever the case, it just reinforces the time-honored secret to SQL success….test, test, and….I forget the last one.