Tuesday, May 26, 2009

Easy On The Updates There, Sparky

Ahh, the UPDATE statement. Do you ever stop to think of where we would be without the UPDATE statement? Can you imagine writing a DELETE followed by an INSERT every time we needed to modify a data row? I thought not. The very word “update” rings of timeliness, relevance, and light. Unfortunately, as is the case with all things, it’s all too easy to go overboard, and you can have too much of a good thing.

You see, to the unsuspecting database developer, it may seem that some operations in SQL Server are more or less “free”. Let’s clear the air on that one – nothing is free, ever. Or if it is, it usually has a 30-day limit. It’s easy to forget this, because when you’re working with something like SQL Server, it’s hard to imagine that a sub-second response time can hide anything of significant concern.

A SQL Server update is actually a relatively “expensive” operation, and as such needs to be done only when necessary. Wondering what I’m blathering on about? OK, let me ask you a question – which of the following two statements is more expensive?

UPDATE Customer
SET AccountStatus = 0
WHERE LastPurchaseDate < '20090101'
AND CurrentBalance = 0

UPDATE Customer
SET AccountStatus = 0
WHERE LastPurchaseDate < '20090101'
AND CurrentBalance = 0
AND AccountStatus <> 0

(For those of you who haven’t yet had your coffee today, the second statement has an extra line).

Your choices are:
- 1
- 2
- It Depends

How many of you chose “it depends”? It’s the safe answer, so I’ll wager that most of you did. And I’m sure you’re right, because there are edge cases for everything. But let’s get real and talk about the 99.99% of cases that you should concern yourself with.

The correct answer is “1”.

Don’t believe me? OK, let’s dig a bit deeper. The first batch, in plain English, sets the “status” flag to “0” for all customers who haven’t purchased this calendar year and don’t owe us anything (0 = Inactive, as you’ll see in the sample script below). The second batch does exactly the same thing, but only for customers who aren’t already inactive.

So what’s the big deal? If it’s already set to “0”, SQL Server will just pass it by, won’t it?

No, no it won’t. SQL Server performs every update in exactly the same manner, regardless of whether or not the “new” value is any different from the “old” one. The following is an oversimplification of what happens:

  • SQL Server acquires locks on the rows it examines – Update (U) locks while it searches, with Intent Exclusive (IX) locks on the containing pages and the table. When SQL Server is ready to perform the update, the row lock is converted to an Exclusive (X) lock. An X lock is NOT compatible with a shared (S) lock, so your update will block readers, even if it isn’t “changing” the value. It also means that your update can be blocked by readers, even on rows it doesn’t really need to update.
  • The actual update occurs. Every row covered by the UPDATE statement is written and logged, regardless of whether or not the value has actually changed. Logically, SQL Server presents each update as a “delete” followed by an “insert” – which is exactly what the deleted and inserted pseudo-tables expose to triggers.

  • Any update triggers which are defined on the table fire.

What’s that? You don’t believe that SQL Server actually updates the rows, even if the value hasn’t changed? Check it out for yourself. In the following example, we create a table called “Customer”. No comments on the schema – this isn’t a database design post, and it’s an easy example to work with. On said table, we create a FOR UPDATE trigger, which will fire any time an update is made. Have a look at the results:

--Create a table for testing
CREATE TABLE Customer
(
CustomerName VARCHAR(200),
LastPurchaseDate datetime NULL,
CurrentBalance decimal(18,6),
AccountStatus bit NOT NULL DEFAULT 0 -- 0 = Inactive, 1 = Active
)
GO

--Create a trigger on our test table.
--This trigger fires whenever an update is actually performed
CREATE TRIGGER Customer_Update
ON Customer
FOR UPDATE
AS
BEGIN
DECLARE @RowsDeleted INT, @RowsInserted INT

SELECT @RowsDeleted = COUNT(*) FROM deleted
SELECT @RowsInserted = COUNT(*) FROM inserted

PRINT 'Rows Deleted: ' + CAST(@RowsDeleted AS VARCHAR(20)) + CHAR(13) + CHAR(10) +
'Rows Inserted: ' + CAST(@RowsInserted AS VARCHAR(20))
END
GO

--Insert a test row. Note that the AccountStatus is 0
INSERT Customer (CustomerName, LastPurchaseDate, CurrentBalance, AccountStatus)
SELECT 'Trapper Sam''s House of Peameal Bacon', '20081215', 0, 0

--Perform an update
UPDATE Customer
SET AccountStatus = 0
WHERE LastPurchaseDate < '20090101'
AND CurrentBalance = 0


Zut Alors!

Rows Deleted: 1
Rows Inserted: 1

(1 row(s) affected)


I hate to say I told you so…. ;-)

Now if you’re following along at home, don’t drop that table quite yet – we’re not done with it.

Note that, in order to prove to you that the rows were updated, we also demonstrated that any triggers defined on the table fire. An update that touches rows without actually changing any values still fires the trigger – and it fires unnecessarily, doing work for “changes” that never happened.
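If you’d like to see the locking for yourself, you can peek at the sys.dm_tran_locks DMV (available in SQL Server 2005 and later) while the update is held open in a transaction. This is just a sketch against the test table above – the exact resources you see will vary with your data and isolation level:

```sql
--Hold a non-updating update open and inspect the locks it acquired
BEGIN TRANSACTION

UPDATE Customer
SET AccountStatus = 0
WHERE LastPurchaseDate < '20090101'
AND CurrentBalance = 0

--Expect Exclusive (X) row locks and Intent Exclusive (IX) page/table
--locks, even though no value actually changed
SELECT resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID

ROLLBACK TRANSACTION
```

Any reader needing a shared (S) lock on those rows will wait until the transaction above commits or rolls back.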

Lock Escalation
Remember lock escalation? SQL Server consumes resources – memory and CPU, in particular – to place and track locks, so a large query that holds thousands of locks carries a real cost. Row locks are the most conducive to concurrency, since they minimize the chance of one request blocking another. Unnecessarily updating rows can cause the lock manager to choose page locks over row locks, or to escalate your row/page locks to a table lock!
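You can get a rough idea of whether escalation is occurring on a given table from sys.dm_db_index_operational_stats. A sketch (the counters are cumulative, and require VIEW SERVER STATE permission, so treat the numbers as indicative rather than exact):

```sql
--How often has SQL Server attempted (and succeeded at) promoting
--row/page locks on our table to a full table lock?
SELECT index_lock_promotion_attempt_count,
       index_lock_promotion_count
FROM sys.dm_db_index_operational_stats
     (DB_ID(), OBJECT_ID('Customer'), NULL, NULL)
```

If those counters climb every time your batch runs, your updates are locking far more than they need to.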

Transactional and Merge Replication
Although they employ different mechanisms, both transactional and merge replication will replicate all updates on published articles – with no regard to whether or not the row’s data was actually updated. This isn’t a failing on the part of the replication agents – it’s simply the most efficient way for them to work. If you’re “updating” rows unnecessarily, you’re replicating useless updates – you didn’t “change” anything, but it’s published anyway!

Database Mirroring
Do you use database mirroring? If you do, I’m sure you know that mirroring uses the SQL Server transaction log to keep the mirror database in sync with the primary. Just as with replication, if you’re performing unnecessary updates, you’re now adding to your mirror load as well.

Database Snapshots
Database Snapshots use a copy-on-write mechanism to maintain a point-in-time record of a database’s pages. When a page in the source database is written to, the page is first copied to the snapshot. As a result of this copy-on-write mechanism, you’re pushing pages full of rows that weren’t really changed into the snapshot! (Thanks to Paul Randal for confirming this).

So how do we prevent this kind of carnage? Well, as the second UPDATE statement at the top of this post shows, all we need to do is add the column(s) being updated to the WHERE clause. Keep in mind that application code often doesn’t need to deal with this, because the data access components only pass back updates for rows that were actually changed. When in doubt, it’s best to verify that your application is doing this kind of “intelligent updating”, as it can have a significant impact on your application’s throughput.
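If you still have the test table from earlier (told you not to drop it!), you can watch the fix in action – add the AccountStatus guard and re-run the update:

```sql
--Same update, but skipping rows that are already inactive
UPDATE Customer
SET AccountStatus = 0
WHERE LastPurchaseDate < '20090101'
AND CurrentBalance = 0
AND AccountStatus <> 0

--The trigger still fires (statement-level triggers fire even when
--zero rows are affected), but this time it reports:
--Rows Deleted: 0
--Rows Inserted: 0
--(0 row(s) affected)
```

No rows deleted, no rows inserted, nothing logged, replicated, mirrored, or copied to a snapshot – all for the price of one extra line in the WHERE clause.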

Ladies and Gentlemen, that’s all for today. Please sign the guestbook on your way out.


Michelle Ufford said...

Nice post, Aaron! :)

Mike Walsh said...

You told me I had to write in the Guest Book before I left. So I am.

Great post. This is something I tend to harp on developers about from time to time. I bet that more of your readers picked option 2 than you gave credit though ;-)

bft42 said...

I'm a bit confused right now.. you were asking for the most expensive query, right? Isn't #2 going to be _less_ expensive than #1, since fewer rows are likely to be updated?

Aaron Alton said...

@Michelle - Thanks!

@Mike - You're probably right...

@bft41 - Doh! I'm revising the article now so I don't look like an idiot for all posterity. Thanks for the catch.