Wednesday, June 10, 2009

Defensive Programming – Assumptions Must be Guaranteed or Tested

Every couple of weeks so often, I come across a well-intentioned piece of Transact-SQL code that simply wasn’t designed with 100% reliability in mind. I won’t go into detail about what makes a piece of TSQL well-intentioned – let’s just say that it doesn’t look malicious or anything. Anyway, the type of code I’m talking about is the sort of thing that can easily go undetected through the application’s design, all through QC and acceptance testing, and look deceptively innocent for months – even years – until one day it breaks.

Confused yet? Good. Here’s an example of what I’m talking about:

DECLARE    @BatchToProcess INT

SET
@BatchToProcess = (/*

Select the next batch to process.

Only one batch can ever have (ProcessState=2)

at a time, so this will always return a single

BatchNumber

*/

SELECT BatchNumber
FROM QueuedBatches

WHERE ProcessState = 2

)

Does this look familiar? For the more junior coders in the room, this code will run fine and dandy as long as there genuinely is never more than one batch with ProcessState = 2. The first time there is more than one however, your application will fail, and will generally not do so in a very nice way. As database professionals, there are a good many tools available to us to maintain the integrity of our applications and our application data – constraints, triggers, and the like. When you use schema objects such as these to guarantee the validity of assumptions made in your application code, you’re coding defensively. When you test any assumptions that aren’t otherwise guaranteed, you’re also coding defensively. When you do neither of the above, you’re coding very, very optimistically, and you’ll end up soaked on the first rainy day.

I liken this type code to a bomb ticking on a time-delayed fuse – it may not go off today, and it may not go off tomorrow. Heck, it may never go off. But if and when it does, it’s almost certainly going to be a royal pain in the Heineken to clean up. Clean up enough of them, and you’ll understand that it’s easier to prevent this kind of occurrence than it is to clean up the mess after the fact.

In order to avoid errors in the code written above, I need to guarantee that my assumption is valid, or test to make sure that it is correct. Hence the title of this article. I know, it was inspired.

Now how can I guarantee that the condition I have assumed is valid? Off hand, I can think of two different ways – one works on any recent version of SQL Server, and the other will work on 2008+ only. The first involves creating a trigger to roll back any transactions that try to insert a row with ProcessState = 2, or update a row’s ProcessState to 2, when a row already exists in this state. The code for such a trigger might look something like this:

CREATE TRIGGER QueuedBatches_RestrictProcessState

ON QueuedBatches

FOR INSERT, UPDATE

AS

BEGIN

--Check for inserts or updates that set more than 1 row's ProcessState to 2

IF 1 < (SELECT COUNT(*)

FROM INSERTED

WHERE ProcessState = 2)

BEGIN

PRINT
'You cannot insert more than one record with ProcessState=2'

ROLLBACK TRAN

END


--Check for inserts or updates that set a row's ProcessState to 2, when

--a row already exists in that state

IF 1 < ( SELECT COUNT(*)
FROM QueuedBatches

WHERE ProcessState = 2)

BEGIN

PRINT
'QueuedBatches may only ever have a single record where ProcessState=2.

A record already exists in this state.'
ROLLBACK TRAN

END

END

GO

Of course, you would replace the PRINT statements with whatever error handling mechanism you’re employing.

The second method, applicable on SQL 2008 and greater, would employ a unique filtered index to guarantee that there is never more than one record with ProcessStatus = 2. Said index would look like this:

CREATE UNIQUE INDEX QueuedBatches_ProcessState_Unique_IX

ON QueuedBatches (ProcessState)

WHERE ProcessState = 2

The former method is a bit more flexible, but the latter is almost certainly more efficient.

If (for some reason) we are unable to implement constraints in the schema to guarantee the state of our QueuedBatches data, we can still handle the condition in code. Rather than creating a variable and blindly selecting in (hopefully one) batch number, we could either check for duplicate batches before doing the variable assignment:

IF 1 < (SELECT COUNT(*)

FROM QueuedBatches

WHERE ProcessState = 2)

BEGIN

PRINT
'QueuedBatches data is inconsistent - more than one record with BatchState = 2'

ROLLBACK TRAN

END

or we could fetch the TOP BatchNumber into our scalar variable (provided that doing so will not adversely affect our application):


SET
@BatchToProcess = (
SELECT TOP 1 BatchNumber
FROM QueuedBatches

WHERE ProcessState = 2

)

At the risk of sounding like a broken record, whatever the method you choose, you need to either guarantee or test your assumptions. Any of the above methods is better than the “wing and a prayer” strategy that seems to have become so prevalent nowadays, and only you can decide which method is right for your applications.


Remember – that poor schmuck who gets dumped on by the user community when the app breaks and business grinds to a halt – well, that poor schmuck just may be you.

2 comments:

Brian Tkatch said...

How about:

SELECT @BatchToProcess = BatchNumber FROM QueuedBatches WHERE ProcessState = 2
AND (SELECT SELECT COUNT(*) FROM QueuedBatches WHERE ProcessState = 2) = 1;

IF @BatchToProcess IS NULL THEN PRINT 'QueuedBatches data is inconsistent - more than one record with BatchState = 2';

Aaron Alton said...

Hey Brian,

Thanks for the comment. Your solution is very nice as well. There are a lot of ways to handle these types of errors - the important part being that they're handled!