Monday, February 23, 2009

Enabling Data Compression By Default in a SQL Server 2008 Database

For anyone following the Inventory in SQL Server series – don’t worry, we’ll return to your regularly scheduled program tomorrow. I was compelled to interrupt the regular broadcast schedule for something neat that hit me while helping out on the forums the other day. But first, the obligatory introductory paragraphs…..

SQL Server 2008 introduced Data Compression – a method by which data volumes can be reduced at the cost of (sometimes) increased CPU load. Compression isn’t just about hard drive space – reduced data volumes also result in decreased I/O and decreased memory utilization.

This article isn’t an introduction into SQL 2008 compression – Sunil Agarwal has a great introduction to both Row and Page Compression here. Chad Boyd digs deeper into the nature of data compression and it’s impact on performance here, and Linchi Shea has been producing a veritable miniseries on data compression over at SQLBlog.com.

I won’t repeat what they have written – please have a look at their excellent articles if you’d like to dig deeper. What we’re going to address today is the concept of enabling data compression by default, so that any new table you create in SQL Server 2008 is “compressed”.

In case you’re not familiar with the syntax used to enable row or page compression, it looks something like this:

ALTER TABLE MyTable REBUILD WITH (DATA_COMPRESSION=PAGE);

A similar syntax is used for index compression:

ALTER INDEX MyIndexALL ON MyTable REBUILD WITH (DATA_COMPRESSION=PAGE);

There isn’t an option to enable compression for all new tables in a database – but we can get around it. Now before we dig into the solution, let me issue a note of caution:


A Note of Caution
The SQL Server team chose not to include an option to make all tables "compressed" for a reason. Or at least, I assume that they did; they’re not really a very arbitrary bunch. Perhaps it was an oversight, but I’ve got my doubts. Anyway, you usually don’t want to just compress every table that gets created in a SQL Server database. For the same reason that there isn’t a “Create Indexes on All Columns By Default” option in SQL Server, there isn’t a “Compress Every New Table By Default” option – you really need to consider whether or not compression will be a boon for every table that you create in your SQL Server database. A perfect example is how a lookup table – containing a half a dozen rows and a couple of narrow columns – likely won’t benefit very much from compression, and you’ll incur the cost of compressing and decompressing the data on what could be a very regular basis. That said, there are some select cases in which compression of all tables is a good thing, and for those situations, it’s nice to have an alternative.


Anyway, such a suitable case presented itself recently on the MSDN SQL Forums, and I was fortunately hit by a particularly productive brainwave. It hadn’t occurred to me before to try to use a DDL trigger to attack this problem, so I gave it a whirl. Here’s what I came up with:

CREATE DATABASE CompressionTest   
GO

USE CompressionTest
GO
CREATE TRIGGER CompressTablesAfterCreation   
ON DATABASE
AFTER CREATE_TABLE
AS
BEGIN
DECLARE
@TableSchema sysname, @TableName sysname
SELECT @TableSchema = EVENTDATA().value('(/EVENT_INSTANCE/SchemaName)[1]','sysname'),
@TableName = EVENTDATA().value('(/EVENT_INSTANCE/ObjectName)[1]', 'sysname')

DECLARE @SQL NVARCHAR(2000)

SET @SQL = 'ALTER TABLE ' + QUOTENAME(@TableSchema) + '.' +
QUOTENAME(@TableName) + ' REBUILD WITH (DATA_COMPRESSION=PAGE)'

EXEC (@SQL)
END
GO
CREATE TABLE CompressMePlease
(
Col1 INT,
Col2 VARCHAR(50)
)

--Check it out
SELECT OBJECT_NAME(OBJECT_ID) AS [ObjectName], [rows], data_compression_desc, index_id
FROM sys.partitions
WHERE data_compression > 0
ORDER BY ObjectName;

--When you're done....
--DROP DATABASE CompressionTest

(A big thanks to Glenn Berry, from whose blog I finally found out how to check and see if a table’s data is compressed. The “Check it out” section above is “borrowed” from his blog entry, entitled Fun With Data Compression in SQL Server 2008)


Now the example above creates a test database, but only so you don’t muck up your production db while testing. If you ran through the example yourself, you’ll have seen that it worked – which is a good thing for anyone faced with this problem!



A Couple of Caveats

Shortly after I posted the solution above on the forums, Jonathan Kehayias very astutely pointed out that, while the code worked in almost all instances, it caused issues with SELECT INTO. For instance, this statement:

SELECT * INTO AnotherCompressedTable FROM CompressMePlease

Will produce the following error:

Schema changed after the target table was created. Rerun the Select Into query.

The only way to get around this is to….not use SELECT INTO. Now, this should in theory not be a huge issue, because SELECT INTO really shouldn’t be used to create production database tables anyway, as it does not create a complete replica of your existing table. In practice, many people do use SELECT INTO to create a table and fill it with records, so if you’re one of those people, you’ll obviously need to make some changes if you would like to employ this solution. Note however that DDL triggers do not (ever) fire for temp tables or temporary stored procedures, so if you only use SELECT INTO to create #Temporary tables, this issue won’t affect you.

In addition, this won’t compress preexisting tables – you need to do this manually, or using dynamic SQL. For large tables this could potentially take quite some time to complete, so I would be very surprised if the SQL Server team ever introduced an option to compress all preexisting tables with a single command (Can't you just imagine the forum posts - "Help! I told it to compress everything and now I need to stop it!!! Where's the kill switch???" ).

Hopefully this provides some soothing relief for any of you who have a burning need to “compress all new tables by default”.

1 comment:

Custom24 said...

Hi there

Thanks for the tip. Here are two possible improvements;

1. Don't do anything if the table was created with an explicit data compression (otherwise, you override it in the trigger)

2. To avoid Select Into statements causing an error, don't do anything if the trigger sql was a select into statement, rather than a create table statement. I've not yet figured out how to make the resulting table use the desired "default" compression, but at least it doesn't error

Here's the code

CREATE TRIGGER CompressTablesAfterCreation
ON DATABASE
AFTER CREATE_TABLE
AS
BEGIN
DECLARE @TableSchema sysname, @TableName sysname, @theTriggeringSQL nvarchar(max)
SELECT @TableSchema = EVENTDATA().value('(/EVENT_INSTANCE/SchemaName)[1]','sysname'),
@TableName = EVENTDATA().value('(/EVENT_INSTANCE/ObjectName)[1]', 'sysname'),
@theTriggeringSQL = EVENTDATA().value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]', 'nvarchar(max)')

if (@theTriggeringSQL not like '%with%(%data\_compression%=%' escape '\' and @theTriggeringSQL not like 'select%')
begin
DECLARE @SQL NVARCHAR(2000)

SET @SQL = 'ALTER TABLE ' + QUOTENAME(@TableSchema) + '.' +
QUOTENAME(@TableName) + ' REBUILD WITH (DATA_COMPRESSION=PAGE)'

EXEC (@SQL)
end
END