There are many articles online describing SQL Server data compression. In this post I will summarize some of the salient points.
What is SQL Server Data Compression
SQL Server data compression has been around since SQL Server 2008. To give perspective, SQL Server 2008 and 2008 R2 reach end of life for extended support this July.
Basically we can compress tables and indexes i.e. the data structures in an RDBMS.
Here is the prosaic KB from Microsoft about SQL Server data compression.
Types of Compression
There are 2 different kinds of SQL Server data compression:
- Row Level Compression
- Page Level Compression
Row Level Compression
Row level compression is the more primitive type of compression. It is superceded by page level compression. This means we cannot have page level compression without already having row compression.
It looks to compress string data types e.g. varchar, char, nvarchar, nchar, text, etc. It can ignore zeros and null marks as well.
Page Level Compression
Page level compression builds off of row level compression and adds additional compression. This will save us more space than row level compression. Further explanation of the compression can be found here.
A basic tenant of performance tweaking is to establish a baseline and evaluate changes over time. It is no different here – we want to see how much compression saved us.
Do not attempt to compress system tables!
There are some system views and DMVs that can help:
- sys.master_files – this shows us each file in the database instance along with some details.
- sys.database_files – a database level view of the database files on disk.
- sys.dm_os_volume_stats – shows us basic info about the OS volume where the database files are stored.
Additionally you might want to consider testing some queries with the IO statistics turned on – giving you a measurement you can judge progress from.
SET STATISTICS IO ON SELECT * FROM Purchasing.PurchaseOrderDetail where OrderQty > 500; SET STATISTICS IO OFF
How Much Space Can We Save?
Fortunately this is simple to see. Microsoft includes a system procedure called sp_estimate_data_compression_savings. It returns the size of the object we ask about and estimates how much space can be saved by the type of compresssion we are considering to apply.
We can call this like:
exec sp_estimate_data_compression_savings @schema, @object, @index, @partition, @typeofcompression;
Some examples of calling it:
Pay attention to the column “size_with_requested_compression_setting_(KB)”. This is what we are looking for.
There is always a bottleneck to any process. Some resource necessary for the computation will lag behind another. Typically the hierarchy looks like this:
CPU is orders of magnitude faster than disk!
Like encryption, the brunt of the work moves from the disk to the CPU. Since CPU is orders of magnitude faster than disk IOPS we realize a performance gain.
Enabling SQL Server Data Compression
Although this can be done in the GUI (SSMS) I prefer to use T-SQL. It allows for more granular control and the ability to automate later.
Here is an example of how to enable data compression on an object:
ALTER TABLE [Purchasing].[PurchaseOrderDetail] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE);
Similar logic can be used for compressing an index.
Using iteration (while loops or cursor) we can iterate through database objects and compress / decompress in large numbers at a time.
Which Tables and Indexes are Compressed?
We can see what tables and indexes are compressed or not by using simple system catalog views:
SELECT DISTINCT s.[name] ,t.[name] ,i.[name] ,i.[type] ,i.index_id ,p.partition_number ,p.[rows] FROM sys.tables t LEFT OUTER JOIN sys.indexes i ON t.object_id = i.object_id INNER JOIN sys.schemas s ON t.schema_id = s.schema_id LEFT OUTER JOIN sys.partitions p ON i.index_id = p.index_id AND t.object_id = p.object_id WHERE t.type = 'U' AND p.data_compression_desc = 'NONE' ORDER BY p.rows desc
Disabling SQL Server Data Compression
Like the above example we follow similar logic:
ALTER TABLE [Purchasing].[PurchaseOrderDetail] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = NONE);
Here is the Microsoft KB explaining disabling compression.
By using some looping or batches with dynamic SQL we can do something like this snippet:
SELECT cs_obj_name = CHECKSUM(d.OBJ_NAME), * INTO #comp_tbls FROM( SELECT DISTINCT OBJ_NAME = OBJECT_NAME(i.[object_id]), i.[object_id], CASE WHEN i.[type]=0 /* Heap */ THEN ' ALTER TABLE '+ QUOTENAME(OBJECT_NAME(i.[object_id])) +' REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = NONE);' /* Index */ ELSE ' ALTER INDEX '+ QUOTENAME(i.name)+' ON '+QUOTENAME(OBJECT_NAME(i.[object_id]))+' REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = NONE);' END + CHAR(13) AS cmd FROM sys.partitions prt WITH(NOLOCK) INNER JOIN sys.indexes i WITH(NOLOCK) ON i.[object_id] = prt.[object_id] and prt.index_id = i.index_id --can also join a table with what you want to compress/decompress WHERE prt.[data_compression] <> 0
Ok now we have compressed objects and see performance gains. That’s great! But what about the disk space that was freed up? Can we reclaim it?
Yes – shrinking the database will deallocate the disk space to the object and return it to the OS.
Shrinking databases should be used with caution! This is a legitimate example of when to use it.
What We Didn’t Cover
There is a lot to cover for SQL Server data compression. I did not attempt to cover it all but rather provide a crash course.
Some of the things we omitted:
- Columnstore compression
- COMPRESS function in SQL Server 2017
- Backup compression
- Replication and compression
If you liked this post then you might also like: SQL Server on VMware Best Practices – How to Optimize the Architecture