Challenges of Large Scale DML using T-SQL

Using T-SQL to insert, update, or delete large amounts of data from a table will results in some unexpected difficulties if you’ve never taken it to task.

Let’s say you have a table in which you want to delete millions of records. If the goal was to remove all then we could simply use TRUNCATE. However, if we want to remove records which meet some certain criteria then executing something like this will cause more trouble that it is worth.

Why not run the following in a production environment?

DELETE 
FROM myTable
WHERE columnA = 'whatever';

DELETE

FROM myTable

WHERE columnA = 'whatever';

Transaction log growth – expect a large transaction like this to cause the transaction log to expand. Calls to expand the database files are some of the most expensive operations SQL Server makes to the OS. Hopefully the initial sizing was good and the growth settings are reasonable.
Blocking – concurrency is going to increase the likelihood of lock escalation. That will lead to the possibility of excessive blocking.
Rollbacks – if the SPID is killed then the transaction will enter a rollback state. For very large transactions this can take a lot of time. Don’t even think about killing or stopping SQL Services during a rollback unless you want a corrupt database!

What can we do to improve this?

We follow an important maxim of computer science – break large problems down into smaller problems and solve them.

We break up the transaction into smaller batches to get the job done. Each of the above points can be relived in this manner. Let’s take a look at some examples.