Thursday 22 May 2014

Minimising deadlocks with READ_COMMITTED_SNAPSHOT

The problem

We have a system that has a number of separate Windows services that all access a shared database hosted by SQL Server 2008 R2. The system is used to provide online hydraulic modelling in support of a water utility company’s maintenance activities. The services are built around SynerGEE Water, a hydraulic modelling product from DNVGL.

One of the Windows services is responsible for monitoring a model library – SynerGEE models are stored as MDB files on the file system – and when a new or updated model is detected it adds a MODEL record to the database. It also indexes all of the pipes in the model and adds them to a MODEL_PIPE table in the database.

A second service checks the database for new MODEL records and then invokes SynerGEE to perform some hydraulic analysis. The results of this analysis are used to update the MODEL_PIPE records.

We observed that if a number of models had been updated in one go the result was that sometimes a database deadlock occurred when the second service was querying the MODEL_PIPE table. This was because the first service was in the process of adding other MODEL_PIPE records for other models at the same time.

We are using NHibernate for all data access and all database queries or updates are wrapped in transactions with the assistance of the Spring.Net transaction template. NHibernate Profiler was used to confirm that all the transactions were correctly formed and we could see that the transactions were using the READ COMMITTED isolation level.

The solution

Firstly, I did some research around Minimizing Deadlocks and noted that using a row-based isolation level can help. In particular, activating READ_COMMITTED_SNAPSHOT on the database can help by allowing SQL Server to use row versioning rather than shared locks.

“When the READ_COMMITTED_SNAPSHOT database option is set ON, a transaction running under read committed isolation level uses row versioning rather than shared locks during read operations.” [1]

Further research around snapshot isolation levels provided further insight:

“The READ_COMMITTED_SNAPSHOT database option determines the behavior of the default READ COMMITTED isolation level when snapshot isolation is enabled in a database. If you do not explicitly specify READ_COMMITTED_SNAPSHOT ON, READ COMMITTED is applied to all implicit transactions. This produces the same behavior as setting READ_COMMITTED_SNAPSHOT OFF (the default). When READ_COMMITTED_SNAPSHOT OFF is in effect, the Database Engine uses shared locks to enforce the default isolation level. If you set the READ_COMMITTED_SNAPSHOT database option to ON, the database engine uses row versioning and snapshot isolation as the default, instead of using locks to protect the data.” [2]

Bingo! If the READ_COMMITTED_SNAPSHOT database option is set to ON row versioning is used instead of locks.

“Once snapshot isolation is enabled, updated row versions for each transaction are maintained in tempdb. A unique transaction sequence number identifies each transaction, and these unique numbers are recorded for each row version. The transaction works with the most recent row versions having a sequence number before the sequence number of the transaction. Newer row versions created after the transaction has begun are ignored by the transaction…

…Snapshot isolation uses an optimistic concurrency model. If a snapshot transaction attempts to commit modifications to data that has changed since the transaction began, the transaction will roll back and an error will be raised. ” [2]

In our case this looked very interesting because the first Windows service adds new MODEL and MODEL_PIPE records in a transaction and is done with them. The second Windows service then reads the new MODEL and MODEL_PIPE records and updates them in a separate transaction. The chances of an optimistic concurrency issue are minimal. Although the two services are accessing the same table they are not accessing the same rows. Therefore, using row version-based locks would allow the two services to work better together.

So, I enabled READ_COMMITTED_SNAPSHOT on the database [3] and found that the deadlocks no longer occurred.

ALTER DATABASE <dbname> SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE <dbname> SET READ_COMMITTED_SNAPSHOT ON;
ALTER DATABASE <dbname> SET MULTI_USER;

Result!

References

[1] Minimizing Deadlocks, Technet.

[2] Snapshot Isolation in SQL Server, MSDN.

[3] Enabling Row Versioning-Based Isolation Levels, Technet.