Rolling your own identities

I must say that I am not a database guy and my SQL skills are limited. But that is not an acceptable excuse, especially when we are working in a small team and there is no dedicated database guy. We have no choice but to hone our database skills and get ready to switch between roles. That is exactly what I had to do when one of the system users reported an issue where the database generated the same ID for 2 different records. I delved into the SP and found what, in the database world, is called “Rolling your own identities”. It is a technique (hack or shortcut or whatever you want to call it) for generating ID numbers by getting the MAX of a column and adding 1 to it. It is a widely known solution in the community for situations where displaying a primary key column (with its seed and increment values set) on the client interface is not an option, for example if it is a GUID column or it has out-of-sequence ID numbers due to frequent delete and insert operations.

So, this is what I saw in the SP, which is indeed a way of rolling your own identities.


DECLARE @ItemNumber INT

BEGIN TRANSACTION
/* select query */
SELECT TOP 1 @ItemNumber = ItemNumber + 1
FROM ItemTable
WHERE {some_condition}
ORDER BY ItemNumber DESC

/* insert query */
INSERT INTO ItemTable (ItemNumber, {column2}, {column3})
VALUES (@ItemNumber, {some_value}, {some_value})
COMMIT TRANSACTION

The /* select query */ above can also be written using the MAX function.


SELECT @ItemNumber = MAX(ItemNumber) + 1
FROM ItemTable
WHERE {some_condition}

What exactly is the problem?

If you notice, the select and insert queries above are within a transaction, which is important in a distributed environment where multiple clients access the database (or at least this SP) at the same time. I always believed that the BEGIN TRANSACTION statement was the ultimate savior and created a critical section internally, preventing 2 transactions from accessing the same database resource at the same time, like a lock statement in C#. If that were true, then why did it allow 2 different transactions to execute the select query concurrently, letting them both read the same max ItemNumber?
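To make the race concrete, here is a minimal sketch you can run in two query windows at roughly the same time. I am assuming a stripped-down ItemTable whose only required column is ItemNumber; the WAITFOR is just there to widen the race window.

DECLARE @ItemNumber INT

BEGIN TRANSACTION
/* under the default isolation level, the shared lock taken by this
   read is released as soon as the statement completes, so both
   sessions can compute the same value */
SELECT @ItemNumber = MAX(ItemNumber) + 1 FROM ItemTable

/* simulate the gap between the select and the insert */
WAITFOR DELAY '00:00:05'

INSERT INTO ItemTable (ItemNumber) VALUES (@ItemNumber)  /* duplicate! */
COMMIT TRANSACTION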

Research ensues

My first clue was the transaction isolation level, a setting which controls the default locking behavior. The default transaction isolation level in SQL Server is Read Committed, which means only data committed by other transactions can be read, hence dirty reads are avoided. But this is not what I wanted.
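As an aside, the isolation level is a per-session setting; a quick way to confirm what the current session is running at:

/* change the isolation level for the current session */
SET TRANSACTION ISOLATION LEVEL READ COMMITTED

/* DBCC USEROPTIONS lists the active SET options for this session,
   including the isolation level */
DBCC USEROPTIONS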

What do you want then?

I wanted a way to bar other users from running the select query until the transaction had finished executing both the select *AND* insert statements. In other words, I wanted to turn my transaction into a truly atomic unit of work, to make sure every transaction gets a new ItemNumber whenever it runs its select statement. Since Read Committed doesn’t serve the purpose, I decided to read up on the other transaction isolation levels and locking behaviors.

Read Uncommitted:

As explained in many places on the internet, by using this level one can read data which has been modified by other transactions but not yet committed. Reads take no shared locks and honor no exclusive locks, so dirty reads are not prevented. It can also result in phantom rows or nonrepeatable reads. I certainly didn’t want this, as I needed more restrictive locking, not less.
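Just to illustrate what I was rejecting, here is a sketch of a dirty read against the same hypothetical ItemTable:

/* Session 1: change a row but leave the transaction open */
BEGIN TRANSACTION
UPDATE ItemTable SET ItemDetail = {some_new_value} WHERE ItemNumber = 5

/* Session 2: Read Uncommitted sees the change before it is committed */
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT ItemDetail FROM ItemTable WHERE ItemNumber = 5  /* dirty read */

/* Session 1: roll back -- Session 2 has read data that never existed */
ROLLBACK TRANSACTION

I moved on to Repeatable Read.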

Repeatable Read:

This one took a bit longer to get into my head. This isolation level will not allow other users to update the data that has been read in the select query. But why do they call it repeatable? Because within the SAME transaction we may want to issue the same SELECT statement multiple times and get the same results each time.

Transaction 1

 
SELECT ItemNumber, ItemDetail FROM ItemTable
WHERE ItemNumber < 10

Transaction 2


UPDATE ItemTable
SET ItemDetail = {some_new_value}
WHERE ItemNumber = 5

Transaction 1 continues...

   
SELECT ItemNumber, ItemDetail FROM ItemTable
WHERE ItemNumber < 10

In order to make sure Transaction 2 doesn’t update the records we have selected between multiple reads (meaning we may repeat our read later in the transaction), SQL Server will maintain a lock on all the rows we have read until the transaction ends. This is certainly more restrictive locking than Read Committed: we keep ownership of the rows we read until the end of the transaction.
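A runnable version of the example above, assuming the same ItemTable: under Repeatable Read, Transaction 2’s update blocks until Transaction 1 commits, so both reads return the same data.

/* Transaction 1 */
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRANSACTION
SELECT ItemNumber, ItemDetail FROM ItemTable WHERE ItemNumber < 10

/* Transaction 2 (in another session) now blocks on its update:
   UPDATE ItemTable SET ItemDetail = {some_new_value} WHERE ItemNumber = 5 */

/* the repeated read returns exactly the same data */
SELECT ItemNumber, ItemDetail FROM ItemTable WHERE ItemNumber < 10
COMMIT TRANSACTION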

Unfortunately, even Repeatable Read can’t help create the kind of critical section I talked about earlier, for one reason: it still allows INSERTS. Yes, no other users/transactions can update the rows locked in the transaction, but they can always insert new rows amid the rows we have already locked. The newly inserted rows are called phantom rows. And this is exactly our problem: we don’t want to let other users insert until the transaction is finished.
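The phantom problem is easy to reproduce with our MAX(ItemNumber) query; the returned values below are hypothetical, just to show the effect:

/* Transaction 1 */
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRANSACTION
SELECT MAX(ItemNumber) FROM ItemTable  /* say it returns 42 */

/* another session is NOT blocked -- it inserts and commits a new row:
   INSERT INTO ItemTable (ItemNumber) VALUES (43) */

SELECT MAX(ItemNumber) FROM ItemTable  /* now 43 -- a phantom row */
COMMIT TRANSACTION

Hmm.. since I was desperate to find a solution, I moved on and read about the next isolation level.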

Serializable:

This isolation level will place a range lock on all the data we have read, meaning the rows in that range cannot be updated or deleted, and no insertion is possible within that range either. This level does what I wanted, as it is the most restrictive locking. We can use the HOLDLOCK table hint, which has the same effect as running the SELECT statement at the Serializable level for the hinted table.

/* select query */  
SELECT MAX(ItemNumber) FROM ItemTable WITH (HOLDLOCK)
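Alternatively, if hinting individual tables is not desirable, the whole transaction can be run at this level; a sketch:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
SELECT MAX(ItemNumber) FROM ItemTable
/* no other session can insert, update or delete within the
   scanned range until this transaction ends */
COMMIT TRANSACTION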

But why do they say “Serializable is prone to cause deadlocks”? Should I be worried about it? I think yes, because now that I am closing in on a solution, it is fair to take every warning it comes with seriously.

Serializable and Repeatable Read may cause deadlocks

I can’t explain this better than MSDN:

“The transaction reads data, acquiring a shared (S) lock on the resource (page or row), and then modifies the data, which requires lock conversion to an exclusive (X) lock. If two transactions acquire shared-mode locks on a resource and then attempt to update data concurrently, one transaction attempts the lock conversion to an exclusive (X) lock. The shared-mode-to-exclusive lock conversion must wait because the exclusive lock for one transaction is not compatible with the shared-mode lock of the other transaction; a lock wait occurs. The second transaction attempts to acquire an exclusive (X) lock for its update. Because both transactions are converting to exclusive (X) locks, and they are each waiting for the other transaction to release its shared-mode lock, a deadlock occurs.”

To avoid this potential deadlock problem, update (U) locks are used. Since HOLDLOCK only takes shared range locks, we can complement it with the UPDLOCK hint so the select takes update range locks instead.


/* to see what locks HOLDLOCK applies – same can be repeated for UPDLOCK as well */
BEGIN TRANSACTION
SELECT MAX(ItemNumber) FROM ItemTable WITH (HOLDLOCK)
EXEC sp_lock @@SPID
ROLLBACK

So, that means our final query should look something like this:

 
DECLARE @ItemNumber INT

BEGIN TRANSACTION
/* select query */
SELECT @ItemNumber = MAX(ItemNumber) + 1
FROM ItemTable WITH (HOLDLOCK, UPDLOCK)

/* insert query */
INSERT INTO ItemTable (ItemNumber, {column2}, {column3})
VALUES (@ItemNumber, {some_value}, {some_value})
COMMIT TRANSACTION

We should now be able to generate our own *unique* identities in a distributed environment, without worrying about deadlocks, by using the appropriate isolation level and locking hints.
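Putting it all together, here is a minimal sketch of what the fixed SP could look like. The procedure name and its parameter are my own illustration, not the original system’s schema; ISNULL handles the empty-table case:

CREATE PROCEDURE dbo.InsertItem
    @ItemDetail VARCHAR(100)
AS
BEGIN
    SET NOCOUNT ON

    DECLARE @ItemNumber INT

    BEGIN TRANSACTION
    /* HOLDLOCK holds a serializable range lock; UPDLOCK takes it in
       update mode so two callers cannot share the range and deadlock */
    SELECT @ItemNumber = ISNULL(MAX(ItemNumber), 0) + 1
    FROM ItemTable WITH (HOLDLOCK, UPDLOCK)

    INSERT INTO ItemTable (ItemNumber, ItemDetail)
    VALUES (@ItemNumber, @ItemDetail)
    COMMIT TRANSACTION
END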


HTH,
