Krishnan Subramanian

Re: ECO : Optimistic Locking

First off, this post is in reply to a blog entry by Malcolm Groves at: http://blogs.codegear.com/malcolmgroves/archive/2004/06/28/659.aspx on Optimistic Locking done by ECO.

I first thought of replying on Malcolm’s blog site but then thought my entry might deserve a place of its own (presumptuous, but not vain ;) I also come from the Java, J2EE world - and I have a deep respect for a lot of technologies in Delphi and Delphi.NET. Some of views on the topic might be bit off since I might not know how things work exactly (only plain ignorance ;). I have to admit that most of these issues have already been thrashed around in the J2EE world and Borland Enterprise Server provides a piece of functionality that is almost identical by name - but differing in its implementation. You will find a link to this functionality at the end of this post.

Malcolm’s entry stated how Optimistic Locking behaves when set to ModifiedMembers and when set to AllMembers. What caught my attention was that:

  • Two SQLs are sent to the database for every update to the database.
  • The isolation level for the above two SQLs themselves are seemingly undefined.

Firstly, it seems like sending two SQLs to the database for every update to the database is not a very scalable solution. One, it introduces more RPCs into the picture if the ECO Server-side application and database are on different machines. Two, it imposes a higher load on the database than necessary. That is, there are now more CPU cycles utilized at the database-tier. This could very quickly lead to the database-tier becoming a performance bottleneck which will then ripple up tiers to manifest itself as a scalability problem. This whole issue of two SQLs seems trivial till you actually consider high-concurrency OLTP-like applications where you want squeeze every little bit of performance out of your tiers since you are typically dealing with large amounts of data.

Secondly, the last point assumes that the two SQLs will run in proper isolation. What I mean by this - is not whether the two SQLs will run in the same transaction (which they must - else ECO is doing something horribly wrong); but what the isolation level will be of that transaction. Off the top of my head - I would say you atleast need a REPEATABLE READ isolation level (most database default to READ COMMITTED in which case you still have a problem). Imagine, under a high concurrency, T1 and T2 are two transactions:

    T1 performs a SELECT (prior to its UPDATE)
    T2 performs a SELECT (prior to its UPDATE)
    T1 assumes it can update the database and issues an UPDATE that succeeds
    T2 assumes it can also update the database and issues an UPDATE that also succeeds

I believe the above will occur if you select (pun intended) the default READ COMMITTED isolation level.

So, what is the solution you may well ask? Or more importantly is there a solution?

Yes.

The idea is to kill two birds with one stone. Or in this, to just use one SQL in lieu of two. And what kind of an SQL would that be? Let’s see. In ECO, after the first SQL is executed, it checks to see if the value returned by the database is different from those prior to edits/changes. This implies that ECO is holding this old value somewhere - whether it be an OptimisticLocking mode of ModifiedMembers or AllMembers. So, the smart thing for ECO to do would be to use that information in its WHERE clause. Let me give an example in the context of Malcolm’s example: Damien and I are working with a Person whose first name happens to be ‘Barney’.

Damien makes a change to this Person’s first name and calls him ‘Loony’ and while I make a change and decide to call him ‘Goony’. Assuming the problem in point two (above) does not exist, ECO will issue a SELECT when Damien’s transaction has to succeed. It will find that the database contains ‘Barney’ which is what Damien started out with in the first place. His transaction will succeed.

My transaction will then issue a SELECT, find that the person’s first name is now ‘Loony’ and not ‘Barney’ - which is not what I received in the first place (I received ‘Barney’) and so my transaction will fail. All well - assuming of course that problem in point 2 above does not exist.

The other way of doing things - maybe a bit more efficient - is to let ECO do a bit more work. Since ECO ‘knows’ its old values - that is, the values it receives when it issued the request to get the data to the user, it cannot be that hard for ECO to ‘diff’ these values with what the user had edited/changed. So, ECO should know what columns have actually changed. In this context of this example, it should know that the only change is one to the firstName column. And it also ‘knows‘ what the old value is (Barney) and new value (either Loony | Goony) is. And it could use this information to construct the WHERE clause.

    UPDATE Person
        SET firstName = ‘Loony’
        WHERE Person.BOLD_ID = 3 AND Person.firstName = ‘Barney’

And the other transaction would issue:

    UPDATE Person
        SET firstName = ‘Goony’
        WHERE Person.BOLD_ID = 3 AND Person.firstName = ‘Barney’

In this case, only one of the above two transactions will succeed since the WHERE will fail if it cannot find the given row. And in the case of an Optimistic Concurrency of AllFields, the WHERE clause would contain the values of all columns that the user first read in. So something along the lines of:

    UPDATE Person
        SET firstName = ‘Goony’
        WHERE Person.BOLD_ID = 3 AND Person.firstName = ‘Barney’ AND Person.lastName = ‘Slater’ AND Person.someField = ‘oldValue’ …

So, this approach:

  • Uses only one SQL
  • Is the ‘correct’ way. No ‘holes’ to be punched through it.

And also does not suffer from any potential scalability problem (side-effect ;)

I’d be interested in hearing other’s opinions on this . . .

-krish

Borland Enterprise Server uses the same approach: http://info.borland.com/devsupport/bes/faq/all_versions/ejb/cmp.html#_how_optimisitic

Posted by Krishnan Subramanian on July 16th, 2004 under Borland Enterprise Server, General, J2EE |



9 Responses to “Re: ECO : Optimistic Locking”

  1. by the way Says:

    Ado.Net Typed DataAdapter’s optimistic locking use one sql like what you said.

  2. Mathias Burbach Says:

    What about BLOB fields in the database? You can’t use them in your WHERE clause. That is exactly why I use UpdNo in my traditional Client/Server apps, which is kind of a version number on the record incremented by a trigger or the middle-tier app.

    I have had a look at ECO a few months ago and decided that its too early to use it in a real world example. But I will carefully watch its steps out of childhood and use it when it becomes a more mature product.

    Salut,

    Mathias

  3. Krishnan Subramanian Says:

    Mathias,

    You are right in that BLOB fields cannot be used in your WHERE clause. That said, it is possible for your server-side application to check to see if BLOBs actually require updating. It is a matter of comparing byte arrays - and the process of comparison itself can be very fast. So, that would eliminate your server-tier from updating the BLOB field if it has not actually changed.

    But it would never be a good idea for your server-tier to use BLOB fields in its WHERE clause in the first place. Those fields should be ignored. And I use the word ‘ignore’ here with caution. Other solutions - as the one you propose - include the use of a version number column.

    The safest option - if you require absolute bulletproof safety is to use pessimistic concurrency. That would mean using the SERIALIZABLE isolation level to ensure that one transaction does not step on the toes of another transaction. However this isolation level comes with its caveats: it is likely to kill performance in a high-concurrency scenario and is not a very scalable solution. But it may be the only ‘correct’ solution :)

    -krish

  4. Peter Gummer Says:

    Krishnan,

    In the approach you describe, the failing transaction does not change any rows in the table (which is good), but SQL UPDATE statements do not throw an exception if zero rows are affected. The user therefore receives no notification that the update failed.

  5. Krishnan Subramanian Says:

    Peter,

    I do not know how things work in this regard in the Delphi/.NET world - so I am a bit handicapped in that respect, but in the Java world, every INSERT/UPDATE/DELETE statement you issue via the Java Database Connectivity(JDBC) API returns an "int" - the row count for the numbers of rows affected.

    The Borland Enterprise Server - our J2EE Application Server - automatically rolls back the transaction if this UPDATE fails (that is, the row count is 0) and propagates the exception (javax.transaction.RollbackException) back to to the caller. It is then upto the caller to decide what to do with this exception. Depending on the nature of the application domain, the application might choose to display an exception to the user or retry the failed transaction etc.

    If this feature is provided by ECO or an equivalent persistence infrastructure on the server side I do not see why the user (or caller) will not receive a notification if the UPDATE failed.

    -krish

  6. Haibo Yu Says:

    in the Delphi/.NET world ,DBExpress also return the row count

  7. Mike Ver Hagen Says:

    My only observation on the issue of concurrency is this:

    The second transaction will needlessly fail if the only fields that were changed by each user were also changed to the same ending values.

    In that scenario, a round-trip to the server to get the current values would permit the second transaction to succeed if the only DIFFERENT changes were to fields that only one concurrent user changed.

  8. Peter Morris Says:

    In your scenario, on Interbase at least, I thought that the second update would fail because the version number of the row has changed.

    I don’t know how SQL Server etc handle this same situation, but it looks like the ECO TimeStamp of the row implements a record-version approach like IB’s.

    Also, ECO (and Bold also) do not send any columns which have not altered, so you could in fact be sending less work over the network using the ECO approach simply because you only changed a single attribute, rather than sending every value of the object you would be sending only one.

    Pete

  9. Vijay Rao Says:

    Krish, you say

    "This whole issue of two SQLs seems trivial till you actually consider high-concurrency OLTP-like applications"

    *IF* there are 2 SQLs : It is *never* a trivial issue. This would be a transactional-design-issue for any kind of application. Even if your app had 100 transactions per day or per week. Relaxing integrity and consistency rules is only going to give you grief.



Server Response from: blog1.codegear.com