Sometimes you just have to break the rules in order to learn something new. It is inherent in everyone, from the first time you ignored your parents about getting burned by the hot stove, to doing 10+ over the speed limit because "you can." Even though I mentioned in this post that you should never go mucking with the fields of a critical section structure, I just could not keep my grimy little hands off :-). One reason is that in Windows Vista, Microsoft introduced new condition variable APIs. My previous posts were about creating a full monitor object, which is the combination of a mutual-exclusion lock (critical section) and a condition variable (wait/notify/notifyall). However, there are cases where you would want to use more than one condition variable with the same lock, and that is what these new Windows APIs allow you to do. After reading as much as I could about the underlying implementation of these new bits of synchronization goodness, I found that they use a relatively new internal set of undocumented APIs for "Keyed Events." You can read this post by Joe Duffy for a high-level explanation. What if I could "backfill" these new APIs for older versions of Windows within the DPL? I would need to break some rules in order to do that. I’d also have to forego any use of keyed events, since those are only available on Windows XP and up (and not documented either); Windows 2000 would be out of luck. Let’s get started with this hack-fest. For now, let’s ignore the "slim" reader/writer locks.
There are four key APIs to duplicate and one structure.
type
  PRTLConditionVariable = ^TRTLConditionVariable;
  _RTL_CONDITION_VARIABLE = record
    Ptr: Pointer;
  end;
  TRTLConditionVariable = _RTL_CONDITION_VARIABLE;

procedure InitializeConditionVariable(out ConditionVariable: TRTLConditionVariable);
procedure WakeConditionVariable(var ConditionVariable: TRTLConditionVariable);
procedure WakeAllConditionVariable(var ConditionVariable: TRTLConditionVariable);
function SleepConditionVariableCS(var ConditionVariable: TRTLConditionVariable;
  var CriticalSection: TRTLCriticalSection; dwMilliseconds: DWORD): BOOL;
The condition variable structure has one field, which is not much space to store state information. My first thought was to simply allocate a structure that would be pointed to by the single field of the condition variable. The problem is there is no Delete/Destroy/FreeConditionVariable API. How would this structure be freed? Rats! That’s a bust. This is where it is time to get all geeky and put on the propeller cap. If you recall from my discussion of the TMonitor, the Wait method would insert a little structure into a linked list that represented all the current waiters. We need to do something similar. There is a little twist here as well, because WakeConditionVariable and WakeAllConditionVariable are not required to be called from within the lock. In fact, all they take is the TRTLConditionVariable structure as the lone parameter. This makes managing that internal list of waiters a lot more tricky, since we’re not guaranteed to be in a lock while we need to update that list. This is compounded by the fact that the condition variable structure has only one pointer-sized field. My first thought was to use a lock-free queue. The problem here is that we need two pointers, one to the head of the queue and one to the tail of the queue. I only have one pointer. Before we get into SleepConditionVariableCS, where we’re going to do some unholy things to the poor critical section structure, let’s get the wait queue out of the way.
To set the stage we need a little review of field alignment. Alignment of the fields of a structure, the local variables on the stack, and the data on the heap is very important to making sure reads and writes are fast and atomic. Packing a structure in a way that knocks the fields out of their "natural" alignment can sometimes have unwanted side-effects. Yes, there are times where you will get a few "dead bytes" between fields. A double-word-sized entity on 32-bit x86 is 4 bytes in size. This means that the "natural" alignment of this entity is on even 4-byte boundaries. As long as the entity falls on those boundaries, you know that the lowest 2 bits of any pointer to that entity will always be 0. We can exploit that fact with a bit of (OMG! is he going to do what I think he’s going to do!!???) propeller-head bit twiddling. Now before you scroll down right now and hit the comment button and fire up the flame thrower, please bear with me.
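To make the alignment trick concrete, here is a tiny C sketch (illustrative only; the post’s actual code is Delphi, and the names `tag_locked`/`untag`/`is_locked` are mine): because heap allocations of 4 bytes or more land on at least 4-byte boundaries, the low two bits of such a pointer are always zero and one of them can carry a flag.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Bit 0 of an aligned pointer is free to use as a "locked" flag. */
#define LOCK_BIT ((uintptr_t)0x1)

/* Set the low bit to mark the word as locked. */
static uintptr_t tag_locked(void *p) {
    return (uintptr_t)p | LOCK_BIT;
}

/* Clear the low bit to recover the real pointer. */
static void *untag(uintptr_t v) {
    return (void *)(v & ~LOCK_BIT);
}

/* Test the low bit. */
static int is_locked(uintptr_t v) {
    return (v & LOCK_BIT) != 0;
}
```

Round-tripping a pointer through `tag_locked`/`untag` loses nothing precisely because alignment guarantees the bit was zero to begin with.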
Just like the Wait method of the TMonitor class, SleepConditionVariableCS will simply place a structure allocated on the stack into a queue right before releasing the crit-sec, and then block on an event. Because interacting with the queue does not use any inherent locking, we need a thread-safe queue. We also only have that one field. So how do we "lock" the list while it is being updated? The key is using one of those extra bits that we know will always be 0: if one of those bits is not 0, the list is locked. Here are the LockQueue and UnlockQueue functions:
type
  PWaitingThread = ^TWaitingThread;
  TWaitingThread = record
    Next: PWaitingThread;
    Thread: THandle; // id of the waiting thread
    WaitEvent: THandle;
  end;

// UPDATE: Slight modification based on comment
function LockQueue(var ConditionVariable: TRTLConditionVariable): PWaitingThread;
var
  ReleaseSlice: Boolean;
begin
  ReleaseSlice := GetProcessorCount < 2; // This local could be eliminated by using a global
  while True do
  begin
    Result := PWaitingThread(Integer(ConditionVariable.Ptr) and not 1);
    if TInterlocked.CompareExchange(Integer(ConditionVariable.Ptr),
      Integer(Result) or 1, Integer(Result)) = Integer(Result) then
      Break;
    if ReleaseSlice then
      SwitchToThread
    else
      asm PAUSE end;
  end;
end;

procedure UnlockQueue(var ConditionVariable: TRTLConditionVariable;
  WaitQueue: PWaitingThread);
begin
  ConditionVariable.Ptr := Pointer(Integer(WaitQueue) and not 1);
end;
It should be immediately obvious that the LockQueue function will never return until it can successfully lock the queue. It should also be noted that it will spin-lock indefinitely, which can burn lots of cycles. On a single-processor system, doing a busy-wait is not cool, so we should just release this thread’s timeslice and let the scheduler get another thread to run. (Sleep(0) is not appropriate, since that only allows threads with equal or greater priority to run; SwitchToThread is far more fair and much more like a traditional "yield.") Now that we have the tail of the linked list returned from LockQueue, we can manipulate the queue all we want. We just cannot "touch" the ConditionVariable.Ptr field and can only operate with the value returned from LockQueue. Once we’re done, we unlock the queue by passing the potentially updated tail pointer to UnlockQueue, which will simply update the Ptr field of the condition variable. About the performance implications: in theory, the queue should never grow very deep (one entry per waiting thread per condition variable), so the queue operations should be fast enough to not adversely affect performance (although I know it will a little bit). A final note about this technique is that it does not allow any type of recursion, so you must make sure there are no calls that could call LockQueue on the same condition variable while the lock is held.
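For readers who don’t speak Delphi, here is a rough C11 analogue of the LockQueue/UnlockQueue pattern (a sketch under my own names — `cond_word`, `queue_lock`, `queue_unlock` — with `sched_yield` standing in for SwitchToThread): a single atomic word holds the queue pointer, and bit 0 doubles as the lock flag, so the compare-and-exchange only succeeds when the word is currently unlocked.

```c
#include <assert.h>
#include <sched.h>      /* sched_yield: POSIX analogue of SwitchToThread */
#include <stdatomic.h>
#include <stdint.h>
#include <stddef.h>

typedef struct waiting_thread {
    struct waiting_thread *next;
    /* a real implementation would also carry an event handle here */
} waiting_thread;

/* The whole condition variable: one pointer-sized atomic word. */
typedef struct { _Atomic uintptr_t ptr; } cond_word;

/* Spin until we install the lock bit; return the untagged queue pointer.
   The CAS expects the *unlocked* value, so it fails while anyone else
   holds the lock bit, and the loop retries. */
static waiting_thread *queue_lock(cond_word *cv) {
    for (;;) {
        uintptr_t cur = atomic_load(&cv->ptr) & ~(uintptr_t)1;
        uintptr_t expected = cur;
        if (atomic_compare_exchange_weak(&cv->ptr, &expected, cur | 1))
            return (waiting_thread *)cur;
        sched_yield();  /* give up the slice instead of pure busy-waiting */
    }
}

/* Publish the (possibly updated) queue pointer with the lock bit cleared. */
static void queue_unlock(cond_word *cv, waiting_thread *head) {
    atomic_store(&cv->ptr, (uintptr_t)head & ~(uintptr_t)1);
}
```

The post’s version additionally distinguishes single-processor machines (always yield) from multi-processor ones (spin with PAUSE); this sketch just yields unconditionally for brevity.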
If you haven’t lost your lunch or broken out in hives by now, let’s move on to the SleepConditionVariableCS function. It is commented similarly to the TMonitor.Wait function.
function SleepConditionVariableCS(var ConditionVariable: TRTLConditionVariable;
  var CriticalSection: TRTLCriticalSection; dwMilliseconds: DWORD): BOOL;
var
  WaitingThread: TWaitingThread;
  RecursionCount: Integer;
begin
  if CriticalSection.OwningThread = GetCurrentThreadId then
  begin
    WaitingThread.Next := nil;
    WaitingThread.Thread := CriticalSection.OwningThread;
    WaitingThread.WaitEvent := CreateEvent(nil, False, False, nil);
    try
      // Save the current recursion count
      RecursionCount := CriticalSection.RecursionCount;
      // Add the current thread to the waiting queue
      QueueWaiter(ConditionVariable, WaitingThread);
      // Set it back to almost released
      CriticalSection.RecursionCount := 1;
      TInterlocked.Add(CriticalSection.LockCount, -(RecursionCount - 1));
      // Release and get in line for someone to do a Wake or WakeAll
      LeaveCriticalSection(CriticalSection);
      // This is, admittedly, a potential race condition... what do you think?
      Result := WaitForSingleObject(WaitingThread.WaitEvent, dwMilliseconds) = WAIT_OBJECT_0;
      // Got to get the lock back and block waiting for it.
      EnterCriticalSection(CriticalSection);
      // Remove any dangling waiters from the list
      RemoveWaiter(ConditionVariable, WaitingThread);
      // Let's restore all the recursion and lock counts
      TInterlocked.Add(CriticalSection.LockCount, RecursionCount - 1);
      CriticalSection.RecursionCount := RecursionCount;
    finally
      CloseHandle(WaitingThread.WaitEvent);
    end;
  end
  else
    Result := False; // should call SetLastError with the appropriate error code
end;
Upon entry, we make sure that the calling thread owns the lock and, if not, return False (and, as the comment suggests, we should call SetLastError with the appropriate error code). Once we determine that the calling thread owns the lock, we start the process of tearing down the critical section and blocking on the wait event. The main difference between TMonitor.Wait and this function is that we still want to make sure that any threads waiting for the critical section are properly woken up, so we first tear the critical section down to within one count of being released, then call LeaveCriticalSection(), which does the final deed and unblocks any waiters. We then immediately block on the wait event. Once we return from the wait, either by being signaled or by timing out, we have to re-acquire the critical section. This will only be at a recursion level of 1, so we then need to restore any recursions to it. The QueueWaiter and RemoveWaiter functions use the LockQueue and UnlockQueue functions above to lock and manage the condition variable’s waiting-thread queue. Like the TMonitor.Pulse and TMonitor.PulseAll functions, the Wake and WakeAll functions are very simple.
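The recursion bookkeeping is easy to get wrong, so here is a toy C model of just that arithmetic (my own simplification, not the post’s code; in particular, the real Windows LockCount starts at -1 and has more complex semantics): save the recursion depth, unwind the counts to a single ownership level, do the one real release, and after reacquiring put the saved depth back.

```c
#include <assert.h>

/* Simplified stand-in for TRTLCriticalSection's two counters. */
typedef struct {
    int lock_count;       /* total outstanding acquisitions (simplified) */
    int recursion_count;  /* how many times the owner has re-entered     */
} toy_cs;

static void toy_enter(toy_cs *cs) { cs->lock_count++; cs->recursion_count++; }
static void toy_leave(toy_cs *cs) { cs->lock_count--; cs->recursion_count--; }

/* Unwind to a single ownership level, then do the one real release
   before blocking on the wait event; returns the saved depth. */
static int unwind_for_wait(toy_cs *cs) {
    int saved = cs->recursion_count;
    cs->lock_count -= saved - 1;   /* drop the extra recursive counts */
    cs->recursion_count = 1;       /* "almost released"               */
    toy_leave(cs);                 /* the final, real release         */
    return saved;
}

/* After reacquiring (we hold it at depth 1 again), restore the depth. */
static void restore_after_wait(toy_cs *cs, int saved) {
    cs->lock_count += saved - 1;
    cs->recursion_count = saved;
}
```

The point of the round trip is that other threads see one ordinary LeaveCriticalSection, while the waiter comes back holding the lock at exactly the depth it had before it slept.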
procedure WakeConditionVariable(var ConditionVariable: TRTLConditionVariable);
var
  WaitingThread: PWaitingThread;
begin
  WaitingThread := DequeueWaiter(ConditionVariable);
  if WaitingThread <> nil then
    SetEvent(WaitingThread.WaitEvent);
end;

procedure WakeAllConditionVariable(var ConditionVariable: TRTLConditionVariable);
var
  WaitQueue, WaitingThread: PWaitingThread;
begin
  WaitQueue := LockQueue(ConditionVariable);
  try
    WaitingThread := DequeueWaiterNoLock(WaitQueue);
    while WaitingThread <> nil do
    begin
      SetEvent(WaitingThread.WaitEvent);
      WaitingThread := DequeueWaiterNoLock(WaitQueue);
    end;
  finally
    UnlockQueue(ConditionVariable, WaitQueue);
  end;
end;
For the WakeAllConditionVariable function, we want to atomically wake up all current waiters and don’t want any intervening threads to be injected into the queue. So we have a special no-lock version of DequeueWaiter which takes only the tail pointer, passed by reference (a "var" parameter). It updates the tail pointer appropriately, so by the time UnlockQueue is called it should be nil. In fact, DequeueWaiter simply locks the queue, calls DequeueWaiterNoLock, and then unlocks the queue.
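The drain itself can be sketched in a few lines of C (again illustrative; names are mine, a plain head pointer stands in for the post’s tail pointer, and a flag stands in for SetEvent): while the queue is held locked, pop every waiter through a by-reference pointer so the caller’s copy ends up nil, then unlock, publishing the now-empty queue.

```c
#include <assert.h>
#include <stddef.h>

typedef struct waiter {
    struct waiter *next;
    int signaled;          /* stands in for SetEvent on a real event handle */
} waiter;

/* "No lock" dequeue: assumes the caller already holds the queue lock.
   Pops one waiter through a by-reference pointer, updating it in place. */
static waiter *dequeue_waiter_nolock(waiter **queue) {
    waiter *w = *queue;
    if (w != NULL)
        *queue = w->next;
    return w;
}

/* Wake-all drain: pop and signal every waiter; *queue ends up NULL,
   so unlocking afterwards publishes an empty queue. */
static void wake_all(waiter **queue) {
    waiter *w;
    while ((w = dequeue_waiter_nolock(queue)) != NULL)
        w->signaled = 1;
}
```

A single-wake function is then just: lock, `dequeue_waiter_nolock` once, unlock, signal the one waiter (if any), which mirrors how DequeueWaiter wraps DequeueWaiterNoLock.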
I’ll totally understand if you have the sudden urge to take a shower, or to stick red-hot needles into your eyeballs… but sometimes it is just a little fun to break the rules (evil grin :-). I will not be surprised if this shows up on Coding Horror or The Daily WTF for public ridicule :-) As for an alternative implementation, all I want is an efficient user-level implementation of keyed events. That would eliminate the queue and make this function work more like the Vista version.

Posted by Allen Bauer on January 22nd, 2008 under CodeGear, Delphi, Parallel Programming