Getting IUnknown from __ComObject

I’m working in an environment with a lot of mixed managed (F#) and unmanaged (C++ COM) code. One of the big problems with this is the mix of lifetime management techniques; .NET uses garbage collection while COM relies on reference counting. Furthermore .NET garbage collection is somewhat non-deterministic, which adds further complexity.

So quite often in our mixed code-base, we find that the .NET garbage collection process doesn’t kick in when we need it to. For instance, when we’ve allocated a lot of memory in the COM code that .NET isn’t aware of. Memory exhaustion has to get pretty bad for the GC to occur at any other time than during a .NET allocation, either the system-wide low-memory event has to be signalled or an OutOfMemoryException needs to be thrown. In both of these cases it’s probably too late to do anything about it.

In this case it’s extremely useful to be able to see what .NET objects are still alive, and what COM objects they’re hanging on to. Unfortunately this isn’t as easy as it might seem.

The COM object itself hides within a weakly-typed System.__ComObject or a strongly-typed managed wrapper, depending on whether rich type information is available. Furthermore, a runtime controller RCW (runtime callable wrapper) is what actually holds a pointer to the object itself, and this structure is internal to mscorwks.dll.

So how can we untangle this and, on finding a __ComObject that happens to still be alive (i.e. is not reachable in the object graph and is therefore eligible for garbage collection) identify which COM object it’s hanging on to.

First of all, let’s see how many __ComObjects are still alive. In this case, it’s only one (phew!):

0:005> !DumpHeap -type __ComObject
 Address       MT     Size
01453b74 79306e60       16
total 1 objects
      MT    Count    TotalSize Class Name
79306e60        1           16 System.__ComObject
Total 1 objects

And you remember the layout of .NET objects in memory, don’t you? Of course you do! The 4 bytes prior to the address displayed (01453b74) are the “object header”. The exact content of the header is apparently undocumented. Let’s see what it contains (at least on a 32-bit platform, your mileage may vary):

0:005> dd 01453b74-4 L1
01453b70  08000002

According to various sources the object header contains 2 fields; a handle and a sync block index. If the object is an RCW, the handle is always 0x08000. You can use the index with SOS’s !syncblk command to de-reference it:

0:005> !syncblk 2
Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner
    2 001e0fec            0         0 00000000     none    01453b74 System.__ComObject
Total           3
CCW             0
RCW             1
ComClassFactory 0
Free            0

The sync block itself is an undocumented structure, but after a bit of investigation, it turns out that at offset 0x1c there is a pointer to a further structure that contains the “interop information”:

0:005> dd 001e0fec+1c L1
001e1008  001e9510

And from this, we can obtain a pointer to the RCW itself. We’re almost there!

0:005> dd 001e9510+c L1
001e951c  001e5380

The RCW is a pretty large structure, but for our purposes there are only a couple of interesting fields: the IUnknown pointer at 0x64, and the object’s virtual function table pointer at 0x88. If you use dds you can easily see any symbols associated with these pointers:

0:005> dds 01e5380+64 L1
001e53e4  00ef6c24
0:005> dds 01e5380+88 L1
001e5408  00eb9710 rcwrepro!ATL::CComObject::`vftable'

This is the salient information; we now know exactly what type of COM object we’re dealing with. This is obviously a bit fragile, given that it relies on structures from mscorwks that may well change in newer versions of the runtime (I’ll check on .NET 4 when I get a chance). It’s also a bit of a pain to go through all these steps manually in WinDbg, so I put together a simple extension DLL to do it automatically given the address of the __ComObject. I’ll upload that and blog about it soon.