<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>voyce &#187; IL</title>
	<atom:link href="http://www.voyce.com/index.php/tag/il/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.voyce.com</link>
	<description>Programming and debugging tidbits</description>
	<lastBuildDate>Sat, 03 Jul 2010 12:40:51 +0000</lastBuildDate>
	
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>.NET 4.0 Type Equivalence causes BadImageFormatException</title>
		<link>http://www.voyce.com/index.php/2010/04/23/net-4-0-type-equivalence-causes-badimageformatexception/</link>
		<comments>http://www.voyce.com/index.php/2010/04/23/net-4-0-type-equivalence-causes-badimageformatexception/#comments</comments>
		<pubDate>Fri, 23 Apr 2010 10:53:33 +0000</pubDate>
		<dc:creator>ian</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[COM]]></category>
		<category><![CDATA[Debugging]]></category>
		<category><![CDATA[WinDbg]]></category>
		<category><![CDATA[Windows]]></category>
		<category><![CDATA[.NET4]]></category>
		<category><![CDATA[CLR]]></category>
		<category><![CDATA[IL]]></category>

		<guid isPermaLink="false">http://www.voyce.com/?p=840</guid>
		<description><![CDATA[Interop assemblies containing certain constructs will cause a BadImageFormatException in .NET 4.0]]></description>
			<content:encoded><![CDATA[<p>I recently discovered a nasty backward compatibility problem with the new type equivalence feature in .NET 4.0. Luckily it&#8217;s relatively difficult to hit it if you&#8217;re in a pure-C# environment, but if you happen to generate any assemblies directly using IL, you should watch out. Read on for all the gory details.<br />
<span id="more-840"></span></p>
<h2>What is .NET type equivalence?</h2>
<p>Described at a high level <a href="http://msdn.microsoft.com/en-us/library/dd997297.aspx">here</a>, .NET 4.0 type equivalence essentially gives you a way of indicating that different .NET types represent the same underlying COM type and is most commonly used in COM interop scenarios. One of the reasons for its introduction is to save developers from having to ship large interop DLLs with their software, e.g. the multi-megabyte Microsoft.Office.Interop. Instead the compiler can inline the definition of any types used, and mark them appropriately as representing the original COM types. </p>
<h2>The error</h2>
<p>We noticed that whenever we built and ran an application that referenced a DLL using .NET 2.0, it worked. Doing the same thing with .NET 4.0 caused a <a href="http://msdn.microsoft.com/en-us/library/system.badimageformatexception.aspx">BadImageFormatException</a>.<br />
<code><br />
Unhandled Exception: System.BadImageFormatException: Could not load file or assembly 'mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' or one of its dependencies. An attempt was made to load a program with an incorrect format.<br />
   at X.Main()<br />
</code> </p>
<h2>Let&#8217;s dig!</h2>
<p>So, the BadImageFormatException doesn&#8217;t actually tell us much. Let&#8217;s break out WinDbg and see what we can find. Running the faulting app we can see several C++ exceptions before the CLR exception is thrown:<br />
<code><br />
(178c.790): C++ EH exception - code e06d7363 (first chance)<br />
...<br />
(178c.790): C++ EH exception - code e06d7363 (first chance)<br />
(178c.790): CLR exception - code e0434352 (first chance)<br />
</code><br />
I changed the exception handling settings to stop on C++ exceptions (<code>sxe eh</code>) then ran again to see where things were going wrong. It stopped here:<br />
<code><br />
0:000> kp<br />
ChildEBP RetAddr<br />
0012d15c 79084c0f KERNEL32!RaiseException+0x53<br />
0012d194 793371be MSVCR100_CLR0400!_CxxThrowException+0x48<br />
0012d5e4 79455cae clr!EEFileLoadException::Throw+0x1a8<br />
0012d634 794558d2 clr!CompareTypeTokens+0x200<br />
0012d6b0 791b5c00 clr!IsTypeDefEquivalent+0x102<br />
0012d6d4 791b2ca8 <b>clr!MethodTableBuilder::CheckForTypeEquivalence</b>+0x94<br />
0012d7ac 791b27c9 clr!MethodTableBuilder::BuildMethodTableThrowing+0x60d<br />
0012d9a4 791a4578 clr!ClassLoader::CreateTypeHandleForTypeDefThrowing+0x88e<br />
</code><br />
Interesting. Notice how the call stack contains some .NET 4.0 specific methods relating to the new type equivalence feature. We&#8217;re hitting a new code path, which is consistent with the fact that running against a down-level CLR works.</p>
<p>After a bit more toing-and-froing, I discovered that the C++ exception is thrown when <code>clr!MDInternalRO::IsValidToken</code> returns an error. By disassembling the function we can see it&#8217;s just looking at various bits in the token value, and it decides that the value passed (0x02000000) isn&#8217;t valid. Looking at the output from ildasm, that token doesn&#8217;t appear anywhere. And if we set a breakpoint that dumps the token on every call, we can see that it indeed doesn&#8217;t look like the other tokens: </p>
<pre>
0:000> bu clr!MDInternalRO::IsValidToken "dd esp+8 L1; g"
...
0012f5a8  02000001
0012f31c  06000001
0012f2c0  02000002
0012f0f4  02000002
0012ebe4  01000001
0012e944  23000001
...
0012d5f4  02000000
(18ec.1ec8): C++ EH exception - code e06d7363 (first chance)
</pre>
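<p>The tokens in that dump follow the usual metadata layout: the top byte selects a metadata table and the low three bytes are a row number within it. As a minimal sketch (the <code>decodeToken</code> helper is made up for illustration, and only the tables that appear in the dump are named), they can be pulled apart in F# like this:</p>
<pre>
// Split a metadata token into its table (top byte) and row id (low 24 bits).
let decodeToken (token: uint32) =
    let table =
        match token &gt;&gt;&gt; 24 with
        | 0x01u -&gt; "TypeRef"
        | 0x02u -&gt; "TypeDef"
        | 0x06u -&gt; "MethodDef"
        | 0x23u -&gt; "AssemblyRef"
        | t -&gt; sprintf "table 0x%02x" t
    let rid = token &amp;&amp;&amp; 0x00FFFFFFu
    table, rid

// A few of the values from the dump, ending with the one that was rejected.
[ 0x02000001u; 0x06000001u; 0x23000001u; 0x02000000u ]
|&gt; List.iter (fun t -&gt;
    let table, rid = decodeToken t
    printfn "0x%08x: %s, row %d" t table rid)
</pre>
<p>Rows are numbered from 1, so <code>0x02000000</code> reads as a TypeDef token with no row behind it, which would explain why it never shows up in the ildasm output.</p>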
<h2>What&#8217;s the culprit?</h2>
<p>So it looks pretty conclusive; the DLL contains something that the CLR isn&#8217;t expecting. But what? It&#8217;s time to break out the oldest tool in the troubleshooting box: the binary chop!</p>
<p>Eventually I got the referenced DLL down to only a single simple construct. Can you guess what it is? A global literal value. A <em>real</em> global value, one that isn&#8217;t even part of a type. Crazy huh? In IL it looks like this:<br />
<code><br />
.field public static literal valuetype Test.MyEnum LiteralValue = int32(0x00000001)<br />
</code><br />
It&#8217;s a literal value of an enumerated type. That&#8217;s important: using a value of a simple type (say int32) does not provoke the error.</p>
<p>Now, I wasn&#8217;t even sure that this was a valid IL construct, but according to the ECMA IL spec, specifically <a href="http://jilc.sourceforge.net/ecma_p2_cil.shtml#_Toc524940530">partition II, section 15</a>, it is:</p>
<blockquote><p>The CLI also supports global fields, which are fields declared outside of any type definition. Global fields shall be static.</p></blockquote>
<p>So it looks like we&#8217;re not doing anything illegal, backed up by the fact that the .NET 2.0 CLR can make use of it without a problem.</p>
<p>Interestingly, there&#8217;s another aspect that influences whether this code path is hit. As mentioned above, type equivalence is intended for use with interop libraries. As such, it only kicks in if your referenced assembly is marked with the PrimaryInteropAssembly attribute, e.g.:</p>
<p><code><br />
  .custom instance void [mscorlib]System.Runtime.InteropServices.PrimaryInteropAssemblyAttribute::.ctor(int32,int32) = ( 01 00 01 00 00 00 00 00 00 00 00 00 )<br />
</code></p>
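<p>The bytes after the constructor signature are the attribute&#8217;s serialized arguments. Here is a rough F# sketch of how they decode, assuming the standard custom attribute blob layout (a two-byte prolog, the fixed arguments, then a two-byte count of named arguments) and a little-endian machine for <code>BitConverter</code>:</p>
<pre>
open System

// The blob from the .custom directive above.
let blob =
    [| 0x01uy; 0x00uy                    // prolog
       0x01uy; 0x00uy; 0x00uy; 0x00uy    // first int32 argument
       0x00uy; 0x00uy; 0x00uy; 0x00uy    // second int32 argument
       0x00uy; 0x00uy |]                 // number of named arguments

let prolog   = BitConverter.ToUInt16(blob, 0)
let major    = BitConverter.ToInt32(blob, 2)
let minor    = BitConverter.ToInt32(blob, 6)
let numNamed = BitConverter.ToUInt16(blob, 10)

printfn "prolog=0x%04x major=%d minor=%d named=%d" prolog major minor numNamed
</pre>
<p>In other words, the attribute just marks the assembly as the primary interop assembly for version 1.0 of a type library; there&#8217;s nothing exotic hiding in it.</p>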
<h2>The Fix?</h2>
<p>The issue is currently with Microsoft product support. Let&#8217;s see what they come up with; is it too esoteric for a hotfix&#8230;?</p>
<h2>The Repro</h2>
<p>Here&#8217;s some code and instructions on how to repro the problem.</p>
<ol>
<li>Build the IL into a DLL using ilasm.<br />
<code>"c:\WINNT\Microsoft.NET\Framework\v2.0.50727\ilasm.exe" /dll Test.il /output=Test.dll</code>
</li>
<li>Build the application into a .NET 4.0 EXE that references the DLL<br />
<code>"c:\winnt\Microsoft.NET\Framework\v4.0.30319\csc.exe" TestConsumer.cs /reference:Test.dll</code>
</li>
<li>Run the resulting <code>TestConsumer.exe</code> application and you&#8217;ll get the exception</li>
</ol>
<p><b>Test.il</b><br />
<code><br />
.assembly extern mscorlib<br />
{<br />
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )<br />
  .ver 2:0:0:0<br />
}<br />
.assembly Test<br />
{<br />
  .custom instance void [mscorlib]System.Runtime.InteropServices.PrimaryInteropAssemblyAttribute::.ctor(int32,int32) = ( 01 00 01 00 00 00 00 00 00 00 00 00 )<br />
  .hash algorithm 0x00008004<br />
  .ver 1:0:0:0<br />
}<br />
.module Test.dll<br />
.imagebase 0x00400000<br />
.file alignment 0x00000200<br />
.stackreserve 0x00100000<br />
.subsystem 0x0003<br />
.corflags 0x00000001<br />
<br />
.field public static literal valuetype Test.MyEnum LiteralValue = int32(0x00000001)<br />
<br />
.class public auto ansi sealed Test.MyEnum<br />
       extends [mscorlib]System.Enum<br />
{<br />
  .field public specialname rtspecialname int32 value__<br />
  .field public static literal valuetype Test.MyEnum Zero = int32(0x00000000)<br />
  .field public static literal valuetype Test.MyEnum One = int32(0x00000001)<br />
}<br />
</code><br />
<b>TestConsumer.cs</b></p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #FF0000;">class</span> X
<span style="color: #000000;">&#123;</span>
    <span style="color: #0600FF;">static</span> <span style="color: #0600FF;">void</span> Main<span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span>
    <span style="color: #000000;">&#123;</span>
        var v <span style="color: #008000;">=</span> Test.<span style="color: #0000FF;">MyEnum</span>.<span style="color: #0000FF;">Zero</span><span style="color: #008000;">;</span>
    <span style="color: #000000;">&#125;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://www.voyce.com/index.php/2010/04/23/net-4-0-type-equivalence-causes-badimageformatexception/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>IL analysis using F#</title>
		<link>http://www.voyce.com/index.php/2009/04/24/il-analysis-using-fsharp/</link>
		<comments>http://www.voyce.com/index.php/2009/04/24/il-analysis-using-fsharp/#comments</comments>
		<pubDate>Fri, 24 Apr 2009 21:19:17 +0000</pubDate>
		<dc:creator>ian</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[F#]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[IL]]></category>

		<guid isPermaLink="false">http://www.voyce.com/?p=128</guid>
		<description><![CDATA[A description of using F# language features and reflection to enable basic analysis of .NET IL (intermediate language).]]></description>
			<content:encoded><![CDATA[<p>I recently needed to determine which functions were called by some of our F# code. Naively, you can use an existing tool like ildasm to disassemble a .NET DLL and then search the resulting IL source code for references. The obvious problem here, though, is that you&#8217;re going to include <em>all</em> references whether or not they&#8217;re actually called. In some circumstances this isn&#8217;t too bad, but in our case we pull in a great deal of shared library code, so you&#8217;re likely to get lots of false positives.</p>
<p>There are some other options to more accurately determine whether the method you&#8217;re interested in is actually called: run the code, or &#8220;almost&#8221; run it, by simulating the operation of the CLR. To radically understate: this is quite a lot of work. Yet another option is to statically analyse the original source code. This is generally easier than dynamic evaluation, but there are some serious and well-known problems with doing it exhaustively, which can result in the complexity eventually converging with that of full dynamic analysis.</p>
<p>So broadly, we have 3 types of approaches:</p>
<table>
<tr>
<td><em>Approach</em></td>
<td><em>Implementation</em></td>
<td><em>Accuracy</em></td>
</tr>
<tr>
<td>Disassembly</td>
<td>Easy</td>
<td> Superset</td>
</tr>
<tr>
<td>Dynamic analysis</td>
<td> Hard</td>
<td> Exact</td>
</tr>
<tr>
<td>Static analysis</td>
<td> Medium</td>
<td> Medium</td>
</tr>
</table>
<p>Anyone for a trade-off? Unsurprisingly I decided to look at implementing the third option. Although static analysis is normally performed on the source code itself, it&#8217;s actually easier for us to use the generated IL; it certainly requires less gnarly parsing. We can also take some short cuts based on the fact that we&#8217;re analysing F# code; more on that later.</p>
<p>We can use F#&#8217;s discriminated unions &#8211; a type that is constructed from one of many possible options &#8211; to describe the universe of IL instructions in a pretty concise way, e.g. (a partial example):</p>

<div class="wp_syntax"><div class="code"><pre class="fsharp" style="font-family:monospace;"><span style="color: #06c; font-weight: bold;">type</span> inst <span style="color: #000080;">=</span>
    <span style="color: #000080;">|</span> Nop
    <span style="color: #000080;">|</span> Break
    <span style="color: #000080;">|</span> Ldarg_0
    <span style="color: #000080;">|</span> Ldc_i4 <span style="color: #06c; font-weight: bold;">of</span> int32
    <span style="color: #000080;">|</span> Newobj <span style="color: #06c; font-weight: bold;">of</span> meth
<span style="color: #06c; font-weight: bold;">and</span> field <span style="color: #000080;">=</span> FieldInfo
<span style="color: #06c; font-weight: bold;">and</span> meth <span style="color: #000080;">=</span> MethodBase
<span style="color: #06c; font-weight: bold;">and</span> typ <span style="color: #000080;">=</span> Type</pre></div></div>

<p>This allows us to construct instances of <code>inst</code> by doing something like this in fsi (F# interactive):<br />
<code><br />
&gt; let i = Ldc_i4 2;;<br />
val i : inst<br />
</code><br />
You may have noticed that as well as the instructions that take simple types like int32, we also have ones that accept <code>meth</code>, which is an alias for System.Reflection.MethodBase, the base class for all methods, including constructors, which is what&#8217;s used to construct a <code>Newobj</code>.</p>
<p>Now we have this discriminated union defined, we need a way to build instances of it. In the IL byte stream, each instruction starts with an opcode of one or two bytes, so it fits in an unsigned 16-bit integer. Firstly we need to get the raw bytes representing the IL. Using Reflection, it&#8217;s fairly easy given <code>m</code> of type <code>MethodInfo</code>:</p>

<div class="wp_syntax"><div class="code"><pre class="fsharp" style="font-family:monospace;">    <span style="color: #06c; font-weight: bold;">let</span> body <span style="color: #000080;">=</span> m<span style="color: #000080;">.</span><span style="color: #505090;">GetMethodBody</span><span style="color: #000080;">&#40;</span><span style="color: #000080;">&#41;</span>
    <span style="color: #06c; font-weight: bold;">let</span> ilbytes <span style="color: #000080;">=</span> body<span style="color: #000080;">.</span><span style="color: #505090;">GetILAsByteArray</span><span style="color: #000080;">&#40;</span><span style="color: #000080;">&#41;</span>
    <span style="color: #06c; font-weight: bold;">let</span> ms <span style="color: #000080;">=</span> <span style="color: #06c; font-weight: bold;">new</span> IO<span style="color: #000080;">.</span><span style="color: #505090;">MemoryStream</span><span style="color: #000080;">&#40;</span>ilbytes<span style="color: #000080;">&#41;</span>
    <span style="color: #000080;">...</span></pre></div></div>

<p>So now we have a stream of bytes, and we can use functions from System.IO to extract information in various sized pieces:</p>

<div class="wp_syntax"><div class="code"><pre class="fsharp" style="font-family:monospace;">    <span style="color: #06c; font-weight: bold;">let</span> getByte <span style="color: #06c; font-weight: bold;">_</span>  <span style="color: #000080;">=</span> <span style="color: #000080;">&#40;</span>byte <span style="color: #000080;">&#40;</span>ms<span style="color: #000080;">.</span><span style="color: #505090;">ReadByte</span><span style="color: #000080;">&#40;</span><span style="color: #000080;">&#41;</span><span style="color: #000080;">&#41;</span><span style="color: #000080;">&#41;</span>
    <span style="color: #06c; font-weight: bold;">let</span> i2 <span style="color: #06c; font-weight: bold;">_</span> <span style="color: #000080;">=</span> readInt16 ms
    <span style="color: #06c; font-weight: bold;">let</span> i4 <span style="color: #06c; font-weight: bold;">_</span> <span style="color: #000080;">=</span> readInt32 ms
    <span style="color: #000080;">...</span></pre></div></div>

<p>As Harry Hill would say: &#8220;well, you get the idea with that&#8221;. It&#8217;s worth noting that these functions have a dummy argument (indicated by the underscore). This is required because they have a side effect (reading from the stream changes its state). Without a parameter, each <code>let</code> would bind a plain value rather than a function, so the read would happen only once, when the binding is evaluated. Although adding the dummy arg is required, it does have the unfortunate consequence that we have to pass something (normally unit) which can look a little ugly in the normally terse F# world.</p>
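<p>To see why the argument matters, here is a small self-contained sketch (the three-byte stream and the names <code>firstByte</code> and <code>nextByte</code> are purely illustrative) contrasting a value binding with a function that takes unit:</p>

<div class="wp_syntax"><div class="code"><pre class="fsharp" style="font-family:monospace;">open System.IO

let ms = new MemoryStream([| 1uy; 2uy; 3uy |])

// A value: the stream is read once, when this binding is evaluated.
let firstByte = byte (ms.ReadByte())

// A function: the stream is read each time it is applied to unit.
let nextByte () = byte (ms.ReadByte())

printfn "%d %d" firstByte firstByte          // the same byte both times
printfn "%d %d" (nextByte ()) (nextByte ())  // two successive bytes</pre></div></div>
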
<p>As the ECMA CIL spec describes, IL opcodes consist of either 1 or 2 bytes; in the 2-byte case the first byte is always 0xFE. Now we can begin to implement something serious. Given <code>ms</code> of type <code>MemoryStream</code> we can write something that will convert it to instructions:</p>

<div class="wp_syntax"><div class="code"><pre class="fsharp" style="font-family:monospace;">    <span style="color: #06c; font-weight: bold;">match</span> ms<span style="color: #000080;">.</span><span style="color: #505090;">ReadByte</span><span style="color: #000080;">&#40;</span><span style="color: #000080;">&#41;</span> <span style="color: #06c; font-weight: bold;">with</span>
    <span style="color: #000080;">|</span> 0xFE <span style="color: #06c; font-weight: bold;">as</span> lb <span style="color: #000080;">-&gt;</span>
        <span style="color: #060; font-style: italic;">// Two byte instruction, read further byte</span>
        <span style="color: #06c; font-weight: bold;">let</span> hb <span style="color: #000080;">=</span> getByte<span style="color: #000080;">&#40;</span><span style="color: #000080;">&#41;</span>
        <span style="color: #06c; font-weight: bold;">let</span> i <span style="color: #000080;">=</span> <span style="color: #000080;">&#40;</span><span style="color: #000080;">&#40;</span>uint16 lb<span style="color: #000080;">&#41;</span> <span style="color: #000080;">&lt;&lt;&lt;</span> <span style="color: #c6c;">8</span> <span style="color: #000080;">&#41;</span> <span style="color: #000080;">+</span> <span style="color: #000080;">&#40;</span>uint16 hb<span style="color: #000080;">&#41;</span>
        <span style="color: #06c; font-weight: bold;">let</span> t <span style="color: #000080;">=</span>
            <span style="color: #06c; font-weight: bold;">match</span> i <span style="color: #06c; font-weight: bold;">with</span>
            <span style="color: #000080;">|</span> 0xfe01us <span style="color: #000080;">-&gt;</span> Ceq
    <span style="color: #000080;">|</span> <span style="color: #06c; font-weight: bold;">_</span> <span style="color: #06c; font-weight: bold;">as</span> b <span style="color: #000080;">-&gt;</span>
        <span style="color: #06c; font-weight: bold;">let</span> t <span style="color: #000080;">=</span>
            <span style="color: #06c; font-weight: bold;">match</span> b <span style="color: #06c; font-weight: bold;">with</span>
            <span style="color: #000080;">|</span> 0x0 <span style="color: #000080;">-&gt;</span> Nop
            <span style="color: #000080;">|</span> 0x1f <span style="color: #000080;">-&gt;</span> Ldc_i4_s <span style="color: #000080;">&#40;</span>getByte<span style="color: #000080;">&#40;</span><span style="color: #000080;">&#41;</span><span style="color: #000080;">&#41;</span>
            <span style="color: #000080;">|</span> 0x20 <span style="color: #000080;">-&gt;</span> Ldc_i4 <span style="color: #000080;">&#40;</span>i4<span style="color: #000080;">&#40;</span><span style="color: #000080;">&#41;</span><span style="color: #000080;">&#41;</span>
            <span style="color: #000080;">|</span> 0x73 <span style="color: #000080;">-&gt;</span> Newobj <span style="color: #000080;">&#40;</span>meth<span style="color: #000080;">&#40;</span><span style="color: #000080;">&#41;</span><span style="color: #000080;">&#41;</span></pre></div></div>
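
<p>The fragment above leaves out most of the plumbing, so here is a cut-down but complete sketch of the same idea. It is not the code I actually use: the <code>Inst</code> type is a tiny stand-in covering five opcodes, everything else is reported as <code>Unknown</code>, and the byte array at the end is hand-assembled for the demo.</p>

<div class="wp_syntax"><div class="code"><pre class="fsharp" style="font-family:monospace;">open System.IO

type Inst =
    | Nop
    | Ldc_i4_s of sbyte
    | Ldc_i4 of int32
    | Ceq
    | Ret
    | Unknown of int

// Second byte of a two-byte (0xFE-prefixed) opcode.
let decodeTwoByte (second: int) =
    match second with
    | 0x01 -&gt; Ceq
    | b -&gt; Unknown (0xFE00 ||| b)

// Walk the byte array, turning each opcode (plus operand) into an Inst.
let decode (il: byte[]) =
    use reader = new BinaryReader(new MemoryStream(il))
    let rec loop acc =
        if reader.BaseStream.Position &gt;= reader.BaseStream.Length then
            List.rev acc
        else
            let inst =
                match int (reader.ReadByte()) with
                | 0x00 -&gt; Nop
                | 0x1F -&gt; Ldc_i4_s (reader.ReadSByte())
                | 0x20 -&gt; Ldc_i4 (reader.ReadInt32())
                | 0x2A -&gt; Ret
                | 0xFE -&gt; decodeTwoByte (int (reader.ReadByte()))
                | b -&gt; Unknown b
            loop (inst :: acc)
    loop []

// ldc.i4.s 2; ldc.i4 2; ceq; ret
decode [| 0x1Fuy; 0x02uy; 0x20uy; 0x02uy; 0x00uy; 0x00uy; 0x00uy; 0xFEuy; 0x01uy; 0x2Auy |]
|&gt; List.iter (printfn "%A")</pre></div></div>

<p>One thing to watch with a sketch like this: an unknown opcode that carries an operand will throw the decoder out of step, which is why the real thing needs the full opcode table.</p>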

<p>So we now have a function that will go from a method to a list of opcodes and operands (<code>MethodBase -> inst []</code>). These are essentially the same steps we would perform if we were writing an interpreter for a textual language; taking the source and transforming it into an abstract syntax tree. In that case it&#8217;s a tree rather than a list, but the next step is pretty much the same anyway: we pattern match over it. This is the point where we can decide how we want to interpret the instruction stream.</p>

<div class="wp_syntax"><div class="code"><pre class="fsharp" style="font-family:monospace;">        insts
        <span style="color: #000080;">|&gt;</span> List<span style="color: #000080;">.</span><span style="color: #505090;">iter</span> <span style="color: #000080;">&#40;</span><span style="color: #06c; font-weight: bold;">fun</span> inst <span style="color: #000080;">-&gt;</span>
            <span style="color: #06c; font-weight: bold;">match</span> inst <span style="color: #06c; font-weight: bold;">with</span>
            <span style="color: #000080;">|</span> Newobj<span style="color: #000080;">&#40;</span>meth<span style="color: #000080;">&#41;</span> <span style="color: #000080;">-&gt;</span>
                printf <span style="color: #008080;">&quot;NEW: %s.%s<span style="color: #008080; font-weight: bold;">\n</span>&quot;</span> meth<span style="color: #000080;">.</span><span style="color: #505090;">DeclaringType</span><span style="color: #000080;">.</span><span style="color: #505090;">Namespace</span> meth<span style="color: #000080;">.</span><span style="color: #505090;">DeclaringType</span><span style="color: #000080;">.</span><span style="color: #505090;">Name</span>
            <span style="color: #000080;">|</span> <span style="color: #06c; font-weight: bold;">_</span> <span style="color: #000080;">-&gt;</span>
                <span style="color: #000080;">&#40;</span><span style="color: #000080;">&#41;</span><span style="color: #000080;">&#41;</span></pre></div></div>

<p>Here we need to make some compromises based on the problem domain. I&#8217;m not trying to create a general purpose static analyser, but one that will work on object code in a certain format &#8211; that generated by the F# compiler. As such we make some assumptions and use some knowledge about the internals of the compiler to get the result we&#8217;re after. To be specific we&#8217;re relying on the fact that the compiler generates types for closures, and we assume that closures will always be called, even though in reality they needn&#8217;t be.</p>
<p>So based on this, we can put together something that, given an entry point &#8211; a particular method on a type &#8211; can recurse through the code, following references to other methods and types via the <code>Newobj</code>, <code>Call</code>, <code>Calli</code> and <code>Callvirt</code> instructions. This will build up a graph of all types referenced directly from our starting point. We also use our intimate knowledge of the purpose of F#&#8217;s <code>FastFunc</code> type (from which all functions are derived) and always follow its Invoke method if we find an instance of that type, even if it&#8217;s not directly referenced.</p>
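<p>Stripped of the IL details, the traversal is an ordinary reachability walk. In this sketch <code>callees</code> is a hypothetical stand-in for &#8220;decode the method body and return everything it references&#8221;, and the string graph is made-up data just to exercise it:</p>

<div class="wp_syntax"><div class="code"><pre class="fsharp" style="font-family:monospace;">open System.Collections.Generic

// Collect everything reachable from 'entry' by repeatedly following 'callees'.
let reachable (callees: 'a -&gt; 'a list) (entry: 'a) =
    let seen = HashSet&lt;'a&gt;()
    let rec visit m =
        // Add returns false if we've already been here, which stops cycles.
        if seen.Add(m) then
            callees m |&gt; List.iter visit
    visit entry
    seen

// Toy call graph: "Unused" references "A" but nothing references "Unused".
let graph =
    dict [ "Main", [ "A"; "B" ]
           "A", [ "C" ]
           "B", []
           "C", [ "A" ]
           "Unused", [ "A" ] ]

let found = reachable (fun m -&gt; graph.[m]) "Main"
printfn "C reachable: %b" (found.Contains("C"))
printfn "Unused reachable: %b" (found.Contains("Unused"))</pre></div></div>

<p>The set of visited nodes doubles as the answer: if the function you&#8217;re worried about isn&#8217;t in it, nothing reachable from the entry point refers to it directly.</p>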
<p>There are some major caveats. Anything accessed purely via reflection will not be detected. And polymorphic objects passed in and accessed via interfaces will also be missed. Also, I don&#8217;t attempt to do full flow analysis; e.g. following branch instructions etc, as this isn&#8217;t a common pattern in fsc-generated IL.</p>
<p>Luckily in the particular cases I&#8217;m looking at, these shortcomings don&#8217;t have a significant impact. Instead, we end up with a reasonably straight-forward and useful way of determining whether a particular function is called. It&#8217;s already been used in anger to determine whether a buggy function was referenced from some release-candidate software.</p>
<p>As a little post-script: rather than writing your own library from the ground-up to do this, there are some &#8220;off-the-shelf&#8221; solutions that you can try. Notably the recently released <a href="http://ccimetadata.codeplex.com/">CCI</a>, a common compiler infrastructure out of Microsoft Research, that allows you to reverse engineer IL metadata. I haven&#8217;t had a chance to have a good look at this yet, but it seems to do what we need for call graph analysis. There&#8217;s also an API called AbstractIL &#8211; in the absil.dll assembly &#8211; that ships with and is used internally by the F# compiler toolset. This looks extremely powerful, but the API is complex and the documentation is poor. Depending on exactly what your motivation is for looking at this stuff, it&#8217;s worth checking if these ready-made libraries will do what you need.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.voyce.com/index.php/2009/04/24/il-analysis-using-fsharp/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Verifying dynamically generated IL</title>
		<link>http://www.voyce.com/index.php/2009/03/30/verifying-dynamic-il/</link>
		<comments>http://www.voyce.com/index.php/2009/03/30/verifying-dynamic-il/#comments</comments>
		<pubDate>Mon, 30 Mar 2009 10:17:52 +0000</pubDate>
		<dc:creator>ian</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[F#]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[CLR]]></category>
		<category><![CDATA[emit]]></category>
		<category><![CDATA[IL]]></category>
		<category><![CDATA[peverify]]></category>
		<category><![CDATA[reflection]]></category>

		<guid isPermaLink="false">http://www.voyce.com/?p=97</guid>
		<description><![CDATA[It&#8217;s safe to assume that when you use the C#, F# or (heaven forfend) VB.NET compilers, the IL generated for you will be correct. But, if you&#8217;re using Reflection.Emit to generate code &#8220;by hand&#8221; in a dynamic method or assembly it can be difficult to identify problems with the IL you emit. In the majority [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s safe to assume that when you use the C#, F# or (heaven forfend) VB.NET compilers, the IL generated for you will be correct. But, if you&#8217;re using Reflection.Emit to generate code &#8220;by hand&#8221; in a dynamic method or assembly it can be difficult to identify problems with the IL you emit. In the majority of cases the runtime will simply throw an InvalidProgramException. This is, of course, exactly as you&#8217;d expect: the JIT compiler (which generates architecture-specific machine code from the IL) is intended to be highly performant rather than robust to errors that should&#8217;ve been dealt with earlier in the tool chain.</p>
<p>So what tools can you use to troubleshoot problems with dynamic IL? In a word: peverify.<br />
<span id="more-97"></span><br />
<strong>What is peverify?</strong><br />
peverify (where PE stands for portable executable, the file format used by Windows executables) takes a .NET assembly and validates both the metadata (the type structure) and the IL (the instructions), using a combination of rules and static analysis.</p>
<p>As the MSDN page states, it&#8217;s intended for use by &#8220;compiler writers and script engine developers&#8221;, but with a few small modifications to your code you can also use it with Reflection.Emit.</p>
<p>Here&#8217;s some example code that uses F# to emit a simple type (deriving from MarshalByRefObject) to wrap another object. The intention is that the wrapper type implements the same interfaces as the underlying object, and delegates calls to it. The key part of the code is where we emit the body of each function.</p>
<div style="font-family: Consolas; font-size: 10pt; color: black; background: white;">
<pre style="margin: 0px;"><span style="color: blue;">let</span> wrapObject (obj:obj) =</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; <span style="color: blue;">let</span> name = <span style="color: maroon;">"MyAssembly"</span></pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; <span style="color: blue;">let</span> filename = name + <span style="color: maroon;">".dll"</span></pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; <span style="color: blue;">let</span> ab = AppDomain.CurrentDomain.DefineDynamicAssembly(AssemblyName(name), AssemblyBuilderAccess.RunAndSave)</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; <span style="color: blue;">let</span> modb = ab.DefineDynamicModule(filename, filename)</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; <span style="color: blue;">let</span> tb = modb.DefineType(<span style="color: maroon;">"MyType"</span>, TypeAttributes.Class, typeof&lt;MarshalByRefObject&gt;)</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; <span style="color: blue;">let</span> objField : FieldBuilder = tb.DefineField(<span style="color: maroon;">"_underlyingObject"</span>, typeof&lt;obj&gt;, FieldAttributes.Public)</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; obj.GetType().GetInterfaces() </pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; |&gt; Array.iter (<span style="color: blue;">fun</span> itf <span style="color: blue;">-&gt;</span></pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; tb.AddInterfaceImplementation(itf)</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; itf.GetMethods()</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; |&gt; Array.iter (<span style="color: blue;">fun</span> m <span style="color: blue;">-&gt;</span></pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; <span style="color: blue;">let</span> ptypes = m.GetParameters() |&gt; Array.map (<span style="color: blue;">fun</span> p <span style="color: blue;">-&gt;</span> p.ParameterType)</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; <span style="color: blue;">let</span> mb = tb.DefineMethod(m.Name, MethodAttributes.Public|||MethodAttributes.Virtual, m.ReturnType, ptypes)</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; <span style="color: blue;">let</span> il = mb.GetILGenerator()</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; <span style="color: green;">// load object reference from field</span></pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; il.Emit(OpCodes.Ldarg_0)</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; il.Emit(OpCodes.Ldfld, objField)</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; <span style="color: green;">// cast it to appropriate type</span></pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; il.Emit(OpCodes.Castclass, itf)</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; <span style="color: green;">// push all args, verbatim</span></pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; m.GetParameters() |&gt; Array.iteri (<span style="color: blue;">fun</span> n p <span style="color: blue;">-&gt;</span> il.Emit(OpCodes.Ldarg,n+1))</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; <span style="color: green;">// call method on underlying object</span></pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; il.Emit(OpCodes.Callvirt, m)</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; il.Emit(OpCodes.Ret)</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; ))</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; <span style="color: blue;">let</span> typ = tb.CreateType()</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; ab.Save(filename)</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; <span style="color: blue;">let</span> o = typ.GetConstructor([||]).Invoke([||])</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; typ.GetField(<span style="color: maroon;">"_underlyingObject"</span>).SetValue(o, obj)</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; o</pre>
</div>
<p>Now, in my first version of the code, I forgot that you need to load &#8220;this&#8221; onto the stack before loading a field, so the code looked like this:</p>
<div style="font-family: Consolas; font-size: 10pt; color: black; background: white;">
<p style="margin: 0px;"><span style="color: blue;">let</span> il = mb.GetILGenerator()</p>
<p style="margin: 0px;"><span style="color: green;">// il.Emit(OpCodes.Ldarg_0) Oops, forgot to load &#8220;this&#8221; onto the stack</span></p>
<p style="margin: 0px;"> il.Emit(OpCodes.Ldfld, objField)</p>
<p style="margin: 0px;"> il.Emit(OpCodes.Castclass, itf)</p>
</div>
<p>I used the following code to create an instance of a type implementing IFoo, wrap it and invoke the function:</p>
<div style="font-family: Consolas; font-size: 10pt; color: black; background: white;">
<pre style="margin: 0px;"><span style="color: blue;">do</span></pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; <span style="color: blue;">let</span> o = </pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; { <span style="color: blue;">new</span> MyInterfaces.IFoo <span style="color: blue;">with</span></pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; <span style="color: blue;">member</span> this.Bar(a) = a + <span style="color: maroon;">"!"</span> }</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; <span style="color: blue;">let</span> wrappedObj = wrap o</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; wrappedObj.GetType().InvokeMember(<span style="color: maroon;">"Bar"</span>, BindingFlags.Instance|||BindingFlags.Public|||BindingFlags.InvokeMethod, <span style="color: blue;">null</span>, wrappedObj, [|box <span style="color: maroon;">"Hello"</span>|])</pre>
<pre style="margin: 0px;">&nbsp;&nbsp;&nbsp; ()</pre>
</div>
<p>As you&#8217;d expect, invoking the function threw an InvalidProgramException:<br />
<code><br />
System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> <b>System.InvalidProgramException: Common Language Runtime detected an invalid program.</b><br />
   at MyType.Bar(String )<br />
   --- End of inner exception stack trace ---<br />
   at System.RuntimeMethodHandle._InvokeMethodFast(Object target, Object[] arguments, SignatureStruct&#038; sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)<br />
   at System.RuntimeMethodHandle.InvokeMethodFast(Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)<br />
</code></p>
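<p>Reflection wraps whatever the invoked method throws in a TargetInvocationException, so the exception of interest is the InnerException. A minimal sketch of unwrapping it follows; the <code>invokeUnwrapped</code> helper is a hypothetical name, not part of the code above:</p>
<div style="font-family: Consolas; font-size: 10pt; color: black; background: white;">
<pre style="margin: 0px;">open System
open System.Reflection

// Hypothetical helper: invoke a public instance method by name and hand back
// the exception thrown by the target rather than the reflection wrapper.
let invokeUnwrapped (target: obj) (name: string) (args: obj[]) =
    try
        let flags = BindingFlags.Instance ||| BindingFlags.Public ||| BindingFlags.InvokeMethod
        Choice1Of2 (target.GetType().InvokeMember(name, flags, null, target, args))
    with
    | :? TargetInvocationException as e when not (isNull e.InnerException) -&gt;
        Choice2Of2 e.InnerException

// Usage: report only the inner exception's type and message.
match invokeUnwrapped (box "abc") "Substring" [| box 10 |] with
| Choice1Of2 result -&gt; printfn "Result: %O" result
| Choice2Of2 inner -&gt; printfn "%s: %s" (inner.GetType().Name) inner.Message
</pre>
</div>
<p>Even unwrapped, InvalidProgramException itself carries no more detail than the message above, which is why a separate verification step is worth the effort.</p>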
<p>In order to use peverify with your code, the first thing you need to do is save the assembly to disk. This involves making a couple of changes to the code, because normally you&#8217;d do everything in memory and avoid the cost of the disk access. First, change Run to RunAndSave in the call to define the assembly:</p>
<div style="font-family: Consolas; font-size: 10pt; color: black; background: white;">
<p style="margin: 0px;"><span style="color: blue;">let</span> ab = AppDomain.CurrentDomain.DefineDynamicAssembly(AssemblyName(name), AssemblyBuilderAccess.<b>RunAndSave</b>)</p>
</div>
<p>And ensure you include the filename when you create the module:</p>
<div style="font-family: Consolas; font-size: 10pt; color: black; background: white;">
<p style="margin: 0px;"><span style="color: blue;">let</span> modb = ab.DefineDynamicModule(filename<b>, filename</b>)</p>
</div>
<p>Now, you should be able to save the assembly to disk by adding a call after you&#8217;re finished building the type:</p>
<div style="font-family: Consolas; font-size: 10pt; color: black; background: white;">
<p style="margin: 0px;">ab.Save(filename)</p>
</div>
<p>Be aware that you can&#8217;t specify a path in the call to AssemblyBuilder.Save. This is presumably a security restriction, keeping generated binaries within the application&#8217;s directory tree and preventing people from creating arbitrary binaries all over the file system. If you&#8217;re not running from an obvious &#8220;top level&#8221; application, e.g. you&#8217;re using a scripting environment like F# Interactive, you may find the location of the file odd; when using FSI from within Visual Studio, the binary is created in &#8220;C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE&#8221;.</p>
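<p>Although Save itself only accepts a file name, AppDomain.DefineDynamicAssembly has an overload whose third argument is the directory the assembly will be saved to. A minimal sketch, assuming the .NET Framework (AssemblyBuilder.Save isn&#8217;t available on .NET Core); the temp-directory target and the names used here are illustrative assumptions:</p>
<div style="font-family: Consolas; font-size: 10pt; color: black; background: white;">
<pre style="margin: 0px;">open System
open System.IO
open System.Reflection
open System.Reflection.Emit

// Assumption: write generated binaries to the temp directory instead of
// wherever the host process happens to be running from.
let outputDir = Path.GetTempPath()
let filename = "MyAssembly.dll"

let ab =
    AppDomain.CurrentDomain.DefineDynamicAssembly(
        AssemblyName("MyAssembly"), AssemblyBuilderAccess.RunAndSave, outputDir)
let modb = ab.DefineDynamicModule(filename, filename)

// Placeholder type so the saved assembly has something in it.
let tb = modb.DefineType("MyType", TypeAttributes.Public)
tb.CreateType() |&gt; ignore

// Save still takes a bare file name; the directory comes from DefineDynamicAssembly.
ab.Save(filename)
printfn "Saved to %s" (Path.Combine(outputDir, filename))
</pre>
</div>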
<p>Now we can run peverify on the DLL and we get a much richer explanation of the problem:</p>
<p><code><br />
Microsoft (R) .NET Framework PE Verifier.  Version  3.5.30729.1<br />
Copyright (c) Microsoft Corporation.  All rights reserved.<br />
</code><code><br />
[IL]: Error: [c:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\MyAssembly.dll : MyType::Bar][offset 0x00000000] <b>Stack underflow</b>.<br />
1 Error(s) Verifying c:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\MyAssembly.dll<br />
</code></p>
<p>From this error message it&#8217;s pretty clear that the very first instruction (the one at offset 0) underflows the stack: ldfld expects an object reference on the stack, and we haven&#8217;t put anything there. That should give us a good idea of how to fix the problem.</p>
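<p>To see why the verifier complains about the very first instruction, it can help to track the evaluation stack depth by hand. The sketch below is a toy model, not how peverify is implemented: it covers only the handful of instructions used in this post, assumes the called method returns a value, and reports an instruction index rather than a byte offset. The <code>Instr</code> type and <code>firstUnderflow</code> function are hypothetical names.</p>
<div style="font-family: Consolas; font-size: 10pt; color: black; background: white;">
<pre style="margin: 0px;">// Toy model: each instruction pops some values and pushes some.
type Instr =
    | Ldarg              // pops 0, pushes the argument
    | Ldfld              // pops the object reference, pushes the field value
    | Castclass          // pops a reference, pushes it back as the target type
    | Callvirt of int    // pops "this" plus n arguments, pushes the result
    | Ret                // pops the return value

let popsAndPushes instr =
    match instr with
    | Ldarg -&gt; 0, 1
    | Ldfld -&gt; 1, 1
    | Castclass -&gt; 1, 1
    | Callvirt n -&gt; n + 1, 1
    | Ret -&gt; 1, 0

// Index of the first instruction that needs more values than the stack
// holds, or None if the sequence never underflows.
let firstUnderflow (instrs: Instr list) =
    let rec go index depth rest =
        match rest with
        | [] -&gt; None
        | i :: tail -&gt;
            let pops, pushes = popsAndPushes i
            if pops &gt; depth then Some index
            else go (index + 1) (depth - pops + pushes) tail
    go 0 0 instrs

// The buggy sequence (missing the initial ldarg.0) and the corrected one.
let broken = [ Ldfld; Castclass; Ldarg; Callvirt 1; Ret ]
let working = [ Ldarg; Ldfld; Castclass; Ldarg; Callvirt 1; Ret ]

let brokenResult = firstUnderflow broken     // Some 0: ldfld has nothing to pop
let workingResult = firstUnderflow working   // None
</pre>
</div>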
<p>Another error that I came across was the omission of a cast in the IL:</p>
<div style="font-family: Consolas; font-size: 10pt; color: black; background: white;">
<p style="margin: 0px;"><span style="color: blue;">let</span> il = mb.GetILGenerator()</p>
<p style="margin: 0px;">il.Emit(OpCodes.Ldarg_0)</p>
<p style="margin: 0px;">il.Emit(OpCodes.Ldfld, objField)</p>
<p style="margin: 0px;"><span style="color: green;">//il.Emit(OpCodes.Castclass, itf) Oops, forgot to cast object to specific type</span></p>
</div>
<p>From which peverify generated the fantastically specific error message:<br />
<code><br />
[IL]: Error: [c:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\MyAssembly.dll : MyType::Bar][offset 0x0000000C][found ref 'System.Object'][expected ref 'MyInterfaces.IFoo'] Unexpected type on the stack.<br />
</code></p>
<p>And of course, once all your emission bugs are fixed, you should get a success report:<br />
<code><br />
All Classes and Methods in c:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\MyAssembly.dll Verified.<br />
</code></p>
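<p>If you regenerate assemblies often, it may be convenient to run the check from the same script that emits them. A rough sketch, assuming peverify.exe can be found on the PATH (it ships with the Windows SDK, so its location varies by machine); <code>runPeverify</code> is a hypothetical helper that simply returns the exit code and the console output for you to inspect:</p>
<div style="font-family: Consolas; font-size: 10pt; color: black; background: white;">
<pre style="margin: 0px;">open System.Diagnostics

// Hypothetical helper: run peverify on a saved assembly and return its
// exit code together with everything it wrote to standard output.
// Assumes peverify.exe is on the PATH.
let runPeverify (assemblyPath: string) =
    let psi = ProcessStartInfo("peverify.exe", sprintf "\"%s\"" assemblyPath)
    psi.UseShellExecute &lt;- false
    psi.RedirectStandardOutput &lt;- true
    psi.CreateNoWindow &lt;- true
    use p = Process.Start(psi)
    let output = p.StandardOutput.ReadToEnd()
    p.WaitForExit()
    p.ExitCode, output
</pre>
</div>
<p>Calling it right after ab.Save(filename), with the full path to the saved file, puts the verifier&#8217;s report next to the code that caused it.</p>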
<p>The only problem with peverify, as far as I can see, is that there are no compiler-style &#8220;error codes&#8221; to help you identify and resolve the problem. It assumes fairly intimate knowledge of the IL/CLR specification, and it may take a bit of effort to get from the error to the resolution. But one thing&#8217;s for sure: it&#8217;s definitely easier than trying to do it with just InvalidProgramException.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.voyce.com/index.php/2009/03/30/verifying-dynamic-il/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
