IL analysis using F#

I recently needed to determine which functions were called by some of our F# code. Naively, you can use an existing tool like ildasm to disassemble a .NET DLL and then search the resulting IL source code for references. The obvious problem is that you’re going to include all references, whether or not they’re actually called. In some circumstances this isn’t too bad, but in our case we pull in a great deal of shared library code, so you’re likely to get lots of false positives.

There are other options for determining more accurately whether the method you’re interested in is actually called: run the code, or “almost” run it by simulating the operation of the CLR. To radically understate: this is quite a lot of work. Yet another option is to statically analyse the original source code. This is generally easier than dynamic evaluation, but there are some serious and well-known problems with doing it exhaustively, which can result in the complexity eventually converging with that of full dynamic analysis.

So broadly, we have 3 types of approaches:


Approach          Implementation  Accuracy
Disassembly       Easy            Superset
Dynamic analysis  Hard            Exact
Static analysis   Medium          Medium

Anyone for a trade-off? Unsurprisingly, I decided to look at implementing the third option. Although static analysis is normally performed on the source code itself, it’s actually easier for us to use the generated IL, which certainly requires less gnarly parsing. We can also take some shortcuts based on the fact that we’re analysing F# code; more on that later.

We can use F#’s discriminated unions – a type that is constructed from one of many possible options – to describe the universe of IL instructions in a pretty concise way, e.g. (a partial example):

type inst =
    | Nop
    | Break
    | Ldarg_0
    | Ldc_i4 of int32
    | Newobj of meth
and field = FieldInfo
and meth = MethodBase
and typ = Type

This allows us to construct instances of inst by doing something like this in fsi (F# interactive):

> let i = Ldc_i4 2;;
val i : inst

You may have noticed that as well as the instructions that take simple types like int32, we also have ones that accept meth, which is an alias for System.Reflection.MethodBase, the base class for all methods, including constructors, which is what’s used to construct a Newobj.

Now that we have this discriminated union defined, we need a way to build instances of it. In the IL byte stream, each instruction is identified by an opcode, which fits in an unsigned 16-bit integer. First we need to get the raw bytes representing the IL. Using Reflection, that’s fairly easy given m of type MethodInfo:

    let body = m.GetMethodBody()
    let ilbytes = body.GetILAsByteArray()
    let ms = new IO.MemoryStream(ilbytes)
    ...

So now we have a stream of bytes, and we can use functions from System.IO to extract information in various sized pieces:

    let getByte _  = (byte (ms.ReadByte()))
    let i2 _ = readInt16 ms
    let i4 _ = readInt32 ms
    ...

As Harry Hill would say: “well, you get the idea with that”. It’s worth noting that these functions have a dummy argument (indicated by the underscore). This is required because they have a side effect (reading from the stream, changing its state). Without an argument, each definition would be a plain value rather than a function, so the read would be evaluated only once, when the value is bound. Although adding the dummy arg is required, it does have the unfortunate consequence that we have to pass something (normally unit), which can look a little ugly in the normally terse F# world.
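
To make the difference concrete, here is a minimal sketch (not the original code; the three-byte stream is made up for illustration) contrasting a value binding with a function that takes unit:

```fsharp
open System.IO

// Illustrative stream contents; any bytes would do.
let ms = new MemoryStream([| 1uy; 2uy; 3uy |])

// A value: the stream is read exactly once, when this binding is evaluated.
let firstByte = byte (ms.ReadByte())

// A function: the stream is read again on every call.
let getByte () = byte (ms.ReadByte())

let a = getByte ()
let b = getByte ()
printfn "%d %d %d" firstByte a b   // prints "1 2 3"
```

Referring to firstByte again never touches the stream, whereas each getByte () call advances it by one byte.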

As the ECMA CIL spec describes, IL opcodes consist of either 1 or 2 bytes; in the 2-byte case the first byte is always 0xFE. Now we can begin to implement something serious. Given ms of type MemoryStream we can write something that will convert it to instructions:

    match ms.ReadByte() with
    | 0xFE as lb ->
        // Two byte instruction, read further byte
        let hb = getByte()
        let i = ((uint16 lb) <<< 8 ) + (uint16 hb)
        let t =
            match i with
            | 0xfe01us -> Ceq
    | _ as b ->
        let t =
            match b with
            | 0x0 -> Nop
            | 0x1f -> Ldc_i4_s (getByte())
            | 0x20 -> Ldc_i4 (i4())
            | 0x73 -> Newobj (meth())
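
The fragment above is partial, so here is a self-contained sketch of the same idea that can be run as-is. It is an illustration rather than the original implementation: the Inst type, the Unknown case and the decode function are names made up for this example, and only four real opcodes are handled (nop, ldc.i4.s, ldc.i4 and the two-byte ceq).

```fsharp
open System.IO

// A deliberately tiny instruction set; Unknown is a hypothetical catch-all.
type Inst =
    | Nop
    | Ldc_i4_s of sbyte
    | Ldc_i4 of int32
    | Ceq
    | Unknown of uint16

// Second byte of a two-byte opcode (the first byte was 0xFE).
let decodeTwoByte (second: byte) =
    match second with
    | 0x01uy -> Ceq
    | hb -> Unknown (0xFE00us ||| uint16 hb)

let decode (il: byte[]) : Inst list =
    use reader = new BinaryReader(new MemoryStream(il))
    let stream = reader.BaseStream
    [ while stream.Position < stream.Length do
        match reader.ReadByte() with
        | 0xFEuy -> yield decodeTwoByte (reader.ReadByte())
        | 0x00uy -> yield Nop
        | 0x1Fuy -> yield Ldc_i4_s (reader.ReadSByte())
        | 0x20uy -> yield Ldc_i4 (reader.ReadInt32())   // operands are little-endian
        // Caveat: an unhandled opcode that carries an operand would
        // desynchronise the stream; a real decoder must know every operand size.
        | b -> yield Unknown (uint16 b) ]

// nop; ldc.i4.s 5; ldc.i4 42; ceq
let sample = [| 0x00uy; 0x1Fuy; 0x05uy; 0x20uy; 0x2Auy; 0uy; 0uy; 0uy; 0xFEuy; 0x01uy |]
printfn "%A" (decode sample)
```

Factoring the two-byte case into its own function avoids nesting one match inside another, which is easy to get wrong with F#’s indentation rules.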

So we now have a function that will go from a method to a list of opcodes and operands (MethodBase -> inst list). These are essentially the same steps we would perform if we were writing an interpreter for a textual language: taking the source and transforming it into an abstract syntax tree. In that case it’s a tree rather than a list, but the next step is pretty much the same anyway: we pattern match over it. This is the point where we can decide how we want to interpret the instruction stream.

        insts
        |> List.iter (fun inst ->
            match inst with
            | Newobj(meth) ->
                // Report the type being constructed
                printfn "NEW: %s.%s" meth.DeclaringType.Namespace meth.DeclaringType.Name
            | _ ->
                ())

Here we need to make some compromises based on the problem domain. I’m not trying to create a general purpose static analyser, but one that will work on object code in a certain format – that generated by the F# compiler. As such we make some assumptions and use some knowledge about the internals of the compiler to get the result we’re after. To be specific we’re relying on the fact that the compiler generates types for closures, and we assume that closures will always be called, even though in reality they needn’t be.

So based on this, we can put together something that, given an entry point – a particular method on a type – can recurse through the code, following references to other methods and types via the Newobj, Call, Calli and Callvirt instructions. This will build up a graph of all types referenced directly from our starting point. We also use our intimate knowledge of the purpose of F#’s FastFunc type (from which all functions are derived) and always follow its Invoke method if we find an instance of that type, even if it’s not directly referenced.
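
The traversal itself is just a reachability search. A minimal sketch, assuming the call edges have already been extracted from the IL (the calls map and every method name in it are invented for illustration):

```fsharp
// Hypothetical call edges: method name -> methods it references via
// Newobj/Call/Calli/Callvirt. In a real tool these would come from decoded IL.
let calls : Map<string, string list> =
    Map.ofList [
        ("Main", [ "Parse"; "Closure1.Invoke" ])
        ("Parse", [ "Tokenise" ])
        ("Closure1.Invoke", [ "Report" ])
        ("Tokenise", [])
        ("Report", [])
        ("Unused", [ "Report" ]) ]

// Depth-first walk from an entry point, collecting everything reachable.
let reachable (entry: string) : Set<string> =
    let rec visit (seen: Set<string>) (name: string) =
        if Set.contains name seen then seen
        else
            let callees = defaultArg (Map.tryFind name calls) []
            List.fold visit (Set.add name seen) callees
    visit Set.empty entry

let fromMain = reachable "Main"
printfn "%A" fromMain
```

Here "Unused" references "Report" but is never reached from "Main", so it is excluded; that is exactly the kind of false positive a plain text search of the disassembly would report.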

There are some major caveats. Anything accessed purely via reflection will not be detected. And polymorphic objects passed in and accessed via interfaces will also be missed. Also, I don’t attempt to do full flow analysis; e.g. following branch instructions etc, as this isn’t a common pattern in fsc-generated IL.

Luckily, in the particular cases I’m looking at, these shortcomings don’t have a significant impact. Instead, we end up with a reasonably straightforward and useful way of determining whether a particular function is called. It’s already been used in anger to determine whether a buggy function was referenced from some release-candidate software.

As a little post-script: rather than writing your own library from the ground-up to do this, there are some “off-the-shelf” solutions that you can try. Notably the recently released CCI, a common compiler infrastructure out of Microsoft Research, that allows you to reverse engineer IL metadata. I haven’t had a chance to have a good look at this yet, but it seems to do what we need for call graph analysis. There’s also an API called AbstractIL – in the absil.dll assembly – that ships with and is used internally by the F# compiler toolset. This looks extremely powerful, but the API is complex and the documentation is poor. Depending on exactly what your motivation is for looking at this stuff, it’s worth checking if these ready-made libraries will do what you need.

Posted in .NET, F#, Software Development

Installing Windows SDK breaks F# Visual Studio integration

Beware! If you install the Windows SDK – perhaps to get access to the interesting looking WPF performance tools – you’ll find that it hoses your F# Visual Studio integration. I found that it causes intellisense tooltips to stop appearing, and the integrated F# interactive to crash Visual Studio. Both of these issues are a real pain; especially the inability to see the inferred types “live”, which is pretty much essential for F# development – where the focus is on compile time correctness.

I remembered seeing a post on the Windows SDK blog about a similar issue with the XAML editor (I’ve been doing some work with WPF recently; more on that in a later post), so I thought I’d try the steps they recommend. In short, re-register TextMgrP.dll:

regsvr32 "%CommonProgramFiles%\Microsoft Shared\MSEnv\TextMgrP.dll"

…and all my problems went away. Hope you find this useful.

Posted in F#, Visual Studio

Verifying dynamically generated IL

It’s safe to assume that when you use the C#, F# or (heaven forfend) VB.NET compilers, the IL generated for you will be correct. But if you’re using Reflection.Emit to generate code “by hand” in a dynamic method or assembly, it can be difficult to identify problems with the IL you emit. In the majority of cases the runtime will simply throw an InvalidProgramException. This is, of course, exactly as you’d expect: the JIT compiler (which generates architecture-specific machine code from the IL) is intended to be highly performant rather than robust to errors that should’ve been dealt with earlier in the tool chain.

So what tools can you use to troubleshoot problems with dynamic IL? In a word: peverify.

Posted in .NET, F#, Software Development

Implementing INotifyPropertyChanged with F#

I like F# for a lot of things, but, man, is it a pain to support events. In C# it’s trivial to implement an interface like INotifyPropertyChanged consisting only of an event, but in F# you have to jump through some hoops to map native functions to delegates/events. F# is generally much terser than C# and other .NET languages, but not in this case. After spending some time the other day trying to figure out the right combination of syntax and helper functions (and unsuccessfully googling for it), I thought I’d upload a bare-bones implementation here as an aide-memoire.

open System.ComponentModel
 
type MyObject() =
    let mutable propval = 0.0
 
    let event = Event<_, _>()
 
    interface INotifyPropertyChanged with
        member this.add_PropertyChanged(e) =
            event.Publish.AddHandler(e)
        member this.remove_PropertyChanged(e) =
            event.Publish.RemoveHandler(e)
 
    member this.MyProperty
        with get() = propval
        and  set(v) =
            propval <- v
            event.Trigger(this, new PropertyChangedEventArgs("MyProperty"))

It turns out that in F# version 1.9.6.16 there’s a slightly more concise syntax for this, as pointed out by Rei in the comments (thanks!). It uses the CLIEvent attribute to hook up the .NET event:

open System.ComponentModel
 
type MyObject() =
    let mutable propval = 0.0
 
    let propertyChanged = Event<_, _>()
    interface INotifyPropertyChanged with
        [<CLIEvent>]
        member x.PropertyChanged = propertyChanged.Publish
 
    member this.MyProperty
        with get() = propval
        and  set(v) =
            propval <- v
            propertyChanged.Trigger(this, new PropertyChangedEventArgs("MyProperty"))
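
To check that the event really fires, a short usage sketch follows. It repeats the CLIEvent version of the type so that it runs on its own; the explicit type arguments on Event, and the names model and changed, are additions for this example only.

```fsharp
open System.ComponentModel

type MyObject() =
    let mutable propval = 0.0
    let propertyChanged = Event<PropertyChangedEventHandler, PropertyChangedEventArgs>()

    interface INotifyPropertyChanged with
        [<CLIEvent>]
        member x.PropertyChanged = propertyChanged.Publish

    member this.MyProperty
        with get() = propval
        and  set(v) =
            propval <- v
            propertyChanged.Trigger(this, PropertyChangedEventArgs("MyProperty"))

let model = MyObject()
let changed = ResizeArray<string>()

// The event lives on the interface, so upcast before subscribing.
(model :> INotifyPropertyChanged).PropertyChanged.Add(fun args -> changed.Add args.PropertyName)

model.MyProperty <- 42.0
printfn "%A" (List.ofSeq changed)   // prints ["MyProperty"]
```

Because F# implements interfaces explicitly, the event is only visible after the upcast; a C# or WPF consumer that works through INotifyPropertyChanged sees it as a normal .NET event.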

Posted in F#

Visual Studio Toggle Brackets Macro

After using F# heavily for a while, I often found myself wanting to add brackets (or rather, parentheses) around some text. This is normally when adding a type specification to an argument in order to be able to use dot notation, e.g. going from:

let typeName t = t.Name

which causes “error FS0072: Lookup on object of indeterminate type based on information prior to this program point”, to the correct:

let typeName (t:Type) = t.Name

(These are obviously simplistic examples!)

So I broke out the Visual Studio macro editor for the first time in a while, and put together something to toggle brackets around the currently selected text. It’s naive, but, combined with Shift+Alt+Left Arrow to select the previous word, it’s effective:

Public Sub AddBrackets()
    Dim s As Object = DTE.ActiveWindow.Selection()
    If s.Text.StartsWith("(") And s.Text.EndsWith(")") Then
        s.Text = s.Text.Substring(1, s.Text.Length - 2)
    Else
        s.Text = "(" + s.Text + ")"
    End If
End Sub

Copy this text into a module within your macro project, and assign a suitable keystroke using Tools|Customize|Keyboard.

Posted in F#, Visual Studio

BattleFingers is here!


Well, I’ve done it: I’ve got my first game live on the AppStore. It’s been an interesting journey. I have a terrible habit of getting my hands on devkits and SDKs, having a play with them and then not doing anything constructive. This dates way back to things like the PlayStation Net Yaroze, which was pretty expensive, and with which I failed to produce anything concrete. So this time around all the pieces were in place: shiny new “gaming” kit, interesting SDK, low cost of entry. I was determined to create!

I’ll be making a series of posts on the process and details of creating it, in the interest of sharing the fun. In the meantime, you can find out more about the game here.

Posted in Gaming, Mac, Software Development, iPhone

Twitter

I’m on twitter! Expect some random thoughts on software development and the like: http://twitter.com/voyce

I’ve added a widget to the sidebar to give you a flavour.

Posted in Uncategorized

F# CTP and Visual Studio integration

Just a quick note on an inconsistency in the F# 1.9.6.2 (CTP) release and its integration into Visual Studio: be aware that the standard VS environment variable $(TargetPath) is not getting set to what you’d expect. Rather than containing the full path to the output file, it references the intermediate file, typically in \obj\bin.

This can be a problem if you’ve got any tools set up that try to do something with the built binary. Normally you can assume that referenced assemblies will also be in the output directory, so you’d be able to load and execute your built file. If you’re pointing to a copy in the intermediate directory, that’s not the case.

It looks like it’s just an artifact of the way they’ve integrated the F# compiler (fsc.exe) with msbuild. The F# team are aware of this bug, so hopefully it’ll be fixed in the next drop.

Posted in F#

Sound formats for iPhone development

Ahh, back to work today. It’s pretty tough getting back into the swing of things after what turned out to be a long break this year. While I was off I finally got to spend some time working on an iPhone game. After getting hold of the SDK a while back, it’s only now that I’ve gotten around to doing something with it.

One of the things that seemed a little odd about the SDK is its use of CAF-format audio files, detailed here. I got hold of a few very nice audio samples from the freesound site, but needed to convert them from WAVs to CAFs.

I thought ffmpeg might be up to the job, but the version I had didn’t list CAF as an available output format using ffmpeg -formats. However, after a bit of digging I discovered that it is supported by libsndfile, so I set about installing it using MacPorts:

sudo port install libsndfile

Then I used the included libsndfile-convert app to convert my file:

libsndfile-convert file.wav file.caf

The output format is inferred from the file extension, so you don’t have to specify it. However, when I rebuilt and ran my iPhone app using the new file, it didn’t play back. I suspected there might be something wrong with the format of the file, so I took a look to see what the file command reported. For the original WAV file I got:

file.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz

Unfortunately file doesn’t work on .CAF files, but you can open them using QuickTime Player, and using the Movie Inspector window you can see that the file has the following format:

16-bit Integer (Big Endian), Mono, 44.100 kHz

So it looks like the problem may be libsndfile-convert changing the endian-ness of the file contents, from the x86-style little-endian to Motorola-ish (i.e. pre-x86 Mac) big-endian, which is a bit of a pain. According to the docs, the libsndfile API supports endian-ness manipulation, so it’s probably just the case that the helper app is doing the wrong thing automatically. I’ll look at putting together a small command line app to use the API directly and enable me to batch process .WAV files correctly.
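
For reference, the endian-ness change amounts to swapping the two bytes of every 16-bit sample. The sketch below shows only that step on raw sample data, in F# to match the other examples on this blog; swap16 is a made-up helper, not part of libsndfile, and it does nothing about the CAF container or its headers.

```fsharp
// Swap each pair of bytes, turning big-endian 16-bit samples into
// little-endian ones (or back again). Operates on raw sample data only.
let swap16 (samples: byte[]) : byte[] =
    if samples.Length % 2 <> 0 then
        invalidArg "samples" "expected a whole number of 16-bit samples"
    Array.init samples.Length (fun i ->
        if i % 2 = 0 then samples.[i + 1] else samples.[i - 1])

// Two samples, 0x1234 and 0xABCD, stored big-endian.
let bigEndian = [| 0x12uy; 0x34uy; 0xABuy; 0xCDuy |]
let littleEndian = swap16 bigEndian
printfn "%A" littleEndian
```

Applying swap16 twice gives back the original bytes, which makes for a quick sanity check on any conversion tool built around it.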

Posted in Mac, Software Development, iPhone

Getting .NET type information in the unmanaged world

One of the tools that I write and maintain displays type information for COM objects hidden behind “handles” in Excel spreadsheets. The underlying objects can either support an interface that allows them to be richly rendered to XML, or the viewer will fall-back to using metadata and displaying the supported interfaces and their properties and methods. It will also invoke parameterless property getters – making the assumption that doing so won’t change the state of the object – and display the returned value. This is a useful way of getting some visibility on otherwise completely opaque values.

In order to obtain the type information about the COM objects, the tool uses type libraries, and the associated ITypeLib and ITypeInfo interfaces, which, with a little effort, can be used to iterate over all the coclasses, interfaces and functions in the library. But the difficulty lies in obtaining a type library when all you’re given is an already-instantiated object. In theory, COM allows you to know no more about an object than what interfaces it supports. But in practice, there are a variety of ways you can circumvent this and get to the type information.

For unmanaged COM objects you can use the information in the registry (or SxS configuration) to obtain the server (DLL) that contains a TLB embedded as a resource, or the type library filename itself. I won’t go into that now; there’s plenty of information about the location of these common registry keys elsewhere on the internet.

But for managed COM objects – well, COM callable wrappers (CCWs) – you have a different problem: registry scraping will never work and there may not even be an associated type library. The InprocServer32 registry entry always points to mscoree.dll, which obviously doesn’t have an embedded type library, and unless you’ve registered the assembly with /tlb (which is a pain) then you won’t have entries under HKEY_CLASSES_ROOT\Typelib and a TLB file to load.

So, if you’re in the unmanaged world, and all you’ve got is a pointer to a live CCW, what can you do?

Well, the easiest thing is to use IProvideClassInfo. This is supported by all CCWs, and provides a way to get an auto-generated (by the CLR) ITypeInfo implementation for the managed class. In fact, this is what I actually used to implement the solution eventually, but along the way I discovered some other interesting aspects of the CCW.

There is another interface that it supports: _Object, the unmanaged version of System.Object, which supports basic .NET functionality such as ToString and GetType. I couldn’t find it declared anywhere in the Platform or .NET SDK headers, so I put together a version that I could use from C++:

struct __declspec(uuid("{65074F7F-63C0-304E-AF0A-D51741CB4A8D}")) Object : public IDispatch
{
public:
    // We don't actually call these methods; doing so seems to return
    // COR_E_INVALIDOPERATION. Instead we just use IDispatch::Invoke
    // with the DISPID of the methods.
    virtual HRESULT STDMETHODCALLTYPE ToString(BSTR *) = 0;
    virtual HRESULT STDMETHODCALLTYPE Equals(VARIANT, VARIANT_BOOL *) = 0;
    virtual HRESULT STDMETHODCALLTYPE GetHashCode(long *) = 0;
    virtual HRESULT STDMETHODCALLTYPE GetType(mscorlib::_Type **) = 0;
};

Despite the presence of the virtual functions in this “interface”, we’re not actually going to call them. Instead we’ll call through the IDispatch that it derives from. It may be possible to use them directly, but see the comment in the code describing what happened when I tried it. Calling via IDispatch may seem slightly odd, because the object itself claims not to support it (QueryInterface returns E_NOINTERFACE).

The methods on the _Object interface have well-known DISPIDs:

ToString     0x00000000
Equals       0x60020001
GetHashCode  0x60020002
GetType      0x60020003

So we can use that to invoke the GetType method:

// No arguments: zero-initialise every field, including the two array pointers
DISPPARAMS parms = { NULL, NULL, 0, 0 };
_variant_t vType;
hr = pObject->Invoke(0x60020003, IID_NULL, 0, DISPATCH_METHOD, &parms, &vType, NULL, NULL);

And we get back a _Type interface that allows us to navigate around the type information in the same way as we can with System.Type! Just #import mscorlib.tlb and you get all the interfaces you need to e.g. iterate over all the interfaces implemented by a type, and invoke a function on them:

#import <mscorlib.tlb> rename("ReportEvent", "xReportEvent")

mscorlib::_TypePtr t(V_UNKNOWN(&vType));
CComSafeArray<LPUNKNOWN> saInterfaces(t->GetInterfaces());
mscorlib::_TypePtr tInterface((LPUNKNOWN)saInterfaces.GetAt(n));
result = tInterface->InvokeMember(_bstr_t("Function"),
    (mscorlib::BindingFlags)
        (mscorlib::BindingFlags_GetProperty +
         mscorlib::BindingFlags_InvokeMethod +
         mscorlib::BindingFlags_Public +
         mscorlib::BindingFlags_Instance +
         mscorlib::BindingFlags_IgnoreCase),
    NULL, _variant_t(punk), NULL, NULL, NULL, NULL);

So this turns out to be quite nice: you can get rich managed type information even if you’re running in the unmanaged world.

Posted in .NET, COM, Debugging, Software Development