<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>voyce &#187; heap</title>
	<atom:link href="http://www.voyce.com/index.php/tag/heap/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.voyce.com</link>
	<description>Programming and debugging tidbits</description>
	<lastBuildDate>Wed, 11 Aug 2010 03:56:45 +0000</lastBuildDate>
	
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Modifying the VC runtime to get better heap allocation stack traces</title>
		<link>http://www.voyce.com/index.php/2010/03/17/modifying-the-vc-runtime-to-get-better-heap-allocation-stack-traces/</link>
		<comments>http://www.voyce.com/index.php/2010/03/17/modifying-the-vc-runtime-to-get-better-heap-allocation-stack-traces/#comments</comments>
		<pubDate>Wed, 17 Mar 2010 23:24:30 +0000</pubDate>
		<dc:creator>ian</dc:creator>
				<category><![CDATA[Debugging]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[WinDbg]]></category>
		<category><![CDATA[Windows]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[heap]]></category>

		<guid isPermaLink="false">http://www.voyce.com/?p=754</guid>
		<description><![CDATA[Heap allocation stack traces are useless when using certain versions of the MSVC runtime. Is it possible to modify and rebuild MSVCR80 to avoid this?]]></description>
			<content:encoded><![CDATA[<p>Today I was trying to track down some &#8211; how can I put this politely &#8211; &#8220;unusual&#8221; memory usage in some unmanaged code running inside Excel. I broke out WinDbg and tried the usual suspects to get an idea of how memory was being used. Unfortunately, the way that msvcr80.dll is built stopped me from getting decent stack traces for the allocations, so I decided to try and rebuild it with a fix to remedy the situation.<br />
<span id="more-754"></span></p>
<h2>Collecting stack traces</h2>
<p>One of the most helpful things the heap manager can do for you when investigating memory issues is to capture stack traces for each heap allocation. You can enable the &#8220;collect stack traces&#8221; heap flag using the gflags GUI or from within WinDbg:</p>
<pre>
0:006> !gflag +ust
Current NtGlobalFlag contents: 0x00001040
    hpc - Enable heap parameter checking
    ust - Create user mode stack trace database
</pre>
<p>This means that for each heap block (one located at <code>0x0bbf7308</code> in this case), you can see where it was allocated by using the -a (show all information) option:</p>
<pre>
0:006> !heap -p -a 0bbf7308
    address 0bbf7308 found in
    _HEAP @ a630000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        0bbf7308 0073 0000  [07]   0bbf7310    00380 - (busy)
        Trace: 401c
        7c96d6dc ntdll!RtlDebugAllocateHeap+0x000000e1
        7c949d18 ntdll!RtlAllocateHeapSlowly+0x00000044
        7c91b298 ntdll!RtlAllocateHeap+0x00000e64
        78134333 MSVCR80!malloc+0x00000077
</pre>
<p>But the obvious problem with this is that the stack trace always stops at malloc. Something&#8217;s allocating some memory? You don&#8217;t say&#8230; </p>
<p>It turns out that this is a <a href="http://www.nynaeve.net/?p=209">well understood</a> and documented issue with the Microsoft VC++ runtime, variously known as msvcrt, msvcr70, msvcr71, msvcr80, msvcr90, etc. Unfortunately they&#8217;re all built using the frame pointer omission optimisation. More precisely, they&#8217;re built with the <a href="http://msdn.microsoft.com/en-us/library/8f8h5cxt.aspx">-O1</a> (favour small code) option, which implies <a href="http://msdn.microsoft.com/en-us/library/2kxx5t2c.aspx">-Oy</a>. The fast stack-walking algorithm the heap manager uses follows the chain of saved frame pointers, so it stops as soon as it reaches a function that doesn&#8217;t maintain one. The only way to get a decent trace in this situation would be to use the DbgHelp API along with the .pdb files, which would be far too slow to do at each allocation site.</p>
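<p>To make the failure mode concrete, here is a toy model of a frame pointer walk. It is only a sketch: the stack is a plain array of words, and <code>walk_frames</code>, <code>intact_stack</code> and <code>fpo_stack</code> are made-up names for illustration, not anything from ntdll. The second stack mimics a caller built with -Oy that used the frame pointer register as scratch, so the saved value is garbage and the walk ends after one frame.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stack: one machine word per slot; a higher index is closer to the stack base.
// A frame that keeps a frame pointer stores, at index fp:
//   words[fp]     = the caller's frame pointer
//   words[fp + 1] = the return address into the caller
std::vector<std::uint32_t> walk_frames(const std::vector<std::uint32_t>& words,
                                       std::size_t fp) {
    std::vector<std::uint32_t> trace;
    while (words.size() >= 2 && fp <= words.size() - 2) {
        trace.push_back(words[fp + 1]);   // record the return address
        std::size_t next = words[fp];     // follow the saved frame pointer
        if (next <= fp) break;            // the chain must move towards the stack base
        fp = next;
    }
    return trace;
}

// Hypothetical stack where every function maintains a frame pointer.
std::vector<std::uint32_t> intact_stack() {
    return {2, 0x1111, 4, 0x2222, 0, 0x3333};
}

// Same stack, but the innermost saved frame pointer is garbage because
// the calling function was built with frame pointer omission.
std::vector<std::uint32_t> fpo_stack() {
    return {0xDEADBEEF, 0x1111, 4, 0x2222, 0, 0x3333};
}
```

<p>Walking <code>intact_stack()</code> from index 0 yields three return addresses; walking <code>fpo_stack()</code> yields only the first, which is the same shape as a trace that stops at malloc.</p>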
<h2>&#8220;Fixing&#8221; it</h2>
<p>So, given that the source for the runtime library ships as part of Visual Studio, maybe it would be possible to build it without the -Oy option?</p>
<p>My first attempt at building it failed miserably with errors like:<br />
<code><br />
NMAKE : fatal error U1073: don't know how to make 'build\intel\mt_obj\startup.lib'<br />
</code><br />
Luckily this <a href="http://blogs.msdn.com/michkap/articles/478235.aspx">excellent page</a> helped me get past this to a point where I could actually get a DLL built.</p>
<p>The next stage was to modify the build scripts to use different compiler switches. This was simply a case of changing line 69 of <code>makefile.sub</code> from:<br />
<code>CFLAGS=$(CFLAGS) -O1</code><br />
to:<br />
<code>CFLAGS=$(CFLAGS) -O1 <b>-Oy-</b></code></p>
<p>I thought I might also have to modify the build scripts to output a version of the DLL with the same name as the file I was replacing, msvcr80.dll, directly, in case there were internal references to the name in embedded manifests. There&#8217;s a section at the top of the build script for choosing a name for your private version of the library, but it strongly discourages use of the &#8220;reserved&#8221; MSVC* names. Luckily it turns out not to be necessary; the DLL is constructed in such a way as to be rename-able without any ill effects. I could build _sample_.dll (the default output name) and simply copy it to the destination directory in the SxS tree (<code>C:\WINNT\WinSxS\x86_Microsoft.VC80.CRT_1fc8b3b9a1e18e3b_8.0.50727.3053_x-ww_b80fa8ca</code>) and rename it.</p>
<h2>Result</h2>
<p>Now I get the expected full stack trace (names have been changed to protect the innocent):</p>
<pre>
0:006> !heap -p -a 0bbf7308
    address 0bbf7308 found in
    _HEAP @ a630000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        0bbf7308 0073 0000  [07]   0bbf7310    00380 - (busy)
        Trace: 401c
        7c96d6dc ntdll!RtlDebugAllocateHeap+0x000000e1
        7c949d18 ntdll!RtlAllocateHeapSlowly+0x00000044
        7c91b298 ntdll!RtlAllocateHeap+0x00000e64
        78134333 MSVCR80!malloc+0x00000077
        7816207f MSVCR80!operator new+0x0000001d
        fa92336 leakydll!std::allocator<std::vector<ATL::CAdapt<ATL::CComBSTR>,std::allocator<ATL::CAdapt<ATL::CComBSTR> > > >::allocate+0x00000016
        fa9879b leakydll!std::vector<CComVariant,std::allocator<CComVariant> >::resize+0x0000005b
        ...
        ...etc...
</pre>
<p>That&#8217;ll make it <i>much</i> easier to work out what&#8217;s happening and who&#8217;s responsible. Of course, you should be careful with this modified version. Only use it on development machines, and make sure it doesn&#8217;t escape into the wild: with great power comes great responsibility.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.voyce.com/index.php/2010/03/17/modifying-the-vc-runtime-to-get-better-heap-allocation-stack-traces/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Diagnosing out of memory errors with VMMap &#8211; Part 2</title>
		<link>http://www.voyce.com/index.php/2009/07/29/diagnosing-out-of-memory-errors-with-vmmap-part-2/</link>
		<comments>http://www.voyce.com/index.php/2009/07/29/diagnosing-out-of-memory-errors-with-vmmap-part-2/#comments</comments>
		<pubDate>Wed, 29 Jul 2009 08:32:34 +0000</pubDate>
		<dc:creator>ian</dc:creator>
				<category><![CDATA[Debugging]]></category>
		<category><![CDATA[Windows]]></category>
		<category><![CDATA[address space]]></category>
		<category><![CDATA[heap]]></category>
		<category><![CDATA[perfmon]]></category>
		<category><![CDATA[private bytes]]></category>

		<guid isPermaLink="false">http://www.voyce.com/?p=743</guid>
		<description><![CDATA[(I had problems with WordPress choking on this long post, so I&#8217;ve split it into 2 parts. The first part is here. This is the second part).

Private data
This is the data that is explicitly allocated by the process, or blocks of memory that contain the allocated data. So when you&#8217;re allocating from the heap, for [...]]]></description>
			<content:encoded><![CDATA[<p>(I had problems with WordPress choking on this long post, so I&#8217;ve split it into 2 parts. The first part is <a href="http://www.voyce.com/index.php/2009/07/28/diagnosing-out-of-memory-errors-with-vmmap/">here</a>. This is the second part).<br />
<span id="more-743"></span></p>
<h3>Private data</h3>
<p>This is the data that is explicitly allocated by the process, or blocks of memory that contain the allocated data. So when you&#8217;re allocating from the heap, for instance, using code like:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>p <span style="color: #000080;">=</span> <span style="color: #0000dd;">new</span> <span style="color: #0000ff;">char</span><span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1024</span><span style="color: #000040;">*</span><span style="color: #0000dd;">1024</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span></pre></div></div>

<p>you can see how the heap manager manages address space; creating reserved segments of a fixed size, then committing parts of them as required.<br />
When a segment is full, it will create a new one of double the size.</p>
<p>Allocate first 256*1024 bytes:<br />
<div id="attachment_220" class="wp-caption alignleft" style="width: 303px"><a href="http://72.47.193.211/wp-content/uploads/2009/07/heap1.png"><img src="http://72.47.193.211/wp-content/uploads/2009/07/heap1.png" alt="After allocating first 256KB" title="heap1" width="293" height="41" class="size-full wp-image-220" /></a><p class="wp-caption-text">After allocating first 256KB</p></div><br />
<br clear="all"/>Allocate another 256*1024 bytes:<br />
<div id="attachment_221" class="wp-caption alignleft" style="width: 303px"><a href="http://72.47.193.211/wp-content/uploads/2009/07/heap2.png"><img src="http://72.47.193.211/wp-content/uploads/2009/07/heap2.png" alt="After allocating another 256KB" title="heap2" width="293" height="41" class="size-full wp-image-221" /></a><p class="wp-caption-text">After allocating another 256KB</p></div><br />
<br clear="all"/>Allocate yet another 256*1024 bytes:<br />
<div id="attachment_222" class="wp-caption alignleft" style="width: 303px"><a href="http://72.47.193.211/wp-content/uploads/2009/07/heap3.png"><img src="http://72.47.193.211/wp-content/uploads/2009/07/heap3.png" alt="And another..." title="heap3" width="293" height="41" class="size-full wp-image-222" /></a><p class="wp-caption-text">And another...</p></div><br />
<br clear="all"/>Allocate another 256*1024 bytes. This time a new heap segment is created:<br />
<div id="attachment_223" class="wp-caption alignleft" style="width: 299px"><a href="http://72.47.193.211/wp-content/uploads/2009/07/heap4.png"><img src="http://72.47.193.211/wp-content/uploads/2009/07/heap4.png" alt="And yet another..." title="heap4" width="289" height="84" class="size-full wp-image-223" /></a><p class="wp-caption-text">And yet another...</p></div><br clear="all"/>Many heaps in the process, all attempting to create segments in this way, can cause problems; see Heap contention.</p>
<h2>Causes of out of memory errors</h2>
<h3>Fragmentation</h3>
<p>If you&#8217;re experiencing out of memory errors, the first thing to check is the largest free block size. Obviously if you&#8217;re requesting more than the available size, your allocation will fail. Remember that large requests may be made implicitly by mechanisms that are outside your control. For instance, Windows heaps expand geometrically, attempting to grab a segment of double the previous size each time, e.g. 16, 32, 64, 128MB. As you can see, allocating a single byte when the 64MB segment is full will result in the heap manager trying to reserve a 128MB block.</p>
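<p>The doubling arithmetic can be sketched in a few lines. This is a simplified model of the behaviour described above, not the heap manager&#8217;s actual logic; the function names and the 16MB starting size are assumptions taken from the example figures.</p>

```cpp
#include <cstddef>
#include <vector>

// Sizes (in MB) of the successive segments a heap would try to reserve
// if each new segment is double the previous one.
std::vector<std::size_t> segment_sizes_mb(std::size_t first_mb, std::size_t count) {
    std::vector<std::size_t> sizes;
    std::size_t size = first_mb;
    for (std::size_t i = 0; i < count; ++i) {
        sizes.push_back(size);
        size *= 2;
    }
    return sizes;
}

// Index of the first segment that does not fit in the largest free block
// of address space, or -1 if every segment fits.
int first_failing_segment(const std::vector<std::size_t>& sizes_mb,
                          std::size_t largest_free_mb) {
    for (std::size_t i = 0; i < sizes_mb.size(); ++i) {
        if (sizes_mb[i] > largest_free_mb) return static_cast<int>(i);
    }
    return -1;
}
```

<p>With a first segment of 16MB and a largest free block of 100MB, the fourth reservation (128MB) is the one that fails in this model, even though the allocation that triggered it may have been a single byte.</p>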
<p>So once you&#8217;ve seen that the largest free block is small, the question then is: why? It&#8217;s either due to genuine memory exhaustion or address space fragmentation.</p>
<h3>Heap contention</h3>
<p>Fragmentation can be due to multiple heaps contending for contiguous address space. This is typically the case when you have a mix of VC++ code that is built using different versions of the runtime libraries (msvcrt.dll, msvcr70.dll, msvcr80.dll, msvcr90.dll), each of which manages its own heap.</p>
<p>Remember that heap segments need to be reserved as a contiguous block of address space, but if your virtual memory is fragmented, the heap manager may not be able to obtain one. In this case, it will fall back to creating segments of smaller sizes. The problem is that there&#8217;s a limit to the number of segments that can be created &#8211; a measly 32 in Windows XP &#8211; so if fragmentation causes it to create more, smaller segments this limit may be reached. If a new segment cannot be created you&#8217;ll get out of memory errors.</p>
<p>If you suspect this, you can use <code>!heap -m</code> to see details of the segment count and sizes for each heap. To identify the heap associated with each version of the MSVC runtime, ensure you&#8217;ve got the appropriate symbols loaded and then use <code>dd msvcr80!_crtheap L1</code> to see the address.</p>
<h3>Address space exhaustion</h3>
<p>It may also be possible that you <i>really have</i> exhausted your 2GB address space. This can happen when your process wastes lots of address space due to allocation granularity. For example, calling <code>VirtualAlloc</code> without an address will cause the OS to choose one for you, and as the documentation states, this will be rounded to a multiple of the allocation granularity, 64KB. So if you happen to allocate lots of objects of only a few bytes with direct calls to VirtualAlloc, you will waste almost 64KB each time. Although this might not seem significant in a 2GB address space, it all adds up.</p>
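<p>A quick back-of-the-envelope calculation shows how fast it adds up. The sketch below assumes every allocation consumes a whole number of 64KB granules of address space, which is a simplification; the constant and function names are made up for illustration.</p>

```cpp
#include <cstdint>

// Assumed allocation granularity: 64KB.
const std::uint64_t kGranularity = 64 * 1024;

// Address space consumed by one allocation of request_bytes,
// rounded up to a whole number of granules (at least one).
std::uint64_t address_space_used(std::uint64_t request_bytes) {
    std::uint64_t granules = (request_bytes + kGranularity - 1) / kGranularity;
    if (granules == 0) granules = 1;
    return granules * kGranularity;
}

// Address space lost to rounding across count allocations of the same size.
std::uint64_t wasted_bytes(std::uint64_t request_bytes, std::uint64_t count) {
    return (address_space_used(request_bytes) - request_bytes) * count;
}

// How many such allocations fit in a given amount of address space.
std::uint64_t max_allocations(std::uint64_t request_bytes, std::uint64_t space_bytes) {
    return space_bytes / address_space_used(request_bytes);
}
```

<p>In this model, 16-byte allocations exhaust a completely empty 2GB address space after only 32,768 calls, and 10,000 of them lose about 625MB to rounding.</p>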
<p>One of the symptoms of address space exhaustion is DLLs failing to load. I noticed recently that a COM <code>CoCreateInstance</code> call was failing because the only address space left to load the DLL into was way up into the area usually reserved for OS DLLs such as ntdll.dll.</p>
<h3>Other tips</h3>
<p>By default the allocations are ordered by Address (the first column) and, because things are generally allocated in increasingly higher locations in memory, this can serve as a useful &#8220;timeline&#8221; of the app&#8217;s allocations. It&#8217;s not guaranteed though: DLLs that have an explicit load address don&#8217;t follow this pattern (for example, VBE6.DLL always loads at 0x65000000). You can use it to see roughly when threads and heaps are created and files are mapped though.</p>
<h3>Summary</h3>
<p>So, I hope you find this information useful in interpreting the output from VMMap. It&#8217;s a very good way of getting visibility on the state of your processes and it&#8217;s certainly more intuitive than having to use the !address and !heap commands in WinDbg.</p>
<p>Good hunting!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.voyce.com/index.php/2009/07/29/diagnosing-out-of-memory-errors-with-vmmap-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Programmatically checking memory usage</title>
		<link>http://www.voyce.com/index.php/2008/06/20/programmatically-checking-memory-usage/</link>
		<comments>http://www.voyce.com/index.php/2008/06/20/programmatically-checking-memory-usage/#comments</comments>
		<pubDate>Fri, 20 Jun 2008 07:09:10 +0000</pubDate>
		<dc:creator>ian</dc:creator>
				<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Windows]]></category>
		<category><![CDATA[heap]]></category>
		<category><![CDATA[pdh]]></category>
		<category><![CDATA[perfmon]]></category>
		<category><![CDATA[private bytes]]></category>
		<category><![CDATA[win32]]></category>

		<guid isPermaLink="false">http://www.voyce.com/?p=31</guid>
		<description><![CDATA[One of the things that&#8217;s useful in a pre-release check is to do a regression test on the memory usage of your unmanaged functions. This should help to ensure that the fantastic new data structure you introduced doesn&#8217;t cost too much in additional storage for the order-of-magnitude performance improvement you were boasting about.
Like most of my posts, this assumes that [...]]]></description>
			<content:encoded><![CDATA[<p>One of the things that&#8217;s useful in a pre-release check is to do a regression test on the memory usage of your unmanaged functions. This should help to ensure that the fantastic new data structure you introduced doesn&#8217;t cost <em>too</em> much in additional storage for the order-of-magnitude performance improvement you were boasting about.</p>
<p>Like most of my posts, this assumes that it&#8217;s not feasible to go through all your source code and, say, replace all instances of new with a version that tracks usage (the approach used by the debug CRT). As well as being logistically infeasible, this also tends to miss allocations that don&#8217;t go via new, for example, direct calls to HeapAlloc.</p>
<p><span id="more-31"></span>In the past, I&#8217;ve seen some code that used the <a href="http://msdn.microsoft.com/en-us/library/aa366781(VS.85).aspx">Win32 heap functions</a> to find out the amount of memory allocated by the process. It used GetProcessHeaps, HeapWalk and HeapSize to sum all the block sizes and get an overall memory in use figure, but in my experience it was extremely slow and unreliable.</p>
<p>What was really required was something that gave a figure similar to the &#8220;private bytes&#8221; counter in perfmon. If you didn&#8217;t know, this is the counter you need to be watching if you&#8217;re looking for memory leaks in a process. For goodness sake don&#8217;t use the &#8220;Mem Usage&#8221; column in Task Manager; this is in fact (almost) the working set size and it doesn&#8217;t correlate exactly with memory explicitly allocated by the process. It includes additional things such as the space occupied by the loaded DLLs. Also, the working set will shrink if the app is paged out, although it still has the memory allocated. To see an example of this in action, open Excel and a large spreadsheet, calc it, look in Task Manager and you&#8217;ll see a large number (if not, you&#8217;re obviously not looking at a <em>real</em> spreadsheet). Then minimise the Excel window. You&#8217;ll see the mem usage value plummet as the working set is &#8220;trimmed&#8221; &#8211; probably by a call to <a href="http://msdn.microsoft.com/en-us/library/ms686234(VS.85).aspx">SetProcessWorkingSetSize</a>. The OS does this because it expects the app won&#8217;t be used for a while, so it makes sense to free up physical memory for use by other processes.</p>
<p>So essentially what I want to do is get the perfmon &#8220;private bytes&#8221; value programmatically as my app is running, and this can be achieved using the Performance Data Helper (PDH) library. It provides an API to access the performance counters in a similar way to the perfmon GUI.</p>
<p>It uses the concept of &#8220;queries&#8221;; you create a query, add a counter to it, then collect the query data as required (not forgetting to remove the counter and close the query when you&#8217;re done).</p>
<p>The first thing to do is open the query:</p>
<div style="font-family: Lucida Sans Typewriter; font-size: 10pt; color: black; background: white;">
<p style="margin: 0px;">    PDH_STATUS status = PdhOpenQuery(NULL, 0, &amp;hquery);</p>
<p style="margin: 0px;">    <span style="color: #0000ff;">if</span> (status != ERROR_SUCCESS)</p>
<p style="margin: 0px;">        <span style="color: #0000ff;">return</span> status;</p>
</div>
<p> </p>
<p>Then add the required counters (this code assumes you&#8217;re looking at a process on the current machine):</p>
<div style="font-family: Lucida Sans Typewriter; font-size: 10pt; color: black; background: white;">
<p style="margin: 0px;">    status = PdhAddCounter(hquery, _T(<span style="color: #800000;">&quot;\\\\.\\Process(processname)\\Private Bytes&quot;</span>), 0, &amp;hcounter);</p>
<p style="margin: 0px;">    <span style="color: #0000ff;">if</span> (status != ERROR_SUCCESS)</p>
<p style="margin: 0px;">    {</p>
<p style="margin: 0px;">        PdhCloseQuery(hquery);</p>
<p style="margin: 0px;">        <span style="color: #0000ff;">return</span> status;</p>
<p style="margin: 0px;">    }</p>
</div>
<p>At this point you&#8217;re ready to start polling for updates. At periodic intervals you can collect the query data and do with it what you will:</p>
<div style="font-family: Lucida Sans Typewriter; font-size: 10pt; color: black; background: white;">
<p style="margin: 0px;">    PDH_STATUS status = PdhCollectQueryData(hquery);</p>
<p style="margin: 0px;">    <span style="color: #0000ff;">if</span> (status == ERROR_SUCCESS)</p>
<p style="margin: 0px;">    {</p>
<p style="margin: 0px;">        PDH_RAW_COUNTER value;</p>
<p style="margin: 0px;">        DWORD dwType;</p>
<p style="margin: 0px;">        status = PdhGetRawCounterValue(hcounter, &amp;dwType, &amp;value);</p>
<p style="margin: 0px;">        <span style="color: #0000ff;">if</span> (status == ERROR_SUCCESS)</p>
<p style="margin: 0px;">        {</p>
<p style="margin: 0px;">            printf(<span style="color: #800000;">&quot;%lld %lld %s\n&quot;</span>, value.TimeStamp, value.FirstValue, sz);</p>
<p style="margin: 0px;">        }</p>
<p style="margin: 0px;">    }</p>
</div>
<p>Luckily for the Private Bytes counter we&#8217;ve got the simplest type of counter to &#8216;decode&#8217;; a raw counter value, essentially just a number. We don&#8217;t need to do any further manipulation on it to get the information we need, like having to divide by some frequency.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.voyce.com/index.php/2008/06/20/programmatically-checking-memory-usage/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
