{"id":496,"date":"2012-03-02T14:53:59","date_gmt":"2012-03-02T19:53:59","guid":{"rendered":"http:\/\/gameangst.com\/?p=496"},"modified":"2012-03-02T14:53:59","modified_gmt":"2012-03-02T19:53:59","slug":"the-hole-that-dlmalloc-cant-fill","status":"publish","type":"post","link":"http:\/\/gameangst.com\/?p=496","title":{"rendered":"The Hole That dlmalloc Can&#8217;t Fill"},"content":{"rendered":"<h2><strong>dlmalloc<\/strong><\/h2>\n<p><a href=\"http:\/\/g.oswego.edu\/dl\/html\/malloc.html\">Dlmalloc<\/a> is a general-purpose memory allocator developed by <a href=\"http:\/\/g.oswego.edu\/\">Doug Lea<\/a> since 1987. \u00a0It is highly optimized, easily portable, and elegantly designed. \u00a0It functions as a drop-in replacement for malloc and the source has been generously released into the public domain. \u00a0The code is also an absolute joy to read, with every line conveying volumes of careful intention\u2014optimizations, bug fixes, and refinements thoughtfully integrated over the years. \u00a0With so many positive attributes to recommend it, it is no wonder dlmalloc has become the gold standard for memory allocators and a popular choice for game developers. \u00a0It is also really unfortunate, because <strong>dlmalloc is a terrible allocator for today\u2019s game consoles<\/strong>.<\/p>\n<p>To understand why this is, you must first have an overview of dlmalloc\u2019s interface and how it works. \u00a0Dlmalloc manages a pool of address space composed of one or more discrete regions called <em>segments<\/em>. \u00a0Dlmalloc\u2019s pool can optionally grow or shrink through user-provided callbacks to allocate new segments and free unused ones. \u00a0The \u201cgrow\u201d callback is triggered when dlmalloc can\u2019t satisfy an allocation request within its currently managed pool of memory and the \u201cshrink\u201d callback is triggered when dlmalloc detects that a segment is no longer occupied by any allocations. \u00a0The grow callback is typically implemented with OS-level functions that allocate memory such as <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/windows\/desktop\/aa366887(v=vs.85).aspx\">VirtualAlloc<\/a>, <a href=\"http:\/\/www.kernel.org\/doc\/man-pages\/online\/pages\/man2\/mmap.2.html\">mmap<\/a>, or <a href=\"http:\/\/linux.die.net\/man\/2\/sbrk\">sbrk<\/a>.<\/p>\n<p>Dlmalloc is a <a href=\"http:\/\/www.amazon.com\/Art-Computer-Programming-Fundamental-Algorithms\/dp\/0201896834\" target=\"_blank\">boundary tag allocator<\/a> at its heart, so segments are treated like new nodes in an intrusive linked list and are written with small headers or footers. \u00a0The important thing to note about this is that dlmalloc requires that all memory it manages be readable and writable. \u00a0The grow callback can return virtual memory, but it must already be committed (mapped to physical memory) in order for dlmalloc to function.<\/p>\n<p>Another important characteristic of dlmalloc is that while it can return segments through the use of the shrink callback, it is not very aggressive in doing so. \u00a0Segments are stored in a linked list and the only way for dlmalloc to coalesce adjacent segments or find empty segments is to perform a linear traversal of that list. \u00a0For some users segments may be very large in size and very small in number so the linear traversal is cheap, but for most users segment management isn\u2019t worth the cost, and dlmalloc\u2019s default configuration is to only consider the most recently allocated segment when looking for segments to merge or release.<\/p>\n<p>This is significant because it means dlmalloc&#8217;s callbacks can&#8217;t be used to allocate and commit individual pages of virtual memory. \u00a0Trying to use dlmalloc with a segment equivalent to a virtual memory page would be prohibitively expensive or pointless depending on the configuration.<\/p>\n<p>So to summarize, <strong>dlmalloc has no support for virtual memory<\/strong>.<\/p>\n<p>This may seem like a crippling limitation for such a popular allocator, but in practice it affects very few systems. \u00a0In fact, if a system meets any of the following requirements, dlmalloc\u2019s blind spot when it comes to virtual memory is almost completely irrelevant.<\/p>\n<h4>Demand Paging<\/h4>\n<p>Almost all general-purpose computers these days have an operating system that features <a href=\"http:\/\/en.wikipedia.org\/wiki\/Demand_paging\">demand paging<\/a>. \u00a0With demand paging, the operating system keeps track of which pages of virtual memory are needed by the system and tries to keep only those pages resident in memory. \u00a0In other words, dlmalloc doesn\u2019t have to bother with the difficult work of committing and decommitting pages because the OS is already doing a pretty good job of it.<\/p>\n<h4>More Physical Memory than Virtual Address Space<\/h4>\n<p>In the consumer space, most applications are 32-bit and run on machines with more than 2 gigabytes of memory. \u00a0This includes the most recent big-budget PC games like <a href=\"http:\/\/www.ea.com\/crysis-2\/blog\/pc-specs\">Crysis<\/a>, <a href=\"http:\/\/bf3blog.com\/battlefield-3-system-requirements\/\">Battlefield<\/a>, <a href=\"http:\/\/www.modernwarfare3forum.com\/topic\/13452-modern-warfare-3-pc-system-requirements\/\">Call of Duty<\/a>, and <a href=\"http:\/\/community.batmanarkhamcity.com\/news\/latest\/item\/202-pc-recommended-specs#\">Batman<\/a> since they all have minimum system requirements of at least 2 gigabytes of main memory. \u00a0If dlmalloc can commit all of the available address space without penalty, it has no reason not to do it.<\/p>\n<h4>No Virtual Memory or Predictable Allocation Pattern<\/h4>\n<p>Embedded systems are likely to have no virtual memory or to have very predictable allocation patterns that don\u2019t benefit from virtual memory. \u00a0The fact that dlmalloc sacrifices virtual memory support and dynamic segment management in favor of speed and simplicity is actually a perk for embedded systems.<\/p>\n<h2>Which Brings Us to Modern Consoles<\/h2>\n<p>Modern game consoles are unusual in that they don\u2019t match any of the above criteria. \u00a0Both the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Xbox_360\">Xbox 360<\/a> and the <a href=\"http:\/\/en.wikipedia.org\/wiki\/PlayStation_3\">Playstation 3<\/a> have rich support for virtual memory, but the operating systems on the two consoles offer limited or no support for demand paging. \u00a0The Xbox 360 has 512 megabytes of physical memory to back its virtual address space and the Playstation 3 has 256 megabytes. \u00a0Both consoles provide games with significantly more virtual address space than physical memory, so there is a huge potential upside to managing them separately.<\/p>\n<h2>Why We Care About Virtual Memory<\/h2>\n<p>In the absence of demand paging, <a href=\"http:\/\/en.wikipedia.org\/wiki\/Virtual_memory\">virtual memory<\/a> exists for one purpose only\u2014to reduce the cost of fragmentation. Fragmentation occurs in any allocator that manages variable-sized blocks of memory which can be freed in arbitrary order. \u00a0Consider the diagrams below. \u00a0The green blocks are allocations in a system with 10 megabytes of memory. \u00a0In the first diagram, most of the available memory has been allocated in variable-sized amounts. \u00a0In the second diagram, some of those allocations have been freed. \u00a0There are more than 4 megabytes of memory available in diagram 2, but even a 2 megabyte allocation will fail because there are no contiguous blocks of memory large enough to place it. \u00a0This phenomenon is known as <a href=\"http:\/\/en.wikipedia.org\/wiki\/Fragmentation_(computing)#Memory_fragmentation\">fragmentation<\/a>.<\/p>\n<p><img loading=\"lazy\" class=\"alignnone size-full wp-image-506\" title=\"Physical Memory Fragmentation\" src=\"http:\/\/gameangst.com\/wp-content\/uploads\/2012\/03\/fragmentation1.png\" alt=\"\" width=\"671\" height=\"178\" srcset=\"http:\/\/gameangst.com\/wp-content\/uploads\/2012\/03\/fragmentation1.png 671w, http:\/\/gameangst.com\/wp-content\/uploads\/2012\/03\/fragmentation1-150x39.png 150w, http:\/\/gameangst.com\/wp-content\/uploads\/2012\/03\/fragmentation1-400x106.png 400w\" sizes=\"(max-width: 671px) 100vw, 671px\" \/><\/p>\n<p>Fragmentation can be a huge problem with any general-purpose allocator. \u00a0The less predictable an application\u2019s allocation patterns and the larger the range of its allocation sizes, the more susceptible it is to allocation failure due to fragmentation. \u00a0Most console games are large, complex systems integrating code from multiple sources and trying to satisfy a wide range of allocation patterns. \u00a0At the same time, the game industry is very competitive, so console games are expected to take maximum advantage of the console hardware\u2019s capabilities. \u00a0You simply can\u2019t do that without tackling the problem of memory fragmentation.<\/p>\n<p>The most surefire way to avoid memory fragmentation is to only manage memory in fixed-size blocks. \u00a0If every allocation is the same size, any hole left by one allocation can be filled perfectly by another. \u00a0Unfortunately requiring that all memory requests be the same size is unreasonable, and trying to do so naively will only result in even more memory waste. \u00a0This is where virtual memory comes in.<\/p>\n<p>Virtual memory subdivides all physical memory into fixed-size blocks, known as <em>pages<\/em>, and provides a mechanism for contiguous allocations of virtual memory to span multiple nonadjacent pages. \u00a0By adding this level of indirection, an application can be given an address the behaves like a block of contiguous memory, but under the hood is made up of separate fixed-size allocations of physical memory. \u00a0This completely removes the problem of physical memory fragmentation, though it obviously creates the problem of virtual memory fragmentation. \u00a0If virtual memory is limited to the same size as physical memory, you\u2019re no better off than when you started. \u00a0However, if virtual memory is allowed to be much larger than physical memory, even though fragmentation still exists, it is much less likely to ever cause an allocation failure.<\/p>\n<p>The reason why virtual memory can be larger than physical memory is that unused pages of virtual memory don\u2019t have to be backed by physical memory. \u00a0As the name implies, it is <em>virtual<\/em> resource.<\/p>\n<p>The next set of diagrams shows the exact same sequence of memory allocations as the earlier diagrams, but this time using a slightly larger pool of virtual memory. \u00a0Allocations in virtual memory are still marked in green and free regions are still marked in white, but now the diagram is divided into 1 megabyte pages and any pages which are backed by physical memory are shaded with diagonal lines. \u00a0In order for an allocation to succeed, it must fit into a free region of virtual memory that is backed by physical pages, ie. white areas shaded with diagonal lines.<\/p>\n<p>In diagram 3 all 10 megabytes of physical memory are allocated, but there are still more than two megabytes of virtual memory available. \u00a0A new 2 megabyte allocation could fit in virtual memory, but it will still fail because it can&#8217;t be backed by physical memory. \u00a0In diagram 4 some allocations have been freed, and not only is there still 2 megabytes of contiguous virtual memory available, there are two free 1 megabyte physical memory pages available to back it. \u00a0Diagram 5 shows the successful addition of a new 2 megabyte allocation, the same allocation that would have failed without virtual memory in diagram 2. \u00a0If you count the number of physical pages that are allocated in diagram 5, you\u2019ll see that we haven\u2019t exceeded the system limit of 10 megabytes.<\/p>\n<p><img loading=\"lazy\" class=\"alignnone size-full wp-image-507\" title=\"Virtual Memory Fragmentation\" src=\"http:\/\/gameangst.com\/wp-content\/uploads\/2012\/03\/fragmentation2.png\" alt=\"\" width=\"798\" height=\"267\" srcset=\"http:\/\/gameangst.com\/wp-content\/uploads\/2012\/03\/fragmentation2.png 798w, http:\/\/gameangst.com\/wp-content\/uploads\/2012\/03\/fragmentation2-150x50.png 150w, http:\/\/gameangst.com\/wp-content\/uploads\/2012\/03\/fragmentation2-400x133.png 400w\" sizes=\"(max-width: 798px) 100vw, 798px\" \/><\/p>\n<p>Notice that while we\u2019ve succeeded in allocating an additional 2 megabytes despite fragmentation, we still have more than 2 megabytes of virtual memory left and yet we can\u2019t service another 2 megabyte request. \u00a0Virtual memory itself is now too fragmented. \u00a0We could increase the amount of virtual memory without penalty to make room for another 2 megabyte allocation, but it doesn\u2019t help because although we have 2 megabytes of physical memory that isn\u2019t being used, we don\u2019t have 2 free physical memory pages to back the new allocation.<\/p>\n<p>For virtual memory to be an effective solution to fragmentation, you need a lot more virtual address space than physical memory and you need as small a page size as possible. \u00a0On the Xbox 360 the default page size is 64 kilobytes and the on the Playstation 3 it is 1 megabyte, but both consoles support a range of page sizes. \u00a0The best page size on consoles for compromising between performance and fragmentation is 64 KB, which begs the question, \u201cwhy does the Playstation 3 default to 1 megabyte?\u201d \u00a0The answer is simple: the Playstation 3 uses dlmalloc for its general-purpose allocator and since dlmalloc doesn\u2019t take advantage of virtual memory, there is no point in using a smaller page size!<\/p>\n<p>Hopefully by now I&#8217;ve convinced you of the value of virtual memory and why dlmalloc isn&#8217;t the right choice for console games. \u00a0In the future I&#8217;ll describe the general-purpose memory allocator I wrote for <a href=\"http:\/\/gameangst.com\/?tag=despair-engine\">Despair Engine<\/a> and how I used it to combat fragmentation. \u00a0As a bonus, I&#8217;ll explain the larger memory management framework that I built on top of the allocator to reduce thread contention and to ensure that the entirety of memory is available to a\u00a0heterogeneous\u00a0set of processors, despite their non-unified views of virtual memory.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>dlmalloc Dlmalloc is a general-purpose memory allocator developed by Doug Lea since 1987. \u00a0It is highly optimized, easily portable, and elegantly designed. \u00a0It functions as a drop-in replacement for malloc and the source has been generously released into the public domain. \u00a0The code is also an absolute joy to read, with every line conveying volumes [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[11,7],"_links":{"self":[{"href":"http:\/\/gameangst.com\/index.php?rest_route=\/wp\/v2\/posts\/496"}],"collection":[{"href":"http:\/\/gameangst.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/gameangst.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/gameangst.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/gameangst.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=496"}],"version-history":[{"count":21,"href":"http:\/\/gameangst.com\/index.php?rest_route=\/wp\/v2\/posts\/496\/revisions"}],"predecessor-version":[{"id":525,"href":"http:\/\/gameangst.com\/index.php?rest_route=\/wp\/v2\/posts\/496\/revisions\/525"}],"wp:attachment":[{"href":"http:\/\/gameangst.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=496"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/gameangst.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=496"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/gameangst.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=496"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}