{"id":226,"date":"2009-09-17T23:12:29","date_gmt":"2009-09-18T06:12:29","guid":{"rendered":"http:\/\/gameangst.com\/?p=226"},"modified":"2009-09-17T23:14:16","modified_gmt":"2009-09-18T06:14:16","slug":"minimizing-code-bloat-template-overspecialization","status":"publish","type":"post","link":"http:\/\/gameangst.com\/?p=226","title":{"rendered":"Minimizing Code Bloat: Template Overspecialization"},"content":{"rendered":"<p><em>This article is a continuation of <a href=\"http:\/\/gameangst.com\/?p=46\">Minimizing Code Bloat for Faster Builds and Smaller Executables<\/a>.<\/em><\/p>\n<p>Even in an engine that isn&#8217;t rife with template meta-programming, template overspecialization can be a major source of code bloat. \u00a0Very few programmers, in my experience, really think about what the compiler and linker are doing with the code they write. \u00a0They design their code in C++, they write their code in C++, and of course they debug their code in C++. \u00a0Their view of the programs they write begins and ends at that level, which is certainly understandable since there is more than enough to worry about in C++ alone.<\/p>\n<p>Luckily for us, C++ is a language whose design is very representative of the underlying hardware. \u00a0Most of the time it is perfectly reasonable to assume that the code you write in C++ bears a close resemblance to the machine code generated by the compiler, at least in terms of quantity, structure, and performance. \u00a0The one place where that assumption breaks down, however, is in template code. \u00a0A very large amount of template code can result in little or no machine code and, conversely, a very small amount of template code can generate an enormous volume of machine code.<\/p>\n<p>Consider this snippet from a hypothetical class implementing a Win32 file stream:<\/p>\n<blockquote>\n<div class=\"dean_ch\" style=\"white-space: nowrap;\"><span class=\"kw2\">class<\/span> Stream<br \/>\n<span class=\"br0\">&#123;<\/span><br \/>\n&nbsp; &nbsp; <span class=\"co1\">\/\/ &#8230;<\/span><br \/>\n&nbsp; &nbsp; <span class=\"kw2\">template<\/span> &lt;typename T&gt;<br \/>\n&nbsp; &nbsp; <span class=\"kw4\">int<\/span> Read<span class=\"br0\">&#40;<\/span>T* obj, uint numObjects<span class=\"br0\">&#41;<\/span><br \/>\n&nbsp; &nbsp; <span class=\"br0\">&#123;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw1\">if<\/span> <span class=\"br0\">&#40;<\/span>!m_open<span class=\"br0\">&#41;<\/span> <span class=\"kw1\">return<\/span> <span class=\"nu0\">0<\/span>;<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw1\">if<\/span> <span class=\"br0\">&#40;<\/span>m_accessMode == ACCESS_WO<span class=\"br0\">&#41;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw1\">return<\/span> <span class=\"nu0\">0<\/span>;<\/p>\n<p>&nbsp; &nbsp; &nbsp; &nbsp; uint32 bytesRead = <span class=\"nu0\">0<\/span>;<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; uint32 bytesRemaining = numObjects * <span class=\"kw3\">sizeof<\/span><span class=\"br0\">&#40;<\/span>T<span class=\"br0\">&#41;<\/span>;<\/p>\n<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw4\">char<\/span>* readBuffer = <span class=\"kw2\">reinterpret_cast<\/span><span class=\"br0\">&#40;<\/span>obj<span class=\"br0\">&#41;<\/span>;<\/p>\n<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw1\">while<\/span> <span class=\"br0\">&#40;<\/span>bytesRemaining &gt; <span class=\"nu0\">0<\/span><span class=\"br0\">&#41;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"br0\">&#123;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw4\">const<\/span> uint32 kMaxReadSize = <span class=\"nu0\">32<\/span>*<span class=\"nu0\">1024<\/span>*<span class=\"nu0\">1024<\/span>;<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; uint32 bytesThisRequest = std::<span class=\"me2\">min<\/span><span class=\"br0\">&#40;<\/span>bytesRemaining, kMaxReadSize<span class=\"br0\">&#41;<\/span>;<\/p>\n<p>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; uint32 bytesReadThisRequest = <span class=\"nu0\">0<\/span>;<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw4\">bool<\/span> success = ::<span class=\"me2\">ReadFile<\/span><span class=\"br0\">&#40;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; m_hFile,<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; readBuffer,<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; bytesThisRequest,<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &amp;bytesReadThisRequest,<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw2\">NULL<\/span><span class=\"br0\">&#41;<\/span> != <span class=\"kw2\">FALSE<\/span>;<\/p>\n<p>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; bytesRemaining -= bytesReadThisRequest;<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; readBuffer += bytesReadThisRequest;<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; bytesRead += bytesReadThisRequest;<\/p>\n<p>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw1\">if<\/span> <span class=\"br0\">&#40;<\/span>bytesReadThisRequest == <span class=\"nu0\">0<\/span> ||<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; !success<span class=\"br0\">&#41;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw2\">break<\/span>;<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"br0\">&#125;<\/span><\/p>\n<p>&nbsp; &nbsp; &nbsp; &nbsp; m_currentPosition += bytesRead;<br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw1\">return<\/span> bytesRead;<br \/>\n&nbsp; &nbsp; <span class=\"br0\">&#125;<\/span><br \/>\n&nbsp; &nbsp; <span class=\"co1\">\/\/ &#8230;<\/span><br \/>\n<span class=\"br0\">&#125;<\/span>;<\/div>\n<\/blockquote>\n<p>The author of this function is attempting to ease the burden on the calling code by removing the need to specify the size of the object being read into. \u00a0This simplifies the interface and removes a potential source of errors. \u00a0Unfortunately, the way the function is written is horribly inefficient from a compilation and code size perspective. \u00a0The entire contents of the function, some thirty lines of code, have been specialized for every type that is read from a stream. \u00a0That&#8217;s a lot of code for the compiler to process, particularly when you consider only one line of code in the entire function is actually dependent on the template parameter.<\/p>\n<p>The story from the linker&#8217;s perspective is a little better. \u00a0For objects with the same size, say an <em>unsigned short<\/em> and a <em>short<\/em>, the compiler will generate identical code for both instantiations and any decent linker will happily merge those in the executable. \u00a0Nevertheless, code deduplication is a slow process so there is no reason to rely on it if you don&#8217;t have to. \u00a0Also, even with code deduplication most linkers won&#8217;t do partial function deduplication, so you&#8217;re still left with unique copies of this function in your executable for every size of serialized type in your application.<\/p>\n<p>The compiler-friendly way to write a function like the one above is, of course:<\/p>\n<blockquote>\n<div class=\"dean_ch\" style=\"white-space: nowrap;\"><span class=\"kw2\">class<\/span> Stream<br \/>\n<span class=\"br0\">&#123;<\/span><br \/>\n&nbsp; &nbsp; <span class=\"co1\">\/\/ &#8230;<\/span><br \/>\n&nbsp; &nbsp; <span class=\"kw4\">int<\/span> ReadRaw<span class=\"br0\">&#40;<\/span><span class=\"kw4\">void<\/span>* buffer, uint bufferSize<span class=\"br0\">&#41;<\/span>;<\/p>\n<p>&nbsp; &nbsp; <span class=\"kw2\">template<\/span> &lt;typename T&gt;<br \/>\n&nbsp; &nbsp; <span class=\"kw4\">int<\/span> Read<span class=\"br0\">&#40;<\/span>T* obj, uint numObjects<span class=\"br0\">&#41;<\/span><br \/>\n&nbsp; &nbsp; <span class=\"br0\">&#123;<\/span><br \/>\n&nbsp; &nbsp; &nbsp; &nbsp; <span class=\"kw1\">return<\/span> ReadRaw<span class=\"br0\">&#40;<\/span>obj, numObjects * <span class=\"kw3\">sizeof<\/span><span class=\"br0\">&#40;<\/span>T<span class=\"br0\">&#41;<\/span><span class=\"br0\">&#41;<\/span>;<br \/>\n&nbsp; &nbsp; <span class=\"br0\">&#125;<\/span><br \/>\n&nbsp; &nbsp; <span class=\"co1\">\/\/ &#8230;<\/span><br \/>\n<span class=\"br0\">&#125;<\/span>;<\/div>\n<\/blockquote>\n<p>Now only one line of code is specialized for each serialized type and the meat of Stream::Read will only be seen once by both the compiler and the linker.<\/p>\n<p>The example above may seem contrived, and you&#8217;re probably thinking there is no way code like that could end up in your engine. \u00a0That&#8217;s certainly the way I felt before I started digging through our .obj files. \u00a0To find out if your optimism is justified, take a look at the output of DumpBin I mentioned <a href=\"http:\/\/gameangst.com\/?p=46\">earlier<\/a>.<\/p>\n<p>Run DumpBin on the libraries linked by your application and concatenate the results. \u00a0Next process the output to build a database of unique symbols recording both their individual sizes and their repeat counts. \u00a0If you want to see where your compiler is spending its time, look at the sum of <em>symbol_size * symbol_count<\/em> for all template symbols regardless of what types they are templatized on. \u00a0If you want to see where your linker is spending its time or what&#8217;s contributing to your executable size, ignore the symbol counts\u00a0and simply take the sum of the symbol sizes\u00a0for every symbol from the same template.<\/p>\n<p>Once you have a list of top offenders, the next thing to do is to inspect their code and determine whether their degree of specialization is really justified. \u00a0In most cases it is pretty easy to turn template functions into non-template functions or at least to factor out whatever portions of the code aren&#8217;t directly dependent on the template parameters.<\/p>\n<p>When you&#8217;re tired of looking at template code, it&#8217;s time to turn your attention to <a href=\"http:\/\/gameangst.com\/?p=212\">excessive inlining<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This article is a continuation of Minimizing Code Bloat for Faster Builds and Smaller Executables. Even in an engine that isn&#8217;t rife with template meta-programming, template overspecialization can be a major source of code bloat. \u00a0Very few programmers, in my experience, really think about what the compiler and linker are doing with the code they [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[8,14,7],"_links":{"self":[{"href":"http:\/\/gameangst.com\/index.php?rest_route=\/wp\/v2\/posts\/226"}],"collection":[{"href":"http:\/\/gameangst.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/gameangst.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/gameangst.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/gameangst.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=226"}],"version-history":[{"count":26,"href":"http:\/\/gameangst.com\/index.php?rest_route=\/wp\/v2\/posts\/226\/revisions"}],"predecessor-version":[{"id":239,"href":"http:\/\/gameangst.com\/index.php?rest_route=\/wp\/v2\/posts\/226\/revisions\/239"}],"wp:attachment":[{"href":"http:\/\/gameangst.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=226"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/gameangst.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=226"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/gameangst.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=226"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}