{"id":320,"date":"2009-09-22T15:02:10","date_gmt":"2009-09-22T22:02:10","guid":{"rendered":"http:\/\/gameangst.com\/?p=320"},"modified":"2013-02-13T09:31:02","modified_gmt":"2013-02-13T14:31:02","slug":"symbol-sort-a-utility-for-measuring-c-code-bloat","status":"publish","type":"post","link":"http:\/\/gameangst.com\/?p=320","title":{"rendered":"Symbol Sort : A Utility for Measuring C++ Code Bloat"},"content":{"rendered":"<h3><span style=\"text-decoration: underline;\">OVERVIEW<\/span><\/h3>\n<p>SymbolSort is a utility for analyzing code bloat in C++ applications. \u00a0It works by\u00a0extracting the symbols from a dump generated by the Microsoft DumpBin utility or by\u00a0reading a PDB file. \u00a0It processes the symbols it extracts and generates lists sorted\u00a0by a number of different criteria. \u00a0You can read more about the motivation behind SymbolSort <a href=\"http:\/\/gameangst.com\/?p=46\">here<\/a>.<\/p>\n<p>The lists compiled by SymbolSort are:<\/p>\n<h4>Raw Symbols, sorted by size<\/h4>\n<p>This list is generated from the complete set of symbols. \u00a0No deduplication is\u00a0performed so this list is intended to highlight individual large symbols.<\/p>\n<h4>File contributions, sorted by size<\/h4>\n<p>This list is generated by calculating the total size of symbols that contribute to\u00a0a folder path. \u00a0If the input is a COMDAT dump, the source location for symbols is\u00a0the .obj or .lib file that DumpBin was run on (see usage for details). \u00a0It is\u00a0important to note that for COMDAT dumps individual symbols will appear multiple times coming from different .obj files. \u00a0If the input is a PDB file, the source location for symbols\u00a0is the actual source file in which the symbol is defined. 
\u00a0The source file for\u00a0data symbols is not always clearly defined within the PDB, so in some cases it is\u00a0a best guess.<\/p>\n<h4>File contributions, sorted by path<\/h4>\n<p>This is a complete, hierarchical list of the size of symbols in all contributing\u00a0source files.<\/p>\n<h4>Symbol Sections \/ Types, sorted by total size and by total count<\/h4>\n<p>This shows a breakdown of symbols by section or type, depending on the kind of information that can be extracted from the input source.<\/p>\n<h4>Merged Duplicate Symbols, sorted by total size and by total count<\/h4>\n<p>This list is generated by merging symbols with identical names. \u00a0Symbols that share a name are\u00a0not guaranteed to be the same symbol. \u00a0In the case of PDB input there will be very\u00a0few duplicate symbols. \u00a0COMDAT input, however, should contain a large number of duplicate symbols. \u00a0This list is useful for estimating the total compile and link cost\u00a0of a particular symbol. \u00a0A relatively small symbol that appears in a very large\u00a0number of .obj files will have a large total size and appear near the top of this\u00a0list.<\/p>\n<h4>Merged Template Symbols, sorted by total size and by total count<\/h4>\n<p>This list is generated by stripping template parameters from symbols and then\u00a0merging duplicates. \u00a0Symbols <em>std::auto_ptr&lt;int&gt;<\/em> and <em>std::auto_ptr&lt;float&gt;<\/em> will\u00a0be transformed into <em>std::auto_ptr&lt;T&gt;<\/em> in this list and be counted together.<\/p>\n<h4>Merged Overloaded Symbols, sorted by total size and by total count<\/h4>\n<p>This list is generated by stripping template parameters and function parameters\u00a0from symbols and then merging duplicates. 
\u00a0Overloaded functions <em>sqrt(float)<\/em> and\u00a0<em>sqrt(double)<\/em> will be transformed into <em>sqrt(&#8230;)<\/em> in this list and be counted\u00a0together.<\/p>\n<h4>Symbol Tags, sorted by total size and by total count<\/h4>\n<p>This list represents a tag cloud generated from the symbol names. \u00a0The symbols\u00a0are tokenized and the total size and count are tallied for each token. \u00a0I&#8217;m not\u00a0sure what this list is good for, but I&#8217;m all about tag clouds so I couldn&#8217;t\u00a0resist including it.<\/p>\n<h3><span style=\"text-decoration: underline;\">USAGE<\/span><\/h3>\n<pre>SymbolSort [options]\r\n\r\nOptions:\r\n  -in[:type] filename\r\n      Specify an input file with optional type.  Exe and PDB files are\r\n      identified automatically by extension.  Otherwise type may be:\r\n          comdat - the format produced by DumpBin \/headers\r\n          sysv   - the format produced by nm --format=sysv\r\n          bsd    - the format produced by nm --format=bsd --print-size\r\n\r\n  -out filename\r\n      Write output to specified file instead of stdout\r\n\r\n  -count num_symbols\r\n      Limit the number of symbols displayed to num_symbols\r\n\r\n  -exclude substring\r\n      Exclude symbols that contain the specified substring\r\n\r\n  -diff:[type] filename\r\n      Use this file as a basis for generating a differences report.\r\n      See -in option for valid types.\r\n\r\n  -searchpath path\r\n      Specify the symbol search path when loading an exe\r\n\r\n  -path_replace regex_match regex_replace\r\n      Specify a regular expression search\/replace for symbol paths.\r\n      Multiple path_replace sequences can be specified for a single\r\n      run.  
The match term is escaped but the replace term is not.\r\n      For example: -path_replace d:\\\\SDK_v1 c:\\SDK -path_replace\r\n      d:\\\\SDK_v2 c:\\SDK\r\n\r\n  -complete\r\n      Include a complete listing of all symbols sorted by address.\r\n\r\nOptions specific to Exe and PDB inputs:\r\n  -include_public_symbols\r\n      Include 'public symbols' from PDB inputs.  Many symbols in the\r\n      PDB are listed redundantly as 'public symbols.'  These symbols\r\n      provide a slightly different view of the PDB as they are named\r\n      more descriptively and usually include padding for alignment\r\n      in their sizes.\r\n\r\n  -keep_redundant_symbols\r\n      Normally symbols are processed to remove redundancies.  Partially\r\n      overlapped symbols are adjusted so that their sizes aren't over\r\n      reported and completely overlapped symbols are discarded\r\n      completely.  This option preserves all symbols and their reported\r\n      sizes\r\n\r\n  -include_sections_as_symbols\r\n      Attempt to extract entire sections and treat them as individual\r\n      symbols.  This can be useful when mapping sections of an\r\n      executable that don't otherwise contain symbols (such as .pdata).\r\n\r\n  -include_unmapped_addresses\r\n      Insert fake symbols representing any unmapped addresses in the\r\n      PDB.  This option can highlight sections of the executable that\r\n      aren't directly attributable to symbols.  In the complete view\r\n      this will also highlight space lost due to alignment padding.<\/pre>\n<p>SymbolSort supports three types of input files:<\/p>\n<h4>COMDAT dump<\/h4>\n<p>A COMDAT dump is generated using the DumpBin utility with the \/headers option. \u00a0DumpBin is included with the Microsoft compiler toolchain. 
SymbolSort can accept the dump from a single .lib or .obj file, but the best way to use it is to create a complete dump of all the .obj files from an entire application.\u00a0\u00a0The Windows command line utility FOR can be used for this:<\/p>\n<pre>for \/R \"c:\\obj_file_location\" %n in (*.obj) do \"C:\\Program Files (x86)\\Microsoft Visual Studio 10.0\\VC\\bin\\DumpBin.exe\" \/headers \"%n\" &gt;&gt; c:\\comdat_dump.txt<\/pre>\n<p>This will generate a concatenated dump of all the headers in all the .obj files in <em>c:\\obj_file_location<\/em>. \u00a0When running the command from a batch file rather than an interactive prompt, write <em>%%n<\/em> in place of <em>%n<\/em>. \u00a0Beware: for large applications this could produce a multi-gigabyte file.<\/p>\n<h4>PDB or EXE<\/h4>\n<p>SymbolSort supports reading debug symbol information from .exe files and .pdb files. \u00a0The .exe file will only be used to find the location of its matching .pdb file, and then the symbols will be extracted from the PDB. \u00a0SymbolSort uses msdia100.dll to extract data from the PDB file. \u00a0Msdia100.dll is included with the Microsoft compiler toolchain. \u00a0In order to use it you will probably have to register the dll.<\/p>\n<pre>regsvr32 \"C:\\Program Files\\Common Files\\Microsoft Shared\\VC\\msdia100.dll\"<\/pre>\n<p>It is important that you register the 64-bit version of msdia100.dll on 64-bit Windows and the 32-bit version on 32-bit Windows. \u00a0If you don&#8217;t find msdia100.dll in the path listed above, try looking for it in the Visual Studio install directory under &#8220;\\Microsoft Visual Studio 10.0\\DIA SDK\\bin\\&#8221;.<\/p>\n<h4>NM dump<\/h4>\n<p>Similar to the COMDAT dump, SymbolSort can accept symbol dumps from the Unix utility nm. \u00a0The symbols can be extracted from .obj files or entire .elf files. \u00a0SymbolSort supports bsd and sysv format dumps. \u00a0Sysv is preferred because it contains more information. 
\u00a0The recommended nm command lines are:<\/p>\n<pre>nm --format=sysv --demangle --line-numbers input_file.elf\r\nnm --format=bsd --demangle --line-numbers --print-size input_file.elf<\/pre>\n<h3><span style=\"text-decoration: underline;\">DOWNLOAD<\/span><\/h3>\n<p><span style=\"text-decoration: underline;\"><a href=\"http:\/\/gameangst.com\/wp-content\/uploads\/2013\/02\/SymbolSort-1.2.zip\">SymbolSort-1.2.zip<\/a><\/span><\/p>\n<h3><span style=\"text-decoration: underline;\">BUILDING<\/span><\/h3>\n<p>The source for SymbolSort is distributed as a single file, SymbolSort.cs. \u00a0It can be built as a simple C# command line utility. \u00a0In order to get the msdia100 interop to work, you must add msdia100.dll as a reference to the C# project. \u00a0That is done either by dragging and dropping the dll onto the references folder in the C# project or by right-clicking the references folder, selecting &#8220;Add Reference&#8221;, and then browsing for msdia100.dll.<\/p>\n<h3><span style=\"text-decoration: underline;\">REVISION HISTORY<\/span><\/h3>\n<pre>1.2    + Upgraded to Visual Studio 2010 \/ msdia100.dll\r\n       + Added -path_replace option to convert paths stored in PDBs.\r\n       + Added -complete option to dump a full list of all symbols sorted by \r\n         address.\r\n       + Added several options for controlling what symbols are included in PDB\r\n         dumps since PDBs often list the same address redundantly under\r\n         different labels.\r\n1.1    + Added support for computing differences between multiple input sources\r\n       + Added support for nm output for PS3 \/ unix platforms.\r\n       + Changed command line parameters.  
See usage for details.\r\n       + Added section \/ type information to output.\r\n1.0    + First release!<\/pre>\n<h3><span style=\"text-decoration: underline;\">FUTURE WORK<\/span> (to be done by someone else!)<\/h3>\n<ul>\n<li>Add a GUI frontend to allow interactive filtering and sorting.<\/li>\n<li>Read both the PDB and the COMDAT dump simultaneously and cross-reference the two. \u00a0This would enable new kinds of analysis and richer dumps.<\/li>\n<li>Produce additional merged symbol reports by merging all symbols from the same class or namespace or that match based on some more clever fuzzy comparison.<\/li>\n<li>Improve relative -&gt; absolute path conversion for nm inputs<\/li>\n<li>Figure out how to extract string literal information from PDB.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>OVERVIEW SymbolSort is a utility for analyzing code bloat in C++ applications. \u00a0It works by\u00a0extracting the symbols from a dump generated by the Microsoft DumpBin utility or by\u00a0reading a PDB file. \u00a0It processes the symbols it extracts and generates lists sorted\u00a0by a number of different criteria. 
\u00a0You can read more about the motivation behind SymbolSort [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[8,14,7],"_links":{"self":[{"href":"http:\/\/gameangst.com\/index.php?rest_route=\/wp\/v2\/posts\/320"}],"collection":[{"href":"http:\/\/gameangst.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/gameangst.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/gameangst.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/gameangst.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=320"}],"version-history":[{"count":10,"href":"http:\/\/gameangst.com\/index.php?rest_route=\/wp\/v2\/posts\/320\/revisions"}],"predecessor-version":[{"id":322,"href":"http:\/\/gameangst.com\/index.php?rest_route=\/wp\/v2\/posts\/320\/revisions\/322"}],"wp:attachment":[{"href":"http:\/\/gameangst.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=320"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/gameangst.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=320"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/gameangst.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=320"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}