This article is a continuation of Minimizing Code Bloat for Faster Builds and Smaller Executables.
When a function is inlined, its code is, from the compiler's perspective, replicated to every place where the function is called. For very simple functions the inlined code can be smaller than the out-of-line function call, and the result is a net reduction in code size. For most functions, however, the function code is larger than the function call and inlining will increase executable size.
Code bloat from excessive inlining usually isn’t a problem if you’re smart enough to trust your compiler. Compilers apply a set of heuristics to decide whether the performance benefits of inlining a function justify the costs. If the cost isn’t justified, the compiler won’t inline. Additionally, most compilers err on the side of not inlining. Unless you’ve overridden the default compiler behavior with a __forceinline directive, you can safely assume that any code bloat that comes from inlining is justified by improved performance.
Still, you can never be too careful, so it’s worth checking to see how much inlining is costing you.
One technique for measuring the cost of inlining is to compile your engine twice, once with inlining enabled and once with it disabled, and compare the sizes of the resulting executables. If the executable without inlining is much smaller, you're probably inlining too much. Even if your executable isn't significantly smaller with inlining disabled, you may still be suffering from excessive inlining. In some cases inlining a function reduces code size because the inlined code opens up optimization opportunities that were otherwise invisible to the compiler. Code size reductions from well-inlined functions can offset code size increases from poorly inlined functions, making quick executable size comparisons unreliable.
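With MSVC, inline expansion is controlled by the /Ob flags (/Ob0 disables it; /Ob2, which /O2 implies, allows inlining of any suitable function), so one way to set up the comparison looks like this. File names are placeholders:

```shell
# MSVC: build once with inlining and once without, then compare sizes.
cl /nologo /O2 /Ob2 game.cpp /Fe:game_inline.exe
cl /nologo /O2 /Ob0 game.cpp /Fe:game_noinline.exe
dir game_inline.exe game_noinline.exe

# GCC/Clang equivalent: -fno-inline disables inlining at any -O level.
g++ -O2 -fno-inline game.cpp -o game_noinline
```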
A better way to identify excessive inlining is to use DumpBin on the .obj files of your application. Even with inlining enabled, inline functions will be instantiated into every object file in which they are used. All those duplicate instantiations are merged or removed at link time, but if you run DumpBin on the .lib or .obj files you can measure how many object files the inlined functions are being replicated into. This is similar to the technique I proposed to measure the compile-time cost of template overspecialization. Sort the database of symbols extracted from DumpBin by symbol_size * symbol_count, where symbol_count is the number of object files in which a particular symbol appears, and skim through the top 100. This isn’t a perfect measure of code bloat from function inlining, but it can usually catch the worst offenders.
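The ranking step can be sketched in a few lines of Python. This assumes you have already parsed DumpBin's output into (object file, symbol, size) records; the parsing itself is platform-specific and omitted here, and the symbol names and sizes below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical records extracted from running DumpBin over each .obj:
# (object_file, symbol_name, symbol_size_in_bytes).
records = [
    ("a.obj", "Vec3::dot",      48),
    ("b.obj", "Vec3::dot",      48),
    ("c.obj", "Vec3::dot",      48),
    ("a.obj", "Matrix4::mul",  512),
    ("b.obj", "Matrix4::mul",  512),
    ("c.obj", "log_error",    2048),
]

count = defaultdict(int)   # number of object files each symbol appears in
size = {}                  # size of one instantiation of the symbol

for _obj, sym, sz in records:
    count[sym] += 1
    size[sym] = sz

# Rank symbols by total replicated bytes: symbol_size * symbol_count.
worst = sorted(size, key=lambda s: size[s] * count[s], reverse=True)

for sym in worst[:100]:
    print(f"{sym}: {size[sym]} bytes x {count[sym]} files"
          f" = {size[sym] * count[sym]}")
```

Skimming the top of this list is what catches the worst offenders; the absolute numbers matter less than the relative ranking.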
You can improve the report I described above by discarding symbols with sizes below some threshold. These will be functions whose inlined code is smaller than an out-of-line function call and are consequently not a factor in code bloat. Likewise, if you haven’t been abusing the __forceinline directive, you can discard symbols with sizes above some threshold. Very large functions will never automatically be inlined by the compiler, so they don’t increase the size of the executable. They do, however, contribute unfavorably to compile and link times, but I’ll talk more about that later.
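Both cutoffs are a one-line filter on the same data. The thresholds below are illustrative assumptions, not measured values; you would tune them to your platform's call-sequence size and your compiler's inlining limits.

```python
# Illustrative thresholds (assumptions, tune for your platform):
MIN_SIZE = 16     # roughly the size of an out-of-line call sequence
MAX_SIZE = 4096   # bodies this large won't be auto-inlined anyway

# Hypothetical symbol -> size map produced by the earlier analysis.
symbols = {"Vec3::dot": 48, "get_id": 8, "ParseScene::run": 9000}

suspects = {s: sz for s, sz in symbols.items()
            if MIN_SIZE <= sz <= MAX_SIZE}
print(suspects)  # only Vec3::dot survives both cutoffs
```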
One last thing: be sure to review the source of any function you're considering deinlining. Consider carefully how the compiler sees the function, both in isolation and in the context in which it is inlined. Are the function's parameter values typically known at compile time at the call site? Is the function executed many times per frame relative to the number of call sites in the source? If the answer to either question is yes, the function may be a good candidate for inlining regardless of what the symbol analysis says.
Next I’ll be looking at static allocations.