This article is a continuation of Minimizing Code Bloat for Faster Builds and Smaller Executables.
Earlier I talked about excessive inlining as a common cause of code bloat. Excessive inlining is what happens when code that shouldn’t be inlined gets inlined anyway. Today I’m going to look at a related problem: code that is declared inline but never will be inlined.
Any function whose address is taken for storage in a function pointer or a virtual function table can never be fully inlined, because an out-of-line copy has to exist for that pointer to refer to. Even for a function that can legally be inlined, the compiler has ultimate authority over whether or not it will be. If a function is large, the compiler will generally forgo inlining it because the performance benefit of removing the function call overhead doesn’t justify the increase in executable size. That’s all well and good, but it still poses a problem.
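As a minimal sketch of the address-taken case (the names `square`, `IntFn`, `pickOperation`, `Shape`, `Triangle`, and `countSides` are invented for illustration): every function below is inline as far as the language is concerned, yet `square` and `sides` each need a real out-of-line body so that a function pointer or a vtable slot has something to point at.

```cpp
// Hypothetical header contents; none of these names come from a real codebase.

// Declared inline, and direct calls to it may well be inlined.
inline int square(int x) { return x * x; }

typedef int (*IntFn)(int);

// Taking the address means an out-of-line copy of square() generally has to
// exist, because whoever receives the pointer only sees an address.
inline IntFn pickOperation() { return &square; }

class Shape {
public:
    virtual ~Shape() {}
    // Defined inside the class, so implicitly inline, but its address is
    // stored in Shape's virtual function table.
    virtual int sides() const { return 0; }
};

class Triangle : public Shape {
public:
    virtual int sides() const { return 3; }
};

// A call through a base reference generally goes through the vtable.
inline int countSides(const Shape& shape) { return shape.sides(); }
```

Direct calls such as `square(4)` can still be inlined where the compiler sees fit; it is the pointer and vtable uses that force a standalone copy.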
C++ compilers work by dividing programs into something called translation units. A translation unit is a single cpp file and all the headers it includes. The compiler turns these translation units into object files and ultimately the linker combines the contents of all the object files into an executable. The important thing to note is that every function visible to the compiler in a translation unit must be compiled as part of that unit. In the case of a function that is declared inline, the compiler will generate a compiled instantiation of the function in every cpp file that uses it. The various redundant instantiations are marked as weak symbols so the linker knows that multiple copies are expected and that it can pick any one of them for use in the final executable.
Knowing this, you can see how wasteful it is to mark functions inline incorrectly. Large functions are less likely to be inlined and more likely to take a long time to compile, and because they are recompiled in every translation unit that uses them, they can be murder on your build times.
By this point you should have a pretty good idea of how to detect incorrect inlining. The same technique I proposed for measuring excessive inlining and the compile-time cost of template overspecialization works equally well for detecting incorrect inlining. Dump the symbols from all your object files with DumpBin /headers and look for large functions with lots of redundant instantiations.
One of the most interesting (and insidious!) sources of incorrect inlining is code that you don’t write at all. Consider this apparently harmless class definition:
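The listing below is a hypothetical stand-in for the kind of class meant here: a handful of nontrivial data members and no user-written functions at all. The names `Vertex` and `Model` and the choice of members are invented for illustration.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical example; the point is only that no functions are declared.
struct Vertex {
    float x, y, z;
};

class Model {
public:
    std::string name;
    std::vector<Vertex> vertices;
    std::shared_ptr<std::vector<int> > indices;
};
```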
This class is potentially a major contributor to code bloat from incorrect inlining. If you’re scratching your head and wondering how a class with no functions can contribute to code bloat, take a look at the next listing to see this class the way the compiler sees it.
// Not legal C++. This is for illustration purposes only.
In C++ every class has a constructor, a destructor, a copy constructor, and a copy assignment operator. If you don’t provide these functions the compiler will, and it must do so inline. Of course, good compilers will only instantiate inline functions when they are actually used, so most classes never have their default copy constructor or assignment operator generated, but unless you’re doing something dangerously clever in your codebase, every class has a destructor instantiated at least once.
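To make that concrete, here is a sketch in legal C++ that writes out by hand roughly what the compiler supplies for a class with two nontrivial members. The class `ExplicitModel` and its members are hypothetical. Since C++11 a compiler may also generate a move constructor and a move assignment operator, which this sketch leaves out.

```cpp
#include <string>
#include <vector>

class ExplicitModel {
public:
    // Default constructor: default-constructs each member in declaration order.
    inline ExplicitModel() : name(), lods() {}

    // Destructor: the body is empty, but the members are still destroyed
    // automatically (in reverse order) when it runs.
    inline ~ExplicitModel() {}

    // Copy constructor: memberwise copy.
    inline ExplicitModel(const ExplicitModel& other)
        : name(other.name), lods(other.lods) {}

    // Copy assignment operator: memberwise assignment.
    inline ExplicitModel& operator=(const ExplicitModel& other) {
        name = other.name;
        lods = other.lods;
        return *this;
    }

    std::string name;
    std::vector<float> lods;
};
```

Each of these small bodies pulls in the corresponding `std::string` and `std::vector` operations, which is why a "free" destructor can turn out to be a surprisingly large function.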
If you notice default (also sometimes called automatic) methods showing up in the top slots of your redundant function hotlist, the fix is counterintuitive but also really simple. Just provide an empty, non-inline implementation in place of the default one. (Empty really is enough for a destructor or default constructor, since the members are still constructed and destroyed automatically; an out-of-line copy constructor or assignment operator has to copy the members itself.) Providing non-inline implementations of default methods for nontrivial classes can sometimes help your build times in other ways. If you use a lot of smart pointers in your code, the default destructor of a class holding smart pointers may be the only thing preventing you from replacing the #includes of the headers for the classes stored in smart pointers with forward declarations of those classes. In other words, inline functions create header dependencies, and inline default methods are no exception.
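A minimal sketch of that fix follows, squeezed into one listing with comments marking where the header and source file would split. The file names and the classes `Model` and `Texture` are hypothetical, and `std::unique_ptr` (C++11) stands in for whatever smart pointer your codebase uses.

```cpp
#include <memory>
#include <string>
#include <utility>

// ---- model.h (hypothetical header) ----
class Texture;  // forward declaration instead of #include "texture.h"

class Model {
public:
    Model();
    ~Model();  // declared here, defined once in model.cpp
    const std::string& textureName() const;

private:
    std::unique_ptr<Texture> texture_;
};

// ---- model.cpp (hypothetical source file) ----
// A real project would #include "texture.h" here; the class is defined in
// place only to keep this sketch self-contained.
class Texture {
public:
    explicit Texture(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }

private:
    std::string name_;
};

Model::Model() : texture_(new Texture("checker")) {}

// Empty and non-inline: texture_ is still destroyed here, but the code for
// that is generated in exactly one translation unit.
Model::~Model() {}

const std::string& Model::textureName() const { return texture_->name(); }
```

Code that only includes the header can create and destroy `Model` objects without ever seeing the definition of `Texture`, because the one place that needs it is the source file.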
Speaking of smart pointers, as great as they are, code bloat is one of their unfortunate costs. As I mentioned above, good compilers only instantiate inline code when it is actually used. Reference counting smart pointers allow for distributed ownership of objects, which can be a powerful and convenient design freedom. However, if responsibility for destroying objects is widely distributed through your code, so is responsibility for instantiating object destructors. An engine with rigorously defined ownership hierarchies will have few inline destructor instantiations whereas an engine with sloppily defined ownership hierarchies will have many. So if you use smart pointers, use them responsibly! Pass by reference and reference count only when necessary. Unnecessary reference counting isn’t just inefficient at run time, it is inefficient at compile time too!
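The sketch below illustrates the point with a deliberately tiny intrusive reference-counting scheme. `RefCounted`, `Ref`, `Mesh`, and both functions are hypothetical and not modeled on any particular library. Any translation unit that destroys a `Ref<Mesh>` has to instantiate `Ref<Mesh>::~Ref`, and with it the code to delete a `Mesh`; a function that just takes a reference does not.

```cpp
// Hypothetical intrusive base class: the object carries its own count.
class RefCounted {
public:
    RefCounted() : refs_(0) {}
    void addRef() { ++refs_; }
    bool release() { return --refs_ == 0; }  // true when the last reference is gone
    int refCount() const { return refs_; }

protected:
    ~RefCounted() {}

private:
    int refs_;
};

// Hypothetical smart pointer for RefCounted objects.
template <class T>
class Ref {
public:
    explicit Ref(T* object) : object_(object) { if (object_) object_->addRef(); }
    Ref(const Ref& other) : object_(other.object_) { if (object_) object_->addRef(); }
    // T's destructor is needed wherever this destructor is instantiated.
    ~Ref() { if (object_ && object_->release()) delete object_; }
    T* operator->() const { return object_; }
    T& operator*() const { return *object_; }

private:
    Ref& operator=(const Ref&);  // assignment left out to keep the sketch short
    T* object_;
};

class Mesh : public RefCounted {
public:
    explicit Mesh(int triangles) : triangleCount(triangles) {}
    int triangleCount;
};

// Borrows the mesh: no reference counting, no Ref<Mesh> destructor here.
int countTriangles(const Mesh& mesh) { return mesh.triangleCount; }

// Shares ownership for the duration of the call: copies and destroys a
// Ref<Mesh>, so this code must be able to destroy a Mesh.
int countTrianglesOwning(Ref<Mesh> mesh) { return mesh->triangleCount; }
```

Both functions return the same answer; the difference is only in what each one forces the compiler to generate alongside it.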
If all this sounds like a major pain in the neck, I suggest you give up now and implement unity builds. They circumvent all these redundant instantiation issues so they’re fantastic at optimizing build times in codebases with poor structure. On the other hand, if you’re still enjoying this trek through compile land, join me next time for a discussion of redundant template instantiation.