Minimizing Code Bloat: Redundant Template Instantiation

This article is a continuation of Minimizing Code Bloat for Faster Builds and Smaller Executables.

The last item on my list of code bloat causes is redundant template instantiation.  Template functions have a lot in common with inline functions.  Like inline functions, template functions are instantiated into the object file of every translation unit in which they are used.  Also like inline functions, template functions generate weak symbols which are merged silently by the linker.  Redundant template instantiations won’t increase your executable size, but they can contribute significantly to your build times.

Consider a commonly used class like std::string.  The std::string class is actually a typedef for an instantiation of the basic_string class template:

typedef basic_string<char, char_traits<char>, allocator<char> > string;

In an engine that uses std::strings, some members like std::string::assign may be referenced in nearly every translation unit.  Even engines that eschew the use of std::string may unknowingly be instantiating a lot of std::string code because it is so pervasive in the standard library.  Looking at a symbol dump from one version of Despair Engine, I see over 1600 instances of dozens of functions from std::string.  That’s right, in every full build we compile most of std::string 1600 times, optimize the code 1600 times, write it into .obj files 1600 times, copy it into .lib files 1600 times, read it into the linker 1600 times, and then throw 1599 identical copies away!  All told I think it is less than 10 megabytes of code and accounts for only a couple percent of our build times, but still, it’s a lot to pay for a wrapper around a char*.

Don’t think that these sorts of problems are confined to the standard library.  Any template class that is widely used can produce a lot of code bloat from redundant instantiations.  The question is, what do you do about it?

One option is to convert template code into non template code.  If you never instantiate basic_string with anything other than a single set of parameters, is it really worth being a template class?  The same question could be asked of a templatized math library.  Does matrix<T> need to be a template class when all you ever use is matrix<float>?

Although removing template code is always an option, it isn’t always a good one.  Template code may be expensive, but it is also really handy.  If you have a templatized math library, chances are that while 99% of the instantiations are matrix<float>, somewhere in your engine or tools you have an instantiation of matrix<double>.  Similarly, amid all those instantiations of basic_string<char>, maybe there’s a basic_string<wchar_t> or two hanging out.

Luckily there is a way to prevent redundant template instantiations without changing the code.  That is through the use of explicit template instantiation declarations.  Explicit instantiation declarations weren’t part of the C++ standard before C++11, but MSVC and gcc supported them as an extension well before that, so they’re a safe bet to use.  What explicit instantiation declarations do is allow you to prevent the compiler from implicitly instantiating a particular template in the current translation unit.  They look just like explicit instantiation definitions, but they’re preceded by the extern keyword.  For example, the explicit instantiation declaration of basic_string<char> looks like this:

extern template class basic_string<char>;

You can add this declaration to a root header in your application and rebuild, and, unless you’ve missed a translation unit, you’ll encounter linker errors for missing basic_string<char> symbols.  Once you’ve removed all the basic_string<char> instantiations from your engine, the next thing to do is to add one back so the linker has all the code it needs to put together a complete executable.  That’s where explicit instantiation definitions come in.

Explicit instantiation definitions are the counterpart to explicit instantiation declarations.  They tell the compiler to fully instantiate a particular template even if it is otherwise unused in the current translation unit.  Here’s an example of how you can use the two concepts together to prevent redundant instantiations of basic_string.

// in string_instantiation.h
#include <string>

#ifdef DS_EXTERN_TEMPLATE_INSTANTIATION
namespace std
{
    extern template class basic_string<char>;
}
#endif

// in string_instantiation.cpp
#ifdef DS_EXTERN_TEMPLATE_INSTANTIATION
#undef DS_EXTERN_TEMPLATE_INSTANTIATION

#include "string_instantiation.h"

namespace std
{
    template class basic_string<char>;
}
#endif // DS_EXTERN_TEMPLATE_INSTANTIATION

The DS_EXTERN_TEMPLATE_INSTANTIATION #define in the above code serves two purposes.  First it allows us to disable explicit template instantiation for compilers that don’t support it, and second it allows us to work around a common bug in compilers that do support it.  Although the standard states that an explicit instantiation definition may follow an explicit instantiation declaration within a translation unit, many compilers don’t like it and won’t allow an explicit instantiation definition to override a preceding explicit instantiation declaration.
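
The same pairing can be applied to a template you own.  What follows is only a minimal sketch under stated assumptions: Matrix2 and the file names in the comments are hypothetical, and everything is collapsed into a single file so it can be compiled on its own.  That means the definition follows the declaration within one translation unit, which the standard permits; a compiler with the bug described above would still need the #ifdef guard.

```cpp
// ---- would live in matrix2.h (hypothetical header) ----
template <typename T>
class Matrix2
{
public:
    Matrix2(T a, T b, T c, T d);
    T Determinant() const;
    T Trace() const;

private:
    T m_[2][2];
};

// Member functions are defined out of line so they are ordinary
// template members rather than implicitly inline functions.
template <typename T>
Matrix2<T>::Matrix2(T a, T b, T c, T d)
{
    m_[0][0] = a; m_[0][1] = b;
    m_[1][0] = c; m_[1][1] = d;
}

template <typename T>
T Matrix2<T>::Determinant() const
{
    return m_[0][0] * m_[1][1] - m_[0][1] * m_[1][0];
}

template <typename T>
T Matrix2<T>::Trace() const
{
    return m_[0][0] + m_[1][1];
}

// Explicit instantiation declaration: translation units that see this
// should not implicitly instantiate the members of Matrix2<float>.
extern template class Matrix2<float>;

// ---- would live in matrix2.cpp (hypothetical source file) ----
// Explicit instantiation definition: the one place where the members
// of Matrix2<float> are actually compiled.
template class Matrix2<float>;
```

Only Matrix2<float> is pinned to a single translation unit here.  A rarer instantiation such as Matrix2<double> is still instantiated implicitly wherever it is used, so the template stays a template.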

There is one dark fact about explicit template instantiation that you should know before applying it in your own codebase.  Although explicit template instantiation effectively turns template code into non template code from the perspective of code bloat, it doesn’t really help much with classes like std::string.  The syntax for out-of-line template class function definitions is so cumbersome that many programmers prefer to implement all their template class members entirely within the class declaration.  That’s what has happened with Microsoft’s STL implementation.  Unfortunately what that does is implicitly make every member function in a template class an inline function.  Of course most of the member functions are so large that they’ll never be inlined, but it doesn’t matter from the perspective of the compiler.

The explicit instantiation declaration of basic_string<char> prevents the compiler from instantiating its members as template functions, but it doesn’t prevent it from instantiating them as inline functions!  The basic_string class goes from a classic example of code bloat due to redundant template instantiation to a classic example of code bloat due to incorrect inlining!  You can’t win!

In your own code, at least, you can fix the incorrect inlining problem right after you fix the redundant instantiation problem.  Just move the template class member functions out of line and they’ll disappear from your symbol dumps.
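
To make the distinction concrete, here is a small sketch contrasting the two styles of member definition.  Accumulator and its member names are hypothetical, invented only for illustration, and the header and source portions are shown in one file so the example compiles by itself.

```cpp
template <typename T>
class Accumulator
{
public:
    // Defined inside the class body, so implicitly inline.  An explicit
    // instantiation declaration does not stop other translation units
    // from instantiating these.
    Accumulator() : total_() {}
    void AddInline(T value) { total_ += value; }

    // Only declared here; the definitions below are out of line.
    void Add(T value);
    T Total() const;

private:
    T total_;
};

// Out-of-line definitions: these are not inline, so the explicit
// instantiation declaration below suppresses their implicit instantiation.
template <typename T>
void Accumulator<T>::Add(T value)
{
    total_ += value;
}

template <typename T>
T Accumulator<T>::Total() const
{
    return total_;
}

// Would go in the header, after the class.
extern template class Accumulator<int>;

// Would go in exactly one source file.
template class Accumulator<int>;
```

With this layout, Add and Total should be compiled only where the explicit instantiation definition appears, while the constructor and AddInline can still be emitted in any translation unit that calls them.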

And that’s everything I know about code bloat in a voluminous six-part series.  Perhaps what I should be writing about is blog bloat.  All that’s left is to post some code.

5 Comments

  1. […] If all this sounds like a major pain in the neck, I suggest you give up now and implement unity builds.  They circumvent all these redundant instantiation issues so they’re fantastic at optimizing build times in codebases with poor structure.  On the other hand, if you’re still enjoying this trek through compile land, join me next time for a discussion of redundant template instantiation. […]

    1. Billy Zelsnack says:

      haha. Unity builds. The solution to optimizing unity builds is to break things back into more files and you’re back to where you started.

  2. Paxton Mason says:

    I really enjoyed this series of posts. Thanks, Adrian!

  3. peirz says:

    Interesting post, thanks! I have a stupid question though: rather than playing with ifdef to work around the bug of “declaration followed by definition”, why not simply avoid including the .h file?

    1. Adrian Stone says:

      It isn’t clear in my example, but usually I’m dealing with classes I own the source for so I put the explicit instantiation declaration in the same header in which the class is declared. That ensures that no one can include the class without seeing the instantiation declaration. However, it also necessitates the ifdef trick in the file containing the instantiation definition.
