Building C++ projects faster

One of the common complaints about C++, especially in very large projects, is that it can take forever to compile. In a huge project with tens of thousands of source files and millions of lines of code, a small change to a single file might well mean an hour-long build.

However, this doesn't necessarily have to be so!

One of the major problems with C++ is that it uses the admittedly rather antiquated system of header files to declare public types and functions. (In principle this could be completely automated without header files, although that would probably require some backwards-incompatible changes to how compilers approach the compilation of C++ code.)

Every time a header or source file #includes another, it creates a dependency: if that other file changes, or any file it includes further down the chain changes, this file has to be recompiled. However, there are many techniques that can drastically reduce the number of dependencies between source files. Many programmers are too lazy to use them, or are even unaware that they exist.

Here are some of these techniques and tips, which may help drastically reduce the number of source file dependencies in your project, and drastically increase build speed.

Understanding dependency propagation

It's easy to understand why an #include dependency may "propagate" itself throughout a project, sometimes through long chains of nested inclusions. (In other words, header A is included in header B, which is itself included in header C, which is included in header D, and so on, so that a change in header A requires recompiling everything that includes any of those headers.) It can, however, be hard to realize just how bad this can get.

Careless use of #include lines in header files often means that a change in one such file may cause the recompilation of source files that have absolutely nothing to do with that header file, just because of a long chain of #include dependencies.

The more you can break these long chains, and the more you reduce dependencies between header files, the better. If a particular header file is by necessity #included in tons of source files and other header files, because there is no way around it, then minimizing the number of #include lines in this popular header file is of paramount importance. Whatever this popular header file #includes will also indirectly become just as "popular", even if that's neither the intent nor the need.

Avoid "master" header files

It's really common to see in many C++ (and C) projects a "master" header file which does essentially nothing more than #include every single other header file in the project. This may feel like a really convenient way of doing things. After all, rather than have a dozen #include lines in every source file, and going through the trouble of always finding out which headers you need to #include for this particular source file, you can just write one single #include line, and everything is available! Short, clean, practical.

Except that if you do this, now every single source file will have a dependency on every single header file in the project. Which means that if you need to make even the tiniest change to a single header file, the entire project will be recompiled.

While it may feel like extra work, every source file really should only have an #include for the header files it needs and nothing else. This will reduce dependencies and make the project compile faster, as changes in one single header file will not propagate to the entire project.

Prefer forward declarations over #includes

Especially in header files, the fewer #include lines you have, the better. Unnecessary #include lines should be avoided almost religiously, if your intent is to reduce the number of dependencies.

C++ offers several tools for this. Many things can be forward-declared instead of defined in many contexts, and this may avoid the need for an #include line.

It might feel like good design to always #include whatever a class or function takes as a parameter. This way the calling code doesn't need to do it itself, and it's directly available to it. Many programmers do indeed have a design philosophy like "if module X takes objects of module Y as parameter, it should make the latter fully available to the calling code."

In other words, code like this:

// Bad!
#include "Agent.hh"

namespace Utils
{
    void doSomethingWithAgent(const Core::Agent&, Core::Action);
}

In order to avoid an extra dependency, it's often preferable to forward-declare those types instead of just carelessly #including the header file.

// Good!
namespace Core { class Agent; enum class Action: unsigned; }

namespace Utils
{
    void doSomethingWithAgent(const Core::Agent&, Core::Action);
}

Admittedly this can feel like extra work, and in some cases there is the small danger that a change in the actual type (such as the underlying type of a strongly typed enum, which the forward declaration has to repeat) may break things. However, it may still be well worth it, especially if doing this will break a huge dependency chain. (For instance, if this particular header file is, directly or indirectly, included in hundreds of other files, and that "Agent.hh" file is included in only a few files, breaking the dependency in this header may reduce the number of dependencies on "Agent.hh" drastically.)

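Note that forward-declared types work even if the function takes that type by value, as long as the function is only being declared. (Code that defines or calls the function does need the full definition of the type.) So this is ok too: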

namespace Core { class Agent; enum class Action: unsigned; }

namespace Utils
{
    void doSomethingWithAgent(Core::Agent, Core::Action);
}
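
As a rough single-file sketch of how this plays out: the declaration compiles against the forward declarations alone, and the full definitions are only needed where the function is defined or called. The file boundaries are marked with comments. The int return value, the id member and the enumerators are made up for illustration (the function above returns void), and the contents shown for "Agent.hh" are purely hypothetical.

```cpp
// ---- Utils.hh: forward declarations only, no #include "Agent.hh" ----
namespace Core { class Agent; enum class Action: unsigned; }

namespace Utils
{
    // Incomplete types are fine in a function declaration,
    // even when they are taken by value.
    int doSomethingWithAgent(Core::Agent, Core::Action);
}

// ---- Agent.hh (hypothetical contents, for illustration only) ----
namespace Core
{
    class Agent
    {
     public:
        int id = 7;
    };

    // The underlying type must match the forward declaration above.
    enum class Action: unsigned { Idle, Run };
}

// ---- Utils.cc: would #include both headers, so the types are complete ----
namespace Utils
{
    int doSomethingWithAgent(Core::Agent agent, Core::Action action)
    {
        return agent.id + static_cast<int>(action);
    }
}
```

Only the files that actually define or call doSomethingWithAgent() need to #include "Agent.hh". Everything else that merely sees the declaration stays independent of it.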

Consider moving things around

It oftentimes happens that one header file, especially a "popular" one (i.e. a header that's needed in a ton of other files), only needs one particular thing from another header file. If that other header file is #included in the "popular" header, that other header will indirectly also itself become "popular", often needlessly. Breaking this dependency may be important, especially if that other header file is changed often.

Sometimes the dependency can be broken with a forward declaration, as described in the previous section. However, sometimes this can't be done. This is often the case when this header defines a class or struct that has another class/struct from that other header as a member variable. For example:

#include "Agent.hh"

namespace Core
{
    class SomeClass
    {
        Core::Agent mAgent;
        ...
    };
}

In this case there is often little that can be done. In order to use one class as a member variable of another, we need the full definition of that class. A declaration does not suffice. Replicating class definitions is usually a rather bad idea, so it's not a viable strategy.

However, oftentimes what is needed from that other header file is not the full class but, e.g., something from within it, or something else that the header is defining. For example, perhaps that "Agent.hh" has code like this:

// Agent.hh
namespace Core
{
    class Agent
    {
     public:
         class Listener { ... };
         enum class Action { ... };
         ...
    };
}

And what we actually need in our other header file are those inner types, like:

#include "Agent.hh"

namespace Core
{
    class SomeClass: public Core::Agent::Listener
    {
        Core::Agent::Action mAgentAction;
        ...
    };
}

Inner classes and types are a great feature of C++. They increase modularity by keeping types within their relevant scopes, reducing name pollution, and keeping things in a logical hierarchical structure.

Unfortunately the problem with inner types is that they can't be forward-declared without the definition of the enclosing class. Currently the language simply does not support this. Thus if you need to use an inner type, you need the full definition of the class, which in practice means you need to #include the header where that class is defined. (Again: Replicating class definitions is not a viable strategy in practice.)

However, if this header file is a very "popular" one, and this #include is causing a ton of dependencies on that "Agent.hh" file, and we would like to get rid of that dependency (e.g. because "Agent.hh" is being modified quite a lot), it may be worth considering how much it would break the design and modularity of the program if those types were moved out of the class, to a separate header file (which is not modified so often).

Perhaps you could create something like an "Interfaces.hh" file where you collect all these "Listener" interface classes, and rename Core::Agent::Listener to something like Core::AgentListener, and perhaps you could move that inner Action enumeration out of the class, so that it can be forward-declared. This way you can remove the #include "Agent.hh" line from this header file.

How much of an impact this kind of change may have on the design of the program is ultimately a question of perspective. It is, however, something to consider, if breaking the #include dependency is important.
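
A minimal sketch of what such a rearrangement might look like, shown as a single file with the file boundaries marked in comments. The names Core::AgentAction, agentActionPerformed() and lastAction(), as well as the enumerators, are hypothetical and only there to make the example complete.

```cpp
// ---- Interfaces.hh (hypothetical): small and rarely modified ----
namespace Core
{
    // Formerly Core::Agent::Action. As a namespace-level enum with a fixed
    // underlying type it can also be forward-declared in other headers.
    enum class AgentAction: unsigned { Idle, Move, Attack };

    // Formerly Core::Agent::Listener.
    class AgentListener
    {
     public:
        virtual ~AgentListener() = default;
        virtual void agentActionPerformed(AgentAction) = 0;
    };
}

// ---- SomeClass.hh: would #include "Interfaces.hh" instead of "Agent.hh" ----
namespace Core
{
    class SomeClass: public AgentListener
    {
     public:
        void agentActionPerformed(AgentAction action) override
        {
            mAgentAction = action;
        }

        AgentAction lastAction() const { return mAgentAction; }

     private:
        AgentAction mAgentAction = AgentAction::Idle;
    };
}
```

With this arrangement a change to "Agent.hh" no longer forces a recompilation of everything that includes the header of SomeClass. Only changes to the (hopefully much more stable) "Interfaces.hh" do.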

Use tools to discover dependency chains

Even in moderately sized projects, not to mention enormous ones, it can be very hard to find out which files depend on which other files, and especially what the dependency chain is. Many compilers are able to list dependencies, but they only make "raw" lists of the kind "file X depends on files A, B, C and D". This is kind of the reverse of what we want. We are not so much interested in which files X depends on, but rather in which files depend on X (i.e. which files will be recompiled if we modify X).

Also, it would be useful to not just get a "raw" list of files that depend on X, but to see which files depend on it directly (by having a direct #include "X"), and which files depend on it indirectly (i.e. files that do not have a direct #include "X", but have other #includes that themselves #include "X", perhaps through a longer chain of indirections). This would be best visualized in a tree-like structure, which would help in discovering problematic dependencies, and where best to break them.

Unfortunately, there seem to be no such tools out there, at least none that I have found (and I have searched quite a lot). The only solution seems to be to implement one yourself.

I implemented such a tool for my work, and it has helped me quite a lot in reducing dependencies, and has become a really essential tool for optimizing project building times.

As an example, here is the output of the program from an actual project. I ran it giving it the "CoreEngine/LevelGeometry.hh" file name as parameter. It lists source files that depend on that particular file, directly or indirectly, in a tree-like structure:

CoreEngine/LevelGeometry.hh
`-> CoreEngine/LevelDefinitions.hh
|   `-> CoreEngine/Field.cc
|   `-> CoreEngine/GameData.cc
`-> CoreEngine/Types.hh
|   `-> ControllerSceneBase.cc
|   `-> CoreEngine/Portal.hh
|   |   `-> CoreEngine/Field.hh
|   |   |   `-> GameSceneBase.cc
|   |   `-> CoreEngine/Portal.cc
|   |   `-> CoreEngine/PortalsInitElements.cc
|   |   `-> DebugMenu.cc
|   `-> CoreEngine/MultiplayerLevelData.hh
|   |   `-> Utils/LevelThumbnailSprites.cc
|   `-> CoreEngine/PathFinding.hh
|   |   `-> CoreEngine/HomingProjectile.cc
|   |   `-> CoreEngine/PathFinding.cc
|   |   `-> CoreEngine/Enemy.cc
|   `-> CoreEngine/PowerupNode.hh
|   |   `-> CoreEngine/PowerupNode.cc
|   `-> CoreEngine/PowerupsManager.hh
|   |   `-> CoreEngine/PowerupsManager.cc
|   `-> CoreEngine/Projectile.hh
|   |   `-> CoreEngine/Projectile.cc
|   `-> CoreEngine/Agent.hh
|   |   `-> CoreEngine/HomingProjectile.hh
|   |   `-> CoreEngine/Enemy.hh
|   |   `-> CoreEngine/Agent.cc
|   |   `-> MultiplayerResultsNode.cc
|   |   `-> MultiplayerSetupScene.cc
|   |   `-> testing.cc
|   `-> ICom/ServerLogic.cc
|   `-> Utils/Animations.hh
|   |   `-> MultiplayerGameGUI.cc
|   |   `-> SinglePlayerGameGUI.cc
|   |   `-> Utils/Animations_gates.cc
|   |   `-> Utils/Animations_misc.cc
|   |   `-> Utils/Animations_objects.cc
|   |   `-> Utils/Animations_powerups.cc
|   |   `-> Utils/Animations_enemiess.cc
|   |   `-> Utils/Animations_agent.cc
|   `-> Utils/Utils.hh
|       `-> CoreEngine/LevelData.cc
|       `-> GameViewController.cc
|       `-> HighScoresMenu.cc
|       `-> ICom/RemoteModeScene.cc
|       `-> DeviceControlNode.cc
|       `-> InstructionsMenu.cc
|       `-> MainMenuScene.cc
|       `-> MenuSceneBase.cc
|       `-> MultiplayerGameScene.cc
|       `-> MultiplayerLevelSelectScene.cc
|       `-> PauseMenu.cc
|       `-> ResultsNode.cc
|       `-> SceneBase.cc
|       `-> SettingsMenu.cc
|       `-> SinglePlayerGameOverNode.cc
|       `-> SinglePlayerGameScene.cc
|       `-> SinglePlayerLevelResultsNode.cc
|       `-> Utils/Animations_common.cc
|       `-> Utils/PulseEffectSpriteNode.cc
|       `-> Utils/Shaders.cc
|       `-> Utils/Sprites.cc
|       `-> Utils/Utils.cc
`-> CoreEngine/WallSprites.cc
Total dependencies: 64

As you can see, "LevelGeometry.hh" causes a huge number of dependencies. Whenever it's modified, all 64 of the files listed above are affected, and the 50 of them that are .cc source files have to be recompiled, even though the vast majority of them have absolutely nothing to do with it. And the problem in this particular case is that "LevelGeometry.hh" is modified quite often during the development of the project.

But this tree-like structure of dependencies is extremely informative. As you can see, only three files are directly #including "LevelGeometry.hh", and the vast majority of other dependencies are indirect. What this reveals in this case is that there is one particular file that is causing the enormous number of dependencies: "CoreEngine/Types.hh". Over 90% of the dependencies on "LevelGeometry.hh" are caused by it.

Thus breaking the dependency in "CoreEngine/Types.hh" would reduce the dependency chain quite drastically. I was able to do exactly that, and the end result was:

CoreEngine/LevelGeometry.hh
`-> CoreEngine/LevelDefinitions.hh
|   `-> CoreEngine/Field.cc
|   `-> CoreEngine/GameData.cc
`-> CoreEngine/WallSprites.cc
Total dependencies: 4

Now the project builds significantly faster after modifying the LevelGeometry.hh file, because only three source files need to be recompiled instead of the 50 listed in the first tree.
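
For what it's worth, the core of this kind of tool can be sketched in fairly little code. The following is not the tool that produced the output above, just a minimal illustration of the idea. It assumes that the direct #includes of each file have already been collected into a map (for instance by scanning every file for #include lines, which is left out here). The names IncludeMap and dependencyTree are made up for this sketch.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// file -> files it #includes directly (assumed to be collected beforehand)
using IncludeMap = std::map<std::string, std::vector<std::string>>;

namespace
{
    using ReverseMap = std::map<std::string, std::set<std::string>>;

    void printDependents(const ReverseMap& includedBy, const std::string& file,
                         const std::string& indent,
                         std::set<std::string>& seen,
                         std::ostringstream& out, unsigned& total)
    {
        const auto iter = includedBy.find(file);
        if(iter == includedBy.end()) return;

        // Keep only dependents not listed yet, so that each file is shown
        // once and cyclic includes cannot cause endless recursion.
        std::vector<std::string> fresh;
        for(const std::string& dependent: iter->second)
            if(seen.insert(dependent).second) fresh.push_back(dependent);

        for(std::size_t i = 0; i < fresh.size(); ++i)
        {
            out << indent << "`-> " << fresh[i] << "\n";
            ++total;
            const bool isLast = (i + 1 == fresh.size());
            printDependents(includedBy, fresh[i],
                            indent + (isLast ? "    " : "|   "),
                            seen, out, total);
        }
    }
}

// Returns a tree of all files that depend on 'root', directly or indirectly.
std::string dependencyTree(const IncludeMap& includes, const std::string& root)
{
    // Reverse the relation: header -> files that #include it directly.
    ReverseMap includedBy;
    for(const auto& entry: includes)
        for(const std::string& header: entry.second)
            includedBy[header].insert(entry.first);

    std::ostringstream out;
    std::set<std::string> seen{root};
    unsigned total = 0;
    out << root << "\n";
    printDependents(includedBy, root, "", seen, out, total);
    out << "Total dependencies: " << total << "\n";
    return out.str();
}
```

The essential step is reversing the "X includes Y" relation into "Y is included by X" and then walking it recursively from the file of interest. Because each file is listed only once, a file that depends on the root through several routes appears under the first route encountered.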

Regularly "clean up" source files from extraneous #includes

It's really common that during the development of a large project #include lines are added to source files that much later, after a lot of further development of those files, become obsolete and unnecessary. It is a good idea to "clean up" source files from unneeded #include lines from time to time, thus reducing needless dependencies.

Unfortunately this can be quite a laborious task. This is something that could benefit from automation but, also quite unfortunately, there seem to be even fewer existing tools for this than for just discovering include dependencies. (This is also something that's a lot harder to implement, because it would essentially need an entire C++ source code parser.)

However, even without an automation tool, it may be worth manually cleaning up source files from time to time.

Consider using the PImpl idiom

The so-called "PImpl" idiom (short for "pointer to implementation", also read as "private implementation") refers to the technique where a class, instead of having everything as direct member variables, has just a pointer to a dynamically allocated structure which contains those variables. That structure can be forward-declared to avoid dependencies in the header file of the class. In other words, it would look something like:

class SomeClass
{
 public:
    ...

 private:
    struct Impl;
    Impl* data;
};

The Impl struct would be defined in the source file for this class (rather than in this header file). When instantiated, this class will, in its source file, also dynamically allocate an object of the Impl type, and its destructor will need to delete it. (Note that this requires this class to carefully implement its copy/move constructors and assignment operators, or to disable them.)

If this class uses other classes or types from other header files, this header can often avoid including those other headers completely, by using this technique.

The drawback is that this is considerably more complicated to implement (care must be taken to implement the copy/move constructors and assignment operators, as they are often not completely trivial, and mistakes are easy to make with them), and it introduces an extraneous dynamic memory allocation.

If this class is not instantiated very often, then the extra dynamic memory allocation is mostly inconsequential. However, if this class needs to be instantiated and copied around a lot, then the allocation may introduce extra overhead. How much of an impact this may have on the program needs to be considered on a case-by-case basis.
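
To make this more concrete, here is a minimal sketch of a complete PImpl class, again as a single file with the file boundaries marked in comments. Note that it swaps the raw pointer shown above for a std::unique_ptr, which is one common way of handling the deallocation, and that the addValue() and sum() functions and the contents of Impl are invented for the example.

```cpp
// ---- SomeClass.hh: needs <memory> only, not the headers Impl relies on ----
#include <memory>

class SomeClass
{
 public:
    SomeClass();
    ~SomeClass();  // Declared here, defined where Impl is a complete type
    SomeClass(const SomeClass&);
    SomeClass& operator=(const SomeClass&);
    SomeClass(SomeClass&&) noexcept;
    SomeClass& operator=(SomeClass&&) noexcept;

    void addValue(int);
    int sum() const;

 private:
    struct Impl;
    std::unique_ptr<Impl> data;
};

// ---- SomeClass.cc: the only place that sees Impl and its dependencies ----
#include <numeric>
#include <vector>

struct SomeClass::Impl
{
    std::vector<int> values;
};

SomeClass::SomeClass(): data(new Impl) {}
SomeClass::~SomeClass() = default;

// Deep copy. A moved-from object has a null pointer, which is copied as null.
SomeClass::SomeClass(const SomeClass& rhs):
    data(rhs.data ? new Impl(*rhs.data) : nullptr) {}

// Copy-and-swap: make the copy first, then take over its Impl.
SomeClass& SomeClass::operator=(const SomeClass& rhs)
{
    SomeClass copy(rhs);
    data.swap(copy.data);
    return *this;
}

SomeClass::SomeClass(SomeClass&&) noexcept = default;
SomeClass& SomeClass::operator=(SomeClass&&) noexcept = default;

// These assume that the object has not been moved from.
void SomeClass::addValue(int value) { data->values.push_back(value); }

int SomeClass::sum() const
{
    return std::accumulate(data->values.begin(), data->values.end(), 0);
}
```

In this sketch the header never mentions std::vector, so files that include it do not depend on <vector>, or on whatever project headers a real Impl would need. The destructor has to be defined in the source file, because std::unique_ptr needs the complete Impl type at the point where it deletes the object.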

Consider using statically linked libraries

Sometimes you just have to clean the project (in other words, make the build system, e.g. your IDE, remove all object files, recalculate all dependencies and recompile everything). Maybe you need to get rid of extraneous data files that aren't used anymore; maybe your IDE is acting up and is not calculating dependencies properly. Whatever the reason, the easiest solution is simply to clean the project and rebuild.

In this case no include dependency chain optimization helps, obviously. You are telling the IDE to recompile everything, and that it will do. And with huge projects this can take a long time.

But this doesn't need to be so either. There are situations where even with a full clean and rebuild you don't actually need to recompile everything.

Very large projects quite often have large modules, even groups of large modules, that are completely independent of everything else. They are in essence independent libraries within the project. They are in practice independent "sub-projects" within the overall project. They only depend on their own files and nothing else.

Quite often they could literally be kept as their own projects, and thus if you need to clean and rebuild the main project, those sub-projects wouldn't need to be rebuilt.

There is a technique that can be used to achieve this: Statically linked libraries. Instead of just adding the source files of that independent large module in the main project, create a separate project from them, and make the project compile a statically linked library from them. Said library can then be added to the main project. The project cleanup process of the main project can be configured so that it won't remove these statically linked library files. Thus they won't get recompiled if the main project needs to be.

Many IDEs support this kind of thing quite directly. In other words, you can have "sub-projects" in your main project, which are compiled independently into statically linkable libraries. (The IDE will automatically create a dependency on the library so that the relevant parts of the main project will be compiled or re-linked, if the library changes.) This should be used to its full extent, as it will help greatly reduce dependencies and compilation times even in situations where a full project cleanup is required.