Why C++ is vastly superior to C

Introduction

If you search the internet for discussions related to C vs. C++, you will invariably find a lot of people furiously defending C and claiming that it's superior to C++, as well as lots of denigrating claims about C++, such as that its implementation of object-oriented programming is somehow "wrong" and inferior, or that it "teaches bad habits" (whatever those might be).

I have previously written an article where I describe why I personally hate C as a programming language, especially when compared to C++. In this article I'll give concrete examples of why C++ is vastly superior to C on all counts.

The basic scenario

Let's assume that we need a matrix type for some application we are developing. The matrices used by the application tend to be very large (hundreds and even thousands of rows and columns) and they get copied around a lot, but often these copies are never modified. Hence the most efficient implementation of this matrix type is for it to use the copy-on-write idiom. "Copy-on-write" means lazy copying: The matrix data is not copied immediately when the matrix object itself is copied, but only when one of the objects sharing it is modified. This way a copy that is never modified will not perform needless (and memory-wasting) data copying.

The matrix type will also have to support some basic operations, such as adding and multiplying matrices, inverting, etc.

I will not concentrate on the actual implementation of the functions of this matrix type, as that's rather trivial in both C and C++. What I will concentrate on, however, is the usage of such a type in each language.

From the usage point of view, the way the data management of the matrix type is implemented is important. In C++ the implementation is fairly trivial and straightforward: Simply implement a matrix class and its needed member functions, including the constructors, destructor and assignment that are needed for the copy-on-write mechanism (while this requires a slight amount of care to be done properly, it's not very difficult, and the same thing has to be done in the C version as well, so in principle it's the same for both).
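
To make the copy-on-write idea concrete, here is a minimal sketch in C++11. It is not necessarily how the matrix class discussed here would be written: instead of hand-written constructors, destructor and assignment, it leans on std::shared_ptr for the reference counting, so the compiler-generated copy operations are enough. The class layout, the member names (get, set, shares_data_with) and the cow_demo() helper are all made up for illustration, and the use_count() check is only adequate for single-threaded code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical copy-on-write matrix of doubles (all names are illustrative).
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols),
          data_(std::make_shared<std::vector<double>>(rows * cols, 0.0)) {}

    // Read access never copies the buffer.
    double get(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return (*data_)[r * cols_ + c];
    }

    // Write access first makes sure this object owns its buffer alone.
    void set(std::size_t r, std::size_t c, double value) {
        assert(r < rows_ && c < cols_);
        detach();
        (*data_)[r * cols_ + c] = value;
    }

    // True if both objects currently refer to the same buffer.
    bool shares_data_with(const Matrix& other) const {
        return data_ == other.data_;
    }

private:
    // The actual (deferred) copy happens here, and only if the data is shared.
    void detach() {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::vector<double>>(*data_);
    }

    std::size_t rows_, cols_;
    std::shared_ptr<std::vector<double>> data_;
};

// Returns true if copying is lazy and a write affects only the written copy.
bool cow_demo() {
    Matrix a(4, 4);
    Matrix b = a;                                  // cheap: only the handle is copied
    bool shared_before = a.shares_data_with(b);
    b.set(0, 0, 1.0);                              // triggers the real copy
    return shared_before && !a.shares_data_with(b)
        && a.get(0, 0) == 0.0 && b.get(0, 0) == 1.0;
}
```

Calling cow_demo() returns true: the two objects share one buffer until b is written to, after which a still holds its original value.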

The C implementation, however, is not as straightforward because C does not offer a direct paradigm for this. If the C version wishes to match the C++ version in speed and memory usage, it will have to basically simulate the C++ implementation using a struct and functions which handle objects of that struct type (including things like initialization and destruction).

Basic usage

Once we have implemented our matrix type, its usage in C++ is rather simple. For example:

int foo()
{
    // Create two 1000x1000 unit matrices:
    Matrix a(1000, 1000), b(1000, 1000);

    // Do something with a and b here, for example:
    Matrix c = 2*a + b;
    ...

    return some_value;
}

(If you don't like the operator overloading of the Matrix class for whatever strange reason, you could always write the equivalent named functions instead. It's not really the main point here.)

The C equivalent will be slightly more complicated, but still somewhat bearable:

int foo()
{
    Matrix a, b, c;

    // Create two 1000x1000 unit matrices:
    Matrix_init(&a, 1000, 1000);
    Matrix_init(&b, 1000, 1000);

    // Do something with a and b here, for example:
    Matrix_init_with_matrix(&c, &a);
    Matrix_multiply_by_constant(&c, 2);
    Matrix_add(&c, &b);
    ...

 exit_foo:
    Matrix_destroy(&a);
    Matrix_destroy(&b);
    Matrix_destroy(&c);

    return some_value;
}

(The exit_foo label is necessary if the function needs to be exited prematurely from somewhere inside it.)

The C version is necessarily more verbose and error-prone (eg. coding conventions need to be obeyed to avoid memory leaks, something which was not necessary in the C++ version), but it's still not completely bad. Note how in the C++ version we don't need to do anything at all to make sure that memory is not leaked.

The C version is also easy to break accidentally. It would be all too easy to simply write something like:

Matrix c = a;

But doing that is asking for trouble. Even if you wanted c to be an "alias" for a (which is what the above effectively produces), this can easily lead to situations where you access freed memory, or free the same memory block twice. Think, however, how easy it is to do the above by mistake.

Needless to say, C++ does not suffer from this problem, and the above assignment will work correctly and safely.

First hurdle: Error handling

There's a small problem with the C version above, though: It does not handle in any way the real possibility that a memory allocation could fail. This is doubly problematic because in principle any operation done to a matrix object could fail (due to a possibly triggered copy-on-write failing to allocate memory).

What to do if memory allocation fails? A lazy solution would be simply to abort() in that case. However, it may well be that in this application we want to do something cleaner and nicer than that. In the C++ version this isn't really a problem, because we can do, for example, this:

try
{
    foo();
}
catch(std::bad_alloc&)
{
    do_something_nicer_than_abort();
}

Note that if an allocation error happens somewhere inside the foo() function and hence the catch block is triggered, no memory will be leaked: Everything that was allocated inside foo() will be automatically released in the exact same way as when foo() was executed normally without error.
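
The claim about nothing being leaked can be checked with a small sketch. Tracked is a hypothetical stand-in for a resource-owning class such as Matrix; it merely counts how many instances are alive. The thrown std::bad_alloc simulates a failed allocation rather than provoking a real one, and both function names are invented for this example.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-in for Matrix: counts currently existing objects.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    Tracked(const Tracked&) { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Simulates a function that fails half-way through with an allocation error.
void failing_function() {
    Tracked a, b;
    assert(Tracked::live == 2);   // both objects exist at the point of failure
    throw std::bad_alloc();       // stands in for a failed allocation
}

// Returns the number of objects still alive after the error was handled.
int live_after_failure() {
    try {
        failing_function();
    } catch (const std::bad_alloc&) {
        // a and b have already been destroyed during stack unwinding
    }
    return Tracked::live;
}
```

live_after_failure() returns 0: the destructors of a and b run while the exception propagates, without any cleanup code in failing_function() itself.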

Doing this in the C version is way more problematic. The first problem is how to signal a memory allocation error from foo() to the calling code. The return value of foo() might be such that it can't be used to signal this (ie. it may not have any "invalid" value). We could make foo() take as parameter an error code pointer, or we could use a global variable.

The second problem is that we need to check every single operation done to the matrix objects to see if an error happened, so the code would become something like:

int foo()
{
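    // Zero-initialized so that the cleanup at exit_foo is safe on early exit
    // (this assumes Matrix_destroy() accepts a zeroed Matrix).
    Matrix a = {0}, b = {0}, c = {0};

    // Create two 1000x1000 unit matrices: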
    if(!Matrix_init(&a, 1000, 1000)) goto exit_foo;
    if(!Matrix_init(&b, 1000, 1000)) goto exit_foo;

    // Do something with a and b here, for example:
    if(!Matrix_init_with_matrix(&c, &a)) goto exit_foo;
    if(!Matrix_multiply_by_constant(&c, 2)) goto exit_foo;
    if(!Matrix_add(&c, &b)) goto exit_foo;
    ...

 exit_foo:
    Matrix_destroy(&a);
    Matrix_destroy(&b);
    Matrix_destroy(&c);

    return some_value;
}

Some of the checks could be skipped, if the programmer knows for sure that the function cannot fail (eg. successive operations done to the c matrix cannot fail if the first one succeeded, as they don't entail new memory allocations; however, the programmer would have to be completely sure of this at every place). However, having to check the majority of operations for success can be a rather big annoyance.

Of course the lazier C programmer will simply ignore the possibility of such errors and hope it will never cause any problems. In the C++ version, by contrast, it was enough to add one single try/catch block to take care of the situation. Existing code didn't need to be modified.

A trick sometimes used in C is to use the setjmp() and longjmp() standard functions to handle such situations. This of course requires the matrix function implementations to explicitly support it and the foo() function to set it up. While this makes the situation slightly more bearable, it's still a nuisance that most C programmers will skip. The longjmp() trick is also error-prone and requires strict coding conventions to use properly (again something which is completely unnecessary in C++).

Second hurdle: Initialization and destruction baggage

The C version of the matrix type carries with it an inherent baggage: It always has to be initialized and destroyed manually, no matter where it's used, and this requirement is transferred to anything that wants to use it. For example, suppose that you want to create a new type which contains objects of the matrix type, for example:

typedef struct
{
    Matrix mainMatrix, secondaryMatrix;
    int someValue, anotherValue;
} SomeCompoundStruct;

Now every time you want to make instances of SomeCompoundStruct, you need to initialize the two Matrix members properly. Likewise when instances of SomeCompoundStruct are copied around and destroyed, you need to make the equivalent operations to the Matrix members.

Hence you will need to write initialization, destruction and copying functions for SomeCompoundStruct, and always be sure to call them appropriately when you handle instances of that type. And all this by hand. The compiler won't help you with this at all.

In C++ you don't need to do any such thing. You can completely safely have such a struct (or class) with objects of type Matrix as members, instantiate that struct and copy it around as much as you like, and the matrices will always be properly initialized, copied and destroyed automatically. You don't need to write any additional code for this. The Matrix class does not carry any baggage with it which is transferred to whatever wants to use it.
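
A small sketch of this, assuming a hypothetical Tracked class in place of Matrix so that the number of live objects can be observed. SomeCompoundStruct declares no constructor, destructor or assignment of its own; compound_demo() is an invented helper.

```cpp
#include <cassert>

// Hypothetical stand-in for Matrix: counts currently existing objects.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    Tracked(const Tracked&) { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// No special member functions are written here; the compiler generates them
// and they call the corresponding operations of each member.
struct SomeCompoundStruct {
    Tracked mainMatrix, secondaryMatrix;
    int someValue = 0, anotherValue = 0;
};

// Returns how many Tracked members existed while two structs were alive.
int compound_demo() {
    int peak = 0;
    {
        SomeCompoundStruct s1;
        SomeCompoundStruct s2 = s1;   // each member is copied by its own copy constructor
        peak = Tracked::live;
    }
    assert(Tracked::live == 0);       // both structs and all four members are gone
    return peak;
}
```

compound_demo() returns 4: two structs with two members each, all of which are destroyed again when the inner scope ends.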

Third hurdle: Data containers

An array of matrices

Consider the following line of C++ code:

void bar(int amount)
{
    std::vector<Matrix> matrices(amount, Matrix(1000, 1000));
    ...
}

The above line creates an array which contains 1000x1000-sized unit matrices. Since, as specified, Matrix uses copy-on-write, each matrix in the array shares the same matrix data, so there's only one such data block allocated after the array has been created. This can be an enormous memory saver if not all of those matrices are modified.

Another extremely important aspect of the above is that it's completely and absolutely safe: When the matrices object goes out of scope, all of the member matrices will be automatically properly destroyed.
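
The sharing can be observed with a sketch like the following. SharedMatrix is a hypothetical, stripped-down handle that only holds a reference-counted buffer (via std::shared_ptr, which is an assumption and not necessarily how Matrix would be written); owners() and owners_after_fill() are invented for the demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical handle whose copies share one buffer.
class SharedMatrix {
public:
    SharedMatrix(std::size_t rows, std::size_t cols)
        : data_(std::make_shared<std::vector<double>>(rows * cols)) {}

    // Number of SharedMatrix objects currently referring to the same buffer.
    long owners() const { return data_.use_count(); }

private:
    std::shared_ptr<std::vector<double>> data_;
};

// Builds a vector of `amount` matrices and reports how many share the buffer.
long owners_after_fill(std::size_t amount) {
    assert(amount > 0);
    std::vector<SharedMatrix> matrices(amount, SharedMatrix(10, 10));
    return matrices.front().owners();
}
```

owners_after_fill(5) returns 5: the temporary passed to the constructor is gone by then, and all five elements refer to the single buffer it allocated.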

How to do this in C? Naturally this gets a bit more complicated. Like with the SomeCompoundStruct example earlier, you need to write some initialization and destruction functions, along the lines of:

Matrix* createMatrixArray(int size, Matrix* value)
{
    Matrix* array = malloc(sizeof(Matrix) * size);
    for(int i = 0; i < size; ++i)
        Matrix_init_with_matrix(array+i, value);
    return array;
}

void destroyMatrixArray(Matrix* array, int size)
{
    for(int i = 0; i < size; ++i)
        Matrix_destroy(array+i);
    free(array);
}

void bar(int amount)
{
    Matrix* array;
    {
        Matrix tmp;
        Matrix_init(&tmp, 1000, 1000);
        array = createMatrixArray(amount, &tmp);
        Matrix_destroy(&tmp);
    }

    ...

 exit_bar:
    destroyMatrixArray(array, amount);
}

We could, of course, make the above slightly more modular by creating a new struct which acts as an array of matrices (and which contains the size of the array as member, so that it doesn't have to be dragged separately). However, the implementation, initialization, destruction and usage of such struct wouldn't be significantly simpler than the above example code (in fact, it would be more verbose because of the need to declare the struct).

Verbosity is one of the key things that manifests itself in these C examples. The more complicated the situation, the more verbose the code becomes. Basically one line of C++ code requires a dozen lines of C code.

Some C advocates will argue that I'm somehow "cheating" by using std::vector here. It's in no way "cheating" because if C offered a similar utility, I would use it, but it doesn't. The reason C doesn't offer such a utility is that it can't, and that's one of the major problems with the language. And even if one really wanted to create a raw matrix array in C++, it would still be simpler because constructing the array is simpler and then destroying the entire array with all of its member objects can be done with one single "delete[]".
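
For the raw array case, a minimal sketch might look like this. Counted is again a hypothetical stand-in for Matrix, and raw_array_demo() is an invented helper; a real Matrix would need a default constructor for new[] to work this way.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for Matrix: counts currently existing objects.
struct Counted {
    static int live;
    Counted() { ++live; }
    Counted(const Counted&) { ++live; }
    ~Counted() { --live; }
};
int Counted::live = 0;

// Returns how many objects existed while the raw array was alive.
int raw_array_demo(std::size_t amount) {
    Counted* array = new Counted[amount];   // default-constructs every element
    int during = Counted::live;
    delete[] array;                         // destroys every element, then frees the block
    assert(Counted::live == 0);
    return during;
}
```

raw_array_demo(5) returns 5, and no object is left alive afterwards. Unlike std::vector, though, the delete[] still has to be written by hand and is skipped if an exception is thrown before it.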

However, an array was but the simplest of data structures.

A linked list of matrices

Assume that the line was like this instead:

void bar(int amount)
{
    std::list<Matrix> matrices(amount, Matrix(1000, 1000));
    ...
}

Now it gets interesting, and quite complicated in C.

C offers raw arrays as a primitive type. However, C doesn't offer linked lists in any way or form. That's because C cannot offer any rational generic linked list implementation which would work with any user-defined type. The programmer will have to create his own linked list of matrices by hand. The amount of code required for that is quite significant, and the code will be complicated, error-prone and hard to follow.

But it gets even more complicated still.

Nested containers of matrices

Consider this:

void bar()
{
    typedef std::vector<Matrix> MatrixVector;
    typedef std::list<MatrixVector> ListOfMatrixVectors;

    MatrixVector v(100, Matrix(1000, 1000));
    ListOfMatrixVectors l(200, v);
    std::vector<ListOfMatrixVectors> array(300, l);
    ...
}

The array data container above is really complicated. It's an array of linked lists, each such list containing arrays of matrices. In this example a total of 6 million matrices are instantiated, each one sharing the same matrix data.

However, even though the data container is very complicated, the code itself isn't. It's pretty straightforward. A few type aliases are declared to simplify the type definitions, and the object instantiations are quite simple. The underlying structure of data is quite complicated, but it's nicely wrapped inside simple interfaces.

Moreover, the above code is completely safe and efficient. We don't have to do anything in order to make sure that nothing is leaked. Every single data container and object will be nicely deleted when they go out of scope. There is no maintenance required from the part of the programmer at all.

What's even better, all three classes being used above, Matrix, std::vector and std::list are completely independent of each other, don't know anything about each other, and are nicely packaged each in their own separate module. Yet the executable binary produced by the compiler will be as efficient as it can get.

This is the beauty of C++. When modules are properly designed, it makes it extremely easy to take a module and reuse it with something user-defined (such as a user-defined class like Matrix). Note that std::vector and std::list have no concept (in their source code) of what Matrix is, or that it's supposed to use copy-on-write or anything like that. Yet it works, and it works efficiently.

The functionality in the example above can be reproduced in C, but it will be extremely complicated and hard to follow, and very error-prone. Very strict coding conventions need to be followed in order to avoid mistakes (such as leaking memory or accessing uninitialized objects). These coding conventions are completely unnecessary in C++, because the compiler does the majority of the work that would otherwise be up to the programmer in C.

C starts to crumble: Copying containers

Suppose that you have gone through the trouble of replicating the previous example in C. Now I will give you one single line of code which will make everything crumble:

std::vector<ListOfMatrixVectors> array2 = array;

This single line of code creates a copy of the entire data container. All the lists in the vector, and all the vectors in those lists, and all the matrices in those vectors will be copied (the matrices automatically using lazy copying).

This single line of code translates to hundreds of lines of complicated and unsafe C code, which requires minute attention to detail and strict following of coding conventions.

Again, in C++ we need not worry about how this new array2 object will be destroyed. When it goes out of scope, everything that it manages will be safely deleted. This even in case of error (eg. if a memory allocation fails).
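
A scaled-down sketch of the nested copy, using much smaller sizes than in the example above so that the numbers are easy to follow. SharedMatrix is a hypothetical handle built on std::shared_ptr (an assumption made for the demonstration), and nested_copy_demo() is an invented helper that reports how many handles share the one buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical handle whose copies share one buffer.
class SharedMatrix {
public:
    SharedMatrix(std::size_t rows, std::size_t cols)
        : data_(std::make_shared<std::vector<double>>(rows * cols)) {}
    long owners() const { return data_.use_count(); }
private:
    std::shared_ptr<std::vector<double>> data_;
};

using MatrixVector = std::vector<SharedMatrix>;
using ListOfMatrixVectors = std::list<MatrixVector>;

// Returns the owner count while the nested containers exist, and after.
std::pair<long, long> nested_copy_demo() {
    MatrixVector v(2, SharedMatrix(10, 10));                // 2 handles
    assert(v.front().owners() == 2);
    long during = 0;
    {
        ListOfMatrixVectors l(3, v);                        // 3 * 2 = 6 handles
        std::vector<ListOfMatrixVectors> array(4, l);       // 4 * 6 = 24 handles
        std::vector<ListOfMatrixVectors> array2 = array;    // 24 more handles
        during = v.front().owners();                        // 2 + 6 + 24 + 24 = 56
    }
    // l, array and array2 are gone; only the two handles in v remain.
    return std::make_pair(during, v.front().owners());
}
```

The function returns the pair (56, 2): copying the whole container duplicates every list, vector and handle, but not the matrix data, and everything created in the inner scope is released when that scope ends.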

The final nail in the coffin

An experienced C programmer might be able to implement all of the above without even breaking much of a sweat. A few lines of C++ code would need hundreds, if not thousands, of lines of C code, but an experienced C programmer is used to that.

However, here comes the final straw, the final nail in the coffin:

We have not defined what the numerical type used by the matrix is. If a programmer went to implement the matrix type, he would most probably use the double type, or possibly int.

However, what if we want for the matrix to support different numerical types, even inside the same program? Not only basic types such as float, double, int and long double, but more complicated types such as complex or a user-defined type such as arbitrary precision numbers using the GMP or the MPFR libraries (which have C++ bindings in the form of classes which act as numerical values but which internally use the library for arbitrary precision numbers)?

In C++ this is fairly trivial to do. All you have to do is make the Matrix class a template. This happens by adding the line "template<typename Value_t>" at the beginning of the class declaration and then substituting all instances of eg. "double" with "Value_t". And that's about it. (The implementation of the member functions will have to be moved to the header file where the Matrix class is declared, but that's only a slight bummer.)

The above change could probably be made in less than a minute. After that we have full support for any type that acts like a number, so we can write things like:

void foo()
{
    typedef Matrix<double> DMatrix;
    typedef Matrix<int> IMatrix;
    typedef Matrix<std::complex<double> > CMatrix;
    typedef Matrix<mpreal> MPMatrix; // mpreal = an MPFR wrapper

    std::vector<DMatrix> dArray(100, DMatrix(100, 100));
    std::vector<IMatrix> iArray(100, IMatrix(100, 100));
    std::vector<CMatrix> cArray(100, CMatrix(100, 100));
    std::vector<MPMatrix> mpArray(100, MPMatrix(100, 100));
    ...
}
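
The template change described above can be sketched as follows. This is a deliberately small, hypothetical Matrix template: copy-on-write is left out for brevity, the at() accessor and the two demo functions are invented for the example, and only standard types (int and std::complex<double>) are instantiated so that the code does not depend on GMP or MPFR.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix template; copy-on-write is omitted for brevity.
template<typename Value_t>
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols, const Value_t& init = Value_t())
        : rows_(rows), cols_(cols), data_(rows * cols, init) {}

    Value_t& at(std::size_t r, std::size_t c) { return data_[r * cols_ + c]; }
    const Value_t& at(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }

    // Element-wise sum; the only requirement on Value_t is that it supports +=.
    Matrix& operator+=(const Matrix& rhs) {
        assert(rows_ == rhs.rows_ && cols_ == rhs.cols_);
        for (std::size_t i = 0; i < data_.size(); ++i)
            data_[i] += rhs.data_[i];
        return *this;
    }

private:
    std::size_t rows_, cols_;
    std::vector<Value_t> data_;
};

template<typename Value_t>
Matrix<Value_t> operator+(Matrix<Value_t> lhs, const Matrix<Value_t>& rhs) {
    lhs += rhs;
    return lhs;
}

// The same source code instantiated for two different numerical types.
int int_sum() {
    Matrix<int> a(2, 2, 1), b(2, 2, 2);
    return (a + b).at(1, 1);                       // 1 + 2
}

std::complex<double> complex_sum() {
    typedef std::complex<double> C;
    Matrix<C> a(2, 2, C(1.0, 2.0)), b(2, 2, C(0.5, -2.0));
    return (a + b).at(0, 0);                       // (1.0 + 0.5, 2.0 - 2.0)
}
```

int_sum() returns 3 and complex_sum() returns the complex number (1.5, 0). Neither the class nor operator+ mentions int or std::complex anywhere; the compiler generates one version of the code per type.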

How do you do this in C? Well, you don't. You just can't.

Even if you try to struggle by making the matrix type and all of its functions as preprocessor macros (something which would make it some of the most horrible pieces of code ever created), it would still fall short because it wouldn't work with the GMP, MPFR and other similar libraries which do not act as primitive types.

Note how we went from a change which probably took less than a minute to do to the C++ version, to a change which is basically impossible in C to implement.

The only thing you could do in C is to manually replicate all the functionality of the matrix type for each numerical type you want to use (especially for the numericals that need to be used in a special way, such as the arbitrary precision libraries). It's simply impossible to make a generic matrix implementation that would work with all user-defined numerical types.

What's worse, you will also need to replicate all the data container code you want to use the different matrix types with. In the same way as you can't create a generic matrix type in C, you can't create generic data containers for that generic matrix type, so the only solution is to replicate code for each distinct type. All this replication needs to be done by hand.

Technically speaking the same code replication is happening in the C++ version as well, but here it's the compiler that performs the replication automatically, not the programmer. This results in a program which is significantly simpler, shorter and easier to understand and maintain.

In conclusion

The example I described in this lengthy article is just one of the reasons why C is so vastly inferior to C++. However, it's one of the main reasons, and something which comes up all the time in practical programs. You might not need a matrix class in your programs, but it served to illustrate the inherent problems with C as a programming language, and how C++ completely excels at this kind of programming.

The internet is full of people defending C as the superior language, and denigrating C++. These people are complete idiots and morons, and they have absolutely no idea what they are talking about.

In conclusion, the reason why C is so vastly inferior to C++ as described in this article can be summarized in one single sentence: C supports neither RAII nor templates, while C++ supports both. Those are two of the most important features that make C++ by far the superior language.