The power of iterators

Many C++ programmers, even experienced ones, think that iterators are used and useful mainly for traversing data containers. (Many such programmers have heard of other types of iterators, such as stream iterators, but they have never considered them very useful.) However, iterators are much more powerful than that.

Let's start by looking at a simple problem that can be handily solved using iterators, and then look at its applications.

A simple character encoding converter

Converting between character encodings can be a non-trivial design problem. The conversion routines need to know what kind of data containers hold the character data in each encoding. Should such routines provide their own "string" data container, or should they support a variety of containers (such as the standard std::string and std::vector containers)? How about static arrays? What if a user wants to use a non-standard custom container?

All these conundrums are moot if we use the same design principle as most standard library algorithms do: Make the routines iterator-based.

As a simple example, let's make a UCS-2 to ISO-Latin-1 conversion routine. (UCS-2 is basically the same thing as UTF-16, except that it's limited to the Unicode code points in the range from 0 to 65535; in other words, UCS-2 characters always take 2 bytes.) Since the point of this article is not the conversion itself, this particular conversion was chosen because it's unusually trivial to implement: UCS-2/UTF-16 is identical to ISO-Latin-1 in the code point range from 0 to 255.

// Converts the UCS-2 code units in [inputBegin, inputEnd) to ISO-Latin-1,
// writing exactly one character to output per input code unit.
template<typename InputIterator, typename OutputIterator>
void ucs2ToISOLatin1(InputIterator inputBegin, InputIterator inputEnd,
                     OutputIterator output)
{
    while(inputBegin != inputEnd)
    {
        if(*inputBegin < 256) *output = *inputBegin;
        else *output = '?'; // Not in ISO-Latin-1 range
        ++inputBegin;
        ++output;
    }
}

Basic usage

A simple example usage of the previous function would be:

std::vector<unsigned short> ucs2String;
initializeWithSomething(ucs2String);

std::string latin1String(ucs2String.size(), ' ');
ucs2ToISOLatin1(ucs2String.begin(), ucs2String.end(),
                latin1String.begin());

The advantage of having ucs2ToISOLatin1() take iterators rather than data containers may not be immediately apparent, but it becomes much clearer once one realizes that the function is agnostic to the kind of data container and iterators being used.

To exemplify this, consider that this will also work:

unsigned short ucs2String[MAX_STR_LENGTH];
initializeWithSomething(ucs2String);

char latin1String[MAX_STR_LENGTH];
ucs2ToISOLatin1(ucs2String, ucs2String + MAX_STR_LENGTH,
                latin1String);

Note how one and the same ucs2ToISOLatin1() function handles both cases. (Any combination in which one of the strings is a static array and the other a dynamic data container works equally well.)
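As a rough sketch of such mixed combinations, the following pairs a static array with a std::string, and a std::list with a static array. The helper names arrayToString() and listToArray() and the sample data are made up for illustration; the conversion function is repeated only so that the snippet compiles on its own.

```cpp
#include <cstddef>
#include <list>
#include <string>

// Same function as earlier in the article, repeated for self-containment.
template<typename InputIterator, typename OutputIterator>
void ucs2ToISOLatin1(InputIterator inputBegin, InputIterator inputEnd,
                     OutputIterator output)
{
    while(inputBegin != inputEnd)
    {
        if(*inputBegin < 256) *output = *inputBegin;
        else *output = '?'; // Not in ISO-Latin-1 range
        ++inputBegin;
        ++output;
    }
}

// Hypothetical helper: static array as input, std::string as output.
std::string arrayToString()
{
    // 0x263A lies outside the ISO-Latin-1 range, so it becomes '?'.
    const unsigned short ucs2String[] = { 'H', 'i', 0x263A };
    const std::size_t length = sizeof(ucs2String) / sizeof(ucs2String[0]);

    std::string latin1String(length, ' ');
    ucs2ToISOLatin1(ucs2String, ucs2String + length, latin1String.begin());
    return latin1String;
}

// Hypothetical helper: std::list as input, static array as output.
std::string listToArray()
{
    std::list<unsigned short> ucs2String;
    ucs2String.push_back('O');
    ucs2String.push_back('K');

    char latin1String[2]; // Exactly as large as the input
    ucs2ToISOLatin1(ucs2String.begin(), ucs2String.end(), latin1String);
    return std::string(latin1String, 2);
}
```

The std::list case is worth noting: its iterators are not random-access, yet the function works unchanged because it only ever compares, dereferences and increments them.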

Also, since the function uses iterators, we can convert only part of the input string rather than the entirety of it, which can sometimes be very useful.
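A partial conversion could look something like the sketch below. The helper convertRange() and its start/count parameters are assumptions made for this example, and the caller is assumed to guarantee that start + count does not exceed the size of the input.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same function as earlier in the article, repeated for self-containment.
template<typename InputIterator, typename OutputIterator>
void ucs2ToISOLatin1(InputIterator inputBegin, InputIterator inputEnd,
                     OutputIterator output)
{
    while(inputBegin != inputEnd)
    {
        if(*inputBegin < 256) *output = *inputBegin;
        else *output = '?'; // Not in ISO-Latin-1 range
        ++inputBegin;
        ++output;
    }
}

// Hypothetical helper: converts only `count` characters beginning at
// index `start`. Assumes start + count <= ucs2String.size().
std::string convertRange(const std::vector<unsigned short>& ucs2String,
                         std::size_t start, std::size_t count)
{
    std::string latin1String(count, ' ');
    ucs2ToISOLatin1(ucs2String.begin() + start,
                    ucs2String.begin() + start + count,
                    latin1String.begin());
    return latin1String;
}
```

For an input holding "Hello, world", calling convertRange(input, 7, 5) would convert only the "world" part. The conversion function itself needs no offset or length parameters; the sub-range is expressed entirely through the iterators passed to it.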

More advanced usage

There's a slight problem in the usage examples above: they assume that the result will have exactly as many characters as the input. This is true when converting a UCS-2 encoded string into an ISO-Latin-1 one, but it is certainly not true of other encodings. For example, converting a UTF-16 string into a UTF-8 one will often produce more output than input (because every code point above 127 has to be encoded as at least two bytes in UTF-8). Even a function like our ucs2ToISOLatin1() could, in principle, write more characters than it reads unless its documentation guarantees otherwise, so it's potentially dangerous to presize the output like above.

However, this isn't a problem at all, thanks to more specialized iterators. The problem described above is easily solved with an insert iterator:

std::vector<unsigned short> ucs2String;
initializeWithSomething(ucs2String);

std::string latin1String;
ucs2ToISOLatin1(ucs2String.begin(), ucs2String.end(),
                std::back_inserter(latin1String));

Now the function can safely generate more output than there is input and there will be no out-of-bounds accesses: Instead, latin1String will grow as needed.
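To see the output actually grow, here is a minimal sketch of a UCS-2 to UTF-8 converter written in the same iterator-based style. It is not part of the original example: the names ucs2ToUTF8() and toUTF8() are invented here, and the sketch assumes that every input code unit is a plain code point in the Basic Multilingual Plane (surrogate values are not detected or rejected).

```cpp
#include <iterator>
#include <string>
#include <vector>

// Hypothetical converter: writes 1 to 3 bytes per UCS-2 code unit.
// Assumption: the input contains no surrogate values (0xD800-0xDFFF).
template<typename InputIterator, typename OutputIterator>
void ucs2ToUTF8(InputIterator inputBegin, InputIterator inputEnd,
                OutputIterator output)
{
    for(; inputBegin != inputEnd; ++inputBegin)
    {
        const unsigned codePoint = *inputBegin;
        if(codePoint < 0x80) // 1 byte: 0xxxxxxx
        {
            *output = static_cast<char>(codePoint); ++output;
        }
        else if(codePoint < 0x800) // 2 bytes: 110xxxxx 10xxxxxx
        {
            *output = static_cast<char>(0xC0 | (codePoint >> 6)); ++output;
            *output = static_cast<char>(0x80 | (codePoint & 0x3F)); ++output;
        }
        else // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        {
            *output = static_cast<char>(0xE0 | (codePoint >> 12)); ++output;
            *output = static_cast<char>(0x80 | ((codePoint >> 6) & 0x3F)); ++output;
            *output = static_cast<char>(0x80 | (codePoint & 0x3F)); ++output;
        }
    }
}

// Hypothetical helper: the output string starts empty and grows as needed.
std::string toUTF8(const std::vector<unsigned short>& ucs2String)
{
    std::string utf8String;
    ucs2ToUTF8(ucs2String.begin(), ucs2String.end(),
               std::back_inserter(utf8String));
    return utf8String;
}
```

Three input code units such as 'A', 0x00E9 and 0x20AC produce six bytes of output here, which a presized three-character buffer could not have held. With std::back_inserter the caller does not need to know the final size in advance.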

As for another trick, consider this:

std::vector<unsigned short> ucs2String;
initializeWithSomething(ucs2String);

ucs2ToISOLatin1(ucs2String.begin(), ucs2String.end(),
                std::ostream_iterator<char>(std::cout));

Now we are printing the resulting string directly to std::cout.

(This can be especially handy when debugging an application that uses e.g. UCS-2 or UTF-16 strings as its native string format. Printing such strings usually requires a conversion to UTF-8 or whatever encoding the terminal is using. It would be possible to first convert the string to that format and then print the result, but why go through that trouble when we can print it directly like this? A function call like the one above, using a stream iterator, is a handy way of implementing an operator<<(std::ostream&, ...) for a custom string format that uses e.g. UCS-2 as its character encoding.)

Conclusion

The examples above covered five different usage situations:

  1. Converting from a dynamic data container to another of the same size.
  2. Converting from a static array to another. (Combinations of this and a dynamic data container are also possible.)
  3. Converting to a dynamic data container making sure it grows as needed.
  4. Outputting the result of the conversion directly to a stream.
  5. Converting only part of the input rather than the entirety of it.

All cases were handled by one single function (and this function didn't need to take the different situations into account in any way), thanks to the iterator idiom.