
This page presents an O(n^{2}) sorting algorithm which is faster
than insertion sort in some cases (specifically, when there is a lot of
repetition in the data to be sorted).

(Strictly speaking this is not an O(n^{2}) algorithm but an
O(n*m) algorithm, where n is the number of items in the array to be sorted
and m is the number of distinct items in the array. However, since O-notation
denotes the worst-case scenario (which happens when m=n),
O(n^{2}) is correct for all practical purposes.)

Why develop an O(n^{2}) sorting algorithm when
O(n log n) sorting algorithms are so much faster, especially with
large amounts of data?

The answer lies precisely in the "with large amounts of data" part. O(n log n) sorting algorithms usually perform very poorly when the amount of data to be sorted is very small (e.g. 10-20 items). For example, with 10 items to sort, insertion sort can be several times faster than quicksort.

One could then ask why that matters. If there are so few items to sort,
it takes almost no time at all, no matter what the sorting
algorithm. However, this *does* matter when the number of small
groups to sort is very large. For example, if you have a million groups
of 10 items and each group needs to be sorted, it's much faster to use
insertion sort on each group than e.g. quicksort.

This is exactly why the fastest existing sorting methods do not rely exclusively on an O(n log n) algorithm. Instead, they usually sort the data only up to a point with that algorithm (e.g. when partitioning in quicksort, don't sort any partition smaller than 15 items) and then run insertion sort on the data. This trick of running quicksort only down to a certain partition size and then running insertion sort can speed up a pure quicksort by as much as 20-40%.

Since there's seldom anything truly new under the Sun, it's quite probable that a similar, if not the exact same, algorithm has already been developed (and named) by someone somewhere. However, since I have never heard of this sorting algorithm or anything similar, I have named it "double-burst-selection sort", at least for the time being.

The reason for using "selection sort" in the name is that this algorithm is somewhat similar to it in idea.

The idea in selection sort is to find the smallest item in the array and swap the first array item with it. This is then repeated for the remainder of the array (i.e. everything but the first item) until all items have been placed this way. If there's more than one smallest item, only one of them is placed at the beginning per pass.
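In code, that idea looks roughly like this (a minimal sketch for illustration; the function name is my own):

```cpp
#include <cstddef>
#include <utility>

// Classic selection sort: find the smallest remaining item and
// swap it to the front of the unsorted part of the array.
void selectionSort(unsigned* array, std::size_t size)
{
    for(std::size_t i = 0; i < size; ++i)
    {
        std::size_t smallest = i;
        for(std::size_t j = i + 1; j < size; ++j)
            if(array[j] < array[smallest]) smallest = j;
        std::swap(array[i], array[smallest]);
    }
}
```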

It occurred to me that this is actually quite an inefficient way of
doing it. Why should just *one* of the smallest items be placed
at the beginning of the array when it's possible to do so for *all*
of them in one pass?

This idea led me to develop what I call the "burst-selection sort": after
each pass the smallest item ends up at the beginning of the array, and
more importantly, if there's more than one smallest item, *all* of
them do. (For example, if the smallest value in the array is '1' and
there are 10 of them in the array, after one pass all ten will be
located at the beginning of the array.) When this is repeated for the
rest of the array until all the smallest items have been moved
to the beginning, the array is sorted.

After that it occurred to me: why restrict the algorithm to placing all the smallest items at the beginning? Why not, in the same pass, also place all the largest items at the end of the array? It turned out that this is indeed possible and that it speeds up the sorting by some amount. Hence I call this enhanced variant of the algorithm "double-burst-selection sort".

Since insertion sort is one of the fastest (and also simplest)
O(n^{2}) sorting algorithms, and usually the algorithm of choice
to speed up other sorting algorithms like quicksort, it's logical to
compare the double-burst-selection sort with it.

The greatest strength of this sorting algorithm is that it takes advantage of repetition in the data to be sorted. The more repetition there is in the data, the faster the algorithm is. As the table below shows, depending on the amount of repetition the double-burst-selection sort algorithm can be several orders of magnitude faster than insertion sort.

However, this algorithm starts losing to insertion sort when there's little or no repetition in the data. Another disadvantage of this algorithm is that it's much harder to understand and implement than insertion sort (a typical implementation of insertion sort takes about 7 lines of code, while double-burst-selection sort takes more than 25).

The following table shows a speed comparison between the two algorithms.
Both have been implemented as efficiently as possible (for example, the
insertion sort does not *swap* items as the basic algorithm description
says, but takes the item to be moved, shifts bigger items one place up until
the correct place for the current item is found, and then places the item
there; this saves a lot of unnecessary reads and writes). The program used
to make this speed comparison can be found later on this page.
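The shifting variant of insertion sort described above can be sketched like this (a minimal illustration, not the exact benchmark code):

```cpp
#include <cstddef>

// Insertion sort which shifts larger items one place up instead of
// swapping, so the item being inserted is written only once.
void insertionSort(unsigned* array, std::size_t size)
{
    for(std::size_t i = 1; i < size; ++i)
    {
        const unsigned item = array[i];
        std::size_t j = i;
        while(j > 0 && array[j - 1] > item)
        {
            array[j] = array[j - 1]; // shift the bigger item up
            --j;
        }
        array[j] = item; // place the item in its correct position
    }
}
```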

The test program creates an array of random unsigned integer values with differing array sizes and maximum item values (the smaller the maximum item value, the more repetition there is) and measures how many times each sorting algorithm can sort the array within 10 seconds. Naturally, the same array was given to both algorithms for fairness. The number in each algorithm column is thus the number of times the algorithm was able to sort the array (that is, the bigger the number, the better).

Max value | Array size | Insertion | D-B-Selection |
---|---|---|---|
10 | 10 | 4 859 733 | 4 498 987 |
10 | 100 | 296 240 | 1 114 643 |
10 | 1 000 | 3 802 | 140 492 |
10 | 10 000 | 36 | 13 200 |
100 | 10 | 4 867 741 | 3 994 877 |
100 | 100 | 283 263 | 384 414 |
100 | 1 000 | 3 675 | 31 487 |
100 | 10 000 | 32 | 2 915 |
1 000 | 10 | 4 754 145 | 4 054 443 |
1 000 | 100 | 308 434 | 191 426 |
1 000 | 1 000 | 3 551 | 5 070 |
1 000 | 10 000 | 33 | 324 |
10 000 | 10 | 4 777 478 | 4 031 111 |
10 000 | 100 | 292 915 | 183 846 |
10 000 | 1 000 | 3 633 | 2 574 |
10 000 | 10 000 | 32 | 46 |
100 000 | 10 | 4 824 157 | 4 222 801 |
100 000 | 100 | 318 131 | 155 052 |
100 000 | 1 000 | 3 713 | 2 138 |
100 000 | 10 000 | 32 | 30 |

As we can see from the table, the larger the array, the more likely repetitions are to occur, which benefits double-burst-selection sort. For example, with an array of 10 000 items and item values between 0 and 10, double-burst-selection sort is more than 350 times faster than insertion sort. On the other hand, with smaller arrays and fewer repetitions it can be considerably slower.

As already indicated in the introduction, insertion sort is useful to speed up other sorting algorithms such as quicksort.

In order to compare double-burst-selection sort with insertion sort in a
more practical setting, I made an efficient implementation of quicksort^{1}
and compared it with the same implementation enhanced with insertion sort,
and with another enhanced with double-burst-selection sort.

The version of quicksort enhanced with insertion sort skips partitioning partitions with fewer than 15 items and, after it is done, runs insertion sort on the whole array. The version enhanced with double-burst-selection sort skips partitioning partitions with fewer than 30 items and runs double-burst-selection sort on each of these partitions. (The partition sizes were fine-tuned by hand to get the fastest average sorting speed in both cases.)
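The structure of the insertion-sort-enhanced version can be sketched roughly as follows. This is a simplified illustration, not the benchmark code: it uses a plain last-item pivot instead of the median-of-three pivot, and the function names and the `kCutoff` constant are my own (the value 15 comes from the text).

```cpp
#include <cstddef>
#include <utility>

const std::size_t kCutoff = 15; // partitions below this size are left unsorted

// Insertion sort used as the finishing pass.
void insertionSort(unsigned* array, std::size_t size)
{
    for(std::size_t i = 1; i < size; ++i)
    {
        const unsigned item = array[i];
        std::size_t j = i;
        while(j > 0 && array[j - 1] > item) { array[j] = array[j - 1]; --j; }
        array[j] = item;
    }
}

// Quicksort which simply returns when a partition is smaller than the cutoff.
void quicksortAboveCutoff(unsigned* array, std::size_t size)
{
    if(size < kCutoff) return; // leave small partitions for insertion sort

    const unsigned pivot = array[size - 1]; // simple last-item pivot
    std::size_t store = 0;
    for(std::size_t i = 0; i + 1 < size; ++i)
        if(array[i] < pivot) std::swap(array[i], array[store++]);
    std::swap(array[store], array[size - 1]);

    quicksortAboveCutoff(array, store);
    quicksortAboveCutoff(array + store + 1, size - store - 1);
}

void hybridSort(unsigned* array, std::size_t size)
{
    quicksortAboveCutoff(array, size);
    insertionSort(array, size); // nearly-sorted input makes this pass cheap
}
```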

All three versions of the algorithm were used to sort an array of 100 000 items containing random unsigned integer values with differing maximum values. The value in each algorithm column tells how many times the algorithm was able to sort the array in 10 seconds (thus a bigger number is better):

Max value | Quicksort | QS + Insertion | QS + D-B-Selection |
---|---|---|---|
10 | 298 | 452 | 485 |
100 | 283 | 418 | 446 |
1 000 | 275 | 399 | 425 |
10 000 | 256 | 348 | 371 |
100 000 | 256 | 340 | 308 |
1 000 000 | 251 | 331 | 266 |

As we can see from the table, enhancing quicksort with a fast
O(n^{2}) algorithm (fast for very small arrays, of course) has
a very considerable impact on performance. (As said earlier, this is because
quicksort is not very efficient with small partitions.)

We can also see that double-burst-selection sort presents a slight improvement over insertion sort when there's a considerable amount of repetition. When the amount of repetition is minimal, insertion sort starts being considerably faster.

The (C++) source code of the program used to make these measurements can be found here.

^{1} The quicksort implementation used in the program chooses
the pivot item by calculating the median of the first, last and middle
items of the partition. This method of choosing the pivot item yields
a considerable average speed improvement compared to the regular quicksort
(which just takes the first value of the partition as pivot) and even to
the randomized quicksort (which chooses a random item as pivot).
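The pivot selection described in this footnote can be sketched like this (the function name and index handling are my own):

```cpp
#include <cstddef>
#include <utility>

// Median-of-three pivot selection: return the median of the first,
// middle and last items of the partition.
unsigned medianOfThree(const unsigned* array, std::size_t size)
{
    unsigned a = array[0], b = array[size / 2], c = array[size - 1];
    // Sort the three values with three compare-and-swaps:
    if(a > b) std::swap(a, b);
    if(b > c) std::swap(b, c);
    if(a > b) std::swap(a, b);
    return b; // the middle value of the three
}
```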

First I'll explain the burst-selection sort algorithm, which is the simpler version of the double-burst-selection sort.

As explained in the introduction, the idea of this sorting algorithm is to make a pass over all the items in the array and swap items in such a way that after the pass is done the smallest item has been placed at the beginning of the array. If there's more than one smallest item, all of them will be placed there. The same algorithm is repeated for the rest of the array (i.e. all the items in the array except the smallest ones at the beginning) until the entire array is sorted.

In pseudocode, the algorithm is the following:

1. Initialize an index variable `minIndex` which points to the beginning of the array (where the minimum values will be placed).
2. Initialize a variable `min` of the array item type with the value at `minIndex`. This variable will contain the minimum item value in the array (it may not contain it at first, but will eventually do so).
3. Initialize an index variable `newMinIndex` with `minIndex+1`. This variable contains the location where the found minimum items will be placed during the pass.
4. For each item in the array from the item at `minIndex+1` to the end of the array, if the item is less than or equal to `min`, do:
   1. If the item is less than `min`, assign the item to `min` and set `newMinIndex` to `minIndex`.
   2. Swap the item at the current location with the item at `newMinIndex`.
   3. Increase `newMinIndex` by one.
5. Set `minIndex` to `newMinIndex`.
6. If `minIndex` is still inside the array, jump to step 2.
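In C++, the pseudocode above can be transcribed roughly as follows (my own transcription; the test program's implementation may differ in details):

```cpp
#include <cstddef>
#include <utility>

// Burst-selection sort: after each pass, *all* copies of the smallest
// remaining value have been moved to the front of the unsorted section.
void burstSelectionSort(unsigned* array, std::size_t size)
{
    std::size_t minIndex = 0;
    while(minIndex < size)
    {
        unsigned min = array[minIndex];
        std::size_t newMinIndex = minIndex + 1;

        for(std::size_t i = minIndex + 1; i < size; ++i)
        {
            if(array[i] <= min)
            {
                if(array[i] < min)
                {
                    min = array[i];
                    newMinIndex = minIndex; // restart placement at the front
                }
                std::swap(array[i], array[newMinIndex]);
                ++newMinIndex;
            }
        }
        minIndex = newMinIndex; // skip the placed minimum items
    }
}
```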

If an item is not a smallest item, it's possible to check whether it's a largest item and, if it is, move it to the end of the array. This way we achieve two goals in the same pass and finish the job faster. This addition makes the burst-selection sort somewhat faster (though not twice as fast, as one might hastily think, because the amount of work done does not actually drop to half; it does drop somewhat nevertheless).

This addition makes the algorithm a bit more complex, because swapping the current item with the item at the current end of the array may actually bring a smallest item to the current place, which means we can't move to the next item just yet (if we did, the algorithm would malfunction in some cases). Also, if a new minimum item is found (and swapped to the start of the current array section), the value swapped into the current place may be a largest item, which likewise means we can't move to the next item yet. Otherwise the addition can be made basically by replicating the burst-selection sort code so that it does the reverse job.

The following additions turn the burst-selection sort algorithm into the double-burst-selection sort:

- Besides `minIndex`, also create a `maxIndex` index variable, initialized to point to the last item in the array.
- Besides `min`, also initialize a `max` variable with the item at `maxIndex`.
- Besides `newMinIndex`, also create a `newMaxIndex` index variable, initialized to `maxIndex` (note: *not* to `maxIndex-1`).
- The inner loop can now be performed from `minIndex+1` to `newMaxIndex` (inclusive).
- In the inner loop, if the current item is not less than or equal to `min`, check whether it is greater than or equal to `max`, and if it is, perform the same code as in the minimum case but using the *max* versions of the variables (note that `newMaxIndex` should naturally be decreased by one, not increased).
- One important thing to take into account is that the inner loop must not advance to the next item if the current item was less than `min` or greater than or equal to `max`; it must stay at the same item for its next iteration (that is, it must not increase the loop index variable).
- At the end of the outer loop, set `maxIndex` to `newMaxIndex`.
- Execute the outer loop again if `minIndex` is less than `maxIndex`.
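Putting the additions together with the burst-selection sort gives roughly the following (again my own transcription of the steps above, not the test program's code):

```cpp
#include <cstddef>
#include <utility>

// Double-burst-selection sort: each pass moves all copies of the
// smallest value to the front and all copies of the largest to the back.
void doubleBurstSelectionSort(unsigned* array, std::size_t size)
{
    if(size == 0) return;

    std::size_t minIndex = 0, maxIndex = size - 1;
    while(minIndex < maxIndex)
    {
        unsigned min = array[minIndex], max = array[maxIndex];
        std::size_t newMinIndex = minIndex + 1;
        std::size_t newMaxIndex = maxIndex; // note: not maxIndex-1
        std::size_t i = minIndex + 1;

        while(i <= newMaxIndex)
        {
            if(array[i] <= min)
            {
                const bool newMin = array[i] < min;
                if(newMin) { min = array[i]; newMinIndex = minIndex; }
                std::swap(array[i], array[newMinIndex]);
                ++newMinIndex;
                // A strictly smaller item swaps an unexamined value into
                // position i, so only advance when the item equaled min:
                if(!newMin) ++i;
            }
            else if(array[i] >= max)
            {
                if(array[i] > max) { max = array[i]; newMaxIndex = maxIndex; }
                std::swap(array[i], array[newMaxIndex]);
                --newMaxIndex;
                // The item swapped in from the end has not been examined
                // yet, so do not advance i.
            }
            else ++i; // item is strictly between min and max
        }

        minIndex = newMinIndex;
        maxIndex = newMaxIndex;
    }
}
```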

The source code of the test program has an implementation of this algorithm in C++ at the beginning.

© Copyright 2004 Juha Nieminen