Comparison of several sorting algorithms: Integers with repetition

(Main test page)

This test is basically the same as the previous one, except that in this case there is a lot of repetition in the data. The data was generated with random values between 0 and 1/100 of the size of the array. This means that each value is repeated, in average, about 100 times. I was interested in knowing if high repetition makes any difference in the sorting algorithms.

In order to see better the difference between the two cases, both charts, ie. the ones in the previous test and this test are shown together (the new chart below the old one).

100 items

100 integers

100 integers

In this case all the items are actually the same (0), so this is a rather special (but interesting) case. The reason why all the bars are not equal for each algorithm is probably due to small inaccuracies at timescales this small (less than 0.5 microseconds).

Insertion sort was skipped because in this case it is trivially enormously faster than the others (linear time).

Having all the items equal seems to cause considerably faster sorting times in all cases (in the previous test the average time was about 4 microseconds, and in this case about 1.5 microseconds).

Random Almost sortd Almost rev. Random end
Alg.Comp.Assig. Comp.Assig. Comp.Assig. Comp.Assig.
Shell 282564 282564 289572 282564
Heap 294297 294297 294297 294297
Merge 316800 316800 316800 316800
Quick 830948 830948 830939 830948
Quick2 430663 430663 430660 430663

5000 items

5000 integers

5000 integers

Shell sort is somewhat faster, no difference in heap sort nor in merge sort, and the quicksorts are also somewhat faster. Other than that there's surprisingly little difference.

Random Almost sortd Almost rev. Random end
Alg.Comp.Assig. Comp.Assig. Comp.Assig. Comp.Assig.
Shell 69325104376 4808781619 5561692884 4289276284
Heap 106642168690 101201161595 102329158034 109923176862
Merge 5655770000 5228170000 5104570000 3874470000
Quick 8223966783 16557950407 12251058662 30751158991
Quick2 5697654584 5604037851 5710444643 9251445010

100000 items

5000 integers

5000 integers

The same trend continues.

Random Almost sortd Almost rev. Random end
Alg.Comp.Assig. Comp.Assig. Comp.Assig. Comp.Assig.
Shell 31535224149856 21050873065171 22501563277592 11537482109500
Heap 30185434721747 29306374620850 29335084510015 309875115569
Merge 15662681800000 14780201800000 14629791800000 9658931800000
Quick 22968781631423 66851641096403 41800551393463 --
Quick2 16742931398385 1608271857867 1643649997362 2369313889930

1 million items

5000 integers

5000 integers

As the size of the array grows, the charts start more and more mimicking each other. Only shell sort seems to benefit from the repetition.

Random Almost sortd Almost rev. Random end
Alg.Comp.Assig. Comp.Assig. Comp.Assig. Comp.Assig.
Shell 574795992369566 3790287749546185 3940050151806490 13577175913309
Heap 3679272157141457 3629421156833423 3600632355119587 1324969710399311
Merge 1871575020000000 1782647220000000 1769344820000000 1096559720000000
Quick 2796635918585637 2679341212089254 5088144515805873 --
Quick2 2090044216328183 215617759718281 2161904311154983 --

Conclusion

Perhaps a bit surprisingly having a lot of repetition in the data doesn't seem to make too much of a difference, especially with larger arrays. Shell sorts seems to benefit in all cases, though.


Previous - Next