Linked lists

Element insertion

Linked lists are a somewhat curious type of data container in that people often carelessly throw around claims about them without much thought. For example, one such common claim is "you can insert an element in the middle of a linked list in constant time".

My usual response to that claim is: "Well, if that's so, then show me a function which takes a linked list and an element as parameters, and which inserts the given element in the middle of the given list in constant time."

When confronted with this dilemma, some stubborn people try to wriggle out of it, as if I had somehow been unfair or were cheating. But it's not unfair or cheating: It's a real challenge to the claim.

The point is that when people carelessly say "you can insert in the middle in constant time", they don't consider the prerequisites which are necessary for that to happen.

The fact is that you can't insert an element in the middle of a linked list in constant time just like that. In order to do the insertion, you first have to get to the middle of the list, and getting there is a linear-time operation. Hence, taken as a whole, the insertion is a linear-time operation, not a constant-time one.

If you already had a reference/iterator pointing to the place where you want to insert the element, then yes, the insertion itself would be a constant-time operation. However, the question is: how did you get that reference/iterator in the first place?

Hence the insertion being constant-time has a precise prerequisite. Just saying "inserting in the middle is constant time" is wrong. You have to add "if you already have a reference/iterator pointing to the place of insertion", which is not a prerequisite to be taken lightly.
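
To make the distinction concrete, here's a minimal sketch in C++ using std::list (the function names are mine, and I'm taking "the middle" to mean position size()/2):

    #include <list>
    #include <iterator> // std::advance

    // Without any extra information, finding the middle is what costs O(n);
    // the actual insertion is O(1).
    template<typename T>
    void insertInMiddle(std::list<T>& lst, const T& value)
    {
        auto pos = lst.begin();
        std::advance(pos, lst.size() / 2); // linear-time traversal
        lst.insert(pos, value);            // constant-time insertion
    }

    // If the caller already has an iterator to the insertion point, the
    // whole operation really is constant-time.
    template<typename T>
    void insertBefore(std::list<T>& lst,
                      typename std::list<T>::iterator pos, const T& value)
    {
        lst.insert(pos, value); // O(1): just relink a couple of pointers
    }

The second function meets the challenge only because it pushes the linear-time work of obtaining pos onto the caller.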

Sorting

There's an odd but very common misconception that you can't use quicksort to sort a linked list, even if it's a doubly-linked list (ie. every element has a pointer to both the next and the previous element). The reason people believe this seems to be that they think quicksort requires random access, and hence can only be used with an array.

There are actually two different ways of using quicksort on a linked list: by recursively splitting the list into separate sub-lists and then joining them (this can be done equally easily with both singly-linked and doubly-linked lists), or by swapping the element values in the same way as you would with an array (this can only be done efficiently with a doubly-linked list).

Since I'm comparing quicksort as applied to an array with quicksort as applied to a linked list, I'm going to concentrate on the latter method. (The former is actually pretty trivial, so I'll leave it as an interesting exercise.)

If you look at the description of the classical quicksort algorithm, you'll see that it does not, in fact, require random access: you have two pointers, one traversing the list from the beginning towards the end and the other from the end towards the beginning (perfectly doable with a doubly-linked list), swapping elements as they go. When the two pointers meet, the algorithm calls itself recursively on the two sub-partitions created this way (again, perfectly doable with a list). There's nothing in the quicksort algorithm which would prevent it from being applied to a doubly-linked list.
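
As a sketch of what this looks like in practice, here's a quicksort over a std::list range in C++ which only ever increments and decrements its iterators and swaps element values. The function names are mine; I use the first element as the pivot, and a second partitioning pass groups the elements equal to the pivot (a standard trick to keep the recursion from degenerating when there are duplicates):

    #include <list>
    #include <algorithm> // std::iter_swap

    // Partition [first, last) so that elements satisfying 'pred' come first.
    // Two iterators walk towards each other from both ends, swapping elements
    // that are on the wrong side; only ++ and -- are used, so bidirectional
    // (ie. doubly-linked list) iterators suffice.
    template<typename Iter, typename Pred>
    Iter partitionList(Iter first, Iter last, Pred pred)
    {
        while(true)
        {
            while(first != last && pred(*first)) ++first;
            if(first == last) return first;
            do { --last; } while(first != last && !pred(*last));
            if(first == last) return first;
            std::iter_swap(first, last); // both elements were on the wrong side
            ++first;
        }
    }

    // Quicksort on the range [first, last) of a doubly-linked list.
    template<typename Iter>
    void quicksortList(Iter first, Iter last)
    {
        if(first == last) return;

        const auto pivot = *first; // simplest possible pivot choice

        // Elements smaller than the pivot end up in [first, mid1), elements
        // equal to it in [mid1, mid2), and larger ones in [mid2, last).
        Iter mid1 = partitionList(first, last,
                                  [&](const auto& v) { return v < pivot; });
        Iter mid2 = partitionList(mid1, last,
                                  [&](const auto& v) { return !(pivot < v); });

        quicksortList(first, mid1);
        quicksortList(mid2, last);
    }

Using it is as simple as:

    std::list<int> values { 5, 1, 9, 3, 7, 2, 8 };
    quicksortList(values.begin(), values.end()); // values: 1 2 3 5 7 8 9

Note that nothing above indexes into the range or measures its length; the iterators only ever step one element at a time.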

In fact, you can even use optimized versions of quicksort on doubly-linked lists.

For example, one common optimization is that if the sub-partition to sort is smaller than a certain number of elements, insertion sort is applied to it instead. This is perfectly doable with a list: while partitioning, it's trivial to count the size of the sub-partition (the partitioning process is traversing the list anyway), and if it's smaller than the given threshold, insertion sort can be applied to a doubly-linked list without problems (because it doesn't require random access either).
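
For completeness, here's what such an insertion sort might look like on the same kind of iterator range (again only stepping iterators one element at a time; in the quicksort sketch above one would call this instead of recursing whenever the counted size of a sub-partition falls below some small threshold):

    #include <iterator> // std::next, std::prev

    // Insertion sort on [first, last) of a doubly-linked list. Larger elements
    // are shifted one slot to the right by copying values while walking
    // backwards, so only bidirectional iteration is needed.
    template<typename Iter>
    void insertionSortList(Iter first, Iter last)
    {
        if(first == last) return;

        for(Iter cur = std::next(first); cur != last; ++cur)
        {
            auto value = *cur;
            Iter pos = cur;

            // Walk backwards past all elements greater than 'value'.
            while(pos != first && value < *std::prev(pos))
            {
                *pos = *std::prev(pos);
                --pos;
            }
            *pos = value;
        }
    }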

Another common optimization is choosing the pivot element for partitioning as the median of the first, last and middle elements of the partition.

This is slightly trickier, because getting the middle element of the partition being sorted would seem to require random access (or an extra linear traversal). And, indeed, the very first time you partition (ie. the entire list) you can't get the middle element, in which case you just have to use eg. the first element as the pivot.

However, while partitioning you can keep a pointer/iterator to the middle of each of the sub-partitions being created: every two steps of advancing a main iterator, you advance the corresponding "mid-iterator" once. This way, when the partitioning is done (ie. the main iterators meet), you will have two secondary iterators pointing to the middles of the two sub-partitions, and you can use these to get the median-of-three pivot for the next sub-partitioning steps.
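
As a sketch of the bookkeeping (the names and the exact half-speed rule are mine, and this only illustrates the idea rather than a complete quicksort), the partitioning function from before can be extended to maintain the two mid-iterators as it goes:

    #include <algorithm> // std::iter_swap
    #include <cstddef>   // std::size_t

    // Like partitionList above, but additionally maintains 'midLeft' and
    // 'midRight', which trail the two main iterators at half speed. When the
    // partitioning is done, midLeft points to roughly the middle of the left
    // sub-partition and midRight to roughly the middle of the right one.
    // (For a sub-partition of fewer than two elements the mid-iterator may
    // coincide with that sub-partition's end and must not be dereferenced;
    // such tiny partitions don't need a pivot anyway.)
    template<typename Iter, typename Pred>
    Iter partitionListWithMid(Iter first, Iter last, Pred pred,
                              Iter& midLeft, Iter& midRight)
    {
        midLeft = first;
        midRight = last;
        std::size_t leftSteps = 0, rightSteps = 0;

        // Advance/retreat a main iterator, moving its mid-iterator on every
        // second step.
        auto stepLeft  = [&] { ++first; if(++leftSteps  % 2 == 0) ++midLeft;  };
        auto stepRight = [&] { --last;  if(++rightSteps % 2 == 0) --midRight; };

        while(true)
        {
            while(first != last && pred(*first)) stepLeft();
            if(first == last) return first;
            do { stepRight(); } while(first != last && !pred(*last));
            if(first == last) return first;
            std::iter_swap(first, last);
            stepLeft();
        }
    }

The recursive calls can then choose their pivot as the median of the sub-partition's first element, its last element and the element pointed to by the corresponding mid-iterator.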