From HandWiki - Reading time: 7 min
In computer science, a measure of presortedness of a sequence represents how much work is required to sort the sequence. If the sequence is pre-sorted, sorting the sequence entirely require little work, hence it is expected to have a small measure of presortedness. In particular, the measure of a sorted sequence is 0.
Some sorting algorithms are more efficient on pre-sorted list, as they can use this pre-work into account to avoid duplicate work. The measure of presortedness allows to formalize the notion that an algorithm is optimal for a certain measure.
Let represents finite sequence of natural numbers. Let a function sending sequence of numbers to non-negative numbers. This function is said to be a measure of presortedness if it satisfies the following axioms:
Those axioms are similar to the axioms of measure theory. However, a measure of presortedness is not a measure as in measure theory, since it is not defined over a set but over sequence.
It is well known that an optimal comparison sort requires to compare pairs of elements in a sequence. That is, the decision tree complexity is . However, when more informations is known about how much a sequence is presorted, more optimal bounds can be devised. This notion is now formalized.
Given a measure of pre-sortedness , a sequence , , let be the set of permutations of whose measure is at most . Then let the set of permutations whose measure is at most the measure of .
A sorting algorithm which takes as input and is said to be weakly -optimal if its time complexity is . That is, the number of operations is logarithmic in the number of possible permutations whose measure is at most . This is optimal in the sens that simply selecting an element in a set of size requires bits of information.
Similarly, a sorting algorithm which takes as input is -optimal if its time complexit is . When is computable in linear time, a weakly -optimal algorithm is automatically -optimal. However, if is more complex, it is possible that computing is actully more costly than sorting the sequence in the first place, and thus only an upper bound of can be used.
Similarly, a decision tree is weakly -optimal if its complexity is and it is -optimal if its complexity .
For each measure , there exists a weakly -optimal decision tree.
If the decision tree complexity of is , there exists an -optimal decision tree. This decision tree's root compute . Once is computed in a node, the weakly -optimal decision tree for is appended at this node.
The existence of the tree does not implie that there exists (weakly) -optimal sorting algorithm, as computing the decision tree may be costly.
Here is a list of measures of presortedness and related (weakly) -optimal algorithms. Let be fixed for the remaining of the section.
Let us introduce a sorting algorithm that will be optimal for multiple measure. Let's call local insertion sort the algorithm that consists in iterating on the sequence and adding each element in a finger tree.
The constant function 0 is the most trivial measure of presortedness. Any sorting function running in -time, such as heapsort and mergesort is -optimal.
The indicator function of sorted sequence is a measure of sortedness. This function sends sorted sequences to 0 and all other sequences to 1. Any algorithm that check whether a list is sorted, and if it is not sorted apply a -optimal sorting algorithm is -optimal.
The number of runs, , is the number of increasing sequences in minus one. It is equivalently defined as the number of such that . is a measure of presortedness.
Natural merge sort, which consists in using existing runs and merging them, is a -optimal algorithm. The local insertion sort is also Runs-optimal.
The number of inversion in , , is the number of pairs such that . It is a measure of presortedness and the local insertion sort is Inv-optimal.
The mininimum number of elements that need to be removed from to obtain a sorted sequences. It is a measure of presortedness and the local insertion sort is Rem-optimal.
The radius of [1] is a measure of presortdness.
The following algorithm is Rad-optimal. It consists in iteratively halving the radius until it reaches 0. Halving the radius can be done by three merges of sub-sequences of length .
Given two measures of presortdness and , the measure is also a measure of presortdness. Given two -optimal sorting algorithms , the algorithm consisting in executing and in parallel and returning the first output is a -optimal algorithm.
Template:Data structures and algorithms