2. Sorting, Searching and Selection
We begin our study of algorithms by learning some well-known algorithms for three important problems that we encounter constantly. We also present some properties of these problems that are useful when designing new algorithms for them.
2.1 Sorting
Most sorting algorithms are comparison sorts. A comparison sort is a type of sorting algorithm that reads array items only through a single abstract comparison operation, which determines which of two items should occur first in the final sorted array; it sorts by swapping items that are in the wrong order. Here we introduce three basic comparison sorting algorithms: Selection Sort, Bubble Sort and Insertion Sort.
Selection Sort There are many algorithms for sorting an array. Selection sort is among the simplest and most intuitive. The idea of selection sort is this: first find the smallest item in the array A[1...n] and swap it with the 1st item; then find the smallest item in the range A[2...n] and swap it with the 2nd item. Repeat until the search range shrinks to A[n-1...n]; after that final swap, all items are sorted.
Selection sort performs n(n-1)/2 comparisons, so its time complexity is O(n^2).
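The idea above can be sketched in a few lines of Python; the function name `selection_sort` is illustrative, and the sketch uses the smallest-first convention for ascending order.

```python
def selection_sort(a):
    """Sort list a in place in ascending order."""
    n = len(a)
    for i in range(n - 1):
        # find the index of the smallest item in the unsorted suffix a[i:]
        min_idx = i
        for j in range(i + 1, n):
            if a[j] < a[min_idx]:
                min_idx = j
        # move it into position i
        a[i], a[min_idx] = a[min_idx], a[i]
    return a
```

The inner loop always scans the whole remaining suffix, which is why the comparison count is n(n-1)/2 regardless of the input order.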
Insertion Sort Insertion sort is another simple comparison sorting algorithm. Its implementation is very simple, and it is quite efficient for small data sets; some advanced hybrid sorting algorithms therefore switch to insertion sort once the subproblem becomes small enough. Insertion sort is also well suited to online use, because it can efficiently keep the array sorted as it receives each new element.
Although the previous pseudocode example shows how to implement insertion sort with an array, it is often better to use a linked list instead, because a linked list avoids some unnecessary movement of elements. The differences between arrays and linked lists are discussed in the topic on "Data Structures". The time complexity of insertion sort is O(n^2).
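As a minimal array-based sketch in Python (the function name `insertion_sort` is illustrative):

```python
def insertion_sort(a):
    """Sort list a in place in ascending order."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # shift items larger than key one slot to the right
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        # drop key into the opened slot
        a[j + 1] = key
    return a
```

On an already-sorted input the inner while loop never runs, so this version takes only O(n) time in the best case.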
Bubble Sort The third sorting algorithm we introduce is bubble sort. It works by repeatedly visiting the items to be sorted, comparing each pair of adjacent items and swapping them if they are in the wrong order. These passes over the array are repeated until the whole array is sorted.
For n items, bubble sort requires O(n^2) comparisons. Other O(n^2) sorting algorithms such as insertion sort outperform it; although bubble sort and insertion sort have the same worst-case running time, insertion sort performs better even on random lists, so bubble sort is not a practical sorting algorithm for large n. Because of its simplicity, however, bubble sort is often used to introduce the concept of an algorithm to students learning to program.
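The repeated adjacent-swap passes can be sketched as follows (the `swapped` early-exit flag is a common optional optimization, not part of the basic description above):

```python
def bubble_sort(a):
    """Sort list a in place in ascending order."""
    n = len(a)
    for i in range(n - 1):
        swapped = False
        # after pass i, the last i items are already in place
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:
            # no swaps in a full pass means the array is sorted
            break
    return a
```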
2.2 Can we sort faster? (Part 1)
Selection sort, insertion sort and bubble sort all require Ω(n^2) time in the worst case. To design faster sorting algorithms, it is useful to observe a relationship between sorting and the number of inversions in the input.
An inversion is a pair of items that appear in the wrong relative order. Swapping two adjacent items reduces the number of inversions by at most one. Since an array can contain up to n(n-1)/2 inversions, any algorithm that swaps only adjacent items needs n(n-1)/2 swaps in the worst case.
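A direct way to see this bound is to count inversions explicitly; a naive O(n^2) sketch (the function name `count_inversions` is illustrative):

```python
def count_inversions(a):
    """Count pairs (i, j) with i < j and a[i] > a[j]."""
    n = len(a)
    return sum(1 for i in range(n)
                 for j in range(i + 1, n)
                 if a[i] > a[j])
```

A fully reversed array of n distinct items attains the maximum of n(n-1)/2 inversions, while a sorted array has zero.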
The sorting algorithms introduced so far all compare only adjacent items. To design faster algorithms, we must swap items that are further apart, which is what merge sort and quick sort do.
Merge Sort Merge sort is an O(n log n) comparison-based sorting algorithm. The key ideas are that a shorter array takes fewer steps to sort than a longer one, and that constructing a sorted array from two sorted arrays takes less time than from two unsorted arrays.
Merge sort works as follows: divide the array into two halves, recursively sort each half, and then merge the two sorted halves.
Merging two sorted arrays
Consider the recurrence T(n) = 2T(n/2) + O(n), where the O(n) term accounts for merging. By the master theorem[1], the time complexity is O(n log n).
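The divide, recurse, and merge steps can be sketched as follows (the function name `merge_sort` is illustrative; this version returns a new list rather than sorting in place):

```python
def merge_sort(a):
    """Return a sorted copy of list a."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])    # recursively sort each half
    right = merge_sort(a[mid:])
    # merge the two sorted halves in O(n)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])   # at most one of these
    merged.extend(right[j:])  # extends is non-empty
    return merged
```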
Quick Sort Quick sort is a practically fast sorting algorithm, and it is the sorting algorithm implemented by most libraries. Quick sort is a divide-and-conquer algorithm: it first divides a large list into two smaller sub-lists, the low elements and the high elements, and then recursively sorts the sub-lists.
When quick sort runs, it randomly picks a pivot from the array; the pivot is the item around which the array is partitioned. Reorder the array so that all items with values less than the pivot come before it, while all items with values greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. Then recursively sort the sub-list of lesser elements and the sub-list of greater elements. The base cases of the recursion are lists of size zero or one, which never need to be sorted. When the partitions are balanced, the recurrence is the same as merge sort's, T(n) = 2T(n/2) + O(n), and the master theorem gives T(n) = O(n log n); with a random pivot this bound holds in expectation, although the worst case is O(n^2).
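A minimal sketch of quick sort with a random pivot (the function name `quick_sort` is illustrative; for clarity this version builds new lists instead of partitioning in place, as library implementations do):

```python
import random

def quick_sort(a):
    """Return a sorted copy of list a."""
    if len(a) <= 1:
        return a  # base case: size 0 or 1 is already sorted
    pivot = random.choice(a)
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]     # pivot lands in final position
    greater = [x for x in a if x > pivot]
    return quick_sort(less) + equal + quick_sort(greater)
```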
2.3 Can we sort faster? (Part 2)
All the algorithms presented so far only compare the relative magnitude of two items. We now show that any comparison-based sorting algorithm takes Ω(n log n) time in the worst case.

The model we use for this proof is a decision tree. The root of the decision tree corresponds to the initial state of the input. Each internal node represents such a state together with a comparison whose two possible outcomes lead to two new states, and the leaf nodes are the final states of the algorithm, where the input array is determined to be sorted. The worst-case running time of an algorithm modelled by a decision tree is the height of that tree.

A sorting algorithm can be thought of as producing some permutation of its input. Since the input can be in any order, every permutation is a possible output, so for a sorting algorithm to be correct in general, it must be able to generate every possible output. Therefore, in the decision tree representing such an algorithm, there must be one leaf for each of the n! permutations of n input values. Since each comparison has one of two outcomes, the decision tree is a binary tree, and a binary tree with n! leaves must have depth at least log2(n!). But log2(n!) = Ω(n log n) [2], so any comparison-based sorting algorithm has this lower bound. To sort faster, we must turn to non-comparison-based algorithms.
Counting Sort Counting sort is an integer sorting algorithm for sorting a collection of objects according to keys that are small integers. It counts the number of objects that have each distinct key value, then uses arithmetic on those counts to determine the position of each key value in the output sequence. Because counting sort uses key values as indexes into an array, it is not a comparison sort, so the Ω(n log n) lower bound for comparison sorting does not apply to it.
The time complexity of counting sort is O(n + k), where n is the number of items and k is the largest key value.
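A minimal sketch for non-negative integer keys (the function name `counting_sort` is illustrative; this simple variant sorts the keys themselves, while the full algorithm uses prefix sums of the counts to place arbitrary objects stably):

```python
def counting_sort(a, k):
    """Sort a list of integers in the range 0..k."""
    count = [0] * (k + 1)
    for x in a:
        count[x] += 1          # tally each key value: O(n)
    out = []
    for v in range(k + 1):     # emit each value count[v] times: O(n + k)
        out.extend([v] * count[v])
    return out
```

Note that no two keys are ever compared; the key values are used directly as array indexes, which is why the comparison lower bound does not apply.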
2.4 Searching
Searching is the problem of finding a target item in an array.

Linear Search Linear search, or sequential search, is a method for finding a particular value in an array by checking every one of its items, one at a time and in sequence, until the desired one is found. The time complexity of linear search is O(n). Linear search is the simplest searching algorithm, but it is not efficient when many searches must be performed on the same array.

Binary Search Unlike linear search, binary search works only on a sorted array; for an unsorted array there is no choice but to scan all elements, so linear search is already optimal there. The idea of binary search is that, in a sorted array, if the current element is not the target, then the target must lie either to its left or to its right, but not both, depending on whether the target is less than or greater than the current element.
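The halving idea gives binary search its O(log n) running time; a minimal sketch (the function name `binary_search` is illustrative, returning the index of the target or -1 if absent):

```python
def binary_search(a, target):
    """Find target in sorted list a; return its index, or -1."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1
```

Each iteration halves the search range, so at most about log2(n) probes are needed.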
2.5 Quick Select
Quick Select Quick select is an algorithm for finding the k-th smallest number in a list. In statistics, a median of a set of n numbers is the (n/2)-th smallest element of the set; that is, if the numbers in the set are sorted, the median is the element that separates the lower half of the set from the upper half, so the median can be found by arranging the numbers from lowest to highest and picking the middle one. Note that if the set has an even number of elements, the median is usually defined to be the average of the two middle values.
Algorithm
In this figure, each column is a sorted group of 5 elements, and each element coloured pink is the median of its group. In the SELECT algorithm, steps 1 and 4 clearly take O(n); step 2 also takes O(n) in total, because sorting each group of only 5 elements takes a constant number of comparisons. Steps 3 and 5 apply SELECT recursively, on subproblems of size n/5 and at most 3n/4 respectively.
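For comparison with the deterministic SELECT described above, here is a sketch of the simpler randomized quick select, which runs in O(n) expected time (the function name `quick_select` is illustrative; k is 1-based):

```python
import random

def quick_select(a, k):
    """Return the k-th smallest element of a (1 <= k <= len(a))."""
    pivot = random.choice(a)
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    if k <= len(less):
        # the answer lies among the elements below the pivot
        return quick_select(less, k)
    if k <= len(less) + len(equal):
        return pivot  # the pivot itself is the k-th smallest
    # otherwise recurse into the elements above the pivot
    greater = [x for x in a if x > pivot]
    return quick_select(greater, k - len(less) - len(equal))
```

Unlike quick sort, only one side of the partition is ever recursed into, which is why the expected cost is linear rather than O(n log n).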
2.6 Exercises
[1] The master theorem provides a cookbook solution, in asymptotic terms (using Big O notation), for recurrence relations of the types that occur in the analysis of many divide-and-conquer algorithms.
© The University of Hong Kong Algorithms Library - hkual@cs.hku.hk