DATA STRUCTURES AND ALGORITHMS CS 3139
Lecture:  April 1st, 1999








ANNOUNCEMENTS

Reading - Chapter 8
Reminder - HW3 due in class Tuesday 04/06/99
 
 

REVIEW

Quicksort

TODAY
 

QUICKSELECT

Use the quicksort idea to solve the selection problem
Recall:  Find the kth smallest element out of N elements.  Let |Si| = number of elements in Si

Pivot p is an element of S
Partition S-{p} into S1, S2 just like quicksort
If k <= |S1|, kth smallest is in S1
    return quickselect (S1, k)
If k = 1 + |S1|, kth smallest = p
    return p
Else kth smallest is in S2
    return quickselect (S2, k-|S1|-1)

We use one recursive call instead of 2
Worst case O(N^2)
Average case O(N)
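
The steps above can be sketched in Python as follows. This is only an illustration, not the version from class or the text: the random pivot choice and the use of new lists for S1 and S2 (instead of in-place partitioning) are assumptions made to keep the sketch short.

```python
import random


def quickselect(s, k):
    """Return the k-th smallest element of the list s (k is 1-based)."""
    if not 1 <= k <= len(s):
        raise ValueError("k must satisfy 1 <= k <= len(s)")
    # Pick a pivot p and form S - {p}. Random choice is an assumption here.
    i = random.randrange(len(s))
    p = s[i]
    rest = s[:i] + s[i + 1:]
    s1 = [x for x in rest if x < p]    # elements smaller than the pivot
    s2 = [x for x in rest if x >= p]   # everything else
    if k <= len(s1):
        return quickselect(s1, k)
    if k == len(s1) + 1:
        return p
    return quickselect(s2, k - len(s1) - 1)


print(quickselect([7, 2, 9, 4, 1], 3))  # prints 4
```

Only one of S1, S2 is ever visited, which is where the savings over quicksort come from.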

Best case (the pivot always splits S in half)

T(N)    = T(N/2) + cN
            = T(N/4) + c(N/2) + cN
            = T(N/8) + c(N/4) + c(N/2) + cN
            ...
            = T(1) + c(2 + 4 + ... + N/2 + N)
            = T(1) + cN * (2/N + 4/N + ... + 1/2 + 1)
            = T(1) + cN * (1 + 1/2 + 1/4 + ... + 2/N)
            <= T(1) + 2cN
            = O(N)
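
As a quick numeric check of the recurrence T(N) = T(N/2) + cN, the sketch below unrolls it for N a power of two. The function name and the defaults c = 1, T(1) = 1 are assumptions chosen for illustration.

```python
def best_case_cost(n, c=1, t1=1):
    """Unroll T(n) = T(n/2) + c*n with T(1) = t1, for n a power of two."""
    total = t1
    while n > 1:
        total += c * n   # the c*n work done at this level
        n //= 2          # recurse on half of the input
    return total


print(best_case_cost(1024))  # prints 2047
```

For N = 1024 the total stays below T(1) + 2cN = 2049, consistent with the geometric sum being bounded by 2.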
 
 

GENERAL LOWER BOUND FOR SORTING

We have O(NlogN) algorithms for sorting
Can we do better?

We will show that any algorithm for sorting using only comparisons requires Omega(NlogN) comparisons in the worst case
and on average as well.

Idea - Decision Tree - An abstraction used to prove lower bounds [We provided an example of a decision tree in class, please see the text for an example]

Binary decision tree of depth d - At most 2^d leaves
    N elements => N! permutations, each needs its own leaf
    Need 2^d >= N!
    d  >= log(N!)
        = logN + log(N-1) + log(N-2) +...+log1
        >= logN + log(N-1) + log(N-2) +...+log(N/2)
        >= (N/2)log(N/2)
        = (N/2)logN - (N/2)log2
        = Omega(NlogN)
 
        Number of decisions in worst case = depth of decision tree
        ==> lower bound = Omega(NlogN)
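
To get a feel for the numbers, the sketch below computes the smallest depth d with 2^d >= N! and compares it with the (N/2)log(N/2) estimate from the derivation. The function name is hypothetical and logs are taken base 2.

```python
import math


def min_comparisons(n):
    """Smallest d with 2**d >= n!, i.e. ceil(log2(n!)), using exact integers."""
    # For an integer x >= 1, (x - 1).bit_length() equals ceil(log2(x)).
    return (math.factorial(n) - 1).bit_length()


for n in (3, 4, 5, 10):
    estimate = (n / 2) * math.log2(n / 2)
    print(n, min_comparisons(n), round(estimate, 2))
```

For N = 5 this gives 7, so no comparison sort can always sort five elements in fewer than 7 comparisons.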
 

BUCKET SORT

Sort a set of positive integers A1, ..., AN such that each Ai < M

Does not use simple comparisons
Instead, we use M-way comparisons in unit time
An example of a specialization that does better than the general case: O(M+N) time
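
A minimal sketch of bucket sort under the assumptions stated above (positive integers, each less than M). The counting-array layout is one common way to do it and is not necessarily the version shown in class.

```python
def bucket_sort(a, m):
    """Sort positive integers a, each less than m, in O(m + n) time."""
    count = [0] * m
    for x in a:
        if not 0 < x < m:
            raise ValueError("every element must satisfy 0 < x < m")
        count[x] += 1            # indexing by x acts as the M-way comparison
    out = []
    for value, c in enumerate(count):
        out.extend([value] * c)  # emit each value as often as it was seen
    return out


print(bucket_sort([5, 3, 8, 3, 1], 10))  # prints [1, 3, 3, 5, 8]
```

No element is ever compared with another element, so the Omega(NlogN) bound does not apply.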
 

EXTERNAL SORTING
 

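Shellsort, quicksort, etc. assume the input fits in main memory
Model for external sorting - the input is too large for memory and lives on tapes, which are read sequentially
Basic Idea - sort memory-sized pieces into runs, then merge the runs [ In class we went through an example that illustrates the basic ideas behind external sorting ]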

Takes log(N/M) merge passes + initial run-construction pass ==> i.e. log(number of initial runs, the memory-sized blocks of data)
We used a multi-way merge - a k-way merge instead of just a 2-way
Takes logk(N/M) passes (log base k)
Merge - consider k elements, use priority queue, deletemin
Problem - k-way merge requires 2k tapes  ==> expensive
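
The merge step and the pass count can be sketched as below. Runs are modeled as in-memory Python lists rather than tapes, and both function names are hypothetical; `heapq` plays the role of the priority queue, with `heappop` as deletemin.

```python
import heapq


def k_way_merge(runs):
    """Merge k sorted lists into one sorted list using a min-heap."""
    # Heap entries are (value, run index, position within that run).
    heap = [(run[0], r, 0) for r, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, r, i = heapq.heappop(heap)      # deletemin
        out.append(value)
        if i + 1 < len(runs[r]):               # refill from the same run
            heapq.heappush(heap, (runs[r][i + 1], r, i + 1))
    return out


def merge_passes(n, m, k=2):
    """Merge passes needed for n elements, memory size m, k-way merging."""
    runs = -(-n // m)            # ceil(n / m) initial runs
    passes = 0
    while runs > 1:
        runs = -(-runs // k)     # each pass merges groups of k runs
        passes += 1
    return passes


print(k_way_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))
print(merge_passes(13, 3, k=2), merge_passes(13, 3, k=3))  # prints 3 2
```

With N = 13 and M = 3 there are 5 initial runs, so a 2-way merge needs 3 passes while a 3-way merge needs 2.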
 

POLYPHASE MERGE

k-way merge with only k+1 tapes.  Don't split evenly, split unevenly.

What is good about an uneven distribution?
If the number of runs is a Fibonacci number Fib(N)
==> split into 2 sets of runs Fib(N-1), Fib(N-2)
Generalized for k : Fib(k)(N) = Fib(k)(N-1) + Fib(k)(N-2) + ... + Fib(k)(N-k), the kth order Fibonacci numbers
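
To see why a Fibonacci split works, the sketch below tracks only the run counts on three tapes during a 2-way polyphase merge, and computes kth order Fibonacci numbers. Both function names are hypothetical, and the starting values Fib(k)(N) = 0 for N <= k-2, Fib(k)(k-1) = 1 are the usual convention, assumed here because the notes do not state them.

```python
def polyphase_run_counts(a, b):
    """Run counts on three tapes after each phase of a 2-way polyphase merge."""
    tapes = [a, b, 0]
    history = [tuple(tapes)]
    while sum(tapes) > 1:
        empty = tapes.index(0)                   # output tape for this phase
        src = [i for i in range(3) if i != empty]
        merged = min(tapes[i] for i in src)      # merge until one input empties
        if merged == 0:
            raise ValueError("stuck: runs must be redistributed")
        for i in src:
            tapes[i] -= merged
        tapes[empty] += merged
        history.append(tuple(tapes))
    return history


def kth_order_fib(k, n):
    """n-th kth order Fibonacci number (assumed start: k-1 zeros, then 1)."""
    seq = [0] * (k - 1) + [1]
    while len(seq) <= n:
        seq.append(sum(seq[-k:]))                # sum of the previous k terms
    return seq[n]


for counts in polyphase_run_counts(21, 13):
    print(counts)
print(kth_order_fib(2, 9))  # prints 34
```

Starting from 34 = 21 + 13 runs, the counts step down through Fibonacci numbers and finish with a single run after 7 phases. An even split such as 17 and 17 leaves all runs on one tape after the first phase, so the sketch raises an error there.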
 

REPLACEMENT SELECTION

Produces longer runs, so fewer disk accesses; runs are typically 2M long instead of just M.  Even better if the input is presorted.
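
A minimal sketch of replacement selection, assuming the input is any Python iterable and that M elements fit in memory. The function name, the `held` list for elements that must wait for the next run, and the sample input are all illustrative choices, not taken from the lecture.

```python
import heapq
import itertools

_DONE = object()  # sentinel marking the end of the input


def replacement_selection(items, m):
    """Split items into sorted runs using a heap of at most m elements."""
    if m < 1:
        raise ValueError("m must be at least 1")
    it = iter(items)
    heap = list(itertools.islice(it, m))   # fill memory with the first m items
    heapq.heapify(heap)
    runs, current, held = [], [], []
    while heap:
        smallest = heapq.heappop(heap)     # deletemin goes to the current run
        current.append(smallest)
        nxt = next(it, _DONE)
        if nxt is not _DONE:
            if nxt >= smallest:
                heapq.heappush(heap, nxt)  # still fits in the current run
            else:
                held.append(nxt)           # too small, wait for the next run
        if not heap:                       # current run is finished
            runs.append(current)
            current = []
            heap, held = held, []
            heapq.heapify(heap)
    return runs


data = [81, 94, 11, 96, 12, 35, 17, 99, 28, 58, 41, 75, 15]
for run in replacement_selection(data, 3):
    print(run)
```

With M = 3 this input yields three runs of lengths 4, 8 and 1, where plain M-sized chunks would give five runs. A presorted input comes out as a single run.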
 

  


  Nhat Minh Dau, nmd13@columbia.edu