DATA STRUCTURES AND ALGORITHMS CS 3139
Lecture:  February 25th, 1999





ANNOUNCEMENTS
 


REVIEW

Table ADT

Hash Table Issues


TODAY

Priority queues
Performance of collision schemes
Extensible hashing
 

PRIORITY QUEUES

Recall queues:  First in, first out scheme
Suppose some items are more important than others, for example, as ranked by a priority() function

Priority queues


SIMPLE IMPLEMENTATIONS

Linked list

Sorted linked list

BST

Priority queue presented:  Binary heap
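
As a rough illustration of the binary heap option, here is a minimal sketch that assumes Python's standard heapq module (a min-heap, so a smaller number means a higher priority). The task names and priority values are made up for the example.

```python
import heapq

# Hypothetical (priority, task) pairs; a lower number is more important.
jobs = [(3, "print report"), (1, "handle interrupt"), (2, "save file")]

heap = []
for priority, task in jobs:
    heapq.heappush(heap, (priority, task))    # O(log N) insert

# Items come out in priority order, not in arrival (FIFO) order.
served = [heapq.heappop(heap)[1] for _ in range(len(jobs))]  # O(log N) each
print(served)   # ['handle interrupt', 'save file', 'print report']
```

For comparison, an unsorted linked list gives O(1) insert but O(N) delete-min, and a sorted linked list gives O(N) insert but O(1) delete-min.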


COLLISIONS

Performance of collision schemes
Performance is a function of the load, f(load), where load = (number of elements N) / (table size M) = X
Large X = more expensive to do finds, etc.

Separate Chaining

Unsuccessful search

Successful search

Good idea to have table size equal the number of elements:  X = 1
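
The cost formulas for the two searches did not survive in these notes. The standard textbook result is that an unsuccessful search examines about X nodes and a successful search about 1 + X/2 nodes. The sketch below checks this by simulation; it assumes keys hash uniformly at random, and the function name chain_costs is hypothetical.

```python
import random

def chain_costs(n, m, seed=0):
    """Simulate separate chaining with n random keys in m chains.

    Returns (average nodes examined in an unsuccessful search,
             average nodes examined in a successful search).
    """
    rng = random.Random(seed)
    chains = [0] * m                      # only the chain lengths matter here
    for _ in range(n):
        chains[rng.randrange(m)] += 1
    # An unsuccessful search walks a whole chain: average length is n/m = X.
    unsuccessful = sum(chains) / m
    # A successful search for the j-th node of a chain examines j nodes.
    successful = sum(c * (c + 1) // 2 for c in chains) / n
    return unsuccessful, successful

for load in (0.5, 1.0, 2.0):
    m = 10_000
    u, s = chain_costs(int(load * m), m)
    print(f"X={load}: unsuccessful ~ {u:.2f}, successful ~ {s:.2f}")
```

At X = 1 both costs stay small constants, which is why a table size close to the number of elements is a reasonable target.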

Open Addressing: X <= 1 by definition (each slot holds at most one element)

Random Probing

Linear Probing

Average cost of operations depends on how the data is clustered.
For example, if table is half full

Finding average number of probes for different cluster lengths:  Landmark results of Knuth's '62 report

Quadratic Probing

Double Hashing:  Apply a second hash function to the key to choose the step size between probes.
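
A minimal sketch of double hashing, assuming integer keys, a prime table size m, and the commonly used pair h1(k) = k mod m and h2(k) = 1 + (k mod (m-1)). These particular hash functions and the helper names are illustrative assumptions, not taken from the lecture.

```python
def double_hash_probes(key, m):
    """Yield the probe sequence for `key` in a table of prime size m."""
    h1 = key % m                 # primary hash: the starting slot
    h2 = 1 + (key % (m - 1))     # secondary hash: the step size, never 0
    for i in range(m):
        yield (h1 + i * h2) % m

def insert(table, key):
    """Place key in the first free slot along its probe sequence."""
    probes = double_hash_probes(key, len(table))
    for count, slot in enumerate(probes, start=1):
        if table[slot] is None:
            table[slot] = key
            return count         # number of slots examined
    raise RuntimeError("table is full")

table = [None] * 11              # 11 is prime, so any step size visits every slot
costs = [insert(table, k) for k in (22, 33, 44, 12)]
print(costs)                     # [1, 2, 2, 1]
```

Keys 22, 33 and 44 all start at slot 0, but their different step sizes send them to slots 0, 4 and 5. Linear probing would have packed them into slots 0, 1 and 2 and grown a cluster.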

Linear probing is simple, but its performance gets bad quickly as the table fills.  Often, random probing is better.
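
To make the comparison concrete, the sketch below evaluates the standard textbook approximations for the expected number of probes at load X < 1. The formulas are the usual ones attributed to Knuth's analysis; they are not reproduced in these notes, so treat them as an assumption, along with the function names.

```python
import math

def linear_probing(x):
    """Approximate expected probes (unsuccessful, successful) at load x < 1."""
    return 0.5 * (1 + 1 / (1 - x) ** 2), 0.5 * (1 + 1 / (1 - x))

def random_probing(x):
    """Approximate expected probes (unsuccessful, successful) at load x < 1."""
    return 1 / (1 - x), math.log(1 / (1 - x)) / x

for x in (0.5, 0.75, 0.9):
    lu, ls = linear_probing(x)
    ru, rs = random_probing(x)
    print(f"X={x}: linear {lu:.1f}/{ls:.1f}   random {ru:.1f}/{rs:.1f}")
```

For a half-full table, an unsuccessful search costs about 2.5 probes with linear probing and about 2 with random probing. At X = 0.9 the gap widens to about 50.5 versus 10.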
 
 

BIRTHDAY PARADOX

Intuition:  As X gets large, collisions increase.
X doesn't have to be very large for collisions to occur, however!

Von Mises Birthday Paradox:  If >= 23 people are in a room, then there is a >50% chance that two of them have the same birthday

Q(N) = Probability that when we randomly toss N people into a table of 365 birthdays, there are no collisions
P(N) = Probability of at least 1 collision

Q(N)+P(N) = 1
P(N) = 1 - Q(N)

Q(N):    Q(1) = 1
         Q(2) = 364/365
         Q(3) = 1 * 364/365 * 363/365
         Q(N) = Q(N-1) * (365-N+1)/365
              = (365 * 364 * ... * (365-N+1)) / 365^N
              = 365! / (365^N * (365-N)!)

         P(N) = 1 - 365! / (365^N * (365-N)!)

This is counter-intuitive but true!  For example, a 365-slot hash table that is only 10% full (about 36 elements) already has a >50% chance of at least one collision!
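
The recurrence Q(N) = Q(N-1) * (365-N+1)/365 is easy to check numerically. The sketch below is one way to do it; the function name p_collision and the days parameter are illustrative choices.

```python
def p_collision(n, days=365):
    """P(N): probability that at least two of n people share a birthday."""
    q = 1.0                       # Q: probability of no collision so far
    for k in range(n):
        q *= (days - k) / days    # person k+1 must avoid the k days already taken
    return 1 - q

# Smallest N for which a collision is more likely than not.
n = 1
while p_collision(n) <= 0.5:
    n += 1
print(n, round(p_collision(n), 3))   # 23 0.507
```

The loop multiplies the factors one at a time rather than evaluating 365!/(365^N (365-N)!) directly, which avoids very large intermediate values.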
 
 

EXTENSIBLE HASHING

Gives us a way to deal with data too large to fit into memory
Minimizes disk accesses

Recall B-tree

We can make root a hash table instead

Two issues to keep in mind:  Partial keys and chain limits
 
 

PARTIAL KEYS
 
Data    Hash
A       1010
B       0010
C       1001
D       0101
E       1010
F       0110

Take 2 bits
 
00    01    10    11
B     D     A
      F     C
            E
 

Take 1 bit
 
0     1
B     A
D     C
F     E

Take 0 bits
 
A
B
C
D
E
F

Trade-off:  Size of directory vs. list size
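
The three groupings above can be reproduced with a short sketch that indexes each item by the leading bits of its hash. The hash values come from the Data/Hash listing; the function name directory is an illustrative choice.

```python
from collections import defaultdict

# Hash values from the Data/Hash listing, as 4-bit strings.
hashes = {"A": "1010", "B": "0010", "C": "1001",
          "D": "0101", "E": "1010", "F": "0110"}

def directory(bits):
    """Group items by the leading `bits` bits of their hash."""
    groups = defaultdict(list)
    for item, h in hashes.items():
        groups[h[:bits]].append(item)
    return dict(groups)

for bits in (2, 1, 0):
    d = directory(bits)
    longest = max(len(items) for items in d.values())
    print(f"{bits} bits: {2 ** bits} directory entries, longest list {longest}")
```

With 2 bits the directory has 4 entries and the longest list has 3 items. With 0 bits there is a single entry holding all 6 items.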
 
 

CHAIN LIMITS

Given a chain limit, if a list gets too long, we can split it up by applying another hash function.  We extend the number of bits used, hence the name extensible hash tables.
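
A minimal in-memory sketch of splitting under a chain limit. It models "applying another hash function" as using one more leading bit of the same 4-bit hash, in line with the partial-key example. The names HASH_BITS, CHAIN_LIMIT, Bucket and ExtensibleHash are hypothetical, and a real implementation would keep each list in a disk block.

```python
HASH_BITS = 4      # assumed width of every hash value, as in the A..F example
CHAIN_LIMIT = 2    # assumed maximum list length before a split

class Bucket:
    def __init__(self, depth):
        self.depth = depth   # number of leading bits shared by this list
        self.items = []      # (hash, data) pairs

class ExtensibleHash:
    def __init__(self):
        self.depth = 0               # bits used to index the directory
        self.dir = [Bucket(0)]       # always 2**depth entries

    def _index(self, h):
        return h >> (HASH_BITS - self.depth)    # leading `depth` bits of h

    def insert(self, h, data):
        bucket = self.dir[self._index(h)]
        bucket.items.append((h, data))
        # Split while the list is too long and unused hash bits remain.
        while len(bucket.items) > CHAIN_LIMIT and bucket.depth < HASH_BITS:
            self._split(bucket)
            bucket = self.dir[self._index(h)]

    def _split(self, old):
        if old.depth == self.depth:
            # Directory too coarse: double it; both copies share each list.
            self.dir = [b for b in self.dir for _ in (0, 1)]
            self.depth += 1
        new_depth = old.depth + 1
        halves = (Bucket(new_depth), Bucket(new_depth))
        for h, data in old.items:
            bit = (h >> (HASH_BITS - new_depth)) & 1    # the newly used bit
            halves[bit].items.append((h, data))
        # Repoint every directory entry that referred to the old list.
        for i, b in enumerate(self.dir):
            if b is old:
                self.dir[i] = halves[(i >> (self.depth - new_depth)) & 1]

    def find(self, h):
        return [d for k, d in self.dir[self._index(h)].items if k == h]

table = ExtensibleHash()
for data, h in [("A", 0b1010), ("B", 0b0010), ("C", 0b1001),
                ("D", 0b0101), ("E", 0b1010), ("F", 0b0110)]:
    table.insert(h, data)

print(table.depth)           # directory bits in use after all inserts
print(table.find(0b1010))    # data stored under hash 1010
```

In this sketch only the overfull list is split. Directory entries for shorter lists keep pointing at a shared list, so the directory can grow without copying any data.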

Extensible Hash Table of order d


 
 
  Nhat Minh Dau, nmd13@columbia.edu