DATA STRUCTURES AND ALGORITHMS CS 3139
Lecture: February 18th, 1999
ANNOUNCEMENTS
-
We distributed class surveys for you guys to fill out
-
HW #2 is out, due 03/02/1999
-
Reading: Chapter 5
REVIEW
Example of building
Rotations are based on the pattern exhibited by the tree
TODAY
B-TREES
B-tree of order M is an M-ary tree such that
-
Data items are stored at leaves
-
Non-leaf nodes store a maximum of M-1 keys to guide the search. Key i represents
the smallest key in subtree i+1
-
Root is either a leaf or a node that has 2 to M children
-
All non-leaf nodes (except the root) have ceil[M/2] to M children
-
All leaves are at the same depth and have between ceil[L/2] and L data items
In class, we looked at valid B-trees to illustrate how the specified properties
are satisfied.
We also examined invalid B-trees.
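The properties above can be turned into a small checker. This is only a sketch: the names Leaf, Internal, smallest, leaf_depths and is_valid are made up for illustration, and it assumes that a root which is itself a leaf may hold fewer than ceil[L/2] items.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Leaf:
    items: list      # sorted data items


@dataclass
class Internal:
    keys: list       # keys[i] is the smallest key in children[i + 1]
    children: list   # Leaf or Internal nodes


def smallest(node):
    """Smallest data item stored anywhere below node."""
    while isinstance(node, Internal):
        node = node.children[0]
    return node.items[0]


def leaf_depths(node, depth=0):
    """Set of depths at which leaves occur below node."""
    if isinstance(node, Leaf):
        return {depth}
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths


def is_valid(node, M, L, is_root=True):
    """Check the B-tree properties for an order-M tree with leaf capacity L."""
    if isinstance(node, Leaf):
        low = 1 if is_root else ceil(L / 2)   # assumption: a root leaf may be small
        return low <= len(node.items) <= L and node.items == sorted(node.items)
    low = 2 if is_root else ceil(M / 2)
    if not (low <= len(node.children) <= M):
        return False
    if len(node.keys) != len(node.children) - 1:
        return False
    # Validate the subtrees first so that smallest() never sees an empty leaf.
    if not all(is_valid(c, M, L, is_root=False) for c in node.children):
        return False
    if any(k != smallest(c) for k, c in zip(node.keys, node.children[1:])):
        return False
    # All leaves must sit at the same depth (checked once, from the root).
    return not is_root or len(leaf_depths(node)) == 1
```

For example, with M = 3 and L = 3, a root with keys [5, 10] over leaves [1, 2], [5, 6, 7] and [10, 11] passes, while a tree containing a non-root leaf with a single item fails.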
Why B-trees? Let's look at an example that illustrates the power
of B-trees.
-
We are asked to operate the database of New York resident driving records
-
Let's say there are 10 million records
-
Each record has a 32-byte [32 chars] name key
-
Assume each driving record is 256 bytes
-
Total storage needed: about 2.5 GB (10 million x 256 bytes) => Obviously, we can't
keep everything in memory! We have to store some of it on disk.
-
Let's say that a disk access reads in an 8192-byte data block. Remember
that disk accesses are very slow!
How can a B-tree help?
-
It makes a very wide and short tree. Let's have each node represent
one disk access
-
This reduces the number of disk accesses
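Choose M so that a non-leaf node fits in one 8192-byte disk block:
Each key = 32 bytes
A B-tree node of order M holds M-1 keys = 32(M-1) = 32M - 32 bytes
The node also holds M branches at 4 bytes each = 4M bytes
Total per node = 36M - 32 bytes
8192 >= 36M - 32 ==> M <= 228.4, so M = 228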
Choose L to maximize the number of records stored in one
disk block: 8192 >= 256L ==> L = 32
Each non-leaf node (other than the root) has at least ceil[228/2] = 114 branches
Each leaf has 16-32 records
10 million records => at most 625,000 leaves (10,000,000 / 16)
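The sizing arithmetic can be reproduced in a few lines. This is a sketch of the calculation only. The constant names are made up, and the 4 bytes per branch is read off the "4M bytes" term in the derivation.

```python
from math import ceil

BLOCK_BYTES = 8192        # one disk access reads this much
KEY_BYTES = 32            # name key
BRANCH_BYTES = 4          # assumed size of one branch (child pointer)
RECORD_BYTES = 256        # one driving record
NUM_RECORDS = 10_000_000

# A non-leaf node holds M-1 keys and M branches:
#   KEY_BYTES * (M - 1) + BRANCH_BYTES * M <= BLOCK_BYTES
M = (BLOCK_BYTES + KEY_BYTES) // (KEY_BYTES + BRANCH_BYTES)

# A leaf holds up to L whole records.
L = BLOCK_BYTES // RECORD_BYTES

min_branches = ceil(M / 2)                 # fewest children of a non-root, non-leaf node
min_per_leaf = ceil(L / 2)                 # fewest records in a leaf
max_leaves = NUM_RECORDS // min_per_leaf   # most leaves: every leaf half full

print(M, L, min_branches, min_per_leaf, max_leaves)
# prints: 228 32 114 16 625000
```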
Worst-case tree analysis, i.e. the root has 2 branches (not M):
-
4 levels
-
4 disk accesses ==> if each access is 3 ms, then 12 ms for M=228
-
For a binary tree: 5 sec
-
For a linked list: 8 hr
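The level count and two of the timings can be checked with a short calculation. This is a sketch under the assumptions of the example: root with 2 branches, every other non-leaf node with ceil[M/2] branches, one disk access per level, and a linked-list search that touches every record at one access each. The function name worst_case_levels is made up. The binary tree figure is not recomputed here, since the notes do not state the assumptions behind it.

```python
from math import ceil


def worst_case_levels(max_leaves, M):
    """Tallest possible tree (root counted as level 1) with at most max_leaves leaves."""
    min_branches = ceil(M / 2)
    levels = 2    # a root with leaves directly below it
    leaves = 2    # fewest leaves a tree with this many levels can have
    # Add a level for as long as the thinnest possible tree still fits.
    while leaves * min_branches <= max_leaves:
        leaves *= min_branches
        levels += 1
    return levels


ACCESS_MS = 3
levels = worst_case_levels(max_leaves=625_000, M=228)
btree_ms = levels * ACCESS_MS                            # one access per level
list_hours = 10_000_000 * ACCESS_MS / 1000 / 3600        # scan all records

print(levels, btree_ms, round(list_hours, 1))
# prints: 4 12 8.3
```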
We demonstrated how to perform insertions on B-trees
-
Easy when the leaf node has fewer than L elements
-
Otherwise the leaf is full and we need to split it to re-balance
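The two insertion cases can be sketched at the leaf level. This is a minimal illustration, not the full algorithm from class: insert_into_leaf is a made-up name, a leaf is modeled as a sorted Python list, and updating the parent (which may itself overflow and split) is left out.

```python
from bisect import insort


def insert_into_leaf(leaf, item, L):
    """Insert item into a sorted leaf.

    Returns (left, right). right is None when the item fits; otherwise the
    overfull leaf is split in two and the parent would need a new key,
    right[0], to guide searches to the new leaf.
    """
    insort(leaf, item)            # keep the leaf sorted
    if len(leaf) <= L:
        return leaf, None         # easy case: there was room
    mid = len(leaf) // 2          # L+1 items: each half gets at least ceil(L/2)
    return leaf[:mid], leaf[mid:]
```

With L = 3, inserting 4 into [1, 2, 3] overflows the leaf and yields [1, 2] and [3, 4], both of which satisfy the ceil[L/2] minimum.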
Why does the root have 2-M children? Answer: If we need to
split at the root, the new root has only two children, and the resulting tree
still meets the B-tree criteria.
If we delete things, we may need to combine nodes instead of splitting them to re-balance.
HASHING
lists - O(N)
trees - O(logN)
Can we get O(1)? Yes!
Hash Tables
-
Hash tables can insert, delete, and find in O(1) average time
-
Not so good at findmin() or findmax()
-
Consists of an array of some fixed size
-
Store items based on keys
-
key -> index -> lookup
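The key -> index -> lookup pipeline can be sketched with a fixed-size array. The names TABLE_SIZE, index_of, put and get are made up for illustration, the hash function is deliberately simplistic, and collisions are not handled: a colliding key simply overwrites the slot.

```python
TABLE_SIZE = 11                      # assumed fixed array size
table = [None] * TABLE_SIZE


def index_of(key):
    """key -> index: sum of character codes, reduced to the table size."""
    return sum(ord(ch) for ch in key) % TABLE_SIZE


def put(key, value):
    table[index_of(key)] = (key, value)   # overwrites whatever was in the slot


def get(key):
    """index -> lookup: one array access, no searching."""
    slot = table[index_of(key)]
    if slot is not None and slot[0] == key:
        return slot[1]
    return None
```

Both put and get do a constant amount of work after the hash is computed, which is where the O(1) comes from.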
Hash Function Properties
-
Simple to compute
-
Two distinct keys should get different indices
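A quick comparison shows why the choice of function matters for the second property. This is a sketch with made-up names (sum_hash, poly_hash) and an assumed table size of 101. No function can keep all keys apart once there are more keys than slots, so distinct indices are a goal rather than a guarantee.

```python
TABLE_SIZE = 101   # assumed table size


def sum_hash(key):
    """Simple to compute, but ignores character order."""
    return sum(ord(ch) for ch in key) % TABLE_SIZE


def poly_hash(key):
    """Still simple, but character position now affects the result."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % TABLE_SIZE
    return h


# Anagrams always collide under sum_hash; poly_hash keeps this pair apart.
print(sum_hash("stop") == sum_hash("pots"))     # prints: True
print(poly_hash("stop") == poly_hash("pots"))   # prints: False
```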
Issues in hashing
-
Choose hash function
-
Choose table size
-
Decide what to do when keys map to the same index
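One common answer to the last issue is separate chaining, where each array slot holds a small list of the items that hashed there. The notes do not say which strategy the course uses, so treat this as one possible sketch. The class name ChainedHashTable and its methods are made up, and Python's built-in hash() stands in for the hash function.

```python
class ChainedHashTable:
    def __init__(self, size=11):
        # Fixed-size array; each slot is a list ("chain") of (key, value) pairs.
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: replace its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # new key: extend the chain

    def find(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False
```

Operations stay O(1) on average only while the chains stay short, which is why the table size and the hash function are chosen together.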
Where are hash tables used? CPU caches, etc.
Nhat Minh Dau, nmd13@columbia.edu