COMS4111, Spring 2013
Homework 4

Problem 1 (5 pts, 1.25 pt/question)

Consider a relation R(a,b,c,d,e), where attribute "a" is the primary key, containing 5,000,000 records. The values of all attributes are integers uniformely distributed over the range 0 to 4,999,999, i.e., the size of this integer domain is 5 million. R is stored as a sorted file in R.a order. The pages of the file are not sequential (they are in artitrary disk locations) and linked with each other to facilitate scanning; each page of the file holds 10 records, on average. 

For each of the following queries, state which of the following options (or combination thereof) is most likely to be the best and briefly justify your answer. Your justification should be based on the approximate cost (in number of pages accessed) associated with the chosen option and you will need to consider things like selectivity factors, clustering, and anything else you think it may affect performance.


The queries:

  1. SELECT * FROM R WHERE a < 50,000 AND b < 50,000
  2. SELECT * FROM R WHERE a = 50,000 AND b < 50,000
  3. SELECT * FROM R WHERE a > 50,000 AND b = 50,000
  4. SELECT * FROM R WHERE a = 2,500,000 AND b = 50,000


The options:

  1. Use a clustered B+ tree index on attribute R.a.
  2. Use a hash index on attribute R.a.
  3. Use a composite hash index on attributes (R.a, R.b).
  4. Use an unclustered B+ tree index on attribute R.b.
  5. Just use the sorted file for R directly. 

Problem 2 (5 pts., 2.5 pts/question)

Consider the following schedule S: R3(A) R1(B) W2(B) R1(A) W3(A) C2 C1 C3

  1. Is the above schedule S serializable? If not, why? If yes, show all equivalent serial schedules.
  2. Regardless of your answer in (a) above, is S a valid schedule under the the Strict 2PL? If yes, explain. If not, show the first operation that would not be allowed under the Strict 2PL and explain why.