COMS4111, Spring 2013
Homework 3
- Due date: Monday April 22, 5:00pm.
- You should submit your solutions electronically. Login to CourseWorks,
select the site for our
class, then from the left pane select "Assignments" and then "HW3". Go
to "Submission" section, attach your solutions, and hit the "Submit"
button. Your solutions document must be either a plain text
file or PDF.
Problem 1 (2.8 pts)
You are given a relation R(A,B,C,D) with functional
dependencies F = {A ->
BC, C -> AD}.
- (1.8 pts.) What are all the non-trivial functional
dependencies that follow from the given F? You should only list
non-trivial dependencies. Notice:
(a) if X -> Y, do not list that X -> (any subset of Y);
this means you should find and list the largest Y for which X ->
Y.
(b) if X -> Y, do not list that (any superset of X) -> Y;
this means that you should find and list the smallest X so
that X -> Y.
Finally, state all the keys of relation R.
- (0.25 pts.) Is relation R in BCNF (briefly explain why)?
- (0.25 pts.) Is relation R in 3NF (briefly explain why)?
- (0.25 pts.) Assume we decompose relation R into R1(A,B,C)
and R2(A,D). Show whether this is a lossless join decomposition or not.
- (0.25 pts.) Show whether the above decomposition is
dependency preserving or not.
For help on how to deal with question (a) of this
problem, please
see hw3-help-FDs.
Problem 2 (7.2 pts, 0.8 pts each question)
Assume a table Emp of employee records with columns "name" and
"salary". The
constraint defined on this table is that no two employees have the same
salary. The total size of the table is 34,560MB. The table
(i.e., the
records of the
table) is stored in a heap file in chunks of 2KB blocks (all full of
records) on a single disk drive with
the following characteristics: average seek time is 8msecs,
average rotational delay is 1msecs, and average transfer rate is 1msec
per 2KB block.
- We are about to run a query on this Emp table to find the
name of the employee with a given salary, say 1,000; i.e., in SQL,
"select name from Emp where salary=1000".
In a worst-case scenario, how long this operation will
take? Express your answer in both, number of disk accesses (I/O) and in
hours.
- Assume that in addition to storing the table as described
above, we also have available a B-tree built for this table on salary -
this is the search key of the B-tree. A data entry in the tree is a
pair (salary, RID of a data record in the heap). What is the approximate cost
(in number of disk accesses) of executing the query in (a) if we use
the B-trre index?
- Now assume that we have a hash index on salary for the Emp
table. A data entry in the hash is a pair (salary, RID of a data record
in the heap). What is the approximate
cost (in number of disk accesses) of executing the query in (a) if we
use the hash index?
- We will now ask the same questions as in (a)-(c) but for a
different query. The query now is to find the maximum salary value in
the Emp table; i.e., in SQL, "select max(salary) from Emp".
Assuming no indexing of any kind, i.e., we just have the records of the
table in the heap, what is the cost of this query (in number of disk
accesses)?
- For the query in (d), and assuming there is a B-tree as
described in (b), what is the approximate
cost (in number of disk accesses) of executing the
query if we use the B-trre?
- For the query in (d), and assuming there is a hash index as
described in (c), is the hash index usueful to compute the query? If
no, explain. If yes, i.e., you think it is better to use the hash index
instead of the heap, explain how you do this search and what is the approximate cost
(in number of disk accesses) of executing the query if we use the hash
index?
- We will now ask the same questions as in (a)-(c) and
(d)-(f) but with a different query, actually an update: we want to
insert a new employee record with salary 1,000. Assuming no
indexing of any kind, i.e., we just have the records of the table in
the heap, what is the cost of this operation (in number of disk
accesses)?
- For the operation in (g), and assuming there is a B-tree as
described in (b), what is the approximate
cost (in number of disk accesses) of executing the
operation if we use the B-trre?
- For the query in (g), and assuming there is a hash index as
described
in (c), what is the approximate
cost (in number of disk accesses) of executing the
operation if we use the hash index?
Note 1: Assume the cost of everything else besides disk
accesses is
negligible and, to simplify calculations, assume 1 MB = 1,000 KB.
Note 2: Your answers in cost related questions must be plain
numbers (e.g., 3, 5, 100, 100.56) that include no formulas and/or
computation of any kind (our TAs are not good in aritmetic operations).
For example, the following types of answers will automatically get zero
with no further consideration: 3/5; square root of 5; log base
2 of some N; cube of N square divided by log base 2 of N cube
multiplied by log base 10 N; big O(log(N)); etc. It's ok, of course, to
explain your answer by indicating the operations you performed to come
up with your final (number) answer.