CS3134: HW#4

CS3134 Homework #4
Due on November 11, 2003 at 11:00am

There are two parts to this homework: a written component worth 5 points, and a programming assignment worth 20 points. See the homework submission instructions on how to hand it in and for important notes on programming style and structure.

Note: parts in red are revisions/clarifications.

Written questions

(5 points) You're given the following list of numbers to insert into a binary search tree.
35, 21, 48, 1, 93, 87, 55, 100, 34, 97
1. (1 point) Draw the resulting tree.
2. (2 points) Using the book's convention, draw the tree that results from deleting 93 from the previous tree.
3. (2 points) An alternative convention for handling a two-child delete would be to take the next-smallest node (the inorder predecessor) instead of the next-largest node. If we used this convention when deleting 93, which number would replace it, and what is the general algorithm (explain in 1-2 sentences) for finding the inorder predecessor?

Programming problems

Both of these programming problems work with a spelling dictionary, i.e., you will build data structures and tools to handle large numbers of words (Strings). You will use this file as input for both programs; it's a scrambled version of a "dictionary" of words (without definitions) as distributed with Debian Linux. (To download it in IE/Mozilla/Netscape, right-click and choose Save Target As or Save Link To Disk.) The file contains only words made of alphabetic characters, both upper- and lower-case, one per line. You are to process the file, ignoring case when comparing words but preserving the original case when storing and printing them, and support the operations described below.

(10 points) Use an array-backed list to handle this "dictionary".
1. (7 points) Build an ArrayBackedList class that uses an array to store the words. The constructor for the class must take one parameter: the number of words to be stored, which serves as a capacity property for the array. You must then implement the following methods in your ArrayBackedList:
  1. (1/2 point) public boolean insert(String s): this takes a String and inserts it at the bottom of the (occupied part of the) array. It also updates an object-level variable called longestWord on each insert, so that by the end of input, longestWord contains the length of the longest word. You should return true unless the array is full, in which case return false.
  2. (1/4 point) public String elementAt(int index): this returns the element at the specified index, or null if no such index exists.
  3. (1/4 point) public int size(): this returns the number of elements in the array.
  4. (1 point) public boolean binarySearch(String s): this performs a binary search of the array and returns true if the element exists, or false if the element does not exist. You are to implement this method recursively, i.e., binarySearch should call a private binarySearch that has the necessary parameters to work recursively. Make sure to ignore the case of the strings!
  5. (5 points) public int radixSort(): this does an (iterative) alphabetic radix sort of the array. The strategy is similar to, but not the same as, the one used when sorting numbers. First of all, there will be 27 groups (26 characters plus "too-short" words), not 11.
    
    Second, words aren't "right-aligned", but rather "left-aligned". In other words, you will start the radix sort at the last character position of the longest String, but only words that are that long will be grouped by that character; all other words go into the "zero" group that holds "too-short" words. Later passes then go through every group and sort by the second-to-last character position, the third-to-last, etc., placing each word into the appropriate new group (note that you need a new "set" of groups for every pass!). Once you have processed the "zeroth" character, you will finally have a configuration where there is no data in the "zero" group and the data in the remaining groups is in order. Read the groups starting with the "a" group, and copy the elements back into the array.
    
    You will use a doubly linked-list structure to store each individual group, i.e., you'll use an array of linked lists to store the collection of groups. Rather than modifying the book's linked list, use the LinkedList class supplied by Java in the java.util package. (Note that this is the only java.util data structure you should be using!)
    
    Your radix sort method will return an integer: the number of "operations" in and out of groups. That is, any element inserted into a linked list counts as one operation, and any element read out of a linked list counts as another. Copying from one group into another group counts as two operations. Add all of these up and return the total from the radix sort.
2. (3 points) Implement an ArrayBackedListApp class with a main() method that does the following:
  1. Uses a BufferedReader to read the words from the aforementioned words.txt file into a new instance of ArrayBackedList;
  2. Radix sorts this new instance and prints out the number of group operations the sort reports;
  3. Presents a small user interface (at a ">" prompt) with the following commands (don't worry about invalid input). There are 45,372 words in the file, so you can create a statically sized array for the purposes of this assignment.
    - s word: Searches for the word, and prints out found or not found based on whether the binary search finds the word;
    - d count: Dumps the first count elements to the screen. If count is 0, prints all the elements to the screen;
    - i index: Prints out the element referenced by index. 0 would imply the first element. If no such element exists, prints out not found.
    - q: Quit.
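The grouping rule for the left-aligned radix sort described above can be sketched as follows. This is only an illustration of how a word might be mapped to one of the 27 groups for a given character position, plus a single distribution pass; it is not a complete radixSort and does not count operations. The class name RadixGroupSketch and the methods groupIndex and distribute are hypothetical, and the sketch assumes words contain only the letters a-z and A-Z, as the input file does.

```java
import java.util.Arrays;
import java.util.LinkedList;

// Hypothetical helper class for illustration; none of these names come from the assignment.
final class RadixGroupSketch {
    // Group 0 holds "too-short" words; groups 1..26 hold 'a'..'z'.
    static final int GROUPS = 27;

    // Group for word s when the current pass looks at character position pos
    // (0-based, counted from the left). Case is ignored.
    // Assumes s contains only alphabetic characters a-z / A-Z.
    static int groupIndex(String s, int pos) {
        if (pos >= s.length()) {
            return 0; // the word has no character at this position
        }
        return Character.toLowerCase(s.charAt(pos)) - 'a' + 1;
    }

    // One distribution pass: append every word to the group chosen by groupIndex.
    // Appending at the tail keeps words in their incoming order within a group.
    @SuppressWarnings("unchecked")
    static LinkedList<String>[] distribute(Iterable<String> words, int pos) {
        LinkedList<String>[] groups = new LinkedList[GROUPS];
        for (int g = 0; g < GROUPS; g++) {
            groups[g] = new LinkedList<>();
        }
        for (String w : words) {
            groups[groupIndex(w, pos)].addLast(w);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Pass on position 2: "a" and "be" are too short and land in group 0.
        LinkedList<String>[] groups = distribute(Arrays.asList("cat", "a", "Bob", "be"), 2);
        for (int g = 0; g < GROUPS; g++) {
            if (!groups[g].isEmpty()) {
                System.out.println(g + ": " + groups[g]);
            }
        }
    }
}
```

A full sort would repeat such a pass for each position, from the last character position of the longest word down to position 0, reading the previous pass's groups in order each time.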
(10 points) Use a tree to handle this "dictionary".
1. (8 points) Modify the Tree class in Tree.java (download here; it's the same as the book version, with the TreeApp class and traversal methods thrown out) in the following manner:
  1. (2 points) Change the tree and its respective functions to handle String keys, and no additional data associated with the key. This also means that the find method just returns a boolean indicating whether or not the supplied key was found. Make sure to ignore case when comparing Strings!
  2. (6 points) Make the tree an indexed binary search tree. An indexed BST differs in that it keeps information that enables it to find any arbitrary element without having to do a complete inorder traversal of the tree, e.g., it can find an arbitrary element corresponding to an index in O(log n) time. The first element (in sorted order) would have index zero, and the last element would have index #elements-1, much like an array.
    
    You are to accomplish this by adding a new field to every node in the tree, called leftSize. This field holds the number of nodes in the node's left subtree (so a leaf has a leftSize of zero). Once this is accomplished, the algorithm to find a node given an index is relatively simple:
    - Set a current reference to be the root.
    - If the index we're looking for equals the leftSize of the current node, we've found our node, and we can stop.
    - If the index is less than leftSize, move current to the left child and repeat.
    - If the index is greater than leftSize, change the index we're looking for to index-(leftSize+1), change current to the right child, and repeat.
    - If current becomes null, the element does not exist in the tree.
    Given the aforementioned algorithm, you must add the field to the object, change one method and write one method.
    - (3 points) Modify insert to correctly update leftSize when nodes are inserted. You can do this without adding any iterative or recursive loops. (You don't need to worry about delete here.)
    - (3 points) Write a method called elementAt that takes one parameter (the index) and returns a String that corresponds to the data at that index. Return null if the index doesn't exist.
2. (2 points) Create a new class called TreeApp, based on the code in ArrayBackedListApp but modified to handle the tree as designed above. The user interface should remain the same (although we aren't doing a radix sort or counting group operations here).
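The index lookup steps listed above can be traced on a small example. The sketch below walks a hand-built five-node tree whose leftSize values were filled in manually; it does not show how insert maintains leftSize, and it is not based on the Tree.java you are asked to modify. The class IndexedLookupSketch, its Node class, and sampleTree are hypothetical names used only for this illustration.

```java
// Hypothetical illustration of the lookup-by-index walk; not the course's Tree class.
final class IndexedLookupSketch {
    static final class Node {
        final String key;
        final int leftSize; // number of nodes in this node's left subtree
        Node left, right;

        Node(String key, int leftSize) {
            this.key = key;
            this.leftSize = leftSize;
        }
    }

    // Follows the steps from the text: compare index with leftSize, go left,
    // or subtract (leftSize + 1) and go right. Returns null if no such index.
    static String elementAt(Node root, int index) {
        Node current = root;
        while (current != null) {
            if (index == current.leftSize) {
                return current.key;
            } else if (index < current.leftSize) {
                current = current.left;
            } else {
                index -= current.leftSize + 1;
                current = current.right;
            }
        }
        return null;
    }

    // Hand-built tree holding, in sorted order: ant, bee, cat, dog, eel.
    static Node sampleTree() {
        Node root = new Node("cat", 2);       // left subtree: ant, bee
        root.left = new Node("ant", 0);
        root.left.right = new Node("bee", 0);
        root.right = new Node("eel", 1);      // left subtree: dog
        root.right.left = new Node("dog", 0);
        return root;
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        for (int i = 0; i <= 5; i++) {
            System.out.println(i + " -> " + elementAt(root, i));
        }
    }
}
```

For index 3 in this tree, the walk starts at cat (leftSize 2), moves right with the index reduced to 0, moves left from eel (leftSize 1), and stops at dog.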
(6 points) Extra credit: If you do any of this, make sure to clearly indicate you've done so in your README.
1. (3 points) In programming problem 2(a)(ii), we might also want to support deletes. Modify delete to correctly update leftSize. Then, update the TreeApp to support a delete operation (del word; if successful, it says nothing, otherwise it says not found).
2. (3 points) You may have observed that, as stated above, radix sort is rather inefficient -- we've got a few long words for which we have to keep on scanning through lots and lots of short words. Radix sort is best when we have words with similar length, not with such a heterogeneous collection as you might find in a dictionary. However, there is a modification that will make radix sort faster with a spelling dictionary:
  1. First, create a set of groups that are arranged by length. You'll have m groups, one for words of each possible length (where m is bounded by the maximum length over all the words). In the first pass, you will walk through the list of words and place each one into one of these m groups based on its length.
  2. Now, as you do the radix sort, start with the mth character by grabbing all the words from the group that holds words that are m characters long, and put them into the alphabetically-sorted groups. (Note that you will no longer need the "0" group, although you're welcome to leave it alone.) After that's done, continue to the (m-1)th character: grab the words from the (m-1)-length group and combine them with the words in the alphabetically-sorted groups. Repeat this process until we reach (and finish) the length-1 words, at which point the alphabetic groups will have all the words sorted. Make sure to update the code that counts the number of group operations: a read or write from any kind of group should add to this total.
  If you choose to do this, make sure to implement it in a separate method (call it smartRadixSort), and modify your ArrayBackedListApp code to load the words into two separate arrays, sort each of them, and display the number of group operations for each. (Remaining operations can use just the first array as specified earlier in the homework.)
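The first pass of this modification, placing words into groups by length, might look like the sketch below. It covers only that initial grouping step, not smartRadixSort itself, and it does not count group operations. The class LengthGroupSketch and the method groupByLength are hypothetical names, and the sketch assumes the caller already knows the maximum word length (for example, from longestWord).

```java
import java.util.LinkedList;

// Hypothetical illustration of the by-length grouping pass; not a full smartRadixSort.
final class LengthGroupSketch {
    // Returns an array in which index len holds the words of exactly that length,
    // in their original order. Assumes no word is longer than maxLength.
    @SuppressWarnings("unchecked")
    static LinkedList<String>[] groupByLength(String[] words, int maxLength) {
        LinkedList<String>[] byLength = new LinkedList[maxLength + 1];
        for (int len = 0; len <= maxLength; len++) {
            byLength[len] = new LinkedList<>();
        }
        for (String w : words) {
            byLength[w.length()].addLast(w);
        }
        return byLength;
    }

    public static void main(String[] args) {
        String[] words = {"bee", "a", "to", "cat"};
        LinkedList<String>[] byLength = groupByLength(words, 3);
        for (int len = 1; len < byLength.length; len++) {
            System.out.println(len + ": " + byLength[len]);
        }
    }
}
```

Index 0 is left unused here so that a word's length can serve directly as its group index; an array of exactly m groups with an offset of one would work equally well.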