_______ | o L o | ------- / \ / \ / \ _______ _______ | o H o | | o P o | ------- ------- / \ / \ _______ _______ _______ _______ | o D o | | o J o | | o N o | | o R o | ------- ------- ------- ------- / \ / \ / \ / \ ___________ ___________ _______ _______ _______ _______ _______ _______ | o A o B o | | o E o F o | | o I o | | o K o | | o M o | | o O o | | o Q o | | o S o | ----------- ----------- ------- ------- ------- ------- ------- -------
0 | 1 | 4371 2 | 3 | 1323 -> 6173 4 | 4344 5 | 6 | 7 | 8 | 9 | 4199 -> 9679 -> 1989
0 | 9679 1 | 4371 2 | 1989 3 | 1323 4 | 6173 5 | 4344 6 | 7 | 8 | 9 | 4199
0 | 9679 1 | 4371 2 | 3 | 1323 4 | 6173 5 | 4344 6 | 7 | 8 | 1989 9 | 4199
0 | 1 | 4371 2 | 3 | 1323 4 | 6173 5 | 9679 6 | 7 | 4344 8 | 9 | 4199
When rehashing, we choose a table size that is roughly twice as large and prime. In our case, the appropriate new table size is 19, with hash function h(x) = x(mod 19).
0 | 4199 1 | 4371 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9679 9 | 10| 11| 12| 1323 -> 4344 13| 1989 14| 15| 16| 17| 6173 18|
0 | 4199 1 | 4371 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9679 9 | 10| 11| 12| 1323 13| 1989 14| 4344 15| 16| 17| 6173 18|
0 | 4199 1 | 4371 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9679 9 | 10| 11| 12| 1323 13| 1989 14| 15| 16| 4344 17| 6173 18|
0 | 4199 1 | 4371 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9679 9 | 10| 11| 12| 1323 13| 1989 14| 15| 4344 16| 17| 6173 18|
10
After inserting 12:
10 / 12
After inserting 1:
1 / \ 12 10
After inserting 14:
1 / \ 12 10 / 14
After inserting 6:
1 / \ 6 10 / \ 14 12
After inserting 5:
1 / \ / \ 6 5 / \ / 14 12 10
After inserting 8:
1 / \ / \ 6 5 / \ / \ 14 12 10 8
After inserting 15:
1 / \ / \ 6 5 / \ / \ 14 12 10 8 / 15
After inserting 3:
1 / \ / \ 3 5 / \ / \ 6 12 10 8 / \ 15 14
After inserting 9:
1 / \ / \ / \ 3 5 / \ / \ / \ / \ 6 9 10 8 / \ / 15 14 12
After inserting 7:
1 / \ / \ / \ / \ 3 5 / \ / \ / \ / \ 6 7 10 8 / \ / \ 15 14 12 9
After inserting 4:
1 / \ / \ / \ / \ 3 4 / \ / \ / \ / \ 6 7 5 8 / \ / \ / 15 14 12 9 10
After inserting 11:
1 / \ / \ / \ / \ 3 4 / \ / \ / \ / \ 6 7 5 8 / \ / \ / \ 15 14 12 9 10 11
After inserting 13:
1 / \ / \ / \ / \ 3 4 / \ / \ / \ / \ 6 7 5 8 / \ / \ / \ / 15 14 12 9 10 11 13
After inserting 2 (resulting heap):
1 / \ / \ / \ / \ 3 2 / \ / \ / \ / \ 6 7 5 4 / \ / \ / \ / \ 15 14 12 9 10 11 13 8
10 / \ / \ / \ / \ 12 1 / \ / \ / \ / \ 14 6 5 8 / \ / \ / \ / \ 15 3 9 7 4 11 13 2
After percolateDown(7)
:
10 / \ / \ / \ / \ 12 1 / \ / \ / \ / \ 14 6 5 2 / \ / \ / \ / \ 15 3 9 7 4 11 13 8
After percolateDown(6)
:
10 / \ / \ / \ / \ 12 1 / \ / \ / \ / \ 14 6 4 2 / \ / \ / \ / \ 15 3 9 7 5 11 13 8
After percolateDown(5)
:
10 / \ / \ / \ / \ 12 1 / \ / \ / \ / \ 14 6 4 2 / \ / \ / \ / \ 15 3 9 7 5 11 13 8
After percolateDown(4)
:
10 / \ / \ / \ / \ 12 1 / \ / \ / \ / \ 3 6 4 2 / \ / \ / \ / \ 15 14 9 7 5 11 13 8
After percolateDown(3)
:
10 / \ / \ / \ / \ 12 1 / \ / \ / \ / \ 3 6 4 2 / \ / \ / \ / \ 15 14 9 7 5 11 13 8
After percolateDown(2)
:
10 / \ / \ / \ / \ 3 1 / \ / \ / \ / \ 12 6 4 2 / \ / \ / \ / \ 15 14 9 7 5 11 13 8
After percolateDown(1)
(resulting heap):
1 / \ / \ / \ / \ 3 2 / \ / \ / \ / \ 12 6 4 8 / \ / \ / \ / \ 15 14 9 7 5 11 13 10
After the first deleteMin
:
2 / \ / \ / \ / \ 3 4 / \ / \ / \ / \ 6 7 5 8 / \ / \ / \ / 15 14 12 9 10 11 13
After the second deleteMin
:
3 / \ / \ / \ / \ 6 4 / \ / \ / \ / \ 13 7 5 8 / \ / \ / \ 15 14 12 9 10 11
After the third deleteMin
(resulting heap):
4 / \ / \ / \ / \ 6 5 / \ / \ / \ / \ 13 7 10 8 / \ / \ / 15 14 12 9 11
deleteMin
:
2 / \ / \ / \ / \ 3 4 / \ / \ / \ / \ 12 6 5 8 / \ / \ / \ / 15 14 9 7 10 11 13
After the second deleteMin
:
3 / \ / \ / \ / \ 6 4 / \ / \ / \ / \ 12 7 5 8 / \ / \ / \ 15 14 9 13 10 11
After the third deleteMin
(resulting heap):
4 / \ / \ / \ / \ 6 5 / \ / \ / \ / \ 12 7 10 8 / \ / \ / 15 14 9 13 11
PROBLEM 1
File README:
README Data Structures and Algorithm Analysis in JAVA (W3139-02) Professor: Jason Nieh Homework #3: HuffmanCoding project Submitted by: Changbin Wang ================================================================================ Files submitted: -------------------------------------------------------------------------------- For the Huffman code project: ./HuffmanCoding: HuffmanCoding.java -- Main class of the Huffman coding project HuffmanTree.java -- Structure for holding Huffman tree HuffmanQueue.java -- Priority queue for constructing Huffman tree HuffmanNode.java -- Nodes for Huffman Tree CodeTable.java -- Code table for storing the code value for chars CodeTableItem.java -- Item for code table PrintStack.java -- Print tool for word wrapping Comparable.java(*) Overflow.java(*) result.log -- Test runs (* indicate that the class codes are copied directly from DataStructures package) To compile: At CUNIX prompt(in ./HuffmanCoding directory): javac *.java To run: At CUNIX prompt(in ./HuffmanCoding directory): java HuffmanCoding Log File: Contains several test runs of the program ================================================================================ This project will create a Huffman code for any given message file. This given message file should be in ASCII data format. The following function realized in this project: 1 -- Generate a frequency table for the given file by calculating the occurance of characters in the file; The program will display the frequency table upon user selection; 2 -- Construct a Huffman tree for the given file. The frequency table will be convert into a priority queue according to the frequencies of characters; the priority queue will then be used to construct the Huffman tree. The program will display the Huffman tree upon user selection; 3 -- Generate a code table by traversing the Huffman tree. The code is generated when traversing the tree recursively. Note that in the code table, the code of a character is stored as a char value, this value will be transformed into a BitSet bit pattern. The user will be prompted to display the code table; 4 -- Generate the bit pattern (BitSet) of the input file. The bit pattern of the given file is generated from the code table and will be displayed upon user selection; 5 -- Given a bit pattern (BitSet) of the given file, the program will translate it back to its original message. This function is useful when try to use Huffman coding to compress ascii datafile. ================================================================================ Note Points: a -- If the given datafile is not ended by a CRNL combination (EOF), the program will add such a combination to the Huffman tree (the file is not changed). An EOF is read as two distinct character: a carriage return and a new line feed, i.e., when calculating the character number, an EOF is taken as two characters (although it is OK to take then as just one character). b -- The program will work for any file with different size. Characters with value lower than 33 or more than 166 is NAMED as the following: ASCII Name ================ 10 NL 13 CR 32 SB other CC c -- The EOF of a file will be displayed in the result as "EOF" or "" and will be placed in the frequency table with index 0.
File HuffmanCoding.java:
import java.io.*; import java.util.BitSet; /** * HuffmanCoding class, the main class for the Huffman coding project */ public class HuffmanCoding { public static void main (String [] args) { BufferedReader in = new BufferedReader( new InputStreamReader(System.in)); try { // Prompt for file name and create a file object System.out.print("Please input the text file name: "); String fileName = in.readLine(); System.out.println(); File f = new File(fileName); // Exit the program if the file cannot be found if (!f.exists() ) { System.err.println("Error: File not found!"); System.exit(1); } // New instance of a HuffmanCoding class HuffmanCoding hc = new HuffmanCoding(f); // Print out the frequency table System.out.print("Print out the frequency table (y/n)?"); String yOrn = in.readLine(); System.out.println(); if(yOrn.charAt(0) == 'Y' || yOrn.charAt(0) == 'y') { hc.printOriginMessage(f); System.out.println(); hc.printFreqs(); } System.out.println(""); HuffmanTree ht = new HuffmanTree(hc.freqs, hc.charCounter); System.out.println(""); // Print the Huffman tree System.out.print("Print out the Huffman tree (y/n)?"); yOrn = in.readLine(); System.out.println(); if(yOrn.charAt(0) == 'Y' || yOrn.charAt(0) == 'y') { ht.printHuffmanTree(); } System.out.println(""); System.out.println(""); // Print the code table System.out.print("Print out the code table(y/n)?"); yOrn = in.readLine(); System.out.println(); if(yOrn.charAt(0) == 'Y' || yOrn.charAt(0) == 'y') { hc.printOriginMessage(f); ht.printCodeTable(); } System.out.println(""); System.out.println(""); // Print the bit pattern of the input message System.out.print("Print the bit pattern for the input file (y/n)?"); yOrn = in.readLine(); System.out.println(); BitSet bs = hc.encode(f, ht); // The logical size of a BitSet; used only for JDK1.2 //int bssize = bs.length(); // The logical size of a BitSet; used for JDK1.1.* int bssize = CodeTable.lengthOf(bs); if(yOrn.charAt(0) == 'Y' || yOrn.charAt(0) == 'y') { System.out.println("Here is the coded message:\n"); hc.printBitPattern(bs, 32); System.out.println("\n\n"+(f.length()+1)+" characters encoded in "+ (bssize-1)+ " bits, "+ ((float)(bssize-1))/f.length() + " bits per char."); } System.out.println(""); System.out.println(""); // Translate the bit pattern to ascii message System.out.print("Translate the bit pattern back to its original form(y/n)?"); yOrn = in.readLine(); System.out.println(); if(yOrn.charAt(0) == 'Y' || yOrn.charAt(0) == 'y') { System.out.println("DECODED MESSAGE:\n"); String ds = hc.decode(bs, ht); System.out.println(ds); System.out.println(); System.out.println("Note: <EOF> indicates the EOF of the input message file."); } System.out.println(""); } catch (Exception e) // Catch all the exceptions { System.out.println(e); } } /** * Constructor for HuffmanCoding * @param txtFile the file to be coded */ public HuffmanCoding(File txtFile) { this.txtFile = txtFile; freqs = new int[256]; for( int i = 0; i < 256; i++) freqs[i] = 0; charCounter = 0; try { // Get file size int fileSize = (int)txtFile.length(); // Include EOF in the file size fileSize ++; // Open the fileinput stream FileInputStream in = new FileInputStream(txtFile); int index = -1; for(int i = 0; i < fileSize; i++) { // Read the file byte by byte index = (int)in.read(); // Read the EOF simbol and put into freqs[0] if (index == -1) { if(freqs[0] == 0) charCounter ++; freqs[0] ++; } else if (index > 0 && index < 256) { if (freqs[index] == 0) charCounter ++; // Put the byte into the array freqs[index]++; } } // Report erro if NULL appears in the file; should never happen if(freqs[0] != 1) System.out.println("There should not be a " + "NULL character and only one EOF character in the file! " + freqs[0]); } catch (IOException e) { System.out.println("Error: Cannot read file!"); } } /** * Print out the original message */ public void printOriginMessage(File f) { System.out.println("ORIGINAL INPUT MESSAGE:"); try { // Get file size int fileSize = (int)f.length(); // Open the fileinput stream FileInputStream in = new FileInputStream(f); char [] b = new char[fileSize]; for(int i=0;i<fileSize;i++) b[i] = (char)in.read(); // Output the message in the file and a <EOF> indicating // the EOF character of the file System.out.print(String.copyValueOf(b, 0, fileSize)+"<EOF>"); System.out.println(); } catch (IOException e) { System.out.println("Error: Cannot read file!"); } } /** * Print out the frequency table of the file */ public void printFreqs() { System.out.println(" Frequency Table "); System.out.println("=============================="); System.out.println("CHARACTER | ASCII | FREQUENCY "); System.out.println("=============================="); for(int i = 0; i < 256; i++) { if(freqs[i] > 0) { char c = (char)i; String s = String.valueOf(c); if( i < 33 || i > 126) { if( i == 10 ) s = "NL"; else if( i == 13 ) s = "CR"; else if( i == 32 ) s = "SB"; else if( i == 0 ) s = "EOF"; else s = "CC"; } String sp1 = space(3); String sp2; if( i > 0 ) sp2 = space(13-lenOfInt(i)-s.length()); else sp2 = space(13-5-s.length()); String sp3 = space(9-lenOfInt(freqs[i])); if( i > 0 ) System.out.print(sp1 + s + sp2 + i + sp3 + freqs[i]); else System.out.print(sp1 + s + sp2 +"-1(*)"+ sp3 + freqs[0]); System.out.println(); } } System.out.println(); System.out.println("* Not indicate the real ASCII code"); } /** * Return the BitSet of a text file * @param f, the file to be coded * @param ht, the Huffman tree for the file */ public BitSet encode( File f, HuffmanTree ht ) { return ht.encode(f); } /** * Return the ascii string for the bit pattern * @param bs, the bitset * @param ht, the Huffman tree */ public String decode( BitSet bs, HuffmanTree ht ) { return ht.decode(bs); } /** * Print the bitSet in a bit pattern * @param bs, the BitSet * @param lineWidth, the line width when printing with word wrapped manner */ public void printBitPattern( BitSet bs , int lineWidth) { PrintStack ps = new PrintStack(lineWidth); // The logical size of a BitSet; used only in JDK1.2 //int n = bs.length(); // The logical size of a BitSet; used only in JDK1.1.* int n = CodeTable.lengthOf(bs); if(!bs.get(n-1)) System.err.println("Wrong code!"); else for(int i = n - 2; i >= 0; i--) ps.printWord(bs.get(i)?"1":"0"); } File txtFile; // The file to be coded int [] freqs; // Integer array to store character frequencies int charCounter; // Number of different characters /** * Used when print the frequency table */ private String space( int i ) { char [] c = new char[i]; for (int j = 0; j< i; j++) c[j]=' '; return String.copyValueOf(c); } /** * Used when print the frequency table */ private int lenOfInt( int i) { int tmp = i; int len = 0; if(i == 0) return 1; while(tmp > 0) { tmp /= 10; len ++; } return len; } }
File HuffmanTree.java:
import java.io.*; import java.util.BitSet; public class HuffmanTree { /** * Construct a Huffman tree using a priority queue * @param freqs the frequency table * @param charCounter the unique character number in the frequency table */ public HuffmanTree (int [] freqs, int charCounter) { this.freqs = freqs; if (charCounter <= 0) { System.err.println ("Empty input file!"); System.exit(1); } else { // The priority queue used to construct a huffman tree hQueue = new HuffmanQueue(charCounter); try { // Insert elements in the frequency table into the // priority queue. The elements of the queue is Huffman nodes for (int i = 0; i < 256; i++) if (freqs[i] > 0) hQueue.insert( new HuffmanNode(freqs[i], null, (char)i)); HuffmanNode item1, item2; for (int i = 0; i < charCounter; i++) { item1 = hQueue.deleteMin(); if (hQueue.isEmpty()) { // Remember the root of the Huffman tree root = item1; break; } item2 = hQueue.deleteMin(); // Insert the new formed Huffman tree into the priority queue hQueue.insert ( new HuffmanNode(item1.freq + item2.freq, item1, item2)); } } catch (Overflow e) { System.err.println("The queue is full!"); } } codeTable = traverse(); } /** * Return the bit pattern for the file to be encoded * @param f the file to be encoded */ public BitSet encode( File f ) { int codeLength = codeTable.totalCodeLength(); BitSet bs = new BitSet(); bs.set(codeLength); try { // Get file size int fileSize = (int)f.length() +1; // Open the fileinput stream FileInputStream in = new FileInputStream(f); int bitIndex = codeLength - 1; for(int i = 0; i < fileSize; i++) { // Read the file byte by byte int b = (int)in.read(); char c = (b>=0)? (char)b: (char)0; BitSet cbs = codeTable.transCode(c); //The logical size of BitSet; used for JDK1.2 //int n = cbs.length(); //The logical size of BitSet; used for JDK1.1.* int n = CodeTable.lengthOf(cbs); for(int j = n - 2; j >= 0; j--) { if (cbs.get(j)) bs.set(bitIndex --); else bitIndex --; } } } catch (IOException e) { System.out.println("Error: Cannot read file!"); } return bs; } /** * Decoding the given bit pattern into ascii string using the Huffman tree * @param bs the BitSet to be decoded * @return the decoded string */ public String decode( BitSet bs ) { String s = ""; // Logical size of a Bitset; used for JDK1.2 //int n = bs.length(); // Logical size of a Bitset; used for JDK1.2 int n = CodeTable.lengthOf(bs); if( n > 0 && root == null) { System.err.println("Wront input or Huffman tree!"); System.exit(1); } else if (root == null) { System.out.println("Empty input and Huffman tree!"); System.exit(1); } HuffmanNode hn = root; int bitIndex = n - 2; while(bitIndex >= -1) { HuffmanNode hn1 = (bitIndex >-1)?searchThroughPath(hn, bs.get(bitIndex)): searchThroughPath(hn, true); if(hn1 == null) { char tc = hn.getChar(); String ts = (((int)tc) >0)?String.valueOf(tc):"<EOF>"; s += ts; hn = root; } else { hn = hn1; bitIndex --; } } return s; } /** * Print out the Huffman tree */ public void printHuffmanTree() { printHuffmanTree(root, 0); } /** * Print out the code table */ public void printCodeTable() { codeTable.printTable(); } private HuffmanQueue hQueue; private CodeTable codeTable; private HuffmanNode root; private int [] freqs; /** * Return a code table by traverse the Huffman tree * @return the code table */ private CodeTable traverse() { CodeTable ct = new CodeTable(); traverse( root, ct, (char)0, 0); return ct; } /** * Internal method to traverse the tree in order to get a code table * @param t the tree node * @param ct the code table to be generated * @param code the code of a character * @param len the length of the code (bit number) */ private void traverse( HuffmanNode t, CodeTable ct, char code, int len) { if (t.left == null) { char c = t.getChar(); ct.put( c, new CodeTableItem( code, len, freqs[(int)c] )); } else { code <<= 1; len ++; traverse (t.left, ct, code, len); traverse ((HuffmanNode)t.right, ct, (char)((int)code | 1), len); } } /** * Internal method to determine the path of a bit pattern when decoding a * bit pattern back into a ascii string * @param t the HuffmanNode indicating the tree node * @param right the direction switch * @return the next tree node along the path; null if t is a leaf */ private HuffmanNode searchThroughPath( HuffmanNode t, boolean right) { HuffmanNode hn; if( t.left != null && right ) hn = (HuffmanNode)t.right; else if( t.left != null && !right) hn = t.left; else hn = null; return hn; } /** * Internal method to print out the tree by recursive * @param t the node * @param indent the space before print out the node */ private void printHuffmanTree( HuffmanNode t, int indent ) { if (t == null ) return; HuffmanNode tmpright; String s = ""; if (t.left == null) tmpright = null; else tmpright = (HuffmanNode)t.right; printHuffmanTree(tmpright, indent + 4); for( int i=0; i<indent; i++) System.out.print(" "); if(t.left == null) { char c = t.getChar(); s = "("+String.valueOf(c)+")"; if( (int)c < 33 || (int)c > 126) { if( (int)c == 10 ) s = "(NL)"; else if( (int)c == 0 ) s = "(EOF)"; else if( (int)c == 13 ) s = "(CR)"; else if( (int)c == 32 ) s = "(SB)"; else s = "(CC)"; } } System.out.println(t.freq+s); printHuffmanTree(t.left, indent+4); } }
File HuffmanQueue.java:
/** * Implements a priority queue with binary heap. * Note that all "matching" is based on the compareTo method. * @author Changbin Wang */ public class HuffmanQueue { /** * Construct the priority queue with default capacity */ public HuffmanQueue() { this(DEFAULT_CAPACITY); } /** * Construct the priority queue. * @param capacity the capacity of the priority queue. */ public HuffmanQueue( int capacity ) { currentSize = 0; array = new HuffmanNode[ capacity + 1 ]; } /** * Insert into the priority queue, maintaining heap order. * Duplicates are allowed. * @param x the item to insert. * @exception Overflow if container is full. */ public void insert( HuffmanNode x ) throws Overflow { if( isFull( ) ) throw new Overflow( ); // Percolate up int hole = ++currentSize; for( ; hole > 1 && x.compareTo( array[ hole / 2 ] ) < 0; hole /= 2 ) array[ hole ] = array[ hole / 2 ]; array[ hole ] = x; } /** * Find the smallest item in the priority queue. * @return the smallest item, or null, if empty. */ public HuffmanNode findMin( ) { if( isEmpty( ) ) return null; return array[ 1 ]; } /** * Remove the smallest item from the priority queue. * @return the smallest item, or null, if empty. */ public HuffmanNode deleteMin( ) { if( isEmpty( ) ) return null; HuffmanNode minItem = findMin( ); array[ 1 ] = array[ currentSize-- ]; percolateDown( 1 ); return minItem; } /** * Establish heap order property from an arbitrary * arrangement of items. Runs in linear time. */ private void buildHeap( ) { for( int i = currentSize / 2; i > 0; i-- ) percolateDown( i ); } /** * Test if the priority queue is logically empty. * @return true if empty, false otherwise. */ public boolean isEmpty( ) { return currentSize == 0; } /** * Test if the priority queue is logically full. * @return true if full, false otherwise. */ public boolean isFull( ) { return currentSize == array.length - 1; } /** * Make the priority queue logically empty. */ public void makeEmpty( ) { currentSize = 0; } private static final int DEFAULT_CAPACITY = 256; private int currentSize; // Number of elements in heap private HuffmanNode [ ] array; // The heap array /** * Internal method to percolate down in the heap. * @param hole the index at which the percolate begins. */ private void percolateDown( int hole ) { int child; HuffmanNode tmp = array[ hole ]; for( ; hole * 2 <= currentSize; hole = child ) { child = hole * 2; if( child != currentSize && array[ child + 1 ].compareTo( array[ child ] ) < 0 ) child++; if( array[ child ].compareTo( tmp ) < 0 ) array[ hole ] = array[ child ]; else break; } array[ hole ] = tmp; } }
File HuffmanNode.java:
public final class HuffmanNode implements Comparable { /** * Construct a HuffmanNode with two HuffmanNodes as leaves * @param freq the priority number when used in priority queue * @param left the left leaf * @param right the right leaf */ public HuffmanNode(int freq, HuffmanNode left, HuffmanNode right) { this.freq = freq; this.left = left; this.right = right; } /** * Construct a HuffmanNode with a HuffmanNode and a character as leaves * @param freq the priority number when used in priority queue * @param left the left leaf, this leaf should always be null * @param c the charactor as right leaf */ public HuffmanNode(int freq, HuffmanNode left, char c) { this.freq = freq; this.left = left; this.right = new Character(c); } /** * Return the character stored in this node when this node is a leaf * @return the character stored in this node */ public char getChar() { return ((Character)right).charValue(); } /** * Implement the interface method of Comparable * @param rhs the right hand side to be compared * @return the value of comparision */ public int compareTo( Comparable rhs ) { return freq < ((HuffmanNode)rhs).freq ? -1 : freq == ((HuffmanNode)rhs).freq ? 0 : 1; } int freq; HuffmanNode left; Object right; }
File CodeTable.java:
import java.io.*; import java.util.BitSet; /** * CodeTable class for holding the code table */ public final class CodeTable { /** * Construct a code table */ public CodeTable() { // Initialize the array the number of ASCII characters array = new CodeTableItem [256]; } /** * Put a character and its code into the code table * @param c the character * @param code the code information of the char */ public void put ( char c, CodeTableItem code) { if ((int)c >= 256) System.out.println("Wrong index"); else array [(int)c] = code; } /** * Return the code information of a character * @param c the char to be searched * @return the code information of the char */ public CodeTableItem get( char c ) { if ((int)c >=256) return null; else return array[ (int)c ]; } /** * Return the total code length for a encoded file * @return teh total code length */ public int totalCodeLength() { int sum = 0; for(int i = 0; i < 256; i++) if(array[i] != null) sum += array[i].len * array[i].freq; return sum; } /** * Print a bit pattern of a given character * @param c the char to be printed */ public void printCode( char c ) { BitSet bs = transCode(c); // The logical size of the bitset; used only for JDK1.2 //int n = bs.length(); // The logical size of the bitset; used only for JDK1.1.* int n = lengthOf(bs); if(!bs.get(n-1)) System.err.println("Wrong code!"); else { for(int i = n - 2; i >= 0; i--) { if (bs.get(i)) System.out.print("1"); else System.out.print("0"); } } } /** * Transform a code into bit pattern for a given char * @param c the character * @return the BitSet bit pattern */ public BitSet transCode( char c ) { CodeTableItem codeItem = get( c ); int len = codeItem.len; char code = codeItem.code; BitSet bs = new BitSet(); bs.set(len); transCode(bs, (int)code, 0); return bs; } /** * Print out the whole table */ public void printTable() { System.out.println(" Code Table "); System.out.println("===================================================="); System.out.println("CHARACTER | ASCII | FREQUENCY | BIT-PATTERN | LENGTH"); System.out.println("===================================================="); for(int i = 0; i < 256; i++) { if(array[i] != null) { char c = (char)i; String s = String.valueOf(c); CodeTableItem cti = array[i]; char code = cti.code; int len = cti.len; int freq = cti.freq; if( i < 33 || i > 126) { if( i == 10 ) s = "NL"; else if( i == 13 ) s = "CR"; else if( i == 32 ) s = "SB"; else if(i == 0 ) s = "EOF"; else s = "CC"; } String sp1 = space(3); String sp2; if( i > 0 ) sp2 = space(13-lenOfInt(i)-s.length()); else sp2 = space(13-5-s.length()); String sp3 = space(9-lenOfInt(freq)); String sp4 = space(16-len); String sp5 = space(7); if( i > 0 ) System.out.print(sp1 + s + sp2 + i + sp3 + freq + sp4); else System.out.print(sp1 + s + sp2 +"-1(*)"+ sp3 + freq + sp4); printCode(c); System.out.println(sp5 + len); } } System.out.println(); System.out.println("* Not indicate the real ASCII code"); } private CodeTableItem [] array; /** * Internal method for transforming a code into bit pattern * @param bs the generated bitset * @param code the code in the table, a char datatype * @param index the bit index in the bitset */ private void transCode(BitSet bs, int code, int index) { if(code % 2 == 1) { bs.set(index); index ++; } else if (code % 2 == 0) { index ++; } if (code <= 1) return; transCode(bs, code/2, index); } /** * Used for print out the table */ private String space( int i ) { char [] c = new char[i]; for (int j = 0; j< i; j++) c[j]=' '; return String.copyValueOf(c); } /** * Used for print out the table */ private int lenOfInt( int i) { int tmp = i; int len = 0; if(i == 0) return 1; while(tmp > 0) { tmp /= 10; len ++; } return len; } /** * Used for calculate the logical size of a BitSet * Used when using Java JDK version lower than 1.2 * @param bs the bit set * @return the logical size of a bitset */ public static int lengthOf(BitSet bs) { int len; int size = bs.size(); len = size; while(!bs.get(size--)) len --; return len + 1; } }
File PrintStack.java:
/** * Class PrintStack mimic a stack to print word list or sentence * in word wrapping manner */ public class PrintStack { int lineWidth; // The width of a line int currentSize; // Used to count the number of output charaters /** * Construct the class * @param n the line width */ public PrintStack(int n) { lineWidth = n; currentSize = 0; } /** * Method used to print out word list with word wrapping feature * @param str the word to print out */ public void printWord (String str) { int n = str.length(); // The length of the word // Output word if this line can make room for it if((currentSize + n) <= lineWidth) { currentSize += n; // Increment the current size of the line // by the word length System.out.print(str); } // Else start a new line and set outputed charactor length to the length of word else { System.out.println(); currentSize = n; System.out.print(str); } } /** * Method used to print out sentences with word wrapping feature * @param str the sentence to be output */ public void printSentence (String str) { int n = str.length(); // The length of the sentence int nbeg = 0; // The beginning index in str of each new line // initialized to 0 // Output new lines if not finish while(nbeg < n) { int nend = nbeg+lineWidth; // Index of end index of a line in sentence nend = (nend>=n)?n:nend; // Set end index to the length of sentence // if this index exceed sentence length // If the last charater of the line is not space" // back track the characters to find a space for(int i= nend-1; i>0 && ((str.charAt(i)>64 && str.charAt(i)<91)|| (str.charAt(i)>96 && str.charAt(i)<123)); i--) nend -= 1; // Output the line according to the start and end indice System.out.println(str.substring(nbeg, nend)); // Update the begining point of the new output line nbeg = nend; } } }
PROBLEM 2
File README:
README Data Structures and Algorithm Analysis in JAVA (W3139-02) Professor: Jason Nieh Homework #3: Concordance project Submitted by: Changbin Wang ================================================================================ Files submitted: -------------------------------------------------------------------------------- For the Concordance project: ./Concordance: Condordance.java -- Main class of the project CitationList.java -- Word and sentence info list in separate chaining CitationListItr.java -- Linked list iterator for CitationList CitationListNode.java -- Node for CitationList CitationPoint.java -- Structure to store sentence information Comparable.java(*) Hashable.java(*) JavaHashtable.java -- Symbol table implemented using Java Hashtable JHTableItem.java -- Table item for JavaHashtable LinkedList.java(*) LinkedListItr.java(*) ListNode.java(*) PrintStack.java -- Print words or sentences with word wrapped SeparateChaining.java -- Symbol table implemented using separate chaining SPTableItem.java -- Table item for SeparateChaining StackLi.java(*) Underflow.java(*) Word.java -- Wrapped class for a word and its sentence info Result1.log -- The test runs using separate chaining implementation Result2.log -- The test runs using Java Hashtable implementation (* indicate that the class codes are copied directly from DataStructures package) To compile: At CUNIX prompt(in ./Concordance directory): javac *.java To run: At CUNIX prompt(in ./Concordance directory): java Concordance Log File: Contains several test runs of the program using "~cs3139-2/bible/philemon" ================================================================================= This program supply the following functions in order to get the concordance of distinct words in a text file and test out different data structure implementation of the symbol table. 1 -- Print the total number of distinct words in the alphabetized list; 2 -- Find the sentences contain certain word. The program will prompt for a user input and then do searching in a linked list or a binary search tree. The result first report the frequency of the word in the text and then let the user to confirm whether or not to display all the sentence contain the word; 3 -- Print the word with maximum occurrences. The program first output the frequency of the word in the text and then let the user to select whether or not to display the sentences contain the word; 4 -- Test searching time; Help the user to compare the searching time of two different implemetation of the alphabetized list. This program does only the searching time according to the requirement of the assignment, and the other running time such as insertion and delete can be implemented easily; 5 -- Print out the total number of collision happened when insert word and its sentence information into the symbol table(for separate chaining implementation); 6 -- Print out the Print the bucket number with most collision and the number of collisions in this position(for separate chaining implementation); 7 -- Print out the table size(for separate chaining implementation); 8 -- Print out the load factor(for separate chaining implementation); 9 -- Exit the running of the program. ================================================================================ Note points: a. The symbol table is implemented using two different methods. The first method is using separate chaining implementation for a hash table. In this implementation, the words are used as keys while linked list storing the collision words as value corresponding to each hashed index. The linked list which stores the collided words store the words and their sentence informations list. The second method is using the Hashtable class provide by Java package. The words are used as keys and the sentence information lists are used as values for the table. b. Running time: This program will only show the running time for searching a given word according to the requirement of the homework assignment. For both implementation, the average successful searching takes O(1) running time, while for the separate chaining implementation, the unsuccessful searching may take longer time than using Hashtable class in Java package. The following is a table of running time tested on different files. (The word "you" is used to search in the symbol tables of all the files). ======================================================================================= file name | Number of | Running time (ms) | Rnning time (ms) | |different words | for separate chaining implement| for Java Hashtable impl| ======================================================================================= philemon | 191 | 0 | 0 | --------------------------------------------------------------------------------------- habakkuk | 504 | 1 | 0 | --------------------------------------------------------------------------------------- micah | 738 | 0 | 0 | --------------------------------------------------------------------------------------- ezra | 1088 | 1 | 0 | --------------------------------------------------------------------------------------- leviticus | 1412 | 0 | 0 | --------------------------------------------------------------------------------------- mark | 1663 | 1 | 0 | --------------------------------------------------------------------------------------- numbers | 2065 | 0 | 0 | ======================================================================================= From the proceeding table, for separate chaining implementation, the running time changes randomly. This may be the effects of collision of the searched word with other words hashed to the same position. For the Java Hashtable implementation, the running time is always very small (indetectable). In both the test cases, there is also the CPU effects of the computer, since the program running time depends on the work load of the CPU. ==============================================================================================
File Result1.log:
Please input the text file name: Error: File not existing! Please try again: =============================================== Please select an implementation method of the symbol table: 0 -- Seperate chaining 1 -- Hashtable in Java package =============================================== Your selection(0/1): Program is precessing data ... Data processing successful! =============================================== You selected the separate chaining implementation Please selection functions from the menu: 1 -- Print the total number of unique words 2 -- Find the sentences contain certain word 3 -- Print the word with maximum occurrences and the sentences containing the word 4 -- Test searching time 5 -- Print the total number of collisions 6 -- Print the bucket number with most collision and the number of collisions 7 -- Print the size of the hash table 8 -- Print the load factor of the table 9 -- Exit =============================================== Your selection(1-9): There are totally 191 different words in the file! =============================================== You selected the separate chaining implementation Please selection functions from the menu: 1 -- Print the total number of unique words 2 -- Find the sentences contain certain word 3 -- Print the word with maximum occurrences and the sentences containing the word 4 -- Test searching time 5 -- Print the total number of collisions 6 -- Print the bucket number with most collision and the number of collisions 7 -- Print the size of the hash table 8 -- Print the load factor of the table 9 -- Exit =============================================== Your selection(1-9): Please input the word you want to look for:appears 3 times in sentences of the file. Do you want to see all the sentences (y/n)? I thank my God, making mention of thee always in my prayers, Hearing of thy love and faith, which thou hast toward the Lord Jesus, and toward all saints; That the communication of thy faith may become effectual by the acknowledging of every good thing which is in you in Christ Jesus. For we have great joy and consolation in thy love, because the bowels of the saints are refreshed by thee, brother. Wherefore, though I might be much bold in Christ to enjoin thee that which is convenient, Yet for love's sake I rather beseech thee, being such an one as Paul the aged, and now also a prisoner of Jesus Christ. =============================================== You selected the separate chaining implementation Please selection functions from the menu: 1 -- Print the total number of unique words 2 -- Find the sentences contain certain word 3 -- Print the word with maximum occurrences and the sentences containing the word 4 -- Test searching time 5 -- Print the total number of collisions 6 -- Print the bucket number with most collision and the number of collisions 7 -- Print the size of the hash table 8 -- Print the load factor of the table 9 -- Exit =============================================== Your selection(1-9): Word appears in the text file with maximum frequency, which is 16 times. Do you want to display all sentences (y/n)? Paul, a prisoner of Jesus Christ, and Timothy our brother, unto Philemon our dearly beloved, and fellowlabourer, And to our beloved Apphia, and Archippus our fellowsoldier, and to the church in thy house: Grace to you, and peace, from God our Father and the Lord Jesus Christ. I thank my God, making mention of thee always in my prayers, Hearing of thy love and faith, which thou hast toward the Lord Jesus, and toward all saints; That the communication of thy faith may become effectual by the acknowledging of every good thing which is in you in Christ Jesus. I thank my God, making mention of thee always in my prayers, Hearing of thy love and faith, which thou hast toward the Lord Jesus, and toward all saints; That the communication of thy faith may become effectual by the acknowledging of every good thing which is in you in Christ Jesus. I thank my God, making mention of thee always in my prayers, Hearing of thy love and faith, which thou hast toward the Lord Jesus, and toward all saints; That the communication of thy faith may become effectual by the acknowledging of every good thing which is in you in Christ Jesus. For we have great joy and consolation in thy love, because the bowels of the saints are refreshed by thee, brother. Wherefore, though I might be much bold in Christ to enjoin thee that which is convenient, Yet for love's sake I rather beseech thee, being such an one as Paul the aged, and now also a prisoner of Jesus Christ. I beseech thee for my son Onesimus, whom I have begotten in my bonds: Which in time past was to thee unprofitable, but now profitable to thee and to me: Whom I have sent again: thou therefore receive him, that is, mine own bowels: Whom I would have retained with me, that in thy stead he might have ministered unto me in the bonds of the gospel: But without thy mind would I do nothing; that thy benefit should not be as it were of necessity, but willingly. I beseech thee for my son Onesimus, whom I have begotten in my bonds: Which in time past was to thee unprofitable, but now profitable to thee and to me: Whom I have sent again: thou therefore receive him, that is, mine own bowels: Whom I would have retained with me, that in thy stead he might have ministered unto me in the bonds of the gospel: But without thy mind would I do nothing; that thy benefit should not be as it were of necessity, but willingly. I beseech thee for my son Onesimus, whom I have begotten in my bonds: Which in time past was to thee unprofitable, but now profitable to thee and to me: Whom I have sent again: thou therefore receive him, that is, mine own bowels: Whom I would have retained with me, that in thy stead he might have ministered unto me in the bonds of the gospel: But without thy mind would I do nothing; that thy benefit should not be as it were of necessity, but willingly. I beseech thee for my son Onesimus, whom I have begotten in my bonds: Which in time past was to thee unprofitable, but now profitable to thee and to me: Whom I have sent again: thou therefore receive him, that is, mine own bowels: Whom I would have retained with me, that in thy stead he might have ministered unto me in the bonds of the gospel: But without thy mind would I do nothing; that thy benefit should not be as it were of necessity, but willingly. For perhaps he therefore departed for a season, that thou shouldest receive him for ever; Not now as a servant, but above a servant, a brother beloved, specially to me, but how much more unto thee, both in the flesh, and in the Lord? For perhaps he therefore departed for a season, that thou shouldest receive him for ever; Not now as a servant, but above a servant, a brother beloved, specially to me, but how much more unto thee, both in the flesh, and in the Lord? Yea, brother, let me have joy of thee in the Lord: refresh my bowels in the Lord. Yea, brother, let me have joy of thee in the Lord: refresh my bowels in the Lord. Having confidence in thy obedience I wrote unto thee, knowing that thou wilt also do more than I say. There salute thee Epaphras, my fellowprisoner in Christ Jesus; Marcus, Aristarchus, Demas, Lucas, my fellowlabourers. =============================================== You selected the separate chaining implementation Please selection functions from the menu: 1 -- Print the total number of unique words 2 -- Find the sentences contain certain word 3 -- Print the word with maximum occurrences and the sentences containing the word 4 -- Test searching time 5 -- Print the total number of collisions 6 -- Print the bucket number with most collision and the number of collisions 7 -- Print the size of the hash table 8 -- Print the load factor of the table 9 -- Exit =============================================== Your selection(1-9): Please input the word you want to look for: Searching takes totally 0 milliseconds and the following is the results: appears 3 times in sentences of the file. =============================================== You selected the separate chaining implementation Please selection functions from the menu: 1 -- Print the total number of unique words 2 -- Find the sentences contain certain word 3 -- Print the word with maximum occurrences and the sentences containing the word 4 -- Test searching time 5 -- Print the total number of collisions 6 -- Print the bucket number with most collision and the number of collisions 7 -- Print the size of the hash table 8 -- Print the load factor of the table 9 -- Exit =============================================== Your selection(1-9): Totally 28 collisions happened during hashing all the words! =============================================== You selected the separate chaining implementation Please selection functions from the menu: 1 -- Print the total number of unique words 2 -- Find the sentences contain certain word 3 -- Print the word with maximum occurrences and the sentences containing the word 4 -- Test searching time 5 -- Print the total number of collisions 6 -- Print the bucket number with most collision and the number of collisions 7 -- Print the size of the hash table 8 -- Print the load factor of the table 9 -- Exit =============================================== Your selection(1-9): Position index=63 in the table has most collision and the number of collision is 2 =============================================== You selected the separate chaining implementation Please selection functions from the menu: 1 -- Print the total number of unique words 2 -- Find the sentences contain certain word 3 -- Print the word with maximum occurrences and the sentences containing the word 4 -- Test searching time 5 -- Print the total number of collisions 6 -- Print the bucket number with most collision and the number of collisions 7 -- Print the size of the hash table 8 -- Print the load factor of the table 9 -- Exit =============================================== Your selection(1-9): The size of the table is: 503 =============================================== You selected the separate chaining implementation Please selection functions from the menu: 1 -- Print the total number of unique words 2 -- Find the sentences contain certain word 3 -- Print the word with maximum occurrences and the sentences containing the word 4 -- Test searching time 5 -- Print the total number of collisions 6 -- Print the bucket number with most collision and the number of collisions 7 -- Print the size of the hash table 8 -- Print the load factor of the table 9 -- Exit =============================================== Your selection(1-9): The load factor of this table is: 0.32405567 =============================================== You selected the separate chaining implementation Please selection functions from the menu: 1 -- Print the total number of unique words 2 -- Find the sentences contain certain word 3 -- Print the word with maximum occurrences and the sentences containing the word 4 -- Test searching time 5 -- Print the total number of collisions 6 -- Print the bucket number with most collision and the number of collisions 7 -- Print the size of the hash table 8 -- Print the load factor of the table 9 -- Exit =============================================== Your selection(1-9):
File Result2.log:
Please input the text file name: philemon =============================================== Please select an implementation method of the symbol table: 0 -- Seperate chaining 1 -- Hashtable in Java package =============================================== Your selection(0/1): 1 Program is precessing data ... Data processing successful! =============================================== You selected implementation using Java Hashtable Please selection functions from the menu: 1 -- Print the total number of unique words 2 -- Find the sentences contain certain word 3 -- Print the word with maximum occurrences and the sentences containing the word 4 -- Test searching time 5 -- Exit =============================================== Your selection(1-5): 1 There are totally 191 different words in the file! =============================================== You selected implementation using Java Hashtable Please selection functions from the menu: 1 -- Print the total number of unique words 2 -- Find the sentences contain certain word 3 -- Print the word with maximum occurrences and the sentences containing the word 4 -- Test searching time 5 -- Exit =============================================== Your selection(1-5): 2 Please input the word you want to look for:appears 3 times in sentences of the file. Do you want to see all the sentences (y/n)? y I thank my God, making mention of thee always in my prayers, Hearing of thy love and faith, which thou hast toward the Lord Jesus, and toward all saints; That the communication of thy faith may become effectual by the acknowledging of every good thing which is in you in Christ Jesus. For we have great joy and consolation in thy love, because the bowels of the saints are refreshed by thee, brother. Wherefore, though I might be much bold in Christ to enjoin thee that which is convenient, Yet for love's sake I rather beseech thee, being such an one as Paul the aged, and now also a prisoner of Jesus Christ. =============================================== You selected implementation using Java Hashtable Please selection functions from the menu: 1 -- Print the total number of unique words 2 -- Find the sentences contain certain word 3 -- Print the word with maximum occurrences and the sentences containing the word 4 -- Test searching time 5 -- Exit =============================================== Your selection(1-5): 3 Word appears in the text file with maximum frequency, which is 16 times. Do you want to display all sentences (y/n)? y Paul, a prisoner of Jesus Christ, and Timothy our brother, unto Philemon our dearly beloved, and fellowlabourer, And to our beloved Apphia, and Archippus our fellowsoldier, and to the church in thy house: Grace to you, and peace, from God our Father and the Lord Jesus Christ. I thank my God, making mention of thee always in my prayers, Hearing of thy love and faith, which thou hast toward the Lord Jesus, and toward all saints; That the communication of thy faith may become effectual by the acknowledging of every good thing which is in you in Christ Jesus. I thank my God, making mention of thee always in my prayers, Hearing of thy love and faith, which thou hast toward the Lord Jesus, and toward all saints; That the communication of thy faith may become effectual by the acknowledging of every good thing which is in you in Christ Jesus. I thank my God, making mention of thee always in my prayers, Hearing of thy love and faith, which thou hast toward the Lord Jesus, and toward all saints; That the communication of thy faith may become effectual by the acknowledging of every good thing which is in you in Christ Jesus. For we have great joy and consolation in thy love, because the bowels of the saints are refreshed by thee, brother. Wherefore, though I might be much bold in Christ to enjoin thee that which is convenient, Yet for love's sake I rather beseech thee, being such an one as Paul the aged, and now also a prisoner of Jesus Christ. I beseech thee for my son Onesimus, whom I have begotten in my bonds: Which in time past was to thee unprofitable, but now profitable to thee and to me: Whom I have sent again: thou therefore receive him, that is, mine own bowels: Whom I would have retained with me, that in thy stead he might have ministered unto me in the bonds of the gospel: But without thy mind would I do nothing; that thy benefit should not be as it were of necessity, but willingly. I beseech thee for my son Onesimus, whom I have begotten in my bonds: Which in time past was to thee unprofitable, but now profitable to thee and to me: Whom I have sent again: thou therefore receive him, that is, mine own bowels: Whom I would have retained with me, that in thy stead he might have ministered unto me in the bonds of the gospel: But without thy mind would I do nothing; that thy benefit should not be as it were of necessity, but willingly. I beseech thee for my son Onesimus, whom I have begotten in my bonds: Which in time past was to thee unprofitable, but now profitable to thee and to me: Whom I have sent again: thou therefore receive him, that is, mine own bowels: Whom I would have retained with me, that in thy stead he might have ministered unto me in the bonds of the gospel: But without thy mind would I do nothing; that thy benefit should not be as it were of necessity, but willingly. I beseech thee for my son Onesimus, whom I have begotten in my bonds: Which in time past was to thee unprofitable, but now profitable to thee and to me: Whom I have sent again: thou therefore receive him, that is, mine own bowels: Whom I would have retained with me, that in thy stead he might have ministered unto me in the bonds of the gospel: But without thy mind would I do nothing; that thy benefit should not be as it were of necessity, but willingly. For perhaps he therefore departed for a season, that thou shouldest receive him for ever; Not now as a servant, but above a servant, a brother beloved, specially to me, but how much more unto thee, both in the flesh, and in the Lord? For perhaps he therefore departed for a season, that thou shouldest receive him for ever; Not now as a servant, but above a servant, a brother beloved, specially to me, but how much more unto thee, both in the flesh, and in the Lord? Yea, brother, let me have joy of thee in the Lord: refresh my bowels in the Lord. Yea, brother, let me have joy of thee in the Lord: refresh my bowels in the Lord. Having confidence in thy obedience I wrote unto thee, knowing that thou wilt also do more than I say. There salute thee Epaphras, my fellowprisoner in Christ Jesus; Marcus, Aristarchus, Demas, Lucas, my fellowlabourers. =============================================== You selected implementation using Java Hashtable Please selection functions from the menu: 1 -- Print the total number of unique words 2 -- Find the sentences contain certain word 3 -- Print the word with maximum occurrences and the sentences containing the word 4 -- Test searching time 5 -- Exit =============================================== Your selection(1-5): 4 Please input the word you want to look for: you Searching takes totally 0 milliseconds and the following is the results: appears 3 times in sentences of the file. =============================================== You selected implementation using Java Hashtable Please selection functions from the menu: 1 -- Print the total number of unique words 2 -- Find the sentences contain certain word 3 -- Print the word with maximum occurrences and the sentences containing the word 4 -- Test searching time 5 -- Exit =============================================== Your selection(1-5): 5 RESULT2.LOG 3
File Concordance.java:
import java.io.*; /** * class Concordance * * Main class for the concordance project. * @ Author: Changbin Wang (cw170@columbia.edu) */ public class Concordance { /** * Constructor of the concordance class * @param fineName the file for getting concordance * @param ls boolean to indicate the implemetation method */ public Concordance(File txtFile, boolean ls) { this.ls = ls; this.txtFile = txtFile; try { // Get file size fileSize = (int)txtFile.length(); // Initialize the array allChar = new char[fileSize]; // Open the fileinput stream FileInputStream in = new FileInputStream(txtFile); for(int i=0;i<fileSize;i++) { // Read the file byte by byte char c = (char)in.read(); // Put the byte into the array if(c != 10) allChar[i]= c; // Replace all the carriage return with space else allChar[i] = ' '; } } catch (OutOfMemoryError e1) { System.out.println("Error: Out of memory when open charactor array!"); } catch (IOException e2) { System.out.println("Error: Cannot read file!"); } // Construct separate chaining hashtable if (ls) spTable = new SeparateChaining(500); // Construct java hashtable implementation else jhTable = new JavaHashtable(); } /** * method to extract words and sentences from the character array */ public void parseArray() { // Stack for temporary store words in a sentence StackLi wStack = new StackLi(); // Index for the beginning point of word int wbeg = 0; // Length of each word int wlen = 0; // Index for the beginning point of sentence int sbeg = 0; // Length of each sentence int slen = 0; // Number of space between words and sentences int nsp = 0; // True if the index is inside a word boolean word = true; // True if the index is inside a sentence boolean sentence = true; for(int i=0; i<fileSize; i++) { // Not one of ". ? !" if(allChar[i]!=46 && allChar[i]!=33 && allChar[i]!=63) { // Between 'a-z' or 'A-Z' ? if((allChar[i]>64 && allChar[i]<91)|| (allChar[i]>96 && allChar[i]<123)) { // First enter a word if(!word) { // Set the beginning point of word wbeg = wbeg+wlen+nsp; // Initial length of the word wlen = 0; // # of space = 0 nsp = 0; // Inside the word word = true; } // Increment the word length wlen ++; // First enter a sentence if(!sentence) { // Set the beginning point of word sbeg = sbeg + slen + nsp; // Initial length of the sentence slen = 0; // Inside the sentence sentence = true; } } // One of "space ; , ', :" else { // First meet a space if(word) { // In space area word = false; // Push the proceeding word into stack if(i!=0) wStack.push(String.valueOf(allChar, wbeg, wlen)); } // Increment space number nsp ++; } // Increment sentence length slen ++; } // One of ". ? !", end of sentence else { // Increment the sentence length to include '.' or '?' or '!' slen ++; // Push the last word of the sentence into the stack if(word) { word = false; if(i!=0) wStack.push(String.valueOf(allChar, wbeg, wlen)); } // Not really space, but used to exclude the space before a sentence nsp ++; // First meet a '.', '?' or '!' if(sentence) { // Outside a sentence sentence = false; // Store the starting point and the length of the sentence CitationPoint cp = new CitationPoint(sbeg, slen); // Pop words out of the temporary stack and insert words into // symbol table; All words are changed to upper case while(!wStack.isEmpty()) { String stmp = (String)wStack.topAndPop(); Word w = new Word(stmp.toUpperCase(), cp); if(ls) spTable.insert(w); else jhTable.insert(w); } } } } // Output a message if the input file is empty and exit running if((ls && spTable.getWordNum()==0)||(!ls && jhTable.getWordNum()==0)) { System.out.println("The input file is an empty file!"); System.exit(1); } } /** * Method to print out all sentences contain certain word * @param index the index corresponding to the word * @param lineWidth used to indicate the line width for word wrapping */ public void printSentences(String word, int lineWidth) { // Construct a PrintStack tool for printing with word wrapping PrintStack ps = new PrintStack(lineWidth); // Get out the linked list for certain word LinkedList list = (ls)?spTable.get(word):jhTable.get(word); // The list is null, no sentence for such index if (list == null) return; // Print out all sentence in thelist if( list.isEmpty( ) ) System.out.print( "Error: Empty list" ); // Empty list, should not happen else { // Iterator point the first element LinkedListItr itr = list.first( ); // Extract and print the sentence out of the array according to // sentence information in the list for( ; !itr.isPastEnd( ); itr.advance( ) ) { CitationPoint cp = (CitationPoint) itr.retrieve(); String str = String.copyValueOf(allChar, cp.nbeg, cp.nlen); ps.printSentence(str); System.out.println(); } } } // Name of input file File txtFile; // Array holding all characters in the file char[] allChar; // Bytes(characters) of the file int fileSize; // Symbol table implemented using separate chaining SeparateChaining spTable=null; // Symbol table implemented using java hashtable JavaHashtable jhTable = null; // Boolean to indicate the implemetation of symbol table boolean ls; /** * main function of the class */ public static void main(String [] args) { // PrintStack used to output long sentence PrintStack psm = new PrintStack(75); if(args.length > 0) System.out.println("Usage: Concordance"); else { // Input stream for various input use BufferedReader in = new BufferedReader( new InputStreamReader(System.in)); try { // Prompt for file name and create a file object System.out.print("Please input the text file name: "); String fileName = in.readLine(); System.out.println(); File f = new File(fileName); // Prompt for file name if the file cannot be found while (!f.exists()) { System.out.println("Error: File not existing!"); // Prompt for file name System.out.print("Please try again: "); fileName = in.readLine(); System.out.println(); f = new File(fileName); } // Prompt for selection of implemetation method of alphabetized list System.out.println("==============================================="); psm.printSentence("Please select an implementation method "+ "of the symbol table:"); System.out.println(" 0 -- Seperate chaining "); System.out.println(" 1 -- Hashtable in Java package "); System.out.println("==============================================="); System.out.print("Your selection(0/1): "); String txtLine = in.readLine(); int x = Integer.parseInt(txtLine); boolean ls = (x == 0)? true:false; System.out.println(); System.out.println("Program is precessing data ..."); System.out.println(); // Construct the instance of the concordance class Concordance prog = new Concordance(f, ls); // Parse the charater array into words and sentence and generate // the alphabetized list and symbol table. prog.parseArray(); System.out.println("Data processing successful!"); System.out.println(); for(;;) { if(ls) { System.out.println("==============================================="); System.out.println("You selected the separate chaining implementation"); System.out.println("Please selection functions from the menu: "); System.out.println(" 1 -- Print the total number of unique words "); System.out.println(" 2 -- Find the sentences contain certain word "); System.out.println(" 3 -- Print the word with maximum occurrences "); System.out.println(" and the sentences containing the word "); System.out.println(" 4 -- Test searching time "); System.out.println(" 5 -- Print the total number of collisions "); System.out.println(" 6 -- Print the bucket number with most collision"); System.out.println(" and the number of collisions "); System.out.println(" 7 -- Print the size of the hash table "); System.out.println(" 8 -- Print the load factor of the table "); System.out.println(" 9 -- Exit "); System.out.println("==============================================="); System.out.print("Your selection(1-9): "); } else { System.out.println("==============================================="); System.out.println("You selected implementation using Java Hashtable"); System.out.println("Please selection functions from the menu: "); System.out.println(" 1 -- Print the total number of unique words "); System.out.println(" 2 -- Find the sentences contain certain word "); System.out.println(" 3 -- Print the word with maximum occurrences "); System.out.println(" and the sentences containing the word "); System.out.println(" 4 -- Test searching time "); System.out.println(" 5 -- Exit "); System.out.println("==============================================="); System.out.print("Your selection(1-5): "); } txtLine = in.readLine(); x = Integer.parseInt(txtLine); switch(x) { case 1: // Output the total number of distinct words in the file int num = (ls)? prog.spTable.getWordNum():prog.jhTable.getWordNum(); System.out.println(); psm.printSentence("There are totally "+num+ " different words in the file!"); System.out.println(); break; case 2: String word1; int freq; System.out.println(); // Prompt for a word to be searched. System.out.print("Please input the word you want to look for: "); word1 = in.readLine(); System.out.println(); word1 = word1.toUpperCase(); LinkedList list = (ls)?prog.spTable.get(word1):prog.jhTable.get(word1); freq = (ls)?prog.spTable.getFreq(word1):prog.jhTable.getFreq(word1); if(list == null && freq <= 0) System.out.println("Sorry, no such word in the text!"); else { psm.printSentence("<"+word1.toUpperCase()+"> appears "+ freq+" times in sentences of the file."); System.out.println(); System.out.print("Do you want to see all the sentences (y/n)?"); String yOrn = in.readLine(); System.out.println(); if(yOrn.charAt(0)=='y' || yOrn.charAt(0) == 'Y') prog.printSentences(word1, 75); System.out.println(); } System.out.println(); break; case 3: // Print out the word with maximum concordance and String w; // the frequency System.out.println(); if(ls) { CitationListItr ci = prog.spTable.findMaxFreq(); w = ci.retrieveWord(); freq = ci.retrieveFreq(); } else { JHTableItem jhi = prog.jhTable.findMaxFreq(); w = jhi.getWord(); freq = jhi.getFreq(); } psm.printSentence("Word <"+w+"> appears in the text file"+ " with maximum frequency, which is "+freq+" times."); System.out.println(); System.out.print("Do you want to display all sentences (y/n)?"); String yOrn = in.readLine(); System.out.println(); if(yOrn.charAt(0)=='y' || yOrn.charAt(0) == 'Y') prog.printSentences(w, 75); break; case 4: // Test the searching time for different implementation System.out.println(); System.out.print("Please input the word you want to look for: "); word1 = in.readLine(); System.out.println(); word1 = word1.toUpperCase(); long start = System.currentTimeMillis(); list = (ls)? prog.spTable.get(word1):prog.jhTable.get(word1); long end = System.currentTimeMillis(); long interval = end - start; psm.printSentence("Searching takes totally "+ interval+" milliseconds and the following is the results:"); if(list==null) System.out.println("Sorry, no such word in the text!"); else { freq = (ls)?prog.spTable.getFreq(word1):prog.jhTable.getFreq(word1); psm.printSentence("<"+word1+"> appears "+freq+ " times in sentences of the file."); } System.out.println(); break; case 5: if(!ls) return; else { System.out.println(); psm.printSentence("Totally "+prog.spTable.getTotalCollision()+ " collisions happened during hashing all the words!"); System.out.println(); } break; case 6: if(ls) { System.out.println(); psm.printSentence("Position index="+prog.spTable.getCollisionBucket()+ " in the table has most collision and the number of"+ " collision is "+prog.spTable.getMaxCollision()); System.out.println(); } break; case 7: if(ls) { System.out.println(); psm.printSentence("The size of the table is: "+ prog.spTable.getSize()); System.out.println(); } break; case 8: if(ls) { System.out.println(); psm.printSentence("The load factor of this table is: "+ prog.spTable.getLoadFactor()); System.out.println(); } break; case 9: if(ls) return; break; } } } catch (Exception e) // Catch all the exceptions { System.out.println(e); } } } }
File CitationList.java
/** * Linked list to store the sentence information * Access to the list is via CitationListItr. * @author Changbin Wang (cw170@columbia.edu) */ public class CitationList { /** * Construct the list */ public CitationList() { try { header = new CitationListNode( null, null ); } catch (OutOfMemoryError e) { System.out.println( "Error: Not enough memory for opening a new list!"); } currentSize = 0; } /** * Return the size of the list indicate by the member variable of the class */ public int getSize() { return currentSize; } /** * Test if the list is logically empty. * @return true if empty, false otherwise. */ public boolean isEmpty() { return header.next == null; } /** * Make the list logically empty. */ public void makeEmpty() { header.next = null; currentSize = 0; } /** * Return an iterator representing the header node. */ public CitationListItr zeroth() { return new CitationListItr( header ); } /** * Return an iterator representing the first node in the list. * This operation is valid for empty lists. */ public CitationListItr first() { return new CitationListItr( header.next ); } /** * Return iterator corresponding to the first node containing an item. * @param x the item to search for. * @return an iterator; iterator isPastEnd if item is not found. */ public CitationListItr find(String x) { CitationListNode itr = header.next; while (itr != null && !itr.element.equals(x) ) itr = itr.next; return new CitationListItr(itr); } /** * Return iterator corresponding to the node with maximum frequency. * @return an iterator; iterator isPastEnd if item is not found. */ public CitationListItr findMaxFreq() { CitationListItr itr = first(); CitationListNode maxNode = itr.current; for (; !itr.isPastEnd(); itr.advance()) maxNode=(maxNode.freq>=itr.retrieveFreq())?maxNode:itr.current; return new CitationListItr(maxNode); } /** * Remove the string.(not used in this project) * @param x the string to remove. */ public void remove( String x ) { CitationListItr p = findPrevious(x); if( p.current.next != null ) p.current.next = p.current.next.next; else System.out.println("No such word in the list!"); currentSize -= 1; // Decrement the list size } /** * Return iterator prior to the node containing the string. * @param x the string to search for. * @return appropriate iterator if the item is found. Otherwise, the * iterator corresponding to the last element in the list is returned. */ public CitationListItr findPrevious( String x ) { CitationListNode itr = header; while (itr.next !=null && !itr.next.element.equals(x) ) itr = itr.next; return new CitationListItr(itr); } /** * Insert a word and the information of sentence containing it into * the linked list. * @param w the word inserted * @return true if the list dosn't contain the given word; false if * the list has had word already */ public boolean insert(Word w) { String word = w.getWord(); CitationPoint cp = w.getSentenceInfo(); CitationListItr itr = find(word); if(itr.isPastEnd()) insert(word, cp, zeroth()); // Insert as a new node else // Insert as an old node { itr.retrieveSentenceList().insert(cp, itr.retrieveSentenceList().findPrevious(null)); itr.current.addFreq(); } return (itr.isPastEnd() && currentSize >1); } private CitationListNode header; private int currentSize; // The number of non-header nodes in the list /** * Insert after p, used for insert a new word. * @param x the item to insert. * @param p the position prior to the newly inserted item. * @param theIndex the index used to access the symbol table */ private void insert( String x, CitationPoint cp, CitationListItr p) { if( p != null && p.current != null ) p.current.next = new CitationListNode( x, cp, p.current.next); currentSize ++; } }
File CitationListItr.java:
/** * Linked list implementation of the list iterator */ public class CitationListItr { /** * Construct the list iterator * @param theNode any node in the linked list. */ CitationListItr( CitationListNode theNode ) { current = theNode; } /** * Test if the current position is past the end of the list. * @return true if the current position is null. */ public boolean isPastEnd() { return current == null; } /** * Return the item stored in the current position. * @return the stored item or null if the current position * is not in the list. */ public String retrieveWord() { return isPastEnd()? null : current.element; } /** * Return the frequency of item stored in the current position. * @return the frequency of stored item or -1 if the current position * is not in the list. */ public int retrieveFreq() { return isPastEnd()? -1 : current.freq; } /** * Return the linked list storing the sentence information. * @return the linked list */ public LinkedList retrieveSentenceList() { return isPastEnd()? null : current.citeList; } /** * Advance the current position to the next node in the list. * If the current position is null, then do nothing. */ public void advance() { if( !isPastEnd() ) current = current.next; } CitationListNode current; }
File CitationListNode.java:
// Basic node stored in a string linked list public class CitationListNode { public CitationListNode( String theElement, CitationPoint cp ) { this ( theElement, cp, null ); } public CitationListNode( String theElement, CitationPoint cp, CitationListNode n) { element = theElement; next = n; freq = 1; citeList = new LinkedList(); citeList.insert(cp, citeList.zeroth()); } /** * Increase the frequency when a word is inserted * into a list containing it */ public void addFreq() { freq ++; } int freq; // The frequency of the word String element; LinkedList citeList; CitationListNode next; }
File CitationPoint.java:
/** * Class Citation used to store sentence information */ public class CitationPoint extends Object { int nbeg; // The beginning index of the sentence in the array int nlen; // The length of the sentence /** * Constructor for the class * @param _nbeg the beginning index * @param _nlen the length of the sentence */ public CitationPoint (int _nbeg, int _nlen) { this.nbeg = _nbeg; this.nlen = _nlen; } }
File JavaHashtable.java:
import java.util.Hashtable; /** * JavaHashtable class, Symbol table implemented using Java Hashtable package * @author: Changbin Wang (cw170@columbia.edu) */ public class JavaHashtable { /** * Constructor */ public JavaHashtable() { table = new Hashtable(); currentSize = 0; } /** * Insert a word and its sentence information into the table * @param w the word and its sentence information */ public void insert(Word w) { String word = w.getWord(); CitationPoint cp = w.getSentenceInfo(); // If the word exists in the table, increase its frequency // and insert its sentence information into the linked list if(table.containsKey(word)) { JHTableItem ti = (JHTableItem)table.get(word); ti.insert(cp); maxFreq = (maxFreq.getFreq() >= ti.getFreq())? maxFreq: ti; table.put(word,ti); } // If the word isn't in the table, insert the word and // its sentence information else { JHTableItem ti = new JHTableItem(word); ti.insert(cp); table.put(word, ti); currentSize ++; if(currentSize == 1) maxFreq = ti; } } /** * Get the sentence information of a given word * @param word, the word looked for * @return the linked list containing the sentence information */ public LinkedList get(String word) { JHTableItem ti = (JHTableItem)table.get(word); return (ti==null)?null:ti.getList(); } /** * Return the frequency of word in the text * @param word, the word looked for * @return the frequency */ public int getFreq(String word) { JHTableItem ti = (JHTableItem)table.get(word); return (ti==null)?-1:ti.getFreq(); } /** * Return the total distinct word number of the text * @return the total word number */ public int getWordNum() { return currentSize; } /** * Return the word item with maximum frequency * @return the table item with maximum frequency */ public JHTableItem findMaxFreq() { return maxFreq; } private Hashtable table; private int currentSize; private JHTableItem maxFreq; }
File JHTableItem.java:
/** * JHTableItem class, the symbol table item used in the Hashtable(Java) * implementation */ class JHTableItem { /** * Constructor * @param word, the word and its sentence information stored in the item */ public JHTableItem(String word) { this.word = word; citeList = new LinkedList(); freq = 0; } /** * Insert a sentence information into the linked list of this item * @param cp the sentence information */ public void insert(CitationPoint cp) { citeList.insert(cp, citeList.findPrevious(null)); freq ++; } /** * Return the linked list for the sentence information of the word * @return the linked list of sentence information */ public LinkedList getList() { return citeList; } /** * Return the frequency of the work this item stands for * @return the frequency */ public int getFreq() { return freq; } /** * Return the word this item stands for * @return the word */ public String getWord() { return word; } private LinkedList citeList; private int freq; private String word; }
File PrintStack.java:
/** * Class PrintStack mimic a stack to print word list or sentence * in word wrapping manner */ public class PrintStack { int lineWidth; // The width of a line int currentSize; // Used to count the number of output charaters /** * Construct the class * @param n the line width */ public PrintStack(int n) { lineWidth = n; currentSize = 0; } /** * Method used to print out word list with word wrapping feature * @param str the word to print out */ public void printWord (String str) { int n = str.length(); // The length of the word // Output word if this line can make room for it if((currentSize + n) <= lineWidth) { currentSize += n; // Increment the current size of the line // by the word length System.out.print(str); } // Else start a new line and set outputed charactor length to the length of word else { System.out.println(); currentSize = n; System.out.print(str); } } /** * Method used to print out sentences with word wrapping feature * @param str the sentence to be output */ public void printSentence (String str) { int n = str.length(); // The length of the sentence int nbeg = 0; // The beginning index in str of each new line // initialized to 0 // Output new lines if not finish while(nbeg < n) { int nend = nbeg+lineWidth; // Index of end index of a line in sentence nend = (nendgt;=n)?n:nend; // Set end index to the length of sentence // if this index exceed sentence length // If the last charater of the line is not space" // back track the characters to find a space for(int i= nend-1; igt;0 && ((str.charAt(i)gt;64 && str.charAt(i)<91)|| (str.charAt(i)gt;96 && str.charAt(i)<123)); i--) nend -= 1; // Output the line according to the start and end indice System.out.println(str.substring(nbeg, nend)); // Update the begining point of the new output line nbeg = nend; } } }
File SeparateChaining.java:
// SeparateChaining class // // CONSTRUCTION: with an approximate initial size or default of 101 // // ******************PUBLIC OPERATIONS********************* // void insert( x ) --> Insert x // void remove( x ) --> Remove x // Hashable find( x ) --> Return item that matches x // void makeEmpty( ) --> Remove all items // int hash( String str, int tableSize ) // --> Static method to hash strings // ******************************************************** /** * Separate chaining table implementation of symbol table. * Note that all "matching" is based on the equals method. * @author Changbin Wang */ public class SeparateChaining { /** * Construct the hash table. */ public SeparateChaining( ) { this( DEFAULT_TABLE_SIZE ); } /** * Construct the hash table. * @param size approximate table size. */ public SeparateChaining( int size ) { chainElements = new SPTableItem[ nextPrime( size ) ]; for( int i = 0; i < chainElements.length; i++ ) chainElements[ i ] = new SPTableItem( ); } /** * Insert into the hash table. If the item is * already present, then do nothing. * @param x the item to insert. */ public void insert( Hashable x ) { SPTableItem whichElement = chainElements[ x.hash( chainElements.length ) ]; whichElement.insert((Word)x); } /** * Remove from the hash table. * @param x the item to remove. */ public void remove( Hashable x ) { chainElements[ x.hash( chainElements.length ) ].remove(((Word)x).getWord()); } /** * Find an item in the hash table. * @param word the item to search for. * @return the matching item, or null if not found. */ public LinkedList get( String word ) { Word x = new Word(word, null); return chainElements[ x.hash( chainElements.length ) ].get( word ); } public int getFreq( String word ) { Word x = new Word(word, null); return chainElements[ x.hash( chainElements.length ) ].getFreq( word ); } /** * Make the hash table logically empty. */ public void makeEmpty( ) { for( int i = 0; i < chainElements.length; i++ ) chainElements[ i ].makeEmpty( ); } /** * Return the total number of collisions in the hash table * @return the total number of collisions */ public int getTotalCollision() { int sum = 0; for( int i = 0; i < chainElements.length; i++ ) sum += chainElements[i].getCollision(); return sum; } /** * Return the number of collisions in the collision bucket * @return the max number of collisions in different position */ public int getMaxCollision() { int max = chainElements[0].getCollision(); for( int i = 0; i < chainElements.length; i++ ) { int c = chainElements[i].getCollision(); max = (max>=c)? max: c; } return max; } /** * Return the collision bucket position in the table * @return the position of collision bucket */ public int getCollisionBucket() { int cmax = chainElements[0].getCollision(); int pos = 0; for( int i = 0; i < chainElements.length; i++ ) { int c = chainElements[i].getCollision(); if(cmax < c) { cmax = c; pos = i; } } return pos; } /** * Return the total number of distinct words hashed into the table * @return the number of distinct words */ public int getWordNum() { int sum = 0; for( int i = 0; i < chainElements.length; i++ ) sum += chainElements[i].getWordNum(); return sum; } /** * Return the table size * @return the size of the table */ public int getSize() { return chainElements.length; } /** * Return the load factor of this hash table * @return the load factor */ public float getLoadFactor() { int oPos = 0; for( int i = 0; i < chainElements.length; i++ ) if(chainElements[i].getWordNum()>0) oPos ++; return ((float)oPos)/getSize(); } /** * Return the list iterator of word with maximum occurance frequency * @return the list iterator */ public CitationListItr findMaxFreq() { CitationListItr mf = chainElements[0].findMaxFreq(); for (int i = 1; i < chainElements.length; i++ ) { CitationListItr cl = chainElements[i].findMaxFreq(); if(cl.retrieveFreq() > mf.retrieveFreq()) mf = cl; } return mf; } /** * A hash routine for String objects. * @param key the String to hash. * @param tableSize the size of the hash table. * @return the hash value. */ public static int hash( String key, int tableSize ) { int hashVal = 0; for( int i = 0; i < key.length( ); i++ ) hashVal = 37 * hashVal + key.charAt( i ); hashVal %= tableSize; if( hashVal < 0 ) hashVal += tableSize; return hashVal; } private static final int DEFAULT_TABLE_SIZE = 101; /** The array of Lists. */ private SPTableItem [ ] chainElements; /** * Internal method to find a prime number at least as large as n. * @param n the starting number (must be positive). * @return a prime number larger than or equal to n. */ private static int nextPrime( int n ) { if( n % 2 == 0 ) n++; for( ; !isPrime( n ); n += 2 ) ; return n; } /** * Internal method to test if a number is prime. * Not an efficient algorithm. * @param n the number to test. * @return the result of the test. */ private static boolean isPrime( int n ) { if( n == 2 || n == 3 ) return true; if( n == 1 || n % 2 == 0 ) return false; for( int i = 3; i * i <= n; i += 2 ) if( n % i == 0 ) return false; return true; } }
File SPTableItem.java:
// SPTableItem class // Class used to hold values corresponding to keys in the symbol table // class SPTableItem { public SPTableItem() { collision = 0; itemList = new CitationList(); } public void makeEmpty() { collision = 0; itemList.makeEmpty(); } /** * Insert a word and its sentence information into the hash table * @param w the word inserted */ public void insert (Word w) { if(itemList.insert(w)) collision++; } /** * Return the sentences information containing a certain word * @param word the word looked for * @return the linked list of sentence information */ public LinkedList get(String word) { CitationListItr itr = itemList.find(word); return (itr.isPastEnd())? null: itr.retrieveSentenceList(); } /** * Return the frequency of a certain given word * @param word the word looked for * @return the frequency */ public int getFreq(String word) { CitationListItr itr = itemList.find(word); return (itr.isPastEnd())? -1: itr.retrieveFreq(); } /** * Remove a word from the hash table (not used in this project) * @param word the word to be removed */ public void remove(String word) { itemList.remove(word); } /** * Return the word number of this position of the hash table * @retun the word number */ public int getWordNum() { return itemList.getSize(); } /** * Return the collision number of this position * @return the number of collision */ public int getCollision() { return collision; } /** * Return the list iterator of the word with maximum occurance frequency * @return the resulted list iterator */ public CitationListItr findMaxFreq() { return itemList.findMaxFreq(); } private int collision; private CitationList itemList; }
File Word.java:
/** * Word class, wrapped class for a string word and its sentence information */ public class Word implements Comparable, Hashable { /** * Constrcutor */ public Word() { this(null, null); } /** * Constructor * @param w the word * @param cp the sentence information */ public Word (String w, CitationPoint cp) { word = w; citePoint = cp; } /** * Return the string word */ public String getWord() { return word; } /** * Return the sentence information * @return the sentence information */ public CitationPoint getSentenceInfo() { return citePoint; } /** * Implement the compareTo method of Comparable interface */ public int compareTo( Comparable rhs ) { return word.compareTo(((Word)rhs).word); } /** * Equals */ public boolean equals( Object rhs ) { return rhs != null && word == ((Word)rhs).word; } /** * The hash function of the word * @param tableSize, the table size of the hash table * @return the hash index */ public int hash( int tableSize ) { int hashVal = 0; for( int i = 0; i < word.length( ); i++ ) hashVal = 37 * hashVal + word.charAt( i ); hashVal %= tableSize; if( hashVal < 0 ) hashVal += tableSize; return hashVal; } private String word; private CitationPoint citePoint; }