Homework #2: Search (updated)

Due: 2:30 pm on Sept. 24th, 2007

This assignment is intended as an exploration of heuristic search techniques (Chapter 4 of the textbook). It asks you to implement two different approaches to solving a word puzzle called Fill-in Station and to measure the effectiveness of these search strategies. For the second approach, you are asked to invent and test a heuristic of your own.

Fill-in Station gives you a 3x3 matrix and a list of 9 letters, some of which may be repeated. It is the job of your program to fill in the matrix with the letters so that valid 3-letter words are formed in all the directions that the arrows point. Two examples are given below, along with their solutions. As data, you are given a dictionary of 3-letter words and a bigram list, which contains letter pairs and the frequency with which each pair occurs in the dictionary. Your job is to write a program that can search letter-by-letter to come up with the words to fill in the puzzle.

The two puzzles

The solutions to the two example puzzles are:

Puzzle 1
SOP
EAR
WRY

Puzzle 2
APE
ILK
LYE
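To make the goal condition concrete, the following Java sketch checks a filled grid against a dictionary. It assumes that the arrows cover the three rows (left to right) and the three columns (top to bottom), which both example solutions satisfy. If the puzzle's arrows also run along the diagonals, those lines would need to be added. The class name, the method name and the six-word toy dictionary are illustrative only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Checks whether a filled 3x3 grid forms valid words (hypothetical helper). */
public class FillInCheck {

    // ASSUMPTION: the arrows cover the three rows and three columns.
    // Add {0, 4, 8} and {2, 4, 6} if the arrows also run along the diagonals.
    static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},   // rows, left to right
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8}    // columns, top to bottom
    };

    /** Returns true if every line of the row-major grid is a dictionary word. */
    static boolean isSolution(String grid, Set<String> dict) {
        for (int[] line : LINES) {
            String word = "" + grid.charAt(line[0]) + grid.charAt(line[1]) + grid.charAt(line[2]);
            if (!dict.contains(word)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Toy dictionary holding only the words needed for Puzzle 1.
        Set<String> dict = new HashSet<String>(Arrays.asList(
            "SOP", "EAR", "WRY", "SEW", "OAR", "PRY"));
        System.out.println(isSolution("SOPEARWRY", dict)); // true
        System.out.println(isSolution("SOPEARWYR", dict)); // false
    }
}
```

The second grid swaps the last two letters, so its bottom row, WYR, is not in the dictionary and the check fails.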

Resources


What to turn in

The assignment will be graded out of 30 points and is divided into 4 parts. Each assignment part must have its own solution write-up. If the write-up for any part is not provided, it will be assumed that the part in question has not been attempted. You may create separate documents for each write-up or you may include all write-ups in separate sections of the same document.

If a part involves programming, you must turn in working, well-documented code. Specific requirements mentioned in the problem statement, such as an overview of functionality, a description of the algorithm, the ability to trace execution in the running version, or evaluation tables, should be included in the solution write-up for that part. Code usage and output of sample runs can also be included in the write-up to help the graders run your program.

In addition to the code and write-up for each part, you should turn in a README for the whole assignment which describes the various files submitted, indicates which parts were attempted and lists the location of the write-ups. General code usage should be specified here as well as any special features, capabilities or restrictions that you would like to bring to the reviewer's notice.

Assignment description

Part 1: Written (3 points) Describe the state space you will use, the formulation of the problem and the successor function. There are a variety of different formulations that you can use. One option is to set up the problem so that you consider one path through the matrix (e.g., 1st row, followed by 2nd row, followed by 3rd row). At each move, the search algorithm chooses the next letter in the path. Whenever the successor function chooses a letter that is the last letter of a word on the board, it must ensure that the resulting word is valid. The successor function will return all remaining letters that can go into the next empty space on the path. You can use this formulation or any other formulation that you find intuitive.
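A minimal sketch of the path formulation described above, assuming row-major order and words along rows and columns only. A state holds the letters placed so far plus the counts of unused letters. The successor function returns one child per distinct unused letter and rejects any letter that completes a row or column not found in the dictionary. FillInState, successors and countSolutions are hypothetical names, and the toy dictionary stands in for the provided word list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** One possible Part 1 formulation: fill cells 0..8 in row-major order. */
public class FillInState {

    // ASSUMPTION: words run along the rows and columns. Add {0, 4, 8} and {2, 4, 6}
    // if the arrows also follow the diagonals. Indices are listed in increasing order,
    // so line[2] is always the cell that completes the word.
    static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8}, {0, 3, 6}, {1, 4, 7}, {2, 5, 8}
    };

    final String filled;   // letters placed so far, in path order
    final int[] remaining; // remaining[c] = unused copies of letter 'A' + c

    FillInState(String filled, int[] remaining) {
        this.filled = filled;
        this.remaining = remaining;
    }

    static FillInState initial(String letters) {
        int[] counts = new int[26];
        for (char c : letters.toCharArray()) {
            counts[c - 'A']++;
        }
        return new FillInState("", counts);
    }

    boolean isGoal() {
        return filled.length() == 9;
    }

    /** One child per distinct unused letter that keeps every completed word valid. */
    List<FillInState> successors(Set<String> dict) {
        List<FillInState> children = new ArrayList<FillInState>();
        int pos = filled.length();
        for (int c = 0; c < 26 && pos < 9; c++) {
            if (remaining[c] == 0) {
                continue;
            }
            String next = filled + (char) ('A' + c);
            if (completedWordsValid(next, pos, dict)) {
                int[] rest = remaining.clone();
                rest[c]--;
                children.add(new FillInState(next, rest));
            }
        }
        return children;
    }

    /** Checks each word whose final cell is the one just filled at position pos. */
    static boolean completedWordsValid(String cells, int pos, Set<String> dict) {
        for (int[] line : LINES) {
            if (line[2] == pos) {
                String word = "" + cells.charAt(line[0]) + cells.charAt(line[1]) + cells.charAt(line[2]);
                if (!dict.contains(word)) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Exhaustive count of goal states, used here only to exercise the successor function. */
    static int countSolutions(FillInState s, Set<String> dict) {
        if (s.isGoal()) {
            return 1;
        }
        int total = 0;
        for (FillInState child : s.successors(dict)) {
            total += countSolutions(child, dict);
        }
        return total;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<String>(Arrays.asList("SOP", "EAR", "WRY", "SEW", "OAR", "PRY"));
        FillInState root = initial("SOPEARWRY");
        System.out.println(root.successors(dict).size()); // 8: distinct letters, no word completed yet
        System.out.println(countSolutions(root, dict));   // 2: the Puzzle 1 grid and its transpose
    }
}
```

With the toy dictionary the root has 8 children (the distinct letters of SOPEARWRY), and the exhaustive count finds exactly 2 goals, which is a quick sanity check on the successor function before any heuristic is added.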

Part 2: Heuristic search (8 points) Code a solution to the problem using a greedy algorithm. For this part of the program, use a heuristic function that returns the bigram frequency of the letter pair formed by the letter already chosen and the next letter in the word. Implement the heuristic so that you choose the next letter with the highest bigram frequency. For the first letter in a word, choose the letter with the highest initial frequency (also provided in the bigram frequency list). Turn in the code with the ability for the TAs to turn traces of the search tree on and off. The trace should display the parent, children, depth and heuristic values. We will be running your program on new test problems.
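One way to read "greedy algorithm" here is a depth-first search that always expands the child with the highest heuristic value first and backtracks on dead ends; a best-first search driven by a priority queue on h is another reading. The sketch below takes the first interpretation. It assumes that "the letter already chosen" is the letter to the left in the same row, so the first letter of each row uses the initial frequency. Because the format of the provided bigram list is not specified here, the constructor builds toy frequencies by counting letter pairs in the dictionary itself. GreedyBigram and the -trace switch are hypothetical, and the code sticks to Java 1.5 language features (no diamond operator or lambdas).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Greedy letter-by-letter search ordered by bigram frequency (a sketch). */
public class GreedyBigram {
    // ASSUMPTION: words run along rows and columns; cells are filled row by row.
    static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8}, {0, 3, 6}, {1, 4, 7}, {2, 5, 8}
    };

    final Set<String> dict;
    final int[][] bigram = new int[26][26]; // bigram[a][b] = frequency of the pair (a, b)
    final int[] initial = new int[26];      // initial[a] = frequency of a as a first letter
    final boolean trace;                    // turns the search-tree trace on or off
    int expanded = 0;                       // nodes expanded (reusable for the EBF in Part 4)

    GreedyBigram(Set<String> dict, boolean trace) {
        this.dict = dict;
        this.trace = trace;
        // Toy stand-in for the provided bigram list: count pairs in the dictionary itself.
        for (String w : dict) {
            initial[w.charAt(0) - 'A']++;
            bigram[w.charAt(0) - 'A'][w.charAt(1) - 'A']++;
            bigram[w.charAt(1) - 'A'][w.charAt(2) - 'A']++;
        }
    }

    /** h: initial frequency at the start of a row, else the bigram with the letter to the left. */
    int h(char[] grid, int pos, int c) {
        return pos % 3 == 0 ? initial[c] : bigram[grid[pos - 1] - 'A'][c];
    }

    /** Returns the filled grid in row-major order, or null if the letters have no solution. */
    String solve(String letters) {
        int[] counts = new int[26];
        for (char ch : letters.toCharArray()) counts[ch - 'A']++;
        char[] grid = new char[9];
        return search(grid, 0, counts) ? new String(grid) : null;
    }

    private boolean search(final char[] grid, final int pos, int[] counts) {
        if (pos == 9) return true;
        expanded++;
        List<Integer> children = new ArrayList<Integer>();
        for (int c = 0; c < 26; c++) {
            grid[pos] = (char) ('A' + c);
            if (counts[c] > 0 && wordsValid(grid, pos)) children.add(c);
        }
        // Greedy choice: highest h first; Collections.sort is stable, so ties stay alphabetical.
        Collections.sort(children, new Comparator<Integer>() {
            public int compare(Integer a, Integer b) {
                return h(grid, pos, b) - h(grid, pos, a);
            }
        });
        if (trace) {
            StringBuilder sb = new StringBuilder();
            for (int c : children) sb.append(' ').append((char) ('A' + c)).append("(h=").append(h(grid, pos, c)).append(')');
            System.out.println("depth " + pos + " parent \"" + new String(grid, 0, pos) + "\" children:" + sb);
        }
        for (int c : children) {
            grid[pos] = (char) ('A' + c);
            counts[c]--;
            if (search(grid, pos + 1, counts)) return true;
            counts[c]++; // backtrack and try the next-best letter
        }
        return false;
    }

    /** Every word whose last cell is pos must be in the dictionary. */
    private boolean wordsValid(char[] grid, int pos) {
        for (int[] line : LINES) {
            if (line[2] != pos) continue;
            String word = new String(new char[] {grid[line[0]], grid[line[1]], grid[line[2]]});
            if (!dict.contains(word)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<String>(Arrays.asList("SOP", "EAR", "WRY", "SEW", "OAR", "PRY"));
        GreedyBigram g = new GreedyBigram(dict, args.length > 0 && args[0].equals("-trace"));
        System.out.println(g.solve("SOPEARWRY") + " after " + g.expanded + " expansions");
    }
}
```

On the toy instance it prints SEWOARPRY after 9 expansions: the first-ranked child never hits a dead end, and the result is the transpose of the published Puzzle 1 answer, which is equally valid under the rows-and-columns assumption. Passing -trace prints one line per expanded node with its depth, parent prefix and children annotated with their h values.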

Part 3: Creativity (8 points) Redo the heuristic search. You can choose to do one of the following:

  1. Devise your own heuristic function other than the one given in Part 2 and run your greedy algorithm with this new function. Your goal is to find a heuristic function that performs better than the heuristic search in Part 2 and finds a solution more efficiently.
  2. Reformulate the search as a search through words laid out on the matrix (instead of a search through letters guided by bigram frequencies). Each move will add an entire word to the matrix. Given this new formulation of the problem, devise your own heuristic function and run your greedy algorithm with this new function.
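For option 2, a move places a whole dictionary word in the next empty row. The sketch below shows only the successor function, under the assumptions that words run along rows and columns and that rows are filled top to bottom: a candidate word must be spellable from the unused letters, and each column must remain a prefix of some dictionary word (or be a complete word once the third row is placed). The heuristic that orders these children is left open, as the assignment asks you to devise it; WordSuccessors and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Word-level successor function for Part 3, option 2 (a sketch). */
public class WordSuccessors {

    /** All 1- and 2-letter prefixes of dictionary words. */
    static Set<String> prefixes(Set<String> dict) {
        Set<String> p = new HashSet<String>();
        for (String w : dict) {
            p.add(w.substring(0, 1));
            p.add(w.substring(0, 2));
        }
        return p;
    }

    /**
     * Children of a state whose first rows.size() rows are filled: each dictionary word
     * that can be spelled with the unused letters and keeps every column a valid prefix
     * (or, on the last row, a valid word). ASSUMPTION: only rows and columns are checked.
     */
    static List<String> nextRows(List<String> rows, String letters, Set<String> dict, Set<String> prefixes) {
        int[] unused = new int[26];
        for (char c : letters.toCharArray()) unused[c - 'A']++;
        for (String r : rows) {
            for (char c : r.toCharArray()) unused[c - 'A']--;
        }
        List<String> result = new ArrayList<String>();
        for (String w : dict) {
            if (fits(w, unused) && columnsOk(rows, w, dict, prefixes)) result.add(w);
        }
        Collections.sort(result); // deterministic order; a heuristic would reorder these
        return result;
    }

    /** True if w can be spelled from the unused letter counts. */
    static boolean fits(String w, int[] unused) {
        int[] need = new int[26];
        for (char c : w.toCharArray()) {
            if (++need[c - 'A'] > unused[c - 'A']) return false;
        }
        return true;
    }

    /** Each column must be a valid prefix, or a full word once the third row is placed. */
    static boolean columnsOk(List<String> rows, String w, Set<String> dict, Set<String> prefixes) {
        boolean lastRow = rows.size() == 2;
        for (int col = 0; col < 3; col++) {
            StringBuilder column = new StringBuilder();
            for (String r : rows) column.append(r.charAt(col));
            column.append(w.charAt(col));
            String s = column.toString();
            if (lastRow ? !dict.contains(s) : !prefixes.contains(s)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<String>(Arrays.asList("SOP", "EAR", "WRY", "SEW", "OAR", "PRY"));
        Set<String> pre = prefixes(dict);
        List<String> rows = new ArrayList<String>();
        System.out.println(nextRows(rows, "SOPEARWRY", dict, pre)); // [SEW, SOP]
        rows.add("SOP");
        System.out.println(nextRows(rows, "SOPEARWRY", dict, pre)); // [EAR]
    }
}
```

With the toy dictionary the empty board has two children, SEW and SOP, and after SOP is placed the only child is EAR, so the column-prefix check prunes most of the dictionary well before the last row.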

You will need to compare your new heuristic with the heuristic described in Part 2. For this purpose, you should measure the CPU time taken by each version of your search algorithm to solve the same examples and report the average CPU time (over at least 10 examples) for each heuristic. To generate examples (like the two provided above) for the timing experiments, create a random initial state by randomly choosing 9 letters from the alphabet. You will need to augment your code so that it detects when an initial state has no solution. Compute the average CPU time only over those initial states that have a solution.
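A possible timing harness, sketched under the assumption that each solver exposes a method returning the solved grid, or null once the search has exhausted the state space (the FillInSolver interface is hypothetical). It draws 9 letters uniformly with replacement, measures per-thread CPU time through java.lang.management.ThreadMXBean (available since Java 1.5), and averages only over solvable instances. Since many random letter sets may have no solution, it runs a fixed number of attempts rather than looping until it has found enough solvable ones.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Random;

/** Sketch of the Part 3 timing experiment; FillInSolver is a hypothetical interface. */
public class TimingHarness {

    /** Hypothetical contract: returns the solved grid, or null once the search space is exhausted. */
    interface FillInSolver {
        String solve(String letters);
    }

    /** Nine letters drawn uniformly from A..Z with replacement, so repeats can occur. */
    static String randomLetters(Random rnd) {
        char[] cs = new char[9];
        for (int i = 0; i < 9; i++) {
            cs[i] = (char) ('A' + rnd.nextInt(26));
        }
        return new String(cs);
    }

    /**
     * Runs the solver on `attempts` random instances and returns the average CPU time in
     * milliseconds over the solvable ones only (NaN if none was solvable).
     * Reusing the same seed gives both heuristics the same instances.
     */
    static double averageCpuMillis(FillInSolver solver, int attempts, long seed) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean(); // CPU time, not wall-clock time
        Random rnd = new Random(seed);
        long totalNanos = 0;
        int solved = 0;
        for (int i = 0; i < attempts; i++) {
            String letters = randomLetters(rnd);
            long start = bean.getCurrentThreadCpuTime();
            String answer = solver.solve(letters);
            long elapsed = bean.getCurrentThreadCpuTime() - start;
            if (answer != null) { // unsolvable instances are excluded from the average
                totalNanos += elapsed;
                solved++;
            }
        }
        return solved == 0 ? Double.NaN : totalNanos / 1e6 / solved;
    }

    public static void main(String[] args) {
        // Stand-in solver for demonstration only: "solves" instances that contain an A.
        FillInSolver demo = new FillInSolver() {
            public String solve(String letters) {
                return letters.indexOf('A') >= 0 ? letters : null;
            }
        };
        System.out.println(averageCpuMillis(demo, 1000, 42L) + " ms");
    }
}
```

Calling averageCpuMillis with the same seed for the Part 2 and Part 3 solvers times both on identical instances. On a JVM without per-thread CPU timing, getCurrentThreadCpuTime() throws UnsupportedOperationException, which isCurrentThreadCpuTimeSupported() lets you check in advance.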

Make sure you describe your heuristic and summarize how it performs in the write-up for this part of the assignment. Additionally, you must report the numeric results of your timing experiments in the write-up.

Part 4: Performance (7 points) Instrument your code for the sake of scientific research. Augment Parts 2 and 3 so that your program records the number of nodes expanded, runs on about 100 random initial states, and calculates and prints the average effective branching factor (EBF: see page 106 of the textbook). To generate the random initial states, you will need to randomly choose 9 letters from the alphabet. You will need to augment your code so that it detects when an initial state has no solution. Determine the average EBF only over those initial states that have a solution. Report the average EBF for your greedy search with the two different heuristics in the write-up. What does the average EBF tell you? Does your heuristic work better or worse than the Part 2 heuristic?
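The effective branching factor b* for a search that processed N nodes and found a solution at depth d is the root of N + 1 = 1 + b* + (b*)^2 + ... + (b*)^d, which has no closed form for general d. A short bisection solves it numerically; the class and method names are hypothetical. In the letter-by-letter formulation every solution lies at depth d = 9, and in the word-level formulation at d = 3. One reasonable approach is to compute b* for each solved instance and then average those values.

```java
/** Effective branching factor b* from N and d, found by bisection (a sketch). */
public class EffectiveBranching {

    /** 1 + b + b^2 + ... + b^d */
    static double treeSize(double b, int d) {
        double sum = 1, term = 1;
        for (int i = 1; i <= d; i++) {
            term *= b;
            sum += term;
        }
        return sum;
    }

    /** Solves N + 1 = 1 + b + ... + b^d for b >= 0; expects n >= 0 and d >= 1. */
    static double ebf(long n, int d) {
        // treeSize(b, d) increases with b, and treeSize(n, d) >= n + 1, so the root lies in [0, max(1, n)].
        double lo = 0, hi = Math.max(1.0, n);
        for (int i = 0; i < 100; i++) {
            double mid = (lo + hi) / 2;
            if (treeSize(mid, d) < n + 1) lo = mid; else hi = mid;
        }
        return (lo + hi) / 2;
    }

    public static void main(String[] args) {
        System.out.println(ebf(6, 2));  // approximately 2.0, since 1 + 2 + 4 = 7
        System.out.println(ebf(52, 5)); // approximately 1.92
    }
}
```

The second call reproduces the commonly quoted example in which N = 52 and d = 5 give b* of about 1.92. An EBF closer to 1 means the heuristic steers the search more directly to a solution.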

Readme (4 points) You should turn in a README file for the assignment. It should specify what each file contains and how to run it. It should also provide a description of how you implemented each part and any special properties of your work that you want us to notice.

Technical details

You must write all your code in Java. Code written in other languages will not be accepted. Your code should be compatible with Java 1.5 or higher. It should run on CS machines (currently using Java v1.5.0_01) or, if you don't have CS accounts, on CUNIX (currently using Java v1.6.0_02).

Write-ups should be provided as plain-text files or PDF documents. Other formats will not be accepted. For non-programming assignment parts or answers involving figures, the write-ups may be handwritten or printed; these should be turned in at the beginning of class on the day of the deadline.

Late Policy and Penalties

As per the late policy of the class, you will lose 1.5 points (5%) per day that the assignment is submitted past the deadline. For this purpose, any amount of time greater than one minute counts as a full day. Assignments will not be accepted after a period of 7 days beyond the deadline. A further extension may be granted if and only if the instructor is contacted before the end of the 7-day late period, provided that the circumstances are extreme enough to warrant an extension. In this case, the late penalty will be determined on the basis of the individual case. If a prior request for an extension is not made before the end of the 7-day period, the assignment will not be accepted under any circumstances.

You are required to submit working and documented code for assignments involving programming. If we have to edit your code in any way in order to get it to run, you will be penalized depending on the extent of the changes required. If we are unable to run your code, we will ask you to identify the problem, and you will be penalized depending on the changes required as well as the amount of time taken to fix the code. Penalties for non-working code will be applied in the form of a percentage of the points carried by the assignment part in question.

Write-ups turned in on paper are due at the beginning of class on the day of the deadline. Hardcopy write-ups turned in at the end of class will be penalized by 1 point.

General notes

Make sure that the source code is clear and well-documented internally, as hand annotations will be ignored. The source code must also outline how it should be tested. Clear programming style and thorough testing will account for a substantial portion of your grade for the programming part of this assignment.

Sharing and re-using ideas, solutions and code with other students is prohibited. Please refer to the Collaboration policy page if you have any questions regarding the level of discussion permitted.

Refer to the Submissions page of the class website for specific instructions regarding assignment submissions. Please do not email assignments to any of the TAs.