Crossword Puzzles as Constraint Satisfaction Problems

Russell Steinthal  <>

Final Project Report for COMS W4721: Advanced Intelligent Systems (Prof. Siegel)


Crossword puzzles, in addition to being a hobby of potentially millions of people around the world, are an excellent real-world example  of a constraint satisfaction problem.  A solution to a crossword puzzle, after all, is an assignment of words to a grid such that each meets a number of constraints: a semantic constraint provided by the clue, a length constraint provided by the grid, and constraints on its its letters provided by the words which it overlaps in the grid.  Similarly, the construction of a puzzle is based on similar constraints, although the application is somewhat different.  This paper describes an attempt at modeling those constraints formally and using that model to implement systems to solve and generate puzzles.

Ideally, any solution to the crossword puzzle problem would first provide a general-purpose constraint satisfaction implementation which would be applicable to other domains.  Unfortunately, to date I have been unable to achieve such a general solution, although I have had limited success with crossword-specific solutions.  This paper, therefore, describes the models I have created and the successes (and failures) I have had trying to implement it.

Crossword Puzzle Models

Early in my work on this project, two potential models for the problem suggested themselves, one based on words, and the other based on characters.

A word- based approach

The word-based approach is based on a simple implementation of the general constraint-satisfaction framework (such as that suggested by Russel and Norvig), namely an assignment of values to a set of variables such that each meets all the specified constraints.  For crossword puzzles, the variables are the words which need to be filled in, one variable for each of the words in the puzzle solution.  The grid also provides the initial constraints for each word, such as the number of letters, what overlap relationships the word has with other words in the grid, etc.  The basic rule that whenever two words overlap, they must share the same letter in the overlap position must also be encoded in the system.  Finally, a dictionary provides the overall constraint that any valid value for a variable must be in the dictionary.  (Of course, that limits the crossword puzzles which can be solved, since most puzzles use at least one two-word phrase or proper name; from a theoretical perspective, however, one could construct a sufficiently large dictionary to avoid this problem.)

The advantage to this approach is that as a direct implementation of the generic CSP model, there are algorithms which are known to work,  and which would be extensible to other problems.  However, I was unable to come up with a suitable representation for the constraints listed, and so began considering other models.

A character-based approach

This approach is based on another view of the crossword puzzle, and suggests different algorithms.  Instead of looking at a solution as a set of words, it views the grid as a matrix of squares, and a solution, therefore, is an assignment of characters to grid squares such that each string of characters, read across or down, is a word in the dictionary.   Constraints on the length of words are represented by marking appropriate squares in the grid with "stop" flags, indicating that no letter goes in that square, and scanning should stop (i.e. whatever string has been constructed to that point should be checked against the dictionary).  This is, obviously, a direct translation of the blackened squares on a traditional crossword puzzle.

This seemed like a simpler to implement approach, so I chose to begin with it.  It also has the advantage that the time to solution is bounded, since each square can only have one of 26 possible values; unfortunately, the bound is very high for any reasonably sized grid, so heuristics are necessary to reduce the number of possibilities.

A note on natural language constraints

Neither of the models presented above considers the language constraints on a puzzle solution, namely that each word "answer" the associated clue.  To do so would require not only a complete semantic database (such as WordNet), but also a system which could parse the clues and generate queries into the database; this is beyond the scope of this project, although it could be added by inserting an additional constraint, whether in the set of constraints for the first approach, or as a check in the scanning stage of the second approach.

A somewhat more limited use of natural-language processing, however, could be of more immediate use.  In most puzzles, clues and their answers fill the same syntactic role.  For example, if the clue is a noun phrase, the solution is usually a noun; verb phrases and verbs similarly.  If the clues were run through a syntactic analyzer such as CASS, it should be possible to generate part-of-speech hints (or constraints, depending on one's degree of confidence in this rule)

A basic crossword-solving algorithm

A large part of the research on this project centered around the creation and refinement of the following algorithm for filling in a crossword grid (under the character-based model described above).  It takes as input a grid and  a word list; with minor variations, the same algorithm can be used either to solve a puzzle (in which case the grid is input with "stops" already in place, and the word list is (possibly a subset of) the dictionary) or to generate a puzzle solution (in which case the grid is initially empty and the word list contains the words to fill into the grid).

Note that a subgrid, as used in the algorithm, is a section of the grid space whose upper left square is the first letter of both its horizontal and vertical words; the subgrid then extends to all squares which are part of words which intersect either the horizontal or vertical word starting from the initial position.


While the algorithm above is believed to be correct, it is extremely inefficient due to the complexity of the problem.  Heuristics, such as considering only a subset of the dictionary (trimmed at the longest word in the puzzle, for instance) or the part-of-speech heuristics mentioned above can be used to reduce the number of possibilities which must be considered, but the algorithm is still at least polynomial time.

In addition, simple tasks like loading a complete dictionary can be time-consuming: on the author's machine (a Pentium-90), loading /usr/dict/words into the in-memory data structure took in excess of 20 minutes.  (I am aware that running the algorithm on a full /usr/dict/words would take an exorbitant amount of time; it is provided only for reference, and as an example of one of my earlier (foolish) experiments in this project).

Finally, I do not yet have a working implementation of this algorithm, in large part because its time constraints make running test cases fairly difficult.  A pen and paper implementation, however, shows that it should work.


Although an algorithm does exist to solve and generate crossword puzzles using constraint propagation, its time and space complexity is at least polynomial.  Future work, therefore, should focus on either finding a linear time algorithm (which I believe to be unlikely) or on heuristics to reduce the search space; as indicated above, the use of natural language rules to partition the dictionary might be one of the more promising ways of doing so.