COMS 4180 Project Page: Michael E. Locasto

Literature Review

Simple substitution isn't the focus of much cryptographic research. However, the problem domain itself is a useful and interesting testbed for other types of applications.

My project focuses on implementing a simple learning mechanism that will use statistical analysis to break simple substitution ciphers. The following resources are relevant to the task.

Pfleeger [1] begins with a section describing many types of simple substitution ciphers and permutation ciphers. He then continues to describe polyalphabetic ciphers and other basic mechanisms. A discussion on cryptoanalysis of these methods is presented.

A very concise discussion and some tools are available from http://starbase.trincoll.edu/~crypto/historical/substitution.html [2]. Chris Savarese and Brian Hart put this page together.

The page by Fred Brande at http://www.codasaurus.com/Cipher-1.htm[3] is a very detailed analysis of simple substitution.

http://www.dd.chalmers.se/~f96edfa/[4] describes the application of genetic programming in solving simple substitution ciphers.

Jakobsen [5] describes a method of attacking simple substituion where only the distribution of digram in both the plaintext and ciphertext needs to be known. A digram is a related pair of letters (eg, 'th' 'sh' 'ch' 'ed'). The algorithm works by preguessing a key and refining that key in successive passes. He performs significant statistical performance measures of his algorithm.

Bellovin and Wagner[6] describe a known ciphertext attack on DES. They assume that having a known plaintext/ciphertext pair is impractical and create a method using just known ciphertext and some statistical information about the plaintext in general. The paper is available here.

Schneier [7] describes the concept of entropy and the unicity distance as statistical measures of the "goodness" of a ciphertext. This goodness property should give some indication of the overall strength of the cipher used to produce the ciphertext. Entropy, Schnier explains, is a measure of the amount of uncertainty that is present in a bit of information. High entropy in passwords, for example, raises the minimum amount of work needed to crack that password. Schneier also relates the topic of Shannon's unicity distance. This unicity distance is a measure of the ratio between known ciphertext and probable plaintext. Schneier says that "for a standard English message, the unicity distance is K/6.8 characters, where K is hte key length in bits." This knowledge comes in to play in recognizing the plaintext generated by my program. The program can attempt to decrypt a given ciphertext and then return a measure of confidence indicating how close to English the plaintext should be. Naturally, the user will then make a final determination, either throwing away the result or completing the decryption.

References

1. Pfleeger, Charles P. Security In Computing, Second Edition Prentice Hall. 1997.

2. Hart, Brian and Chris Savarese. Simple Substitution Ciphers.

3. Brande, Fred. The Creation & Solution of Simple Substitution Ciphers.

4. Nilsson, Sofia and Edward Fäldt SOLVING SIMPLE SUBSTITUTION CRYPTOGRAMS USING GENETICAL PROGRAMMING

5. T. Jakobsen. A fast method for cryptanalysis of substitution ciphers. Cryptologia, 19(3), 1995. http://citeseer.nj.nec.com/jakobsen95fast.html

6. Bellovin, Steven M. and David Wagner. A Programmable Plaintext Recognizer.

7. Schneier, Bruce. Secrets & Lies: Digital Security in a Networked World. Wiley, 2000.