by Matt Bogosian, December 11, 1997
After finishing Astro Teller's Exegesis, I was thoroughly motivated to use Machine Learning to do something revolutionary (or at least useful). What I failed to keep in mind were my own limitations and those of Genetic Programming (both the package I was using and the paradigm itself). What follows is too heavy on ambition and too light on results. Oh well, someday I'll get it right....
The class of nondeterministic polynomial (NP) problems has taunted mathematicians and computer scientists for over 30 years. Such problems are often intractable to solve by brute force. Though no one has discovered a polynomial-time algorithm for the hardest of these problems, no one has disproved the existence of such an algorithm.
Genetic programming might help us arrive at a polynomial-time algorithm (if one exists) to solve NP problems. After 30 years, we have been unable to prove or disprove the absolute intractability of the NP problems. Perhaps it is time to let the machines have a crack at the problem.
The NP problem I chose to attack was the (in)famous travelling salesman problem, which requires one to find a minimum-cost Hamiltonian cycle in a fully-connected weighted graph. Its representation is simple, and the paths are easy to iterate. This is useful, as both the problem and solution may be naturally and easily represented by a machine.
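To make that concrete, here is a minimal brute-force sketch (my own illustration, not part of the original program), assuming the graph is stored as an n x n symmetric weight matrix:

    from itertools import permutations

    def brute_force_tsp(weights):
        """Find a minimum-cost Hamiltonian cycle by exhaustive search.

        weights is an n x n symmetric matrix; weights[i][j] is the cost
        of the edge between nodes i and j.
        """
        n = len(weights)
        best_path, best_cost = None, float("inf")
        # Fixing node 0 as the start removes rotations; each cycle still
        # appears twice (once per direction), for (n - 1)! candidates.
        for rest in permutations(range(1, n)):
            path = (0,) + rest
            cost = sum(weights[path[i]][path[(i + 1) % n]] for i in range(n))
            if cost < best_cost:
                best_path, best_cost = path, cost
        return best_path, best_cost

    # Example: a 4-node fully-connected graph.
    w = [[0, 2, 9, 10],
         [2, 0, 6, 4],
         [9, 6, 0, 3],
         [10, 4, 3, 0]]
    print(brute_force_tsp(w))  # ((0, 1, 3, 2), 18)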
A special-case algorithm for the travelling salesman problem may be easier to discover than a general one (especially for a machine), and a special-case solution may be useful as a step toward discovering a general algorithm. Graphs in general are not required to be representable in d-space (where d is some dimension greater than 0). However, if we restrict a given graph to be representable in d-space, we might be able to find a general solution for that space. For example, if d is 1, the nodes must be points in 1-space, i.e., points on a single line. We can see intuitively that one optimal path is to start at the negative-most point, travel to the positive-most point, stopping at each point along the way, and then end back at the starting point. This approach works for every 1-space graph. If similar solutions can be found for 2-, 3-, or d-space graphs, perhaps those solutions may be used to find a general solution for graphs not representable in d-space.
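For the 1-space case, that intuition translates directly into code; the following sketch (an illustration, not the actual program) just sorts the nodes by coordinate:

    def optimal_tour_1d(points):
        """Optimal tour for nodes that are points on a line: visit them
        in sorted order and return to the start. The total cost is
        2 * (max - min), which no tour can beat, since any tour must
        reach both extremes and return."""
        return sorted(range(len(points)), key=lambda i: points[i])

    print(optimal_tour_1d([3.5, -1.0, 0.25, 2.0]))  # [1, 2, 3, 0]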
The travelling salesman problem has some interesting properties. Given a fully-connected graph with n nodes, there are n * (n - 1) / 2 (or roughly n^2) edges in the graph (see Table 1), and (n - 1)! / 2 (or roughly n!) unique Hamiltonian cycles (see Table 2). It is in the latter property that the intractable nature of the problem lies. Because there is no known algorithm to compute the optimal path directly, brute force must be used: all (n - 1)! / 2 paths must be computed. This means that, assuming there is only one optimal path in a given graph, a randomly selected path has a 2 in (n - 1)! chance of being optimal. These aren't such bad odds if n is 5 or less, but they are less than favorable if n is 6 or higher (approaching probabilistic impossibility once n is larger than 7).
Table 1: Edges in a fully-connected graph

    Nodes (n)    Edges (n * (n - 1) / 2)
        3                  3
        4                  6
        5                 10
        6                 15
        7                 21

Table 2: Unique Hamiltonian cycles in a fully-connected graph

    Nodes (n)    Cycles ((n - 1)! / 2)
        3                  1
        4                  3
        5                 12
        6                 60
        7                360
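Both columns are easy to reproduce; a quick check in Python:

    from math import factorial

    for n in range(3, 8):
        edges = n * (n - 1) // 2
        cycles = factorial(n - 1) // 2
        print(f"{n} nodes: {edges} edges, {cycles} unique Hamiltonian cycles")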
Machine Learning is nothing without data. I had none, so my first step was to generate training, testing and validation sets of travelling salesman graphs and solutions (the hard way). I wrote a small program to do this and generated 2 * (n - 1)! graphs for each of 3, 4, 5 and 6 nodes, and (n - 1)! graphs for 7 nodes. I generated four sets of these graphs, representable in 1-, 2-, 3- and no-space. Note that the 3-node and 1-space graphs were provided as base cases, since any path in a 3-node fully-connected graph is optimal, and any linearly ordered path in a 1-space graph is optimal.
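The following sketch suggests how such graphs might be generated (the function and its parameters are hypothetical; the actual gensm program may differ): d-space graphs place random points and use Euclidean distances, while no-space graphs use arbitrary random weights.

    import random

    def random_graph(n, d=None):
        """Build a random fully-connected weight matrix for n nodes.

        If d is given, nodes are random points in d-space and weights
        are Euclidean distances; if d is None ("no-space"), weights are
        arbitrary random values. (Hypothetical sketch only.)
        """
        if d is None:
            w = [[0.0] * n for _ in range(n)]
            for i in range(n):
                for j in range(i + 1, n):
                    w[i][j] = w[j][i] = random.uniform(1.0, 10.0)
            return w
        pts = [[random.uniform(-10.0, 10.0) for _ in range(d)]
               for _ in range(n)]
        dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
        return [[dist(pts[i], pts[j]) for j in range(n)] for i in range(n)]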
The first Machine Learning representation I chose was the Neural Network. I created an n^2 x n x n network, where the edge weights were used as inputs. The edge from node i to node j was fed to the input with index n * i + j; where i was equal to j, a 0 was fed in. The outputs were used to order the node indexes and thereby form a path. For example, assume n is 3 and the following outputs were retrieved (by index): 0.1, 0.5, 0.9. This would indicate the path 2-1-0. I trained 25 networks (one for each of the node-dimension combinations, plus one for each node count with arbitrary dimensions) for 100 epochs with a learning rate and momentum of 0.3.
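The encoding and decoding might look like the following sketch (an illustration of the scheme described above; ordering by descending activation is inferred from the 0.1, 0.5, 0.9 example):

    def encode_inputs(weights):
        """Flatten an n x n weight matrix into n^2 network inputs; the
        edge from i to j feeds input n * i + j, and diagonal entries
        (i == j) are 0."""
        n = len(weights)
        return [weights[i][j] if i != j else 0.0
                for i in range(n) for j in range(n)]

    def decode_outputs(outputs):
        """Order node indexes by their output activations to form a
        path (highest activation first)."""
        return sorted(range(len(outputs)),
                      key=lambda i: outputs[i], reverse=True)

    print(decode_outputs([0.1, 0.5, 0.9]))  # [2, 1, 0]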
The second Machine Learning representation I chose was Genetic Programming. After some experimentation, I settled on the following GP representation. I used a two-dimensional array to store the edges (similar to the Neural Network approach). I created an array of length n, initialized to hold the values 0, 1, ..., n - 1; this was the solution array. I included the operations and terminals: +, -, *, /, %, >, >=, read_edge, swap_node, if_else, 0, 1, ..., n. read_edge took two nodes as arguments and returned the weight of the edge between them; if either node was out of range (greater than or equal to n), INF was returned. swap_node took two nodes as arguments, swapped their places in the solution, and returned the weight of the edge between them; if either node was out of range, no swap was performed and INF was returned. if_else took three arguments: if the first evaluated to nonzero, it returned the evaluation of the second argument; otherwise, it returned the evaluation of the third. I ran it 20 times (one for each of the node-dimension combinations except those in 1-space, plus one for each node count with arbitrary dimensions) for 100 generations with a population size of 500. Fitness was calculated as the difference between the cost of the generated path and the cost of the optimal path (so a nonzero fitness meant the program had not found an optimal path).
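A sketch of the primitives' semantics (an illustration only; the GP package's internals surely differ, and a real GP if_else would evaluate its branch arguments lazily rather than eagerly as Python does here):

    INF = float("inf")

    class TSPContext:
        """Evaluation context for the GP primitives described above."""

        def __init__(self, weights):
            self.weights = weights
            self.n = len(weights)
            self.solution = list(range(self.n))  # initialized to 0..n-1

        def read_edge(self, i, j):
            # Out-of-range node arguments yield INF.
            if not (0 <= i < self.n and 0 <= j < self.n):
                return INF
            return self.weights[i][j]

        def swap_node(self, i, j):
            # One plausible reading: exchange where the two nodes sit in
            # the solution array; out-of-range arguments are a no-op
            # returning INF.
            if not (0 <= i < self.n and 0 <= j < self.n):
                return INF
            a, b = self.solution.index(i), self.solution.index(j)
            self.solution[a], self.solution[b] = \
                self.solution[b], self.solution[a]
            return self.weights[i][j]

    def if_else(cond, then_val, else_val):
        # Returns the second argument when the first is nonzero,
        # otherwise the third.
        return then_val if cond != 0 else else_val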
Performance Evaluation and Conclusion
Performance for both the Neural Network and the Genetic Programming approaches seemed to be only slightly better than random ordering for the node ranges tested. Both seemed to converge on random ordering after about 7 nodes. With the exception of the 1-dimensional graphs, there were no significant differences between the d-space and no-space graphs (the 1-dimensional results are omitted). The following charts show performance averaged over all dimensions.
[Chart: Neural Network Average Performance — % Correct vs. Number of Nodes (3 to 7). Both the Random and Neural Network curves start near 100% at 3 nodes and fall toward 0% by 7 nodes, with the Neural Network staying somewhat above Random in between.]
[Chart: Genetic Programming Average Performance — % Correct vs. Number of Nodes (3 to 7). Both the Random and Genetic Programming curves start near 100% at 3 nodes and fall toward 0% by 7 nodes, with Genetic Programming staying somewhat above Random in between.]
The Neural Net seems to perform as expected. Its very representation is based on recognizing inputs "at a glance". One inference is that it is simply not trained enough for the larger graphs, most likely due to an insufficient number of training examples: if, on average, the random guesser needs 360 tries to correctly guess the optimal path for a 7-node graph, 100 training examples is probably insufficient for training the Network. However, it is more likely that the Network is merely developing a heuristic for choosing a pretty good path given a graph, which would mean that what it has "learned" does not generalize.
We also see that the Genetic Programming approach performed similarly to the Neural Network approach; it has not generalized either. This may be due to a lack of training examples as well as to an insufficient representation: the representation chosen does not have indexed memory (and hence is not Turing-complete).
While indexed memory is often overkill, the travelling salesman problem may be sufficiently complex to justify its use with Genetic Programming. Even if indexed memory is not required, in the worst case adding it would be like adding a useless terminal (Teller, 1994).
Both approaches seem to converge to the random guesser after a sufficiently large number of nodes (and hence a sufficiently large graph complexity). If I were optimistic, and assumed I did my own homework correctly, I might say that the results suggest there is no efficient algorithm to solve the travelling salesman problem. However, it is much more likely that the representation is insufficient to find one (if it exists), or simply that there were too few training examples.
To conclude, the travelling salesman problem is not a simple one. The representations chosen were not sufficient to find even a good heuristic for more than 5 nodes. It would be interesting to see how another representation performs. Perhaps a Turing-complete representation with automatically defined functions (e.g., PADO) might fare better. It might also be interesting to see whether adding trigonometric functions (e.g., sin, cos, tan) to the available operations helps with those graphs representable in d-space.
The source to the program used to generate the data sets can be viewed in the gensm/ subdirectory. The manpage is also available in gensm/doc/README.
Teller, Astro. (1994). "Turing Completeness In the Language of Genetic Programming with Indexed Memory". The 1994 First International World Congress on Computational Intelligence, 136-146.
Teller, Astro. (1994). "Genetic Programming, Indexed Memory, The Halting Problem and Other Curiosities". The 1994 7th Annual Florida AI Research Symposium, 270-274.
Siegel, Eric (ed.). (1995). Collective Brainstorming at the AAAI Symposium on Genetic Programming. Cambridge, November 10-12, 1995.