Project 1: Shortest Paths

Next: Project 2: Merge Up: W4995 Programming and Problem Previous: Registering for the Course

Project 1: Shortest Paths

The shortest path problem is one of the most well-studied in the computer science literature. Given a weighted graph, the problem is to find the minimum total weight path(s) in the graph between pairs of nodes.

There are many algorithms developed for variants of the problem. Variants include directed versus undirected edges, arbitrary versus nonnegative weights, all pairs versus single-source paths etc.

The focus of the present problem is on space-time trade-offs. If we have a graph with n nodes and e edges, then there will be O(n²) shortest paths. If L is the average length of a shortest path, we'd need O(n²L) space to store them all. (Can you find an expression for L in terms of n and e?) We want to develop algorithms that have fast performance for queries of the form ``What is the shortest path from A to B?'' for fixed A and B. We'll call these ``fixed-endpoint queries.''

If we have sufficient space, then precomputing and storing all shortest paths is likely to give the best performance. Given suitable indexes or data structures, we can simply look up and output the shortest path in expected O(L) time (for hash-based structures). On the other hand, if we have a very limited amount of space, and we store nothing beyond the graph itself, then we need to run some shortest-path algorithm on the graph to answer our query. Such an approach uses much less space, but takes much more time. (How much? What are the complexities of the various shortest-path algorithms for fixed-endpoint queries?)

In between these two extremes, we might be able to balance space and time. Given some extra space S, can we utilize that space by precomputing something in order to speed up queries?

More formally, your problem is to design a space-adaptive scheme. Given a space bound S, you precompute information and store it into data structures of total size less than S. You provide functions to answer fixed-endpoint queries. You should design and execute experiments to measure the time taken for a bunch of random fixed-endpoint queries, and plot the time measurements against S.

You should consider both RAM-sized problems (up to 100MB or so) and problems bigger than that. For the bigger problems, think about using a database system to store the data, rather than developing your own disk-based data structures.

The primary test data set will be the US highway route system. The data and metadata is available at http://www.fhwa.dot.gov/hep10/data/data.html. It's called the ``National Highway Planning Network (NHPN).'' Look also at http://publish.uwo.ca/~jmalczew/gida_1/Zhan/Zhan.htm. This is an article comparing the performance of various shortest path algorithms on the highway data, describing the data structures used. The cited references will also be useful.

Here are some ideas to get you started:

How might you get the most bang for the buck out of a small amount of extra space?
Think about the problem as if you were trying to compress the full shortest-path list. What kinds of techniques from the field of data compression might apply here?
Don't think too much about low-level tricks such as trying to pack multiple fields into a single byte in order to save space. Instead, focus on the higher-level issues.
From the beginning, try your algorithms on subsets of the data of varying size. Techniques that work for small data sets might not scale to large ones.
If you had some advanced knowledge of the query distribution, would that help? For example, if you knew that the endpoints of all queries were cities (as opposed to arbitrary highway intersections), what would you do differently?
The highway data is based on geography, and so the weights reflect real distances. Can you take advantage of this fact?

We will define some standard subsets of the highway data, and ask all groups to generate space/time plots for these datasets. Groups can compare their plots to see who has the best space/time tradeoff. (Reminder: as always, the ranking of space-time tradeoff plots will not directly affect the grades; it is the underlying ideas that count.)

Next: Project 2: Merge Up: W4995 Programming and Problem Previous: Registering for the Course

Ken Ross
2000-09-27