The shortest path problem is one of the most well-studied problems in the computer science literature. Given a weighted graph, the problem is to find the minimum total weight path(s) in the graph between pairs of nodes.
Many algorithms have been developed for variants of the problem. Variants include directed versus undirected edges, arbitrary versus nonnegative weights, all-pairs versus single-source paths, etc.
The focus of the present problem is on space-time trade-offs. If we have a graph with n nodes and e edges, then there will be O(n^2) shortest paths. If L is the average length of a shortest path, we'd need O(n^2 L) space to store them all. (Can you find an expression for L in terms of n and e?) We want to develop algorithms that have fast performance for queries of the form ``What is the shortest path from A to B?'' for fixed A and B. We'll call these ``fixed-endpoint queries.''
If we have sufficient space, then precomputing and storing all shortest paths is likely to give the best performance. Given suitable indexes or data structures, we can simply look up and output the shortest path in expected O(L) time (for hash-based structures). On the other hand, if we have a very limited amount of space, and we store nothing beyond the graph itself, then we need to run some shortest-path algorithm on the graph to answer our query. Such an approach uses much less space, but takes much more time. (How much? What are the complexities of the various shortest-path algorithms for fixed-endpoint queries?)
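To make the space-poor extreme concrete, here is a minimal sketch of answering a fixed-endpoint query directly from the graph with Dijkstra's algorithm, stopping early once the target is settled. The adjacency-list representation (graph[u] = [(v, weight), ...]) is an assumption for illustration, not a prescribed format.

```python
import heapq

def shortest_path(graph, source, target):
    """Early-exit Dijkstra for a single fixed-endpoint query.

    `graph` is assumed to be an adjacency list: graph[u] = [(v, weight), ...]
    with nonnegative weights. Returns (distance, path), or (inf, []) if
    the target is unreachable.
    """
    dist = {source: 0}
    prev = {}
    pq = [(0, source)]
    settled = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in settled:
            continue
        settled.add(u)
        if u == target:  # stop as soon as the target's distance is final
            path = [target]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, w in graph.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    return float("inf"), []
```

This uses no space beyond the graph and O(n) working memory per query, but every query pays the full cost of a graph search.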
In between these two extremes, we might be able to balance space and time. Given some extra space S, can we utilize that space by precomputing something in order to speed up queries?
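One way such a scheme might look, sketched here purely as a starting point: spend the budget S on complete distance tables from a few ``landmark'' nodes, and use the triangle inequality to bound (or sometimes pin down exactly) the distance between arbitrary endpoints. The landmark-selection heuristic, the per-entry byte cost, and the function names below are all assumptions for illustration; the bounds are valid for undirected graphs.

```python
import heapq

def dijkstra_all(graph, source):
    """Distances from `source` to every reachable node (nonnegative weights)."""
    dist = {source: 0}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def precompute(graph, space_budget, bytes_per_entry=16):
    """Fill roughly `space_budget` bytes with distance tables from landmarks.

    k is chosen so that k full tables (n entries each, at an assumed
    bytes_per_entry cost) fit in the budget. Picking the highest-degree
    nodes as landmarks is just one simple heuristic.
    """
    n = len(graph)
    k = min(n, space_budget // max(1, n * bytes_per_entry))
    landmarks = sorted(graph, key=lambda u: len(graph[u]), reverse=True)[:k]
    return {lm: dijkstra_all(graph, lm) for lm in landmarks}

def query_bounds(tables, a, b):
    """Triangle-inequality bounds on dist(a, b) in an undirected graph."""
    lo, hi = 0, float("inf")
    for dist in tables.values():
        if a in dist and b in dist:
            lo = max(lo, abs(dist[a] - dist[b]))   # lower bound
            hi = min(hi, dist[a] + dist[b])        # upper bound
    return lo, hi
```

As S grows, more landmarks fit and the bounds tighten; when they meet, the query is answered without touching the graph at all.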
More formally, your problem is to design a space-adaptive scheme. Given a space bound S, you precompute information and store it in data structures of total size less than S. You provide functions to answer fixed-endpoint queries. You should design and execute experiments to measure the time taken for a bunch of random fixed-endpoint queries, and plot the time measurements against S.
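A minimal harness for such measurements might look like the following; the function name and interface (`query_fn(a, b)` standing in for whatever scheme is being evaluated) are assumptions for illustration.

```python
import random
import time

def measure_queries(query_fn, nodes, trials=1000, seed=0):
    """Average wall-clock time per random fixed-endpoint query.

    `query_fn(a, b)` is the query function under test; `nodes` is the
    list of node identifiers to draw random endpoints from. A fixed seed
    keeps the query workload identical across different values of S.
    """
    rng = random.Random(seed)
    pairs = [(rng.choice(nodes), rng.choice(nodes)) for _ in range(trials)]
    start = time.perf_counter()
    for a, b in pairs:
        query_fn(a, b)
    return (time.perf_counter() - start) / trials
```

Running this once per space budget S, and plotting the resulting averages against S (e.g. with matplotlib), yields the space/time curve the problem asks for.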
You should consider both RAM-sized problems (up to 100MB or so) and problems bigger than that. For the bigger problems, think about using a database system to store the data, rather than developing your own disk-based data structures.
The primary test data set will be the US highway route system. The data and metadata are available at http://www.fhwa.dot.gov/hep10/data/data.html. It's called the ``National Highway Planning Network (NHPN).'' Look also at http://publish.uwo.ca/~jmalczew/gida_1/Zhan/Zhan.htm. This is an article comparing the performance of various shortest path algorithms on the highway data, describing the data structures used. The cited references will also be useful.
Here are some ideas to get you started:
We will define some standard subsets of the highway data, and ask all groups to generate space/time plots for these datasets. Groups can compare their plots to see who has the best space/time tradeoff. (Reminder: as always, the ranking of space-time tradeoff plots will not directly affect the grades; it is the underlying ideas that count.)