PhD Candidacy Exam
Advisor: Henning Schulzrinne
The content delivery model for the Internet is evolving from centralized hosting to an affiliated broadcast or content distribution network model. Until recently, much of the Web's rich media content, especially streaming media, was hosted at a centralized location and accessed by clients distributed throughout the Internet. An obvious problem with this model is hot spots caused by popular content. Early attempts to solve this problem included mirroring sites and installing caching proxies at IAPs (Internet Access Providers). While caching proxies help alleviate hot spots and meet IAP requirements for minimizing traffic on their backbone and peering links, they do not address the concerns of Content Publishers (CPs). Mirroring generally requires the user to select a server manually, without knowing which one would actually provide the best service. A key turning point in this evolution was the realization that CPs were willing to pay to outsource fast, reliable delivery of their content. Specifically, CPs will pay not just for hosting services, but for the distributed hosting services offered by Content Distribution Networks (CDNs). We anticipate that content distribution and delivery services are only the first of a spectrum of content services that will radically change Internet operations.
Building the case for change requires a benchmark against which we can evaluate current technology and predict its shortcomings. Over the past ten years, there has been extensive research [1, 2, 3, 4, 5] into building an appropriate model of the Internet. Although the rapid evolution of the Internet means that no characterization can be definitive, researchers have identified certain invariants from which we can gain insights and derive solutions.
With this background on the state of the art in characterizing the Internet, focusing primarily on WWW traffic, we can begin an in-depth look at current best-effort technology for storage in the network. These techniques exploit well-known properties such as temporal locality, Zipf-like popularity distributions, hierarchical efficiencies, and the periodic nature of human access to content. However, they tend to fall short in addressing the heavy-tailed distributions that characterize file sizes, distribution times, idle times between accesses, session durations, and the number of page requests per site. In this context, we will discuss local cache management algorithms [10, 11], the organization of cache meshes [6, 7, 9], and the optimal placement of storage [8, 12, 23, 24, 25] in the network.
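To make the Zipf-like popularity property concrete, the following minimal sketch computes the hit ratio of an idealized cache. It assumes independent requests in which the object of popularity rank i is requested with probability proportional to 1/i^alpha, and a cache that always holds the cache_size most popular objects. The function name `zipf_hit_ratio`, its parameters, and the object counts in the demo are hypothetical and chosen for illustration only; this is not a model taken from any of the cited papers.

```python
def zipf_hit_ratio(num_objects: int, cache_size: int, alpha: float) -> float:
    """Return the fraction of requests an idealized cache can serve.

    Illustrative assumptions: requests are independent draws in which the
    object of popularity rank i is requested with probability proportional
    to 1 / i**alpha (a Zipf-like distribution), and the cache permanently
    holds the cache_size most popular objects.
    """
    if num_objects < 1 or not 0 <= cache_size <= num_objects:
        raise ValueError("need num_objects >= 1 and 0 <= cache_size <= num_objects")
    # Unnormalized request weights for popularity ranks 1..num_objects.
    weights = [1.0 / rank ** alpha for rank in range(1, num_objects + 1)]
    # Hit ratio = share of the request probability mass held by the cached objects.
    return sum(weights[:cache_size]) / sum(weights)


if __name__ == "__main__":
    # Hypothetical population of 100,000 objects; compare cache sizes and skews.
    for alpha in (0.6, 0.8, 1.0):
        ratios = [zipf_hit_ratio(100_000, c, alpha) for c in (100, 1_000, 10_000)]
        print(f"alpha={alpha}: " + ", ".join(f"{r:.2f}" for r in ratios))
```

Under this model the hit ratio depends strongly on the skew parameter alpha, and it grows only slowly with cache size (roughly logarithmically when alpha = 1), which is one reason whole-object caching alone struggles with the long tail of rarely requested content.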
While Web caching has achieved significant gains, it has limitations on several fronts. Specifically, the requirements of some content types (e.g., streaming media) and some content providers cannot be adequately met by best-effort services. From a technical perspective, hierarchies and whole-object caching tend to create instability as object sizes grow. From a business perspective, the value proposition, and therefore the required quality of service, is not uniform across content types. In this context, we evaluate current proposals for video caching [13, 14, 15, 16] and replication [17, 19, 20, 21].
Finally, we will discuss current proposals for content distribution and delivery networks [18, 22], peer-to-peer solutions for distributed content storage and access, and the interoperation of various content service providers [26, 27]. These proposals aim to better address the requirements of CPs, significantly improve end-user service, and potentially create a content services environment that goes well beyond today's distribution and delivery orientation. We will also evaluate these solutions [24, 28] and identify the pressing research issues for developing this next phase in the evolution of Web infrastructure.
1. Web Server Workload Characterization:
The Search for Invariants
Arlitt, M., Williamson, C.,
ACM SIGMETRICS Conference, 1996.
2. Web Caching and Zipf-like Distributions:
Evidence and Implications
Breslau, L., Cao, P., Fan, L., Phillips, G., Shenker, S.,
IEEE Infocom, 1999.
3. Self-Similarity in World Wide Web
Traffic: Evidence and Possible Causes
Crovella, M., Bestavros, A.,
IEEE/ACM Transactions on Networking, Vol 5, No 6, December 1997
4. Traffic Analysis of a Web Proxy Caching Hierarchy
Mahanti, A., Williamson, C., Eager, D.,
IEEE Network, May/June 2000
5. The Dependence of Internet User Traffic
Characteristics on Access Speed
Vicari, N., Kohler, S., Charzinski, J.,
University of Würzburg, Research Report 246, January 2000
6. A Quantitative Comparison of Graph-Based
Models for Internet Topology
Zegura, E., Calvert, K., Donahoo, M.,
IEEE/ACM Transactions on Networking, 1997
7. World Wide Web Caching: The Application-Level
View of the Internet
Baentsch, M., Baum, L., Molter, G., Rothkugel, S., Sturm, P.
IEEE Communications Magazine, June 1997
8. Self-Organizing Wide-Area Network Caches
Bhattacharjee, S., Calvert, K., Zegura, E.,
IEEE Infocom, 1998
9. Beyond Hierarchies: Design Considerations
for Distributed Caching on the Internet
Tewari, R., Dahlin, M., Vin, H., Kay, J.
UTCS Technical Report: TR98-04, 1998
10. Caching on the World Wide Web
Aggarwal, C., Wolf, J., Yu, P.,
IEEE Transactions on Knowledge and Data Engineering, Vol 11, No 1, January 1999
11. GreedyDual* Web Caching Algorithm:
Exploiting the Two Sources of Temporal Locality in Web Request Streams
Jin, S., Bestavros, A.,
Proceedings of Web Caching Workshop, 2000
12. Coordinated Placement and Replacement
for Large-Scale Distributed Caches
Korupolu, M., Dahlin, M.,
Workshop on Internet Applications, July 1999
13. Resource Based Caching for Web Servers
Tewari, R., Vin, H., Dan, A., Sitaram, D.,
Proceedings of SPIE/ACM Conference on Multimedia Computing and Networking, 1998
14. An Active Services Framework and
Its Application to Real-time Multimedia Transcoding
Amir, E., McCanne, S., Katz, R.,
Proceedings of ACM SIGCOMM, September 1998
15. Multimedia Proxy Caching Mechanism
for Quality Adaptive Streaming Applications in the Internet
Rejaie, R., Yu, H., Handley, M., Estrin, D.,
Proceedings of IEEE Infocom, 2000
16. Optimized Caching in Systems with
Heterogeneous Client Populations
Eager, D., Ferris, M., Vernon, M.,
Performance Evaluation 42 (2000)
17. The Case for Geographical Push Caching
Gwertzman, J., Seltzer, M.,
Proceedings of the 1995 Workshop on Hot Topics in Operating Systems, May 1995.
18. Distributed Network Storage
University of California at Berkeley, Ph.D. Thesis, 1999.
19. World-Wide Web Cache Consistency
Gwertzman, J., Seltzer, M.
Proceedings of the 1996 USENIX Technical Conference, January 1996.
20. Using Leases to Support Server-Driven
Consistency in Large-Scale Systems
Yin, J., Alvisi, L., Dahlin, M., Lin, C.,
IEEE Transactions on Knowledge and Data Engineering, Special Issue on Web Technologies, 1999
21. Adaptive Leases: A Strong Consistency
Mechanism for the World Wide Web
Duvvuri, V., Shenoy, P., Tewari, R.,
Proceedings of IEEE INFOCOM'2000, Tel Aviv, Israel, March 2000.
22. Active Names: Flexible Location and
Transport of Wide-Area Resources
Vahdat, A., Anderson, T., Dahlin, M.,
USENIX Symposium on Internet Technologies and Systems (USITS99), October 1999
23. On the Optimal Placement of Web Proxies
in the Internet
Li, B., Golin, M., Italiano, G., Deng, X., Sohraby, K.,
Proceedings of Infocom 1999.
24. The Cache Location Problem
Krishnan, P., Raz, D., Shavitt, Y.,
IEEE/ACM Transactions on Networking, August 2000.
25. Optimum Distribution of Switching Centers in a Communications
Network and Some Related Graph Theoretic Problems.
Operations Research, 13, 1965.
26. A Model for CDN Peering
Day, M., Cain, B., Tomlinson, G.,
Internet Draft, November 2000, http://www.ietf.org/internet-drafts/draft-day-cdnp-model.04.txt
27. CDN Peering Architectural Overview
Green, M., Cain, B., Tomlinson, G., Thomas, S.,
Internet Draft, November 2000, http://www.ietf.org/internet-drafts/draft-cdnp-gen-arch-02.txt
28. An Economy for Managing Replicated
Data in Autonomous Decentralized Systems
Ferguson, D., Nikolaou, C., Yemini, Y.,
Proceedings of International Symposium on Autonomous and Decentralized Systems, 1993.