Analyzing the Internet's BGP Routing Table

Geoff Huston
December 2000
Draft 2.0

[[Figures drawn from www.telstra.net/ops/bgp]]

The Internet continues along a path of seemingly inexorable growth, at a rate which has, at a minimum, doubled its size each year. How big it needs to be to meet future demands remains an area of somewhat vague speculation. Of more direct interest is the question of whether the basic elements of the Internet can be extended to meet such levels of future demand, whatever they may be. To rephrase this question, are there inherent limitations in the technology of the Internet, or in its architecture of deployment, that may impact on the continued growth of the Internet to meet ever expanding levels of demand?

There are a number of potential areas to search for such limitations. These include the capacity of transmission systems, the switching capacity of routers, the continued availability of addresses and the capability of the routing system to produce a stable view of the overall topology of the network.

The structure of the global Internet can be likened to a loose coalition of semi-autonomous constituent networks. Each of these networks operates with its own policies, prices, services and customers. Each network makes independent decisions about where and how to secure supply of the various components that are needed to create the network service. The cement that binds these networks into a cohesive whole is the use of a common address space and a common view of routing. Integrity of routing within each network, or Autonomous System (AS), is maintained through the use of an interior routing protocol (or Interior Gateway Protocol, or IGP). The collection of networks is joined into one large routing domain through the use of an inter-network routing protocol (or Exterior Gateway Protocol, or EGP).
When the scaling properties of the Internet were studied in the early 1990s, two critical factors identified in the study were, not surprisingly, routing and addressing [RFC 1287]. As more devices connect to the Internet they consume addresses, and the associated function of maintaining reachability information for these addresses implies ever larger routing tables. The work in studying the limitations of the 32 bit IPv4 address space produced a number of outcomes, including the specification of IPv6, as well as the refinement of techniques of network address translation (NAT) intended to allow some degree of transparent interaction between two networks using different address realms. Growth in the routing system is not directly addressed by these approaches, as the routing space is the cross product of the complexity of the topology of the network, multiplied by the number of autonomous domains of connectivity policy, multiplied by the base size of a routing table entry.

When a network advertises a block of addresses into the exterior routing space this entry is generally carried across the entire exterior routing domain of the Internet. To measure the characteristics of the global routing table it is necessary to establish a point in the default-free part of the exterior routing domain and examine the BGP routing table that is visible at that point.

Measurements of the size of the routing table were somewhat sporadic to start, and a number of measurements were taken at approximately monthly intervals from 1988 until 1992 by Merit [RFC 1338]. This effort was resumed in 199? by Erik-Jan Bos at Surfnet in the Netherlands, who commenced measuring the size of the BGP table at hourly intervals in 1994. This measurement technique was adopted by the author in 1997, using a measurement point located at the edge of AS 1221 in Australia, again using an hourly interval for the measurement [Huston 2000].
We now have a detailed view of the dynamics of the Internet's routing table growth which spans some 13 years [Figure 1].

[Figure 1 - BGP Table Growth 1988 - 2000]

BGP Table Growth

At a gross level there appear to be four distinct phases of growth visible in this data.

Pre-CIDR Growth

The routing table size from 1988 until April 1994 shows definite characteristics of exponential growth [Fig 2]. Much of this growth can be attributed to the growth in deployment of the historical Class C address space (/24 address prefixes). Unchecked, this growth would have led to saturation of the BGP routing tables in non-default routers within a small number of years. Estimates of the time at which this would have happened vary somewhat, but the overall observation was that the growth rates were exceeding the growth in hardware and software capability of the deployed network.

[Figure 2 - BGP Table growth 1988 - 1994]

CIDR Deployment

The response from the engineering community was the introduction of routing software which dispensed with the requirement for the Class A, B and C address delineation, replacing this scheme with a routing system that carried an address prefix and an associated prefix length. A concerted effort was undertaken in 1994 and 1995 to deploy CIDR routing, based on encouraging deployment of the CIDR-capable version of the BGP protocol, BGP4. The effects of this effort are visible in the routing table [Fig 3]. Interestingly enough, the efforts of the IETF CIDRD Working Group are visible in the table, with downward movements in the size of the routing table following each IETF meeting.

[Figure 3 - BGP Table growth 1994 - 1995]

The intention of CIDR was one of provider address aggregation, where a network provider is allocated an address block from the address registry, and announces this entire block into the exterior routing domain.
Customers of the provider use a sub-allocation from this address block, and these smaller routing elements are aggregated by the provider and not directly passed into the exterior routing domain. During 1994 the size of the routing table remained relatively constant at some 20,000 entries, as the growth in the number of providers announcing address blocks was matched by a corresponding reduction in the number of address announcements as a result of CIDR aggregation.

CIDR Growth

For the next four years, until the start of 1998, CIDR proved remarkably effective in damping unconstrained growth in the BGP routing table. While other metrics of Internet size grew exponentially during this period, the BGP table grew at a linear rate, adding some 10,000 entries per year [Fig 4]. Growth in 1997 and 1998 was even lower than this linear rate. While the reasons behind this are somewhat speculative, it is relevant to note that this period saw intense aggregation within the ISP industry, and in many cases this aggregation was accompanied by large scale renumbering to fit within provider-based aggregated address blocks. Credit for this trend must also be given to Tony Bates, whose weekly reports of the state of the BGP address table, including listings of further potential for route aggregation, provided considerable incentive to many providers to improve their levels of route aggregation [Bates 2000].

[Figure 4 - BGP table growth 1995 - 1998]

A close examination of the table reveals a greater level of stability in the routing system at this time. The short term (hourly) variation in the number of announced routes reduced, both as a percentage of the number of announced routes and in absolute terms. One of the other benefits of using large aggregate address blocks is that an instability at the edge of the network is not immediately propagated into the routing core.
The instability at the last hop is absorbed at the point at which an aggregate route is used in place of a collection of more specific routes. This, coupled with widespread adoption of BGP route flap damping, has been very effective in reducing the short term instability in the routing space. It has been observed that while the absolute size of the BGP routing table is one factor in scaling, another is the processing load imposed by continually updating the routing table in response to individual route withdrawals and announcements. The encouraging picture from this table is that the levels of such dynamic instability in the network have been reduced considerably by a combination of route flap damping and CIDR.

Current Growth

In late 1998 the trend of growth in the BGP table size changed radically, and the growth for the past two years is again showing all the signs of a re-establishment of exponential growth. It appears that CIDR has been unable to keep pace with the levels of growth of the Internet [Fig 5]. Once again the concern is that this level of growth, if sustained, will outstrip the capability of hardware.

[Figure 5 - BGP table growth 1998 - 2000]

Related Measurements derived from BGP Table

The level of analysis of the BGP routing table has been extended in an effort to identify the reasons for this resumption of exponential growth. Current analysis includes measuring the number of AS's in the routing system, the number of distinct AS paths, the range of addresses spanned by the table and the average span of each routing entry.

AS Number Consumption

Each network that is multi-homed within the topology of the Internet and wishes to express a distinct external routing policy must use an AS to associate its advertised addresses with such a policy. In general, each network is associated with a single AS, and the number of AS's in the default-free routing table tracks the number of entities that have unique routing policies.
There are some exceptions to this, including large global transit providers with varying regional policies, where multiple AS's are associated with a single network, but such exceptions are relatively uncommon.

The trend of AS number deployment over the past four years is also exponential [Fig 6]. The growth in the number of AS's can be compared with the growth in the amount of address space spanned by the BGP routing table. At the end of 2000, the span of addresses is growing at an annual rate of 7%, while the number of AS's is growing by 51%. On average, each AS is therefore advertising a smaller address span. This points to increasingly finer levels of routing detail being announced into the global routing domain, a trend which causes some level of concern.

[Figure 6 - AS number deployment]

This is a likely result of an increasingly dense interconnection mesh, where an increasing number of networks are moving from a single-homed connection into multi-homing and peering. The spur for this may well be the declining unit costs of communications bearer services.

If this rate of growth continues, the 16 bit AS number set will be exhausted by mid-2005 [Fig 7]. Work is underway within the IETF to modify the BGP protocol to carry AS numbers in a 32 bit field [Chen 2000]. While the protocol modifications are relatively straightforward, the major responsibility rests with the operations community to devise a transition plan that will allow gradual transition into this larger AS number space.

[Figure 7 - AS number projections]

Average Prefix Length of Advertisements

The intent of CIDR aggregation was to support the use of large aggregate address announcements in the BGP routing table. To check whether this is still the case, the average span of each BGP announcement has been tracked for the past 12 months. The data indicates a decline in the average span of a BGP advertisement from 16,000 individual addresses in November 1999 to 12,100 in December 2000.
[Fig 8] This corresponds to an increase in the average prefix length from /18.03 to /18.44. Separate observations of the average prefix length used to route traffic in operational networks in late 2000 indicate an average length of /18.1 [Lothberg 2000]. Again, this trend is cause for concern, as it implies the increasing spread of traffic over greater numbers of increasingly finer forwarding table entries. This, in turn, has implications for the design of high speed core routers, particularly when extensive use is made of cached forwarding entries within the switching subsystem.

[Figure 8 - Average span of BGP advertisement]

Prefix Length Distribution

In addition to looking at the average prefix length, the analysis of the BGP table also includes an examination of the number of advertisements of each prefix length.

An extensive effort was introduced in the mid-nineties to move away from extensive use of the Class C space and to encourage providers to advertise larger address blocks. This has been reinforced by the address registries, which have used provider allocation blocks of /19 and, more recently, /20. These measures were introduced when there were some 20,000 - 30,000 entries in the BGP table. Some five years later it is interesting to note that of the 96,000 entries in the routing table, some 53,000 entries have a /24 prefix. In absolute terms the /24 prefix set is the fastest growing prefix set in the entire BGP table.

The routing entries of these smaller address blocks also show a much higher level of change on an hourly basis. While a large number of BGP routing points perform route flap damping, there is still a very high level of announcements and withdrawals of entries in this particular area of the routing table when viewed in terms of route updates per prefix length.
Given that the number of these small prefixes is growing rapidly, there is cause for some concern that the total level of BGP flux, in terms of the number of announcements and withdrawals per second, may be increasing, despite the pressures from flap damping. This concern is coupled with the observation that, in terms of BGP stability under scaling pressure, it is not the absolute size of the BGP table which is of prime importance, but the rate of dynamic path recomputations that occur in the wake of announcements and withdrawals. Withdrawals are of particular concern due to the number of transient intermediate states that the BGP distance vector algorithm explores in processing a withdrawal. Current experimental observations indicate a typical convergence time of some 2 minutes to propagate a route withdrawal across the BGP domain [Labowitz]. An increase in the density of the BGP mesh, coupled with an increase in the rate of such dynamic changes, has serious implications for maintaining the overall stability of the BGP system as it continues to grow.

The registry allocation policies have also had some impact on the routing table prefix distribution. The original registry practice was to use a minimum allocation unit of a /19, and the 10,000 prefix entries in the /17 to /19 range are a consequence of this policy decision. More recently the allocation policy has allowed a minimum allocation unit of a /20 prefix; the /20 prefix is used by some 4,000 entries, and in relative terms is one of the fastest growing prefix sets.

The number of entries corresponding to very small address blocks (smaller than a /24), while small as a proportion of the total BGP routing table, is the fastest growing in relative terms. The number of /25 through /32 prefixes in the routing table is growing faster, in terms of percentage change, than any other area of the routing table.
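The per-length counts and the average span figures discussed above can be derived from a snapshot of the table along the following lines. This is a minimal Python sketch, not the tooling used to produce the figures in this article: the function names and the five sample prefixes are hypothetical, and it assumes the table is available as a simple list of IPv4 prefixes in text form. The conversion used is that an average span of S addresses is equivalent to a prefix length of 32 - log2(S).

```python
import math
from collections import Counter
from ipaddress import ip_network


def equivalent_prefix_length(avg_span):
    """Convert an average span (addresses per entry) to an IPv4 prefix length."""
    return 32 - math.log2(avg_span)


def prefix_length_summary(prefixes):
    """Tally entries by prefix length and compute the average span per entry."""
    nets = [ip_network(p) for p in prefixes]
    by_length = Counter(n.prefixlen for n in nets)
    avg_span = sum(n.num_addresses for n in nets) / len(nets)
    return by_length, avg_span, equivalent_prefix_length(avg_span)


if __name__ == "__main__":
    # Hypothetical five-entry table, for illustration only.
    sample = ["10.0.0.0/16", "172.16.0.0/19",
              "192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]
    by_length, avg_span, avg_length = prefix_length_summary(sample)
    for length in sorted(by_length):
        print(f"/{length}: {by_length[length]} entries")
    print(f"average span {avg_span:.1f} addresses, about /{avg_length:.2f}")
```

Applied to the spans quoted earlier, equivalent_prefix_length(16000) and equivalent_prefix_length(12100) round to 18.03 and 18.44 respectively. Note that the average span is an arithmetic mean of block sizes, so a small number of very large blocks dominates the result.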
If prefix length filtering were in widespread use, the practice of announcing a very small address block with a distinct routing policy would have no particular beneficial outcome, as the address block would not be passed throughout the global BGP routing domain and the propagation of the associated policy would be limited in scope. The growth of the number of these small address blocks, and the diversity of AS paths associated with these routing entries, points to a relatively limited use of prefix length filtering in today's Internet. In the absence of any corrective pressure in the form of widespread adoption of prefix length filtering, the very rapid growth of global announcement of very small address blocks is likely to continue.

Aggregation and Holes

With the CIDR routing structure it is possible to advertise a more specific prefix of an existing aggregate. The purpose of this more specific announcement is to punch a 'hole' in the policy of the larger aggregate announcement, creating a different policy for the specifically referenced address prefix.

Another use of this mechanism is not to promulgate a different connectivity policy, but to perform some rudimentary form of load balancing and mutual backup for multi-homed networks. In this model a network may advertise the same aggregate along each connection, but then advertise a set of more specific prefixes on each connection, altering the specific advertisements such that the load on each connection is approximately balanced. The two forms of holes can be readily discerned in the routing table: while the approach of policy differentiation uses an AS path which is different from that of the aggregate advertisement, the load balancing and mutual backup configuration uses the same AS path for both the aggregate and the specific advertisements.
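The distinction drawn above, between more specific advertisements that carry a different AS path from their covering aggregate and those that carry the same AS path, can be sketched as follows. This is an illustrative Python fragment under stated assumptions rather than the analysis code behind this article: the function name, the labels "policy-hole" and "load-balancing", and the sample table (private address blocks and documentation AS numbers) are all hypothetical, and each prefix is assumed to have a single AS path as seen from one vantage point.

```python
from ipaddress import ip_network


def classify_more_specifics(table):
    """Classify each more specific prefix against its closest covering aggregate.

    table maps an IPv4 prefix string to its AS path (a sequence of AS numbers).
    Prefixes with no covering aggregate in the table are omitted from the result.
    """
    nets = {ip_network(prefix): tuple(path) for prefix, path in table.items()}
    result = {}
    for net, path in nets.items():
        # Covering aggregates are strictly shorter prefixes that contain this one.
        covers = [a for a in nets
                  if a.prefixlen < net.prefixlen and net.subnet_of(a)]
        if not covers:
            continue
        closest = max(covers, key=lambda a: a.prefixlen)
        if nets[closest] == path:
            result[str(net)] = "load-balancing"   # same AS path as the aggregate
        else:
            result[str(net)] = "policy-hole"      # different AS path
    return result


if __name__ == "__main__":
    # Hypothetical table, for illustration only.
    sample = {
        "10.1.0.0/16": (64496, 64500),
        "10.1.4.0/24": (64497, 64501),
        "10.1.8.0/22": (64496, 64500),
        "10.2.0.0/16": (64496, 64502),
    }
    for prefix, kind in sorted(classify_more_specifics(sample).items()):
        print(prefix, kind)
```

The pairwise comparison is quadratic in the number of entries, which is acceptable for a sketch; a table of the size discussed here would more sensibly be walked with a prefix tree.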
While it is difficult to determine whether the use of such more specific advertisements was intended to be an exception to a more general rule within the original intent of CIDR deployment, there appears to be very widespread use of this mechanism within the routing table. Some 37,500 advertisements, or 37% of the routing table, are being used to punch policy holes in existing aggregate announcements [Fig 9]. Of these, the majority, some 30,000 routes, use distinct AS paths, so that once more we are seeing a consequence of finer levels of granularity of connection policy in a densely interconnected space.

[Figure 9 - More Specific Advertisements]

While long term data is not available for the relative level of such advertisements as a proportion of the full routing table, the growth level does strongly indicate that policy differentiation at a fine level within existing provider aggregates is a significant driver of overall table growth.

Address Consumption

A decade ago there were two major concerns over scaling of the Internet, and of the two the consumption of address space was considered to be the more immediate and compelling threat to the continued viability of the network to sustain growth.

Within the scope of this exercise it has been possible to track the total span of address space covered by BGP routing advertisements. Over the period from November 1999 until December 2000 the span of address space has grown from 1.02 billion addresses to 1.06 billion. However, there are a number of /8 prefixes which are periodically announced and withdrawn from the BGP table, and if the effects of these prefixes are removed, the final value of addresses spanned by the table is some 1.09 billion addresses. This is an annual growth rate of a little under 7%.
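As an illustration of how such a span figure can be computed, the following Python sketch collapses overlapping and nested prefixes before summing, so that a more specific advertisement inside an aggregate is not counted twice. It is a sketch under assumptions, not the method used for the measurements reported here: the function names and the sample prefixes are hypothetical, and the growth figure is simply the ratio of the two span values quoted above.

```python
from ipaddress import collapse_addresses, ip_network


def spanned_addresses(prefixes):
    """Count the distinct IPv4 addresses covered by a set of prefixes."""
    nets = [ip_network(p) for p in prefixes]
    # collapse_addresses merges adjacent blocks and drops nested ones.
    return sum(n.num_addresses for n in collapse_addresses(nets))


def growth_percent(start, end):
    """Percentage growth from start to end."""
    return 100.0 * (end - start) / start


if __name__ == "__main__":
    # Hypothetical table: the /16 lies inside the /8 and adds nothing to the span.
    sample = ["10.0.0.0/8", "10.1.0.0/16", "192.0.2.0/24"]
    print(spanned_addresses(sample))
    # Span values quoted in the text: 1.02 billion growing to 1.09 billion.
    print(f"{growth_percent(1.02e9, 1.09e9):.1f}%")
```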
[Fig 10] Compared to the 42% growth in the number of routing advertisements, it would appear that much of the growth of the Internet, in terms of the number of connected devices, is occurring behind various forms of NATs. In terms of solving the perceived finite nature of the address space identified just under a decade ago, the Internet appears so far to have embraced the approach of using NATs, irrespective of their various perceived functional shortcomings [RFC 2993]. This also supports the observation of smaller address fragments supporting distinct policies in the BGP table, as such small address blocks can encompass arbitrarily large networks located behind one or more NAT gateways.

[Figure 10 - Total Address Space]

Anomalies

A common space such as the inter-provider domain is not actively managed by any single entity, and it is often the case that various anomalies appear in the routing table from time to time. One notable event occurred in late 1997, when some large prefixes were deconstructed into a massive set of /24 prefixes and this set was inadvertently passed into the inter-provider BGP domain. The BGP table graphs show a sudden upswing in the number of routing table entries from 50,000 entries to some 78,000 entries. It could have been higher, except that a commonly used routing hardware platform at the time ran into table memory exhaustion at that number of table entries, and further promulgation of additional routing entries ceased.

A number of other anomalies also exist in the table, including the presence of a /31 prefix and several hundred /32 prefixes. While many of these anomalies can be attributed to configuration errors of various forms, the underlying observation is that there are no universally used strong filters on what can be broadcast into the BGP routing space.
Considering the distributed nature of this table and the critical role that it plays in supporting the global Internet, this can be considered a significant current vulnerability. One potential response is to place stronger emphasis on authentication mechanisms that can be used as a precondition to accepting BGP advertisements, intended to create a greater level of resiliency in the face of inadvertent, and also potentially deliberate, actions that affect the integrity of this routing table.

Conclusions

There are strong parallels between the BGP routing space and the condition commonly referred to as the tragedy of the commons. The BGP routing space is simultaneously everyone's problem, as it impacts the stability and viability of the entire Internet, and no one's problem, in that no single entity can be considered to manage this common resource. In other common resource domains, once the value of the resource is placed under threat due to damaging exploitative practices, the most typical form of corrective action is the imposition of a consistent set of policies and practices intended to achieve a particular outcome. The vehicle for such an imposition of policies and practices is most commonly that of regulatory fiat. In a globally distributed space such as the BGP table it is a challenging task to identify the source and authority of such potential regulatory activity.

Multi-Homed Small Networks

It would appear that one of the major drivers of the recent growth of the BGP table is that of small networks multi-homing with a number of peers and a number of upstream providers. In the appropriate environment, where there are a number of networks in relatively close proximity, using peer relationships can reduce total connectivity costs compared to using a single upstream service provider. Equally significantly, multi-homing with a number of upstream providers is seen as a means of improving the overall availability of the service.
In essence, multi-homing is seen as an acceptable substitute for upstream service resiliency. This has a potential side-effect: when multi-homing is seen as a preferable substitute for upstream provider resiliency, the upstream provider cannot command a price premium for providing resiliency as an attribute of the provided service, and therefore has little incentive to spend the additional money required to engineer resiliency into the network. The actions of the network's multi-homed clients then become self-fulfilling. One way to characterize this behavior is that service resiliency in the Internet is becoming the responsibility of the customer, not the service provider.

In such an environment resiliency still exists, but rather than being a function of the bearer or switching subsystem, resiliency is provided through the function of the BGP routing system. The question is not whether this is feasible or desirable in the individual case, but whether the BGP routing system can scale adequately to continue to undertake this role.

A Denser Interconnectivity Mesh

The decreasing unit cost of communications bearers in many parts of the Internet is creating a rapidly expanding market in exchange points and other forms of inter-provider peering. The deployment model of a single-homed network with a single upstream provider is rapidly being supplanted by a model of extensive interconnection at the edges of the Internet. The deployment model assumed by CIDR was a different structure, more akin to a strict hierarchy of supply providers. The business imperatives driving this denser mesh of interconnection in the Internet are irresistible, and the casualty in this case is the CIDR-induced dampened growth of the BGP routing table.
Traffic Engineering via Routing

Further driving this growth in the routing table is the use of selective advertisement of smaller prefixes along different paths in an effort to undertake traffic engineering within a multi-homed environment. While there is considerable effort being undertaken to develop traffic engineering tools within a single network using MPLS as the base flow management tool, inter-provider tools to achieve similar outcomes are considerably more complex when using such switching techniques. At this stage the only tool being used for inter-provider traffic engineering is that of the BGP routing table, further exacerbating the growth and stability pressures being placed on the BGP routing domain.

The effects of CIDR on the growth of the BGP table have been outstanding, not only because of their initial impact in turning exponential growth into a linear growth trend, but also because CIDR was effective for far longer than could have been reasonably expected. The current growth factors at play in the BGP table are not easily susceptible to another round of CIDR deployment pressure within the operator community. It may well be time to consider how to manage a BGP routing table which has millions of small entries, rather than the expectation of tens of thousands of larger entries.

We started this journey over ten years ago when considering the scaling properties of addressing and routing. It is perhaps fitting that we tie the two concepts back together again as we consider the future of the BGP inter-provider routing space. The observation that the BGP growth pressures are largely due to an uptake in multi-homing and the associated advertisement of discrete connectivity policies by increasingly smaller networks at the edge of the network has a corollary for address allocation policy.
In such a ubiquitous environment of multi-homed networks we will also need to review how address blocks are allocated to network providers, as the concept of provider-based address allocation, which assumes a relatively strict hierarchical supply structure, is becoming less and less relevant in today's Internet.

[RFC 1287] "Towards the Future Internet Architecture", D. Clark, L. Chapin, V. Cerf, R. Braden, R. Hobby, RFC 1287, December 1991.

[RFC 1338] "Supernetting: an Address Assignment and Aggregation Strategy", V. Fuller, T. Li, J. Yu, K. Varadhan, RFC 1338, June 1992.

[RFC 2993] "Architectural Implications of NAT", T. Hain, RFC 2993, November 2000.

[Bates 2000] "The CIDR Report", T. Bates, updated weekly at http://www.employees.org/~tbates/cidr-report.html

[Chen 2000] "BGP Support for four-octet AS number space", E. Chen, Y. Rekhter, work in progress (currently published as an Internet Draft: draft-chen-as4bytes-00.txt), November 2000.

[Huston 2000] "BGP Table Report", updated hourly at http://www.telstra.net/ops/bgp

[Labowitz] bgp convergence

[Lothberg 2000] Peter Lothberg, personal communication.