Network Working Group                                          E. Cooper
Internet-Draft                                               P. Matthews
Expires: August 28, 2006                                           Avaya
                                                       February 24, 2006

The Effect of NATs on P2P SIP Overlay Architecture
draft-matthews-p2psip-nats-and-overlays-00

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on August 28, 2006.

Copyright Notice

Copyright (C) The Internet Society (2006).

Abstract

This document discusses the constraints that NATs put on the possible overlay architectures of a P2P SIP system. Given what seems to be a reasonable set of assumptions on where nodes are deployed and the kinds of NATs they are located behind, the document concludes that a structured partial-mesh overlay network exhibiting a property known as "symmetric interest" is the most reasonable overlay architecture.

Cooper & Matthews Expires August 28, 2006 [Page 1]
Internet-Draft NATs and Overlays February 2006

1. Introduction

In general terms, P2P overlays attempt to eliminate a central bottleneck in a system by taking the data traditionally stored on a server (or set of servers) and dispersing it amongst a number of peers.
Also in general terms, NAT boxes multiplex many "private" IP addresses onto a single "public" address. As a result of this multiplexing function, a NAT which receives an unsolicited message on its "public" address cannot determine which "private" address should receive it. Such messages are generally discarded. In client-server network topologies this is not a problem, since servers are usually given "public" addresses and clients never receive unsolicited messages. In P2P networks, however, peers that cannot receive unsolicited messages cannot participate in the overlay. It follows, then, that the presence of NATs in the network topology has a major influence on the overlay architecture.

Comments on this draft are solicited and should be addressed either to the authors or to the P2P-SIP mailing list at p2p-sip@cs.columbia.edu (see https://lists.cs.columbia.edu/cucslists/listinfo/p2p-sip).

2. Scenario

Figure 1 shows a set of peers that want to create a P2P SIP overlay network. Though this set is rather small, it still illustrates some key points.

[ASCII diagram lost in extraction: subnets of peers (P) arranged around an Internet cloud, several of them reaching it through NAT boxes. Legend: P - Peer node; NAT - NAT box.]

Figure 1: Example Scenario

In this figure we see six clouds. Five represent subnets containing peers and one represents the Internet. Some of the subnets contain just a single peer while others contain multiple peers. One of the subnets uses public IP addresses, while the other subnets have NATs between them and the Internet and thus use private addresses. Two of the subnets are sitting behind the same NAT.
Not illustrated in this figure are more complex NAT scenarios -- for example, a cascading NAT scenario where there are two NATs between a subnet and the Internet.

This document discusses overlay architectures for hooking these peers together into a P2P SIP system.

3. Assumptions

This section presents and discusses our assumptions about the P2P SIP system, about the NATs the system must traverse, and about the interaction between the system and these NATs.

The first assumption deals with the size range of P2P SIP systems. We assume that there can be many different P2P SIP systems, ranging from very small systems to very large systems, and nodes can be scattered anywhere around the world. This assumption is not directly related to NATs, but influences the other assumptions.

Assumption 1: There may be many different P2P SIP systems, with sizes ranging from two nodes to millions of nodes, with the nodes scattered across one to millions of subnets.

The next assumption deals with the question of whether a P2P SIP system will always have a certain proportion of nodes with public IP addresses. This question is important because nodes with public IP addresses make things easier: if there is a large proportion of them, then nodes behind NATs can be treated as leaf nodes that hang off nodes with public IP addresses. Most P2P systems (e.g., file sharing systems) assume a certain proportion of nodes with public IP addresses. However, this assumption seems less tenable for P2P SIP systems, especially where the system is used in an enterprise and/or is primarily composed of hard phones (rather than general-purpose computers). Thus we make the following assumption:

Assumption 2: There can be P2P SIP systems where every peer has a NAT box between it and the open Internet.

In corporate environments, we expect this situation to be common.
The next set of assumptions deals with the behavior of the various NATs. At this point, readers should be familiar with references [1], [2], and [3]. In this document, we use the terminology of the IETF BEHAVE working group. The two key behaviors of a NAT are its mapping behavior and its filtering behavior.

Consider the various possible mapping behaviors first (cf. section 4.1 of [1]). If a NAT has a behavior other than "Endpoint Independent mapping", then peers behind the NAT cannot use "UDP hole-punching" (see [3]), because the public address and port that such a NAT assigns toward one remote endpoint (for example, a server) differ from those it assigns toward another peer. The only way to support these peers is by treating them as leaf nodes hanging off a "relay peer". This relay peer must either have a public IP address or be located behind a NAT with a filtering behavior of "Endpoint Independent filtering". Since (a) acting as a relay is very bandwidth- and processor-intensive (which some peers may not be able to handle), and (b) a given P2P network may not have a node with the address properties required to act as a relay, many P2P SIP networks may not be able to support peers behind NATs which do not provide "Endpoint Independent mapping". For these reasons, we limit our architectural investigation to NATs with "Endpoint Independent mapping". (A later version of this document may describe the necessary extensions to support NATs that do not satisfy this assumption.)

Assumption 3: All NATs must have a mapping behavior of "Endpoint Independent mapping".

Note that various investigations (see, for example, sections 6.2 and 6.4 of [3]) have suggested that about 85% of all NATs have a mapping behavior of "Endpoint Independent mapping".

Now consider the various possible filtering behaviors (cf. section 4.1 of [1]). It is easier to create a P2P network with nodes behind NATs that have a filtering behavior of "Endpoint Independent filtering" than with nodes behind NATs with other filtering behaviors.
However, other filtering behaviors are seen as more secure, and these other behaviors are more common, especially in corporate NATs. So can we assume that the NATs in the P2P system have a variety of filtering behaviors, and that at least a significant percentage of them have the more P2P-friendly "Endpoint Independent filtering" behavior? Unfortunately, this seems overly optimistic. It may be true in larger systems with a significant number of residential peers, but in smaller deployments and/or deployments with a large number of enterprise-based peers, it seems unlikely. Especially for P2P SIP systems deployed in enterprise environments, many systems will likely reside exclusively behind NATs with a filtering behavior of "Address Dependent filtering" (or worse). So it seems best to be very conservative in this regard and assume the worst possible filtering behavior.

Assumption 4: The P2P SIP system must function when all peers are located behind NATs with a filtering behavior of "Address and Port Dependent filtering".

An architecture that works in this situation will also work where some NATs have a less-restrictive filtering behavior.

The BEHAVE group has specified a number of other NAT UDP requirements [1]. The appendix discusses our assumptions relative to that document in detail. For now, there is no similar table for TCP, since the work on TCP in the BEHAVE working group has just started. However, many of the requirements for UDP apply to TCP as well.

In addition to the BEHAVE approach, there are some other approaches to NAT traversal that warrant discussion: UPnP, ALGs, SBCs, and manual configuration.

Universal Plug and Play (UPnP) is an approach developed by Microsoft. In this approach, the P2P application talks directly with the NAT and asks the NAT to open up pinholes for it.
Many consumer-grade NATs support the UPnP protocol, and this approach is a viable option for P2P applications targeted only at the consumer market. However, most corporate-grade NATs do not support UPnP. In addition, ISPs that NAT their entire network (a practice that is becoming more common in certain environments) typically do not allow their customers to configure that NAT using UPnP.

Many NATs contain one or more Application Level Gateways (ALGs). An ALG is special code within the NAT that recognizes packets of a particular application-level protocol and treats the packets specially. ALG support for the File Transfer Protocol (FTP) is almost universal in NATs, and ALG support for SIP is becoming more common. However, an ALG only works if the application protocol is not encrypted, and encryption of both SIP and P2P messages is likely to be desirable for security reasons. Also, ALG support for whatever P2P protocol we pick is very unlikely, at least in the short term.

Assumption 5: The traversal of a given NAT must not depend on that NAT supporting either UPnP or any ALG (except for FTP).

Session Border Controllers (SBCs) are boxes that are deployed in the network, sometimes by the customer but more commonly by the SIP service provider, to enable NAT traversal for standard client-server SIP. SBCs are becoming more common, but are typically restricted to working only with the SIP proxy servers of the SIP service provider that deploys the SBC. Furthermore, they are unlikely to support whatever P2P protocol we pick. Thus they are not a NAT traversal option for P2P SIP networks.

Assumption 6: The P2P NAT traversal strategy must not depend on the presence of SBCs in the network.

NAT traversal is often much easier if the user can manually configure the NAT. The user can open up pinholes in the NAT and/or modify the NAT's behavior.
However, this requires that the user have the knowledge and interest to do the configuration (non-technical users often do not), have a NAT which is configurable (some low-end NATs are not), and have permission to configure the NAT (problematic in corporate environments or when the ISP NATs the entire access network). Furthermore, history has shown that systems which are "plug-and-play" tend to get much better acceptance by users. We would like users to be able to deploy P2P SIP peers without even knowing what a NAT is. Though we may not be "plug-and-play" in all cases, our NAT traversal strategy will be a failure if it is not "plug-and-play" in the vast majority of cases.

Assumption 7: The NAT traversal strategy must be "plug-and-play" in the vast majority of cases.

Finally, there is the question of how many mapping and filtering entries ("pinholes") a NAT can support. Low-end NAT boxes found in homes and small enterprises may support only a very small number of mapping and filtering entries. NAT boxes deployed in larger enterprise environments usually support more entries since there are more devices (computers, IP phones, etc.) behind them. However, a general rule seems to be that NAT vendors expect a given node to use only a few entries at a time. The exact number is not known to the authors at this time, but it is clearly small. Thus a NAT traversal strategy in which one or more peers open up a large number of pinholes to communicate with other peers is not acceptable, partly because it uses up what may be a very limited resource, and partly because of the refresh traffic required (especially if UDP is used).

Assumption 8: The NAT traversal strategy must limit the number of mapping and filtering entries opened up on a given NAT box to a fairly small number (exact value is TBD).
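Assumptions 3, 4, and 8 can be made more concrete with a toy model. The sketch below is a minimal, hypothetical Python model, not taken from [1] or [3]: the class `Nat`, the function `hole_punch`, and all addresses are illustrative assumptions. It models a NAT with a configurable mapping and filtering behavior (named after the BEHAVE terms) and shows that an unsolicited inbound packet is dropped, that UDP hole-punching succeeds between two NATs with "Endpoint Independent mapping" even under "Address and Port Dependent filtering", and that it fails when one NAT uses address-and-port-dependent mapping. Entry timeouts, TCP, and hairpinning are not modeled, and the exchange of mapped addresses through the overlay is assumed rather than simulated.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


@dataclass
class Nat:
    """Toy NAT (hypothetical model); entries never expire here.

    mapping:   "endpoint-independent" or "address-and-port-dependent"
    filtering: "endpoint-independent", "address-dependent" or
               "address-and-port-dependent"
    """
    public_ip: str
    mapping: str = "endpoint-independent"
    filtering: str = "address-and-port-dependent"
    next_port: int = 40000
    mappings: Dict[object, int] = field(default_factory=dict)
    # public port -> (private endpoint, remote endpoints it has sent to)
    bindings: Dict[int, Tuple[Endpoint, Set[Endpoint]]] = field(default_factory=dict)

    def outbound(self, src: Endpoint, dst: Endpoint) -> Endpoint:
        """Translate an outgoing packet and return its public (mapped) source."""
        key = src if self.mapping == "endpoint-independent" else (src, dst)
        if key not in self.mappings:
            self.mappings[key] = self.next_port
            self.bindings[self.next_port] = (src, set())
            self.next_port += 1
        port = self.mappings[key]
        self.bindings[port][1].add(dst)  # opens the filtering entry for dst
        return (self.public_ip, port)

    def inbound(self, src: Endpoint, public_port: int) -> Optional[Endpoint]:
        """Return the private destination of an incoming packet, or None if dropped."""
        if public_port not in self.bindings:
            return None  # unsolicited: the NAT cannot pick a private host
        private, contacted = self.bindings[public_port]
        if self.filtering == "endpoint-independent":
            allowed = True
        elif self.filtering == "address-dependent":
            allowed = any(ip == src[0] for ip, _ in contacted)
        else:  # address-and-port-dependent
            allowed = src in contacted
        return private if allowed else None

    def pinholes(self) -> int:
        """Number of mapping entries currently held (cf. Assumption 8)."""
        return len(self.bindings)


def hole_punch(nat_a: Nat, a: Endpoint, nat_b: Nat, b: Endpoint,
               server: Endpoint = ("198.51.100.1", 3478)) -> bool:
    """UDP hole-punching between peer a (behind nat_a) and peer b (behind nat_b)."""
    # Each peer learns its mapped address from a public, STUN-like server.
    a_mapped = nat_a.outbound(a, server)
    b_mapped = nat_b.outbound(b, server)
    # A's first packet is dropped by B's NAT but opens A's own filter for B.
    nat_b.inbound(nat_a.outbound(a, b_mapped), b_mapped[1])
    # B replies; this passes A's NAT only if B's NAT reused b_mapped (EIM).
    b_to_a = nat_a.inbound(nat_b.outbound(b, a_mapped), a_mapped[1])
    # A retries; this passes B's NAT only if B's reply opened a filter for A.
    a_to_b = nat_b.inbound(nat_a.outbound(a, b_mapped), b_mapped[1])
    return b_to_a == a and a_to_b == b


if __name__ == "__main__":
    print(Nat("192.0.2.1").inbound(("203.0.113.9", 5060), 40000))  # None
    p, q = ("10.0.0.2", 5060), ("10.1.0.2", 5060)
    print(hole_punch(Nat("192.0.2.1"), p, Nat("192.0.2.2"), q))  # True
    print(hole_punch(Nat("192.0.2.1"), p,
                     Nat("192.0.2.2", mapping="address-and-port-dependent"), q))  # False
```

In this model, hole-punching succeeds only because each NAT reuses the same public port for all destinations, which is exactly what "Endpoint Independent mapping" guarantees; with endpoint-independent mapping, `pinholes()` also stays at one entry per private endpoint no matter how many peers it contacts.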
Here is a summary of the assumptions listed above:

Assumption 1: There may be many different P2P SIP systems, with sizes ranging from two nodes to millions of nodes, with the nodes scattered across one to millions of subnets.

Assumption 2: There can be P2P SIP systems where every peer has a NAT box between it and the open Internet.

Assumption 3: All NATs must have a mapping behavior of "Endpoint Independent mapping".

Assumption 4: The P2P SIP system must function when all peers are located behind NATs with a filtering behavior of "Address and Port Dependent filtering".

Assumption 5: The traversal of a given NAT must not depend on that NAT supporting either UPnP or any ALG (except for FTP).

Assumption 6: The P2P NAT traversal strategy must not depend on the presence of SBCs in the network.

Assumption 7: The NAT traversal strategy must be "plug-and-play" in the vast majority of cases.

Assumption 8: The NAT traversal strategy must limit the number of mapping and filtering entries opened up on a given NAT box to a fairly small number (i.e., 10s of pinholes, not 100s of pinholes).

4. Architectural Options

This section discusses various architectural options in light of the above assumptions. The goal of this section is to do a fairly complete exploration of the design space, and to discuss the pros and cons of the various approaches.

First of all, it is important to note the distinction between NAT traversal for signaling messages and NAT traversal for media messages. The latter problem (media) is solved in a peer-to-peer fashion using the ICE mechanism [5]. If two peers can exchange signaling messages in some way (perhaps indirectly through other peers), then ICE can be used to set up a direct peer-to-peer connection through intervening NATs for the exchange of media messages.
Furthermore, the ICE mechanism is consistent with the assumptions listed above. Thus the problem we need to solve reduces to finding a way for peers to exchange signaling messages.

4.1. Types of Networks

So let's consider an overlay network of peers where all peers are behind NATs with the most restrictive filtering policy, and consider ways for the peers to exchange signaling messages. Several different approaches can be used to accomplish this:

Relay -- All peers exchange SIP messages via a centralized "Relay Server" (with a public IP address). This scheme minimizes the load on the peers and their associated NATs but requires a central server. SIP messages flow relatively quickly between the peers, provided the central server is always available and not constrained by processing power or network bandwidth.

Rendezvous -- Peers use a "Rendezvous Server" (with a public IP address) as an intermediary to initiate "NAT hole-punching" ([3]) every time they wish to begin communicating. Once NAT pinholes have been established, SIP messages are exchanged directly. This scheme is still highly dependent on a central server, but reduces the load on it somewhat. Initial SIP messages are slightly delayed by the retrieval of SIP addresses from the "Rendezvous Server" and by the "NAT hole-punching" technique. The "Rendezvous Server" must maintain knowledge of and links to every active peer.

Mesh -- Once connected into the peer network, nodes exchange messages with selected other peers periodically to keep NAT pinholes open. SIP messages are either sent directly to the destination peer, or are sent indirectly via intermediate peers. No central server is required. The load on the peers and their local NATs is proportional to the number of NAT pinholes that must be maintained and the number of messages that must be sent within the mesh.
(Methods for a peer to create or join such a peer-to-peer network are discussed in Section 5.2.)

Graphically, the communication flows in these networks would appear as shown in Figure 2. In the diagram, only signaling connections are shown; media (RTP) connections are not shown.

[ASCII diagram lost in extraction: three networks of peers labelled Relay, Rendezvous, and Mesh. Legend: P - Peers; S - Central Server; / \ | - Permanent connections; . - Temporary connections.]

Figure 2: Overlay Network Connectivity

The networks in the figure above can be considered as discrete points in a spectrum that ranges from "fully centralized" on the left to "fully distributed" on the right. In general, the effort required to establish and maintain NAT pinholes increases as we move to the right, as does the amount of effort required to deliver a SIP message between two arbitrary nodes. However, as we move to the right, the reliance on centralized equipment decreases, overall scalability increases, and the network becomes more peer-to-peer. Further discussion of each topology is given below.

The Relay Network appears similar to a client-server configuration. It operates in a straightforward manner. A peer that wishes to call another creates a request and delivers it to the "Relay Server". The server forwards the request on to the target. The performance and scalability characteristics of this network are quite suitable for small- and medium-scale deployments. As the system grows into large-scale deployments, however, keeping the NAT pinholes open between the clients and the server places a heavy load on the server's resources. This load increases (at least) linearly with the size of the network.
Even on a smaller scale, the "Relay Server" requires a sizable expenditure of resources (both initial and operational). For very small systems, this cost may be impractical. From a network availability standpoint, the "Relay Server" is also a liability: it represents a single point of failure upon which all nodes are totally dependent. Finally, the centralization of the administration of the network may be undesirable or impractical in some deployments.

The Rendezvous Network reduces the load on a central server by eliminating it from the messaging path once communication between the two endpoints has been established. One way this could work would be to have the originating node send the "Rendezvous Server" an 'INITIATE_NAT_HOLE' request that specifies the target peer (perhaps via node-id or SIP URI), as well as its own IP address(es). In processing this request, the "Rendezvous Server" replies with the mapped IP address and port of the target peer and forwards the request to the target peer, perhaps also appending the mapped IP address and port of the originating peer. Upon reception of the 'INITIATE_NAT_HOLE' request, the target peer begins NAT hole-punching procedures to establish a link to the originator. This effort may include an ICE-like trial of various IP addresses, to avoid the problems associated with double-NAT topologies. Once the NAT pinholes are established, the two peers can begin regular SIP communications. Overall load on the "Rendezvous Server" is somewhat reduced, since it is party to only a portion of the session signaling. These savings may not be substantial, though, since the reduced SIP message traffic must be replaced by keep-alive traffic to keep the NAT pinholes open. The availability and administration characteristics are the same as with the Relay Network.

The Mesh Network eliminates the use of a centralized server (except perhaps for bootstrapping; see Section 5.2).
A node in this type of overlay establishes connections to some of the other peers. SIP messages are then routed via these connections.

4.2. More on Mesh Networks

Of the topologies described above, the Mesh Network is the most peer-to-peer, the most scalable, and the most plug-and-play. Thus it seems to line up best with our assumptions. However, even within the general Mesh paradigm, several variations are possible. The actual number of NAT pinhole connections is a key consideration. Consider Figure 3:

[ASCII diagram lost in extraction: three networks of peers labelled Ring, Partial Mesh, and Full Mesh.]

Figure 3: Mesh Network Connectivity

A Mesh Network in which every node is connected only to two neighbors can be termed a "Ring Network". This topology expends very little effort to maintain NAT pinholes but results in extremely high hop counts as the number of nodes increases. As a result, the overall scalability of this topology is very poor.

On the other hand, in small peer-to-peer overlay networks it is possible to maintain NAT pinhole connections between all pairs of peers (a "Full Mesh Network"). However, as the number of peers and distinct NATs increases, the number of pinholes (and the traffic required to maintain them) quickly becomes impractical. In this topology, overall scalability is also poor.

In between these two extremes, the "Partial Mesh Network" seeks to strike a balance between the minimum and maximum sustainable numbers of NAT pinholes. This seems to be the only viable approach. The "ideal" number of pinholes is the one that results in the lowest hop counts while also keeping pinhole maintenance traffic manageable.

4.3. Static vs. Dynamic Connections

Given the selection of a partial-mesh network, the next question is whether the connection topology should be relatively static, or should evolve dynamically as calls are made. Note that we are talking about signaling connections here -- as with classical client-server SIP, the volume of media messages means that it always makes sense to set up a dedicated connection between the call endpoints for the media whenever that is possible.

Say peer P wants to set up a connection to peer Q. In keeping with Assumption 4, we assume peer Q is behind a NAT with a restrictive filtering behavior. Thus P cannot send a connection request directly to Q, but must send it via existing connections in the overlay. Only once the connection request is delivered to Q can P and Q use UDP (or TCP) hole-punching to initiate a connection, and then do any connection handshaking required (e.g., for TCP). So setting up a connection requires a number of messages to be exchanged between P and Q. If P and Q just need to exchange a very small number of messages, then it is probably more efficient for P and Q to use the existing mesh of connections rather than establishing a new connection.

Though it is not the goal of this document to discuss lookup and signaling mechanisms for P2P SIP, it seems likely that most transactions between two peers will be short and consist of only a small number of messages. Thus a static connection pattern (perhaps with some additional connections established dynamically) is likely to be appropriate.

4.4. Message Routing and Structured vs. Unstructured Meshes

Assuming a fairly static pattern of connections, the next logical question is: What should the pattern of connections be? There are many different patterns or schemes that can be used -- how can we classify and evaluate these choices?
We believe that an important property of an overlay is the ability to route messages from one peer to an arbitrary second peer in the overlay. We believe that this property is at times essential: to allow a peer to place a call to another node, to publish the status of a peer or user (for example, to a peer acting as a distributed registrar), or when a peer wants to create a connection to another peer in the overlay (when creating the partial mesh). With this in mind, we can classify connection patterns (or schemes) into two main groups:

Structured -- In a structured scheme, the connection pattern between peers is exploited when routing messages between peers.

Unstructured -- In an unstructured scheme, the connection pattern is more or less random, and properties of the connection scheme are NOT exploited when routing messages.

In the next few subsections, we consider the various properties of structured and unstructured partial meshes.

4.4.1. Unstructured Schemes

Some examples of unstructured schemes are:

o Purely Random -- a peer randomly selects a number of other peers to connect to.

o Longest Lived -- a peer prefers connections to peers who have been part of the overlay for a longer time.

o Nearby Neighbors -- a peer prefers connections to peers who are closer (e.g., smaller round-trip times).

There are a number of ways messages might be routed in an unstructured scheme. The simplest way is to flood the message through the overlay. Though not particularly efficient, this way may be practical in smaller overlays or when the volume of messages is low. Another way is to use a graph searching algorithm to locate the message target, for example depth-first search or breadth-first search. A graph search algorithm will generally take longer than flooding to get the message to the peer, but may use fewer messages.
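The trade-off between flooding and graph search can be illustrated on a toy overlay. The following is a hypothetical sketch (the functions `flood` and `dfs_walk` and the six-peer example overlay are illustrative, not part of any P2P SIP protocol). It assumes that, when flooding, every peer forwards the first copy of a message to all its neighbors except the one it came from and discards duplicates, with no TTL; the depth-first walk moves a single query one peer at a time, so its delay and its message count are the same number.

```python
from typing import Dict, List, Optional, Tuple

Overlay = Dict[int, List[int]]  # peer -> peers it holds connections to


def flood(overlay: Overlay, source: int, target: int) -> Tuple[Optional[int], int]:
    """Flood from source; return (round in which target is first reached, total messages)."""
    received = {source}
    frontier = [(source, None)]  # (peer, peer it received the message from)
    messages, rounds = 0, 0
    reached = 0 if source == target else None
    while frontier:
        rounds += 1
        next_frontier = []
        for peer, came_from in frontier:
            for nbr in overlay[peer]:
                if nbr == came_from:
                    continue
                messages += 1
                if nbr not in received:  # duplicates are counted but not forwarded
                    received.add(nbr)
                    next_frontier.append((nbr, peer))
                    if nbr == target and reached is None:
                        reached = rounds
        frontier = next_frontier
    return reached, messages


def dfs_walk(overlay: Overlay, source: int, target: int) -> Tuple[Optional[int], int]:
    """Depth-first walk by a single query; return (hops until target, messages).

    The walk is sequential, so hops and messages are equal; stepping back
    to the previous peer when a branch is exhausted costs one message.
    """
    visited = {source}
    stack = [source]
    hops = 0
    while stack:
        peer = stack[-1]
        if peer == target:
            return hops, hops
        nxt = next((n for n in overlay[peer] if n not in visited), None)
        if nxt is None:
            stack.pop()
            if stack:
                hops += 1  # step back to the previous peer
        else:
            visited.add(nxt)
            stack.append(nxt)
            hops += 1
    return None, hops


if __name__ == "__main__":
    # Six peers in a ring (0-1-2-3-4-5-0) plus one extra connection 0-3.
    overlay = {0: [1, 5, 3], 1: [0, 2], 2: [1, 3], 3: [2, 4, 0], 4: [3, 5], 5: [4, 0]}
    print(flood(overlay, 0, 4))     # (2, 9)
    print(dfs_walk(overlay, 0, 4))  # (4, 4)
```

In this small example, flooding reaches peer 4 after two rounds but sends nine messages, while the depth-first walk needs four sequential hops and only four messages, which matches the trade-off described above.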
Remembering a route, once found, and then using source routing for subsequent messages can be used with either of these two methods to improve performance, but suffers from the problem that topology changes (caused, for example, by a peer leaving the overlay) can invalidate the route unexpectedly.

Another approach is to run a routing protocol, which is the approach used in the Internet. In this case, each peer acts as both a host and a router. Let's consider the impact of choosing one of the standard IETF routing protocols.

o RIP -- RIP is an example of a Distance Vector (DV) protocol. Distance Vector protocols require only small amounts of CPU and memory, and work well in networks with only a small number of loops, but tend to perform poorly in networks with lots of loops. Since the number of loops in a partially meshed network increases rapidly as the number of connections per peer increases, DV protocols are likely to be a poor choice.

o BGP -- BGP is an example of a Path Vector protocol. Path Vector protocols perform better than DV protocols in networks with lots of loops, but require significantly more storage and bandwidth, and can (at least in the case of BGP) converge slowly.

o OSPF, IS-IS -- OSPF and IS-IS are Link State protocols. Link State protocols perform very well in meshed networks, but are not considered suitable for networks larger than hundreds of routers.

As can be seen, no single IETF protocol works well in meshed networks of the scale we are interested in. The Internet solves this problem by dividing the network up into regions (Autonomous Systems or ASes), each AS containing up to a few hundred routers, then running both a link state protocol (either OSPF or IS-IS) and a version of BGP called iBGP inside each AS, and running another version of BGP called eBGP between ASes.
However, all this requires considerable configuration and monitoring on the part of an army of operations staff. All this suggests that unstructured schemes may not be a good choice for P2P SIP.

4.4.2. Structured Schemes

The idea of a structured scheme is to create a connection pattern that can be exploited in routing. Consider, for example, the following connection scheme based on a few of the ideas of Chord. As in Chord, some unique peer identifier is hashed and the result used to place peers on a ring. Each peer then maintains connections to peers located at various locations going clockwise around the ring. In this scheme, a message to peer Q can be addressed to Q's location in the ring, and an intermediate peer R can forward the message to the peer S in R's connection table that is closest to Q without overshooting Q. If the NAT can support 160 different connections per peer, then the targets of the connections radiating out from each peer can be located at exponentially increasing distances from that peer. This allows a peer to reach any other peer in O(log N) hops using this scheme. However, if 160 different connections per peer proves excessive (see Assumption 8), then hop counts may be larger.

Many other structured connection schemes exist. For example, structured connection schemes can be created using the ideas contained in any one of a number of DHT schemes. (See, however, the comments in Section 6.)

4.4.3. Symmetric Interest

When evaluating connection schemes, there is a property we have dubbed "symmetric interest". A connection scheme exhibits "symmetric interest" if, when peer P desires a connection to peer Q, then peer Q also desires a connection to peer P. "Symmetric interest" seems a desirable property of connection schemes since connections through NATs are by their nature bi-directional, and because both peers incur the overhead of sending keep-alives to establish and maintain the connection.
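The Chord-like scheme of Section 4.4.2 and the definition of symmetric interest can be sketched as follows. This is a hypothetical illustration, not Chord itself and not a proposed P2P SIP protocol: it assumes a small ring of 2**6 positions (rather than the 160-bit space implied by the 160-connection example), places peers with SHA-1 truncated to that ring, gives each peer connections to the first peer at or after each clockwise distance 1, 2, 4, and so on, and has `route` forward greedily to the connection closest to the target without overshooting it. The helper `has_symmetric_interest` checks the property just defined; the example uses explicit ring positions so that the output is reproducible.

```python
import hashlib
from typing import Dict, Iterable, List, Set

BITS = 6          # assumed small ring for illustration: 2**6 = 64 positions
RING = 2 ** BITS


def ring_position(peer_name: str) -> int:
    """Place a peer on the ring by hashing a unique identifier (SHA-1, truncated)."""
    return int.from_bytes(hashlib.sha1(peer_name.encode()).digest(), "big") % RING


def cw(a: int, b: int) -> int:
    """Clockwise distance from position a to position b."""
    return (b - a) % RING


def successor(peers: Iterable[int], point: int) -> int:
    """First peer found at or clockwise after point."""
    return min(peers, key=lambda p: cw(point, p))


def connection_table(peers: Set[int], p: int) -> Set[int]:
    """Connections at exponentially increasing clockwise distances 1, 2, 4, ..."""
    return {successor(peers, (p + 2 ** i) % RING) for i in range(BITS)} - {p}


def route(peers: Set[int], source: int, target: int) -> List[int]:
    """Greedy forwarding: each hop goes to the connection closest to the target
    without overshooting it. Assumes target is a peer and len(peers) >= 2."""
    path = [source]
    node = source
    while node != target:
        table = connection_table(peers, node)
        # The immediate successor is always a candidate, so this never empties.
        candidates = [q for q in table if cw(node, q) <= cw(node, target)]
        node = max(candidates, key=lambda q: cw(node, q))
        path.append(node)
    return path


def has_symmetric_interest(peers: Set[int]) -> bool:
    """True if every desired connection P -> Q is also desired as Q -> P."""
    tables: Dict[int, Set[int]] = {p: connection_table(peers, p) for p in peers}
    return all(p in tables[q] for p in peers for q in tables[p])


if __name__ == "__main__":
    peers = {1, 12, 20, 33, 47, 60}  # explicit positions for a reproducible example
    print(route(peers, 1, 60))            # [1, 33, 60]
    print(has_symmetric_interest(peers))  # False
```

Each hop strictly reduces the clockwise distance to the target, so routing always terminates. As expected for a clockwise-only scheme, the check fails: peer 1 wants a connection to peer 12, but peer 12's table ({20, 33, 47}) does not include peer 1.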
A connection scheme based on peers randomly selecting peers to establish connections to does NOT exhibit symmetric interest, because peer P can select peer Q without peer Q selecting peer P. The connection scheme based on the ideas of Chord that was mentioned in the previous section also does NOT exhibit symmetric interest, because a given peer P in the ring desires connections to peers in the clockwise half-circle but not in the counter-clockwise half-circle. One scheme that does exhibit symmetric interest has each peer maintain connections to peers located at exponentially increasing distances going both clockwise AND counter-clockwise around the ring.

The authors have not yet had a chance to do a thorough analysis of various structured schemes. Nevertheless, the idea of a structured scheme (perhaps exhibiting "symmetric interest") seems a lot more promising than unstructured schemes.

5. A Few Additional Points

This section discusses a few additional points about P2P SIP architecture.

5.1. Superpeers

Orthogonal to these connectivity approaches is the idea of superpeers. A group of peers that are all behind the same NAT can elect one or more of their number to act on their behalf in the larger P2P overlay. These elected peers are called superpeers. The overlay architecture can then create two types of connections: connections between superpeers that traverse NATs, and connections between a superpeer and its local peers that do not traverse NATs. In this way, the number of NAT pinholes can be reduced compared with an architecture that has each peer connect to peers behind other NATs.

5.2. Joining the Network

How can a node X, which is not currently part of a particular P2P network, join that network?
   The first thing to note is that if node X can contact just one peer P in the P2P overlay network, then it can learn about other peers through peer P and so join the network. So the question can be reworded as: how can node X locate and contact at least one peer in the P2P overlay network that it wishes to join?

   One approach is to use multicast. Node X could send out a "Hello, is anyone there?" multicast message, and any peer currently in the P2P network can reply. Alternatively, peers that are currently in a P2P network can periodically send out multicast messages advertising the existence of the network. This approach works well when there are a number of peers on the same subnet. It also works well when there are a number of peers on subnets linked by multicast-enabled routers. However, many low-end routers do not support multicast, and multicast support on high-end routers needs to be configured, so using multicast between subnets is likely to work only in more sophisticated deployments.

   A second approach (the "Buddy List" column in Table 1) can be used if node X was previously part of the P2P network and then disconnected for a while. Node X can remember the IP addresses and ports of some peers when it disconnects, and then try to contact those peers when it wants to rejoin the network. If at least one of these peers (a) can be contacted and (b) is still a member of the P2P overlay network, then node X can rejoin the network. This approach will not work if all the other peers are behind NATs with a filtering policy of "Address Restricted filtering" (or worse) and node X disconnects for more than the lifetime of a filtering entry in a NAT (typically 2 to 5 minutes). However, it will work if some peers are behind NATs with "Endpoint-Independent filtering".

   A third approach is to configure node X with the "mapped address and port" of some peer P.
   Here the "mapped IP address and port" is the public IP address and port that the NAT (if any) assigns to the peer; it is typically learned through a protocol such as STUN (which requires a STUN server). If peer P is behind a NAT with a filtering behavior of "Address Restricted filtering" (or worse), then peer P must also be configured with the mapped address and port of node X. Given the manual configuration required, this approach must be considered a last-ditch approach.

   A fourth, and most general, approach is to use an Introduction Server. This is a node with a public IP address and a DNS entry that is not part of the P2P network but is used only for bootstrapping purposes. In the minimal usage scenario, the P2P network elects a single peer P to maintain a connection to the Introduction Server. When node X contacts the Introduction Server, node X is given the mapped IP address and port of peer P, and the Introduction Server forwards node X's mapped address and port to peer P. The disadvantage of this approach is that it requires a stable helper node with a public IP address. Otherwise, it is the most generally applicable of all the approaches.

   +---------------------+-----------+-------+----------+--------------+
   |                     | Multicast | Buddy | Manual   | Introduction |
   |                     |           | List  | Config   | Server       |
   +---------------------+-----------+-------+----------+--------------+
   | Plug and Play       | Y         | Y     | N        | Y            |
   |                     |           |       |          |              |
   | Works when node X   | N         | Y     | Y        | Y            |
   | is anywhere         |           |       |          |              |
   |                     |           |       |          |              |
   | Can be used for     | Y         | N     | Y        | Y            |
   | first connection    |           |       |          |              |
   |                     |           |       |          |              |
   | Does not require an | Y         | Y     | Y        | N            |
   | external node       |           |       |          |              |
   +---------------------+-----------+-------+----------+--------------+

                Table 1: Comparison of Discovery Methods

6.  Comments on Existing P2P Overlays

   Many existing P2P overlays have ignored the presence of NATs in the network. Their assumption is that all participating nodes are fully reachable by all other nodes. In practice, this turns out not to be true. The "Endpoint-dependent filtering" NAT behavior specified in [1] will impair the ability of many DHT algorithms to provide the guarantees they strive for.

   Some popular file-sharing networks require manual configuration of the user's local NAT in order to join. Incorrect configuration makes it impossible to participate in the overlay.

   Other P2P systems deal with NATs by assigning "helpers" to nodes behind NATs. These helpers have publicly reachable addresses and act as relay points for the NATed nodes. This is a relatively effective approach, but it requires the nodes with publicly reachable addresses to carry more than their share of the load. The load will quickly become overwhelming in a network with a small proportion of public nodes.

7.  Conclusions

   Given the analysis done so far, it seems that the best P2P overlay architecture will have the following properties:

   o  Partial mesh,

   o  Mostly static connections,

   o  Structured,

   o  Exhibits symmetric interest, and

   o  Uses superpeers.

Appendix A.  Detailed NAT UDP Assumptions

   +-------------+---------+------------------+----------+------------------+
   | Criterion   | BEHAVE  | Brief            | Our      | Justification    |
   |             | #       | Description      | Require- |                  |
   |             |         |                  | ment     |                  |
   +-------------+---------+------------------+----------+------------------+
   | Mapping     | REQ-1   | MUST be          | Must     | Peers behind a   |
   |             |         | "Endpoint-       | comply   | NAT which does   |
   |             |         | Independent"     |          | not comply       |
   |             |         |                  |          | require a        |
   |             |         |                  |          | "surrogate" to   |
   |             |         |                  |          | act on their     |
   |             |         |                  |          | behalf in the    |
   |             |         |                  |          | P2P network and  |
   |             |         |                  |          | to relay traffic |
   |             |         |                  |          | to them. This    |
   |             |         |                  |          | surrogate must   |
   |             |         |                  |          | either have a    |
   |             |         |                  |          | public IP        |
   |             |         |                  |          | address or be    |
   |             |         |                  |          | behind a NAT     |
   |             |         |                  |          | with a filtering |
   |             |         |                  |          | rule of          |
   |             |         |                  |          | "Endpoint-       |
   |             |         |                  |          | Independent"     |
   |             |         |                  |          | (REQ-8). It is   |
   |             |         |                  |          | likely that some |
   |             |         |                  |          | systems will not |
   |             |         |                  |          | have peers that  |
   |             |         |                  |          | can act as       |
   |             |         |                  |          | surrogates.      |
   |             |         |                  |          | Furthermore,     |
   |             |         |                  |          | acting as a      |
   |             |         |                  |          | surrogate is     |
   |             |         |                  |          | very bandwidth-  |
   |             |         |                  |          | and processor-   |
   |             |         |                  |          | intensive.       |
   |             |         |                  |          |                  |
   | IP Address  | REQ-2   | RECOMMENDED to   | Don't    | Since we control |
   | Pooling     |         | be "Paired"      | care     | both endpoints,  |
   |             |         |                  |          | it is easy for   |
   |             |         |                  |          | us to handle     |
   |             |         |                  |          | other behaviors. |
   |             |         |                  |          |                  |
   | Port        | REQ-3   | MUST NOT be      | Must     | "Port            |
   | Assignment  |         | "Port            | comply   | Overloading" can |
   |             |         | Overloading"     |          | often cause      |
   |             |         |                  |          | seemingly random |
   |             |         |                  |          | and inexplicable |
   |             |         |                  |          | failures, as     |
   |             |         |                  |          | well as making   |
   |             |         |                  |          | testing much     |
   |             |         |                  |          | harder.          |
   |             |         |                  |          |                  |
   | Port        | REQ-3a  | RECOMMENDED that | Don't    | Since we control |
   | Range       |         | the range        | care     | both endpoints,  |
   |             |         | classification   |          | it is easy for   |
   |             |         | of the source    |          | us to handle     |
   |             |         | port be          |          | other behaviors. |
   |             |         | preserved.       |          |                  |
   |             |         |                  |          |                  |
   | Port        | REQ-4   | RECOMMENDED that | Don't    | Since we control |
   | Parity      |         | the NAT exhibit  | care     | both endpoints,  |
   |             |         | "Port parity     |          | it is easy for   |
   |             |         | preservation"    |          | us to handle     |
   |             |         |                  |          | other behaviors. |
   |             |         |                  |          |                  |
   | Mapping     | REQ-5   | MUST NOT be less | (TBD)    | (TBD)            |
   | Refresh     |         | than 2 minutes   |          |                  |
   | Interval    |         |                  |          |                  |
   |             |         |                  |          |                  |
   |             | REQ-5a  | Value MAY be     | (TBD)    | (TBD)            |
   |             |         | configurable     |          |                  |
   |             |         |                  |          |                  |
   |             | REQ-5b  | Default          | Don't    |                  |
   |             |         | RECOMMENDED to   | care     |                  |
   |             |         | be 5 minutes     |          |                  |
   |             |         |                  |          |                  |
   | Mapping     | REQ-6   | MUST have "NAT   | Must     | Are there any    |
   | Refresh     |         | Outbound refresh | comply   | NATs that do not |
   | Direction   |         | behavior" of     |          | comply with      |
   |             |         | "True".          |          | this?            |
   |             |         |                  |          |                  |
   |             | REQ-6a  | MAY have "NAT    | Don't    | Many NATs        |
   |             |         | Inbound refresh  | care     | refresh only on  |
   |             |         | behavior" of     |          | outbound         |
   |             |         | "True"           |          | traffic, so it   |
   |             |         |                  |          | is simplest to   |
   |             |         |                  |          | assume this is   |
   |             |         |                  |          | false.           |
   |             |         |                  |          |                  |
   | Conflicting | REQ-7   | MUST either      | Should   | Conflicting      |
   | Address     |         | ensure no        | comply   | addresses are    |
   | Spaces      |         | conflict or      |          | not common, but  |
   |             |         | behave sensibly  |          | do occur. NATs   |
   |             |         | when a conflict  |          | that do not      |
   |             |         | occurs           |          | comply will      |
   |             |         |                  |          | cause problems   |
   |             |         |                  |          | for the peers    |
   |             |         |                  |          | behind them.     |
   |             |         |                  |          |                  |
   | Filtering   | REQ-8   | RECOMMENDED to   | Should   | (see discussion  |
   |             |         | be either        | comply   | in section XXX)  |
   |             |         | "Endpoint        |          |                  |
   |             |         | independent" or  |          |                  |
   |             |         | "Address         |          |                  |
   |             |         | dependent"       |          |                  |
   |             |         |                  |          |                  |
   |             | REQ-8a  | Filtering        | Don't    | Best to assume   |
   |             |         | behavior MAY be  | care     | it is NOT        |
   |             |         | configurable     |          | configurable.    |
   |             |         |                  |          |                  |
   | Hairpinning | REQ-9   | MUST support     | Should   | This issue       |
   |             |         | "hairpinning"    | comply   | becomes crucial  |
   |             |         |                  |          | when the NAT in  |
   |             |         |                  |          | question is the  |
   |             |         |                  |          | NAT closest to   |
   |             |         |                  |          | the public       |
   |             |         |                  |          | Internet in a    |
   |             |         |                  |          | multi-NAT        |
   |             |         |                  |          | environment. In  |
   |             |         |                  |          | this scenario, a |
   |             |         |                  |          | failure to       |
   |             |         |                  |          | support          |
   |             |         |                  |          | "hairpinning"    |
   |             |         |                  |          | will hinder      |
   |             |         |                  |          | (possibly        |
   |             |         |                  |          | prevent)         |
   |             |         |                  |          | bootstrapping    |
   |             |         |                  |          | attempts.        |
   |             |         |                  |          |                  |
   |             | REQ-9a  | Hairpinning      | Must     | (TBD)            |
   |             |         | behavior MUST be | comply   |                  |
   |             |         | "External source | (if NAT  |                  |
   |             |         | IP address and   | does     |                  |
   |             |         | port"            | hair-    |                  |
   |             |         |                  | pinning) |                  |
   |             |         |                  |          |                  |
   | ALGs        | REQ-10  | RECOMMENDED that | Should   | (TBD)            |
   |             |         | ALGs be disabled | comply   |                  |
   |             |         | by default       |          |                  |
   |             |         |                  |          |                  |
   |             | REQ-10a | RECOMMENDED that | Should   | (TBD)            |
   |             |         | each ALG can be  | comply   |                  |
   |             |         | enabled or       |          |                  |
   |             |         | disabled         |          |                  |
   |             |         | separately       |          |                  |
   |             |         |                  |          |                  |
   | Determinism | REQ-11  | MUST have        | Must     | (TBD)            |
   |             |         | deterministic    | comply   |                  |
   |             |         | behavior         |          |                  |
   |             |         |                  |          |                  |
   | ICMP        | REQ-12  | Receipt of ICMP  | Must     | (TBD)            |
   | support     |         | message MUST NOT | comply   |                  |
   |             |         | destroy NAT      |          |                  |
   |             |         | mapping          |          |                  |
   |             |         |                  |          |                  |
   |             | REQ-12a | SHOULD NOT       | Don't    | (TBD)            |
   |             |         | filter ICMP      | care     |                  |
   |             |         | messages based   |          |                  |
   |             |         | on source IP     |          |                  |
   |             |         | address.         |          |                  |
   |             |         |                  |          |                  |
   |             | REQ-12b | RECOMMENDED that | Don't    | (TBD)            |
   |             |         | the NAT support  | care     |                  |
   |             |         | ICMP Destination |          |                  |
   |             |         | Unreachable      |          |                  |
   |             |         | messages.        |          |                  |
   |             |         |                  |          |                  |
   | Fragmenta-  | REQ-13  | MUST support     | Should   | (TBD)            |
   | tion when   |         | fragmentation of | comply   |                  |
   | sending     |         | packets larger   |          |                  |
   |             |         | than link MTU    |          |                  |
   |             |         |                  |          |                  |
   | Fragmenta-  | REQ-14  | MUST support     | Should   | (TBD)            |
   | tion when   |         | "Receive         | comply   |                  |
   | receiving   |         | Fragment Out of  |          |                  |
   |             |         | Order" behavior  |          |                  |
   +-------------+---------+------------------+----------+------------------+

                      Table 2: NAT UDP Assumptions

8.  References

   [1]  Audet, F. and C. Jennings, "NAT Behavioral Requirements for Unicast UDP", draft-ietf-behave-nat-udp-04 (work in progress).

   [2]  Guha, S. and P. Francis, "NAT Behavioral Requirements for Unicast TCP", draft-hoffman-behave-tcp-03 (work in progress).

   [3]  Ford, B. and P. Srisuresh, "Peer-to-Peer Communication Across Network Address Translators", article available at http://www.brynosaurus.com/pub/net/p2pnat/.

   [4]  Stoica, I., Morris, R., Liben-Nowell, D., Karger, D., Kaashoek, M., Dabek, F., and H. Balakrishnan, "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications", article available at http://pdos.csail.mit.edu/chord/.

   [5]  Rosenberg, J., "Interactive Connectivity Establishment (ICE): A Methodology for Network Address Translator (NAT) Traversal for Offer/Answer Protocols", draft-ietf-mmusic-ice-06 (work in progress).

   [6]  Network World, "P2P Traffic Still Dominates the Net", article available at http://www.toptechnews.com/story.xhtml?story_id=38121.
Authors' Addresses

   Eric Cooper
   Avaya
   100 Innovation Drive
   Ottawa, Ontario  K2K 3G7
   Canada

   Phone: +1 613 592 4343 x228
   Email: ecooper@avaya.com


   Philip Matthews
   Avaya
   100 Innovation Drive
   Ottawa, Ontario  K2K 3G7
   Canada

   Phone: +1 613 592 4343 x224
   Email: philip_matthews@magma.ca

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.
Disclaimer of Validity

   This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2006). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the Internet Society.