Internet Engineering Task Force                                  SIP WG
Internet Draft                                J.Rosenberg,H.Schulzrinne
draft-rosenberg-sip-entfw-00.txt                dynamicsoft,Columbia U.
November 17, 2000
Expires: May, 2001

SIP Traversal through Residential and Enterprise NATs and Firewalls

STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as work in progress.

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

In this draft, we discuss how SIP can traverse enterprise and
residential firewalls and NATs. This environment is challenging
because we assume here that the end user has little or no control
over the firewall or NAT, and that the firewall or NAT is completely
ignorant of SIP.

1 Introduction

The problem of getting applications through firewalls and NATs has
received a lot of attention [1]. Getting SIP through firewalls and
NATs is particularly troublesome. In a previous draft [2] we
discussed some of the general issues regarding traversal of
firewalls, and discussed some solutions for it. Our solutions were
based on either placing a SIP ALG within the firewall/NAT, or having
a proxy server control the firewall/NAT with a control protocol of
some sort [3].
This protocol can open and close pinholes in the firewall, and/or
obtain NAT address bindings to use in rewriting the SDP in a SIP
message.

The use of a firewall control protocol (FCP) of this kind is ideal
for carriers, but it does not work when the SIP service provider is
not the same as the ISP and transport provider of the end user. This
is frequently the case for users behind enterprise firewalls and NATs
who are trying to access SIP services outside of their networks. The
same happens for residential NATs and firewalls. These devices are
often used by consumers who have cable modem and DSL connections, and
wish to connect multiple computers using the single address provided
by the cable company or DSL company (see footnote [1]). They are
commonly referred to as cable/DSL routers, and are manufactured by
companies like Linksys and Netgear.

The other alternative, embedding a SIP ALG within enterprise NATs and
firewalls, has not happened. The top commercial firewall and NAT
products continue to be SIP-unaware. Even if SIP ALG support were
added immediately, there is still a huge installed base of firewalls
and NATs that do not understand SIP.

In this draft, we propose solutions for getting SIP through
enterprise and residential NATs and firewalls that do not require
changes to these devices or to their configurations. The solutions
are not pretty. However, these NATs and firewalls are a reality, and
SIP deployment is being hampered by the lack of support for SIP ALGs
in these boxes. A solution MUST be found, and we provide one here.

2 Architecture

We assume that the network architecture we are dealing with looks
like Figure 1. The caller is a UA in enterprise A, and the called
party is a UA in enterprise B. The caller uses proxy X as its local
outbound proxy, which forwards the call to the proxy of the called
party, Y, also outside of the firewall. The call is then forwarded to
the called party within enterprise B. The firewall and/or NAT
(FW/NAT) boxes are off-the-shelf boxes with no support for SIP ALG.
We assume only that these boxes will allow users inside their
enterprises to browse the web, and specifically, to browse secure web
sites.

_________________________
[1] The author of this draft is amongst those who have such a
residential NAT, and thus feels highly motivated to solve this
particular problem.

        +-------+                          +-------+
        | SIP   |                          | SIP   |
        | Proxy |                          | Proxy |
        |   X   |                          |   Y   |
        |       |                          |       |
        +-------+                          +-------+

        +-------+                          +-------+
........|FW/NAT |............      ........|FW/NAT |............
.       |       |           .      .       |       |           .
.       +-------+           .      .       +-------+           .
.                           .      .                           .
.                           .      .                           .
.                           .      .                           .
.       +-------+           .      .       +-------+           .
.       |       |           .      .       |       |           .
.       | SIP UA|           .      .       | SIP UA|           .
.       +-------+           .      .       +-------+           .
.............................      .............................
        Enterprise A                       Enterprise B

Figure 1: Network Architecture

We choose this scenario as it is the most complex, and the most
difficult to support. There are several problems that need to be
resolved for this scenario to work:

o Getting SIP requests from the caller to proxy X, and responses
  from proxy X back to the caller.

o Getting SIP requests from proxy Y to the called party, and
  responses from the called party back to proxy Y.

o Getting media to go from the caller to the called party somehow.

We discuss solutions for each in turn.

3 Originating requests

The first problem is originating requests from the caller through a
firewall/NAT, out to a proxy. We assume the FW/NAT blocks all
outgoing UDP. We assume it may block some TCP from leaving, but at
the very least, the firewall will allow HTTPS through.

HTTPS is nothing more than HTTP over TLS/SSL [4]. Its default port is
443. What's interesting about HTTPS is that the connection starts out
with TLS, negotiates a secure channel, and then runs HTTP over this
channel. All HTTP messages are encrypted.
The FW/NAT never sees any HTTP messages in the clear, only TLS/SSL
messages. The important implication is that there is no way for a
FW/NAT to have application layer intelligence that depends on the
existence of HTTP on port 443. In fact, any protocol can be run over
TLS on port 443, and it will look the same to the FW/NAT. Since we
assume that the FW/NAT lets HTTPS through, it should also let SIP
over TLS through when it runs on port 443.

Thus, our proposal is to have the caller initiate a TLS connection on
port 443 to the proxy server X. Once the TLS connection is secured,
the client can send SIP messages over this connection. Handling of
SIP over TLS/SSL is identical to handling of SIP over TCP. Responses
from the proxy are sent over this connection as well [5].

We recommend that the client keep the TLS connection open (more on
this in Section 4). This avoids the need to re-initiate the TLS
connection for every outgoing call.

Fooling the FW/NAT into believing the traffic is HTTPS by running it
over port 443 is not nice. We strongly recommend that clients first
try the IANA registered port for SIP over TLS [port to be allocated].
If no response is received over this connection, the client should
then try 443.

Note that outgoing requests may work with just vanilla TCP. However,
we have observed that some firewalls examine TCP connections to look
for specific protocols. Thus, SIP over TCP on 5060 may not work. SIP
over TCP on port 80 may also not work, as some firewalls check for
HTTP messages. This is why we prefer TLS; we believe that it is most
likely to work.

4 Receiving requests

Unfortunately, receiving requests is not as simple as sending them.
The problem has to do with registrations. In Figure 1, the callee
will receive requests at their UA because they had previously sent a
REGISTER request to their registrar, which is co-located with proxy
Y.
However, forwarding incoming INVITEs from Y to the callee has two
problems. First, the address placed in the Contact header in the
REGISTER is not likely to be correct. It will contain a domain name
or IP address that is within the private space of enterprise B (we
assume NAT here). Thus, the REGISTER might look like:

   REGISTER sip:Y.com SIP/2.0
   From: sip:callee@Y.com
   To: sip:callee@Y.com
   Contact: sip:callee@10.0.1.100

Second, even if the enterprise is not running NAT, just a firewall
(in which case the IP address in the REGISTER Contact header is
correct), no firewall will let incoming packets over UDP through, nor
will it let the proxy initiate TCP connections into the enterprise
from outside.

To address the latter problem, we recommend that clients that send
REGISTER requests do so over a TLS connection, as described in
Section 3. Furthermore, they should keep this connection open
permanently. REGISTER refreshes are sent over this connection.

We further recommend that the proxy/registrar hold this connection in
a table, where the table is indexed by the remote side of the
transport connection. When the proxy wishes to send a packet to some
server at IP address M, port N, transport O, it looks up the tuple
(M,N,O) in the table to see if a connection already exists, and if
so, uses it. This will possibly allow incoming calls at proxy Y,
destined for the callee, to be routed over the connection initiated
by the callee when he registered.

Why possibly? When the call from X arrives at proxy Y, the request
URI contains sip:callee@Y.com. Proxy Y looks this up in its
registration database. It finds the contact registered by the callee.
If the contact has an IP address, port, and transport that correspond
to the IP address, port, and transport on the originating side of the
TLS connection from the callee, it will work.
However, this will only be the case when (1) the callee is
single-homed, and it correctly discovered its IP address, placing it
into the registration, (2) the local port of the TLS connection it
initiated is the default port for SIP over TLS [port to be
determined], or else the callee placed the local port in its Contact,
and (3) the callee placed the transport=tls attribute in the Contact
URI.

What does multihoming have to do with it? When a host initiates a
connection, the source IP address of that connection is determined by
the routing tables of the host (try netstat -r on Windows or Unix).
As a result, the source IP address of the connection could vary
depending on where the proxy server is located on the Internet.
Multi-homed hosts are increasingly common as VPNs become more
pervasive. VPNs show up as virtual interfaces, making hosts
multihomed. We have also observed that many hosts have a hard time
figuring out what their IP address is. We have seen some systems
report back 127.0.0.1 (the loopback address), in fact. Thus, even
without NAT, the Contact address may not match the source address of
the TLS connection used to register.

Furthermore, with NAT, the Contact header in the registration is
doomed not to match the source address of the connection (as seen by
proxy Y). The NAT will rewrite the source address and possibly the
source port, and the rewritten value is generally not known to the
client.

Our solution is a horrible, horrible hack. We propose that a specific
contact hostname value be reserved to have the meaning "I don't know
what my address is, please use the IP address and port from the
connection over which this REGISTER was delivered". We propose that
this host name be "jibufobutbmpu". This name is "I hate NATS a lot"
with the spaces removed and each letter incremented by one. This name
is unlikely to be used in real systems (as opposed to something like
"default", which could be a real host name).

Consider once more the architecture of Figure 1.
The callee has an IP address of 10.0.1.100. It initiates a TLS
connection to TLS port [port to be determined by IANA] on proxy Y.
This TLS connection goes through the NAT, and the source address is
rewritten to 77.2.3.88, and the port to 2937. The registration looks
like:

   REGISTER sip:Y.com SIP/2.0
   From: sip:callee@Y.com
   To: sip:callee@Y.com
   Contact: sip:callee@jibufobutbmpu;transport=tls

The proxy Y then stores the incoming TLS connection into a table:

   (77.2.3.88,2937,TLS) -> [reference to TLS connection]

It also updates the contact list for sip:callee@Y.com to include the
URL sip:callee@77.2.3.88:2937;transport=tls.

Now, when an INVITE arrives for callee@Y.com, it is looked up in the
registration database. The contact is extracted, and the proxy tries
to send the request to that address. To do so, it checks its
connection table for an open connection to the IP address, port and
transport where the request is destined. In this case, such a
connection is available, and the request is forwarded over it. The
response from the callee is also routed over the same connection.

Storing incoming connections in a table for later reuse is useful
even between proxies. If TCP or TLS is used between proxies X and Y,
that connection can be stored by both X and Y, and thus reused for
messaging in either direction. It is for this reason that we separate
the connection table management from the registration processing.
Such table management is needed if one of the proxies is on the
inside of the firewall, for example. In that case, responses and
requests in the reverse direction would need to be forwarded over the
connection initiated by the proxy.

Horrible as our hack is, it solves the firewall and NAT problem, and
it also simplifies configuration. We have seen far too many system
problems because of incorrect IP address configuration.
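
The registration and routing steps above can be sketched in a few
lines of Python. This is only an illustration under stated
assumptions, not an implementation of any real SIP stack: the names
Registrar, parse_contact, on_register and find_connection are
hypothetical, the connection is an opaque object, the URI parsing
handles only the simple "sip:user@host[:port][;k=v]" form used in the
examples, and IPv6 literals are not handled.

```python
MAGIC_HOST = "jibufobutbmpu"  # reserved host name proposed in this draft


def parse_contact(uri):
    """Split 'sip:user@host[:port][;k=v...]' into (user, host, port, params)."""
    scheme_user, _, rest = uri.partition("@")
    user = scheme_user.split(":", 1)[1]
    hostport, *raw_params = rest.split(";")
    host, _, port = hostport.partition(":")
    params = dict(p.split("=", 1) for p in raw_params if "=" in p)
    return user, host, int(port) if port else None, params


class Registrar:
    """Hypothetical registrar holding a connection table and a contact list."""

    def __init__(self):
        self.connections = {}  # (remote ip, remote port, transport) -> connection
        self.contacts = {}     # address-of-record -> list of contact URIs

    def on_register(self, aor, contact, conn, remote_ip, remote_port,
                    transport="tls"):
        # Index the connection by the remote side as seen by the proxy,
        # that is, after any NAT has rewritten the source address and port.
        self.connections[(remote_ip, remote_port, transport)] = conn
        user, host, _port, _params = parse_contact(contact)
        if host == MAGIC_HOST:
            # "I don't know my address": substitute the observed values.
            contact = f"sip:{user}@{remote_ip}:{remote_port};transport={transport}"
        self.contacts.setdefault(aor, []).append(contact)
        return contact

    def find_connection(self, aor):
        """Return (contact, connection) for the first contact with an open
        connection in the table, or None if there is no such contact."""
        for contact in self.contacts.get(aor, []):
            _user, host, port, params = parse_contact(contact)
            conn = self.connections.get((host, port, params.get("transport")))
            if conn is not None:
                return contact, conn
        return None
```

With the values from the example, registering
sip:callee@jibufobutbmpu;transport=tls over a connection seen from
77.2.3.88 port 2937 stores the contact
sip:callee@77.2.3.88:2937;transport=tls, and a later lookup for
sip:callee@Y.com returns the stored connection. Keeping the
connection table separate from the contact list mirrors the
separation the text recommends.
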
The reserved host name solution allows the proxy to automatically set
the IP address and port correctly, without the client needing to
worry about it. By using a well-known host name, we can allow clients
to decide whether they want this automatic detection or not. An
alternative is to use a Require header, whose presence asks the
server to create this contact automatically.

5 Handling RTP

Dealing with SIP was the easy part. Getting RTP through firewalls is
hard because it is on dynamic ports, it's UDP, and it's peer-to-peer.
Firewalls don't like any of those things. As such, we discuss here
our proposed solution, which requires the use of a device we call an
RTP forwarder combined with RTP over TCP or TLS.

RTP forwarders can be thought of as RTP routers; they receive RTP
packets on a particular incoming port, and send them out on a
different port/address. They are, in essence, NATs for RTP
specifically. We show these boxes incorporated into the architecture
in Figure 2. Only one forwarder is needed per call. Our architecture
will only result in usage of the box when both parties are behind
NATs, which is the only case when one is needed. Thus, it can be
deployed by either the calling domain or the called domain. We
arbitrarily choose the calling domain.

First off, even though the client knows they are behind a NAT, and
knows that they are contacting a proxy outside the NAT, this does not
imply, in the general case, that the media will not work. It could be
that caller and callee are both within the same enterprise. In this
case, using the procedures in the above two sections will get SIP
through the NAT, and then the RTP will work just fine. As a result,
it is our recommendation that SIP clients be configured to always try
the standard RTP over UDP approach first.
If that fails (this condition can be detected if no RTP or RTCP is
received for some long duration, say, 10 seconds), the client then
tries the procedures defined here. As such, for the remainder of this
section, we assume that one or both parties are on the other side of
a NAT or firewall.

When the caller makes a call, the SDP in their INVITE indicates RTP
over TLS or RTP over TCP. We indicate this by adding the keyword
RTP/AVP-TLS to the additional media stream keywords defined in [6].
The client uses the direction tag of "active" if it believes it is
behind a NAT; otherwise, if it knows it is not, it uses either
"passive" or "both". The direction tag of "active" means that the
client can only initiate TLS connections; it cannot be on the
receiving side. This is the case when the media stream is crossing a
NAT. We recommend that SIP clients be configurable about whether they
think they are behind a NAT or firewall, so that users can set this
properly (they will know in many cases). We also recommend that when
the client is not behind a NAT, the client first attempt the call
with TCP on 5060, then TCP on 80, then TLS on [port to be determined
by IANA], then TLS on 443. We also recommend that the configuration
of a successful call to a user be cached and reused on the next call
attempt, as this process of checking the various configurations can
take some time.

        +-------+                          +-------+
        | SIP   |                          | SIP   |
        | Proxy |                          | Proxy |
        |   X   |                          |   Y   |
        |       |                          |       |
        +-------+                          +-------+

                          ----
                         /RTP \
                        | Forw.|
                         \    /
                          ----

        +-------+                          +-------+
........|FW/NAT |............      ........|FW/NAT |............
.       |       |           .      .       |       |           .
.       +-------+           .      .       +-------+           .
.                           .      .                           .
.                           .      .                           .
.                           .      .                           .
.       +-------+           .      .       +-------+           .
.       |       |           .      .       |       |           .
.       | SIP UA|           .      .       | SIP UA|           .
.       +-------+           .      .       +-------+           .
.............................      .............................
        Enterprise A                       Enterprise B

Figure 2: RTP Forwarders

If the caller is behind a NAT or firewall, the resulting SDP in the
INVITE might look like:

   v=0
   o=Me
   s=Call me using TLS
   t=0 0
   c=IN IP4 10.1.1.2
   m=audio 54111 RTP/AVP-TCP 0
   a=direction:active

This INVITE arrives at proxy X. The proxy knows that it is the local
outbound proxy for the caller, and thus knows it is responsible for
usage of the RTP forwarder. The proxy first checks the value of the
direction tag in the SDP. If the media streams are all RTP/AVP-TLS or
RTP/AVP-TCP, and the direction tags are all active, the proxy
allocates an IP address/port value from the RTP forwarder. This
value, which we denote A, is then placed into the SDP in the INVITE
before it is forwarded out. The proxy also changes the direction
attribute to "both".

The request is forwarded to the called party. If the called party
knows it's behind a NAT, it uses RTP/AVP-TLS or RTP/AVP-TCP
(depending on what was in the INVITE) with a direction of "active".
Otherwise, the direction is set to "passive" or "both".

This response is sent, eventually arriving at proxy X. If the
response contains a direction tag of "active", and the request also
contained a direction tag of "active" (but was rewritten), the proxy
allocates another IP address/port value from the RTP forwarder. This
value, which we denote B, is then placed into the SDP in the
response, replacing any IP address and port that were there
previously. The direction tag is also changed to "both". The proxy
then creates an association in the RTP forwarder, binding A to B.
This binding means that all data received on A is forwarded to B, and
all data received on B is forwarded to A, assuming that connections
were actually opened to A and B. This response is then forwarded.
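
The proxy behaviour just described, together with the release rule
stated in the text that follows, can be sketched as below. This is a
minimal illustration under several assumptions, none of which come
from the draft: the RtpForwarder and Proxy classes and the helper
functions are hypothetical, the SDP carries a single media stream
with one c= line, calls are tracked by an opaque call identifier, and
the forwarder hands out one address per allocation on a fixed port.
The one-minute hold-down on reuse of a freed address/port is noted in
a comment but not implemented.

```python
TCP_PROFILES = {"RTP/AVP-TCP", "RTP/AVP-TLS"}


class RtpForwarder:
    """Hypothetical forwarder: hands out (address, port) pairs and binds them."""

    def __init__(self, addresses, port=443):
        self.free = [(addr, port) for addr in addresses]
        self.bindings = {}

    def allocate(self):
        return self.free.pop(0)

    def release(self, endpoint):
        # The draft asks that a freed address/port not be reused for at
        # least one minute; that hold-down is omitted from this sketch.
        self.free.append(endpoint)

    def bind(self, a, b):
        # Data received on a is forwarded to b, and vice versa.
        self.bindings[a] = b
        self.bindings[b] = a


def all_active(sdp):
    """True if every m= line uses a TCP/TLS profile and every direction
    attribute is 'active' (and at least one of each is present)."""
    lines = sdp.splitlines()
    media = [l.split() for l in lines if l.startswith("m=")]
    directions = [l.split(":", 1)[1] for l in lines if l.startswith("a=direction:")]
    return (bool(media) and all(m[2] in TCP_PROFILES for m in media)
            and bool(directions) and all(d == "active" for d in directions))


def rewrite(sdp, endpoint):
    """Point the SDP at the forwarder endpoint and set direction to 'both'."""
    addr, port = endpoint
    out = []
    for line in sdp.splitlines():
        if line.startswith("c="):
            line = f"c=IN IP4 {addr}"
        elif line.startswith("m="):
            parts = line.split()
            parts[1] = str(port)
            line = " ".join(parts)
        elif line.startswith("a=direction:"):
            line = "a=direction:both"
        out.append(line)
    return "\n".join(out)


class Proxy:
    def __init__(self, forwarder):
        self.forwarder = forwarder
        self.pending = {}  # call id -> endpoint A allocated for the INVITE

    def on_invite(self, call_id, sdp):
        if not all_active(sdp):
            return sdp                        # caller not behind a NAT: untouched
        a = self.forwarder.allocate()
        self.pending[call_id] = a
        return rewrite(sdp, a)

    def on_response(self, call_id, sdp):
        a = self.pending.pop(call_id, None)
        if a is None:
            return sdp                        # request was never rewritten
        if all_active(sdp):
            b = self.forwarder.allocate()
            self.forwarder.bind(a, b)         # both sides are behind NATs
            return rewrite(sdp, b)
        self.forwarder.release(a)             # callee reachable directly
        return sdp
```

The sketch allocates on port 443 by default only because the draft
later recommends that forwarders use many addresses on a well-known
port rather than many ports on one address.
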
If the request contained the direction tag "active", and the response
did not, the proxy frees the IP address/port it allocated from the
RTP forwarder. The IP address/port, however, should not be reused for
at least one minute.

At this point, the clients try to open connections for media. There
are four cases.

The first case is when the caller and callee are both behind NATs or
firewalls. In this case, the proxy rewrote both the request and the
response. The caller will attempt to open a connection to B, and the
callee attempts to open a connection to A. Both succeed. If the
caller sends media on the connection, it goes to B, which the RTP
forwarder forwards to A, and from there it goes to the callee. The
same happens in the reverse direction.

The second case is when the caller is behind a NAT or firewall, and
the callee is not. In this case, the SDP in the INVITE was
translated, but the SDP in the response was not. The caller attempts
to open a connection to the address in the response, which is that of
the called party. The called party attempts to open a connection to
the address in the INVITE. This is the address/port allocated by the
proxy from the forwarder, which has since been freed and is not in
use. As the port is not in use, the connection will not succeed. But
the incoming connection from the caller to the callee will. The
result is a direct TLS connection from the caller out to the callee.

The third case is when the callee is behind a NAT or firewall, and
the caller isn't. In this case, the SDP in the INVITE was not
translated, nor was the SDP in the response. The callee attempts to
open a connection to the address in the INVITE, which is that of the
caller, and this succeeds. The result is a direct TLS connection from
the callee out to the caller.

The fourth case is when neither is behind a NAT or firewall. No SDP
translation will have taken place.
One or both participants will open TCP or TLS connections, and both
may succeed in being created. The rules specified in [6] are then
followed regarding usage of these connections and closing of idle
connections.

Either the proxy or the RTP forwarder can manage the lifecycle of the
connection binding. If the proxy does it, the proxy must record-route
whenever the direction tag in the INVITE is "active". When the call
is over (known through the BYE), the proxy destroys the connections
and connection bindings in the forwarder. If the RTP forwarder
manages the lifecycles, the proxy need not ever record-route or
maintain call state. When the call is over, the caller and callee
both disconnect their TLS connections to the forwarder. When both
connections disconnect, the forwarder can destroy the bindings.

Our approach only results in the creation of a binding in the
forwarder when both parties are behind a firewall or NAT. Otherwise,
the connection is direct.

Care must be taken in the selection of the address/ports for the RTP
forwarder. If the forwarder has only one address, and uses a range of
ports within that address, the connections from the clients may not
succeed. This is because they will require connections to be
established out, through the NAT or firewall, to a non-standard port.
Since certain firewalls or NATs may block anything but 80 or 443,
such connections will fail. As a result, we recommend that forwarders
use many IP addresses instead, and always allocate port 443 or port
[port to be determined by IANA].

6 Caveats

There are many, many caveats with our proposed approach.

o Riding on top of port 443 for SIP over TLS goes against the
  principles of the guidelines established by the IESG [7].

o TLS or TCP will result in very bad voice delays as soon as the
  packet loss is nonzero. Interestingly, with zero packet loss, the
  delays for voice over TCP will be equal to those of voice over UDP.
  Clients will need adaptive voice buffer algorithms that can
  tolerate wide swings in latencies.

o Our approach requires a TLS server process (to receive RTP)
  embedded within a SIP enabled communications client. This will
  require a public/private key and its associated certificate,
  available to the client, issued from a Certification Authority (CA)
  that is known to the other party. Similarly, use of a TLS client
  will require that the client be configured with the keys of a set
  of well known CAs.

o RTP forwarders are horrible. The author spent much time arguing
  against such devices, on the grounds that the underlying IP network
  already provides routing capabilities, and that these do not need
  to be replicated at the voice transport layer. They will increase
  overall voice latency, introduce another point of failure, and
  incur additional costs to providers. However, they are unavoidable
  given that the fundamental semantic of the IP address, that it is a
  globally reachable point for communications, has been violated by
  NATs. Perhaps this argument can be rephrased as, "two wrongs make a
  right".

o If the RTP forwarder is not co-resident with the proxy, some kind
  of control protocol is needed to allocate addresses and to
  establish bindings. No such protocol exists right now. We expect
  these forwarders to be bundled with proxies, and thus make use of
  proprietary protocols.

The need for TCP and/or TLS support in the softphones can be
mitigated by deploying UDP to TCP/TLS translation proxies inside of
the firewall.

7 Special case: the Cable/DSL Router and Smart User

SIP can be made to run through a residential NAT/firewall (called a
Cable/DSL Router in the market) without any of the above processing
requirements, given a smart user, a configurable Cable/DSL router,
and a configurable soft phone or desktop SIP phone.
Most commercial Cable/DSL router products (such as the Linksys
BEFSR41 or Netgear RT314) allow a certain host to be declared a "DMZ"
machine. The router will forward all incoming connection requests and
UDP packets to this host. The user must set the IP phone to be this
DMZ machine. Furthermore, the user must check the status of the
router and determine the public IP address that the router has been
assigned from the ISP (we assume that there is only one). The user
must note this address, and then configure the softphone to use that
address in the SDP sent by the softphone, and in Contacts sent to the
registrar server. The user must also make sure that the router is not
filtering outbound packets.

Because the router is not filtering outbound packets, both UDP and
TCP packets can be sent outwards. Because the SDP and Contact in the
registration both contain the public address of the NAT, incoming
packets (SIP requests and RTP packets) will be routed to the NAT, and
then forwarded to the SIP phone. SIP responses will be routed
correctly, since the Via handling procedures will cause them to be
sent to port 5060 on the source address of the IP packet carrying the
request. As this source address is the public address of the router,
the response goes back to the router, and is then forwarded to the
DMZ machine, in this case, the SIP phone.

As an alternative to setting the SIP phone as the DMZ machine, the
router can be configured to port forward port 5060 to the SIP phone.
Similarly, we propose that softphones have configurable RTP ports
that they use in the SDP. The router can then be configured to port
forward that port to the SIP phone. This provides increased security,
as DMZ machines are more vulnerable.

The difficulty with this approach is that it requires a smart user
and a very configurable softphone, a rare combination in most cases.

8 Security Considerations

RTP forwarders are effectively man-in-the-middle systems.
As a result, a rogue proxy and RTP forwarder can listen in on the
media of all users initiating calls through it. To prevent this,
clients initiating TLS connections to a server should verify that the
server name in the SDP is a subdomain of the name presented in the
certificate. Furthermore, the client should only connect to servers
whose domains are subdomains of their service provider, or the
provider of the other party in the call.

9 Conclusion

In this draft, we have proposed some modifications to SIP operation
which allow it to successfully pass through enterprise and
residential NATs and firewalls. We believe this is critical to the
success of SIP.

10 Authors' Addresses

Jonathan Rosenberg
dynamicsoft
72 Eagle Rock Avenue
First Floor
East Hanover, NJ 07936
email: jdrosen@dynamicsoft.com

Henning Schulzrinne
Columbia University
M/S 0401
1214 Amsterdam Ave.
New York, NY 10027-7003
email: schulzrinne@cs.columbia.edu

11 Bibliography

[1] M. Holdrege and P. Srisuresh, "Protocol complications with the IP
network address translator (NAT)," Internet Draft, Internet
Engineering Task Force, Oct. 2000. Work in progress.

[2] J. Rosenberg, D. Drew, and H. Schulzrinne, "Getting SIP through
firewalls and NATs," Internet Draft, Internet Engineering Task Force,
Feb. 2000. Work in progress.

[3] J. Kuthan and J. Rosenberg, "Firewall control protocol framework
and requirements," Internet Draft, Internet Engineering Task Force,
June 2000. Work in progress.

[4] E. Rescorla, "HTTP over TLS," Request for Comments 2818, Internet
Engineering Task Force, May 2000.

[5] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, "SIP:
session initiation protocol," Request for Comments 2543, Internet
Engineering Task Force, Mar. 1999.

[6] D. Yon, "TCP-Based media transport in SDP," Internet Draft,
Internet Engineering Task Force, Nov. 2000.
Work in progress.

[7] K. Moore, "On the use of HTTP as a substrate for other
protocols," Internet Draft, Internet Engineering Task Force, Oct.
2000. Work in progress.