Problem: Issue #184: maximum message sizes for SIP over UDP, truncated responses.

Explanation:

While the request may fit within the MTU, the response may be large enough to exceed it and thus get fragmented. This appears likely only with 2xx responses, since error responses typically don't contain message bodies. Responses could start large at the UAS, become large as they traverse the proxies back to the UAC (e.g., because Record-Route headers are being added), or be large error responses generated at a proxy.

The last case seems least likely, but is possible. For example, P2 receives a request over UDP. It's less than 1500 bytes. P2 adds its own Via and Record-Route (RR). The RR is big. Now the request is over the limit, so P2 sends it via TCP. It gets the response over TCP. Even after stripping the Via, it's still too big, but the request came in over UDP. Another case is a proxy adding headers to a response for services, such as R-P-ID, or even bodies.

Fragmentation leads to loss of efficiency; more seriously, NATs and firewalls may not be able to forward fragments, thus causing all packets to be lost.

Proposals:

- Proposal 1: Ignore the issue

  1. If you send a request bigger than the MTU, you MUST use TCP.
  2. TCP is recommended between elements that may have an intervening NAT, if they are aware of such, or for elements that exchange a significant amount of traffic.
  3. Intermediate elements SHOULD NOT insert headers or bodies into requests or, in particular, responses if they are above half the MTU size, as this may cause the response to exceed the MTU size and thus incur fragmentation.

  Problems: may not fly past the IESG.

- Proposal 2: Status code 499

  If the UAS needs to send a large response, it sends a 499 instead, which propagates back to the UAC. This is treated like a branch failure. The first proxy that detects a response that has grown too large converts the response to 499.

  Problems:

  (1) Backwards-compatibility.
      Old UACs have no clue what this means and will not do the right thing.

  (2) Branches get lost, i.e., a perfectly serviceable branch that would generate a 2xx may not get asked if there's another "better" branch, simply because the response is large. This isn't a bug as such, but it would yield different behavior if somebody switches from UDP to TCP initially, which is at least hard to explain.

- Proposal 3: Use redirection -- 307 (Temporary Redirect)

  The relevant entity issues a 307 response, with a Contact indicating the new transport, as in

    Contact:

  Normal 3xx behavior applies. As long as the 307 response is issued by a UAS, no special handling is necessary. The UAS needs to recognize the incoming TCP connection as being part of the same call, but this should happen based on standard dialog identification.

  If the proxy increases the message size beyond the MTU, it needs to convert a 2xx response received via UDP to a 307 response. The upstream proxy or UAC then reissues the request over TCP. To ensure the same routing of the request, the proxy needs to insert some state-identifying information in the Contact header.

  I believe this is backwards-compatible and avoids the "branch loss" problem. Also, proxies can do recursive resolution of 3xx, so the request doesn't have to go back all the way to the UAC.

  Consider the case where a proxy at 1.2.3.4 got a 2xx from a downstream element over TCP, but the request that triggered it came in on UDP. It wants to send the response, but it's too big to go over UDP. So it remembers the response, indexing it with a key, say with value 3, and issues a 307:

    307
    Contact: sip:response-index.3@1.2.3.4

  When it gets the recursed INVITE, it immediately returns the 2xx stored there. If the recursed request never arrives, that would be bad.

- Proposal 3a: Variant of redirection

  Route the 307 all the way back to the origin. Restart the request with TCP from the UAC.

  - How does this fit with normal 3xx processing?
  - Requires the UAC to speak TCP, but that's unavoidable in this case.

- Proposal 4: Always send large responses via TCP

  Even if a request comes in via UDP, send the response via TCP.

  Difficulty:

  - Associating requests and responses.
  - If the source proxy is behind a NAT or firewall, this may not work (but then, that proxy won't be able to receive incoming calls, either).

Proposed Resolution: (3) or (4)

If possible, I'd like to avoid the case of proxies turning 2xx into 3xx.

See http://www.caida.org/outreach/papers/2001/Frag/ for recent measurements related to fragmentation.
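
The stored-response mechanism described under Proposal 3 can be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed implementation: the class and method names are hypothetical, the 1500-byte threshold is simply the figure used in the P2 example, and the 307 is abbreviated to a status line and Contact (a real response would carry the usual Via, From, To, Call-ID and CSeq headers). The Contact URI follows the response-index form shown in the example above.

```python
MTU_BYTES = 1500  # assumption: the 1500-byte figure from the P2 example


class RedirectingProxy:
    """Hypothetical proxy that parks oversized responses (Proposal 3)."""

    def __init__(self, host):
        self.host = host
        self.stored = {}    # key -> parked response bytes
        self.next_key = 1   # simple counter used as the index key

    def forward_response(self, response, request_transport):
        """Return the bytes to send upstream for this response."""
        # Responses to TCP requests, or ones that fit the MTU, pass through.
        if request_transport != "UDP" or len(response) <= MTU_BYTES:
            return response
        # Too big for UDP: remember the response and redirect instead.
        key = self.next_key
        self.next_key += 1
        self.stored[key] = response
        redirect = (
            "SIP/2.0 307 Temporary Redirect\r\n"
            "Contact: sip:response-index.%d@%s\r\n"
            "\r\n" % (key, self.host)
        )
        return redirect.encode("ascii")

    def handle_recursed_request(self, request_uri):
        """Return the parked response for a recursed request, else None."""
        uri = request_uri[4:] if request_uri.startswith("sip:") else request_uri
        user = uri.split("@", 1)[0]
        prefix = "response-index."
        if not user.startswith(prefix):
            return None
        try:
            key = int(user[len(prefix):])
        except ValueError:
            return None
        # pop() hands the stored response out once and frees the state.
        return self.stored.pop(key, None)


proxy = RedirectingProxy("1.2.3.4")
big_2xx = b"SIP/2.0 200 OK\r\n" + b"x" * 2000
redirect = proxy.forward_response(big_2xx, "UDP")
replayed = proxy.handle_recursed_request("sip:response-index.1@1.2.3.4")
```

In this sketch the stored entry is removed once the recursed request arrives. Nothing here expires an entry whose recursed request never shows up, which is the failure case noted under Proposal 3; a real proxy would presumably need a timer for that.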