Internet Engineering Task Force Audio/Video Transport wg Internet Draft J. Rosenberg, H. Schulzrinne draft-ietf-avt-byerecon-00.txt Bell Laboratories/Columbia U. November 13, 1997 Expires: May 1998 New Results in RTP Scalability STATUS OF THIS MEMO This document is an Internet-Draft. Internet-Drafts are working docu- ments of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute work- ing documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference mate- rial or to cite them other than as ``work in progress''. To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast). Distribution of this document is unlimited. 1 Abstract Recently, a number of problems related to RTP scalability to large multicast groups have been identified. The main problem, congestion of RTCP packets due to near-simultaneous joins, has been resolved in previous work. In this document, we present some additional problems and describe the solutions. In particular, we discuss the problem of BYE floods and premature timeouts. To resolve the BYE flood problem, we propose a BYE reconsideration mechanism. To help alleviate prema- ture timeouts, we propose a reverse timer reconsideration algorithm. Both algorithms are simple, and require minimal state and computa- tion. 2 Introduction Recently, a number of problems related to RTP [1] scalability to large multicast groups have been identified [2]. The most serious of these problems is RTCP congestion in simultaneous joins. In this sce- nario, a large number of users join a session at nearly the same time. This can happen if the join is an automatic response to an announced session on the mbone [3], or if RTP is being used for J. Rosenberg, H. Schulzrinne [Page 1] broadcast applications, and a popular show comes on. When this hap- pens, each user joins the group believing that they are the only user. They then send their initial RTCP packets within a short period Internet Draft RTP Scalability 2 November 13, 1997 of time, causing a flood of RTCP reports. This can result in network and/or access link congestion. For modem dial-in users, the slow access links can cause congestion even with moderate simultaneous joins. To combat this problem, an algorithm called reconsideration has been proposed [4] [5]. This algorithm causes backoff in the transmission of RTCP packets as group sizes increase. However, several new problems have recently been discovered relating to RTCP scalability. These difficulties are similar to the step-join congestion problem, but different in that they require additional algorithms to resolve. This document describes the two new difficul- ties which have been encountered - BYE floods and premature timeouts. 3 BYE Floods The BYE flood problem is very similar to the simultaneous join prob- lem. Instead of many users joining at the same time, many users leave the group at the same time. Since an RTP client sends an RTCP BYE packet when it leaves the group, this causes a flood of BYE packets, which congests the network. Users can be expected to leave a group simultaneously for much the same reasons they might join simultane- ously - an automatic leave as a result of the end of the session (as indicated in the SDP [6] announcement for the session), or because the show is over, and the users manually exit their applications. A number of aspects of the BYE flood problem make it different than the simultaneous join problem. These must be taken into consideration when designing an algorithm to reduce the flood. We therefore state the goals of the BYE flood prevention algorithm as follows: oUsers often terminate their applications just after leaving the session. The algorithm must be aware of this possibility, and define the appropriate behavior if an application decides to ter- minate. oThe algorithm should behave gracefully; when very few users are leaving the group simultaneously, users should generally be allowed to send their BYE packets right away. It is only in the presence of a large number of BYE packets that the algorithm should kick in, and force users to hold back on sending their BYE packets. oThe algorithm should be simple, requiring minimal computation and storage. We propose an algorithm called BYE reconsideration to accomplish these J. Rosenberg, H. Schulzrinne [Page 2] Internet Draft RTP Scalability 2 November 13, 1997 goals. The algorithm operates much like standard reconsideration. How- ever, instead of counting other users, and using the resulting count as a multiplier for the packet transmission interval, the client counts BYE packets, and uses the number of BYE packets received thus far as the multiplier for the interval. The operation of the algorithm is as fol- lows. At some time tl, the user decides to leave the session. The application first checks to see if it has ever sent an RTCP packet. If it has not, the application must not send a BYE packet. Instead, it should leave the session silently. Without having sent an RTCP packet, the BYE packet provides no useful information. Next, the application checks to see if the group size is less than some threshold, Bt (a value of 50 seems rea- sonable). If it is, the application may send a BYE packet immediately, and then leave the session. For small groups, BYE packets are of signif- icant value (for loose session management), and sending them immediately is quite important. Since the group contains only a small number of par- ticipants, the flood of packets is limited. It is possible that a user has a lowball group size estimate if they leave a session quickly after joining it. If this session is large, and there are many users coming and going fairly quickly (typical of a chan- nel surfer), it might appear that this can cause a steady flow of BYE packets. However, if these clients implement the forward reconsideration algorithms, they generally will have never sent an RTCP packet. This is because the new users' RTCP packet transmission will be constantly reconsidered, as the new user will be receiving RTCP packets at a steady rate from other users already in the group. This constant reception of packets will cause the new user to see a steady growth in the group size, causing its own RTCP packet transmission to be pushed into the future. Since a user who never sends an RTCP packet cannot send a BYE packet, this will generally cause these channel surfers to neither send RTCP SDES or RR information, nor a BYE packet. If the user has sent an RTCP packet previously, and the group size exceeds Bt, the application computes a time interval T as: T = R(1/2) max(T_min, n_l * C) Where Tmin is 2.5 seconds, C=avgpktsz/(bw*.05), R(1/2) is a random vari- able uniformly distributed between 1/2 and 3/2, avgpktsz is the average size of all BYE packets received thus far, and bw is the session band- width. The average packet size is computed using the same exponential weighted average filter used to compute the average RTCP packet size in the current specification. The value is updated through the filter every time a BYE packet is received or transmitted (not when it is reconsid- ered). J. Rosenberg, H. Schulzrinne [Page 3] Internet Draft RTP Scalability 2 November 13, 1997 The user then schedules the BYE packet to be sent at time tl+T. Between tl and this time, the user increments nl for each BYE packet that is received. In this fashion, nl counts the number of BYE's from other users since deciding to leave the session. When this time arrives, the user recomputes T according to the previous equation. If tl+T is less than the current time, the BYE packet may be sent. If tl+T is more than the current time, the BYE packet transmission is rescheduled for time tl+T. At that time, the computation and compari- son are repeated. All along, nl is incremented for each BYE packet received. A BYE packet which is from an SSRC which already sent a BYE (a dupli- cate) is ignored. Furthermore, the application should not increment nl if it receives a BYE from a user which has never sent an RTCP packet. Under normal situations, an application should never send a duplicate BYE packet, or send a BYE if an RTCP packet was never sent. However, a malicious user may send many BYE packets. If this check were not made, these BYE's would cause the variable nl to increase, and effectively prevent any other user from sending a BYE. If an application wishes to terminate before it can send a BYE RTCP packet according to these rules, it must not send a BYE packet. Instead, it should terminate silently. The BYE reconsideration algorithm will effectively re-allocate the bandwidth from users who leave without BYE's to those who wait around to send a BYE. The effect of this algorithm is to restrict the BYE packet transmission rate to at most an additional 10% of the session bandwidth (assuming a very large simultaneous leave). At the same time, if only a few users are leaving the group (even for a large group), they will get to send their BYE packets in a timely fashion. This meets the design objectives described in the beginning of the section. We ran numerous simulations to verify the performance of the algorithm. Even with as many as 10,000 users simultaneously leaving the session, the BYE reconsideration algorithm maintained the BYE transmission rate at 10%. This is demonstrated in Figure 1, which depicts the cumulative number of RTCP packets (BYE and others) send to the multicast group over time. At time 10,000, almost all of the users leave the group. The top line depicts the performance without BYE reconsideration, where some 10,000 BYE packets are sent all at once. The lower curve shows perfor- mance for BYE reconsideration. Note how there is only a small increase in packet transmission rates. [Figure available in Postscript version only] Figure 1: BYE Reconsideration Performance J. Rosenberg, H. Schulzrinne [Page 4] Internet Draft RTP Scalability 2 November 13, 1997 4 Premature Timeouts We have observed a secondary effect when many users simultaneously leave a group. There are many applications where not all of the users will leave - some will stick around. An example is a distance learn- ing application. There are perhaps several hundred students in the class. When the class ends, most of the students leave at about the same time. However, some stay behind to talk with the professor. We have observed that rapid leaves can cause the remaining users to time each other out. It can take a significant amount of time for the users to return. This implies that each user will not see the other users, which is undesirable for the post-class discussion scenario just mentioned. 4.1 Quantifying the Problem The difficulty is related to the way timeouts are handled. In the current specification [1], a user is timed out if they have not sent an RTP or RTCP packet within the last five RTCP intervals. In dynamic groups, the interval itself is dynamic. As many users leave a group, their BYE packets cause the group size estimate to rapidly decrease. This, in turn, decreases the timeout interval. If the number of users who leave the group is sufficiently large, the timeout interval may decrease so much that the remaining users will time out. For example, consider a group of 505 users. If the total RTCP inter- val is to be limited to 1 packet per second, each user sends RTCP packets once every 505 seconds (on average). Assume user 1 last sent an RTCP packet at time 0. The user schedules the next RTCP packet for time 505. At time 490, 500 of the 505 members (not including user 1) leave the group, and send BYE packets (assume for the moment that there is no BYE flood prevention algorithm). Shortly thereafter (say time 500) the BYE packets have been received, and the remaining 5 users perceive the group size to be 5. Based on this, the timeout interval is 25 seconds. Any user who has not sent a packet since time 475 will therefore be timed out. User 1 last sent a packet at time 0, so they are timed out. In fact, odds are good that most of the remaining 5 users sent their last packet before time 475. Thus, every user will time out all of the other users. Furthermore, it may take a long time for those users to come back. Consider user 2, who did not leave the group, and who was unfortunate enough to have last sent an RTCP packet at time 450. They then scheduled their next RTCP packet for time 955 (since there were still 505 users at the time). After the exodus at time 500, user 2 will remain, but will not send the next RTCP packet until time 955 (unless the user sends data, in which case they will be known via the RTP packet). J. Rosenberg, H. Schulzrinne [Page 5] Internet Draft RTP Scalability 2 November 13, 1997 The first question to ask is whether BYE flood prevention helps alle- viate this problem. Since the algorithm is designed to reduce the flood of BYE packets, the group size cannot drop so rapidly. This does help, of course, but not completely. With network delays, there still can be spikes in BYE packets. It does not require many BYE packets for this phenomenon to surface; it only requires that the ratio of users left after the leave to the number before the leave be less than around 1/5. This can occur in both small and large groups alike. Even assuming the rate of BYE packets is restricted to some constant factor, the efficacy of the timeout algorithm is reduced. To show this, we computed the number of packets sent by a user, on average, during the timeout interval, as the group size decreases linearly from some initial value Ns to some final value Nf. We denote the slope of the decrease by r users per second. We also define C as the multiplier to convert from group size to interval (so that if there are N group members, each member sends RTCP packets every CN seconds, on average), and M as the factor of 5 timeout multiplier. If r times C is much smaller than 1, and less than (1/CM)(Ns/Nf - 1)the number of packets sent during the timeout window, is on average: N_p = (-1)/(ln(1 - rC)) ln(1 + rCM) and for rC larger than 1, but less than (1/cM)(Ns/Nf-1): N_p <= (M)/(1 + rCM) In the limit as r goes to zero (that is, nobody leaving the group), the number of packets is M (according to the first equation), as expected. However, this quantity decreases rapidly as the slope, r, increases relative to 1/C. With the BYE prevention algorithm, the flood of packets can be shown to be bounded at an average rate of 2/C in the absence of network delays. With network delays, there can be a spike of packets, but following this spike, the BYE packet rate will also settle to 2/C initially, gradually decreaseing to 1/C. Plugging in r=2/C into the second equation above, the number of packets sent is: N_p <= 5/11 This means that each sender will send only half a packet during the timeout window, on average. In reality, this means that half of the users remaining will send 1, and the other half will not. Therefore, many users will still timeout. Those users which do manage to send a packet may still timeout if the packet is lost. Note that both equations above rely on many users leaving the group J. Rosenberg, H. Schulzrinne [Page 6] Internet Draft RTP Scalability 2 November 13, 1997 at the same time (the constraint that r<(1/MC)(Nf/Ns-1)).If the num- ber who leave is small relative to the slope of the leave (r>(1/MC) (Nf/Ns-1)),the effects are less severe. 4.2 Reverse Reconsideration One of the major factors contributing to the premature timeout effect is the delay between when the group size decreases, and when users begin to send packets using the resulting smaller interval. In the example in the previous section, user 2 sent an RTCP packet at time 450, and scheduled the next for time 955. After the exodus at time 490, user 2 knows that the group size has dropped - but does nothing. The user instead waits until time 955, sends the packet, and then computes the next send time. Since the group size is now 5, user 2 will schedule the next packet 5 seconds later, on average. There is thus a 500 second delay between the exodus and when user 2 gets around to scheduling a packet using the new, smaller interval. To resolve this problem, we propose an algorithm called reverse reconsideration. The idea is simple. If the group membership decreases, each user reschedules their next packet immediately. The packet is rescheduled for a time earlier than previously. The amount earlier is made to depend on how much the group size has decreased. More specifically, assume that the last time a user sent an RTCP report is tp. The next report is scheduled for time tn, and tc is the current time. Before the arrival of a BYE packet at the current time tc, there were np users. There are now nc users. Before the BYE, RTCP packet transmissions should be uniformly scheduled over time. That means that there should have been nc packet transmissions scheduled between tc and tc+C*nc. Now, however, the group size has decreased to np. This should cause there to be np packet transmissions scheduled between tcandC*np. To accomplish this, every user should compress the interval between the current time and their next packet transmission by nc/np.This implies that the next packet transmission should be rescheduled for time tn: t_n = t_c + (n_c/n_p)(t_n - t_c) This new value for tn has two key properties: 1. The new time is always earlier than the previous time. 2. The new time can never be before the current time. The second property is key; it guarantees that there will not be a spike of packets transmitted due to a sharp decrease in group size. J. Rosenberg, H. Schulzrinne [Page 7] Internet Draft RTP Scalability 2 November 13, 1997 On the surface, it would seem that an alternate algorithm might be a direct application of regular (or forward) reconsideration. Such an implementation might work as follows. At tc, when the BYE arrives, the user recomputes the transmission interval T, based on the new group size nc. This interval is then added to the previous packet transmission time tp. If the result is a time before the current time, the packet is sent, else it is rescheduled for tp+T. This algorithm does not work. Even a moderate decrease in the group size would cause many users to send their RTCP packets immediately, causing an additional spike. This is because this version of the algorithm does not maintain property 2 - the new transmission time can be before the current time. There is one additional aspect to the reverse reconsideration algorithm that must be considered - how it interacts with forward reconsideration. Consider the following example. There are 100 users in a group. The con- stant C is equal to 1 packet per second. At time 0, user A sends an RTCP packet, and schedules the next for time 100. At time 50, 50 users leave the group. User A executes the reverse reconsideration algorithm, and reschedules their packet for time 50 + (50/100)(100 - 50) = 75. At time 60, one more user joins the group. At time 75, user A executes the for- ward reconsideration algorithm (we assume conditional reconsideration). Since the group size has increased (51 vs. 50), user A recomputes the interval - which is now 51 seconds on average. This is then added to the previous transmission time, which is time 0. The result is t=51, signif- icantly earlier than the current time t=75. This will cause the user (and in fact, all other users) to send their packets immediately, even if the group size further increases. The problem is that while we have adjusted the next packet transmission time, tn, with reverse reconsider- ation, we have not adjusted the value of the previous packet transmis- sion time, tp. This quantity is used for forward reconsideration, and must be adjusted as well in order to maintain consistency. The adjustment algorithm is simple. The value for tp is updated when tn is updated by reverse reconsideration. Its value is adjusted to: t_p = t_c - (n_c/n_p)(t_c - t_p) The nature of this adjustment is the same as for tn. In the previous example, it would have caused tp to be adjusted from 0 to (50 - (50/100)(50 - 0)) = 25. At time 75, when the user is performing the for- ward reconsideration algorithm, the interval T=51 is added to tp=25, yielding t=76, slightly ahead of the current time t=75, as expected (since the group has only increased by one member). 4.3 Performance What is the improvement due to reverse reconsideration? We have been able to prove that the number of packets sent during the timeout J. Rosenberg, H. Schulzrinne [Page 8] Internet Draft RTP Scalability 2 November 13, 1997 window has an achievable lower bound of: N_p = (1 / rC) * ln(1 + rCM) This time, independent of the relative value of rC. Again, this holds only when the actual number of users who leave is a significant frac- tion of the current group size (Nf/Ns<1/(1+MCr)).Assuming that BYE reconsideration is being used, the BYE rate is limited to around r=2/C when network delays are small. Plugging this in: N_p = 1/2 ln 11 = 1.19 This means that a user will send 1.19 packets on average, during the timeout window. This is to be compared to the situation before reverse reconsideration, where Np=5/11. Therefore, reverse reconsid- eration affords a factor of three improvement in performance. Now, no users will timeout under normal circumstances (on average). However, a single packet loss may cause a user to timeout prematurely. 5 Conclusion We presented two additional problems with RTP scalability - BYE floods and premature timeouts. BYE floods are caused when many users simultaneously leave a group. To fix the problem, we presented an algorithm called BYE reconsideration, which works much like forward reconsideration, but for BYE packets. The second problem, premature timeouts, is a secondary effect, but still problematic. To help resolve it, we presented an algorithm called reverse reconsideration, which shows a threefold factor of improvement. Both algorithms are extremely simple, requiring no additional memory beyond forward reconsideration, and requiring a single O(1) computation upon recep- tion of a BYE packet. 6 Bibliography [1] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, RTP: a transport protocol for real-time applications, Tech. Rep. RFC 1889, Internet Engineering Task Force, Jan. 1996. [2] B. Aboba, Alternatives for enhancing RTCP scalability, Internet Draft, Internet Engineering Task Force, Jan. 1997. Work in progress. [3] M. Handley, SAP: Session announcement protocol, Internet Draft, Internet Engineering Task Force, Nov. 1996. Work in progress. [4] J. Rosenberg and H. Schulzrinne, Timer reconsideration for enhanced RTP scalability, Internet Draft, Internet Engineering Task Force, July 1997. Work in progress. J. Rosenberg, H. Schulzrinne [Page 9] Internet Draft RTP Scalability 2 November 13, 1997 [5] J. Rosenberg and H. Schulzrinne, Timer reconsideration for enhanced rtp scalability, in To appear in Proceedings of IEEE Info- com '98 , 1998. [6] M. Handley and V. Jacobson, SDP: Session description protocol, Internet Draft, Internet Engineering Task Force, Mar. 1997. Work in progress. 7 Authors' Addresses Jonathan Rosenberg Bell Laboratories, Lucent Technologies 101 Crawfords Corner Rd. Holmdel, NJ 07733 Rm. 4C-526 email: jdrosen@bell-labs.com Henning Schulzrinne Columbia University M/S 0401 1214 Amsterdam Ave. New York, NY 10027-7003 email: schulzrinne@cs.columbia.edu J. Rosenberg, H. Schulzrinne [Page 10]