Synchronizing the Rollover Counter in SRTP Multiparty Sessions

David McGrew
March 3, 2006

1. Introduction

Like many other security protocols, the Secure RTP protocol (SRTP, RFC 3711) includes a sequence number in its authentication coverage to provide anti-replay protection. It also uses this value to form a unique nonce for counter mode encryption. Since the RTP sequence number is only 16 bits long, SRTP uses a larger 'implicit' sequence number, which it calls a packet index. The number of times that the RTP sequence number rolls over (from 65,535 back to zero) is tracked by a field called the rollover counter (ROC), which is used to form the packet index (index = 2^16 * ROC + SEQ, where SEQ is the RTP sequence number). The ROC is not explicitly included in the packets, in order to conserve bandwidth.

SRTP uses a simple algorithm that enables an SRTP receiver to estimate the correct value of the ROC, even in the face of bursts of packet loss as large as 32,767 packets. At a packet rate of 50 per second (typical for conversational voice), such a burst would last nearly eleven minutes. Thus, in a point-to-point SRTP session, it is easy for both participants to maintain the correct value of the ROC. Similarly, in a point-to-multipoint SRTP session in which all participants join at the outset, ROC synchronization is simple. However, RTP supports scenarios in which receivers can join a session that is already in progress. In that case, the joining receiver does not know the current ROC, and the SRTP ROC estimation algorithm, which only tracks changes relative to a known starting value, cannot by itself establish the correct ROC.

When a new receiver joins an SRTP session, the SRTP master keys for the SRTP source(s) in that session need to be provided to that participant. The keys can be provided by a signaling protocol (with suitable protections applied either to the signaling protocol or to the keys themselves). Alternatively, a dedicated group key management controller can provide them. The ROC for the source(s) can be provided along with the keys, if the group controller or signaling system knows the ROC values.
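The ROC estimation algorithm mentioned above is specified in RFC 3711 (Section 3.3.1). The following minimal Python sketch transcribes it so that the role of the ROC is concrete. The name estimate_index is hypothetical, s_l is the highest sequence number received so far, and the state updates that a real receiver performs after authenticating the packet are omitted.

```python
from typing import Tuple

# Sizes from RFC 3711: 16-bit RTP sequence numbers and a 32-bit ROC.
SEQ_SPACE = 1 << 16
HALF_SEQ = 1 << 15
ROC_SPACE = 1 << 32


def estimate_index(seq: int, s_l: int, roc: int) -> Tuple[int, int]:
    """Return (v, index) for a packet with RTP sequence number seq.

    s_l is the highest sequence number received so far, roc is the
    receiver's current ROC, and v is the guessed ROC for this packet.
    """
    if s_l < HALF_SEQ:
        # A seq far above s_l is most likely a late packet from before the last rollover.
        v = (roc - 1) % ROC_SPACE if seq - s_l > HALF_SEQ else roc
    else:
        # A seq far below s_l is most likely one of the first packets after a rollover.
        v = (roc + 1) % ROC_SPACE if s_l - HALF_SEQ > seq else roc
    # The packet index is 2^16 * ROC + SEQ.
    return v, seq + v * SEQ_SPACE


print(estimate_index(10, 65530, 3))  # (4, 262154): the sequence number wrapped
print(estimate_index(200, 100, 7))   # (7, 458952): no rollover
```

Every estimate is relative to the receiver's current roc. A receiver that joins a session in progress has no correct roc to start from, so its estimates are off by however many rollovers it missed, and its packets fail authentication. Supplying that starting value is what providing the ROC along with the keys accomplishes.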
If there is a single SRTP source, then providing the ROC along with the keys may be practical. However, in some situations, it may be difficult for the controller or signaling system to know the ROC of each source in the session: the ROC values themselves are dynamic, and new SRTP sources may join the session. This motivates the consideration of other methods for synchronizing ROC values in multi-party SRTP sessions.

In the remainder of this document, we consider the case in which a participant joins an SRTP session that is already in progress. We assume that the participant knows the SRTP master key(s) of the sources, but not the ROC values of those sources.

2. Heuristic Methods

One useful approach is to have the receiver apply heuristics to estimate the ROC and then rely on the authentication check to distinguish bad guesses from good ones. The great benefit of this approach is interoperability; it does not require any changes to SRTP senders. These methods are only practical when message authentication is used with SRTP, since otherwise the SRTP receiver has no reliable way to identify bad ROC estimates.

2.1 ROC Iteration

One simple method for estimating a ROC is to start with an estimate of zero and increment it after each bad guess. In order to use this method without degrading performance or security, the authentication function must be applied only once per packet. In other words, until the correct ROC is found, each packet undergoes a single authentication check with the current ROC estimate, and if the check fails, the ROC estimate is incremented. This method is simple and readily implementable.

When ROC iteration is used, the packet rate of the source determines the amount of time before ROC synchronization is achieved. For example, at 50 packets per second, a receiver tests 50 ROC values per second; since each ROC value spans 65,536 packets (about 22 minutes of traffic at that rate), one second of iteration catches up with roughly eighteen hours' worth of rollovers. In many scenarios, the slight delay associated with the catch-up is acceptable.
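A minimal Python sketch of ROC iteration follows. It assumes the SRTP default HMAC-SHA1 authentication with an 80-bit tag, in which the tag is computed over the authenticated portion of the packet followed by the 32-bit ROC, as in RFC 3711. The names srtp_tag and RocIterator are hypothetical, the session authentication key is assumed to have already been derived from the master key, and plain byte strings stand in for real SRTP packets.

```python
import hashlib
import hmac

TAG_LEN = 10         # 80-bit tag, as in the SRTP default HMAC-SHA1 transform
ROC_SPACE = 1 << 32  # the ROC is a 32-bit counter


def srtp_tag(auth_key: bytes, authenticated_portion: bytes, roc: int) -> bytes:
    """HMAC-SHA1 over the authenticated portion followed by the ROC (big-endian)."""
    message = authenticated_portion + roc.to_bytes(4, "big")
    return hmac.new(auth_key, message, hashlib.sha1).digest()[:TAG_LEN]


class RocIterator:
    """ROC iteration for a late joiner: exactly one authentication check per packet."""

    def __init__(self, auth_key: bytes, start_roc: int = 0) -> None:
        self.auth_key = auth_key
        # start_roc may be a lower bound supplied by an external agency.
        self.roc_estimate = start_roc % ROC_SPACE
        self.synchronized = False

    def receive(self, authenticated_portion: bytes, tag: bytes) -> bool:
        """Return True if the packet authenticates under the current estimate."""
        expected = srtp_tag(self.auth_key, authenticated_portion, self.roc_estimate)
        if hmac.compare_digest(expected, tag):
            self.synchronized = True
            return True
        if not self.synchronized:
            # Bad guess: drop this packet and try the next ROC value on the next packet.
            self.roc_estimate = (self.roc_estimate + 1) % ROC_SPACE
        return False


key = bytes(20)  # placeholder session authentication key
rx = RocIterator(key)
for n in range(8):
    packet = b"packet %d" % n
    rx.receive(packet, srtp_tag(key, packet, 5))  # the sender's ROC is 5
print(rx.roc_estimate, rx.synchronized)  # 5 True
```

The start_roc parameter is where a lower bound on the ROC, if one is available, would seed the search; starting above the true value would never succeed, because the estimate only increases. Once synchronized, a real receiver would hand over to the normal RFC 3711 estimation to follow later rollovers; this sketch omits that step.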
Considering its simplicity and interoperability, this method is preferable whenever authentication is present.

When an SRTP receiver applies multiple authentication checks to a single packet, each with a distinct ROC estimate, security degrades, because an adversary's chance of perpetrating a forgery increases (almost) proportionally with the number of ROC estimates that are tried per packet. Roughly speaking, if N estimates are tried against each packet, then the effective size of the authentication tag is reduced by lg(N) bits, where lg() is the logarithm base two. In some cases, it may be acceptable to forgo a few bits of authentication security, especially when the authentication tag is large. For example, trying multiple ROC values against each packet with a 96-bit tag would still provide better security than the default 80-bit tag, as long as fewer than 65,536 (2^16) different ROC values are tried against each received packet, since 96 - lg(N) > 80 exactly when N < 2^16.

2.2 Using the Session Time

In some systems, the number of packets sent by an SRTP source can be estimated from the rate at which packets are sent. Many codecs have characteristic transmission rates; given these rates and the time that a session has been in existence, it is easy to derive an upper bound on the ROC of each sender in the session. It is harder to derive a lower bound, especially for sessions with many senders. Unfortunately, it is the lower bound that is more important, since it can be used as the starting point for the ROC iteration method, which only ever increments its estimate.

Information about the session, such as its starting time and the codecs and parameters used in it, is not, in general, available to an SRTP implementation. For this reason, SRTP implementations should not be expected to use the session time to estimate the ROC. If an external agency (e.g.
a secure group controller or a signaling system) has this information available to it, then it should use this information to form a lower bound on the ROC and provide the SRTP implementations with this value.

3. Explicit Transport

If heuristic methods are not possible, then the ROC must be conveyed to the receiver. Two in-band transport methods are available: SRTP and SRTCP. The latter is the control protocol, which provides "minimal control", monitoring, and identification functionality for the session, in a scalable manner. SRTCP packets are sent at regular (but possibly long) intervals from each participant in the session; SRTP sources send control packets more frequently than receivers. In a two-party voice session, SRTCP sends packets approximately once a second.

The simplest way to achieve ROC synchronization is to include the entire ROC value in each SRTP packet. However, bandwidth-sensitive applications would find this unappealing, since it adds a lot of redundancy: the 32-bit ROC changes only once every 65,536 packets, yet it would be repeated in every one of them. For such applications, a better alternative is to include the ROC in the SRTCP packets, as suggested by McGrew, Andreasen, and Dondeti [EKT]. This alternative is consistent with the function of RTCP to "convey minimal session control information". In multi-party sessions, RTCP is relied on to provide information such as the canonical name associated with a source, and synchronization between sources. When RTCP is relied upon for these facilities, no loss in functionality is introduced by also relying on it to transport the ROC values of the SRTP sources. (Note that it is unnecessary to transport the ROC values of participants that are not sources.)

3.1 Bandwidth Minimization with only Unauthenticated SRTP

Some RTP implementations lack RTCP, and thus could only use SRTP to transport the ROC. Bandwidth-constrained applications may want to consider alternatives to conveying the entire ROC in each packet.
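To put the cost of carrying the entire ROC in each packet in perspective, here is a back-of-the-envelope comparison with carrying it in SRTCP. It counts only the four ROC bytes for a single source and ignores header overhead and any other fields a real scheme (such as EKT) would add. The rate of 50 packets per second and the roughly once-per-second SRTCP interval are the example figures used in this document; the function names are illustrative.

```python
# The ROC is a 32-bit counter, so carrying it in full costs 32 bits each time.
ROC_BITS = 32


def in_band_srtp_overhead_bps(packets_per_second: float) -> float:
    """Bits per second spent when every SRTP packet carries the full ROC."""
    return ROC_BITS * packets_per_second


def srtcp_overhead_bps(rtcp_interval_s: float) -> float:
    """Bits per second spent when the ROC rides in one SRTCP packet per interval."""
    return ROC_BITS / rtcp_interval_s


print(in_band_srtp_overhead_bps(50.0))  # 1600.0
print(srtcp_overhead_bps(1.0))          # 32.0
```

Under these assumptions, per-packet transport costs fifty times as much as SRTCP transport for the same information, which is why applications without RTCP look for cheaper ways to convey the ROC in SRTP.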
Lehtovirta, Naslund, and Norrman describe a method for passing the ROC in SRTP which addresses the extreme bandwidth-constrained case in which no authentication is provided on the SRTP session [RCC]. In their method, most packets contain neither an explicit ROC nor an authentication tag, but these values are periodically included. The RTP sequence number is used by both the sender and receiver to determine which packets will include these values. This method addresses the very specific case in which no authentication can be provided due to the need to conserve bandwidth. Because authentication is absent, none of the heuristic methods described above can be used.

One disadvantage of this method is that the rate at which the ROC is transmitted must be established in advance. Alternatively, the packet format could contain a flag bit that indicates the presence or absence of the ROC and authentication tag. This flag bit would enable the sender to choose adaptively the rate at which the ROC is transmitted.

3.2 Bandwidth Minimization with Authenticated SRTP

More alternatives are available when SRTP is authenticated. One simple way to achieve robustness while minimizing bandwidth overhead is to fragment the rollover counter into several distinct parts, include one of these parts in each SRTP packet, and cycle periodically through all of the fragments. A simple use of this method would be to send byte 0 of the ROC in the packet with RTP sequence number 0, byte 1 in packet 1, and so on, with byte (i mod 4) going out in the packet with RTP sequence number i. Using R[i] to denote the value of the ith byte of the ROC, successive packets would contain the values

   R[0], R[1], R[2], R[3], R[0], R[1], R[2], R[3], R[0], ...

With this simple byte-cycling method, a joiner typically needs to wait no more than four packets before knowing the entire ROC. This method can be combined with heuristic methods to make it even more effective.
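The byte-cycling scheme, and its combination with a heuristic method, can be sketched in Python as follows. This is an illustration under stated assumptions, not a specified wire format: byte 0 is taken to be the least significant byte of the ROC, packets are assumed to arrive in order (losses are tolerated), and the names roc_byte_for_packet, ByteCyclingJoiner, and candidates are hypothetical. Because 65,536 is a multiple of four, a rollover always starts a new cycle at byte 0; the sketch simply discards bytes gathered before a wrap, so a joiner that arrives just before a rollover may need a few extra packets.

```python
from typing import Dict, Iterator, Optional


def roc_byte_for_packet(roc: int, seq: int) -> int:
    """Sender side: the ROC byte carried in the packet with RTP sequence number seq."""
    i = seq % 4                     # byte (i mod 4) goes in packet i
    return (roc >> (8 * i)) & 0xFF  # assumption: byte 0 is the least significant


class ByteCyclingJoiner:
    """Receiver side: reassemble the ROC from cycled bytes (in-order delivery assumed)."""

    def __init__(self) -> None:
        self.known: Dict[int, int] = {}  # byte position -> byte value
        self.last_seq: Optional[int] = None

    def receive(self, seq: int, roc_byte: int) -> Optional[int]:
        """Record one byte; return the full ROC once all four bytes are known."""
        if self.last_seq is not None and seq < self.last_seq:
            # The sequence number wrapped, so the sender's ROC was incremented;
            # bytes collected before the wrap belong to the old ROC value.
            self.known.clear()
        self.last_seq = seq
        self.known[seq % 4] = roc_byte
        if len(self.known) == 4:
            return sum(b << (8 * i) for i, b in self.known.items())
        return None


def candidates(known: Dict[int, int]) -> Iterator[int]:
    """Yield, lowest first, every 32-bit ROC consistent with the known bytes."""
    base = sum(b << (8 * i) for i, b in known.items())
    unknown = [i for i in range(4) if i not in known]
    for c in range(256 ** len(unknown)):
        value = base
        for j, pos in enumerate(unknown):
            # Spread the j-th byte of the counter c into unknown byte position pos.
            value |= ((c >> (8 * j)) & 0xFF) << (8 * pos)
        yield value


rx = ByteCyclingJoiner()
for seq in range(100, 104):
    roc = rx.receive(seq, roc_byte_for_packet(0x01020304, seq))
print(hex(roc))  # 0x1020304
```

If the packet carrying byte 0 is lost, candidates({1: 0x03, 2: 0x02, 3: 0x01}) yields the 256 values 0x01020300 through 0x010203FF in increasing order; trying one of them per packet is one way to estimate the unknown bytes by ROC iteration.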
When byte cycling is combined with a heuristic method, the iterative method of Section 2.1 can be used to estimate whichever bytes of the ROC are still unknown. The expected time to convergence can be reduced by mapping the bytes of the ROC to sequence numbers in a way that sends the least significant byte of the ROC more frequently than the others.

This method has the benefit that the packet expansion is identical for all packets. However, it can only be used when each SRTP packet is authenticated, in which case the value of bandwidth minimization is lower, since every packet already carries an authentication tag.

4. Commonalities between EKT and RCC

EKT and RCC both define new authentication functions, and both use these functions to convey the ROC. EKT also sends additional fields, which enable in-band secure key transport, and is based on SRTCP instead of SRTP. Since EKT could easily be extended so that it can be conveyed by SRTP as well as SRTCP, it is worthwhile to consider combining these two extensions. The two largest problems with such a merger are that EKT is less bandwidth-efficient than RCC, and that in EKT it is desirable to allow the SRTP sender to choose the transmission rate adaptively, as described in the last paragraph of Section 3.1, whereas in RCC that rate is fixed in advance. Both of these problems could be overcome, though perhaps at some cost in complexity.

5. Conclusions

We conclude with some recommendations.

SRTP implementations SHOULD be able to accept ROC estimates from trusted external sources, such as a signaling system.

SRTP implementations MAY implement the ROC iteration method of Section 2.1.

Signaling systems SHOULD provide SRTP implementations with ROC estimates that are known to be less than or equal to the current ROC. This requirement ensures that the ROC estimates are suitable for use with the ROC iteration method of Section 2.1.

It may be worthwhile to consider combining EKT and RCC into a single method, but this question deserves more study.

6. References

[RCC] Lehtovirta, Naslund, and Norrman, "Integrity Transform Carrying Roll-over Counter", draft-lehtovirta-srtp-rcc-01.txt, February 2006.

[EKT] McGrew, Andreasen, and Dondeti, "Encrypted Key Transport for Secure RTP", draft-mcgrew-srtp-ekt-00.txt, February 2006.