Friday, September 9, 2011
What's the story about SRTP?
I am still surprised how little VoIP traffic is encrypted in the year 2011. While it has become quite normal to use HTTPS to protect web traffic, most phone calls still run unencrypted over the WAN or the LAN. In the old days, the national intelligence organizations had to work hard to get access to a specific phone line; today they just have to bid as a second-tier service provider for the cheapest route to a specific destination. They could even advertise it like “NSA: we have the cheapest routes to Afghanistan”.
SRTP makes sure that someone who sees the RTP traffic just gets pretty random noise that does not allow any decoding of what is actually said on the phone. What a wire tapper will still see is from where and to where the traffic flows, and it is also possible to see the RTP headers, including the sequence numbers. Someone who can actually modify the SRTP stream can also drop packets; however, when the payload or even one of the headers is changed, the checksum (the authentication tag) will no longer match and the receiver will drop the packet.
SRTP uses AES as the encryption algorithm. Usually people use AES-128, which gives you superb security. The way it works is that the AES algorithm calculates a pseudo-random number from the secret key and input taken from the RTP packet, essentially the sequence number and some “salt”. It is extremely difficult to calculate the input from the output. 64 bits already represent 18446744073709551616 possibilities, and 128 bits are 18446744073709551616 times 18446744073709551616 possibilities. You need to do a lot of number crunching to find the right input if you only know the output.
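To put the numbers above into perspective, here is a small back-of-the-envelope sketch in Python. The guessing rate of one trillion keys per second is purely an assumption of mine for illustration, not a measured figure, and the function name is made up.

```python
# Size of the key space for 64-bit and 128-bit keys.
keys_64 = 2 ** 64
keys_128 = 2 ** 128

# 128 bits really is "64 bits times 64 bits".
assert keys_128 == keys_64 * keys_64


def years_to_search(bits, guesses_per_second=10 ** 12):
    """Rough average brute-force time in years.

    On average an attacker finds the key after trying half of the key space.
    The default guessing rate is a hypothetical number, chosen generously.
    """
    seconds_per_year = 365 * 24 * 3600
    return 2 ** (bits - 1) / guesses_per_second / seconds_per_year


print(keys_64)
print(keys_128)
print(f"{years_to_search(128):.1e} years for 128 bits")
```

Even at that generous rate, the 128-bit search comes out at far more than a billion billion years, which is why nobody attacks AES-128 by trying keys.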
Some people are confused how just 128 bits can be so secure, especially because today usually 2048 or 4096 bits are used in private and public RSA keys. The point here is that the 2048-bit number must be the product of two prime numbers, so not every bit combination in the 2048 bits is possible, and there are a lot of algorithms in the game to calculate the two prime numbers. At the end of the day, RSA is used to negotiate the input for the SRTP algorithm for each call: while RSA uses public and private keys, SRTP uses keys that both parties know. And the SRTP key changes from phone call to phone call, whereas the RSA keys remain the same for a potentially long time, so some extra security is welcome here.
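The following toy sketch illustrates why the structure of an RSA number matters. The modulus here is tiny and the primes 61 and 53 are just example values I picked; trial division is the most naive factoring method and stands in for the much smarter algorithms used against real keys.

```python
def factor_semiprime(n):
    """Find the two factors of a toy RSA modulus by trial division.

    This only works for tiny numbers. It shows that an attacker searches
    for a prime factor, not for an arbitrary bit pattern.
    """
    if n % 2 == 0:
        return 2, n // 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 2
    raise ValueError("n has no non-trivial factor")


p, q = 61, 53          # hypothetical toy primes
n = p * q              # the public modulus, 3233
print(factor_semiprime(n))  # prints (53, 61)
```

With a symmetric AES key there is no such structure to exploit: every one of the 128-bit combinations is a valid key, so the attacker is left with plain guessing.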
One stupid thing with SRTP is that the receiver has to guess what the “rollover counter” looks like. The problem is that the sequence number is only 16 bits, and after 65536 packets you start over again. With a packet length of 20 ms this happens after about 22 minutes at the latest (65536 × 20 ms is roughly 1311 seconds). So one good test is to put the call on hold for 23 minutes, and then resume the call. If the other side has problems, this is probably because of the rollover counter. The m9 has some special logic built in to guess where the rollover counter is when you resume the call after such a long time. Actually there is also another SRTP variant called SSRTP which solves that problem, but unfortunately it is not backward compatible with SRTP. The snom m9 also supports SSRTP, but as far as I know only Microsoft OCS/Lync uses it.
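Here is a minimal sketch of the rollover arithmetic and of the guessing step, written along the lines of the pseudocode in the SRTP specification (RFC 3711). The function and variable names are mine, and this is not the special logic of the m9, only the standard guess that works as long as the receiver has not missed a whole wrap of the sequence number.

```python
SEQ_MOD = 1 << 16  # the RTP sequence number has 16 bits


def rollover_seconds(packet_ms=20):
    """Time until the 16-bit sequence number wraps around."""
    return SEQ_MOD * packet_ms / 1000


def estimate_index(seq, s_l, roc):
    """Guess the rollover counter for a received packet.

    seq: sequence number of the packet that just arrived
    s_l: highest sequence number seen so far
    roc: the receiver's current rollover counter
    Returns (guessed rollover counter, full packet index).
    """
    if s_l < 32768:
        # A very large forward jump is taken as a late packet from before the wrap.
        v = (roc - 1) % (1 << 32) if seq - s_l > 32768 else roc
    else:
        # A very large backward jump is taken as a wrap to the next cycle.
        v = (roc + 1) % (1 << 32) if s_l - 32768 > seq else roc
    return v, seq + v * SEQ_MOD


print(rollover_seconds() / 60)           # a bit under 22 minutes
print(estimate_index(5, 65530, 0))       # normal wrap: (1, 65541)
print(estimate_index(1010, 1000, 0))     # after a long hold: (0, 1010)
```

The last line shows the hold problem: if the sender has silently advanced by a full 65536 packets, the packet with sequence number 1010 really belongs to rollover counter 1, but the receiver has no way to tell and guesses 0.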
The other common problem was that some implementations don’t check the SRTP checksum (MAC). For example, some old firmware versions of the snom phones happily played everything back that looked like SRTP, and when there was something going wrong with the key negotiation or holding the call for too long, it would play thundering loud static noise (which is what you “hear” if you are tapping the wire). The m9 did the proper MAC checking from day one, so the thundering loud static noise should be no problem.
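The check itself is not much code. The sketch below assumes the usual SRTP default of HMAC-SHA1 truncated to 80 bits, computed over the packet plus the rollover counter, and it assumes there is no MKI field in the packet. The key, the packet bytes and the function names are made up for the example; this is not the snom firmware.

```python
import hashlib
import hmac
import struct

TAG_LEN = 10  # 80-bit authentication tag (assumed default)


def auth_tag(auth_key, authenticated_portion, roc):
    """HMAC-SHA1 over header + encrypted payload + rollover counter."""
    msg = authenticated_portion + struct.pack("!I", roc)
    return hmac.new(auth_key, msg, hashlib.sha1).digest()[:TAG_LEN]


def accept_packet(auth_key, packet, roc):
    """Return the packet without its tag, or None if it must be dropped."""
    if len(packet) <= TAG_LEN:
        return None
    body, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    if not hmac.compare_digest(tag, auth_tag(auth_key, body, roc)):
        return None  # drop silently instead of playing noise
    return body


key = b"k" * 20                       # hypothetical session auth key
body = b"\x80\x00\x00\x01" + b"encrypted-audio"
packet = body + auth_tag(key, body, roc=0)

print(accept_packet(key, packet, roc=0) == body)   # True
print(accept_packet(key, packet, roc=1))           # None
```

Note that a wrongly guessed rollover counter also makes the check fail, so a receiver that verifies the MAC stays silent after a long hold instead of blasting noise into the ear.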
As for encrypting traffic that goes through service providers, there are a couple of problems. Most service providers use some kind of session border controller (SBC) that will translate the SRTP stream into a regular RTP stream. In particular, the checksum will not make it through the session border controller, ending up in pretty much the thundering loud noise that I talked about above. The biggest problem though is to negotiate the SRTP keys with the end partner, because TLS might transport the keys to the session border controller, but the SBC will remove them and the other party will not receive them. Apart from that, the SBC will also be able to read the keys, so that at least the carrier can record the call. Standards like ZRTP try to solve the problem by negotiating the keys through special RTP packets, but again the problem is that the SBC might intercept that.
Anyway, when SRTP is being used and everything works okay, you should see a lock symbol on the snom m9 handset display. At least in the LAN or when connected to the corporate PBX that should be relatively easy and you can at least be sure that the internal calls remain private.
The encryption from the base station to the handset is another story.