Monday, March 12, 2012
Voice over TCP
In the last post, I talked about SIP and TCP, and how it would have been a great idea to use TCP as the default transport layer for SIP. When it comes to media, however, TCP has some disadvantages.
The overall problem is delay. Voice is an application where you want as little delay as possible. The only other application I can think of with the same requirement is online shooter games, where you literally need low latency to survive against your online enemies. If you are behind a slow DSL line, you’ll probably get shot down before you can even see what is going on.
With voice you won’t get shot down, but the conversation suffers when the delay is too long. The first obvious problem is that both parties start talking at the same time, which makes the conversation very unnatural. The other big problem is that there is still some small echo coming back, mostly from the handset cord on the other side. Even if it is very quiet, with a long delay you can hear it, and it feels uncomfortable. So with voice, you want as little delay as possible. A delay of 40 ms is great; 80 ms is already getting to the limit.
The problem with TCP is that when a packet gets lost, the TCP stack has to retransmit it, and that might take a long time. Because TCP delivers data strictly in order, every packet sent after the lost one is held back until the retransmission arrives. By the time the packet finally shows up on the other side, the audio buffer has already underrun, and it would have been better to just drop that packet and play the next one immediately. That is why it is better to use UDP for audio.
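To make this a bit more concrete, here is a minimal sketch of the effect in Python. It is a toy model, not a real network stack: the 20 ms packet interval, the 30 ms network delay, the 60 ms playout buffer and the 200 ms retransmission time are all numbers I made up for illustration, and `arrival_times` and `playable` are hypothetical helper names. The receiver simply throws away every packet that arrives after its playout deadline.

```python
FRAME_MS = 20          # one audio packet every 20 ms (assumed)
NETWORK_DELAY_MS = 30  # one-way network delay (assumed)
PLAYOUT_DELAY_MS = 60  # packet n is played at n * 20 + 60 ms (assumed)
RETRANSMIT_MS = 200    # extra time until a lost packet arrives again (assumed)


def arrival_times(num_packets, lost, over_tcp):
    """Return the arrival time in ms per packet, or None if it never arrives."""
    arrivals = []
    blocked_until = 0  # TCP delivers in order: nothing overtakes a missing packet
    for seq in range(num_packets):
        arrival = seq * FRAME_MS + NETWORK_DELAY_MS
        if seq in lost:
            if not over_tcp:
                arrivals.append(None)  # UDP: the packet is simply gone
                continue
            arrival += RETRANSMIT_MS   # TCP: wait for the retransmission
        if over_tcp:
            # Later packets are held back behind the retransmitted one.
            arrival = max(arrival, blocked_until)
            blocked_until = arrival
        arrivals.append(arrival)
    return arrivals


def playable(arrivals):
    """Return the sequence numbers that arrive before their playout deadline."""
    on_time = []
    for seq, arrival in enumerate(arrivals):
        deadline = seq * FRAME_MS + PLAYOUT_DELAY_MS
        if arrival is not None and arrival <= deadline:
            on_time.append(seq)
    return on_time


if __name__ == "__main__":
    print("UDP:", playable(arrival_times(15, {2}, over_tcp=False)))
    print("TCP:", playable(arrival_times(15, {2}, over_tcp=True)))
```

With these made-up numbers, losing packet 2 over UDP costs exactly one packet, while over TCP packets 2 through 10 all miss their deadline because they are stuck behind the retransmission. A bigger playout buffer would hide that, but only at the price of more delay for the whole call.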
Video might actually be a different story. Packet loss is a much bigger problem for video than for audio. Because video is so highly compressed, a lost packet really screws up the screen. For video it would make sense to use TCP as the transport layer, even when there is a risk of packet loss.
All that said, there is still an increasing demand to use TCP for voice as well. The problem is that in many environments, the firewall blocks UDP traffic. This is either because the firewall is not able to handle too many UDP sessions, or because the firewall is actually meant to block voice traffic (for example in a hotel). Then TCP might still be a possibility. The price is an increased average delay, but if you have the choice between a conversation with a long delay and no conversation at all, you might choose the long delay.

The support for that is still weak (so far only in Microsoft Lync environments, as far as I know). However, as more and more environments support it, your m9 might eventually pick TCP as the transport layer for voice.
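One practical detail when voice goes over TCP: TCP is a byte stream, so the receiver has to know where one audio packet ends and the next one begins. A common way to do that is to put a 2-byte length in network byte order in front of each packet, which is the framing RFC 4571 describes for RTP over connection-oriented transports. The sketch below only shows that idea; `frame` and `deframe` are hypothetical helper names, and I am not claiming that this is what any particular product does on the wire.

```python
import struct


def frame(packet: bytes) -> bytes:
    """Prefix a packet with its length as a 16-bit big-endian integer."""
    if len(packet) > 0xFFFF:
        raise ValueError("packet too large for a 16-bit length prefix")
    return struct.pack("!H", len(packet)) + packet


def deframe(buffer: bytes):
    """Split received bytes into (complete_packets, leftover_bytes)."""
    packets = []
    offset = 0
    while len(buffer) - offset >= 2:
        (length,) = struct.unpack_from("!H", buffer, offset)
        if len(buffer) - offset - 2 < length:
            break  # the rest of this packet has not arrived yet
        start = offset + 2
        packets.append(buffer[start:start + length])
        offset = start + length
    return packets, buffer[offset:]
```

The receiver keeps the leftover bytes and prepends them to whatever it reads from the socket next, because TCP is free to cut the stream anywhere, including in the middle of a length prefix.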