ZRTP is an extension to the Real-time Transport Protocol (RTP) that describes a method of Diffie-Hellman key agreement for the Secure Real-time Transport Protocol (SRTP). It was submitted to the IETF by Phil Zimmermann, Jon Callas and Alan Johnston on March 5, 2006. The Session Initiation Protocol (SIP), one signaling protocol it can be used with, is a VoIP standard. A few VoIP programs support ZRTP, but it did not make its way into hardware products.
ZRTP is described in the Internet-Draft as a "key agreement protocol which performs Diffie-Hellman key exchange during call setup in-band in the Real-time Transport Protocol (RTP) media stream which has been established using some other signaling protocol such as Session Initiation Protocol (SIP). This generates a shared secret which is then used to generate keys and salt for a Secure RTP (SRTP) session." One of ZRTP's features is that it does not rely on SIP signaling for key management, or on any servers at all. It supports opportunistic encryption by automatically detecting whether the other VoIP client supports ZRTP.
The protocol does not require prior shared secrets and does not rely on a public key infrastructure (PKI) or on certification authorities. Instead, ephemeral Diffie-Hellman keys are generated at each session establishment, which avoids the complexity of creating and maintaining a trusted third party.
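The per-session key generation can be illustrated with a minimal Python sketch. It is not the ZRTP wire protocol: the prime `P`, the generator `G`, and the function names are assumptions chosen for readability, and the 127-bit prime is far too small for real use. The point is only that each call creates a fresh key pair and that both sides arrive at the same shared value without any certificate or third party.

```python
import secrets

# Toy parameters for illustration only (assumption): 2**127 - 1 is prime,
# but real deployments use far larger groups.
P = 2**127 - 1
G = 3


def new_ephemeral_keypair():
    """Generate a fresh private/public pair, meant to be used for one session."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_secret(own_private, peer_public):
    """Combine our private value with the peer's public value."""
    return pow(peer_public, own_private, P)


# Each endpoint generates its own pair and sends only the public half.
alice_priv, alice_pub = new_ephemeral_keypair()
bob_priv, bob_pub = new_ephemeral_keypair()

# Both sides derive the same value without any prior shared secret.
assert shared_secret(alice_priv, bob_pub) == shared_secret(bob_priv, alice_pub)
```

Because the key pairs are discarded after the session, a later compromise of either endpoint does not reveal the private values used in earlier calls.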
These keys contribute to the generation of the session secret, from which the session key and parameters for SRTP sessions are derived, along with any previously shared secrets. This gives protection against man-in-the-middle (MitM) attacks, assuming the attacker was not present in the first session between the two endpoints.
To ensure that no attacker is present in the first session (when no shared secrets exist), the Short Authentication String (SAS) method is used: the two endpoints compare a value by reading it aloud. If the two values match, it is highly probable that no MitM attack has taken place.
ZRTP can be used with any signaling protocol, including Session Initiation Protocol, H.323 and Jingle. ZRTP is independent of the signaling layer, because it does all its key negotiations in the RTP media stream.
The Diffie-Hellman key exchange by itself does not provide protection against man-in-the-middle (MitM) attacks. To authenticate the key exchange, ZRTP uses a Short Authentication String (SAS), which is essentially a cryptographic hash of the two Diffie-Hellman values. The SAS value is rendered to both ZRTP endpoints. To carry out authentication, this SAS value is read aloud to the communication partner over the voice connection. If the values on both ends do not match, this indicates the presence of a man-in-the-middle attack. If they do match, there is a high probability that no man-in-the-middle is present. The use of hash commitment in the DH exchange constrains the attacker to only one guess to generate the correct SAS, which means the SAS can be quite short. A 16-bit SAS, for example, gives the attacker only one chance in 65536 of not being detected.
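The commitment and SAS steps can be sketched in Python as follows. This is a simplified illustration, not the exact ZRTP computation: the use of SHA-256, the length-prefixed concatenation, and all function names are assumptions made for the example. It shows one party committing to its DH value before revealing it, both parties deriving a 16-bit SAS from the two public values, and the resulting one-in-65536 chance for a single blind guess.

```python
import hashlib
import hmac


def commit(public_value: bytes) -> bytes:
    """Hash commitment: binds the sender to a DH value before revealing it."""
    return hashlib.sha256(public_value).digest()


def verify_commitment(commitment: bytes, revealed_value: bytes) -> bool:
    """Check that the revealed DH value matches the earlier commitment."""
    return hmac.compare_digest(commitment, commit(revealed_value))


def short_auth_string(pub_a: bytes, pub_b: bytes, bits: int = 16) -> int:
    """Truncate a hash of both DH public values to `bits` bits (1 to 256)."""
    # Length prefixes keep the concatenation of the two values unambiguous.
    data = b"".join(len(v).to_bytes(4, "big") + v for v in (pub_a, pub_b))
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def undetected_probability(bits: int) -> float:
    """Chance that one blind guess by the attacker yields a matching SAS."""
    return 1 / 2**bits


# Hypothetical public values, stand-ins for real DH outputs.
alice_pub, bob_pub = b"alice-dh-public", b"bob-dh-public"

commitment = commit(alice_pub)                   # sent before bob_pub is seen
assert verify_commitment(commitment, alice_pub)  # checked after the reveal

# Both endpoints compute the SAS from the same pair of values.
sas_alice = short_auth_string(alice_pub, bob_pub)
sas_bob = short_auth_string(alice_pub, bob_pub)
assert sas_alice == sas_bob
print(f"SAS to read aloud: {sas_alice:04x}")
```

Because the committing side fixes its value before seeing the other side's value, an attacker cannot search for a DH value that produces a matching SAS and is left with a single attempt.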
ZRTP provides a second layer of authentication against a MitM attack, based on a form of key continuity. It does this by caching some hashed key material to use in the next call, where it is mixed in with that call's DH shared secret, giving it key continuity properties analogous to SSH. If the MitM is not present in the first call, the attacker is locked out of subsequent calls. Thus, even if the SAS is never used, most MitM attacks are stopped, because the attacker was not present in the first call.
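The key continuity idea can be illustrated with a short Python sketch. The derivation below is an assumption made for the example and is much simpler than what ZRTP actually specifies; the names `session_secret` and `next_retained` and the label `b"retained secret"` are hypothetical. It shows why an attacker who missed the first call cannot reproduce the session secret of the second, even after completing a DH exchange with one endpoint.

```python
import hashlib
import hmac
from typing import Optional


def session_secret(dh_secret: bytes, retained: Optional[bytes]) -> bytes:
    """Mix this call's DH result with cached material from the previous call."""
    h = hashlib.sha256()
    h.update(len(dh_secret).to_bytes(4, "big") + dh_secret)
    h.update(retained or b"")  # empty on the first call, when no cache exists
    return h.digest()


def next_retained(secret: bytes) -> bytes:
    """Hash the session secret into material to cache for the next call."""
    return hmac.new(secret, b"retained secret", hashlib.sha256).digest()


# Call 1: no cache yet; both endpoints hold the same DH result.
first = session_secret(b"dh-call-1", None)
alice_cache = bob_cache = next_retained(first)

# Call 2, honest case: fresh DH result plus the shared cache.
assert session_secret(b"dh-call-2", alice_cache) == session_secret(b"dh-call-2", bob_cache)

# Call 2 with a MitM who was absent from call 1: the attacker shares a DH
# result with Alice but has no cached material, so the secrets diverge.
alice_side = session_secret(b"dh-alice-mitm", alice_cache)
mitm_side = session_secret(b"dh-alice-mitm", None)
assert alice_side != mitm_side
```

This is the same trust-on-first-use pattern as an SSH known-hosts file: the first contact is the vulnerable one, and every later contact is checked against what was stored then.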
Despite its design, ZRTP never became widely implemented. A few programs use it, but not many, so one cannot assume that a given communication partner supports ZRTP.