Chapter 32

VoIP Security

Dan Wing, Cisco Systems

Harsh Kupwade Patil, Southern Methodist University

The Internet has become an important medium for communication, and all kinds of media communications are carried on today’s Internet. Telephony over the Internet has received a lot of attention in the last few years because it offers users advantages such as long distance toll bypass, interactive ecommerce, global online customer support, and much more.

1. Introduction

H.323 and the Session Initiation Protocol (SIP) are the two standardized protocols for realizing VoIP.1,2 H.323, the multimedia conferencing protocol of the International Telecommunication Union (ITU), consists of multiple separate protocols, such as H.245 for control signaling and H.225 for call signaling. H.323 is difficult to implement because of its complexity and the bulk it adds to the client application.3 In contrast, SIP is simpler than H.323 and leaner on the client side. SIP messages use a human-readable text (ASCII) encoding instead of H.323’s binary encoding.

VoIP Basics

SIP is the Internet Engineering Task Force (IETF) standard for multimedia communications in an IP network. It is an application layer control protocol used for creating, modifying, and terminating sessions between one or more SIP user agents. It was primarily designed to support user location discovery, user availability, user capabilities, and session setup and management.

In SIP, the end devices are called user agents (UAs), and they send SIP requests and SIP responses to establish media sessions, send and receive media, and send other SIP messages (e.g., to send short text messages to each other or subscribe to an event notification service). A UA can be a SIP phone or SIP client software running on a PC or PDA.

Typically, a collection of SIP user agents belongs to an administrative domain, which forms a SIP network. Each administrative domain has a SIP proxy, which is the point of contact for UAs within the domain and for UAs or SIP proxies outside the domain. All SIP signaling messages within a domain are routed through the domain’s own SIP proxy. SIP routing is performed using Uniform Resource Identifiers (URIs) for addressing user agents. Two types of URIs are supported: the SIP URI and the TEL URI. A SIP URI begins with the keyword sip or sips, where sips indicates that the SIP signaling must be sent over a secure channel, such as TLS.4 The SIP URI is similar to an email address and contains a user’s identifier and the domain at which the user can be found. For example, it could contain a username such as sip:[email protected], a global E.164 telephone number5 such as sip:[email protected];user=phone, or an extension such as sip:[email protected]. The TEL URI contains only an E.164 telephone number and no domain name, for example, tel:+1.408.555.1234.
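The URI forms described above can be told apart with very little code. The following minimal Python sketch splits a URI into its scheme, user part, domain, and parameters. The function name `parse_voip_uri` and the example addresses used with it are assumptions made for illustration, and the sketch handles only the simple forms discussed here, not the full URI grammar (no passwords, header fields, or IPv6 literals).

```python
def parse_voip_uri(uri):
    """Split a sip:, sips:, or tel: URI into its basic parts (simplified)."""
    scheme, sep, rest = uri.partition(":")
    scheme = scheme.lower()
    if not sep or scheme not in ("sip", "sips", "tel"):
        raise ValueError("unsupported URI: %r" % uri)

    # Parameters such as ;user=phone follow the address part.
    address, *raw_params = rest.split(";")
    params = dict(p.partition("=")[::2] for p in raw_params)

    if scheme == "tel":
        # A TEL URI carries only a telephone number, no domain.
        user, host = address, None
    else:
        user, at, host = address.rpartition("@")
        if not at:
            # No user part, e.g. a URI that names only a domain.
            user, host = None, address

    return {
        "scheme": scheme,
        "user": user,
        "host": host,
        "params": params,
        "secure": scheme == "sips",  # sips requires a secure channel
    }
```

For instance, under these assumptions a hypothetical `sips:alice@atlanta.com` would be reported with `secure` set to true, while a `tel:` URI would be reported with no host at all.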

A SIP proxy server is an entity that receives SIP requests, performs various database lookups, and then forwards (“proxies”) the request to the next-hop proxy server. In this way, SIP messages are routed to their ultimate destination. Each proxy may perform some specialized function, such as external database lookups, authorization checks, and so on. Because the media does not flow through the SIP proxies—but rather only SIP signaling—SIP proxies are no longer needed after the call is established. In many SIP proxy designs, the proxies are stateless, which allows alternative intermediate proxies to resume processing for a failed (or overloaded) proxy. One type of SIP proxy called a redirect server receives a SIP request, performs a database query operation, and returns the lookup result to the requester (which is often another proxy). Another type of SIP proxy is a SIP registrar server, which receives and processes registration requests. Registration binds a SIP (or TEL) URI to the user’s device, which is how SIP messages are routed to a user agent. Multiple UAs may register the same URI, which causes incoming SIP requests to be routed to all of those UAs, a process termed forking, which causes some interesting security concerns.

The typical SIP transactions can be broadly viewed by looking at the typical call flow mechanism in a SIP session setup, as shown in Figure 32.1. The term SIP trapezoid is often used to describe this message flow where the SIP signaling is sent to SIP proxies and the media is sent directly between the two UAs.

Figure 32.1 An example of a SIP session setup.

If Alice wants to initiate a session with Bob, she sends an initial SIP message (INVITE) to the local proxy for her domain (Atlanta.com). Her INVITE has Bob’s URI ([email protected]) as the Request-URI, which is used to route the message. Upon receiving the INVITE, her domain’s proxy sends a provisional 100 Trying message to Alice, which indicates that the message was received without error. The Atlanta.com proxy looks at the SIP Request-URI in the message and decides to route the message to the Biloxi.com proxy. The Biloxi.com proxy receives the message and delivers it to Bob’s SIP phone to alert Bob of an incoming call. Bob’s SIP phone sends a provisional 180 Ringing message, which is routed all the way back to Alice; this causes Alice’s phone to generate a ringback tone, audible to Alice. When Bob answers his phone, a 200 OK message is sent to his proxy, and Bob can immediately start sending media (“Hello?”) to Alice. Meanwhile, Bob’s 200 OK is routed from his proxy to Alice’s proxy and finally to Alice’s UA. Alice’s UA responds with an ACK message to Bob, and then Alice can begin sending media (audio and/or video) to Bob. Real-time media is almost exclusively sent using the Real-time Transport Protocol (RTP).6 At this point the proxies are no longer involved in the call, so the media will typically flow directly between Alice and Bob. That is, the media takes a different path through the network than the signaling. Finally, when either Alice or Bob wants to end the session, that party sends a BYE message to its proxy, which is routed to the other party and is acknowledged.

One of the challenging tasks faced by the industry today is secure deployment of VoIP. During the initial design of SIP the focus was more on providing new dynamic and powerful services along with simplicity rather than security. For this reason, a lot of effort is under way in the industry and among researchers to enhance SIP’s security. The subsequent sections of this chapter deal with these issues.

2. Overview of Threats

Attacks can be broadly classified as attacks against specific users (SIP user agents), large-scale attacks (in which VoIP is one part of the targeted network), and attacks against network infrastructure (SIP proxies or other network components and resources necessary for VoIP, such as routers, DNS servers, and bandwidth).7 This chapter does not cover attacks against infrastructure; the interested reader is referred to the literature.8 The subsequent parts of this chapter deal with attacks targeted at specific hosts and with issues related to social engineering.

Taxonomy of Threats

The taxonomy of attacks is shown in Figure 32.2.

Figure 32.2 Taxonomy of threats.

Reconnaissance of VoIP Networks

Reconnaissance refers to intelligence gathering or probing to assess the vulnerabilities of a network so that a later attack can be launched successfully; it includes footprinting the target (also known as profiling or information gathering).

The two forms of reconnaissance techniques are passive and active. Passive reconnaissance attacks include the collection of network information through indirect or direct methods but without probing the target; active reconnaissance attacks involve generating traffic with the intention of eliciting responses from the target. Passive reconnaissance techniques would involve searching for publicly available SIP URIs in databases provided by VoIP service providers or on Web pages, looking for publicly accessible SIP proxies or SIP UAs. Examples include dig and nslookup. Although passive reconnaissance techniques can be effective, they are time intensive.

If an attacker can watch SIP signaling, the attacker can perform number harvesting. Here, an attacker passively monitors all incoming and outgoing calls to build a database of legitimate phone numbers or extensions within an organization. This type of database can be used in more advanced VoIP attacks such as signaling manipulation or Spam over Internet Telephony (SPIT) attacks.

Active reconnaissance uses technical tools to discover information on the hosts that are active on the target network. The drawback to active reconnaissance, however, is that it can be detected. The two most common active reconnaissance attacks are call walking attacks and port-scanning attacks.

Call walking is a type of reconnaissance probe in which a malicious user initiates sequential calls to a block of telephone numbers to identify what assets are available for further exploitation. This is a modern version of wardialing, common in the 1980s to find modems on the Public Switched Telephone Network (PSTN). Performed during nonbusiness hours, call walking can provide information useful for social engineering, such as voicemail announcements that disclose the called party’s name.
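Because call walking dials a block of numbers in sequence, it tends to leave a recognizable pattern in call logs. The following minimal Python sketch shows one way a defender might flag that pattern. The function name `find_call_walkers`, the `(source, dialed_number)` log format, and the threshold of five consecutive numbers are all assumptions made for illustration, not part of any product or standard.

```python
from collections import defaultdict


def find_call_walkers(attempts, min_run=5):
    """Return the sources that dialed at least `min_run` consecutive numbers.

    `attempts` is an iterable of (source, dialed_number) pairs in time order,
    with dialed numbers given as integers (for example, extensions).
    """
    last_number = {}                 # source -> most recently dialed number
    run_length = defaultdict(int)    # source -> length of current sequential run
    flagged = set()

    for source, number in attempts:
        if source in last_number and number == last_number[source] + 1:
            run_length[source] += 1
        else:
            run_length[source] = 1   # the sequence was broken; start over
        last_number[source] = number

        if run_length[source] >= min_run:
            flagged.add(source)

    return flagged
```

A real deployment would also need to consider time windows and callers who walk a block in descending or shuffled order, which this sketch deliberately ignores.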

SIP UAs and proxies listen on UDP/5060 and/or TCP/5060, so it can be effective to scan IP addresses looking for such listeners. Once the attacker has accumulated a list of active IP addresses, he can start to investigate each address further. The Nmap tool is a robust port scanner that is capable of performing a multitude of types of scans.9

Denial of Service

A denial-of-service (DoS) attack deprives a user or an organization of services or resources that are normally available. In SIP, DoS attacks can be classified as malformed request DoS and load-based DoS.

Malformed Request DoS

In this type of DoS attack, the attacker crafts a SIP request (or response) that exploits a vulnerability in the target’s SIP proxy or SIP UA, resulting in a partial or complete loss of function. For example, it has been found that some user agents allow remote attackers to cause a denial of service (“486 Busy” responses or device reboot) via a sequence of SIP INVITE transactions in which the Request-URI lacks a username.10 Attackers have also shown that the IP implementations of some hard phones are vulnerable to IP fragmentation attacks [CAN-2002-0880] and DHCP-based DoS attacks [CAN-2002-0835], demonstrating that normal infrastructure protection (such as firewalls) is valuable for VoIP equipment. DoS attacks can also be initiated against other network services, such as DHCP and DNS, that serve VoIP devices.

Load-Based DoS

In this case an attacker directs large volumes of traffic at a target (or set of targets) and attempts to exhaust resources such as the CPU processing time, network bandwidth, or memory. SIP proxies and session border controllers (SBCs) are primary targets for attackers because of their critical role of providing voice service and the complexity of the software running on them.

A common type of load-based attack is a flooding attack. In the case of VoIP, we categorize flooding attacks into these types:

• Control packet floods

• Call data floods

• Distributed denial-of-service attack

Control Packet Floods

In this case the attacker floods SIP proxies with SIP packets, such as INVITE messages, bogus responses, or the like. The attacker might purposefully craft messages that carry authentication credentials but fail authentication, forcing the victim to spend resources validating each message. The attacker might also spoof the IP address of a legitimate sender so that rate limiting the attack rate-limits the legitimate user as well.

Call Data Floods

The attacker will flood the target with RTP packets, with or without first establishing a legitimate RTP session, in an attempt to exhaust the target’s bandwidth or processing power, leading to degradation of VoIP quality for other users on the same network or just for the victim.

Other common forms of load-based attacks that could affect the VoIP system are buffer overflow attacks, TCP SYN flood, UDP flood, fragmentation attacks, smurf attacks, and general overload attacks. Though VoIP equipment needs to protect itself from these attacks, these attacks are not specific to VoIP.

A SIP proxy can also be overloaded with excessive legitimate traffic, as in the classic “Mother’s Day” problem, when the telephone system is at its busiest. Large-scale disasters (e.g., earthquakes) can cause similar spikes, which are not attacks. Thus, even when not under attack, the system could be under high load. If the server or the end user is not fast enough to handle incoming loads, it will experience an outage or misbehave in such a way as to become ineffective at processing SIP messages. A load-based attack is therefore very difficult to detect, because it is hard to separate legitimate users from attackers who are generating the same type of traffic.

Distributed Denial-of-Service Attack

Once an attacker has gained control of a large number of VoIP-capable hosts and formed a network of “zombies,” the attacker can launch interesting VoIP attacks, as illustrated in Figure 32.3.

Figure 32.3 Distributed denial-of-service attack (DDoS).

Each zombie can send up to thousands of messages to a single location, thereby resulting in a barrage of packets, which incapacitates the victim’s computer due to resource exhaustion.

Loss of Privacy

The major eavesdropping attacks are:

• Trivial File Transfer Protocol (TFTP) configuration file sniffing

• Traffic analysis

• Conversation eavesdropping

TFTP Configuration File Sniffing

Most IP phones rely on a TFTP server to download their configuration file after powering on. The configuration file can sometimes contain passwords that can be used to connect directly back to the phone and administer it, or to access other services (such as the company directory). An attacker who sniffs the network while the phone downloads this configuration file can glean these passwords and potentially reconfigure and control the IP phone. To thwart this attack vector, vendors variously encrypt the configuration file or use HTTPS and authentication.

Traffic Analysis

Traffic analysis involves determining who is talking to whom, which can be done even when the actual conversation is encrypted and can even be done (to a lesser degree) between organizations. Such information can be useful to law enforcement and to criminals committing corporate espionage and stock fraud.

Conversation Eavesdropping

An important threat for VoIP users is eavesdropping on a conversation. In addition to the obvious problem of confidential information being exchanged between people, eavesdropping is also useful for credit-card fraud and identity theft. This is because some phone calls—especially to certain institutions—require users to enter credit-card numbers, PIN codes, or national identity numbers (e.g., Social Security numbers), which are sent as Dual-Tone Multi-frequency (DTMF) digits in RTP. An attacker can use tools like Wireshark, Cain & Abel, vomit (voice over misconfigured Internet telephones), Voipong, and Oreka to capture RTP packets and extract the conversation or the DTMF digits.11

Man-in-the-Middle Attacks

The man-in-the-middle attack is a classic form of attack in which the attacker has managed to insert himself between two hosts. It refers to an attacker who is able to read, and modify at will, messages between two parties without either party knowing that the link between them has been compromised. As such, the attacker has the ability to inspect or modify packets exchanged between two hosts, insert new packets, or prevent packets from being delivered. Any device that handles SIP messages as a normal course of its function could be a man in the middle, for example, a compromised SIP proxy server or session border controller. If SIP messages are not authenticated, an attacker can also compromise a DNS server or use DNS poisoning techniques to cause SIP messages to be routed to a device under the attacker’s control.

Replay Attacks

Replay attacks are often used to impersonate an authorized user. A replay attack is one in which an attacker captures a valid packet sent between the SIP UAs or proxies and resends it at a later time (perhaps a second later, perhaps days later). As an example with classic unauthenticated telnet, an attacker that captures a telnet username and password can replay that same username and password. In SIP, an attacker would capture and replay valid SIP requests. (Capturing and replaying SIP responses is usually not valuable, as SIP responses are discarded if their Call-Id does not match a currently outstanding request, which is one way SIP protects itself from replay attacks.)
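The parenthetical point about responses can be made concrete. The following is a minimal Python sketch of a user agent that remembers which Call-Ids it has outstanding and discards any response that does not match one. The class and method names are hypothetical, and matching on Call-Id alone mirrors the simplified description above; real SIP stacks match on additional fields as well.

```python
class OutstandingRequests:
    """Track Call-Ids of requests still waiting for a final response."""

    def __init__(self):
        self._pending = set()

    def request_sent(self, call_id):
        """Record that a request with this Call-Id is awaiting a response."""
        self._pending.add(call_id)

    def accept_response(self, call_id, final=True):
        """Return True if the response matches an outstanding request.

        A final response completes the request, so a later copy of the
        same response (a replay) no longer matches and is discarded.
        """
        if call_id not in self._pending:
            return False
        if final:
            self._pending.discard(call_id)
        return True
```

Note that this only protects against replayed responses; a replayed request carries a Call-Id the receiver has no reason to distrust, which is why requests are the valuable target.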

If the Real-time Transport Protocol (RTP) is used without authenticating Real-time Transport Control Protocol (RTCP) packets and without sampling synchronization source (SSRC), an attacker can inject RTCP packets into a multicast group, each with a different SSRC, and force the group size to grow exponentially. A variant of the replay attack is the cut-and-paste attack. In this scenario, an attacker copies part of a captured packet into a generated packet. For example, a security credential can be copied from one request to another, resulting in a successful authorization without the attacker ever discovering the user’s password.

Impersonation

Impersonation is described as a user or host pretending to be another user or host, especially one that the intended victim trusts. In the case of a phishing attack, the attacker continues the deception to make the victim disclose his banking information, employee credentials, and other sensitive information. In SIP, the From header is displayed to the called party, so authentication and authorization of the values used in the From header are important to prevent impersonation. Unfortunately, call forwarding in SIP (called retargeting) makes simple validation of the From header impossible. For example, imagine Bob has forwarded his phone to Carol and they are in different administrative domains (Bob is at work, Carol is his wife at home). Then Alice calls Bob. When Alice’s INVITE is routed to Bob’s proxy, her INVITE will be retargeted to Carol’s UA by rewriting the Request-URI to point to Carol’s URI. Alice’s original INVITE is then routed to Carol’s UA. When it arrives at Carol’s UA, the INVITE needs to indicate that the call is from Alice. The difficulty is that if Carol’s SIP proxy were to perform simplistic validation of the From header when the INVITE arrived from Bob’s SIP proxy, Carol’s SIP proxy would reject it, because it contains Alice’s From value rather than one from Bob’s domain. However, such retargeting is a legitimate function of SIP networks.

Redirection Attack

If compromised by an attacker or via a SIP man-in-the-middle attack, the intermediate SIP proxies responsible for SIP message routing can falsify any response. In this section we describe how the attacker could use this ability to launch a redirection attack. If an attacker can fabricate a reply to a SIP INVITE, the media session can be established with the attacker rather than the intended party. In SIP, a proxy or UA can respond to an INVITE request with a 301 Moved Permanently or 302 Moved Temporarily Response. The 302 Response will also include an Expires header line that communicates how long the redirection should last. The attacker can respond with a redirection response, effectively denying service to the called party and possibly tricking the caller into communicating with, or through, a rogue UA.

Session Disruption

Session disruption describes any attack that degrades or disrupts an existing signaling or media session. For example, in a SIP scenario, if an attacker is able to inject messages such as BYE into the signaling path, he can cause sessions to fail when there is no legitimate reason why they should not continue. For this to be successful, the attacker has to include the Call-Id of an active call in the BYE message. Alternatively, if an attacker introduces bogus packets into the media stream, he can disrupt packet sequence, impede media processing, and disrupt a session. Delay attacks are those in which an attacker captures RTP packets and resends them out of sequence to a VoIP endpoint, forcing the endpoint to waste its processing cycles resequencing packets and degrading call quality. An attacker could also disrupt a Voice over Wireless Local Area Network (WLAN) service by disrupting IEEE 802.11 WLAN service using radio spectrum jamming or a Wi-Fi Protected Access (WPA) Message Integrity Check (MIC) attack. A wireless access point will disassociate stations when it receives two invalid frames within 60 seconds, causing loss of network connectivity for 60 seconds. A one-minute loss of service is hardly tolerable in a voice application.
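The timing of the MIC countermeasure explains why this attack is so cheap for the attacker. The following minimal Python sketch models only the rule stated above (two invalid frames within 60 seconds cause a 60-second outage); the class name and its methods are hypothetical, and timestamps are plain numbers of seconds supplied by the caller.

```python
class MicCountermeasure:
    """Toy model of the access point behavior described in the text."""

    WINDOW = 60.0    # seconds within which two MIC failures trigger the rule
    BLACKOUT = 60.0  # seconds of lost connectivity once triggered

    def __init__(self):
        self.last_failure = None
        self.blocked_until = 0.0

    def mic_failure(self, now):
        """Record an invalid frame observed at time `now`."""
        if self.last_failure is not None and now - self.last_failure <= self.WINDOW:
            # Second failure inside the window: stations are disassociated.
            self.blocked_until = now + self.BLACKOUT
            self.last_failure = None
        else:
            self.last_failure = now

    def in_service(self, now):
        """Return True if stations can use the network at time `now`."""
        return now >= self.blocked_until
```

Under this model an attacker who sends just two invalid frames per minute can keep the voice service unavailable almost continuously.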

Exploits

Cross-Site Scripting (XSS) attacks are possible with VoIP systems because call logs contain header fields, and administrators (and other privileged users) view those call logs. In this attack, specially crafted From: (or other) fields are sent by an attacker in a normal SIP message (such as an INVITE). Later, when someone such as the administrator looks at the call logs using a Web browser, the specially crafted From: field causes an XSS attack against the administrator’s Web browser, which can then do malicious things with the administrator’s privileges. This can be a damaging attack if the administrator has already logged into other systems (HR databases, the SIP call controller, the firewall) and her Web browser has a valid cookie (or active session in another window) for those other systems.
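The usual defense is to treat every SIP header value as untrusted text and escape it before it reaches the administrator’s browser. The following is a minimal Python sketch using the standard library’s `html.escape`; the function name `render_call_log_row` and the three-column row layout are assumptions made for illustration.

```python
import html


def render_call_log_row(timestamp, from_header, to_header):
    """Render one call-log entry as an HTML table row.

    Every value is escaped, so markup smuggled into a SIP header is
    displayed as literal text instead of being run by the browser.
    """
    cells = (timestamp, from_header, to_header)
    escaped = (html.escape(str(value), quote=True) for value in cells)
    return "<tr>" + "".join("<td>%s</td>" % cell for cell in escaped) + "</tr>"
```

With this in place, a From: value containing a script element appears in the log as visible text rather than running with the administrator’s privileges.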

Social Engineering

SPIT (Spam over Internet Telephony) is classified as a social threat because the callee can treat the call as unsolicited, and the term unsolicited is strictly bound to be a user-specific preference, which makes it hard for the system to identify this kind of transaction. SPIT can be telemarketing calls used for guiding callees to a service deployed to sell products. IM spam and presence spam could also be launched via SIP messages. IM spam is very similar to email spam; presence spam is defined as a set of unsolicited presence requests for the presence package. A subtle variation of SPIT called Vishing (VoIP phishing) is an attack that aims to collect personal data by redirecting users toward an interactive voice responder that could collect personal information such as the PIN for a credit card. From a signaling point of view, unsolicited communication is technically a correct transaction.

Unfortunately, many of the mechanisms that are effective for email spam are ineffective with VoIP, for many reasons. First, the email with its entire contents arrives at a server before it is seen by the user. Such a mail server can therefore apply many filtering strategies, such as Bayesian filters, URL filters, and so on. In contrast, in VoIP, human voices are transmitted rather than text. To recognize voices and to determine whether the message is spam or not is still a very difficult task for the end system. A recipient of a call only learns about the subject of the message when he is actually listening to it. Moreover, even if the content is stored on a voice mailbox, it is still difficult for today’s speech recognition technologies to understand the context of the message enough to decide whether it is spam or not.

One mechanism to fight automated systems that deliver spam is to challenge such suspected incoming calls with a Turing test. These methods include:

• Voice menu. Before a call is put through, a computer asks the caller to press certain key combinations, for example “Press #55.”

• Challenge models. Before a call is put through, a computer asks the caller to solve a simple equation and to type in the answer—for example, “Divide 10 by 2.”

• Alternative number. Under the main number a computer announces an alternative number. This number may even be changed permanently by a call management server.

All these methods can be reinforced by enriching the audio signal with noise or music, which prevents SPIT bots from using speech recognition.

Such Turing tests are attractive, since it is often hard for computers to decode audio questions. However, these puzzles cannot be made too difficult, because human beings must always be able to solve them.
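A challenge of the “Divide 10 by 2” kind can be sketched in a few lines of Python. The function names, the number ranges, and the idea of comparing the caller’s DTMF digits as a string are assumptions made for illustration; a real system would also have to render the prompt as audio and collect the key presses.

```python
import random


def make_challenge(rng=random):
    """Return (prompt, expected_digits) for a simple division puzzle.

    The dividend is built as divisor * answer, so the result is always
    a whole number that the caller can key in on a telephone keypad.
    """
    divisor = rng.randint(2, 9)
    answer = rng.randint(1, 9)
    prompt = "Divide %d by %d." % (divisor * answer, divisor)
    return prompt, str(answer)


def check_response(expected_digits, digits_pressed):
    """Return True if the caller keyed in the expected answer."""
    return digits_pressed == expected_digits
```

Keeping the answer to a single digit reflects the constraint noted above: the puzzle has to stay easy enough that any human caller can solve it.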

One of the solutions to the SPIT problem is the whitelist. In a whitelist, a user explicitly states which persons are allowed to contact him. A similar technique is also used in Skype: when Alice wants to call Bob, she first has to add Bob to her contact list and send a contact request to Bob. Only when Bob has accepted this request can Alice make calls to Bob.

In general, whitelists have an introduction problem, since it is not possible to receive calls from someone who is not already on the whitelist. Blacklists are the opposite of whitelists but have limited effectiveness at blocking spam because new identities (which are not on the blacklist) can be easily created by anyone, including spammers.
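The interplay between the two lists can be sketched as a small screening function. This is a minimal Python illustration, not a description of any deployed system: the function name `screen_call` and the three outcomes are assumptions, and sending unknown callers to a challenge (rather than rejecting them outright) is one possible way to soften the introduction problem.

```python
def screen_call(caller_uri, whitelist, blacklist):
    """Decide how to treat an incoming call based on the caller's URI.

    Returns "reject" for blacklisted callers, "accept" for whitelisted
    callers, and "challenge" for everyone else.
    """
    if caller_uri in blacklist:
        return "reject"
    if caller_uri in whitelist:
        return "accept"
    # Unknown caller: neither list can say anything about a new identity.
    return "challenge"
```

The final branch is where the weakness of blacklists shows: a spammer who mints a new identity always lands there, never in the reject branch.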

Authentication mechanisms can be used to provide strong authentication, which is necessary for strong whitelists and reputation systems, which form the basis of SPIT prevention. Strong authentication generally depends on a Public Key Infrastructure (PKI). Proactive publishing of incorrect information, namely SIP addresses, is a possible way to fill up spammers’ databases with nonexistent contacts. Consent-based communication is another solution. Address obfuscation could be an alternative, wherein spam bots are unable to identify the SIP URIs.

3. Security in VoIP

Much existing VoIP equipment is dedicated to VoIP, which allows placing such equipment on a separate network. This is typically accomplished with a separate VLAN. Depending on the vendor of the equipment, this can be automated using CDP (Cisco Discovery Protocol), LLDP (Link Layer Discovery Protocol), or 802.1x, all of which will place equipment into a separate “voice VLAN” to assist with this separation. This provides a reasonable level of protection, especially within an enterprise where employees lack much incentive to interfere with the telephone system.

Preventative Measures

However, the use of VLANs is not an ideal solution because it does not work well with softphones that are not dedicated to VoIP: placing those softphones onto the “voice VLAN” destroys the security and management advantage of the separate network. A separate VLAN can also create a false sense of security that only benign voice devices are connected to the VLAN. Even with 802.1x, which provides the best security of these mechanisms, it is still possible for an attacker to gain access to the voice VLAN (with a suitable hub between the phone and the switch). Mechanisms that provide less security, such as CDP or LLDP, can be circumvented by software on an infected computer. Some vendors’ Ethernet switches can be configured to require clients to request inline Ethernet power before allowing clients to join certain VLANs (such as the voice VLAN), which provides protection from such infected computers. But, as mentioned previously, such protection of the voice VLAN prevents deployment of softphones, which is a significant reason that most companies are interested in deploying VoIP.

Eavesdropping

To counter the threat of eavesdropping, the media can be encrypted. The method to encrypt RTP traffic is Secure RTP (RFC3711), which does not encrypt the IP, UDP, or RTP headers but does encrypt the RTP payload (the “voice” itself). SRTP’s advantage of leaving the RTP headers unencrypted is that header compression protocols (e.g., cRTP,12 ROHC,13) and protocol analyzers (e.g., looking for RTP packet loss and (S)RTCP reports) can still function with SRTP-encrypted media.
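Because SRTP leaves the fixed 12-byte RTP header in the clear, a protocol analyzer can read it whether or not the payload is encrypted. The following minimal Python sketch decodes that fixed header with the standard `struct` module; the function name `parse_rtp_header` is an assumption, and the sketch ignores CSRC entries and header extensions that may follow the fixed part.

```python
import struct


def parse_rtp_header(packet):
    """Decode the fixed 12-byte RTP header at the start of `packet`."""
    if len(packet) < 12:
        raise ValueError("packet too short for an RTP header")

    # Two flag bytes, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC,
    # all in network byte order.
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])

    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

Gaps in the `sequence` field across successive packets are what let an analyzer estimate packet loss without ever seeing the decrypted audio.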

The drawback of SRTP is that approximately 13 incompatible mechanisms exist to establish the SRTP keys. These mechanisms are at various stages of deployment, industry acceptance, and standardization. Thus, at this point in time it is unlikely that two SRTP-capable systems from different vendors will have a compatible SRTP keying mechanism. A brief overview of some of the more popular keying mechanisms is provided here.

One of the popular SRTP keying mechanisms, Security Descriptions, requires a secure SIP signaling channel (SIP over TLS) and discloses the SRTP key to each SIP proxy along the call setup path. This means that a passive attacker, able to observe the unencrypted SIP signaling and the encrypted SRTP, would be able to eavesdrop on a call. S/MIME is SIP’s end-to-end security mechanism, which Security Descriptions could use to its benefit, but S/MIME has not been well deployed and, due to specific features of SIP (primarily forking and retargeting), it is unlikely that S/MIME will see deployment in the foreseeable future.

Multimedia Internet Keying (MIKEY) has approximately eight incompatible modes defined; these allow establishing SRTP keys.14 Almost all these MIKEY modes are more secure than Security Descriptions because they do not carry the SRTP key directly in the SIP message but rather encrypt it with the remote party’s public key or perform a Diffie-Hellman exchange. Thus, for most of the MIKEY modes, the attacker would need to actively participate in the MIKEY exchange and obtain the encrypted SRTP to listen to the media.

Zimmermann Real-time Transport Protocol (ZRTP)15 is another SRTP key exchange mechanism, which uses a Diffie-Hellman exchange to establish the SRTP keys and detects an active attacker by having the users (or their computers) validate a short authentication string with each other. It affords useful security properties, including perfect forward secrecy and key continuity (which allows the users to verify authentication strings once, and never again), and the ability to work through session border controllers.

In 2006, the IETF decided to reduce the number of IETF standard key exchange mechanisms and chose DTLS-SRTP. DTLS-SRTP uses Datagram TLS (a mechanism to run TLS over a non-reliable protocol such as UDP) over the media path. To detect an active attacker, the TLS certificates exchanged over the media path must match the signed certificate fingerprints sent over the SIP signaling path. The certificate fingerprints are signed using SIP’s identity mechanism.16
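The fingerprint comparison at the heart of this check is simple to sketch. The Python below hashes a DER-encoded certificate with SHA-256 and compares the result with a fingerprint received through signaling, written as colon-separated hexadecimal bytes. The function names are hypothetical, the choice of SHA-256 is an assumption, and the sketch omits the verification of the signature over the fingerprint, without which the comparison proves nothing.

```python
import hashlib
import hmac


def certificate_fingerprint(cert_der):
    """Return the SHA-256 fingerprint of a DER-encoded certificate,
    formatted as uppercase hex bytes separated by colons."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fingerprint_matches(cert_der, signaled_fingerprint):
    """Compare the certificate seen on the media path with the
    fingerprint carried (and signed) in the signaling path."""
    computed = certificate_fingerprint(cert_der)
    expected = signaled_fingerprint.strip().upper()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(computed, expected)
```

If an active attacker substitutes his own certificate on the media path, its hash will not match the signed fingerprint and the call can be refused or flagged.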

A drawback with SRTP is that it is imperative (for some keying mechanisms) or very helpful (with other keying mechanisms) for the SIP user agent to encrypt its SIP signaling traffic with its SIP proxy. The only standard for such encryption, today, is SIP-over-TLS which runs over TCP. To date, many vendors have avoided TCP on their SIP proxies because they have found SIP-over-TCP scales worse than SIP-over-UDP. It is anticipated that if this cannot be overcome we may see SIP-over-DTLS standardized. Another viable option, especially in some markets, is to use IPsec ESP to protect SIP.

Another drawback of SRTP is that diagnostic and troubleshooting equipment cannot listen to the media stream. This may seem obvious, but it can cause difficulties when technicians need to listen to and diagnose echo, gain, or other anomalies that cannot be diagnosed by examining SRTP headers (which are unencrypted) but can only be diagnosed by listening to the decrypted audio itself.

Identity

As described in the “Threats” section, it is important to have strong identity assurance. Today there are two mechanisms to provide for identity: P-Asserted-Identity,17 which is used within a trust domain (e.g., within a company or between a service provider and its paying customers) and is simply a header inserted into a SIP request, and SIP Identity,18 which is used between trust domains (e.g., between two companies) and creates a signature over some of the SIP headers and over the SIP body.

SIP Identity is useful when two organizations connect via SIP proxies, as the SIP architecture originally envisioned for intermediaries between two organizations (often a SIP service provider). Many of these service providers operate session border controllers (SBCs) rather than SIP proxies, for a variety of reasons. One of the drawbacks of SIP Identity is that an SBC, by its nature, will rewrite the SIP body (specifically the m= and c= lines), which invalidates the original signature. Thus, an SBC would need to rewrite the From header and sign the new message with the SBC’s own private key. This effectively creates hop-by-hop trust; each SBC that needs to rewrite the message in this way is also able to manipulate the SIP headers and SIP body in other ways that could be malicious or could allow the SBC to eavesdrop on a call. Alternative cryptographic identity mechanisms are being pursued, but it is not yet known whether this weakness can be resolved.
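A small sketch shows why an SBC's rewrite invalidates a signature that covers the body. It is a simplification: a plain SHA-256 digest stands in for the signed hash, only three headers are covered, and the addresses, port numbers, and function names are hypothetical.

```python
import hashlib


def identity_digest(from_uri: str, to_uri: str, call_id: str, body: str) -> str:
    """Digest over selected headers and the body (stand-in for the signed hash)."""
    material = "|".join([from_uri, to_uri, call_id, body])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def sbc_rewrite_sdp(sdp: str, sbc_ip: str, sbc_port: int) -> str:
    """Rewrite the c= address and m= port so media flows through the SBC."""
    out = []
    for line in sdp.split("\r\n"):
        if line.startswith("c=IN IP4 "):
            line = "c=IN IP4 " + sbc_ip
        elif line.startswith("m="):
            parts = line.split(" ")
            parts[1] = str(sbc_port)  # second field of an m= line is the port
            line = " ".join(parts)
        out.append(line)
    return "\r\n".join(out)


original_sdp = "v=0\r\nc=IN IP4 192.0.2.10\r\nm=audio 49170 RTP/AVP 0\r\n"
signed = identity_digest("sip:alice@a.example", "sip:bob@b.example",
                         "call-1", original_sdp)
rewritten_sdp = sbc_rewrite_sdp(original_sdp, "198.51.100.1", 20000)
received = identity_digest("sip:alice@a.example", "sip:bob@b.example",
                           "call-1", rewritten_sdp)
signature_still_valid = (signed == received)  # False once the body changes
```

Because the digest no longer matches, the receiver cannot validate the original signature. The SBC must either leave the body alone or re-sign the message with its own key.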

Traffic Analysis

The most useful protection from traffic analysis is to encrypt your SIP traffic. This would require the attacker to gain access to your SIP proxy (or its call logs) to determine who you called.

Additionally, your (S)RTP traffic itself could also provide useful traffic analysis information. For example, someone may learn valuable information just by noticing where (S)RTP traffic is being sent (e.g., the company’s in-house lawyers are calling an acquisition target several times a day). Forcing traffic to be concentrated to a device can help prevent this sort of traffic analysis. In some network topologies this can be achieved using a NAT, and in all cases it can be achieved with an SBC.

Reactive

An intrusion prevention system (IPS) is a useful way to react to VoIP attacks against signaling or media. An IPS with generic rules and with VoIP-specific rules can detect an attack and block or rate-limit traffic from the offender.

IPS

Because SIP is derived from, and related to, well-deployed and well-understood protocols such as HTTP, IDS/IPS vendors are able to create products that protect against SIP attacks quite readily. Often an IDS/IPS function can be built into a SIP proxy, SBC, or firewall, reducing the need for a separate IDS/IPS appliance. An IDS/IPS is only marginally effective for detecting media attacks; its main use there is to notice that an excessive amount of bandwidth is being consumed and to throttle the traffic or raise an alarm.

A drawback of an IPS is that it can produce false positives and deny service to a legitimate endpoint, thus causing a DoS in an attempt to prevent a DoS. An attacker who knows the rules or behavior of an IPS may also be able to spoof the identity of a victim (the victim’s source IP address or SIP identity) and trick the IDS/IPS into reacting against the victim. Thus, it is important to deny attackers that avenue by using standard best practices against IP address spoofing19 and employing strong SIP identity. Using a separate network (VLAN) for VoIP traffic can help reduce the chance of false positives, as the IDS/IPS rules can be more finely tuned for that one application running on the voice VLAN.

Rate Limiting

When a SIP proxy is receiving too many SIP requests due to an attack, the first thing to consider is simple rate limiting. This is often naïvely performed by rate limiting the traffic to the SIP proxy and allowing excess traffic to be dropped. Though this does effectively reduce the transactions per second the SIP proxy needs to perform, it interferes with processing of existing calls to a significant degree. For example, a normal call is established with an INVITE, which is reliably acknowledged when the call is established. If the simplistic rate limiting were to drop the acknowledgment message, the INVITE would be retransmitted, incurring additional processing while the system is under high load. A separate problem with rate limiting is that both attackers and legitimate users are subject to it; it is more useful to apply the rate limiting selectively to the users causing the high rate. This can be done by distributing the simple rate limiting toward the users rather than performing it near the server.

On the server, a more intelligent rate limiting is useful. These are usually proprietary rate-limiting schemes, but they attempt to process existing calls before processing new calls. For example, such a scheme would allow processing the acknowledgment message for a previously processed INVITE, as described above; process the BYE associated with an active call, to free up resources; or process high-priority users’ calls (the vice president’s office is allowed to make calls, but the janitorial staff is blocked from making calls).
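Since such schemes are proprietary, the following is only a sketch of the general idea under stated assumptions: the proxy has a fixed processing budget per interval, can tell whether a message belongs to a call it already knows about, and knows which users are high priority. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SipMessage:
    method: str
    in_dialog: bool                 # belongs to a call the proxy already knows
    high_priority_user: bool = False


def priority(msg: SipMessage) -> int:
    """Lower number means processed earlier."""
    if msg.in_dialog and msg.method in ("ACK", "BYE"):
        return 0                    # finish or tear down existing calls first
    if msg.high_priority_user:
        return 1                    # then new calls from privileged users
    return 2                        # everything else competes for what is left


def select_for_processing(queue: List[SipMessage],
                          budget: int) -> Tuple[List[SipMessage], List[SipMessage]]:
    """Split the queue into messages to process now and messages to defer or drop."""
    ordered = sorted(queue, key=priority)   # stable: arrival order kept within a class
    return ordered[:budget], ordered[budget:]
```

With a budget of three and a queue holding two ordinary INVITEs, an in-dialog ACK, a high-priority INVITE, and an in-dialog BYE, the ACK, the BYE, and the high-priority INVITE are processed and the two ordinary INVITEs are deferred.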

By pushing rate limiting toward users, effective use can be made of simple packet-based rate limiting. For example, even a very active call center phone does not need to send 100 Mb of SIP signaling traffic to its SIP proxy; even 1 Mb would be an excessive amount of traffic. By deploying simplistic, reasonable rate limiting very near the users, ideally at the Ethernet switch itself, bugs in the call processing application or malicious attacks by unauthorized software can be mitigated.
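Per-user policing of this kind is commonly built on a token bucket. The sketch below is an illustration only: the rate and burst values are arbitrary, the source key (for example a switch port or IP address) is an assumption, and time is passed in explicitly so the behavior is easy to follow.

```python
from typing import Dict


class TokenBucket:
    """Allow `rate` messages per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def allow(self, now: float, cost: float = 1.0) -> bool:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerSourceLimiter:
    """One bucket per source, so a noisy source cannot starve the others."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, source: str, now: float) -> bool:
        bucket = self.buckets.get(source)
        if bucket is None:
            bucket = self.buckets[source] = TokenBucket(self.rate, self.burst)
        return bucket.allow(now)
```

Because each source has its own bucket, a flood from one phone exhausts only that phone's allowance, and signaling from other users continues to be admitted.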

A similar situation occurs with the RTP media itself. Even high-definition video does not need to send or receive 100 Mb of traffic to another endpoint and can be rate-limited based on the applications running on the dedicated device. This sort of policing can be effective at the Ethernet switch itself, or in an IDS/IPS (watching for excessive bandwidth), a firewall, or SBC.

Challenging

A more sophisticated rate-limiting technique is to provide additional challenges to a high-volume user. This could be done when it is suspected that the user is sending spam or when the user has initiated too many calls in a certain time period. A simple mechanism is to complete the call with an interactive voice response system that requests the user to enter some digits (“Please enter 5, 1, 8 to complete your call”). Though this technique suffers from some problems (it does not work well for hearing-impaired users or if the caller does not understand the IVR’s language), it is effective at reducing the calls per second from both internal and external callers.

4. Future Trends

Certain SIP proxies have the ability to forward SIP requests to multiple user agents. These SIP requests can be sent in parallel, in series, or a combination of both series and parallel. Such proxies are called forking proxies.

Forking Problem in SIP

The forking proxy expects a response from all the user agents who received the request; the proxy forwards only the “best” final response back to the caller. This behavior causes a situation known as the heterogeneous error response forking problem [HERFP], which is illustrated in Figure 32.4.20

Figure 32.4 The heterogeneous error response forking problem.

Alice initiates an INVITE request that includes a body format that is understood by UAS2 but not UAS1. For example, the UAC might have used a MIME type of multipart/mixed with a session description and an optional image or sound. As UAS1 does not support this MIME format, it returns a 415 (Unsupported Media Type) response. Unfortunately the proxy has to wait until all the branches generate a final response and then pick the “best” response, depending on the criteria mentioned in RFC 3261. In many cases the proxy has to wait long enough that the human operating the UAC abandons the call. The proxy informs UAS2 that the call has been canceled, which is acknowledged by UAS2. It then returns the 415 (Unsupported Media Type) to Alice, an error that Alice could have repaired earlier by sending the appropriate session description.
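The selection step can be sketched as below. This is a simplified reading of the RFC 3261 guidance (prefer 6xx, otherwise the lowest response class, with preference inside 4xx for responses that help the caller resubmit). The ordering inside the preferred 4xx list and the use of 487 for the canceled branch are assumptions of this sketch.

```python
from typing import Iterable

# 4xx responses that tell the caller how to repair and resubmit the request.
PREFERRED_4XX = (401, 407, 415, 420, 484)


def best_final_response(codes: Iterable[int]) -> int:
    """Pick the single final response a forking proxy forwards upstream."""
    finals = [c for c in codes if c >= 200]
    if not finals:
        raise ValueError("no final response received yet")
    successes = [c for c in finals if c < 300]
    if successes:                      # in practice 2xx is forwarded immediately
        return successes[0]
    global_failures = [c for c in finals if c >= 600]
    if global_failures:
        return global_failures[0]
    lowest_class = min(c // 100 for c in finals)
    candidates = [c for c in finals if c // 100 == lowest_class]
    if lowest_class == 4:
        for preferred in PREFERRED_4XX:
            if preferred in candidates:
                return preferred
    return candidates[0]
```

In the scenario above, the proxy holds the 415 from UAS1 but cannot call this function until UAS2's branch also ends (assumed here to end with a 487 after the cancel). Only then is the 415 chosen and returned, which is the delay at the heart of the problem.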

Security in Peer-to-Peer SIP

Originally SIP was specified as a client/server protocol, but recent proposals suggest using SIP in a peer-to-peer setting.21 One of the major reasons for using SIP in a peer-to-peer setting is its robustness, since there is no centralized control. As defined, “peer to peer (P2P) systems are distributed systems without any centralized control or hierarchical organization.” This definition defines pure P2P systems. Even though many networks are considered P2P, they employ central authority or use supernodes. Early systems used flooding to route messages, which was found to be highly inefficient. To improve lookup time for a search request, structured overlay networks have been developed that provide load balancing and efficient routing of messages. They use distributed hash tables (DHTs) to provide efficient lookup.22 Examples of structured overlay networks are CAN, Chord, Pastry, and Tapestry.23,24,25,26

We focus on the Chord protocol because it is used as a prototype in most proposals for P2P-SIP. Chord has a ring-based topology in which each node stores at most log(N) entries in its finger table, which is like an application-level routing table, to point to other peers. Every node’s IP address is mapped to an m-bit Chord identifier with a predefined hash function h. The same hash function h is also used to map any key of data onto a key ID that forms the distributed hash table. Every node maintains a finger table of log(N) entries (for example, six entries when N = 64), in which the ith entry points to the next-hop node at distance 2^(i−1) (for i = 1, 2, …, m) from this node’s identifier. Each node in the ring is responsible for storing the content of all key IDs that are greater than the identifier of the node’s predecessor in the Chord ring and less than or equal to the node’s own identifier. In a Chord ring each node n stores the IP address of m successor nodes plus its predecessor in the ring. The m successor entries in the routing table point to nodes at increasing distance from n. Routing is done by forwarding messages to the largest node-ID in the routing table that precedes the key-ID until the direct successor of a node has a larger ID than the key ID.
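The finger table and lookup rule can be made concrete with a small simulation. This sketch assumes a 6-bit identifier space and a hypothetical set of ten node IDs, and it computes successors from a global sorted list, which a real node could not do. It illustrates the routing rule only and omits joins, failures, and stabilization.

```python
import bisect
from typing import List, Tuple

M = 6                     # identifier bits (assumed for the example)
RING = 2 ** M
NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # hypothetical node IDs, sorted


def successor(nodes: List[int], ident: int) -> int:
    """First node whose ID equals or follows ident on the ring."""
    i = bisect.bisect_left(nodes, ident % RING)
    return nodes[i % len(nodes)]


def finger_table(nodes: List[int], n: int) -> List[int]:
    """Entry i points to the successor of n + 2^(i-1), for i = 1..M."""
    return [successor(nodes, (n + 2 ** (i - 1)) % RING) for i in range(1, M + 1)]


def in_open(x: int, a: int, b: int) -> bool:
    """True if x lies strictly between a and b going clockwise."""
    return a < x < b if a < b else (x > a or x < b)


def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in (a, b] going clockwise."""
    return a < x <= b if a < b else (x > a or x <= b)


def lookup(nodes: List[int], start: int, key: int) -> Tuple[int, List[int]]:
    """Return the node responsible for key and the path taken from start."""
    n, path = start, [start]
    while True:
        succ = successor(nodes, (n + 1) % RING)
        if in_half_open(key, n, succ):
            path.append(succ)
            return succ, path
        # Forward to the largest finger that still precedes the key.
        nxt = next((f for f in reversed(finger_table(nodes, n))
                    if in_open(f, n, key)), succ)
        n = nxt
        path.append(n)
```

With these assumed node IDs, node 8's fingers are 14, 14, 14, 21, 32, and 42, and a lookup for key 54 starting at node 8 travels through nodes 42 and 51 before reaching node 56, which is responsible for the key.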

Singh and Schulzrinne envision a hierarchical architecture in which multiple P2P networks are represented by a DNS domain. A global DHT is used for interdomain routing of messages.

Join/Leave Attack

Security of structured overlay networks is based on the assumption that joining nodes are assigned node-IDs at random due to random assignment of IP addresses. This could lead to a join/leave attack in which the malicious attacker would want to control O(logN) nodes out of N nodes as search is done on O(logN) nodes to find the desired key ID. With the adoption of IPv6, the join/leave attack can be more massive because the attacker will have more IP addresses. But even with IPv4, join/leave attacks are possible if the IP addresses are assigned dynamically. Node-ID assignment in Chord is inherently deterministic, thereby allowing the attacker to compute Node-IDs in advance where the attack could be launched by spoofing IP addresses. A probable solution would be to authenticate nodes before allowing them to join the overlay, which can involve authenticating the node before assigning the IP address.
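The deterministic mapping is what makes precomputation possible. In the sketch below, the 16-bit identifier space, the use of SHA-1 over the dotted-quad string, and the address block are all assumptions chosen for illustration. The point is only that anyone who knows h can search offline for addresses that hash into a chosen region of the ring.

```python
import hashlib
import ipaddress
from typing import List

M = 16                                # identifier bits (assumed)


def node_id(ip: str) -> int:
    """Deterministic node ID: hash of the IP address reduced to M bits."""
    digest = hashlib.sha1(ip.encode("ascii")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def addresses_landing_in(network: str, low: int, high: int) -> List[str]:
    """Addresses in `network` whose node ID falls in [low, high)."""
    return [str(ip) for ip in ipaddress.ip_network(network).hosts()
            if low <= node_id(str(ip)) < high]
```

An attacker who can choose among many addresses (for example through dynamic assignment or a large IPv6 allocation) would pick the ones this search returns, which is why the text suggests authenticating nodes before they join.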

Attacks on Overlay Routing

Any malicious node within the overlay can drop, alter, or wrongly forward a message it receives instead of routing it according to the overlay protocol. This can result in severe degradation of the overlay’s availability. Therefore an adversary can perform one of the following:

• Registration attacks. One of the existing challenges to P2P-SIP registration is to provide confidentiality and message integrity to registration messages.

• Man-in-the-middle attacks. Let’s consider the case where a node with ID 80 and a node with ID 119 conspire to form a man-in-the-middle attack, as shown in Figure 32.5. The honest node responsible for the key is node 108. Let’s assume that a recursive approach is used for finding the desired key ID, wherein each routing node sends the request message to the appropriate node-ID until it reaches the node-ID responsible for the desired key-ID. The source node (node 32) has no control over the request packet and cannot trace it as it traverses the Chord ring. Therefore node 32 will establish a dialog with node 119, and node 80 would impersonate node 32 and establish a dialog with node 108. This attack can be detected if an iterative routing mechanism is used wherein the source node checks whether the hash value (node-ID) returned on each hop is closer to the key-ID than the node-ID it received on the previous hop.27 The source node (32) would therefore get suspicious if node 80 redirected it directly to node 119, because it assumes that there exists a node with an ID lower than key ID 107.

Figure 32.5 Man-in-the-middle attack.

• Attacks on bootstrapping nodes. Any node wanting to join the overlay needs to be bootstrapped with a static node or cached node or discover the bootstrap node through broadcast mechanisms (e.g., SIP-multicast). In any case, if an adversary gains access to the bootstrap node, the joining node can easily be attacked. Securing the bootstrap node is still an open question.

• Duplicate identity attacks. Preventing duplicate identities is one of the open problems whereby a hash of two IP addresses can lead to the same node-ID. The Singh and Schulzrinne approach reduces this problem somewhat by using a P2P network for each domain. Further, they suggest email-based authentication in which a joining node would receive a password via email and then use the password to authenticate itself to the network.

• Free riding. In a P2P system there is a risk of free riding in which nodes use services but fail to provide services to the network. Nodes use the overlay for registration and location service but drop other messages, which could eventually result in a reduction of the overlay’s availability.
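The iterative-routing check mentioned in the man-in-the-middle item can be sketched as a progress test on each referral. The 8-bit identifier space and the honest intermediate node IDs (96 and 104) are assumptions of this sketch; the source node 32, node 80, node 119, and key 107 follow the example in the text.

```python
from typing import List, Optional

RING = 2 ** 8                      # identifier space size (assumed)


def clockwise_distance(a: int, b: int) -> int:
    """Distance travelled going clockwise from a to b on the ring."""
    return (b - a) % RING


def first_suspicious_referral(source: int, referrals: List[int],
                              key: int) -> Optional[int]:
    """Index of the first referral that fails to move closer to the key
    without passing it, or None if every referral makes progress."""
    current = source
    for i, nxt in enumerate(referrals):
        if clockwise_distance(nxt, key) >= clockwise_distance(current, key):
            return i
        current = nxt
    return None
```

A referral from node 80 straight to node 119 jumps past key 107 and is flagged, while a chain such as 80, 96, 104 is not. This covers only intermediate referrals: the node that finally owns the key lies at or just past it, so a practical check would need separate logic for that last step.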

The other major challenges that are presumably even harder to solve for P2P-SIP are as follows:

• Prioritizing signaling for emergency calls in an overlay network and ascertaining the physical location of users in real time may be very difficult.

• With the high dynamic nature of P2P systems, there is no predefined path for signaling traffic, and therefore it is impossible to implement a surveillance system for law enforcement agencies with P2P-SIP.

End-to-End Identity with SBCs

As discussed earlier,28 SIP Identity provides identity for SIP requests by signing certain SIP headers and the SIP body (which typically contains the Session Description Protocol (SDP)). This identity is destroyed if the SIP request travels through an SBC, because the SBC has to rewrite the SDP as part of the SBC’s function (to force media to travel through the SBC). Today, nearly all companies that provide SIP trunking (Internet telephony service providers, ITSPs) utilize SBCs. In order to work with SIP Identity,29 those SBCs would have to validate incoming requests (which is new), modify the SDP and create a new identity (which they are doing today), and sign the new identity (which is new). As of this writing, it appears unlikely that ITSPs will have any reason to perform these new functions.

A related problem is that, even if we had end-to-end identity, it is impossible to determine whether a certain identity can rightfully claim a certain E.164 phone number in the From: header. Unlike domain names, which can have their ownership validated (the way email address validation is performed on myriad Web sites today), there is no de facto or written standard to determine whether an identity can rightfully claim to “own” a certain E.164.

It is anticipated that as SIP trunking becomes more commonplace, SIP spam will grow with it, and the growth of SIP spam will create the necessary impetus for the industry to solve these interrelated problems. Solving the end-to-end identity problem and the problem of attesting E.164 ownership would allow domains to immediately create meaningful whitelists. Over time these whitelists could be shared among SIP networks, end users, and others, eventually creating a reputation system. But as long as spammers are able to impersonate legitimate users, even creating a whitelist is fraught with the risk of a spammer guessing the contents of that whitelist (e.g., your bank, family member, or employer).

5. Conclusion

With today’s dedicated VoIP handsets, a separate voice VLAN provides a reasonable amount of security. Going forward, as nondedicated devices become more commonplace, more rigorous security mechanisms will gain importance. This will begin with encrypted signaling and encrypted media and will evolve to include spam protection and enhancements to SIP to provide cryptographic assurance of SIP call and message routing.

As VoIP continues to grow, VoIP security solutions will have to consider consumer, enterprise, and policy concerns. Some VoIP applications commonly installed on PCs (e.g., Skype) may be against corporate security policies. One of the biggest challenges with enabling encryption is maintaining a public key infrastructure, with the complexities involved in distributing public key certificates that would span to end users30 and key synchronization between various devices belonging to the same end user agent.31

Using IPsec for VoIP tunneling across the Internet is another option; however, it is not without substantial overhead.32 Therefore end-to-end mechanisms such as SRTP are specified for encrypting media and establishing session keys.

VoIP network designers should take extra care in designing intrusion detection systems that are able to identify never-before-seen activities and react according to the organization’s policy. They should follow industry best practices for securing endpoint devices and servers. Current softphones and consumer-priced hardphones use the “haste-to-market” implementation approach and are therefore vulnerable to VoIP attacks. VoIP network administrators should therefore evaluate VoIP endpoint technology, identify devices or software that will meet business needs and can be secured, and make these the corporate standards. With P2P-SIP, the lack of central authority makes authentication of users and nodes difficult. Providing central authority would dampen the spirit of P2P-SIP and would conflict with the inherent features of distributed networks. A decentralized solution such as a reputation management system, in which trust values are assigned to nodes in the network based on prior behavior, would lead to a weak form of authentication because the credibility used to distribute trust values could vary in a decentralized system. Reputation management systems have focused mostly on file-sharing applications and have not yet been applied to P2P-SIP.


1ITU-T Recommendation H.323, Packet-Based Multimedia Communications System, www.itu.int/rec/T-REC-H.323-200606-I/en. 1998.

2J. Rosenberg, H. Schulzrinne, G. Camarillo, J. Peterson, R. Sparks, M. Handley, and E. Schooler, “SIP: Session Initiation Protocol,” IETF RFC 3261, June 2002.

3H. Schulzrinne and J. Rosenberg, “A Comparison of SIP and H.323 for Internet telephony,” in Proceedings of NOSSDAV, Cambridge, U.K., July 1998.

4S. Fries and D. Ignjatic, “On the applicability of various MIKEY modes and extensions,” IETF draft, March 31, 2008.

5F. Audet, “The use of the SIPS URI scheme in the Session Initiation Protocol (SIP),” IETF draft, February 23, 2008.

6H. Schulzrinne, R. Frederick, and V. Jacobson, “RTP: A transport protocol for real-time applications,” IETF RFC 1889, Jan. 1996.

7T. Chen and C. Davis, “An overview of electronic attacks,” in Information Security and Ethics: Concepts, Methodologies, Tools and Applications, H. Nemati (ed.), Idea Group Publishing, to appear 2008.

8A. Chakrabarti and G. Manimaran, “Internet infrastructure security: a taxonomy,” IEEE Network, Vol. 16, pp. 13–21, Dec. 2002.

9http://nmap.org/.

10The common vulnerability and exposure list for SIP, http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=SIP.

11D. Endler and M. Collier, Hacking VoIP Exposed: Voice over IP Security Secrets and Solutions, McGraw-Hill, 2007.

12T. Koren, S. Casner, J. Geevarghese, B. Thompson, and P. Ruddy, “Enhanced compressed RTP (CRTP) for links with high delay,” IETF RFC 3545, July 2003.

13G. Pelletier and K. Sandlund, “Robust header compression version 2 (ROHCv2): Profiles for RTP, UDP, IP, ESP and UDP-Lite,” IETF RFC 5225, April 2008.

14S. Fries and D. Ignjatic, “On the applicability of various MIKEY modes and extensions,” IETF draft, March 31, 2008.

15P. Zimmermann, A. Johnston, and J. Callas, “ZRTP: Media path key agreement for secure RTP,” IETF draft, July 9, 2007.

16J. Peterson and C. Jennings, “Enhancements for authenticated identity management in the Session Initiation Protocol (SIP),” IETF RFC 4474, August 2006.

17C. Jennings, J. Peterson and M. Watson, “Private extensions to the Session Initiation Protocol (SIP) for asserted identity within trusted networks,” IETF RFC 3325, November 2002.

18J. Peterson and C. Jennings, “Enhancements for authenticated identity management in the Session Initiation Protocol (SIP),” IETF RFC 4474, August 2006.

19P. Ferguson and D. Senie, “Network ingress filtering: Defeating denial of service attacks which employ IP source address spoofing,” IETF RFC 2827, May 2000.

20H. Schulzrinne, D. Oran, and G. Camarillo, “The reason header field for the Session Initiation Protocol (SIP),” IETF RFC 3326, December 2002.

21K. Singh and H. Schulzrinne, “Peer-to-peer Internet telephony using SIP,” in 15th International Workshop on Network and Operating Systems Support for Digital Audio and Video, June 2005.

22H. Balakrishnan, M. Frans Kaashoek, D. Karger, R. Morris, and I. Stoica, “Looking up data in P2P systems,” Communications of the ACM, Vol. 46, No. 2, February 2003.

23S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, “A scalable content-addressable network,” in Proceedings of ACM SIGCOMM 2001.

24I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, “Chord: A scalable peer-to-peer lookup service for internet applications,” in Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, pp. 149–160, 2001.

25A. Rowstron and P. Druschel, “Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems,” in IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, pp. 329–350, 2001.

26B. Y. Zhao, L. Huang, J. Stribling, S. C. Rhea, A. D. Joseph, and J. D. Kubiatowicz, “Tapestry: A resilient global-scale overlay for service deployment,” IEEE Journal on Selected Areas in Communications, Vol. 22, No. 1, pp. 41–53, Jan. 2004.

27M. Srivatsa and L. Liu, “Vulnerabilities and security threats in structured overlay networks: A quantitative analysis,” in Proceedings of the 20th Annual Computer Security Applications Conference, Tucson, pp. 251–261, Dec. 6–10, 2004.

28J. Peterson and C. Jennings, “Enhancements for authenticated identity management in the Session Initiation Protocol (SIP),” IETF RFC 4474, August 2006.

29J. Peterson and C. Jennings, “Enhancements for authenticated identity management in the Session Initiation Protocol (SIP),” IETF RFC 4474, August 2006.

30D. Berbecaru, A. Lioy, and M. Marian, “On the complexity of public key certificate validation,” in Proceedings of the 4th International Conference on Information Security, Lecture Notes in Computer Science, Springer-Verlag, Vol. 2200, pp. 183–203, 2001.

31C. Jennings and J. Fischl, “Certificate management service for the Session Initiation Protocol (SIP),” IETF draft, April 5, 2008.

32Z. Anwar, W. Yurcik, R. Johnson, M. Hafiz, and R. Campbell, “Multiple design patterns for voice over IP (VoIP) security,” IPCCC 2006, pp. 485–492, April 10–12, 2006.
