Speech issues for the next generation of networks
The Network side of speech

Voice over Internet Protocol (VoIP) requires better-designed networks. It requires better-built applications. And it requires better network management and monitoring than either enterprise telecommunications systems or data-only networks have had in the past.

The good news for carriers and PBX vendors—and the bad news for enterprises—is that most companies simply are not ready to host their own IP-based voice networks.

Many network issues come into play when companies implement VoIP networks. But before delving into them, let’s step back and explore the nature of voice for the enterprise.

Voice in the Enterprise

There are four main categories of voice systems for enterprises: local enterprise voice systems, call center voice systems, automated call processing systems and toll bypass systems. Certainly there is overlap between some, if not all, of these categories, but where there are differences they are acute enough to warrant being considered individually.

Local enterprise voice systems are those systems that provide basic and advanced telephone services to the general workforce of enterprises at each of their facilities. Today’s systems typically consist of PBXs or ACDs, handsets and some form of message storage and retrieval. Tomorrow’s systems will consist of iPBXs or iACDs, handsets and some form of message storage and retrieval. They will also consist of a large number of services that we’re only now starting to contemplate and appreciate, like email, voice-mail and fax convergence, voice activated dialing from voice-managed integrated phone and email contact managers, and follow-me call routing.

Call center voice systems comprise all of the parts that are necessary to serve agents in an enterprise’s call center. These parts often include the PBXs or ACDs of the enterprise voice system. Additionally, they include advanced handsets or headsets, call flow managers (e.g. Genesys’s T-Server or Cisco’s ICM), and CTI (computer-telephony integration) applications that let agents coordinate their telephone conversations with pertinent data about the caller. Tomorrow’s systems will benefit from the changes in the enterprise voice systems, from more complete integration with automated call processing systems, and from better integration with the enterprise’s database applications.

Automated call processing systems are the systems that answer and service calls without involving a human. Today’s systems are mainly touch-tone based IVR (interactive voice response) systems and a smattering of voice recognition systems. Tomorrow’s systems will be completely voice recognition based, with touch-tone support being used simply to transition applications from IVRs to voice systems.

Toll bypass systems are those systems that allow enterprises to run their own telephone systems between their facilities on the enterprise data network or across the Internet, bypassing long distance carriers in the process. This is an area that has been the focus of the VoIP network vendors for the past couple of years. Given the continued erosion of long distance rates in the United States during the same period, it seems less likely that enterprises will be able to realize much of an ROI by building such a system, at least domestically. For connecting facilities worldwide, companies may still realize benefits from VoIP running over their international data network or across the Internet. (Note that toll bypass may be illegal in some countries.)

Voice and Packet Networks

Though it is easy to convert voice streams to packets and pass them on an IP network just like any other packets of data, voice isn’t like any other data and must receive special consideration. After all, we don’t listen to any other type of data.

The human ear is an amazing device. We can hear artifacts of ¼ second, or less in some cases, in an audio signal. When passing voice data across a data network, there are four main distortions that affect what we hear: packet loss, delay, delay variation and echo.

Echo is a problem with all telephony systems. It has two forms: acoustic echo and hybrid echo. Acoustic echo is caused by the acoustic signal being fed back into the call signal. It is introduced into the call at each end of a call and is affected by such things as handset sensitivity, distance and orientation of the microphone to the earpiece, and the acoustic characteristics of the location from which the person has called. The simplest solution is for everyone to use headsets, but since that isn’t practical, most modern hand or desk sets have some form of echo-cancellation circuitry. Hybrid echo is an artifact of the two-wire telephone system that we all use. All modern equipment has circuitry to minimize its effect.

While echo may be a problem that VoIP systems share with other telephony systems, it can be exacerbated when its effects are coupled with the other forms of distortion, which are unique to packetized voice. Packet loss, delay and delay variation are directly affected by network design and are the subject of many, if not all, of the QoS (quality of service) additions to IP routing in the past several years.

Packet loss occurs when packets are discarded by the routing elements in a network due to network congestion and resource contention. Since each packet of voice data contains a 10-25 ms piece of a voice stream, a listener will not perceive occasional single packet losses. When multiple packets are lost, the listener will hear little jumps in the speaker’s voice. As more packets are lost, the distortion becomes acute and normal conversation is difficult. Voice-based automated call processing systems are particularly affected by packet loss.

Delay is the late delivery of packets to the listeners of a call and has three basic causes: in-route handling, serialization and propagation. Generally, delays of less than 250 ms do not impact normal conversation. Delays in the 250-600 ms range become increasingly frustrating, but conversation can occur. When delays exceed 600 ms, conversation is impaired.
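As a rough illustration, the thresholds above can be written down directly. The function names and category labels below are invented for this sketch, and the 20 ms packet size in the usage lines is simply one value from the 10-25 ms range mentioned earlier.

```python
def classify_delay(delay_ms: float) -> str:
    """Map a delay in milliseconds onto the three bands described in the text."""
    if delay_ms < 0:
        raise ValueError("delay cannot be negative")
    if delay_ms < 250:
        return "no impact on normal conversation"
    if delay_ms <= 600:
        return "increasingly frustrating"
    return "conversation impaired"


def lost_audio_ms(consecutive_lost_packets: int, packet_ms: float) -> float:
    """Length of the audible gap left by a run of consecutive lost packets."""
    return consecutive_lost_packets * packet_ms


# Illustrative values only (assumptions, not measurements).
print(classify_delay(180))
print(classify_delay(400))
print(lost_audio_ms(3, 20))  # three lost 20 ms packets leave a 60 ms gap
```

A single lost packet removes only a few tens of milliseconds of audio, which is why isolated losses tend to go unnoticed while runs of losses do not.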

Propagation delay is caused by the limits imposed by physics. The speed of light, the transmission characteristics of copper and glass fiber, the speed of sound: in the end, it takes an inherent amount of time to get from here to there and back, regardless of where here and there are. Our ability to control propagation delay is limited to which transmission medium we choose, its length and the transmission speed. While it may seem that propagation would not be significant in most network designs, coupling it with in-route handling delays can have noticeable effects on the quality of the voice heard by the listeners.
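To get a feel for the numbers, propagation delay is simply distance divided by signal speed. The sketch below assumes a velocity factor of roughly two-thirds the speed of light, a commonly quoted approximation for glass fiber; the function name and the 4,000 km example distance are illustrative assumptions.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in a vacuum


def propagation_delay_ms(distance_km: float, velocity_factor: float = 0.67) -> float:
    """One-way propagation delay over a medium.

    velocity_factor is the fraction of the speed of light at which the
    signal travels; 0.67 is an assumed, approximate figure for fiber.
    """
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity_factor must be in (0, 1]")
    speed_km_per_s = SPEED_OF_LIGHT_KM_PER_S * velocity_factor
    return distance_km * 1000.0 / speed_km_per_s


# A hypothetical 4,000 km fiber path costs about 20 ms each way.
print(round(propagation_delay_ms(4000), 1))
```

On its own this is small next to the 250 ms threshold, which is why it only becomes noticeable once in-route handling delays are added on top.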

Serialization delay is the time it takes to place an incoming stream of parallel data bits onto a serial transmission interface, or vice-versa. In the bigger scheme of things it is small, and choosing faster transmission speeds reduces it further. This is particularly important at congestion points in the network.

In-route handling of voice is the problem area that we have the most tools to combat. In-route handling delays are caused by:

The codecs used to convert analog voice to and from digital voice.
Compression of the voice stream.
Packetization of the voice stream.
Queuing of the voice packets as they traverse network routing elements.
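The delay components discussed so far add up to a simple budget. The sketch below computes serialization delay from packet size and link speed and sums it with the other contributions. All of the figures in the usage lines, the 200-byte packet and the class and field names are hypothetical values chosen for illustration, not measurements.

```python
from dataclasses import dataclass


def serialization_delay_ms(packet_bytes: int, link_bps: float) -> float:
    """Time to clock one packet onto a serial link of the given speed."""
    return packet_bytes * 8 * 1000.0 / link_bps


@dataclass
class DelayBudget:
    """One-way delay contributions in milliseconds (illustrative fields)."""
    codec_ms: float
    compression_ms: float
    packetization_ms: float
    queuing_ms: float
    serialization_ms: float
    propagation_ms: float

    def in_route_ms(self) -> float:
        # The four in-route handling causes listed in the text.
        return (self.codec_ms + self.compression_ms
                + self.packetization_ms + self.queuing_ms)

    def total_ms(self) -> float:
        return self.in_route_ms() + self.serialization_ms + self.propagation_ms


# A 200-byte packet on a 64 kbps link takes 25 ms to serialize.
slow_link = serialization_delay_ms(200, 64_000)
budget = DelayBudget(codec_ms=5, compression_ms=5, packetization_ms=20,
                     queuing_ms=10, serialization_ms=slow_link,
                     propagation_ms=20)
print(slow_link, budget.in_route_ms(), budget.total_ms())
```

Laying the budget out this way makes it easy to see which component to attack first when the total approaches the point where conversation suffers.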

The last form of distortion experienced by callers using VoIP systems is delay variation, also known as jitter. Jitter occurs when the packets that make up a voice stream arrive at their intended destination at varying time intervals. Normally the packets in a voice stream arrive at a steady rate: every few milliseconds the next packet is expected. If the next packet does not arrive as expected, the voice stream has jitter. If the variation in arrival time becomes severe, the voice that the listener hears will be choppy.
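One simple way to put a number on jitter is to compare each gap between consecutive arrivals with the interval at which packets were sent. The sketch below uses the mean absolute deviation of those gaps; this is an assumption made for clarity and is not the smoothed estimator that RTP implementations typically report. The function names and sample timestamps are hypothetical.

```python
from typing import List, Sequence


def interarrival_gaps(arrival_ms: Sequence[float]) -> List[float]:
    """Time between each pair of consecutive packet arrivals."""
    return [later - earlier for earlier, later in zip(arrival_ms, arrival_ms[1:])]


def mean_jitter_ms(arrival_ms: Sequence[float], expected_interval_ms: float) -> float:
    """Average amount by which arrival gaps deviate from the expected interval."""
    gaps = interarrival_gaps(arrival_ms)
    if not gaps:
        return 0.0
    deviations = [abs(gap - expected_interval_ms) for gap in gaps]
    return sum(deviations) / len(deviations)


# Packets sent every 20 ms; the third one arrives 5 ms late.
steady = [0, 20, 40, 60, 80]
uneven = [0, 20, 45, 60, 80]
print(mean_jitter_ms(steady, 20))  # 0.0
print(mean_jitter_ms(uneven, 20))  # 2.5
```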

Overcoming QoS Problems

So, the biggest components of QoS problems of VoIP systems are the choice of transmission media and speed, packet loss, packet delay and jitter. Unfortunately, these problems can be worsened when they are coupled with the typical problems faced on many existing data networks.

To mitigate the cumulative effects of combining voice, with its near real-time requirements, and data, which generally has less time sensitivity, a number of QoS mechanisms and protocols have been developed. In the broadest view these can be broken into issues that affect the edge of a network and those that affect the core of the network.

The mechanisms that are used to control QoS issues on the network’s edge include: selected protocol header compression, packet buffering, packet queuing, packet classification and shaping traffic flows. In the network core, the mechanism to control QoS issues is high-speed packet queuing.

Packet queuing, packet classification and shaping traffic flows are all mechanisms that first identify a packet as a voice packet and then give it some priority over other data packets in how it is forwarded in the network. In networks that experience jitter, packet buffering will hold some number of voice packets before forwarding them. This buffering masks the variation in arrival times of the voice packets from the listener, at the expense of increased one-way delay.
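The two ideas in this paragraph can be sketched in a few lines. The first part is a strict-priority queue that always forwards voice ahead of data; the second counts how many packets would miss their playout slot for a given fixed jitter buffer. Both are simplified illustrations under stated assumptions (packets already classified, no packet loss or reordering), and every name in the block is hypothetical rather than taken from any vendor’s implementation.

```python
import heapq
from itertools import count

VOICE, DATA = 0, 1  # lower value is forwarded first


class StrictPriorityQueue:
    """Forwards every queued voice packet before any data packet."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker keeps FIFO order within a class

    def enqueue(self, traffic_class, payload):
        heapq.heappush(self._heap, (traffic_class, next(self._arrival), payload))

    def dequeue(self):
        if not self._heap:
            return None
        _, _, payload = heapq.heappop(self._heap)
        return payload


def late_packets(arrival_ms, interval_ms, buffer_ms):
    """Count packets arriving after their playout time.

    Packet n is played at first_arrival + buffer_ms + n * interval_ms,
    so a deeper buffer tolerates more jitter but adds buffer_ms of delay.
    """
    if not arrival_ms:
        return 0
    start = arrival_ms[0] + buffer_ms
    return sum(1 for n, t in enumerate(arrival_ms) if t > start + n * interval_ms)


q = StrictPriorityQueue()
for cls, name in [(DATA, "d1"), (VOICE, "v1"), (DATA, "d2"), (VOICE, "v2")]:
    q.enqueue(cls, name)
print([q.dequeue() for _ in range(4)])  # ['v1', 'v2', 'd1', 'd2']

arrivals = [0, 20, 55, 60, 80]  # sent every 20 ms, third packet delayed
print(late_packets(arrivals, 20, 0), late_packets(arrivals, 20, 20))  # 1 0
```

Strict priority is the simplest possible policy and can starve data traffic under heavy voice load, which is one reason real equipment usually pairs it with traffic shaping.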

Network Design

By now it should be evident that the key to quality packetized voice is the network design. What constitutes good network design varies somewhat based on which vendor is talking to you. Given the QoS issues in the preceding paragraphs, at the very least a good multiservice network design needs an edge and a core. One way to organize the network elements into an edge and core design is to use a campus hierarchical switching model, such as Cisco’s “Multilayer Campus Design”. In its basic configuration this model has three distinct layers: access layer, distribution layer and core layer, as seen in Figure 1.

Figure 1: Campus Hierarchical Switching Model

The access layer is the edge of the network. It is where all of the individual workstations and VoIP phones of the enterprise are attached to the network. Additionally, it is where departmental class servers are attached. Lastly, it is also where WAN links to other networks are attached.

In this model, the QoS mechanisms for the edge of the network would be applied in two ways. First, for links to the PSTN, header compression and packet buffering would be enabled at the telephony endpoints, as shown in Figure 2. Second, for WAN-based IP links to other networks, packet queuing, packet classification and traffic shaping would be enabled at either the access-layer switch or the WAN link router, as seen in Figure 3.

Figure 2: Telephony Access Node QoS Configuration

Figure 3: VoIP Across a WAN Link QoS Configuration

Figure 4: Core Network QoS Configuration

In the network core, high-speed queuing would typically be enabled in the distribution layer switches (see Figure 4). Depending on the capabilities of the core layer switches, it can be enabled there as well. But generally speaking, it is best to avoid policy-based routing decisions, such as QoS mechanisms, in the core switches. For best performance, traffic should flow through the core as unimpeded as possible, and routing decisions there should be limited to simple route choices.

Smaller enterprises would most likely not implement a network model as deep as the campus hierarchy. One of the beauties of the model is that the distribution and core layers can be collapsed into a single layer and the model will still scale well.

Ideally, a multiservice network would be built from layer 3 switches that possess QoS capabilities, VLAN support, link trunking or high-speed links, routing, and support for modern route advertisement protocols. The reality is that most enterprises have an amalgam of equipment that reflects some combination of former network designers’ preferences and the lowest-cost bid at the time of acquisition. Can these existing networks be migrated into a network capable of delivering quality voice services? Maybe.

Because of the volume of calls that can be handled by automated call processing platforms there are some special considerations that apply to them. They will be covered later.

The first issue that must be considered is which voice services are going to be extended over the data network. Second, a reasonably detailed model of the traffic expected for each of the supported voice services must be developed. It is also important that the data traffic be studied and a model developed. When both models are available, they should be combined to give a total view of the expected network load.
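A first-cut traffic model can be little more than arithmetic. The sketch below assumes a 64 kbps codec sending 160 bytes of audio every 20 ms (50 packets per second) with 40 bytes of IP, UDP and RTP headers per packet; these figures, the function names and the example call and data volumes are assumptions for illustration, and link-layer overhead is ignored.

```python
def voice_call_kbps(payload_bytes: int = 160,
                    header_bytes: int = 40,
                    packets_per_second: int = 50) -> float:
    """Bandwidth of one voice stream, in one direction, at the IP layer."""
    bits_per_packet = (payload_bytes + header_bytes) * 8
    return bits_per_packet * packets_per_second / 1000


def converged_load_kbps(concurrent_calls: int, data_kbps: float) -> float:
    """Combine the voice model and the data model into one expected load."""
    return concurrent_calls * voice_call_kbps() + data_kbps


print(voice_call_kbps())                 # 80.0 kbps per call, per direction
print(voice_call_kbps(header_bytes=4))   # with compressed headers: 65.6
print(converged_load_kbps(100, 20_000))  # 100 calls plus 20 Mbps of data
```

Note how much of each packet is header: with the assumed figures, header compression on a slow link recovers close to a fifth of the per-call bandwidth.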

The third step is to complete a preliminary network design based on the converged model. Add layers and network routing elements as appropriate to the size of your organization.

The fourth step is to perform an assessment of the existing network infrastructure. Some important things to do are:

Any hubs in the network should be retired or placed where they will not carry voice traffic. Hubs create a collision domain that spans every port on the hub, and more if hubs are interconnected. As network traffic across a hub ebbs and flows, voice packets will be involved in collisions. When they are, the retry logic of Ethernet will automatically introduce jitter into the voice stream. Except in the very smallest of networks, hubs will not provide the performance needed for quality voice.

Assess the capabilities of the existing layer-two switches. Consider replacing any that are unmanaged. Good network monitoring is a key to keeping a VoIP system running smoothly.

Identify layer-two switches that lack QoS and VLAN capabilities. If either a telephony access node or a WAN link router is going to be connected to a switch that lacks QoS services, be sure that the missing service is available in the device being connected. For instance, the telephony access node should have jitter buffering and perhaps header compression, while the WAN link router should have the full suite of QoS mechanisms discussed above.

Upgrade any layer 3 switches that lack QoS mechanisms and the latest routing protocols, particularly protocols such as OSPF that can quickly compensate for routing element failures.
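The assessment items above amount to a small set of rules that can be run against an equipment inventory. The sketch below is one possible encoding; the Device record, its field names and the wording of the findings are all hypothetical and would need to match whatever inventory data an enterprise actually keeps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    """Hypothetical inventory record; field names are illustrative only."""
    name: str
    kind: str                           # "hub", "l2_switch" or "l3_switch"
    managed: bool = True
    has_qos: bool = True
    has_vlan: bool = True
    fast_failover_routing: bool = True  # e.g. runs OSPF


def assess(device: Device) -> List[str]:
    """Return the follow-up actions the checklist suggests for one device."""
    actions = []
    if device.kind == "hub":
        actions.append("retire, or move off any path that carries voice")
    elif device.kind == "l2_switch":
        if not device.managed:
            actions.append("consider replacing: unmanaged, cannot be monitored")
        if not device.has_qos:
            actions.append("ensure attached access nodes and WAN routers supply QoS")
        if not device.has_vlan:
            actions.append("note missing VLAN support")
    elif device.kind == "l3_switch":
        if not device.has_qos or not device.fast_failover_routing:
            actions.append("upgrade: needs QoS and a fast-converging routing protocol")
    return actions


inventory = [
    Device("lobby-hub", "hub"),
    Device("floor2-sw", "l2_switch", managed=False, has_qos=False),
    Device("core-1", "l3_switch"),
]
for device in inventory:
    print(device.name, assess(device))
```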

The fifth step is to reassess the preliminary network design to incorporate or remove suspect elements identified in step four. At this point you are in a position to complete the requisite financial and business planning processes necessary in your enterprise to upgrade old devices and procure the new devices.

Before wrapping up, the automated call processing platforms bear some discussion. Many of these platforms either come complete with, or specify that they should be installed on, their own dedicated network. Consider a system that is capable of processing 2688 calls. All calls are routed through the telephony access node across an IP network to the processing point. As long as the calls are being serviced by the automation, there is absolutely no reason to inject those calls onto your enterprise network. Treating the entire call-processing platform as a telephony access node means that you will not need to build up your enterprise network to service those calls. The overall network design can be smaller and less expensive because of that.

Creating a multi-service network requires the consideration of many more factors than were necessary to build a data network. Additionally, running a multi-service network is not only more complex, but it requires better supervision. In the end, however, having a converged voice and data network will allow the consolidation of staff and the elimination of old telephony and data networking devices. VoIP should be able to give quite attractive ROIs to businesses in addition to providing new and enhanced services to the enterprise.

Richard D. Houser is CTO of verascape inc. He can be reached at rhouser@verascape.com.
