Lecture #18 - Implementing Quality of Service
The Need for Quality of Service
The best-effort model implemented by the Internet Protocol provides a high level of service when the network is under minimal load, but the level of service decreases as the load increases. This is partly because IP does not control what traffic can enter the network, how much traffic a host may send or at what rate it can send it. Furthermore, all traffic is treated as equal.
Whilst this approach is suitable for applications such as file transfers, it is not viable when applications need the network to provide a better level of service. For applications such as Voice over IP (VoIP), streaming multimedia and online gaming, low delay and low jitter are more important than bandwidth.
What is Quality of Service?
Quality of Service (QoS) refers to the network's ability to provide a higher level of service than that afforded by best-effort delivery. These higher levels of service will often be provided only to certain types of traffic.
The aim of QoS is to provide consistent or "guaranteed" service responses for one or more of the following:
- Bandwidth
- Delay
- Jitter
- Packet loss
From a network perspective, this means:
- Providing guaranteed bandwidth.
- Reducing packet loss.
- Shaping network traffic.
- Providing traffic prioritisation.
- Managing or preventing network congestion.
Approaches to Quality of Service
There are two main approaches to QoS - Integrated Services (or IntServ) and Differentiated Services (or DiffServ). We will look at each of these approaches in some detail.
Integrated Services
Integrated Services was the initial approach to QoS. In this approach an application signals its service requirements in advance using the Resource reSerVation Protocol (RSVP). If the network can support this request, resources are allocated for this application and the reservation is granted. The application can now proceed to send data that will be afforded the requested level of service.
The Resource Reservation (RR) remains in force so long as the client regularly refreshes the router's state. An application is expected to stay within its approved traffic profile. Traffic which exceeds the agreed profile may be downgraded or discarded.
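The admission and refresh behaviour described above can be sketched as a small soft-state table. This is an illustration only, not the RSVP message format or any router's implementation: the class name, the 30 second lifetime used in the usage note and the single shared capacity figure are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ReservationTable:
    """Soft-state reservations: an entry lapses unless it is refreshed."""
    capacity_bps: int                  # bandwidth available for reservations (assumed)
    lifetime_s: float = 30.0           # hypothetical lifetime of an unrefreshed entry
    entries: dict = field(default_factory=dict)  # flow_id -> (rate_bps, expires_at)

    def reserved_bps(self) -> int:
        """Total rate currently promised to all flows."""
        return sum(rate for rate, _ in self.entries.values())

    def expire(self, now: float) -> None:
        """Forget every reservation whose lifetime has run out."""
        self.entries = {f: (r, t) for f, (r, t) in self.entries.items() if t > now}

    def request(self, flow_id: str, rate_bps: int, now: float) -> bool:
        """Admission control: grant the request only if the rate still fits."""
        self.expire(now)
        already = self.entries.get(flow_id, (0, 0.0))[0]
        if self.reserved_bps() - already + rate_bps > self.capacity_bps:
            return False
        self.entries[flow_id] = (rate_bps, now + self.lifetime_s)
        return True

    def refresh(self, flow_id: str, now: float) -> bool:
        """Extend a live reservation; fails if it has already lapsed."""
        self.expire(now)
        if flow_id not in self.entries:
            return False
        rate, _ = self.entries[flow_id]
        self.entries[flow_id] = (rate, now + self.lifetime_s)
        return True
```

With a capacity of 1 Mbit/s, a 600 kbit/s request is granted, a second 600 kbit/s request is refused, and both outcomes depend on one table entry per flow, which is the per-application state the router has to carry.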
Typical grades of service provided by IntServ:
- Guaranteed "Quality of Service":
  - Provides an assured data rate.
  - Guarantees an upper bound on queueing delay.
  - Has no queueing loss (due to buffer overflow).
- Controlled Load QoS:
  - Attempts to keep queueing delay "small", however there is no guaranteed upper-bound.
  - Non-conforming traffic may be downgraded to "best effort" delivery.
- "Best Effort" QoS:
  - No assured data rate.
  - No upper bound on queueing delay.
  - May discard packets to relieve congestion.
Challenges with IntServ
The "Integrated Services" approach imposes "per application" state on routers - this does not scale well!
In addition to the job of maintaining the routing tables and actually routing packets, the Integrated Services approach requires the router to:
- Manage a database of the RSVP streams which pass through the router.
- Allocate high QoS packets to a priority queue. This involves measuring the characteristics of each stream to make sure it conforms to its agreed service profile.
- Downgrade or discard packets which exceed the agreed service profile.
Differentiated Services
The Differentiated Services (DiffServ) approach provides a core network that can support a limited number of grades of service. Admission is controlled by border routers which police and shape the traffic entering the network so that the core network cannot become overloaded.
- DiffServ reuses an existing header field: the IPv4 "Type of Service" byte (the "Traffic Class" byte in IPv6).
- A "Service Level Agreement" has to be agreed between the client and "the network". This avoids the need to modify applications to work with DiffServ.
- DiffServ provides a built-in aggregation mechanism. All traffic with the same "Type of Service" value is treated the same.
- Routers do not have to keep state for each data flow. This provides good scaling for larger networks and heavier traffic flows.
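As a rough illustration of the aggregation point, the sketch below packs and unpacks a DiffServ codepoint (DSCP), assuming the RFC 2474 layout in which the upper six bits of the byte carry the DSCP and the lower two bits are left for ECN. The function names are invented for this example.

```python
from collections import Counter


def tos_byte(dscp: int, ecn: int = 0) -> int:
    """Build the 8-bit ToS / Traffic Class value from a 6-bit DSCP and 2-bit ECN."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must fit in 2 bits")
    return (dscp << 2) | ecn


def dscp_of(tos: int) -> int:
    """Extract the DSCP (upper six bits)."""
    return (tos >> 2) & 0x3F


def precedence_of(tos: int) -> int:
    """Extract the older IP precedence value (upper three bits)."""
    return (tos >> 5) & 0x07


def bytes_per_class(packets):
    """Aggregate (tos, size) pairs by DSCP: one counter per class, none per flow."""
    totals = Counter()
    for tos, size in packets:
        totals[dscp_of(tos)] += size
    return dict(totals)
```

For instance, the Expedited Forwarding codepoint 46 becomes the byte value 0xB8, which an older precedence-only router would read as precedence 5.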
DiffServ Per Hop Behaviours (PHBs)
The Per Hop Behaviour (PHB) defines the service that is to be provided to a DiffServ traffic class. DiffServ defines three primary PHBs, although additional traffic classes can be defined.
Expedited Forwarding (EF) PHB
EF provides a premium grade service with low delay, low loss and low jitter, making it suitable for realtime services such as video and voice. EF traffic will usually be given strict priority queueing, being handled ahead of other traffic classes. Admission of EF traffic to the network is usually strictly controlled in order to maintain a high level of service.
Assured Forwarding (AF) PHB
AF provides an assurance of delivery provided that the traffic does not exceed the subscribed data rate. Within the AF PHB four individual traffic classes are defined, with packets within each class being assigned a low, medium or high drop precedence. If congestion occurs between classes, traffic within the higher classes is given priority. If congestion occurs within a class then packets with a higher drop precedence will be dropped first.
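The AF classes and drop precedences map onto codepoints in a regular way: class x with drop precedence y (written AFxy) uses DSCP 8x + 2y, as defined in RFC 2597. The helper names below, and the simple "pick the packet with the highest drop precedence" rule, are assumptions made for this sketch rather than a description of a real router.

```python
def af_dscp(af_class: int, drop_precedence: int) -> int:
    """DSCP for AFxy, with class x in 1..4 and drop precedence y in 1..3."""
    if not 1 <= af_class <= 4 or not 1 <= drop_precedence <= 3:
        raise ValueError("AF class is 1..4 and drop precedence is 1..3")
    return 8 * af_class + 2 * drop_precedence


def parse_af(dscp: int):
    """Return (class, drop precedence) for an AF codepoint, or None otherwise."""
    af_class, drop = dscp >> 3, (dscp & 0b111) >> 1
    if dscp & 1 or not 1 <= af_class <= 4 or not 1 <= drop <= 3:
        return None
    return af_class, drop


def pick_victim(queue_dscps):
    """Index of the queued packet to drop first when one AF class is congested."""
    return max(range(len(queue_dscps)), key=lambda i: parse_af(queue_dscps[i])[1])
```

For a class 1 queue holding AF11, AF13 and AF12 packets (DSCP 10, 14 and 12), the AF13 packet at index 1 is the first to go.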
Default PHB
Any traffic not assigned to another traffic class will be placed in the default PHB. The Default PHB provides best-effort characteristics.
Queueing Policies
Queueing policies are one of the tools available for traffic engineering. By adjusting the way in which traffic is queued and prioritised, a router is able to provide differing qualities of service. Some of the following queueing policies are Cisco-specific, but most routers with QoS support will provide similar functionality.
First-In-First-Out (FIFO)
FIFO is the basic store-and-forward queueing policy. It provides no special treatment - all packets are processed "in sequence".
Priority Queueing (PQ)
PQ gives strict priority to "important" traffic by handling it first. PQ can flexibly prioritise according to network protocol, incoming interface, packet size, source/destination address etc.
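A minimal sketch of strict priority dequeueing follows. The number of levels and the way a packet is assigned to a level are assumptions; a real router would classify on the fields listed above.

```python
from collections import deque


class StrictPriorityQueue:
    """Always serve the highest-priority non-empty queue first."""

    def __init__(self, levels: int = 4):
        # Index 0 is the most important level; 4 levels is an assumed default.
        self.queues = [deque() for _ in range(levels)]

    def enqueue(self, packet, level: int) -> None:
        self.queues[level].append(packet)

    def dequeue(self):
        """Return the next packet to send, or None if every queue is empty."""
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None
```

Because lower levels are only served when every higher level is empty, a steady stream of high-priority traffic can starve the rest, which is one reason admission to the top level is normally restricted.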
Custom Queueing (CQ) or Class Based Queueing (CBQ)
CQ (or CBQ) reserves a percentage of available bandwidth of an interface for each selected traffic class. Other classes may use ("borrow") any unused bandwidth, depending on the exact configuration.
Flow-Based Weighted Fair Queueing (WFQ)
WFQ provides fairness to flows by processing an equal number of bytes from each flow in a round-robin approach, providing a predictable level of service. Flows are automatically identified based on source IP, destination IP, source port, destination port and session identifier (if applicable). Each flow is provided with its own queue, if packets for that flow need to be buffered.
The IP precedence bits are used as weights, allowing certain flows to be given priority over others.
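The byte-fair, round-robin idea can be sketched with deficit round robin (DRR). Note that DRR is a simpler stand-in chosen for illustration, not the WFQ algorithm a router vendor actually implements; the quantum, the weight values and the way a weight would be derived from IP precedence are all assumptions.

```python
from collections import deque


class DeficitRoundRobin:
    """Serve flows in turn, giving each a byte allowance proportional to its weight."""

    def __init__(self, quantum: int = 500):
        self.quantum = quantum   # bytes of credit added per round at weight 1
        self.flows = {}          # flow key -> {"queue", "deficit", "weight"}

    def enqueue(self, flow_key, size: int, weight: int = 1) -> None:
        """Queue a packet of `size` bytes; each flow gets its own queue on first use."""
        flow = self.flows.setdefault(
            flow_key, {"queue": deque(), "deficit": 0, "weight": weight})
        flow["queue"].append(size)

    def service_round(self):
        """Visit every backlogged flow once; return the (flow, size) pairs sent."""
        sent = []
        for key, flow in self.flows.items():
            if not flow["queue"]:
                continue
            flow["deficit"] += self.quantum * flow["weight"]
            while flow["queue"] and flow["queue"][0] <= flow["deficit"]:
                size = flow["queue"].popleft()
                flow["deficit"] -= size
                sent.append((key, size))
            if not flow["queue"]:
                flow["deficit"] = 0   # an idle flow may not hoard credit
        return sent
```

A flow key here would typically be the tuple of source IP, destination IP, source port, destination port and protocol. With equal weights each backlogged flow sends about the same number of bytes per round; doubling a flow's weight doubles its share.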
Class-Based Weighted Fair Queueing (CBWFQ)
CBWFQ is very similar to plain WFQ, except that traffic is grouped into classes rather than being treated as individual flows. A minimum bandwidth can be assigned to a given class. Traffic is processed so that these bandwidth constraints are met.
IP RTP Priority
This feature allows delay-sensitive data to be dequeued and sent before any other traffic. RTP traffic is identified as UDP traffic whose port numbers fall within a specified range.
Low Latency Queue (LLQ)
LLQ provides for strict priority queueing on serial interfaces. Unlike IP RTP Priority, LLQ is not limited to UDP traffic.
Congestion Avoidance
Rather than waiting until buffers reach capacity, improved service levels can be achieved by preemptively taking action to prevent buffer exhaustion. When a tail drop queue is used, once the buffer reaches capacity any data destined for output via the given interface will be discarded. This can lead to a single flow being heavily penalised and it can also lead to synchronisation issues with TCP congestion control.
Random Early Detection (RED)
Random Early Detection (RED) works by monitoring the average queue size and dropping or marking packets based on a statistical probability, which increases as the queue size increases. No packets are dropped when the queue size is small, but as the queue size increases more packets are dropped at random. As a result, no single flow is unfairly penalised. Additionally, there is a good chance of dropping packets from multiple flows, which should lead to an overall reduction in traffic across all sources.
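The drop decision can be sketched as below, assuming the common formulation in which an exponentially weighted moving average of the queue length is compared against a minimum and a maximum threshold, with the drop probability rising linearly between them. The parameter values are placeholders and the sketch omits refinements found in published RED descriptions, such as spacing drops out by counting packets since the last drop.

```python
import random


class RedQueue:
    """Simplified RED drop decision based on an averaged queue length."""

    def __init__(self, min_th: float, max_th: float,
                 max_p: float = 0.1, weight: float = 0.002):
        self.min_th = min_th    # below this average, never drop
        self.max_th = max_th    # at or above this average, always drop
        self.max_p = max_p      # drop probability just below max_th
        self.weight = weight    # smoothing factor for the moving average
        self.avg = 0.0

    def update_average(self, queue_len: int) -> float:
        """Fold the instantaneous queue length into the moving average."""
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        return self.avg

    def drop_probability(self) -> float:
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def should_drop(self, queue_len: int, rng=random.random) -> bool:
        """Decide the fate of an arriving packet; `rng` is injectable for testing."""
        self.update_average(queue_len)
        return rng() < self.drop_probability()
```

Using the average rather than the instantaneous queue length lets short bursts through without drops while still reacting to sustained congestion.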
Weighted Random Early Detection (WRED)
WRED is the same as RED, however the probability is weighted to alter the drop precedence and queue size thresholds for certain traffic classes. The weighting is usually associated with the IP precedence given in the ToS field, or through the DiffServ traffic class (PHB).
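The weighting can be pictured as one threshold profile per traffic class. In the sketch below the profiles are keyed by IP precedence and every number is hypothetical, chosen only so that higher precedence traffic starts being dropped later.

```python
# Hypothetical profiles: precedence -> (min threshold, max threshold, max probability)
PROFILES = {
    0: (20, 40, 0.10),
    3: (26, 40, 0.10),
    5: (32, 40, 0.10),
}


def wred_drop_probability(avg_queue: float, precedence: int,
                          profiles=PROFILES) -> float:
    """Drop probability for a packet, using the profile of its precedence."""
    # Unknown precedence values fall back to the precedence 0 profile (an assumption).
    min_th, max_th, max_p = profiles.get(precedence, profiles[0])
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)
```

At an average queue size of 30, precedence 0 packets already face a 5% drop probability while precedence 5 packets are not yet dropped at all.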
Traffic Shaping and Policing
Shaping is the act of queueing a data flow so that it conforms to a well-behaved traffic profile. Policing is the enforcement of such a profile. Both are usually based on the Token Bucket flow control model, here shown as a traffic shaper.
Tokens are added to a bucket at the average data rate. Incoming data is allowed through if it can get enough tokens (one per byte). In the shaper, data that cannot get through is queued until it can. In the policer, data that cannot get through is discarded.
Any tokens overflowing from the bucket are lost. Under some circumstances, it is possible to get tokens on credit. The end result is that the average data rate is limited to the average token rate. Limited-size data bursts are possible (up to the bucket size).
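The following sketch shows one token bucket used both ways: as a policer that discards non-conforming data and as a shaper that delays it. It assumes one token per byte, a bucket that starts full and timestamps supplied by the caller; the class and method names are invented for the example.

```python
class TokenBucket:
    """Token bucket with one token per byte, usable as a policer or a shaper."""

    def __init__(self, rate_bytes_per_s: float, bucket_size: float):
        self.rate = rate_bytes_per_s
        self.size = bucket_size
        self.tokens = float(bucket_size)   # assumption: the bucket starts full
        self.last = 0.0                    # time up to which tokens are accounted

    def _refill(self, now: float) -> None:
        """Add tokens for the elapsed time; overflow beyond the bucket size is lost."""
        if now > self.last:
            self.tokens = min(self.size, self.tokens + (now - self.last) * self.rate)
            self.last = now

    def police(self, nbytes: int, now: float) -> bool:
        """Policer: True if the packet conforms, False if it should be discarded."""
        self._refill(now)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

    def shape(self, nbytes: int, now: float) -> float:
        """Shaper: return the time at which the packet may be released."""
        start = max(now, self.last)        # cannot overtake data already queued
        self._refill(start)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return start
        wait = (nbytes - self.tokens) / self.rate
        self.tokens = 0.0                  # all tokens are used at release time
        self.last = start + wait
        return start + wait
```

With a rate of 1000 bytes per second and a 1500 byte bucket, an initial 1500 byte burst passes immediately; the policer then rejects a 500 byte packet offered a quarter of a second later, whereas the shaper would hold the same packet until enough tokens had accumulated.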
Committed Access Rate (CAR) Rate Limiting
This is an application of token bucket policing. It controls input or output data rates and specifies control policies when traffic either conforms to or exceeds the rate limit. Traffic can be matched on a "per interface" basis, or finer control can be exercised. Flows can be classified by physical port, packet type, IP address, MAC address, application flow etc.
Excess traffic can either be dropped or marked down in priority. CAR propagates conforming bursts, but adds no delay. It may be implemented on incoming or outgoing interfaces.