Key Audio Quality Metrics Used in the Skype for Business Monitoring Reports

Here are the 7 audio quality metrics I find most effective for analyzing Skype for Business audio quality, along with the values that indicate an audio quality issue.

Quality metric, key notes, and thresholds:
Avg. listening MOS: Mean Opinion Score (MOS) is the gold standard for gauging perceived audio quality (an algorithm calculates how a typical user would rate the voice quality). It is a rating on a scale from 1 to 5:
  • 5 – excellent
  • 4 – good
  • 3 – fair
  • 2 – poor
  • 1 – bad
Avg. round trip: Network Round Trip Time (RTT) is the most common measure of latency and is measured in ms. This metric is the average round trip time for RTP packets between endpoints.

When latency is high, users will likely hear the words, but there will be delays in sentences and words.

For RTP packets as reported in the monitoring reports:

  • < 200 ms is good
  • > 200 ms is poor
  • > 500 ms is bad

For the ICMP echo packets sent by the ping command:

  • < 100 ms is considered decent
  • > 150 ms is considered problematic
Avg. jitter: Jitter (ms) measures the variability of packet delay; high jitter results in distorted or choppy audio. VoIP packets are sent at regular intervals from the sender to the receiver, but because network delay fluctuates, the interval between packets can vary at the destination.
Jitter can also increase latency, because the receiver buffers packets to smooth out the variation.
  • < 20 ms is good
  • > 30 ms is not good (but can be ‘ok’)
  • > 45 ms is considered very bad
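
To make "variability of packet delay" concrete, here is a minimal sketch of a running jitter estimate in the style of RFC 3550 (section 6.4.1). It works on timestamps in ms rather than RTP timestamp units, and the function and parameter names are illustrative. Whether the monitoring reports compute their jitter value in exactly this way is an assumption.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate (ms) in the style of RFC 3550, section 6.4.1.

    send_times_ms and recv_times_ms are hypothetical per-packet timestamps,
    given in the same order and the same unit.
    """
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        # Change in transit time between two consecutive packets
        d = (recv_times_ms[i] - recv_times_ms[i - 1]) - (
            send_times_ms[i] - send_times_ms[i - 1]
        )
        # Smooth the absolute difference with a gain of 1/16
        jitter += (abs(d) - jitter) / 16.0
    return jitter
```

Packets that arrive with the same spacing they were sent with give a jitter of 0, however large the constant delay is. Only changes in delay move the estimate.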
Avg. packet loss rate: Packet Loss (%) is the percentage of packets that did not make it to their destination. Packet loss causes audio to be distorted or missing on the receiver's end.
  • < 3% is considered good
  • an average packet loss rate of > 5% will impact audio
  • > 7% is not good (some consider +7% packet loss “huge”)
  • > 10% is bad
  • > 50% packet loss … no chance!
Avg. network MOS degradation: Average network MOS degradation is the amount of the MOS value lost to network effects.
  • > 1 is not good.
  • < 0.5 represents acceptable degradation.
Avg. concealed samples ratio: Concealing audio samples is a technique used to deal with dropped network packets. Average concealed samples ratio (%) is the percentage of audio samples that were concealed.
  • < 2% is good
  • > 3% is not good
  • > 7% is bad
Bandwidth estimates (Kbps): This is the available bandwidth as estimated on the client side.
  • Absolute thresholds are not that helpful, but when the client detects bandwidth is low (< 100 Kbps) audio quality can easily be impacted by other applications or network congestion.
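
The thresholds listed above can be collected into a small lookup, sketched below. The metric keys and the labels "poor" and "borderline" are my own shorthand, not names from the monitoring reports. Values that fall in a range the lists above do not label (for example, jitter between 20 and 30 ms) are reported as "borderline".

```python
# (good_below, poor_above, bad_above), taken from the thresholds listed above.
# The dictionary keys are illustrative names, not report column names.
THRESHOLDS = {
    "round_trip_ms": (200, 200, 500),
    "jitter_ms": (20, 30, 45),
    "packet_loss_pct": (3, 7, 10),
    "concealed_ratio_pct": (2, 3, 7),
}


def rate(metric, value):
    """Return 'good', 'borderline', 'poor' or 'bad' for an averaged metric value."""
    good_below, poor_above, bad_above = THRESHOLDS[metric]
    if value > bad_above:
        return "bad"
    if value > poor_above:
        return "poor"
    if value < good_below:
        return "good"
    return "borderline"  # range that the lists above leave unlabeled
```

For example, `rate("jitter_ms", 25)` returns "borderline" and `rate("packet_loss_pct", 8)` returns "poor".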

There are also two burst metrics:

Burst density – the fraction of RTP (Real-Time Transport Protocol) data packets within burst periods since the beginning of reception that were either lost or discarded. A burst period is a period in which a high proportion of packets are either lost or discarded due to late arrival. Burst density is used in the call detail report.


Burst length – the mean duration, expressed in milliseconds, of the burst periods that have occurred since the beginning of reception. Burst length is used in the call detail report.
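
As a rough illustration of these two definitions, the sketch below derives both values from a per-packet list of lost-or-discarded flags. It is a simplification: a burst is taken to run from one lost packet to the last lost packet before `gmin` or more consecutive good packets arrive. The values `gmin=16` and `packet_ms=20.0` are assumptions, and the full rules in RFC 3611 handle edge cases (such as isolated losses) that this sketch ignores.

```python
def burst_stats(lost_flags, packet_ms=20.0, gmin=16):
    """Return (burst_density, mean_burst_length_ms).

    lost_flags[i] is True when packet i was lost or discarded.
    packet_ms and gmin are assumed values, not taken from the reports.
    """
    bursts = []          # (first_index, last_index) of each burst
    start = last = None  # boundaries of the burst currently being tracked
    for i, lost in enumerate(lost_flags):
        if not lost:
            continue
        if start is None:
            start = last = i
        elif i - last - 1 >= gmin:
            # gmin or more good packets since the previous loss: burst ended
            bursts.append((start, last))
            start = last = i
        else:
            last = i
    if start is not None:
        bursts.append((start, last))
    if not bursts:
        return 0.0, 0.0
    packets_in_bursts = sum(b - a + 1 for a, b in bursts)
    lost_in_bursts = sum(sum(lost_flags[a:b + 1]) for a, b in bursts)
    density = lost_in_bursts / packets_in_bursts
    mean_length_ms = packets_in_bursts * packet_ms / len(bursts)
    return density, mean_length_ms
```

Density is the share of packets inside bursts that were lost or discarded, and the mean length converts the average burst size to milliseconds using the assumed packet interval.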

Basis to classify a call as poor in Skype for Business

The conditions we use to classify poor calls are shown in the 3 tables below. The poor call flag is set if one or more of the conditions are met. Note that a record in the MediaLine table can cover multiple media streams. The flagging occurs at the MediaLine level, so to find out which stream caused the classification you need to look at the individual streams and check the columns in the tables below.

Column in AudioStream Table | Condition | Explanation
DegradationAvg | > 1.0 | Network MOS degradation for the whole call. This metric shows the amount the Network MOS was reduced because of jitter and packet loss.
RoundTrip | > 500 | Round trip time
PacketLossRate | > 0.1 | The packet loss rate
JitterInterArrival | > 30 | Average network jitter
RatioConcealedSamplesAvg | > 0.07 | Average ratio of concealed samples generated by audio healing to typical samples
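
A minimal sketch of the audio check, assuming each AudioStream row has already been loaded into a dictionary keyed by column name (how you fetch the rows is up to you, and the function name is hypothetical). Note that PacketLossRate and RatioConcealedSamplesAvg are compared as fractions here, so 0.1 corresponds to 10%.

```python
# Upper limits copied from the AudioStream table above
AUDIO_LIMITS = {
    "DegradationAvg": 1.0,
    "RoundTrip": 500,
    "PacketLossRate": 0.1,
    "JitterInterArrival": 30,
    "RatioConcealedSamplesAvg": 0.07,
}


def audio_stream_is_poor(row):
    """Return True if at least one AudioStream condition is met.

    row is a dict keyed by column name; missing or None values are skipped.
    """
    return any(
        row.get(column) is not None and row[column] > limit
        for column, limit in AUDIO_LIMITS.items()
    )
```

A stream with `{"RoundTrip": 120, "JitterInterArrival": 42}` is flagged because of jitter alone, which matches the "one or more conditions" rule.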


Column in VideoStream Table | Condition | Explanation
VideoPostFECPLR | > 0.1 | The packet loss rate after forward error correction has been applied
VideoLocalFrameLossPercentageAvg | > 10 | The percentage of total video frames that are lost
RecvFrameRateAverage | < 7 | Average video frame rate used by the receiver
LowFrameRateCallPercent | > 10 | Percentage of the call below the low frame rate threshold
VideoPacketLossRate | > 0.1 | The packet loss rate
InboundVideoFrameRateAvg | < 7 | The average video frame rate received during the call
OutboundVideoFrameRateAvg | < 7 | The average video frame rate sent during the call
DynamicCapabilityPercent | > 10 | Percentage of the call where the client experienced high CPU load when processing video
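
Because the video conditions mix "greater than" and "less than" comparisons, a table-driven check is convenient. The sketch below returns the names of the columns that triggered, which helps when you need to explain why a stream was flagged. It assumes each VideoStream row is available as a dictionary keyed by column name, and the function name is hypothetical.

```python
import operator

# (column, comparison, threshold) copied from the VideoStream table above
VIDEO_CONDITIONS = [
    ("VideoPostFECPLR", operator.gt, 0.1),
    ("VideoLocalFrameLossPercentageAvg", operator.gt, 10),
    ("RecvFrameRateAverage", operator.lt, 7),
    ("LowFrameRateCallPercent", operator.gt, 10),
    ("VideoPacketLossRate", operator.gt, 0.1),
    ("InboundVideoFrameRateAvg", operator.lt, 7),
    ("OutboundVideoFrameRateAvg", operator.lt, 7),
    ("DynamicCapabilityPercent", operator.gt, 10),
]


def video_poor_reasons(row):
    """Return the columns whose condition is met; an empty list means not poor.

    Missing or None values are skipped rather than treated as failures.
    """
    return [
        column
        for column, compare, threshold in VIDEO_CONDITIONS
        if row.get(column) is not None and compare(row[column], threshold)
    ]
```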


Column in AppSharingStream Table | Condition | Explanation
SpoiledTilePercentTotal | > 36 | The percentage of the content from the sharer that did not reach the viewer. Content may be discarded (or spoiled) when the sharer discards tiles from the graphics source or when the ASMCU discards tiles from the sharer.
RDPTileProcessingLatencyAverage | > 400 | The average RDP tile processing latency in the AS Conferencing Server over the duration of the viewing session
RelativeOneWayAverage | > 1.75 | The relative one-way delay between the two media endpoints involved in the application sharing. This is a single-hop latency measure.
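
Since the flag is set at the MediaLine level, it is often useful to see which individual stream caused it. The sketch below does this for application sharing, assuming the streams belonging to one media line are passed in as a list of dictionaries keyed by column name. That input shape and the function name are my own assumptions.

```python
# Upper limits copied from the AppSharingStream table above
APPSHARING_LIMITS = {
    "SpoiledTilePercentTotal": 36,
    "RDPTileProcessingLatencyAverage": 400,
    "RelativeOneWayAverage": 1.75,
}


def poor_appsharing_streams(streams):
    """Map the index of each poor stream to the columns over their limit.

    streams is a list of dicts, one per stream of a single media line.
    An empty result means no stream met any condition.
    """
    flagged = {}
    for index, row in enumerate(streams):
        reasons = [
            column
            for column, limit in APPSHARING_LIMITS.items()
            if row.get(column) is not None and row[column] > limit
        ]
        if reasons:
            flagged[index] = reasons
    return flagged
```

The media line counts as poor whenever the returned dictionary is non-empty, and its keys point at the streams worth inspecting.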



