How Phones Reconstruct Speech When Packets Go Missing

When the Signal Drops Out, the Phone Keeps Talking

You're halfway through a sentence, walking into a concrete stairwell, and the person on the other end turns into something bubbling and aquatic for about a second before snapping back into a human. You caught most of it. You assume your brain did the work.

It didn't. Your phone got there first.

Mobile and internet voice calls don't send a continuous ribbon of sound the way old copper landlines did. They slice speech into packets, typically 20 milliseconds each, and fire them across the network in sequence. When signal degrades, some packets arrive late, arrive mangled, or vanish entirely. What happens next is a piece of engineering that receives almost no credit: the phone reconstructs something plausible from the wreckage.
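To put rough numbers on that slicing, here is a minimal Python sketch. The 16 kHz sample rate and the `packetize` helper are assumptions made for illustration; only the 20-millisecond frame length comes from the description above.

```python
def packetize(samples, sample_rate_hz, frame_ms=20):
    """Split PCM samples into (sequence_number, frame) pairs.

    Hypothetical helper: real apps also compress each frame and wrap it
    in network headers, which this sketch leaves out.
    """
    frame_len = sample_rate_hz * frame_ms // 1000  # samples per packet
    packets = []
    for seq, start in enumerate(range(0, len(samples), frame_len)):
        packets.append((seq, samples[start:start + frame_len]))
    return packets


# One second of (silent) audio at an assumed 16 kHz sample rate.
one_second = [0] * 16_000
packets = packetize(one_second, 16_000)
print(len(packets), "packets of", len(packets[0][1]), "samples each")
```

At 20 milliseconds per packet, one second of speech becomes 50 packets, and the sequence number is what lets the receiver notice when one has gone missing.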

The formal name is Packet Loss Concealment, or PLC. The colloquial name is educated guessing. Both are accurate.

The Guessing Machine Inside Every Codec

The audio codec on your phone, the software encoding and decoding voice, doesn't just process incoming packets. It watches them arrive. It holds a short buffer covering roughly the last 50 to 200 milliseconds of audio and builds a statistical picture of what the speaker's voice has been doing: pitch, rhythm, the rate at which sounds are decaying.

When a packet fails to show up, the codec doesn't produce silence. Silence would actually sound worse, because the human ear is primed to notice sudden gaps in a way it isn't primed to notice short stretches of plausible-sounding nonsense. So the codec synthesises audio that statistically resembles what just happened. If the speaker was mid-vowel at a 120 Hz fundamental, the codec extends that waveform forward for the duration of the missing packet. If a consonant was decaying, it continues the decay curve.
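The mid-vowel case can be sketched in a few lines of Python. This is a toy illustration, not how any shipping codec does it: the 12 kHz sample rate, the 70 to 400 Hz pitch search range, and the names `voiced_sample`, `estimate_period` and `conceal_frame` are all assumptions chosen so that a 120 Hz tone has a tidy 100-sample period.

```python
import math

SAMPLE_RATE_HZ = 12_000                    # assumed; 120 Hz -> 100-sample period
FRAME_LEN = SAMPLE_RATE_HZ * 20 // 1000    # one 20 ms packet = 240 samples


def voiced_sample(n):
    """Stand-in for a steady vowel: a 120 Hz fundamental plus one harmonic."""
    t = n / SAMPLE_RATE_HZ
    return math.sin(2 * math.pi * 120 * t) + 0.5 * math.sin(2 * math.pi * 240 * t)


def estimate_period(history, min_lag, max_lag, window=200):
    """Return the lag whose shifted copy best matches the newest samples.

    Compares the last `window` samples with the same span `lag` samples
    earlier, so `history` must hold at least window + max_lag samples.
    """
    start = len(history) - window
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(history[i] * history[i - lag]
                    for i in range(start, len(history)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def conceal_frame(history, frame_len=FRAME_LEN):
    """Fill one missing packet by repeating the last full pitch cycle."""
    period = estimate_period(history,
                             SAMPLE_RATE_HZ // 400,   # highest pitch searched
                             SAMPLE_RATE_HZ // 70)    # lowest pitch searched
    last_cycle = history[-period:]
    return [last_cycle[i % period] for i in range(frame_len)]


history = [voiced_sample(n) for n in range(600)]   # 50 ms of received audio
guess = conceal_frame(history)
print("estimated pitch:", SAMPLE_RATE_HZ / estimate_period(history, 30, 171), "Hz")
```

On a perfectly steady tone the guess matches the missing audio almost exactly. Real speech is never that steady, which is why practical concealers also fade their guesses out.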

This works remarkably well for scattered losses of up to about 5 percent of packets. A 20-millisecond gap, convincingly filled, is inaudible to most listeners.

Opus, the codec underpinning most modern voice-over-IP calls including WhatsApp voice and the majority of video calls, has PLC built directly into its reference decoder. G.711, the older codec still common on carrier networks, has a standardised concealment algorithm too, defined in an appendix to the original recommendation rather than in the codec itself. They differ in sophistication: Opus uses a pitch predictor that can extrapolate periodic speech patterns across multiple consecutive lost packets, while G.711's approach is simpler, essentially repeating and fading the last known waveform. G.711's concealment is fine for the occasional stray packet. Ask it to cover a rough patch and it starts to show its age.
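The simpler repeat-and-fade idea fits in a few lines of Python. This is a sketch of the general technique only, not the G.711 algorithm itself: the `conceal_stream` name, the use of `None` to mark a lost packet, and the 20 percent fade per lost frame are all assumptions for illustration.

```python
def conceal_stream(frames, frame_len, fade_per_frame=0.2):
    """Replace each lost frame (None) with a faded copy of the last good one.

    Hypothetical sketch: the gain drops by `fade_per_frame` for every
    consecutive loss and bottoms out at silence.
    """
    output = []
    last_good = [0.0] * frame_len   # nothing received yet: fall back to silence
    lost_run = 0
    for frame in frames:
        if frame is not None:
            last_good, lost_run = list(frame), 0
            output.append(list(frame))
        else:
            lost_run += 1
            gain = max(0.0, 1.0 - fade_per_frame * lost_run)
            output.append([sample * gain for sample in last_good])
    return output


# Two good frames with a run of losses between them (tiny 2-sample frames).
frames = [[10.0, -10.0], None, None, [4.0, 4.0], None]
for played in conceal_stream(frames, frame_len=2):
    print(played)
```

Each consecutive loss plays back quieter than the one before, which hides a single gap well but turns a long burst into a stutter that trails off into silence.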

What Happens When Losses Stack Up

Single missing packets: covered cleanly. But what about a full second or two of collapse, the kind you hit passing through a dead zone on a rural road?

Take Maya. She's on a call when her train enters a tunnel, and she loses roughly 40 percent of packets across about 800 milliseconds. Her codec attempts concealment, but by the third or fourth consecutive miss, its model of her caller's voice is extrapolating from increasingly stale data. The pitch prediction drifts. The synthesised audio turns robotic, then garbled, then dissolves into that characteristic underwater warble. Her brain still reconstructs some meaning, because it has its own context about what the conversation is probably about.

Her colleague Priya, on the same call from a weak but stable WiFi connection, hits 8 percent packet loss spread randomly across the whole call. Her codec handles nearly every gap without a trace. She notices nothing.

Averaged over the whole call, Priya's loss figure may well be the worse of the two. Very different experience. Bursty loss is the enemy; spread loss is manageable. That distinction matters more than the headline percentage.
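The gap between the headline percentage and the pattern of loss is easy to show in Python. In this sketch, the `loss_stats` helper and the two made-up 96-packet traces are assumptions for illustration; `True` means a packet arrived and `False` means it was lost.

```python
def loss_stats(received):
    """Return (loss_rate, longest_run_of_consecutive_losses)."""
    lost = received.count(False)
    longest = current = 0
    for ok in received:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return lost / len(received), longest


# Two hypothetical traces, each losing 8 packets out of 96.
spread = [i % 12 != 5 for i in range(96)]        # one loss every 12 packets
bursty = [not (40 <= i < 48) for i in range(96)]  # 8 losses in a row

for name, trace in (("spread", spread), ("bursty", bursty)):
    rate, longest = loss_stats(trace)
    print(f"{name}: {rate:.1%} lost, longest gap {longest * 20} ms")
```

Both traces report the same loss rate, but the spread one never asks the concealer to cover more than 20 milliseconds at a time, while the bursty one hands it a 160-millisecond hole.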

Modern adaptive codecs try to respond to burst conditions in real time. Opus supports bitrates from 510 kbps all the way down to 6 kbps and can step down mid-call, trading fidelity for resilience. A lower bitrate means smaller packets, which ask less of a struggling link. The voice sounds thinner, more landline-ish, but stays intelligible. When you notice a call suddenly sounding like it's coming through a tin can, that narrowing is intentional. The codec is lowering its ambitions to stay in the fight.
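The link between bitrate and packet size is plain arithmetic, sketched here in Python. The `payload_bytes` helper is hypothetical, and it counts only the encoded audio in each 20-millisecond packet, ignoring the network headers that travel with it.

```python
def payload_bytes(bitrate_bps, frame_ms=20):
    """Bytes of encoded audio carried by one packet at a given bitrate."""
    return bitrate_bps * frame_ms // 8000   # bits per ms -> bytes per frame


for kbps in (510, 64, 24, 6):
    print(f"{kbps:>3} kbps -> {payload_bytes(kbps * 1000)} bytes per 20 ms packet")
```

At the two extremes quoted above, the audio payload shrinks from 1275 bytes to 15 bytes per packet.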

The Layer Below the Codec

PLC handles missing packets. There's a second system handling something different: packets that arrive out of order or too late to use.

Every voice app runs a jitter buffer, a small waiting room where incoming packets queue before playback. The buffer absorbs natural variation in delivery times. A packet due in 40 milliseconds but arriving in 90 is still usable if the buffer is wide enough. But a wider buffer means more delay, and delay above roughly 150 milliseconds makes conversation feel strange, like the other person is slightly on the moon.

So the buffer adapts. Under good conditions it shrinks to minimise lag. Under choppy conditions it expands to catch stragglers. When a packet misses even the expanded window, that's when PLC takes over. The two systems work in sequence: the jitter buffer rescues the late ones, PLC invents replacements for the ones that never arrive.
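The handoff between the two systems can be sketched in Python. Everything here is an assumption for illustration: the function names, the 10-millisecond safety margin, and the 20 to 150 millisecond limits on buffer depth are made-up tuning values, not figures from any real app.

```python
def playout_decision(due_ms, arrival_ms, buffer_ms):
    """Play a packet if it arrives within the buffer's wait, else conceal it."""
    return "play" if arrival_ms <= due_ms + buffer_ms else "conceal"


def adapt_buffer(recent_lateness_ms, min_ms=20, max_ms=150, margin_ms=10):
    """Pick a buffer depth that covers the worst recent lateness plus a margin.

    Hypothetical policy: shrink toward min_ms when packets are punctual,
    grow toward max_ms when they straggle.
    """
    target = max(recent_lateness_ms, default=0) + margin_ms
    return min(max_ms, max(min_ms, target))


# The packet from the text: due at 40 ms, arrives at 90 ms.
print(playout_decision(40, 90, buffer_ms=60))   # wide buffer
print(playout_decision(40, 90, buffer_ms=40))   # narrow buffer
print(adapt_buffer([5, 12, 50]), "ms buffer after a choppy stretch")
```

With a 60-millisecond buffer the late packet still gets played; with a 40-millisecond buffer it is handed to PLC. The cap on depth is the price of keeping the conversation from lagging.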

On 4G and 5G networks there's a third layer below all of this: the radio link itself uses forward error correction, sending redundant data so the receiver can reconstruct a damaged packet without asking for it again. On a live voice call, waiting for a retransmission to cross the whole network would usually take too long. The redundancy travels with the original signal instead.
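The principle of sending redundancy alongside the data can be shown with a toy parity scheme in Python. This is far simpler than the channel codes real cellular radios use, and the `xor_parity` and `recover_missing` helpers are hypothetical: they assume equal-length packets and can rebuild only one lost packet per group.

```python
def xor_parity(packets):
    """Build one parity packet as the byte-wise XOR of a group of packets."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """Rebuild the single missing packet (marked None) from the rest."""
    missing = bytearray(parity)
    for packet in received:
        if packet is not None:
            for i, byte in enumerate(packet):
                missing[i] ^= byte
    return bytes(missing)


group = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(group)                  # sent along with the group
damaged = [b"abcd", None, b"ijkl"]          # the middle packet never arrived
print(recover_missing(damaged, parity))
```

The receiver rebuilds the missing packet from what did arrive, with no round trip back to the sender. The cost is the extra parity packet on every group, whether or not anything goes missing.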

What People Misread About Bad Calls

The default assumption is that a bad call is a network problem, full stop. Sometimes it is. But the experience of a bad call is shaped just as much by which codecs both ends are running, how well each phone's jitter buffer is tuned, and the pattern of loss rather than just the raw quantity.

A call that sounds fine to you can sound broken to the person on the other end, if their device's PLC implementation is weaker or if the loss is concentrated on their side of the network. Bad calls are not symmetrical. Complaining that the other person sounded terrible while they report the same about you is not a contradiction. It's two different codec pipelines hitting two different loss patterns.

There's also a hard ceiling on what reconstruction can do. PLC is a palliative, not a cure. Think of it as the difference between a smudged photocopy you can still read and a blank page: both are degraded, but only one is usable. Once sustained burst loss climbs past roughly 15 to 20 percent of packets, even the best codec produces audio that is more invention than transmission.

At that point you are no longer hearing the other person. You are hearing your phone's best statistical guess about what a voice in that conversation might plausibly sound like.

The words that came through clean were real. The robotic ones were reconstructed. The ones that turned to silence were lost before anyone could fake them.
