The Room That Isn't There

You're crouched behind a wall. Somewhere ahead, a guard's footsteps are getting louder, not louder like someone turned up the volume, but louder like they're closer. Your stomach tightens on cue. What your brain doesn't know, and wouldn't believe if you told it, is that nothing moved. No sound traveled through any space. A few numbers changed in a render loop running sixty times a second, and your ancient primate threat-detection system did the rest.

That's the trick. Game audio engines don't move sound. They sculpt the impression of distance out of the same physical cues your auditory system has been calibrating against the real world your entire life.

Volume Is the Dumbest Tool, and Engines Know It

The obvious lever is amplitude: things far away are quieter. Engines use this, but raw volume alone is a terrible distance cue. A whisper two feet away and a shout fifty feet away can land at the same decibel level. Your brain has never relied on volume in isolation, and a game that does sounds immediately, unmistakably wrong.

The model behind that lever is the inverse square law. Sound intensity falls off with the square of the distance from a source: double the distance and you get one quarter the intensity, a drop of about 6 dB. Engines implement this as an attenuation curve: a sound emitter in the game world has a minimum distance (inside which volume stays constant) and a maximum distance (beyond which it's silent or simply stops attenuating, depending on the engine and the curve), with a rolloff shape between them. Unreal Engine calls these Attenuation Settings. Unity exposes the same concept as Volume Rolloff in an Audio Source's 3D Sound Settings. Both let designers pick between logarithmic, linear, or custom curves, because real environments don't all behave like open fields. A gunshot in a narrow stone corridor rolls off differently than a campfire in a meadow.
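As a rough sketch of that rolloff, the function below turns a distance into a gain. The function name, parameter names, and default distances are illustrative assumptions, not any engine's actual API. The "inverse" shape follows the inverse distance law for amplitude (gain halves and intensity quarters with each doubling), and the "linear" shape simply ramps to zero at the maximum distance.

```python
def distance_gain(distance, min_dist=1.0, max_dist=50.0, rolloff="inverse"):
    """Return a linear amplitude gain in [0, 1] for a source at `distance`.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if min_dist <= 0 or max_dist <= min_dist:
        raise ValueError("need 0 < min_dist < max_dist")
    # Inside the minimum distance the level stays constant.
    if distance <= min_dist:
        return 1.0
    # Beyond the maximum distance this sketch treats the source as silent.
    if distance >= max_dist:
        return 0.0
    if rolloff == "inverse":
        # Amplitude ~ 1/r, so intensity (amplitude squared) ~ 1/r^2.
        return min_dist / distance
    if rolloff == "linear":
        return 1.0 - (distance - min_dist) / (max_dist - min_dist)
    raise ValueError(f"unknown rolloff: {rolloff}")
```

With these defaults, a source at 2 units gets a gain of 0.5, which squares to the one-quarter intensity the law predicts. A shipping curve would usually fade smoothly into silence near the maximum distance rather than cutting off the way this one does.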

So far, so mechanical. The interesting stuff is what happens to the character of sound as distance grows.

Air Is a Low-Pass Filter

High frequencies lose energy faster than low ones when traveling through air. This is called air absorption, and it's why a thunderstorm two miles away sounds like a low rumble while one overhead has that sharp crack. At 1,000 Hz, air absorbs on the order of half a decibel per 100 meters under typical conditions. At 8,000 Hz, that figure climbs to roughly 6 to 20 dB per 100 meters depending on temperature and humidity.

Audio engines model this with a high-frequency rolloff filter that gets progressively heavier as virtual distance increases. It's essentially a low-pass filter whose cutoff frequency slides downward the farther the source sits from the listener. A close explosion has snap and presence. The same explosion heard from across a valley has its top end shaved off, leaving something thudding and diffuse. Both use the same audio asset. Only the filter changes.
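A minimal sketch of the idea, assuming a one-pole low-pass as a stand-in for whatever filter a given engine actually uses. The distance range, the 20 kHz and 2 kHz endpoints, and both function names are made-up tuning values and labels for illustration.

```python
import math

def cutoff_for_distance(distance, near=1.0, far=100.0,
                        near_hz=20000.0, far_hz=2000.0):
    """Slide the cutoff from near_hz down to far_hz as distance grows.

    Interpolates in log-frequency so the sweep sounds even. All defaults
    are assumed tuning values, not measurements.
    """
    t = (distance - near) / (far - near)
    t = max(0.0, min(1.0, t))  # clamp outside the near/far range
    return near_hz * (far_hz / near_hz) ** t

def one_pole_lowpass(samples, cutoff_hz, sample_rate=48000.0):
    """Filter a list of samples with a simple one-pole low-pass."""
    # Standard one-pole smoothing coefficient for the given cutoff.
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out
```

The same list of samples passed through `one_pole_lowpass` with `cutoff_for_distance(1.0)` and then with `cutoff_for_distance(100.0)` gives the close and distant versions of one asset: same data, different filter.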

This is where cheaper implementations cut corners, and you can hear it. If distant sounds are just quieter but tonally identical to close ones, they feel pasted onto the scene rather than embedded in it. Your ears expect the texture to change, not just the level.

The Geometry of Your Own Head Is Doing Heavy Lifting

Distance cues work in combination with directionality, and directionality in headphones is a separate problem that engines solve with Head-Related Transfer Functions, or HRTFs.

Your ears are not microphones on a flat surface. They sit on the sides of a head with a specific shape, attached to pinnae (the outer ear folds) that are themselves irregular. When a sound arrives from your upper left, it reaches your left ear a fraction of a millisecond before your right (interaural time difference), and arrives at your right ear with its high frequencies subtly shadowed by your skull (interaural level difference). Your brain reads those two cues to place the sound left or right, then uses the way your pinnae color the spectrum to resolve elevation and front from back, which pure stereo can't convey at all.
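To put a number on that "fraction of a millisecond," here is a sketch using Woodworth's classic approximation, which treats the head as a rigid sphere. The 8.75 cm head radius and 343 m/s speed of sound are common textbook assumptions, and the function name is ours; real heads are not spheres, so treat the result as a ballpark.

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference in seconds.

    azimuth_deg: 0 is straight ahead, 90 is directly to one side.
    Spherical-head approximation; defaults are assumed typical values.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must be between 0 and 90")
    theta = math.radians(azimuth_deg)
    # Extra path length to the far ear is r * (theta + sin(theta)).
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))
```

For a source directly to one side this lands between 0.6 and 0.7 milliseconds, and it shrinks to zero as the source swings around to dead ahead.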

An HRTF is a measured filter, stored in practice as a pair of impulse responses (one per ear), that encodes all of that geometry for a single direction; a full set covers points across a sphere around the head. Think of it as a fingerprint of how your head and ears bend sound. Engines convolve a mono sound source against the pair that matches its virtual position, and the result, heard through headphones, can sound like it's genuinely above you, behind you, or ten meters to your right.
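Stripped to its core, that step is two convolutions: the mono source against a left-ear impulse response and against a right-ear one. The sketch below uses a slow direct-form convolution and toy placeholder responses that are not measured data; the function names are ours. Production engines typically use FFT-based convolution and interpolate between measured positions.

```python
def convolve(signal, kernel):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def binauralize(mono, hrir_left, hrir_right):
    """Return (left, right) channels for a mono source at one fixed position."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy placeholder responses, NOT measured data: the right ear gets the
# sound two samples later and at half the level, as if the source sat
# off to the listener's left.
toy_left = [1.0, 0.0, 0.0]
toy_right = [0.0, 0.0, 0.5]
left, right = binauralize([1.0, 0.5], toy_left, toy_right)
```

Even with these crude stand-ins, the two output channels already carry a time difference and a level difference, the same two cues a measured response encodes in far finer detail.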

Valve's Steam Audio, Microsoft's Spatial Sound platform, and Sony's Tempest 3D Audio on PlayStation 5 all use HRTF convolution as their core spatial layer. The PS5's implementation offers personalized HRTF profiles, because the exact shape of your ear affects how well a generic HRTF works for you. Some people hear generic HRTF as convincingly spatial. Others get something vaguely "in the head," slightly wrong. Personalization is a real improvement, not a marketing slide.

Reverb Is the Room, and the Room Tells You Everything

Most discussions skim past this part. That's a mistake, because reverb is probably the single most powerful distance cue in a game environment.

When you hear a sound in a real space, you hear two things: the direct signal (sound traveling in a straight line from source to ear) and the reverberant tail (sound that bounced off walls, floors, and ceilings before reaching you). The ratio between those two components tells your brain an enormous amount. A high direct-to-reverb ratio means you're close to the source. A low one means you're far away. In a cathedral, a voice twenty meters away might carry more reverberant energy than direct energy, and your brain reads that as both distance and room size simultaneously.
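A back-of-the-envelope version of that ratio, under two simplifying assumptions: the direct sound follows the inverse square law, and the reverberant level is roughly the same everywhere in the room. The distance at which the two are equal is the room's critical distance, and the ratio in decibels depends only on how far you are relative to it. The function name and the 5 meter critical distance in the usage note are hypothetical.

```python
import math

def direct_to_reverb_db(distance_m, critical_distance_m):
    """Direct-to-reverberant ratio in dB under a diffuse-field assumption.

    Positive means direct sound dominates; negative means reverb dominates.
    """
    if distance_m <= 0 or critical_distance_m <= 0:
        raise ValueError("distances must be positive")
    # Direct level drops 6 dB per doubling; reverb level is held constant.
    return 20.0 * math.log10(critical_distance_m / distance_m)
```

If a cathedral had a critical distance of 5 meters (an invented figure), a voice 20 meters away would sit about 12 dB below the reverberant field, which is exactly the "more reverb than direct" situation described above.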

Engines handle this in layers. The simplest is a global reverb send: all sounds in a level feed into a shared reverb bus, with the send level scaled by distance. Closer sounds get less reverb mix; distant ones get more. It works, but it's coarse.
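The coarse version fits in a few lines. This sketch maps distance onto a send level with a clamped linear ramp; the function name, the distance range, and the 0.1 and 0.8 send limits are all assumed values a designer would tune by ear.

```python
def reverb_send(distance, min_dist=1.0, max_dist=50.0,
                min_send=0.1, max_send=0.8):
    """Map source distance to a reverb send level (0 = dry, 1 = fully wet).

    Hypothetical helper; every default here is an assumed tuning value.
    """
    t = (distance - min_dist) / (max_dist - min_dist)
    t = max(0.0, min(1.0, t))  # clamp so near and far sources stay in range
    return min_send + t * (max_send - min_send)
```

Every emitter shares the one reverb bus, so the only per-sound work is this single multiply, which is why the approach is cheap and also why it's coarse.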

More sophisticated implementations use acoustic zones or audio occlusion. In Frostbite, the engine behind the Battlefield series, each room or outdoor zone can carry its own reverb impulse response, and the engine crossfades between zones as the player moves. Walk from a marble lobby into a carpeted office and the tail character changes: the lobby's long, bright reverb bleeds into something shorter and more absorbent. The transition itself is a spatial cue. You know you've moved into a different kind of room before you see it.
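One common way to blend two zones without a dip in loudness is an equal-power crossfade. The sketch below shows that general technique only; it is not a description of how Frostbite or any other engine actually does it, and the function name and the `blend` parameter are hypothetical.

```python
import math

def zone_crossfade_gains(blend):
    """Equal-power gains for two reverb zones.

    blend: 0.0 means fully in zone A, 1.0 means fully in zone B.
    Returns (gain_a, gain_b); the squared gains always sum to 1.
    """
    t = max(0.0, min(1.0, blend))
    return math.cos(t * math.pi / 2.0), math.sin(t * math.pi / 2.0)
```

Standing in the doorway (`blend` of 0.5), each zone's reverb plays at about 0.707 of full gain, so the combined energy stays level while the character of the tail shifts.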

Occlusion handles the other half. When a wall sits between the listener and a source, the engine attenuates the direct sound, rolling off its high frequencies hardest, so that reverberant content makes up more of what you hear, simulating sound bleeding through structure. The guard you can't see is audible, but muffled, and your brain correctly infers an obstacle.
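In parameter terms, occlusion can be sketched as a single 0-to-1 amount that pulls down both the direct-path gain and its low-pass cutoff. The function name, the 12 dB maximum cut, and the 800 Hz occluded cutoff are assumptions for illustration; how the occlusion amount itself is computed (raycasts, geometry queries) is outside this sketch.

```python
def apply_occlusion(direct_gain, cutoff_hz, occlusion,
                    max_cut_db=12.0, occluded_cutoff_hz=800.0):
    """Return (gain, cutoff_hz) for the direct path behind an obstacle.

    occlusion: 0.0 is a clear line of sight, 1.0 is fully blocked.
    Defaults are assumed tuning values, not taken from any engine.
    """
    o = max(0.0, min(1.0, occlusion))
    # Never let occlusion raise the cutoff above where distance already put it.
    target_hz = min(cutoff_hz, occluded_cutoff_hz)
    gain = direct_gain * 10.0 ** (-(max_cut_db * o) / 20.0)
    cutoff = cutoff_hz * (target_hz / cutoff_hz) ** o
    return gain, cutoff
```

Because only the direct path is touched, whatever reverb the source is already feeding stays put, and the muffled direct sound sinks toward it.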

Here's where it gets concrete. Two players run the same corridor in a multiplayer match, one with a flat stereo mix and one with a full spatial audio pipeline. The first player hears an enemy reload and thinks: somewhere nearby. The second player hears it, correctly places it as behind and to the left, roughly ten meters, past a wall, and repositions before the first player has even turned around. That's not a marginal difference. That's the game.

Headphones vs. Speakers: The Assumption Is Backwards

A common belief is that speakers provide "real" 3D audio and headphones are a compromise. It's the wrong way round, at least for gaming.

Speakers in a room produce actual wavefronts that interact with your real head and pinnae. Authentic, yes, but completely dependent on your room acoustics, speaker placement, and where exactly your head sits. Move six inches and the imaging shifts. Headphones deliver an isolated, repeatable signal directly to each ear, which means a well-implemented HRTF can be more precisely controlled than a speaker array in an untreated room. The catch is that headphones require accurate HRTF convolution to externalize sound, to make it feel like it's coming from outside your head rather than inside it. Poorly implemented spatial headphone audio sounds like the world is trapped in a fishbowl two inches behind your eyes.

The best spatial audio you'll hear in a game is almost certainly through a good pair of closed-back headphones running a personalized HRTF profile on hardware with the processing budget to do it properly. Speakers win on presence and feel. Headphones win on precision.

The One Cue Your Brain Trusts Above All Others

If you had to strip every technique above down to one, keep reverb. Volume lies. Filtering is subtle. HRTFs vary by person. But the direct-to-reverb ratio is a cue so fundamental that humans calibrated against it before they were old enough to walk. So ask yourself: when you last felt genuinely unnerved by something you heard in a game, not saw, just heard, was the room right? Was there weight behind the sound, a tail, a sense of enclosed or open space?

An audio engine that nails reverb can fake distance convincingly even with a mediocre attenuation curve. One that botches it sounds like cardboard no matter how sophisticated the rest of the pipeline.

The guard's footsteps didn't travel anywhere. But your heart rate went up. That's the whole job.