Audio Variation: Why Your Sound Effects Need a Pool, Not a File
Playing the same .wav on every footstep kills immersion in seconds. How to build randomised pitch, volume, and sample pools. The minimum variation set for each event type, and middleware approaches in FMOD and Wwise.
29 April 2026 · 5 min read
Play any game long enough and you'll hear it: the same sword swing on every hit, the same footstep on every step, the same button click on every press. Within minutes, the sounds fade from the foreground of attention into background noise. Within hours, you stop noticing them at all. The brain habituates to repeated, unchanging stimuli - a well-documented perceptual effect, not a matter of taste. Audio variation is the engineering response to that effect.
The goal of audio variation isn't to make every sound unique for its own sake. It's to prevent habituation - to keep the brain treating sounds as signals rather than wallpaper. A sound that remains attention-grabbing throughout a session continues to do its job as feedback. A sound that habituates stops communicating. You can have the most perfectly designed impact sound in history and it will stop working if you play it identically on every single hit.
Pitch Variation
The simplest and cheapest variation technique: randomly shift the pitch of a sound each time it plays, within a small range. A variation of plus or minus 5 to 10 percent is enough to stop the brain pattern-matching on pitch, while staying small enough that the player doesn't perceive the results as different sounds. The player never thinks 'that was higher pitched than last time.' They just hear hits that keep sounding fresh.
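For reference, these percentages describe a pitch multiplier r around 1.0 (the same convention as Unity's AudioSource.pitch, where 1 is the original pitch), while middleware tools commonly express pitch in semitones or cents. The standard conversion, with a few worked values:

$$
\text{semitones} = 12 \log_2 r, \qquad \text{cents} = 1200 \log_2 r
$$

$$
r = 1.05 \Rightarrow +0.84, \quad r = 0.95 \Rightarrow -0.89, \quad r = 1.10 \Rightarrow +1.65, \quad r = 0.90 \Rightarrow -1.82 \ \text{semitones}
$$

So the 5 to 10% band works out to roughly 0.8 to 1.8 semitones either side of the original; check each tool's documentation for the unit its pitch control actually uses.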
Pitch variation is built into most game audio middleware and engines. Unity's AudioSource has a pitch property you can randomise before each play. FMOD and Wwise both offer built-in pitch randomisation that you can apply to events and individual sounds. The cost is effectively zero. There is almost no circumstance where you should play the same sound at the same pitch twice in a row for a repeated action - enable pitch randomisation on every frequently used asset by default.
The range matters. Too narrow (plus or minus 2%) and habituation still occurs. Too wide (plus or minus 20%) and the variation becomes audible as inconsistency - the player notices that some hits sound higher and some lower, which can undermine the feedback language. The 5 to 10% range is the zone where variation is perceptually effective without being perceptually obvious. For naturally pitched sounds like voices or music stingers, stay at the lower end; for mechanical impacts, the full 10% is usually fine.
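A minimal sketch of per-trigger pitch randomisation using the ranges above. It is written in Python for neutrality; the category names and the `random_pitch` helper are hypothetical. In Unity, the returned multiplier is the value you would assign to `AudioSource.pitch` before playing.

```python
import random
from typing import Optional

# Spread per sound category, following the guideline above: stay near the low
# end for naturally pitched sounds, use the full 10% for mechanical impacts.
# The category names are hypothetical.
PITCH_SPREAD = {
    "pitched": 0.05,     # voices, music stingers
    "mechanical": 0.10,  # impacts, clicks, footsteps
}


def random_pitch(category: str, rng: Optional[random.Random] = None) -> float:
    """Return a pitch multiplier (1.0 = unchanged) within +/- the category's spread."""
    source = rng if rng is not None else random  # module-level RNG by default
    spread = PITCH_SPREAD[category]
    return source.uniform(1.0 - spread, 1.0 + spread)


# Usage: draw a fresh value on every trigger, never once per asset.
rng = random.Random(42)
pitches = [random_pitch("mechanical", rng) for _ in range(5)]
```

The important design choice is where the call happens: drawing once at load time would just produce a different fixed pitch, which habituates exactly like the original.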
Variation Pools
A variation pool is a set of audio assets for the same event, typically 3 to 6, with the system randomly selecting (or cycling through) them each time the event fires. Where pitch variation produces subtle organic randomness, a pool produces genuine timbral variety - each sample has slightly different attack characteristics, reverb tail length, or mid-frequency content.
Pool size guidelines by event frequency: very high frequency events (footsteps, rapid-fire weapon shots) need 4 to 6 variants minimum to prevent audible looping. High frequency events (sword hits, ability activations in a combat game) need 3 to 4 variants. Low frequency events (level-up sounds, death sounds, boss transitions) can work with 1 to 2 variants since the brain doesn't get enough repetitions to habituate quickly.
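As a sketch of how these guidelines could become a build-time check, the Python below encodes the lower bound of each range and flags undersized pools. The tier names, event names, clip names, and the `audit_pools` helper are all hypothetical.

```python
from enum import Enum
from typing import Dict, List, Tuple


class EventFrequency(Enum):
    VERY_HIGH = "very_high"  # footsteps, rapid-fire weapon shots
    HIGH = "high"            # sword hits, ability activations
    LOW = "low"              # level-ups, deaths, boss transitions


# Lower bound of each range from the guideline above.
MIN_VARIANTS: Dict[EventFrequency, int] = {
    EventFrequency.VERY_HIGH: 4,
    EventFrequency.HIGH: 3,
    EventFrequency.LOW: 1,
}


def audit_pools(pools: Dict[str, Tuple[EventFrequency, List[str]]]) -> List[str]:
    """Return the names of events whose pool is below its tier's minimum size."""
    return sorted(
        name
        for name, (tier, clips) in pools.items()
        if len(clips) < MIN_VARIANTS[tier]
    )


# Hypothetical event table: the footstep pool is one variant short.
pools = {
    "footstep_grass": (EventFrequency.VERY_HIGH, ["fs_01", "fs_02", "fs_03"]),
    "sword_hit": (EventFrequency.HIGH, ["hit_a", "hit_b", "hit_c"]),
    "level_up": (EventFrequency.LOW, ["level_up_01"]),
}
flagged = audit_pools(pools)  # ['footstep_grass']
```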
The variants in a pool should share the same core identity - they need to read as 'the same type of event' to keep the feedback language consistent - but differ in texture and character. The usual approach is to record multiple takes of the source material and keep the best 4 to 6, or to create several processing variants of one source recording with different EQ and compression settings. Don't build a pool by pitch-shifting a single recording: that duplicates what runtime pitch randomisation already provides, and the shared texture stays recognisable as one sample.
Add no-repeat logic: track the last sound played and exclude it from the next random selection. Without it, random selection will occasionally play the same sample twice consecutively, which sounds worse than a systematic rotation. It takes only a line or two to implement and catches the most jarring form of repetition.
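A minimal Python sketch of random selection with last-played exclusion. `NoRepeatPool` and the clip names are hypothetical; in an engine the pool would hold clip handles rather than strings.

```python
import random
from typing import Optional, Sequence


class NoRepeatPool:
    """Randomly pick from a variation pool, never returning the same clip twice in a row."""

    def __init__(self, clips: Sequence[str], rng: Optional[random.Random] = None):
        if not clips:
            raise ValueError("a pool needs at least one clip")
        self.clips = list(clips)
        self.rng = rng if rng is not None else random.Random()
        self.last: Optional[int] = None  # index of the previously played clip

    def pick(self) -> str:
        # Exclude the last-played index from the candidates.
        candidates = [i for i in range(len(self.clips)) if i != self.last]
        if not candidates:  # single-clip pool: nothing else to choose
            candidates = [0]
        self.last = self.rng.choice(candidates)
        return self.clips[self.last]


pool = NoRepeatPool(["step_01", "step_02", "step_03", "step_04"], random.Random(7))
sequence = [pool.pick() for _ in range(8)]
```

The remaining candidates stay equally likely, so the output still sounds random rather than like a fixed rotation.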
Layer Blending
Layer blending combines multiple audio sub-layers at slightly varied volumes each time an event plays. An impact sound might be constructed from three layers: a sharp transient layer (the initial crack), a body layer (the mid-frequency weight), and a sub-bass layer (the low-frequency gut punch). Playing each layer at a slightly randomised volume - the body layer at 80 to 100% volume, the sub-bass at 60 to 90% - produces variation in the perceived character of the sound without replacing the asset.
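A minimal sketch of per-layer volume randomisation in Python. The body and sub-bass ranges come from the example above; the transient range is an assumption, as are the `Layer` type and `blend` helper.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Layer:
    name: str
    min_volume: float  # linear gain, 0.0 to 1.0
    max_volume: float


# Body and sub-bass ranges come from the example above; the transient range is
# an assumption, since the text doesn't give one.
IMPACT_LAYERS: List[Layer] = [
    Layer("transient", 0.85, 1.00),
    Layer("body", 0.80, 1.00),
    Layer("sub_bass", 0.60, 0.90),
]


def blend(layers: List[Layer], rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Draw an independent random volume for each layer for a single playback."""
    source = rng if rng is not None else random
    return {layer.name: source.uniform(layer.min_volume, layer.max_volume) for layer in layers}


# Usage: call once per hit, then start every layer together at these volumes.
mix = blend(IMPACT_LAYERS, random.Random(1))
```

Drawing each layer independently is what changes the perceived character from hit to hit; scaling all layers by one shared random value would only change loudness.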
This technique is particularly powerful for hits that need to communicate intensity. A light hit might have the sub-bass layer at 30% with the transient at 90%. A heavy hit might have the sub-bass at 100% and the transient at 70%. Same assets, different mix, completely different perceived weight. FMOD and Wwise both support per-layer volume randomisation as a native feature. In Unity, you can implement it with one AudioSource per layer, started together, each with its volume randomised within that layer's range.
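One way to get from 'light' to 'heavy' continuously is to interpolate between the two mixes using a hit-intensity value, roughly what a game parameter driving layer volumes (an RTPC, in Wwise terms) does in middleware. This Python sketch assumes a linear curve and an intensity in the range 0 to 1; the body values and the `intensity_mix` helper are hypothetical.

```python
from typing import Dict

# Layer volumes for the two extremes described above. The transient and
# sub-bass values come from the text; the body values are assumptions.
LIGHT_HIT: Dict[str, float] = {"transient": 0.9, "body": 0.8, "sub_bass": 0.3}
HEAVY_HIT: Dict[str, float] = {"transient": 0.7, "body": 1.0, "sub_bass": 1.0}


def intensity_mix(intensity: float) -> Dict[str, float]:
    """Linearly interpolate layer volumes between a light (0.0) and heavy (1.0) hit."""
    t = min(max(intensity, 0.0), 1.0)  # clamp out-of-range game values
    return {
        name: LIGHT_HIT[name] + (HEAVY_HIT[name] - LIGHT_HIT[name]) * t
        for name in LIGHT_HIT
    }


# Usage: a mid-strength hit gets a sub-bass level halfway between 0.3 and 1.0.
medium = intensity_mix(0.5)
```

A linear curve is only a starting point; designers often shape these curves by ear so the sub-bass arrives late and only on genuinely heavy hits.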
Silence as Contrast
The most overlooked variation technique: allow moments of relative quiet. If combat is a constant wall of impact sounds, the player's auditory system adapts to the baseline volume and individual impacts lose their punch. A brief lull - a moment where fewer sounds play, ambient music comes to the foreground, footsteps dominate - resets the auditory adaptation and makes the next significant impact hit harder than its actual level would suggest.
This is why the best action game audio designers think in arcs rather than events. Individual sounds are designed to work together over time, with peaks and valleys of audio density. A boss fight that builds from quiet to overwhelming communicates the escalation through its volume envelope as much as through specific sounds. The quiet start makes the loud end possible.
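Silence is mostly a design decision, but one hypothetical way to enforce a density arc at runtime is a sliding-window cap on how many impact sounds may start, with the cap rising as an encounter progresses. The `DensityLimiter` class, the linear ramp, and the default numbers are assumptions for illustration; a real game would likely exempt critical feedback (player damage, parries) so a lull never hides information.

```python
from collections import deque


class DensityLimiter:
    """Cap how many impact sounds may start within a sliding time window.

    The cap rises with encounter progress, so an encounter can start sparse
    and end dense. The linear ramp and the default numbers are assumptions.
    """

    def __init__(self, window_s: float = 1.0, quiet: int = 2, peak: int = 12):
        self.window_s = window_s
        self.quiet = quiet  # impacts allowed per window at the start
        self.peak = peak    # impacts allowed per window at the climax
        self.recent: deque = deque()  # start times of impacts still in the window

    def budget(self, progress: float) -> int:
        """Impacts allowed per window at encounter progress in [0, 1]."""
        t = min(max(progress, 0.0), 1.0)
        return round(self.quiet + (self.peak - self.quiet) * t)

    def try_play(self, now: float, progress: float) -> bool:
        """Return True if an impact may play at time `now` (seconds)."""
        while self.recent and now - self.recent[0] >= self.window_s:
            self.recent.popleft()  # drop impacts that left the window
        if len(self.recent) >= self.budget(progress):
            return False  # over budget: skip this one to keep the lull quiet
        self.recent.append(now)
        return True


limiter = DensityLimiter()
early = [limiter.try_play(t / 10, progress=0.0) for t in range(5)]
# early == [True, True, False, False, False]
```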
Middleware and Implementation
FMOD Studio and Wwise are the industry-standard audio middleware tools for implementing all of these variation techniques without hand-coding. Both support variation pools, pitch randomisation, layer blending, and parameter-driven mixing natively. For indie projects, FMOD is free up to a revenue threshold and has good Unity and Unreal integration. Wwise has a free tier as well but is more complex to set up.
If you're not using middleware, Unity's AudioSource with Random.Range on pitch, combined with an array of AudioClips and a no-repeat selection algorithm, implements the core techniques adequately for most indie projects. The important thing is that variation is built into the audio playback system from the start, not retrofitted after release when the repetition has already appeared in reviews.
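Without middleware, the combination described above (a clip array, a no-repeat pick, and a random pitch) might look like the Python sketch below. `SoundEvent` and its fields are hypothetical, and `rng.uniform` stands in for Unity's `Random.Range`; in Unity the returned pitch would go into `AudioSource.pitch` before a call such as `AudioSource.PlayOneShot(clip)`. The no-repeat step uses an index-skipping trick: draw from one fewer slot and shift past the last index, which excludes the previous clip without building a candidate list.

```python
import random
from typing import Optional, Sequence, Tuple


class SoundEvent:
    """Clip array + no-repeat selection + random pitch, as one playable event.

    A Python stand-in for a Unity script; not Unity code.
    """

    def __init__(self, clips: Sequence[str], pitch_spread: float = 0.1,
                 rng: Optional[random.Random] = None):
        if not clips:
            raise ValueError("an event needs at least one clip")
        self.clips = list(clips)
        self.pitch_spread = pitch_spread  # 0.1 means +/-10%
        self.rng = rng if rng is not None else random.Random()
        self.last = -1  # index of the previous clip; -1 before the first play

    def play(self) -> Tuple[str, float]:
        """Choose the clip and pitch for one trigger; an engine would then play them."""
        n = len(self.clips)
        if n == 1:
            index = 0
        elif self.last < 0:
            index = self.rng.randrange(n)
        else:
            # Draw from n - 1 slots and skip over the last index, so the
            # previous clip can never be chosen and the others stay equally likely.
            index = self.rng.randrange(n - 1)
            if index >= self.last:
                index += 1
        self.last = index
        pitch = self.rng.uniform(1.0 - self.pitch_spread, 1.0 + self.pitch_spread)
        return self.clips[index], pitch


event = SoundEvent(["swing_a", "swing_b", "swing_c"], rng=random.Random(5))
clip, pitch = event.play()
```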
Part of a series
Audio