Acoustics & Psychoacoustics
Topics 2.1 + 2.2
Hear the contours, feel the masking, treat the room. The studio depends on knowing how the ear works and how sound behaves indoors.
The ear as a transducer
Sound waves enter the outer ear and push the eardrum back and forth. Three tiny bones (the ossicles) lever those movements into the cochlea, a fluid-filled spiral. Inside the cochlea, the basilar membrane vibrates in different places for different frequencies, and hair cells convert those vibrations into nerve impulses. The ear is a mechanical-to-electrical transducer, the inverse of a loudspeaker.
Human hearing range
A young, healthy listener hears roughly 20 Hz to 20 kHz. The lower end is felt as much as heard; the upper end fades with age and exposure to loud sound. Most musical content sits between 50 Hz and 8 kHz: bass weight, presence, air. The two octaves from 1 to 4 kHz are where the ear is most sensitive and where speech intelligibility lives.
Threshold of hearing & threshold of pain
At 1 kHz, the quietest sound a young ear can detect is around 0 dB SPL. The threshold of pain sits around 120 dB SPL, but damage starts far lower: sustained exposure above about 85 dB SPL can cause permanent hearing damage. The decibel scale is logarithmic: each 6 dB doubles the sound pressure, and each 10 dB roughly doubles perceived loudness.
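The decibel arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the 20 µPa reference pressure for 0 dB SPL is the standard convention rather than something stated in these notes, and the function names are made up for illustration.

```python
import math

# Reference pressure for 0 dB SPL, in pascals (standard convention,
# assumed here; the notes do not state it).
P_REF = 20e-6

def spl_from_pressure(p_rms):
    """Sound pressure level in dB SPL for an RMS pressure in pascals."""
    return 20.0 * math.log10(p_rms / P_REF)

def pressure_ratio(db_change):
    """Factor by which sound pressure changes for a given dB change."""
    return 10.0 ** (db_change / 20.0)

def loudness_ratio(db_change):
    """Rule of thumb from the text: +10 dB is roughly twice as loud."""
    return 2.0 ** (db_change / 10.0)

if __name__ == "__main__":
    print(round(pressure_ratio(6.0), 3))     # close to 2: 6 dB doubles pressure
    print(loudness_ratio(10.0))              # 2.0
    print(round(spl_from_pressure(20.0), 1)) # 20 Pa is about 120 dB SPL
```

Note the scale of the numbers: the pressure at the threshold of pain is about a million times the pressure at the threshold of hearing, which is why a logarithmic unit is used at all.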
Equal-loudness contours (Fletcher–Munson)
The ear is not equally sensitive across frequencies. At low listening levels we need much more energy at the extremes (20 Hz, 16 kHz) to perceive them as equally loud as a 1 kHz tone. As volume rises, the contours flatten, so mixes feel more balanced loud than quiet. This is why engineers reference at a consistent monitoring level: a mix balanced quietly will feel bass-heavy when played loud, and vice versa.
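To put rough numbers on the ear's uneven sensitivity, one option is the standard A-weighting curve, which loosely follows an equal-loudness contour at a quiet listening level. This is a swapped-in approximation, not the Fletcher–Munson data itself: it is a single fixed curve, so it cannot show the flattening at higher levels. The function name is hypothetical.

```python
import math

def a_weighting_db(f_hz):
    """A-weighting gain in dB at frequency f_hz (standard analytic form).

    Used here only as a rough stand-in for the ear's sensitivity at quiet
    levels; it is not a true equal-loudness contour.
    """
    f2 = f_hz * f_hz
    num = (12194.0 ** 2) * f2 * f2
    den = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(num / den) + 2.00

if __name__ == "__main__":
    for f in (20, 100, 1000, 4000, 16000):
        print(f, "Hz:", round(a_weighting_db(f), 1), "dB")
```

The strongly negative values at 20 Hz and 100 Hz mirror the point in the text: at quiet levels the low end needs far more energy to sound as loud as a 1 kHz tone.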
Frequency masking
A louder sound at one frequency hides quieter sounds at nearby frequencies. A loud kick around 60 Hz masks a quieter bass guitar fundamental at 80 Hz; a bright cymbal at 8 kHz masks a vocal sibilant at 7 kHz. Mixing is, in large part, an exercise in unmasking: carving frequency space so each instrument is heard. EQ cuts and panning both relieve masking.
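One rough way to judge whether two frequencies count as "nearby" is to convert them to the Bark scale, where one Bark is about one critical band of the ear. The sketch below uses the Zwicker and Terhardt approximation. The one-Bark cutoff and the function names are illustrative assumptions, and real masking also depends on the levels of both sounds, which this ignores.

```python
import math

def bark(f_hz):
    """Approximate critical-band rate in Bark (Zwicker and Terhardt formula)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def share_critical_band(f1_hz, f2_hz, max_gap_bark=1.0):
    """True if two frequencies sit within max_gap_bark of each other.

    The 1 Bark cutoff is an illustrative assumption, not a hard rule.
    """
    return abs(bark(f1_hz) - bark(f2_hz)) < max_gap_bark

if __name__ == "__main__":
    print(share_critical_band(60, 80))      # kick vs bass fundamental
    print(share_critical_band(8000, 7000))  # cymbal vs sibilant
    print(share_critical_band(60, 1000))    # far apart
```

Both pairs from the text land within one Bark of each other, which fits the claim that they compete for the same region of the ear.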
Temporal masking
A loud sound masks quieter sounds immediately before and after it. Pre-masking lasts about 20 ms; post-masking can last 100–200 ms. This is why a snare hit can hide a quiet hi-hat ghost note that follows it, and why MP3 encoders aggressively discard data hidden behind transients without listeners noticing.
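The timing windows above can be turned into a simple check. This is a sketch only: it uses the 20 ms and 200 ms figures from the text as hard limits and ignores level differences, whereas real masking fades gradually. The names are hypothetical.

```python
PRE_MASK_S = 0.020   # pre-masking window from the text (about 20 ms)
POST_MASK_S = 0.200  # upper end of the post-masking range from the text

def in_masking_window(masker_time_s, event_time_s):
    """True if a quiet event falls inside the masker's pre- or post-masking window."""
    dt = event_time_s - masker_time_s
    if dt < 0:
        return -dt <= PRE_MASK_S   # event comes just before the loud sound
    return dt <= POST_MASK_S       # event comes just after the loud sound

if __name__ == "__main__":
    snare = 1.000
    print(in_masking_window(snare, 1.100))  # ghost note 100 ms after: True
    print(in_masking_window(snare, 0.900))  # 100 ms before: False
```

The asymmetry is the point: a quiet note has a much better chance of being heard just before a loud hit than just after it.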
Reflection, absorption, diffusion
Sound striking a surface can be reflected (bounced back), absorbed (converted to heat in porous material), or diffused (scattered in many directions). A hard, parallel-walled room has loud reflections that comb-filter the direct sound. Soft furnishings and porous absorbers (mineral wool, foam) cut reflections, especially at higher frequencies. Diffusers (geometric panels, bookshelves) preserve liveliness while breaking up flutter echo.
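Comb filtering has a simple origin: a reflection travels further than the direct sound, arrives late, and cancels the direct sound wherever the delay equals an odd number of half periods. A minimal sketch, assuming a single reflection and a speed of sound of 343 m/s (both assumptions, as are the names):

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def comb_notches_hz(path_difference_m, count=3):
    """First `count` cancellation frequencies for one delayed reflection.

    path_difference_m is how much further the reflection travels than
    the direct sound. Notches fall at odd multiples of 1 / (2 * delay).
    """
    delay_s = path_difference_m / SPEED_OF_SOUND
    return [(2 * k + 1) / (2.0 * delay_s) for k in range(count)]

if __name__ == "__main__":
    # A reflection that travels 0.343 m further arrives 1 ms late.
    print([round(f) for f in comb_notches_hz(0.343)])
```

Shorter path differences push the first notch higher, which is one reason reflections from a desk or nearby wall are so audible in the midrange.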
Bass trapping and modal problems
Low frequencies have long wavelengths (a 60 Hz wave is about 5.7 m). Inside a small room they set up resonances called modes, where some frequencies pile up at the walls and corners while others cancel in the middle. Thick porous bass traps in corners (where pressure is highest) absorb low-frequency energy and tame the modal peaks that otherwise make a control room dishonest.
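The wavelength figure and the lowest room resonances are easy to estimate. The sketch below assumes a speed of sound of 343 m/s and considers only axial modes (standing waves between one pair of parallel walls), which are usually the strongest. The 4 m room length is an invented example, and the names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def wavelength_m(f_hz):
    """Wavelength in metres of a sound wave at f_hz."""
    return SPEED_OF_SOUND / f_hz

def axial_modes_hz(length_m, count=3):
    """First `count` axial mode frequencies between two walls length_m apart."""
    return [n * SPEED_OF_SOUND / (2.0 * length_m) for n in range(1, count + 1)]

if __name__ == "__main__":
    print(round(wavelength_m(60.0), 2))                  # about 5.72 m
    print([round(f, 1) for f in axial_modes_hz(4.0)])    # hypothetical 4 m wall spacing
```

In a small room these modes fall squarely in the bass range and sit far apart, which is why individual notes can boom or vanish depending on where the listener sits.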
Reverb time (RT60)
RT60 is the time it takes a sound to decay by 60 dB once the source stops. A bedroom typically has RT60 below 0.3 s; a concert hall sits around 1.8–2.2 s. The optimum for mixing varies, but anechoic chambers (very short RT60) feel oppressive, and untreated bedrooms have ragged, uneven decay. Acoustic treatment is about getting an even, predictable decay across frequencies, not about deadening the room.
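A common first estimate of RT60 is Sabine's formula, RT60 ≈ 0.161 × V / A, where V is the room volume in cubic metres and A is the total absorption (surface area times average absorption coefficient). The notes do not mention this formula, so treat it as a swapped-in approximation. The room dimensions and the 0.3 coefficient below are invented for illustration.

```python
def shoebox_volume_and_area(length_m, width_m, height_m):
    """Volume (m^3) and total surface area (m^2) of a rectangular room."""
    volume = length_m * width_m * height_m
    area = 2.0 * (length_m * width_m + length_m * height_m + width_m * height_m)
    return volume, area

def sabine_rt60(volume_m3, surface_area_m2, mean_absorption):
    """Sabine estimate of RT60 in seconds.

    mean_absorption is the average absorption coefficient (0 to 1) of all
    surfaces; using a single average is a simplifying assumption.
    """
    return 0.161 * volume_m3 / (surface_area_m2 * mean_absorption)

if __name__ == "__main__":
    # Hypothetical furnished bedroom: 4 m x 3 m x 2.5 m, average absorption 0.3.
    volume, area = shoebox_volume_and_area(4.0, 3.0, 2.5)
    print(round(sabine_rt60(volume, area, 0.3), 2))
```

A single number like this hides the frequency dependence that matters for mixing: absorption coefficients are usually much lower in the bass, so the estimate should be repeated per frequency band to see how even the decay is.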