Abstract
Spatial audio can be captured and subsequently rendered to headphones in a perceptually convincing manner using a binaural filter-and-sum beamformer whose coefficients are optimized to recreate the head-related transfer functions (HRTFs) of individual listeners. The remaining challenge with this technology is that, while individual HRTFs can be reproduced accurately in certain frequency regions and/or for certain directions of sound incidence, the reproduction is less accurate in others. The perceptual impact of such degradations can be investigated using real-time auralizations. In the current study, we report on listening tests in which various perceptual attributes were assessed in a lecture room by comparing the real room to auralized rooms. The stimuli were presented either via loudspeakers (real room, reference condition) or dynamically via headphones (auralized room, test conditions). The auralizations were based on both measured and simulated binaural room impulse responses (BRIRs). Subjects rated the extent to which the auralized room agreed with the real room with respect to the attribute in question, using a multiple-stimulus paradigm with sorting. The results show that highly convincing auralizations of speech can be realized using measured BRIRs, even non-individual ones. Small deficiencies in the simulations could reliably be detected.