We conducted perceptual experiments to determine how large a difference in depth between audio and visual stimuli can be in stereoscopic-3D environments while the stimuli are still perceived as congruent. We also investigated whether the nature of the environment and stimuli affects the perception of congruence. To this end, we created an audio-visual environment that combined a photorealistic visual environment, captured by a camera under orthostereoscopic conditions, with a virtual audio environment generated from measurements of the acoustic properties of the real environment. The visual environment was a room containing a loudspeaker or a person as the visual stimulus, presented to the viewer on a passive stereoscopic display. Pink noise samples and female speech served as audio stimuli, presented over headphones as binaural renderings. The stimuli were generated at different depths from the viewer, who was asked to judge whether the audio stimulus was nearer than, farther away than, or at the same depth as the visual stimulus. Our experiments show that there is a significant range of depth differences over which audio and visual stimuli are perceived as congruent. Furthermore, this range increases as the depth of the visual stimulus increases. © (2013) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).