Spatial auditory-vision sensory substitution using range cameras

(TÜBİTAK 3001, 12/2014 - 12/2016)

Sensory substitution is a technique whereby sensory information in one modality (such as vision) can be assimilated by an individual in another modality (such as audition or taction). Sensory substitution holds promise especially for those with vision loss, to the extent that considerable information in the deficient modality may be made available to the user of the sensory substitution system.

Typical sensory substitution systems for transmitting visual information take RGB or, more commonly, grey-scale images and present them either through a tactile array placed on the skin or through a spectro-temporal audio encoding of the visual data. These systems have not yet delivered the acuity, density and responsiveness needed to be of use in everyday life for most potential users. Range sensors have also proved useful for assisting sight-impaired individuals, but their output is generally made available to the user in a low-bandwidth form, such as an obstacle detection warning or sparse information about the object currently in focus.

The proposed project aims to use time-of-flight and similar range cameras in a sensory substitution system that provides the user with an important component of dense visual data in the form of a video stream of range images. Such images consist of 2D arrays of ranges, similar to a very accurate disparity map, and can be transformed into 3D point clouds for further processing. Range cameras combined with conventional colour imagery, so-called RGBD cameras, have become affordable in recent years, costing on the order of 250 USD per camera. The project will investigate the use of binaural auditory cues in presenting this range data to the user. Binaural auditory cues are well known to provide accurate 3D localisation of sparse sound sources, and do not require extra training to use. In the proposed project, binaural cues will be exploited along with intelligent surface segmentation and audio encoding of surface properties. The extent to which such cues can be used to present range image streams will also be assessed using objective and subjective evaluation criteria. The aim of the project is thus to investigate the scope of solutions possible in this new area, and to work towards a level of acuity, density and responsiveness at which such systems become a viable option in the everyday life of typical sight-impaired individuals.
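
The conversion from a range image to a 3D point cloud mentioned above can be illustrated with a minimal sketch. It assumes a pinhole camera model, a depth image that stores distance along the optical axis in metres, and hypothetical intrinsic parameters (fx, fy, cx, cy) that would in practice come from the camera's calibration. The function name is illustrative and is not taken from the project.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image into an (N, 3) array of 3D points.

    Assumptions (not from the project description): pinhole camera model,
    depth measured along the optical axis in metres, and pixels with a
    non-positive or non-finite depth treated as invalid.
    """
    rows, cols = depth.shape
    # u indexes columns (image x), v indexes rows (image y).
    u, v = np.meshgrid(np.arange(cols), np.arange(rows))
    z = depth.astype(float)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Drop invalid measurements.
    valid = np.isfinite(points[:, 2]) & (points[:, 2] > 0)
    return points[valid]

# Usage: a 4x4 image of a flat wall 2 m away, with one invalid pixel.
depth = np.full((4, 4), 2.0)
depth[0, 0] = 0.0
cloud = depth_to_point_cloud(depth, fx=100.0, fy=100.0, cx=1.5, cy=1.5)
```

Some time-of-flight sensors report radial distance from the sensor rather than depth along the optical axis; in that case the ranges would need to be converted before applying this formula.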

As for novelty, this is the first study to create a sensory substitution system that uses range images; the use of spatial audio is also not well studied in the area of sensory substitution.

The project is envisioned as taking two different approaches to encoding range information as auditory information as its framework, and then building on them by adding user controls and examining variants and combinations of the encodings. The two approaches are the spatio-temporal surface primitive approach and the spatial scanning approach. The spatio-temporal surface primitive approach segments range data into relatively smooth or simple, discrete, contiguous surface regions and then encodes the local shape of each region as an auditory signal spatially located according to the position of the surface. This approach has the advantage of an intuitive and responsive encoding of spatial data. The spatial scanning approach trades off some of that responsiveness for a greater density of information: it scans the range image in one dimension and provides an audio signal (which also makes use of binaural cues) that changes as the scan-line moves down the image.

By the end of the project, the aim is to have a proof of concept as well as a set of experimental evaluations on a set of ecologically relevant scenarios.
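
To make the spatial scanning idea more concrete, the following is a rough sketch of one possible encoding, not the encoding used in the project. Each row of the range image becomes a short stereo audio frame, and the frames are played in order as the scan-line moves down the image. The mapping is an assumption made for illustration: nearer surfaces sound louder and higher in pitch, and the horizontal position of each pixel is conveyed by constant-power stereo panning, which is only a crude stand-in for true binaural cues such as those produced by head-related transfer functions. The sample rate, frequency range and function name are likewise hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz; an assumed value

def scan_to_stereo(depth, max_range=5.0, row_duration=0.05,
                   f_near=1200.0, f_far=200.0):
    """Encode a range image as stereo audio, one scan-line (row) at a time.

    Returns an array of shape (rows * samples_per_row, 2) with values in [-1, 1].
    All mapping choices here are illustrative assumptions.
    """
    rows, cols = depth.shape
    n = int(SAMPLE_RATE * row_duration)
    t = np.arange(n) / SAMPLE_RATE
    # Pan angle per column: 0 is hard left, pi/2 is hard right.
    pan = np.linspace(0.0, np.pi / 2, cols)
    left_gain, right_gain = np.cos(pan), np.sin(pan)
    frames = []
    for row in depth:
        # Nearness is 1 at zero range and 0 at or beyond max_range;
        # invalid pixels (non-positive or non-finite) are silent.
        valid = np.isfinite(row) & (row > 0)
        nearness = np.where(valid, 1.0 - np.clip(row, 0, max_range) / max_range, 0.0)
        freqs = f_far + nearness * (f_near - f_far)
        # One sinusoid per column, scaled by nearness: shape (cols, n).
        tones = nearness[:, None] * np.sin(2 * np.pi * freqs[:, None] * t)
        left = left_gain @ tones / cols
        right = right_gain @ tones / cols
        frames.append(np.stack([left, right], axis=1))
    return np.concatenate(frames, axis=0)

# Usage: a 3x4 range image with a near surface (0.5 m) along the left edge
# and everything else at the maximum range.
depth = np.full((3, 4), 5.0)
depth[:, 0] = 0.5
audio = scan_to_stereo(depth)
```

In this example only the left channel carries signal, since the only near surface lies at the left edge of the image. The sketch also restarts each tone at every row, which would produce audible discontinuities; a practical system would need to smooth the transitions between scan-lines.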

Project Team
Lect. Dr. Damien Jade Duff

Assist. Prof. Dr. Gökhan İnce

MSc Students:
Ahmad Mhaish
Torkan Gholamalizadeh
Hossein Pourghaemi

BSc Students: 
Hakan Çoban
Çağatay Erdiz
Oğuz Kerem Tural
Javid Nuriyev