Comparison of Performance in Binaural Sound Source Localisation using Convolutional Neural Networks for differing Feature Representations

Jones, K (ORCID: 0000-0001-6689-3225), Reed-Jones, J (ORCID: 0000-0002-6398-1980), Marsland, J, Fergus, P (ORCID: 0000-0002-7070-4447) and Ellis, D (2023) Comparison of Performance in Binaural Sound Source Localisation using Convolutional Neural Networks for differing Feature Representations. In: AES Convention 154 Conference Proceedings. (AES Convention 154, 13th May - 15th May 2023, Helsinki, Finland).

magnitude_features.pdf - Accepted Version (1MB)

Abstract

Binaural Sound Source Localisation is increasingly being performed using Convolutional Neural Networks (CNNs). These networks take a Time-Frequency representation of the audio as input and use it to estimate the direction of arrival of a sound. Previous works have used a variety of Time-Frequency representations, but none has used magnitude spectra alone, so the importance of magnitude information for full azimuthal binaural sound source localisation is not well understood. This work addresses that gap by training and testing a CNN on four different Time-Frequency representations: Mel-Spectrogram, Gammatonegram, Mel-Frequency Cepstrum, and Gammatone-Frequency Cepstrum. This comparison found that Spectrograms are suitable for the task of full azimuthal binaural sound source localisation.
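As an illustration only, the sketch below shows one common way to derive all four magnitude-only representations named in the abstract from a binaural (left/right) signal using NumPy. It is not the authors' pipeline. The function names are hypothetical. The parameter values (`n_fft`, `hop`, `n_bands`, `n_ceps`), the triangular mel filters, the frequency-domain gammatone approximation (ERB bandwidths after Glasberg and Moore), and the choice to stack the two ears as two CNN input channels are all assumptions.

```python
import numpy as np

# Hypothetical helpers; defaults are illustrative assumptions, not values from the paper.

def stft_magnitude(x, n_fft=512, hop=256):
    """Magnitude STFT of a 1-D signal, shape (n_fft // 2 + 1, n_frames). Phase is discarded."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0))


def mel_filterbank(n_bands, n_fft, sr):
    """Triangular filters with edges equally spaced on the mel scale."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_bands + 2))
    fb = np.zeros((n_bands, freqs.size))
    for b in range(n_bands):
        lo, centre, hi = edges[b], edges[b + 1], edges[b + 2]
        rising = (freqs - lo) / (centre - lo)
        falling = (hi - freqs) / (hi - centre)
        fb[b] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def gammatone_filterbank(n_bands, n_fft, sr, fmin=50.0, order=4):
    """Approximate gammatone magnitude responses with centres on the ERB-rate scale.

    Assumes |H(f)| ~ (1 + ((f - fc) / b)^2)^(-order / 2) with b = 1.019 * ERB(fc);
    this is a frequency-domain approximation, not a time-domain gammatone filter.
    """
    erb_rate = lambda f: 21.4 * np.log10(1.0 + 0.00437 * f)
    erb_rate_inv = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    centres = erb_rate_inv(np.linspace(erb_rate(fmin), erb_rate(sr / 2), n_bands))
    b = 1.019 * 24.7 * (0.00437 * centres + 1.0)
    return (1.0 + ((freqs[None, :] - centres[:, None]) / b[:, None]) ** 2) ** (-order / 2)


def dct_ii_matrix(n_in, n_out):
    """First n_out rows of the orthonormal DCT-II basis, applied along the band axis."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis[0] /= np.sqrt(2.0)
    return basis


KINDS = ("mel-spectrogram", "gammatonegram", "mel-cepstrum", "gammatone-cepstrum")


def binaural_features(left, right, sr, kind, n_bands=64, n_ceps=20, n_fft=512, hop=256):
    """Return a (2, bands-or-coefficients, frames) array with one channel per ear."""
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {KINDS}")
    if kind.startswith("mel"):
        fb = mel_filterbank(n_bands, n_fft, sr)
    else:
        fb = gammatone_filterbank(n_bands, n_fft, sr)
    channels = []
    for x in (left, right):
        power = stft_magnitude(x, n_fft, hop) ** 2          # magnitude-only power spectrum
        log_bands = np.log(fb @ power + 1e-10)              # log filterbank energies ("-gram")
        if kind.endswith("cepstrum"):
            log_bands = dct_ii_matrix(n_bands, n_ceps) @ log_bands  # cepstral coefficients
        channels.append(log_bands)
    return np.stack(channels)


# Usage: one second of synthetic noise per ear at 16 kHz.
rng = np.random.default_rng(0)
left, right = rng.standard_normal((2, 16000))
for kind in KINDS:
    print(kind, binaural_features(left, right, sr=16000, kind=kind).shape)
```

The spectrogram variants keep one row per filter band, while the cepstral variants compress each frame's log band energies into a few decorrelated DCT coefficients. Swapping only the filterbank, with everything else held fixed, is what makes the four inputs directly comparable for a CNN.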

Item Type: Conference or Workshop Item (Paper)
Subjects: T Technology > TA Engineering (General). Civil engineering (General)
Divisions: Engineering
Publisher: Audio Engineering Society
Date of acceptance: 1 May 2023
Date of first compliant Open Access: 8 July 2025
Date Deposited: 08 Jul 2025 08:13
Last Modified: 08 Jul 2025 08:13
URI: https://researchonline.ljmu.ac.uk/id/eprint/26711