Jones, K ORCID: 0000-0001-6689-3225, Reed-Jones, J ORCID: 0000-0002-6398-1980, Marsland, J, Fergus, P ORCID: 0000-0002-7070-4447 and Ellis, D (2023) Comparison of Performance in Binaural Sound Source Localisation using Convolutional Neural Networks for differing Feature Representations. In: AES Convention 154 Conference Proceedings. (AES Convention 154, 13th May - 15th May 2023, Helsinki, Finland).
Text: magnitude_features.pdf - Accepted Version (1MB)
Abstract
Binaural Sound Source Localisation is increasingly being achieved by means of Convolutional Neural Networks (CNNs). These networks take a Time-Frequency representation of audio as input and use it to estimate the direction of arrival of a sound. Previous works have used different Time-Frequency representations, but never magnitude spectra alone, leaving the importance of magnitude information in full azimuthal binaural sound source localisation poorly understood. This work aims to address that gap by testing the performance of a CNN trained and tested on four different Time-Frequency representations: Mel-Spectrogram, Gammatonegram, Mel-Frequency Cepstrum, and Gammatone-Frequency Cepstrum. From this test, it was found that Spectrograms are suitable for the task of full azimuthal binaural sound source localisation.
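As an illustration of what a magnitude-only Time-Frequency input might look like, the following is a minimal sketch of a two-channel log Mel-Spectrogram computed with NumPy alone. It is not the paper's implementation: the function names, the FFT size, the hop length, the number of Mel bands and the HTK-style Mel formula are all assumptions chosen for the example. Phase is discarded by taking the absolute value of the STFT, so only level information in each ear's channel reaches the network.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel scale (an assumption; other Mel variants exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale, shape (n_mels, n_bins)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, centre, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (centre - lo)
        falling = (hi - fft_freqs) / (hi - centre)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def magnitude_stft(x, n_fft, hop):
    """Magnitude of a Hann-windowed STFT, shape (n_bins, n_frames). Phase is dropped."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def binaural_log_mel(stereo, sample_rate, n_fft=1024, hop=512, n_mels=64):
    """Log Mel magnitude features for a (2, n_samples) signal, shape (2, n_mels, n_frames)."""
    fb = mel_filterbank(n_mels, n_fft, sample_rate)
    channels = []
    for ch in stereo:
        mag = magnitude_stft(ch, n_fft, hop)
        channels.append(np.log(fb @ mag + 1e-10))  # small offset avoids log(0)
    return np.stack(channels)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.standard_normal(16000)
    right = 0.5 * left  # hypothetical quieter right-ear signal
    feats = binaural_log_mel(np.stack([left, right]), sample_rate=16000)
    print(feats.shape)
```

In this sketch the left and right channels are stacked like image channels, which is one common way to present binaural features to a CNN. A constant level difference between the ears shows up as a constant offset between the two log-magnitude channels.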
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Subjects: | T Technology > TA Engineering (General). Civil engineering (General) |
Divisions: | Engineering |
Publisher: | Audio Engineering Society |
Date of acceptance: | 1 May 2023 |
Date of first compliant Open Access: | 8 July 2025 |
Date Deposited: | 08 Jul 2025 08:13 |
Last Modified: | 08 Jul 2025 08:13 |
URI: | https://researchonline.ljmu.ac.uk/id/eprint/26711 |