Deep Binaural Direction of Arrival Estimation

Reed-Jones, J (ORCID: 0000-0002-6398-1980) (2025) Deep Binaural Direction of Arrival Estimation. Doctoral thesis, Liverpool John Moores University.

2025reedjonesphd 1.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial.

Abstract

The objective of binaural direction of arrival (DoA) estimation is to find the DoA of a sound source by measuring the sound field with a binaural array. This field increasingly applies deep learning to this task, particularly convolutional neural networks (CNNs) trained on relatively raw representations of the binaural audio. This work investigates the field, establishing common trends among different publications, particularly in data preparation, and scrutinising these trends for instances where collective wisdom has emerged without empirical backing. Based on this, an experimental evaluation is performed to gain insight into the efficacy of different existing and novel techniques, using a recurring testing framework. Such experimental evaluations are undertaken for several topics: an analysis of the effect of acoustic conditions on the performance of binaural DoA estimation, a broad empirical study of binaural feature representations to be used with CNNs, the proposal and comparison of convolutional recurrent neural network (CRNN) models for binaural DoA estimation, and an investigation into binaural DoA estimation in the mismatched anechoic condition, which refers to a mismatch in head-related transfer function (HRTF) measurements between training and testing datasets for an identical binaural array. The findings in this thesis lead to recommendations for more effectively using deep neural networks for binaural DoA estimation, while also demonstrating the limited ability of such systems to generalise to unseen binaural data when using simulated binaural datasets that are limited in scope.

Item Type: Thesis (Doctoral)
Uncontrolled Keywords: Binaural Audio; Deep Learning; Sound Source Localisation; Acoustic Signal Processing; Human Audition
Subjects: T Technology > TA Engineering (General). Civil engineering (General)
Divisions: Engineering
Date of acceptance: 9 June 2025
Date of first compliant Open Access: 6 August 2025
Date Deposited: 06 Aug 2025 11:47
Last Modified: 06 Aug 2025 11:47
DOI or ID number: 10.24377/LJMU.t.00026709
Supervisors: Jones, K, Fergus, P and Ellis, D
URI: https://researchonline.ljmu.ac.uk/id/eprint/26709