
A Robust Hybrid Neural Network Architecture for Blind Source Separation of Speech Signals Exploiting Deep Learning

Ansari, S, Alnajjar, KA, Khater, T, Mahmoud, S and Hussain, A (2023) A Robust Hybrid Neural Network Architecture for Blind Source Separation of Speech Signals Exploiting Deep Learning. IEEE Access, 11. pp. 100414-100437. ISSN 2169-3536

A Robust Hybrid Neural Network Architecture for Blind Source Seperation of Speech Signals.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Download (6MB)

Abstract

Blind source separation has emerged as a significant research topic in signal processing. Its integration into beyond-fifth-generation and sixth-generation networks is driven by the growing demand for reliable, efficient communication systems that can cope with high-density networks, dynamic interference environments, and the coexistence of diverse signal sources; effective source separation enables improved signal extraction and, in turn, better system performance. Audio processing is a particularly critical domain, where the challenge lies in handling files that contain a mixture of human speech, silence, and music. Speech separation systems address this challenge: they can be regarded as a specialized form of human speech recognition or audio signal classification, used to separate, identify, or delineate the segments of an audio signal that contain human speech. Applications such as volume reduction, quality enhancement, detection, and identification all require isolating human speech by removing silence, music, or environmental noise from the signal. Robust methods for accurate and efficient speech separation are therefore of paramount importance in audio signal processing.

This study proposes a novel three-way neural network architecture that combines transfer learning, a pre-trained dual-path recurrent neural network (DPRNN), and a transformer. In addition to learning the time series associated with the audio signals, the network models the speech sequence with direct context awareness within the transformer framework. A comprehensive set of simulations evaluates the proposed model against seven prominent state-of-the-art deep learning architectures. The results show notable gains on multiple objective metrics: an average improvement of 4.60% in short-time objective intelligibility (STOI), 14.84% in source-to-distortion ratio (SDR), and 9.87% in scale-invariant signal-to-noise ratio (SI-SNR). These gains surpass those of the nearest rival, the DPRNN time-domain audio separation network (DPRNN-TasNet), establishing the superior performance of the proposed model.
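For readers unfamiliar with the reported metrics, the scale-invariant signal-to-noise ratio quoted above has a standard closed-form definition used across the speech separation literature. Below is a minimal NumPy sketch of that definition; it is illustrative only and not code from the paper, and the function name and the eps stabilizer are assumptions.

    import numpy as np

    def si_snr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
        """Scale-invariant signal-to-noise ratio (SI-SNR) in dB."""
        # Remove the DC offset so the metric ignores constant bias.
        estimate = estimate - estimate.mean()
        reference = reference - reference.mean()
        # Project the estimate onto the reference: the component along the
        # reference is the "target"; the orthogonal remainder is noise.
        # Rescaling by this projection makes the metric gain-invariant.
        s_target = (np.dot(estimate, reference) /
                    (np.dot(reference, reference) + eps)) * reference
        e_noise = estimate - s_target
        return 10.0 * np.log10((np.sum(s_target ** 2) + eps) /
                               (np.sum(e_noise ** 2) + eps))

    # Example: a lightly noised sine tone scores a finite, positive SI-SNR.
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 8000)
    ref = np.sin(2 * np.pi * 440 * t)
    est = ref + 0.1 * rng.standard_normal(ref.shape)
    print(f"SI-SNR: {si_snr(est, ref):.2f} dB")

For SDR, published evaluations typically rely on established toolkits such as mir_eval.separation.bss_eval_sources rather than hand-rolled implementations.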

Item Type: Article
Uncontrolled Keywords: 08 Information and Computing Sciences; 09 Engineering; 10 Technology
Subjects: T Technology > T Technology (General)
T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Divisions: Computer Science & Mathematics
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
SWORD Depositor: A Symplectic
Date Deposited: 21 Nov 2023 14:55
Last Modified: 21 Nov 2023 15:00
DOI or ID number: 10.1109/ACCESS.2023.3313972
URI: https://researchonline.ljmu.ac.uk/id/eprint/21913