How Frequency and Harmonic Profiling of a ‘Voice’ Can Inform Authentication of Deepfake Audio: An Efficiency Investigation

Williams, EL (ORCID: 0009-0003-7845-6040), Jones, KO (ORCID: 0000-0001-6689-3225), Robinson, C, Chandler Crnigoj, S, Burrell, H and McColl, S (ORCID: 0000-0001-8972-2998) (2025) How Frequency and Harmonic Profiling of a ‘Voice’ Can Inform Authentication of Deepfake Audio: An Efficiency Investigation. In: Journal of Advances in Engineering and Technology, 3 (1), pp. 49-58. (SLIIT International Conference on Engineering and Technology, 25th Jul 2024, Malabe, Sri Lanka).

Full text: An investigation into how frequency and harmonic profiling of a voice can inform authentication of deepfake audio.pdf - Accepted Version (418kB)

Abstract

As life in the digital era becomes more complex, the scope for criminal activity within the digital realm widens. More recently, the development of deepfake media generation powered by Artificial Intelligence has pushed audio and video content into a realm of doubt, misinformation, and misrepresentation. Instances of deepfake videos are numerous, with infamous cases ranging from manufactured graphic images of the musician Taylor Swift to the loss of $25 million transferred after a faked video call. Deepfakes become particularly concerning for the general public when such material is submitted as evidence in a court case, especially a criminal trial, and the current methods of authentication against such deepfake evidence threats are insufficient. When considering speech within audio forensics, there is sufficient ‘individuality’ in a person’s voice to enable comparison for identification. In the case of authenticating audio for deepfake speech, it is possible to use this same comparative approach to identify rogue or incomparable harmonic and formant patterns within the speech. The presence of deepfake media within the realms of illegal activity demands appropriate legal enforcement, resulting in a requirement for robust detection methods. The work presented in this paper proposes a robust technique for identifying such AI-synthesized speech using a quantifiable method that can be justified within court proceedings. Furthermore, it presents the correlation between the harmonic content of human speech patterns and the AI-generated clones derived from them. The paper details the spectrographic audio characteristics that were found and that may prove helpful towards authenticating speech for forensic purposes in the future. The results demonstrate that comparing specific frequency ranges against a known audio sample of a person’s speech can indicate the presence of deepfake media through differences in harmonic structure.
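As a rough illustration of the comparative approach the abstract describes, the sketch below computes a band-limited spectral "profile" of a known reference voice and a questioned recording, then correlates the two profiles. This is not the authors' implementation: the librosa-based feature extraction, the file names, the frequency bands, and the 0.9 threshold are all hypothetical choices made for the example.

```python
# Minimal sketch of band-limited harmonic-profile comparison between a known
# voice sample and a questioned (possibly deepfake) recording.
# NOTE: file names, bands, and threshold are illustrative assumptions only.
import numpy as np
import librosa

def band_profile(path, bands=((100, 500), (500, 1500), (1500, 3000), (3000, 5000)), sr=16000):
    """Return the mean spectral magnitude in each frequency band (a crude harmonic profile)."""
    y, sr = librosa.load(path, sr=sr)
    spec = np.abs(librosa.stft(y, n_fft=2048))           # magnitude spectrogram
    freqs = librosa.fft_frequencies(sr=sr, n_fft=2048)   # bin centre frequencies in Hz
    return np.array([spec[(freqs >= lo) & (freqs < hi)].mean() for lo, hi in bands])

# Hypothetical file names for a known genuine sample and a questioned sample.
ref = band_profile("known_speaker.wav")
sus = band_profile("questioned_audio.wav")

# Normalise the profiles and correlate them; a low correlation suggests a
# differing harmonic structure and would warrant closer spectrographic review.
similarity = np.corrcoef(ref / ref.sum(), sus / sus.sum())[0, 1]
print(f"Band-energy correlation: {similarity:.3f}")
flagged = similarity < 0.9   # purely illustrative threshold
```

In practice the choice of frequency bands and any decision threshold would need to be derived from the kind of spectrographic analysis the paper reports, not fixed a priori as in this sketch.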

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: 46 Information and Computing Sciences; 4603 Computer Vision and Multimedia Computation; Machine Learning and Artificial Intelligence
Subjects: T Technology > T Technology (General)
T Technology > TA Engineering (General). Civil engineering (General)
Divisions: Engineering
Pharmacy and Biomolecular Sciences
Publisher: Sri Lanka Institute of Information Technology
Date of acceptance: 16 December 2024
Date of first compliant Open Access: 28 October 2025
Date Deposited: 28 Oct 2025 10:50
Last Modified: 28 Oct 2025 10:50
DOI or ID number: 10.54389/hgbc7543
URI: https://researchonline.ljmu.ac.uk/id/eprint/27344