Creating speech deepfakes is becoming increasingly easy. Not so long ago, the Finnish language still posed an obstacle, but not anymore.
“Today, anyone can create a speech deepfake. In the past, it took greater technical dedication, but nowadays, numerous voice cloning services are available to virtually anyone,” says Professor Tomi Kinnunen of the School of Computing at the University of Eastern Finland.
Speech synthesis could, in principle, be used to deceive biometric authentication systems, or as part of scam calls and disinformation on social media. It is therefore essential to understand when automatic systems and humans can be deceived, and to develop countermeasures accordingly.
“Such countermeasures include, for instance, speech deepfake detection and deepfake source tracing, that is, identifying the voice cloning or synthesis software used to create the deepfake. In the case of biometric authentication, the aim is to improve the robustness of systems against various attacks,” Kinnunen notes.
“Neural networks and artificial intelligence are widely used in research in this field. Personally, however, I’ve felt it important to move on to more interpretable methods in which the detection method can ‘justify’ its decisions.”
Developing automated deepfake detection
Speech as a field of research is rapidly evolving, and there is plenty to investigate. Speech research has an interdisciplinary focus, drawing on machine learning, data collection, speech sciences and explainable AI.
According to Kinnunen, deepfake research is like playing cat and mouse. Recent years have seen significant advances in the accuracy of detection methods and countermeasures, but model generalisation remains a major challenge.
“Machine learning is based on fitting models to large sets of training data, and models can easily overfit to the training data used. As a result, the detection of speech deepfakes created with previously unseen synthesis techniques becomes difficult,” he explains.
“An additional challenge arises from the fact that real-world deepfakes often contain encoded or compressed speech, which masks the artefacts produced by speech synthesis. This makes detection more difficult.”
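The generalisation problem can be made concrete with a leave-one-synthesiser-out evaluation, in which a detector is tested only on a synthesis technique it never saw during training. The following is a minimal sketch of such a data split, not the protocol used by the researchers; the function name, the utterance IDs and the labels "tts_a", "tts_b" and "vc_c" are hypothetical.

```python
from collections import defaultdict


def leave_one_synthesizer_out(utterances):
    """Yield (held_out, train_ids, test_ids) splits.

    `utterances` is a list of (utterance_id, label) pairs, where label is
    "genuine" or the name of the synthesiser that produced the utterance.
    All labels here are hypothetical placeholders.
    """
    by_label = defaultdict(list)
    for utt_id, label in utterances:
        by_label[label].append(utt_id)

    synthesizers = sorted(name for name in by_label if name != "genuine")
    genuine = by_label["genuine"]
    # Split genuine speech in two so no utterance appears in both sets.
    half = len(genuine) // 2

    for held_out in synthesizers:
        # Train on every synthesiser except the held-out one.
        train = genuine[:half] + [
            utt for name in synthesizers if name != held_out
            for utt in by_label[name]
        ]
        # Test on genuine speech plus the unseen synthesiser only.
        test = genuine[half:] + by_label[held_out]
        yield held_out, train, test


# Toy data: four genuine utterances and three hypothetical synthesisers.
data = [
    ("g1", "genuine"), ("g2", "genuine"), ("g3", "genuine"), ("g4", "genuine"),
    ("a1", "tts_a"), ("a2", "tts_a"),
    ("b1", "tts_b"),
    ("c1", "vc_c"),
]
splits = list(leave_one_synthesizer_out(data))
for held_out, train, test in splits:
    print(held_out, "train:", train, "test:", test)
```

A detector that scores well on the training synthesisers but poorly on each held-out one is showing exactly the overfitting described above.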
Speech technology research utilises signal processing and machine learning, in practice deep neural network models trained on large datasets.
“We are currently developing automated speech deepfake detection to determine whether speech is genuine or synthetic. We are also working on synthetic speech source tracing, in other words, examining the speech synthesis technique used to create the deepfake.”
In the ongoing SPEECHFAKES project funded by the Research Council of Finland, researchers have developed methods for identifying the sub-components of synthesis methods used to create speech deepfakes.
The project has also created entirely new metrics for assessing accuracy. The challenge is to objectively evaluate and compare different detection solutions to understand which models generalise best, and under which circumstances systems are likely to err.
“When biometric authentication is combined with deepfake detection, something as self-evident as accuracy assessment becomes less straightforward,” Kinnunen says.
The study Kinnunen refers to was published in IEEE Transactions on Pattern Analysis and Machine Intelligence, one of the leading machine learning journals.
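For comparison, a single detector evaluated in isolation is conventionally summarised by its equal error rate (EER): the operating point at which genuine speech is rejected as often as synthetic speech is accepted. The sketch below computes this ordinary EER, not the tandem metric proposed in the study; the function name and the scores are made up for illustration, and higher scores are assumed to mean "more likely genuine".

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Return (eer, threshold) for one detector.

    Assumes higher scores mean "more likely genuine". This is the
    conventional single-system EER, not the tandem t-EER metric.
    """
    if not genuine_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    best = None
    # Every observed score is a candidate decision threshold.
    for t in sorted(set(genuine_scores) | set(spoof_scores)):
        # Miss rate: genuine speech scored below the threshold.
        miss = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False alarm rate: synthetic speech scored at or above it.
        false_alarm = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(miss - false_alarm)
        if best is None or gap < best[0]:
            # Report the mean of the two rates where they are closest.
            best = (gap, (miss + false_alarm) / 2, t)
    return best[1], best[2]


# Made-up detector scores for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(genuine, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With only one system, a single threshold is enough to locate this point. Once a biometric comparator and a deepfake detector are run in tandem, each has its own threshold and error rates, which is one reason the assessment becomes less straightforward.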
“Our goal is to further improve method accuracy and interpretability. We will certainly be seeing more new AI-based voice cloning services and tools in the future.”
Further reading
T. H. Kinnunen et al., "t-EER: Parameter-Free Tandem Evaluation of Countermeasures and Biometric Comparators," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 5, pp. 2622-2637, May 2024, doi: 10.1109/TPAMI.2023.3313648. Available at https://arxiv.org/abs/2309.12237
J. Mishra et al., "Towards explainable spoofed speech attribution and detection: A probabilistic approach for characterizing speech synthesizer components," Computer Speech & Language, vol. 95, 2026. Available at https://www.sciencedirect.com/science/article/pii/S0885230825000658
This story is part of UEF Insight, the University of Eastern Finland’s new online magazine exploring current issues and emerging phenomena. To read all stories, please visit uef.fi/insight.