In his doctoral dissertation, M.Sc. Ville Vestman focuses on the latest developments and challenges in automatic speaker recognition technology. The dissertation proposes several new methods to improve the speed and accuracy of speaker recognition systems. It also sheds light on the vulnerability of these systems to malicious spoofing attempts.
Voice-based authentication systems are often more difficult to implement than those based on fingerprint or face recognition. This is due to the vast variation in speech production, acoustic environments, and recording technology. Among other topics, Vestman’s dissertation studies speaker recognition under two such sources of variability. First, speaker recognition is performed in variable acoustic conditions that induce different amounts of reverberation. Second, speaker recognition systems are presented with whispered speech instead of modal (normally voiced) speech. To tackle these challenging use cases, the dissertation proposes a novel acoustic feature extraction technique based on time-varying linear prediction. The proposed features showed promising results, particularly in reverberant conditions.
Beyond the challenges caused by acoustic variation, ensuring the security of automatic speaker recognition is also difficult. A fraudster might attempt to spoof a speaker recognition system in a number of different ways. For instance, the fraudster might try to imitate another person’s voice or replay recorded speech through a loudspeaker. The fraudster might also use advanced technological means, such as voice conversion or voice synthesis tools, to fool the speaker recognition system.
As part of his dissertation, Vestman addressed such security threats in speaker recognition. He is a co-organizer of the international ASVspoof 2019 challenge, which promotes the development of voice anti-spoofing systems, that is, methods that can be used to detect spoofing attacks.
The experimental research conducted in the dissertation was greatly accelerated by leveraging the computational power of graphics processing units. In addition to utilizing modern hardware, the studied recognition systems were made faster by optimizing and simplifying existing speaker recognition models.
The field of speech technology evolves fast. The publications in Vestman’s dissertation reflect the ongoing technological transition from classical statistical methods to modern deep learning. The last publication in the dissertation combines know-how from both approaches in the form of a statistical generative model enhanced with deep learning. The proposed approach improves on existing generative methods for voice biometrics.
The doctoral dissertation of M.Sc. Ville Vestman, entitled Methods for Fast, Robust, and Secure Speaker Recognition, will be examined at the Faculty of Science and Forestry. The examination will be held online on the 10th of November at 10 am (UTC+2). The opponents in the public examination will be Associate Professor Tom Bäckström of Aalto University and Associate Professor Brian Kan-Wing Mak of the Hong Kong University of Science and Technology. The custos will be Associate Professor Tomi Kinnunen of the University of Eastern Finland.