Article (Scientific journals)
A comparative study of 3D and 1D acoustic simulations of the higher frequencies of speech
Blandin, Rémi; Stone, Simon; Remacle, Angélique et al.
2023 In IEEE/ACM Transactions on Audio, Speech and Language Processing, 31, p. 3837-3847
Peer Reviewed verified by ORBi
 

Files


Full Text
_d_vs__d_synthesis_revised___no_highlight-1.pdf
Author preprint (554.33 kB)

Details



Keywords :
Acoustics; Three-dimensional displays; Solid modeling; Transfer functions; Shape; Speech processing; Resonant frequency; Articulatory synthesis; wideband speech; multimodal method
Abstract :
[en] Articulatory synthesis generates speech sounds by simulating the physical phenomena involved in speech production. The accuracy of the physical modelling is expected to affect the naturalness of the synthesis: the more realistic the description is, the greater the naturalness is expected to be. In this work, the accuracy of acoustic wave propagation in the vocal tract was evaluated with two perceptual experiments. Sustained vowels generated using a one-dimensional acoustic model, a three-dimensional acoustic model and an artificial bandwidth extension algorithm (without a physical basis) were compared. Since the difference between the acoustic methods tested affects mainly the frequencies above 4 kHz, we ensured that the low frequency part of the stimuli, up to 4 kHz, was similar. Thus, the participants' responses were based only on the differences at high frequency. The first experiment was a pair comparison, in which the participants had to select the more natural-sounding stimulus. In the second experiment, the participants had to rate the naturalness of the stimuli on a linear scale. The results confirmed that a more accurate physical modelling leads to greater naturalness. However, this was limited to the phonemes /o/ and /u/, for which transverse resonances in the anterior vocal tract may play an important role that only a 3D acoustic simulation can accurately represent. It was also found that male stimuli were perceived as significantly more natural than female ones. However, voice quality did not affect naturalness.
Disciplines :
Physics
Author, co-author :
Blandin, Rémi ;  Institute of Acoustics and Speech Communication, TU Dresden, Dresden, Germany
Stone, Simon ;  Institute of Acoustics and Speech Communication, TU Dresden, Dresden, Germany
Remacle, Angélique ;  Université de Liège - ULiège > Département de Logopédie > Logopédie des troubles de la voix ; Research Unit for a Life-Course Perspective on Health and Education, Faculty of Psychology, Speech and Language Therapy, and Educational Sciences, University of Liège, Belgium
Didone, Vincent ;  Université de Liège - ULiège > Psychologie et Neuroscience Cognitives (PsyNCog) ; Psychology and Neuroscience of Cognition Research Unit (PsyNCog), Quantitative psychology, University of Liège, Liège, Belgium
Birkholz, Peter ;  Institute of Acoustics and Speech Communication, TU Dresden, Dresden, Germany
Language :
English
Title :
A comparative study of 3D and 1D acoustic simulations of the higher frequencies of speech
Publication date :
08 September 2023
Journal title :
IEEE/ACM Transactions on Audio, Speech and Language Processing
ISSN :
2329-9290
eISSN :
2329-9304
Publisher :
Institute of Electrical and Electronics Engineers (IEEE)
Volume :
31
Pages :
3837-3847
Peer reviewed :
Peer Reviewed verified by ORBi
Funders :
DFG - German Research Foundation [DE]
Funding number :
BI 1639/7-1
Available on ORBi :
since 15 October 2023

