Perceptual evaluation of the naturalness of broadband articulatory speech synthesis using a 1D versus a 3D acoustic model

Blandin, Rémi; Didone, Vincent; Birkholz, Peter; Remacle, Angélique

doi:10.21437/issp.2024-4

Paper published in a book (Scientific congresses and symposiums)

2024 • In Proceedings of the 13th International Seminar of Speech Production

Peer reviewed

Permalink
https://hdl.handle.net/2268/319000

Files

Full Text

ISSP_2024_article_v3.pdf

Author postprint (176.19 kB)

Details

Keywords :

Speech; Acoustics; Perception; Naturalness; Articulatory Synthesis

Abstract :

[en] Articulatory synthesis is a useful tool for exploring the relationship between speech production and perception. However, including the high frequencies (HF, above about 5 kHz) requires a three-dimensional (3D) acoustic model for realistic simulations. In this frequency range, one-dimensional (1D) acoustic models fail to predict the additional resonances and anti-resonances related to the 3D properties of the acoustic field. While articulatory synthesis based on 3D acoustic models is now achievable for isolated phonemes, the impact of such models on perception by human listeners remains largely unknown. The objective of this work was to determine whether a more realistic computation of transfer functions with a frequency-domain approach results in phonemes perceived as more natural. For this purpose, a perception experiment using a 4-point Likert scale was conducted to evaluate the naturalness of seven static phonemes synthesized with a 1D and a 3D model. No significant influence of the acoustic model was found; however, significant differences between the phonemes were perceived.

Disciplines :

Physics
Theoretical & cognitive psychology
Speech and language therapy

Author, co-author :

Blandin, Rémi; TU Dresden > Institute of Acoustics and Speech Communication

Didone, Vincent; Université de Liège - ULiège > Psychologie et Neuroscience Cognitives (PsyNCog)

Birkholz, Peter; TU Dresden > Institute of Acoustics and Speech Communication

Remacle, Angélique; Université de Liège - ULiège > Département de Logopédie > Logopédie des troubles de la voix; Université de Liège - ULiège > Unités de recherche interfacultaires > Research Unit for a life-Course perspective on Health and Education (RUCHE)

Language :

English

Title :

Perceptual evaluation of the naturalness of broadband articulatory speech synthesis using a 1D versus a 3D acoustic model

Publication date :

May 2024

Event name :

13th International Seminar of Speech Production

Event place :

Autrans, France

Event date :

13 to 17 May 2024

Audience :

International

Main work title :

Proceedings of the 13th International Seminar of Speech Production

Publisher :

Fougeron Cécile and Perrier Pascal

Pages :

12-15

Peer review/Selection committee :

Peer reviewed

Additional URL :

https://www.isca-archive.org/issp_2024/blandin24_issp.html

Available on ORBi :

since 31 May 2024

Statistics

Number of views

143 (2 by ULiège)

Number of downloads

173 (1 by ULiège)
