Operation LiLi: Using Crowd-Sourced Data and Automatic Alignment to Investigate the Phonetics and Phonology of Less-Resourced Languages

Published: 2022-09-08 Issue: 3 Volume: 7 Page: 234
ISSN: 2226-471X
Container-title: Languages
Language: en
Short-container-title: Languages

Author:

Hutin Mathilde, Allassonnière-Tang Marc

Abstract

Less-resourced languages are usually left out of phonetic studies based on large corpora. We contribute to the recent efforts to fill this gap by assessing how to use open-access, crowd-sourced audio data from Lingua Libre for phonetic research. Lingua Libre is a participative linguistic library developed by Wikimedia France in 2015. It contains more than 670k recordings in approximately 150 languages across nearly 740 speakers. As a proof of concept, we consider the Inventory Size Hypothesis, which predicts that, in a given system, variation in the realization of each vowel will be inversely related to the number of vowel categories. We investigate data from 10 languages with various numbers of vowel categories, i.e., German, Afrikaans, French, Catalan, Italian, Romanian, Polish, Russian, Spanish, and Basque. Audio files are extracted from Lingua Libre to be aligned and segmented using the Munich Automatic Segmentation System. Information on the formants of the vowel segments is then extracted to measure how vowels expand in the acoustic space and whether this is correlated with the number of vowel categories in the language. The results provide valuable insight into the question of vowel dispersion and demonstrate the wealth of information that crowd-sourced data has to offer.
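As a rough illustration of the last analysis step described in the abstract, the sketch below scores how much vowel tokens spread in the acoustic space and relates that score to vowel inventory size. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not specify the dispersion measure, so the centroid-distance metric, the use of unnormalized F1/F2 values in Hz, and the names `mean_dispersion` and `pearson` are all hypothetical choices made here. The token values in the demo are invented for illustration only.

```python
from collections import defaultdict
from math import hypot, sqrt


def mean_dispersion(tokens):
    """Average within-category spread of vowel tokens in F1/F2 space.

    tokens: iterable of (vowel_label, f1_hz, f2_hz) tuples.
    For each vowel category, compute the mean Euclidean distance of its
    tokens from the category centroid, then average over categories.
    (Assumed metric; the source does not state which measure was used.)
    """
    by_vowel = defaultdict(list)
    for label, f1, f2 in tokens:
        by_vowel[label].append((f1, f2))

    per_vowel = []
    for points in by_vowel.values():
        c1 = sum(p[0] for p in points) / len(points)
        c2 = sum(p[1] for p in points) / len(points)
        spread = sum(hypot(p[0] - c1, p[1] - c2) for p in points) / len(points)
        per_vowel.append(spread)
    return sum(per_vowel) / len(per_vowel)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Invented tokens for one hypothetical language (not data from the paper).
    demo_tokens = [
        ("a", 700.0, 1200.0),
        ("a", 700.0, 1400.0),
        ("i", 300.0, 2200.0),
        ("i", 300.0, 2200.0),
    ]
    print(mean_dispersion(demo_tokens))
```

With one dispersion score per language, `pearson(inventory_sizes, dispersion_scores)` gives a single summary value; under the Inventory Size Hypothesis it would be expected to be negative. In practice formants would likely need speaker normalization before any cross-language comparison, which this sketch leaves out.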

Publisher

MDPI AG

Subject

Linguistics and Language, Language and Linguistics

Link

https://www.mdpi.com/2226-471X/7/3/234/pdf

References: 51 articles.

1. Voxcommunis: A corpus for cross-linguistic phonetic analysis; Ahn; Paper presented at the 12th International Conference on Language Resources and Evaluation Conference (LREC 2022)

2. Does vowel space size depend on language vowel inventories? Evidence from two Arabic dialects and French; Al-Tamimi; Paper presented at the Interspeech Eurospeech 2005

3. PraatR: An architecture for controlling the phonetics software “Praat” with the R programming language

4. Common voice: A massively-multilingual speech corpus; Ardila; arXiv, 2020

5. Languages with More Second Language Learners Tend to Lose Nominal Case

Cited by 1 article.

1. L’apport des données participatives pour l’étude linguistique des français du monde : le cas de l’opposition /a∼ɑ/; Journal of French Language Studies; 2023-08-24