Word segmentation from transcriptions of child-directed speech using lexical and sub-lexical cues

Published: 2023-09-12; Pages: 1-41
ISSN: 0305-0009
Journal: Journal of Child Language
Abbreviated title: J. Child Lang.
Language: English

Authors:

Zébulon Goriely, Andrew Caines, Paula Buttery

Abstract

We compare two frameworks for the segmentation of words in child-directed speech, PHOCUS and MULTICUE, which represent differing views of speech processing: PHOCUS is driven by lexical recognition, whereas MULTICUE combines sub-lexical properties to make boundary decisions. We replicate both frameworks, perform novel benchmarking, and confirm that both achieve competitive results. We then develop a new segmentation framework, the DYnamic Programming MULTIple-cue framework (DYMULTI), which combines the strengths of PHOCUS and MULTICUE by considering both sub-lexical and lexical cues when making boundary decisions. DYMULTI achieves state-of-the-art results and outperforms PHOCUS and MULTICUE on 15 of 26 languages in a cross-lingual experiment. Because DYMULTI is built on psycholinguistic principles, these results validate it as a robust model for speech segmentation and as a contribution to the understanding of language acquisition.
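The abstract does not give DYMULTI's cost functions, but the general idea of combining a lexical cue and a sub-lexical cue inside one dynamic-programming search can be illustrated. The following is a minimal sketch under stated assumptions and is not the authors' implementation. The lexical cost is assumed to be the negative log relative frequency of a word in a known lexicon, with a flat per-phoneme penalty for unseen strings. A single forward transitional-probability cue stands in for sub-lexical evidence, under which a boundary between highly predictable adjacent phonemes is penalised; the cues MULTICUE actually combines are not listed here. The names `transitional_probs`, `segment`, `boundary_weight` and `unseen_cost` are hypothetical, and characters stand in for phonemes.

```python
import math
from collections import Counter


def transitional_probs(utterances):
    """Forward transitional probability P(b | a) for each adjacent symbol pair (a, b)."""
    pair_counts = Counter()
    first_counts = Counter()
    for utt in utterances:
        for a, b in zip(utt, utt[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(utt, lexicon, tp, boundary_weight=1.0, unseen_cost=10.0):
    """Minimum-cost segmentation of `utt` by dynamic programming (hypothetical cost model).

    Each candidate word pays a lexical cost (negative log relative frequency if it is in
    `lexicon`, else `unseen_cost` per symbol) plus a sub-lexical cost for the boundary
    placed after it, proportional to the transitional probability across that boundary.
    """
    total = sum(lexicon.values())
    n = len(utt)
    best = [0.0] + [math.inf] * n  # best[i]: cost of the best segmentation of utt[:i]
    back = [0] * (n + 1)  # back[i]: start index of the last word in that segmentation
    for end in range(1, n + 1):
        # No boundary cost at the end of the utterance; otherwise penalise predictable pairs.
        sub_cost = boundary_weight * tp.get((utt[end - 1], utt[end]), 0.0) if end < n else 0.0
        for start in range(end):
            word = utt[start:end]
            if word in lexicon:
                lex_cost = -math.log(lexicon[word] / total)
            else:
                lex_cost = unseen_cost * len(word)
            cost = best[start] + lex_cost + sub_cost
            if cost < best[end]:
                best[end] = cost
                back[end] = start
    # Follow back-pointers from the end of the utterance to recover the words.
    words = []
    end = n
    while end > 0:
        words.append(utt[back[end]:end])
        end = back[end]
    return words[::-1]


lexicon = Counter({"the": 5, "dog": 3, "dogs": 1, "a": 2})
tp = transitional_probs(["thedog", "adog", "thedogs"])
print(segment("thedog", lexicon, tp))  # ['the', 'dog']
print(segment("adogs", lexicon, tp))  # ['a', 'dogs']
```

In this toy setting the lexicon is fixed in advance; a model of acquisition would instead have to build its lexicon incrementally from its own earlier segmentations, and `boundary_weight` controls how strongly the sub-lexical cue can override lexical evidence.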

Funder

Cambridge Trust

Publisher

Cambridge University Press (CUP)

Subject

General Psychology, Linguistics and Language, Developmental and Educational Psychology, Experimental and Cognitive Psychology, Language and Linguistics
