Abstract
Vocalizations are highly specialized motor gestures that regulate social interactions. The reliable detection of vocalizations in raw streams of microphone data remains an open problem, even in research on widely studied animals such as the zebra finch. A promising method for finding vocal samples from potentially few labelled examples (templates) is nearest neighbor retrieval, but this method has never been extensively tested on vocal segmentation tasks. We retrieve zebra finch vocalizations as neighbors of each other in the sound spectrogram space. Based on merely 50 templates, we find excellent retrieval performance in adults (F1 score of 0.93±0.07) but not in juveniles (F1 score of 0.64±0.18), presumably due to the larger vocal variability of the latter. The performance in juveniles improves when retrieval is based on fixed-size template slices (F1 score of 0.72±0.10) instead of entire templates. Among the distance metrics we tested, which include the cosine and the Euclidean distance, the Spearman distance outperforms all others by a wide margin. We release our expert-curated dataset of more than 50,000 zebra finch vocal segments, which will enable the training of data-hungry machine-learning approaches.
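To make the retrieval idea concrete, the following is a minimal sketch of nearest neighbor retrieval with a Spearman distance on fixed-size spectrogram slices. It is an illustration under stated assumptions, not the authors' implementation: the function names (`rank_transform`, `spearman_distance`, `retrieve`), the frequency-by-time array layout, the equal-width templates, and the distance threshold are all hypothetical choices made for this example.

```python
import numpy as np

def rank_transform(x):
    """Replace each value by its rank (0 = smallest).
    Simplification: ties are broken by position, not averaged."""
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    return ranks

def spearman_distance(a, b):
    """Spearman distance: 1 minus the Pearson correlation of the ranks."""
    ra = rank_transform(a.ravel())
    rb = rank_transform(b.ravel())
    ra -= ra.mean()
    rb -= rb.mean()
    denom = np.sqrt((ra ** 2).sum() * (rb ** 2).sum())
    return 1.0 - float((ra * rb).sum() / denom)

def retrieve(spectrogram, templates, threshold):
    """Slide a window over a (frequency x time) spectrogram and keep every
    onset whose nearest template lies within `threshold`.
    Assumes all templates share the same width (fixed-size slices)."""
    width = templates[0].shape[1]
    n_frames = spectrogram.shape[1]
    hits = []
    for t in range(n_frames - width + 1):
        window = spectrogram[:, t:t + width]
        d = min(spearman_distance(window, tpl) for tpl in templates)
        if d <= threshold:
            hits.append((t, d))
    return hits

# Toy usage: a random "spectrogram" with one template cut out of it.
rng = np.random.default_rng(0)
spec = rng.random((8, 40))
template = spec[:, 10:15].copy()
hits = retrieve(spec, [template], threshold=0.05)
```

Because the Spearman distance depends only on the rank order of the spectrogram values, it is unaffected by any monotonic rescaling of amplitude. That property may be one reason a rank-based metric is attractive for this task, although the abstract itself does not state why it performs best.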
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.