Abstract
This paper investigates spoken-language classification in audio broadcasting content. The approach reflects a real-world scenario encountered in modern media monitoring organizations, where semi-automated indexing/documentation workflows are deployed and could be facilitated by the proposed language-detection preprocessing. Multilingual audio recordings of specific radio streams are compiled into a small dataset, which is used for the adaptive classification experiments, without seeking, at this stage, a generic language recognition model. Specifically, hierarchical discrimination schemes are followed to separate voice signals before classifying the spoken languages. Supervised and unsupervised machine learning are applied at various windowing configurations to test the validity of our hypothesis. Besides the analysis of the achieved recognition scores (partial and overall), late-integration models are proposed for the semi-automatic annotation of new audio recordings. Hence, data augmentation mechanisms are offered, aiming at gradually building a Generic Audio Language Classification Repository. This database constitutes a program-adaptive collection that, besides supporting self-indexing metadata mechanisms, could facilitate generic language classification models in the future through state-of-the-art techniques such as deep learning. This approach matches the investigatory inception of the project, which seeks indicators that could be applied in a second step with a larger dataset and/or an already pre-trained model, with the purpose of delivering overall results.
Cited by
8 articles.