Neural Network-Based Approach to Detect and Filter Misleading Audio Segments in Classroom Automatic Transcription
Published: 2023-12-14
Issue: 24
Volume: 13
Page: 13243
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Authors:
Jorge Hewstone (1); Roberto Araya (1, ORCID)
Affiliation:
1. Institute of Education, University of Chile, Periodista José Carrasco Tapia N° 75, Santiago 8380453, Chile
Abstract
Audio recording in classrooms is a common practice in educational research, with applications ranging from detecting classroom activities to analyzing student behavior. Previous research has employed neural networks for classroom activity detection and speaker role identification. However, these recordings are often affected by background noise that can hinder further analysis, and prior work has addressed this noise only with general-purpose filters rather than filters designed specifically for classrooms. Although high-end microphones and environmental monitoring can mitigate this problem, these solutions can be costly and potentially disruptive to the natural classroom environment. In this context, we propose a novel neural network model that detects and filters out problematic audio sections in classroom recordings. The model is particularly effective in reducing transcription errors, achieving up to a 96% success rate in filtering out segments that would otherwise lead to incorrect automated transcriptions. The novelty of our work lies in its targeted approach for low-budget, aurally complex environments such as classrooms, where multiple speakers are present. By allowing the use of lower-quality recordings without compromising analysis capability, our model facilitates data collection in natural educational settings and reduces dependency on expensive recording equipment. This advancement not only demonstrates the practical application of specialized neural network filters in challenging acoustic environments but also opens new avenues for enhancing audio analysis in educational research and beyond.
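The abstract describes a segment-level filter that screens classroom audio before it is passed to an automatic transcription system. A minimal sketch of that idea, assuming a small PyTorch binary classifier over per-segment audio features; the SegmentFilter class, feature size, and decision threshold below are illustrative assumptions, not the authors' implementation:

# Sketch: score each audio segment and keep only those predicted to transcribe reliably.
import torch
import torch.nn as nn

class SegmentFilter(nn.Module):
    # Hypothetical binary classifier over per-segment features
    # (e.g., summary statistics of a log-mel spectrogram).
    def __init__(self, n_features: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Probability that a segment is clean enough for automatic transcription.
        return torch.sigmoid(self.net(x))

def filter_segments(features: torch.Tensor, model: SegmentFilter, threshold: float = 0.5):
    # Return indices of segments whose predicted reliability exceeds the threshold;
    # the remaining segments would be discarded before transcription.
    with torch.no_grad():
        scores = model(features).squeeze(-1)
    return [i for i, s in enumerate(scores.tolist()) if s >= threshold]

if __name__ == "__main__":
    model = SegmentFilter()
    dummy = torch.randn(10, 128)  # 10 segments x 128 illustrative features
    print(filter_segments(dummy, model))

In practice the classifier would be trained on segments labeled by whether they produced correct or incorrect automated transcriptions, so that filtering happens before any transcription cost is incurred.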
Funder
Chilean National Agency for Research and Development
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science