Affiliation:
1. College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, China
2. Key Laboratory of Perception Technology and Intelligent System of Inner Mongolia Autonomous Region, Hohhot 010080, China
Abstract
Classroom interactivity is an important metric for assessing classroom teaching, but identifying it from classroom image data is hampered by the complexity of teaching scenes. Classroom audio, by contrast, carries clear signatures of teacher–student interaction. This study proposes a multi-scale audio spectrogram transformer (MAST) speech scene classification algorithm and constructs a classroom interaction audio dataset to recognize teacher–student interaction during classroom teaching. First, the original speech signal is sampled and pre-processed to generate a multi-channel spectrogram, which provides a richer feature representation than single-channel features. Second, to efficiently capture the long-range global context of the audio spectrogram, the audio features are globally modeled by the multi-head self-attention mechanism of MAST, and the feature resolution is reduced during feature extraction, which progressively enriches the hierarchical features while reducing model complexity. Finally, a time-frequency enrichment module maps the final output to a class feature map, enabling accurate audio category recognition. MAST is evaluated on public environmental audio datasets and on the self-built classroom audio interaction dataset. Compared with previous state-of-the-art methods on the public datasets AudioSet and ESC-50, accuracy improves by 3% and 5%, respectively, and accuracy on the self-built classroom audio interaction dataset reaches 92.1%. These results demonstrate the effectiveness of MAST for both general audio classification and the smart classroom domain.
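The first step described in the abstract, turning a raw signal into a multi-channel spectrogram, can be illustrated with a minimal sketch. The abstract does not say which channels MAST uses, so the choice here (a log-magnitude spectrogram stacked with its first and second time differences) is an assumption, as are the function names, the FFT size, and the hop length; this is not the authors' implementation.

```python
import numpy as np


def log_spectrogram(signal, n_fft=256, hop=128):
    """Log-magnitude STFT spectrogram, shape (freq_bins, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (frames, n_fft // 2 + 1)
    return np.log1p(magnitude).T


def multi_channel_spectrogram(signal, n_fft=256, hop=128):
    """Stack a spectrogram with its first and second time differences.

    The three-channel layout is an illustrative assumption, not the
    channel definition used in the paper.
    """
    spec = log_spectrogram(signal, n_fft, hop)
    delta = np.gradient(spec, axis=1)    # change along the time axis
    delta2 = np.gradient(delta, axis=1)  # change of that change
    return np.stack([spec, delta, delta2])  # (3, freq_bins, frames)


# One second of a synthetic 440 Hz tone at 16 kHz stands in for real audio.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
features = multi_channel_spectrogram(np.sin(2 * np.pi * 440 * t))
print(features.shape)
```

The resulting array has the same layout as a three-channel image, which is what lets a transformer that operates on image-like patches consume it directly.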
Funder
Inner Mongolia Natural Science Foundation Project
Basic Scientific Research Business Expense Project of Inner Mongolia Universities
Inner Mongolia Science and Technology Plan Project
Subject
Computer Networks and Communications
Cited by
3 articles.