Implementation of an Automatic Meeting Minute Generation System Using YAMNet with Speaker Identification and Keyword Prompts
Published: 2024-06-29
Issue: 13
Volume: 14
Page: 5718
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Author:
Lu Ching-Ta 1, Wang Liang-Yu 2
Affiliation:
1. Department of Communications Engineering, Feng Chia University, Taichung City 407, Taiwan
2. Department of Information Communication, Asia University, Taichung City 413, Taiwan
Abstract
Producing conference/meeting minutes requires a person to identify each speaker and the speaking content simultaneously during the meeting. This recording process is a heavy task, so reducing the workload of taking minutes is valuable for most people. In addition, providing conference/meeting highlights in real time is helpful to the meeting process. In this study, we implement an automatic meeting minutes generation system (AMMGS) for recording conference/meeting minutes. A speech recognizer transcribes the speech signals to obtain the meeting text, so the proposed AMMGS reduces the effort of recording the minutes; all meeting members can concentrate on the meeting, and taking minutes manually becomes unnecessary. The AMMGS comprises speaker identification for Mandarin Chinese speakers, keyword spotting, and speech recognition. Transfer learning on YAMNet enables the network to identify the specified speakers, so the proposed AMMGS can automatically generate conference/meeting minutes with labeled speakers. Furthermore, the AMMGS applies the Jieba segmentation tool for keyword spotting: the system counts the occurrence frequency of the segmented words and selects the most frequent ones as keywords. These keywords help attendees stay on the agenda. The experimental results reveal that the proposed AMMGS accurately identifies speakers and recognizes speech; accordingly, it can generate conference/meeting minutes while spotting keywords effectively.
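
The abstract describes speaker identification by transfer learning on YAMNet. A minimal sketch of that idea follows, assuming the public TensorFlow Hub YAMNet model is used as a frozen embedding extractor with a small trainable classifier head; the number of speakers and the head architecture are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: transfer learning on YAMNet for speaker identification.
# YAMNet (frozen) produces 1024-d frame embeddings; a small Keras head
# is trained on those embeddings to classify the enrolled speakers.
import tensorflow as tf
import tensorflow_hub as hub

yamnet = hub.load("https://tfhub.dev/google/yamnet/1")  # frozen feature extractor

num_speakers = 4  # assumed number of enrolled meeting participants

# Trainable classification head on top of the YAMNet embeddings.
speaker_head = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1024,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(num_speakers, activation="softmax"),
])
speaker_head.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])


def embed(waveform_16k_mono: tf.Tensor) -> tf.Tensor:
    """Average YAMNet frame embeddings into one utterance-level vector.

    The waveform must be a mono, 16 kHz float32 tensor in [-1, 1].
    """
    _scores, embeddings, _spectrogram = yamnet(waveform_16k_mono)
    return tf.reduce_mean(embeddings, axis=0)

# Training would call speaker_head.fit() on embeddings of labelled utterances,
# then predict a speaker label for each segment of the meeting recording.
```

The abstract also states that keywords are spotted by segmenting the transcript with Jieba and selecting the most frequent words. A hedged sketch of that frequency-based approach is shown below; the function name, the length filter, and the sample sentence are illustrative assumptions.

```python
# Sketch: frequency-based keyword spotting on a Mandarin transcript with Jieba.
from collections import Counter

import jieba


def extract_keywords(transcript: str, top_k: int = 5) -> list[str]:
    """Return the top_k most frequent segmented words as candidate keywords."""
    words = jieba.lcut(transcript)  # segment the Mandarin text into words
    # Keep words of length >= 2 to drop particles and punctuation.
    counts = Counter(w for w in words if len(w.strip()) >= 2)
    return [word for word, _ in counts.most_common(top_k)]


if __name__ == "__main__":
    sample = "本次會議討論會議記錄系統的語音辨識與關鍵字擷取，語音辨識準確率良好。"
    print(extract_keywords(sample))
```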
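In practice the two sketches above would run in sequence on each speech segment: YAMNet-based speaker labeling tags who spoke, the speech recognizer supplies what was said, and the Jieba frequency count over the accumulated transcript surfaces the meeting keywords.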
Funder
National Science and Technology Council, Taiwan