Implementation of an Automatic Meeting Minute Generation System Using YAMNet with Speaker Identification and Keyword Prompts
Published: 2024-06-29
Issue: 13
Volume: 14
Page: 5718
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Author:
Lu Ching-Ta 1, Wang Liang-Yu 2
Affiliation:
1. Department of Communications Engineering, Feng Chia University, Taichung City 407, Taiwan
2. Department of Information Communication, Asia University, Taichung City 413, Taiwan
Abstract
Producing conference/meeting minutes requires a person to identify each speaker and the speaking content simultaneously during the meeting. This recording process is a heavy task, so reducing the workload of taking minutes is valuable for most people. In addition, providing conference/meeting highlights in real time is helpful to the meeting process. In this study, we implement an automatic meeting minutes generation system (AMMGS) for recording conference/meeting minutes. A speech recognizer transcribes the speech signals to obtain the meeting text, so the proposed AMMGS reduces the effort of recording the minutes; all meeting members can concentrate on the meeting, and taking minutes manually becomes unnecessary. The AMMGS comprises speaker identification for Mandarin Chinese speakers, keyword spotting, and speech recognition. Transfer learning on YAMNet enables the network to identify the specified speakers, so the proposed AMMGS can automatically generate conference/meeting minutes with labeled speakers. Furthermore, the AMMGS applies the Jieba segmentation tool for keyword spotting: the system counts the occurrence frequency of the segmented words and selects the most frequent ones as keywords. These keywords help attendees stay on the agenda. The experimental results reveal that the proposed AMMGS accurately identifies speakers and recognizes speech; accordingly, it can generate conference/meeting minutes while spotting keywords effectively.
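
The abstract describes speaker identification by transfer learning on YAMNet. A minimal sketch of that idea follows, assuming the public TensorFlow Hub YAMNet model is used as a frozen embedding extractor with a small trainable classifier head; the number of speakers and the head architecture are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: transfer learning on YAMNet for speaker identification.
# YAMNet (frozen) produces 1024-d frame embeddings; a small Keras head
# is trained on those embeddings to classify the enrolled speakers.
import tensorflow as tf
import tensorflow_hub as hub

yamnet = hub.load("https://tfhub.dev/google/yamnet/1")  # frozen feature extractor

num_speakers = 4  # assumed number of enrolled meeting participants

# Trainable classification head on top of the YAMNet embeddings.
speaker_head = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1024,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(num_speakers, activation="softmax"),
])
speaker_head.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])


def embed(waveform_16k_mono: tf.Tensor) -> tf.Tensor:
    """Average YAMNet frame embeddings into one utterance-level vector.

    The waveform must be a mono, 16 kHz float32 tensor in [-1, 1].
    """
    _scores, embeddings, _spectrogram = yamnet(waveform_16k_mono)
    return tf.reduce_mean(embeddings, axis=0)

# Training would call speaker_head.fit() on embeddings of labelled utterances,
# then predict a speaker label for each segment of the meeting recording.
```

The abstract also states that keywords are spotted by segmenting the transcript with Jieba and selecting the most frequent words. A hedged sketch of that frequency-based approach is shown below; the function name, the length filter, and the sample sentence are illustrative assumptions.

```python
# Sketch: frequency-based keyword spotting on a Mandarin transcript with Jieba.
from collections import Counter

import jieba


def extract_keywords(transcript: str, top_k: int = 5) -> list[str]:
    """Return the top_k most frequent segmented words as candidate keywords."""
    words = jieba.lcut(transcript)  # segment the Mandarin text into words
    # Keep words of length >= 2 to drop particles and punctuation.
    counts = Counter(w for w in words if len(w.strip()) >= 2)
    return [word for word, _ in counts.most_common(top_k)]


if __name__ == "__main__":
    sample = "本次會議討論會議記錄系統的語音辨識與關鍵字擷取，語音辨識準確率良好。"
    print(extract_keywords(sample))
```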
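In practice the two sketches above would run in sequence on each speech segment: YAMNet-based speaker labeling tags who spoke, the speech recognizer supplies what was said, and the Jieba frequency count over the accumulated transcript surfaces the meeting keywords.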
Funder
National Science and Technology Council, Taiwan