Automated labeling of PDF mathematical exercises with word N-grams VSM classification

Published:2023-10-18 Issue:1 Volume:10
ISSN:2196-7091
Container-title:Smart Learning Environments
language:en
Short-container-title:Smart Learn. Environ.

Author:

Yamauchi Taisei, Flanagan Brendan, Nakamoto Ryosuke, Dai Yiling, Takami Kyosuke, Ogata Hiroaki

Abstract

In recent years, smart learning environments have become central to modern education, supporting students and instructors through tools based on prediction and recommendation models. These methods often rely on learning material metadata, such as the knowledge contained in an exercise, which is usually labeled by domain experts, a process that is costly and difficult to scale. Automated labeling can ease the workload on experts, as seen in previous studies that applied automatic classification algorithms to research papers and Japanese mathematical exercises. However, these studies did not address fine-grained labeling. In addition, as such systems become more widely used, paper materials are converted into PDF format, and text extraction from these PDFs can be incomplete. Previous research has placed little emphasis on labeling incomplete mathematical sentences. This study aims to achieve precise automated classification even from incomplete text inputs. To tackle these challenges, we propose a mathematical exercise labeling algorithm based on word n-grams that can handle detailed labels, even for incomplete sentences, and compare it with a state-of-the-art word embedding method. The experimental results show that mono-gram features with Random Forest models achieved the best performance, with macro F-measures of 92.50% and 61.28% for the 24-class and 297-class labeling tasks, respectively. The contribution of this research is to show that the proposed method, based on traditional simple n-grams, can find context-independent similarities in incomplete sentences and outperforms state-of-the-art word embedding methods in specific tasks such as classifying short and incomplete texts.
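As a rough illustration of the ideas named in the abstract, the following minimal Python sketch builds a word n-gram count representation (a simple vector space model) and computes a macro F-measure. It is not the authors' implementation: the function names `word_ngrams`, `vectorize`, and `macro_f1`, the whitespace tokenization, and the raw count weighting are all assumptions made for this example, and the Random Forest classification step is omitted.

```python
from collections import Counter


def word_ngrams(text, n=1):
    """Return the word n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vectorize(texts, n=1):
    """Map each text to a count vector over the shared n-gram vocabulary."""
    grams = [Counter(word_ngrams(t, n)) for t in texts]
    vocab = sorted(set().union(*grams))
    return vocab, [[g[term] for term in vocab] for g in grams]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro F-measure)."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances scores 0 (assumption).
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical inputs; the second text is deliberately cut off mid-sentence.
vocab, vectors = vectorize(["solve the quadratic equation",
                            "find the area of the"])
```

Because each n-gram is counted independently of its position, a truncated sentence still shares features with complete sentences that contain the same words. The resulting count vectors could then be passed to any classifier that accepts fixed-length numeric features.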

Funder

Japan Society for the Promotion of Science London

New Energy and Industrial Technology Development Organization

Publisher

Springer Science and Business Media LLC

Subject

Computer Science Applications,Education

Link

https://link.springer.com/content/pdf/10.1186/s40561-023-00271-9.pdf
