Abstract
The large scale of modern university programming courses increases the workload of teaching staff. The Digital Teaching Assistant (DTA) system addresses this issue by automating the generation and checking of unique programming exercises, and it provides means for analyzing the programs received from students by the end of the semester. In this paper, we propose a machine learning-based approach to the classification of student programs represented as Markov chains. The proposed approach enables real-time analysis of student submissions in the DTA system. We compare the performance of several multi-class classification algorithms: support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), and extreme learning machine (ELM). ELM is a learning scheme for single-hidden-layer feedforward networks (SLFNs) that drastically speeds up SLFN training. This is achieved by randomly initializing the weights of the connections between input and hidden neurons and explicitly computing the weights of the connections between hidden and output neurons. The experimental results show that ELM is the most computationally efficient of the considered algorithms. In addition, we apply biology-inspired algorithms to fine-tune the ELM input weights in order to further improve the generalization capabilities of this algorithm. The obtained results show that ELMs fine-tuned with biology-inspired algorithms achieve the best accuracy on test data in most of the considered problems.
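The two-step ELM training scheme described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class name `ELMClassifier`, the sigmoid activation, the uniform initialization range, and the use of the Moore-Penrose pseudoinverse to obtain the output weights are all assumptions made for illustration.

```python
import numpy as np


class ELMClassifier:
    """Hypothetical minimal extreme learning machine for multi-class problems."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer output matrix H (sigmoid activation is an assumption).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.classes_, y_idx = np.unique(y, return_inverse=True)
        T = np.eye(len(self.classes_))[y_idx]  # one-hot target matrix

        # Step 1: input-to-hidden weights and biases are drawn at random
        # and are not trained afterwards.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)

        # Step 2: hidden-to-output weights are computed explicitly as the
        # least-squares solution of H @ beta = T (assumed: pseudoinverse).
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

Because the only fitted quantity is `beta`, obtained from a single linear least-squares solve, training involves no iterative gradient descent, which is consistent with the speed-up the abstract attributes to ELM. Fine-tuning the input weights, as the abstract proposes, would amount to searching over `W` and `b` instead of leaving them at their random initial values.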
Subject
Computational Mathematics, Computational Theory and Mathematics, Numerical Analysis, Theoretical Computer Science