Due to the COVID-19 pandemic, educational institutions have had to rely on e-learning tools, including in programming courses. Automatic graders can speed up the process of evaluating the correctness of student code. Unfortunately, answers to coding exercises can easily be plagiarized, and manually grading every student submission may not be feasible. A system that helps detect similar code is therefore needed. Detection can be done by grouping source code files according to their structure. This approach was used in previous research with an automatic K-means iterations algorithm, which produced decent clusters but had a long execution time. The purpose of this research is to improve both execution time and cluster quality by using the bisecting K-means algorithm. The results showed a significant improvement in execution time, from 11.68 seconds to 6.64 seconds. Bisecting K-means also produced fewer clusters, with a slightly better Rand Index than K-means iterations. We also conducted experiments using 2-grams to 6-grams and confirmed that 4-grams gave the best performance.
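The pipeline summarized above (structural n-gram features, bisecting K-means, Rand Index evaluation) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation, and every function name in it is hypothetical. Several details are assumptions not stated in the abstract: identifiers and numbers are collapsed to `ID`/`NUM` tokens so that renamed copies look identical, the keyword set is a small illustrative one, n-gram count vectors are L2-normalized, the cluster with the largest sum of squared errors (SSE) is split next, 2-means is seeded with a random point plus the point farthest from it, and splitting stops at a fixed target `k`.

```python
import math
import random
import re
from collections import Counter
from itertools import combinations

# Assumed, illustrative keyword set; a real tool would use the course language's full list.
KEYWORDS = {"def", "if", "else", "for", "in", "while", "return"}


def structural_tokens(src):
    """Hypothetical normalizer: keep keywords and punctuation, hide names and literals."""
    tokens = []
    for tok in re.findall(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S", src):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")   # identifiers are renamed away
        elif tok[0].isdigit():
            tokens.append("NUM")  # numeric literals
        else:
            tokens.append(tok)    # operators and punctuation
    return tokens


def ngram_vector(tokens, n=4):
    """Count token n-grams; n=4 is the value the abstract reports as best."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def to_matrix(counters):
    """Turn n-gram counters into L2-normalized dense vectors over a shared vocabulary."""
    vocab = sorted(set().union(*counters))
    rows = []
    for c in counters:
        row = [float(c[g]) for g in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return rows


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def sse(points):
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, rng, iters=20):
    """Plain 2-means (Lloyd's algorithm); returns index lists of the two halves."""
    first = rng.choice(points)
    centers = [first, max(points, key=lambda p: sq_dist(p, first))]
    for _ in range(iters):
        groups = ([], [])
        for idx, p in enumerate(points):
            side = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            groups[side].append(idx)
        if not groups[0] or not groups[1]:
            break  # degenerate split, rejected by the caller
        centers = [centroid([points[i] for i in g]) for g in groups]
    return groups


def bisecting_kmeans(X, k, trials=5, seed=0):
    """Repeatedly split the highest-SSE cluster with the best of several 2-means runs."""
    rng = random.Random(seed)
    clusters = [list(range(len(X)))]
    while len(clusters) < k:
        clusters.sort(key=lambda c: sse([X[i] for i in c]))
        target = clusters.pop()  # cluster with the largest SSE
        pts = [X[i] for i in target]
        best = None
        for _ in range(trials if len(target) >= 2 else 0):
            g0, g1 = two_means(pts, rng)
            if not g0 or not g1:
                continue  # e.g. all points in the cluster are identical
            halves = ([target[i] for i in g0], [target[i] for i in g1])
            score = sse([X[i] for i in halves[0]]) + sse([X[i] for i in halves[1]])
            if best is None or score < best[0]:
                best = (score, halves)
        if best is None:
            clusters.append(target)  # cannot split further; stop early
            break
        clusters.extend(best[1])
    labels = [0] * len(X)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs)
    return agree / len(pairs)


# Toy usage: submissions 1 and 3 are renamed copies of submissions 0 and 2.
submissions = [
    "total = 0\nfor i in range(n):\n    total = total + i\nprint(total)",
    "s = 0\nfor k in range(m):\n    s = s + k\nprint(s)",
    "def f(x):\n    if x > 1:\n        return x * f(x - 1)\n    return 1",
    "def g(y):\n    if y > 1:\n        return y * g(y - 1)\n    return 1",
]
vectors = to_matrix([ngram_vector(structural_tokens(s)) for s in submissions])
labels = bisecting_kmeans(vectors, k=2)
print(rand_index(labels, [0, 0, 1, 1]))  # 1.0
```

Because the tokenizer hides identifier names, renaming variables does not change a submission's 4-gram profile, so the renamed copies land in the same cluster. Picking the highest-SSE cluster is only one common bisecting strategy; splitting the largest cluster by size is another.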