Pritha Singha Roy, Vinay Kukreja
Abstract Rice leaf diseases are a major threat to rice production worldwide, reducing the yield, integrity, and nutritional value of the crop. Conventional diagnostic methods are time-consuming, costly, and often inaccessible to smallholder farmers, which creates a need for scalable and efficient alternatives. This research investigates the application of Vision Transformers (ViTs) to detect rice leaf diseases and estimate their severity, addressing limitations of conventional Convolutional Neural Network (CNN)-based models such as overfitting and computational inefficiency. The proposed model was trained on a custom dataset of 3,345 annotated rice leaf images covering 10 disease types and three severity levels (mild, moderate, and severe). The ViT model used multi-head self-attention and a shared backbone for both disease classification and severity estimation. Training used cross-entropy loss, the Adam optimizer (learning rate η = 0.001), and data augmentation (e.g., rotations, flips). Evaluation metrics were precision, recall, F1-score, and Area Under the Receiver Operating Characteristic Curve (AUC-ROC). For disease classification, the ViT model achieved a macro-averaged F1-score of 53.52% and a weighted-average F1-score of 54.17%, with Yellow Mottle performing best (F1 = 65.85%) and Rice Blast worst (F1 = 48.64%). Severity classification performed better, with a macro-averaged F1-score of 77.79% and a weighted-average F1-score of 77.94%; the mild class scored highest (81.70%). The model showed strong discriminative ability (AUC = 0.86).
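The abstract reports both macro-averaged and weighted-average F1-scores. The sketch below illustrates how the two averages differ, using a small made-up set of severity labels. The helper names (`per_class_f1`, `macro_f1`, `weighted_f1`) and the toy predictions are assumptions for illustration only; they are not the authors' evaluation code or data.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label that occurs in y_true."""
    support = Counter(y_true)
    scores = {}
    for label in sorted(support):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(scores):
    """Mean of per-class F1 weighted by the number of true samples per class."""
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total


# Toy labels (hypothetical, NOT taken from the paper's dataset).
y_true = ["mild"] * 4 + ["moderate"] * 2 + ["severe"] * 2
y_pred = ["mild", "mild", "mild", "moderate",
          "moderate", "severe", "severe", "severe"]

scores = per_class_f1(y_true, y_pred)
print("macro F1:   ", round(macro_f1(scores), 4))
print("weighted F1:", round(weighted_f1(scores), 4))
```

When the two averages are close, as in the figures reported above, per-class performance is not being strongly skewed by class frequency; a large gap would suggest that frequent classes score very differently from rare ones.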
Manan Bakshi, Priyal Chugh, V. Arulalan
Radhika Wadhawan, Mayyank Garg, Ashish Kumar Sahani
Saurabh Jain, Vipin Kumar, Manish Sharma, Naveen Dwivedi
Nikhil Sharma, Rahul Nijhawan, Karamjeet Singh