Chat Generative Pre-trained Transformer (ChatGPT) is a chatbot built on a large language model that can hold interactive conversations with people. Users engage in dialogue with the model by submitting input sentences or prompts of their choosing. Over the past months, ChatGPT has grown steadily in popularity, reaching over one million users in a matter of days and surpassing one billion visits in less than 5 months. ChatGPT has become an important aid for many people, who use it for tasks such as generation, question answering, rewriting, or simple chatting. Each such task is expressed as an instruction contained in the user input sent to the model. Knowing the most common types of user instructions could help machine learning engineers improve current datasets and models and adapt them to better suit human needs. However, obtaining a large amount of annotated data is expensive and time-consuming. To address this issue, we investigate semi-supervised learning techniques. In this paper, we describe the creation of a new multi-label dataset for instruction classification in ChatGPT using user-shared conversations, and we employ several semi-supervised learning approaches to boost our model's performance. The unlabeled data used by the semi-supervised methods is extracted from the same source as our labeled dataset. This approach increased the weighted F1 score of the model by 3.5%.
Chengzhe Yuan, Zekai Zhou, Feiyi Tang, Ronghua Lin, Chengjie Mao, Luyao Teng
Tim Frommknecht, Pedro Alves Zipf, Quanfu Fan, Nina Shvetsova, Hilde Kuehne
Chao Deng, Maozu Guo, Yang Liu, Haifeng Li