Halil İbrahim Okur, Ahmet Sertbaş
In text classification, a sub-task of NLP, the preprocessing and indexing of text have a direct, determining effect on model performance. A review of work on pre-trained models shows that existing studies either adapt models developed for other world languages or retrain the same architectures on Turkish text datasets. Word embedding is considered the most critical step in the text processing problem. The two most popular word embedding methods today are Word2Vec and GloVe, which map the words of a corpus to multidimensional vectors. BERT and ELECTRA, which learn contextual word representations with deep neural network architectures, and FastText, which builds on subword-based embeddings, have recently been used frequently to create pre-trained models. This study presents the use and performance results of pre-trained models on the TTC-3600 and TRT-Haber text sets prepared for the Turkish text classification task. By reusing pre-trained models obtained from large corpora at a one-time training and hardware cost, text classification can be performed with less effort and high performance.
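As a minimal sketch of the approach the abstract describes (not the authors' exact pipeline), the following shows how a pre-trained Turkish BERT can be loaded with a classification head via the Hugging Face transformers API. The checkpoint name "dbmdz/bert-base-turkish-cased", the six-label setup (TTC-3600 uses six news categories), and the sample sentence are illustrative assumptions; the head would still need fine-tuning on the target dataset before its predictions are meaningful.

```python
# Sketch: loading a pre-trained Turkish BERT for sequence classification.
# Checkpoint and label count are assumptions for illustration, not the
# paper's confirmed configuration.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "dbmdz/bert-base-turkish-cased"  # assumed checkpoint (BERTurk)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=6,  # TTC-3600 has six news categories
)

# Tokenize a sample Turkish news sentence and run a forward pass.
inputs = tokenizer(
    "Galatasaray deplasmanda 2-0 kazandı.",  # hypothetical sports headline
    return_tensors="pt",
    truncation=True,
    max_length=128,
)
with torch.no_grad():
    logits = model(**inputs).logits

# Index of the highest-scoring class (untrained head: random until fine-tuned).
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)
```

The classification head added on top of the pre-trained encoder is initialized randomly, which is exactly why the abstract's point holds: the expensive part (the language-model weights) is reused, and only the lightweight head plus a short fine-tuning run on TTC-3600 or TRT-Haber is needed.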