To keep online spaces clean, objectionable content such as offensive text must be identified. However, some offensive text is phrased subtly, which makes it hard to detect from its literal, surface-level features alone. To improve the detection of offensive Chinese text, we propose a method based on multi-feature fusion. First, we combine the word vectors obtained from WoBERT with the character vectors obtained from ALBERT, using an attention mechanism to assign greater weight to key features within the word vectors. Next, we merge this fusion vector with the sentence vector generated by ALBERT, which carries contextual semantic and syntactic information. The result is a new fusion vector that captures information at the character, word, and sentence levels. Finally, a fully connected layer processes the three-level fusion vector to produce the detection result. Experimental results show that fusing information from multiple levels gives a more comprehensive characterization of offensive text and substantially improves detection performance for offensive Chinese text.
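The three-level fusion described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: random arrays stand in for the WoBERT word vectors and the ALBERT character and sentence vectors, and the dot-product attention, mean pooling of character vectors, concatenation as the fusion operator, the dimensions, and the names `attention_pool`, `fuse_and_classify`, and `params` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(vectors, query):
    """Pool rows of `vectors` using softmax weights scored against `query`.

    Assumption: scaled dot-product scoring with a learned query vector;
    the abstract does not specify the attention form.
    """
    scores = vectors @ query / np.sqrt(vectors.shape[1])  # shape (n,)
    weights = softmax(scores)                             # shape (n,)
    return weights @ vectors, weights                     # shape (d,), (n,)

def fuse_and_classify(char_vecs, word_vecs, sent_vec, params):
    # Word level: attention emphasises key word features.
    word_feat, attn = attention_pool(word_vecs, params["query"])
    # Character level: mean pooling (assumed; not stated in the abstract).
    char_feat = char_vecs.mean(axis=0)
    # First fusion: character + word (concatenation is an assumption).
    char_word = np.concatenate([char_feat, word_feat])
    # Second fusion: add the sentence-level vector.
    fused = np.concatenate([char_word, sent_vec])
    # Fully connected layer followed by softmax over the classes.
    logits = params["W"] @ fused + params["b"]
    return softmax(logits), attn

# Hypothetical sizes; real encoder outputs would be much larger.
d_char, d_word, d_sent, n_classes = 8, 8, 8, 2
params = {
    "query": rng.normal(size=d_word),
    "W": rng.normal(size=(n_classes, d_char + d_word + d_sent)),
    "b": np.zeros(n_classes),
}

char_vecs = rng.normal(size=(6, d_char))  # stand-in for ALBERT character vectors
word_vecs = rng.normal(size=(3, d_word))  # stand-in for WoBERT word vectors
sent_vec = rng.normal(size=d_sent)        # stand-in for ALBERT sentence vector

probs, attn = fuse_and_classify(char_vecs, word_vecs, sent_vec, params)
print("class probabilities:", probs)
print("word attention weights:", attn)
```

In a trained model the query vector and the fully connected weights would be learned jointly with, or on top of, the pretrained encoders; here they are random, so the printed probabilities carry no meaning beyond showing the data flow.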
Tengda Guo, Lianxin Lin, Hang Liu, Chengping Zheng, Zhijian Tu, Haizhou Wang
Bing Xiao, Jing Zhao, Cong Zhao, Junliang Ma
Guixian Xu, Yueting Meng, Xiaokai Zhou, Ziheng Yu, Xu Wu, Lijun Zhang
LIAN Zhe, YIN Yanjun, ZHI Min, XU Qiaozhi