Sequence labeling tasks, such as named entity recognition and part-of-speech tagging, are fundamental components of information extraction systems and have therefore received considerable attention in recent years. This paper proposes k-similar conditional random fields for semi-supervised sequence labeling, which use unlabeled data to compute the similarity between words through distributional clustering. Named entity recognition experiments show that the method improves performance by exploiting unlabeled data.
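To make the idea of word similarity from unlabeled data concrete, the following is a minimal sketch and not the paper's actual procedure. The abstract does not specify the clustering algorithm or how similar words enter the model, so the sketch assumes a simple stand-in: each word is represented by counts of its neighboring words in unlabeled text, and similarity is the cosine between those count vectors. The function names `context_vectors`, `cosine`, and `k_similar`, and the `window` parameter, are hypothetical.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=1):
    """Count (relative position, neighbor) features for every word.

    Assumption: a fixed-size window over tokenized, unlabeled sentences
    stands in for whatever distributional statistics the paper uses.
    """
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo = max(0, i - window)
            hi = min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][(j - i, sent[j])] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[feat] for feat, count in u.items() if feat in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def k_similar(word, vectors, k):
    """Return up to k words with the highest nonzero similarity to `word`."""
    if word not in vectors:
        return []
    target = vectors[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:k] if score > 0]


# Toy unlabeled corpus (hypothetical data, for illustration only).
unlabeled = [
    ["he", "visited", "paris", "yesterday"],
    ["she", "visited", "london", "yesterday"],
    ["he", "likes", "tea"],
]
vectors = context_vectors(unlabeled)
neighbors = k_similar("paris", vectors, k=1)
```

In this toy corpus, "paris" and "london" occur in identical contexts, so each is the other's nearest neighbor. In a semi-supervised tagger, such neighbor lists could let a rare word share evidence with frequent words that behave like it, though how the k-similar CRF does this is not stated in the abstract.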