A treebank is an important resource for both research and applications in natural language processing. For Vietnamese, such corpora are still lacking. This paper presents up-to-date results of a project on Vietnamese treebank construction. Because Vietnamese is an isolating language and has no word delimiter, sentence analysis involves many ambiguities. We systematically applied a range of linguistic techniques to handle these ambiguities. Annotators are supported by automatic labeling tools and a tree editor. Raw texts are extracted from Youth, an online Vietnamese daily newspaper. The current annotation agreement is around 90 percent.