Linear Chain Conditional Random Fields are known to have time and space complexity quadratic in the cardinality of the output tag set. This poses a prohibitive performance penalty when the tag set is large, as in language applications where the language has a rich set of morphosyntactic tags. However, knowledge of the allowed tag bigram combinations can yield significant speedups and memory savings, often by several orders of magnitude, for both training and inference. This theoretical exposition shows how to exploit this knowledge by introducing steps and data structures into the ordinary Linear Chain Conditional Random Field implementation in order to achieve these savings.
Slavko Žitnik, Lovro Šubelj, Marko Bajec
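To illustrate the core idea, the following is a minimal sketch (not the paper's actual implementation) of a forward recursion that iterates only over allowed tag bigrams instead of all tag pairs, so each position costs time proportional to the number of allowed bigrams rather than the square of the tag set size. All names (`forward_log_partition`, the score dictionaries) are hypothetical, and scores are assumed to be in log space.

```python
import math

def forward_log_partition(emissions, transitions, allowed_bigrams):
    """Log partition function of a linear-chain CRF, computed sparsely.

    emissions: list (one entry per position) of dicts mapping tag -> log score.
    transitions: dict mapping (prev_tag, tag) -> log transition score.
    allowed_bigrams: dict mapping prev_tag -> iterable of allowed next tags.

    Each recursion step visits only the allowed (prev, next) pairs,
    giving O(positions * |allowed bigrams|) instead of O(positions * |tags|^2).
    """
    # Initialize forward scores (log alpha) at the first position.
    alpha = dict(emissions[0])
    for t in range(1, len(emissions)):
        new_alpha = {}
        for prev, prev_score in alpha.items():
            # Only iterate over successors permitted by the bigram constraints.
            for tag in allowed_bigrams.get(prev, ()):
                score = prev_score + transitions[(prev, tag)] + emissions[t][tag]
                if tag in new_alpha:
                    # Log-sum-exp accumulation of competing predecessors.
                    hi = max(new_alpha[tag], score)
                    new_alpha[tag] = hi + math.log(
                        math.exp(new_alpha[tag] - hi) + math.exp(score - hi)
                    )
                else:
                    new_alpha[tag] = score
        alpha = new_alpha
    # Final log-sum-exp over the surviving tags.
    vals = list(alpha.values())
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))
```

The same sparse iteration pattern applies to the backward recursion and to Viterbi decoding; only the accumulation operator changes (log-sum-exp versus max).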