Motivated by the abundance of images labeled only by their captions, we construct tree-structured multiscale conditional random fields capable of performing semi-supervised learning. We show that such caption-only data can in fact increase pixel-level accuracy at test time. In addition, we compare two kinds of tree: the standard one with pair wise potentials, and one based on noisy-or potentials, which better matches the semantics of the recursive partitioning used to create the tree.
Feng JiaoShaojun WangChi‐Hoon LeeRussell GreinerDale Schuurmans
Xuming HeRichard S. ZemelMiguel Á. Carreira-Perpiñán