Rajeev Agrawal, William I. Grosky, Farshad Fotouhi, Changhua Wu
In this paper, we propose an approach to bridge the gap between low-level image features and the human interpretation of an image. Taking our cue from text-based retrieval techniques, we construct "visual keywords" by applying vector quantization to small image tiles. The visual and textual keywords are then combined to represent an image as a single multimodal vector. We use a diffusion kernel-based non-linear approach to fuse the visual and textual keywords. By comparing the performance of this approach with a low-level features-based approach, we demonstrate that visual keywords, when combined with textual keywords, improve image retrieval results significantly.
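The construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes tile features have already been extracted, uses a plain k-means loop as the vector quantizer (each centroid standing in for one "visual keyword"), and fuses the two modalities by simple concatenation rather than the diffusion kernel used in the paper. All function and parameter names are hypothetical.

```python
import numpy as np

def build_visual_keywords(tiles, k=8, iters=10, seed=0):
    """Vector-quantize tile feature vectors with a basic k-means loop.
    Each resulting centroid acts as one 'visual keyword'; the image is
    summarized by the normalized histogram of tile-to-keyword assignments."""
    rng = np.random.default_rng(seed)
    X = np.asarray(tiles, dtype=float)
    # initialize centroids from randomly chosen tiles
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each tile to its nearest centroid (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned tiles
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    hist = np.bincount(labels, minlength=k).astype(float)
    return centroids, hist / hist.sum()

def multimodal_vector(visual_hist, textual_weights):
    """Concatenate the visual-keyword histogram with a textual keyword
    vector (e.g. tf-idf weights) into one multimodal representation.
    (A stand-in for the paper's diffusion-kernel fusion.)"""
    return np.concatenate([visual_hist, textual_weights])
```

In this sketch, the histogram over visual keywords plays the same role as a term-frequency vector in text retrieval, which is what makes the concatenation with textual keyword weights natural.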