Visual attention -- the brain's mechanism for selecting important visual information -- is influenced by a combination of bottom-up factors (sudden, unexpected visual events that are spatio-temporally different from their surroundings) and top-down (goal-relevant) factors. Although both are crucial for real-world applications such as robot navigation and visual surveillance, most existing models are either purely bottom-up or purely top-down. In this thesis, we present a new model that integrates top-down and bottom-up attention. We begin with a broad perspective on how a task specification (e.g., "who is doing what to whom") influences attention during scene understanding. We propose and partially implement a general-purpose architecture illustrating how different bottom-up and top-down components of visual processing -- such as the gist, the saliency map, object detection and recognition modules, working memory, long-term memory, and the task-relevance map -- may interact and interface with one another to guide attention to salient and relevant scene locations. Next, we investigate the specifics of how bottom-up and top-down influences may be integrated while searching for a target against a distracting background. We probe the granularity of information integration within feature dimensions such as color, size, and luminance. Results of our eye-tracking experiments show that bottom-up responses encoding feature dimensions can be modulated by not just one but several top-down gain-control signals, thus revealing a high granularity of integration. Finally, we investigate the computational principles underlying this integration. We derive a formal theory of the optimal integration of bottom-up salience with top-down knowledge about target and distractor features, such that the target's salience relative to the distractors is maximized, thereby accelerating search.
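As one way to make the optimal-integration idea concrete, the following is a minimal sketch rather than the thesis's exact derivation; it assumes that the salience of a location is a gain-weighted sum of bottom-up feature responses, with symbols $S$, $s_i$, $g_i$, and $\mathrm{SNR}$ introduced here only for illustration. Let $S(x) = \sum_i g_i\, s_i(x)$, where $s_i(x)$ is the bottom-up response of feature channel $i$ at location $x$ and $g_i \ge 0$ is its top-down gain. Define the signal-to-noise ratio of the target relative to the distractors as
\[
\mathrm{SNR}(g) \;=\; \frac{\mathbb{E}\big[S(\text{target})\big]}{\mathbb{E}\big[S(\text{distractor})\big]} \;=\; \frac{\sum_i g_i\, \mathbb{E}\!\left[s_i^{T}\right]}{\sum_i g_i\, \mathbb{E}\!\left[s_i^{D}\right]},
\]
and choose the gains to maximize $\mathrm{SNR}(g)$ subject to a normalization constraint such as $\sum_i g_i = n$. Under this simplified linear formulation, the optimum favors channels whose per-channel ratio $\mathbb{E}[s_i^{T}]/\mathbb{E}[s_i^{D}]$ is largest, i.e., gains are boosted on features that distinguish the target from the distractors and suppressed on features that do not.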