Detection of events and actions in video entails substantial processing of very large, even open-ended, video streams. Video data presents a unique challenge for the information retrieval community because video events are difficult to represent properly. We propose a novel approach to analyzing the temporal aspects of video data. We treat video as a sequence of images that forms a three-dimensional spatiotemporal structure, and apply multiview orthographic projection to transform it into two-dimensional representations. The projected views offer a distinctive way to represent video events and capture the temporal aspect of video data. We extract local salient points from the 2D projection views and apply a detection-via-similarity approach to a wide range of events in real-world surveillance data. We demonstrate that our example-based detection framework is competitive and robust. We also investigate retrieval driven by synthetic examples as a basis for query-by-example.
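To make the projection idea concrete, the following is a minimal sketch of collapsing a video volume along each of its three axes. It is an illustration under stated assumptions, not the method used in this work: the function name `orthographic_projections`, the grayscale `(T, H, W)` array layout, and the choice of mean or max as the reduction operator are all hypothetical, since the abstract does not specify how pixel values are combined along the projection direction.

```python
import numpy as np


def orthographic_projections(video, reduce=np.mean):
    """Project a grayscale video volume onto its three axis-aligned planes.

    video  : array of shape (T, H, W), frames stacked along the time axis.
    reduce : reduction applied along the projection direction
             (assumption: mean or max; the source does not name an operator).
    """
    video = np.asarray(video, dtype=float)
    if video.ndim != 3:
        raise ValueError("expected a (T, H, W) grayscale volume")
    return {
        "xy": reduce(video, axis=0),  # collapse time   -> shape (H, W)
        "xt": reduce(video, axis=1),  # collapse height -> shape (T, W)
        "yt": reduce(video, axis=2),  # collapse width  -> shape (T, H)
    }


# Toy example: one bright pixel moving right by one column per frame.
T, H, W = 5, 4, 6
video = np.zeros((T, H, W))
for t in range(T):
    video[t, 2, t] = 1.0

views = orthographic_projections(video, reduce=np.max)
```

In this toy case the `xt` view shows the motion as a diagonal line, so the speed and direction of the moving pixel become a static 2D pattern, while the `xy` view shows only the spatial footprint of the trajectory. Salient point extraction and similarity matching would then operate on such 2D views.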