Abstract
In this dissertation, we investigate novel ways of modeling and optimizing spatiotemporal priors beyond their conventional scope for video analysis problems. We are among the first to model spatiotemporal priors for the purpose of propagating human-annotated pixel labels throughout video frames, which multiplies the training data size for joint segmentation and classification problems while retaining the same level of human labor. Our latest approach, based on hierarchical supervoxels that promote longer-range motion and spatiotemporal coherence, provides state-of-the-art performance compared to earlier versions built upon simple motion priors and spatially invariant appearance models. Furthermore, we demonstrate new ways to model and optimize spatiotemporal priors in a random field framework. Our Higher Order Proxy Neighbors algorithm demonstrates the plausibility of modeling higher-order spatial priors with traditional first-order neighborhoods, while our Video Graph-Shifts algorithm shows how to integrate the typically standalone motion estimation problem into a multi-class semantic pixel labeling framework. Finally, we show that, by extending the local smoothness assumptions to the image level (i.e., similar sub-images share the same parameter configurations for a given computer vision algorithm), we are able to build an efficient Parameter Inference Engine (PIE) for various computer vision problems. We develop an efficient SPEA2-LLP algorithm for multi-objective optimization to explore and learn the optimal parameter configurations for the sub-images, which allows each region to be processed with its own locally optimal parameter configuration. Our experiments show that PIE provides a significant boost to image binarization, segmentation, and classification algorithms, all of which are directly applicable to improving the performance of video analysis systems.