支持多层表示的海量视频快速检索及反馈学习
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
随着网络视频应用的普及和发展,海量视频检索需求强烈,其中的核心技术是海量高维向量的快速检索。如何快速的发现,检索并结合用户相关反馈信息处理海量视频数据成为行业和领域中亟待研究和解决的问题。
     本文针对视频信息内容的数据特点和实际应用中海量,高维,语义,快速的需求,研究了基于哈希的近似最近邻搜索,相关反馈学习,多媒体内容的多层表示,海量数据的分布式检索,视频拷贝检测问题,视频检索中的反馈学习等方面国内外科研的最新成果,分析了目前多层高维向量快速检索的关键问题与发展方向。本文主要的研究工作和创新点包括如下几个方面:
     1.提出了局部敏感哈希与对等网络结合的分布式海量高维向量检索方法。
     针对海量高维数据的快速检索问题,首先研究了近似最近邻搜索的局部敏感哈希算法索引数据分布特点以及基于分布式哈希表的对等网络协议,然后提出一种非均匀的Hilbert曲线算法,将索引数据从哈希桶标签空间映射到对等网络的节点命名空间。进一步提出一种有效的分布式海量高维数据快速检索计算框架,用来完成索引数据的和计算负载的分布式处理。实验结果表明在5000节点的网络规模,本文算法相比SHA-1算法减少了约40%处理路由跳数和约30%的参与节点数。
     2.研究了采用虚拟节点算法对分布式海量高维向量检索负载均衡的优化。
     针对海量高维数据分布式检索的负载均衡问题,研究了对等网络的动态负载均衡算法和索引结构的数据分布特点,设计了一种基于虚拟节点的动态调度算法。在物理节点与索引数据之间引入虚拟节点层,将数据关键字空间从映射到物理节点命名空间优化为映射到虚拟节点命名空间,并提出了两层网络结构中负载动态迁移调度的算法。通过在对等协议中附加负载信息数据结构和负载探测与迁移方法,保证分布式检索效率的同时优化了系统的负载均衡,在仿真平台OverSim的实验结果表明算法在付出13%额外网络通讯负载的同时负载均衡效果改善50%以上。
     3.提出了基于多层表示的重复视频检测算法。
     重复视频检测是多媒体应用的一个热点问题,本文研究了视频的多层表示,即通过镜头分割,关键帧提取,局部特征向量提取将视频内容的检索转化为海量高维向量检索问题。提出了基于多层表示的重复视频检测计算框架,设计了一种自适应局部敏感哈希算法,通过样本学习参数,预估哈希表中桶内数据的总数,突破了桶内距离计算的性能瓶颈,并提出了特征过滤和基于投票机制的两层匹配算法。在标准数据集MUSCLE-VCD上的实验表明本文的快速检测算法相比国内外最新算法,消耗3%存储负载的同时获得了3.6-5.1倍的速度提升。
     4.提出了视频检索中基于典型关联分析的反馈学习算法。
     基于内容的视频检索问题不同于重复视频检测问题,需要更多的语义信息来判定相似的概念,本文研究了目前相似视频检索的主要算法,指出了基于二进制编码和度量学习的算法无法增量式更新索引,依赖学习样本,检索过程维护代价高的问题。在视频内容多层表示基础上,提出了基于典型关联分析的反馈学习算法,利用用户的相关反馈,将视频级别的样本信息回溯到特征向量级别,通过典型关联分析学习集合于集合之间的最优关联矩阵,并修正检索结果。在TRECVID2008的评估数据集上的实验表明与其他算法相比在保证高准确率情况下,召回率平均提升16.7%。
With the rapid growth of digital video content production on the web, content-based video retrieval (CBVR) has been receiving increasing attention over the last decade. In the computer vision and machine learning community, many approaches focus on multimedia information indexing and retrieval techniques. Compared with individual images, videos have much richer content and therefore need a more complicated structure to describe, index and retrieve.
     This paper analyzes the data characteristics of video content and focuses on the four demands of practical applications including large-scale, high dimensions, semantics and fast retrieval. Some challenging tasks are also present such as hash-based appropriate nearest-neighbor search, relevance feedback learning, multi-level presentation of video content, distributed retrieval in massive datasets, video copy detection and semantic learning in content-based video retrieval. The main contributions are illustrated as follows.
     1. A scalable peer-to-peer indexing scheme for high dimensional nearest neighbor search in massive dataset using locality-sensitive hashing is proposed. LSH indexing structure and DHT network topology are introduced and a nun-uniform Hilbert Curve is present to map LSH bucket labels to Chord node rings. Also a framework of distributed indexing and retrieval is designed to accelerate the retrieval and expand the indexing from centralized settings to distributed scenarios. Experimental results show that in a Chord ring containing5000nodes, the proposed approach reduce the hops of retrieval by40%and the number of working nodes by30%.
     2. A load balancing and maintenance scheme in structured P2P network for locality-sensitive hashing is proposed. Based on the distributed similarity search system using LSH and load balancing schemes using virtual nodes in DHT, after discussing the particular load balancing problem with mapping the multi-dimensional LSH bucket space to the DHT naming space, a chord-based structure using virtual nodes to perform load balancing algorithms is present. Simulations on the OverSim platform demonstrate the effectiveness of the proposed method by50%load balancing gains with about13%additional communication cost.
     3. A computationally efficient algorithm for large scale near-duplicate video detection is proposed. Large scale video copy detection is very desirable for web video processing especially the computational efficiency is essential for practical applications. Based on multi-level video content analysis, local features are extracted from key frames of videos and indexed by an novel adaptive locality sensitive hashing scheme. By learning several parameters, fast retrieval in the new hashing structure is performed without high dimensional distance computations and achieves better real-time retrieving performance compared with state-of-the-art approaches. Then a descriptor filtering method and a two-level matching scheme are performed to generate a relevance score for detection. Experiments demonstrate the efficiency gains of the proposed algorithm.
     4. An incremental relevance feedback learning method for large scale content-based video retrieval is proposed. By leveraging local feature detectors feature descriptor are extracted from video collections. Then multi-level matching is performed after indexing and retrieval of feature vectors using the state-of-the-art techniques in content representation, similarity measure selection. A novel incremental relevance feedback approach based on canonical correlation analysis (CCA) is introduced to bridge the gap between semantic notions of search relevance and the low-level representation of video content. Experimental results on real world demonstrate the precision and recall gains of the proposed method.
引文
[1]De Berg M, Cheong O, Van Kreveld M. Computational geometry:algorithms and applications. Springer-Verlag New York Inc,2008.
    [2]Bentley J L. Multidimensional binary search trees used for associative searching. Communications of the ACM,1975,18(9):509-517.
    [3]Samet H. Foundations of multidimensional and metric data structures. Morgan Kaufmann, 2006.
    [4]Weber R, Schek H J, Blott S. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces,1998[C].
    [5]Indyk P, Motwani R. Approximate nearest neighbors:towards removing the curse of dimensionality:Proceedings of the thirtieth annual ACM symposium on Theory of computing, 1998[C].ACM.
    [6]Kushilevitz E, Ostrovsky R, Rabani Y. Efficient search for approximate nearest neighbor in high dimensional spaces:Proceedings of the thirtieth annual ACM symposium on Theory of computing,1998[C]. ACM.
    [7]Ailon N, Chazelle B. Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform:Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, 2006[C].ACM.
    [8]Gionis A, Indyk P, Motwani R. Similarity search in high dimensions via hashing: Proceedings of the International Conference on Very Large Data Bases,1999[C].
    [9]Datar M, Immorlica N, Indyk P, et al. Locality-sensitive hashing scheme based on p-stable distributions:Proceedings of the twentieth annual symposium on Computational geometry, 2004[C]. ACM.
    [10]Bawa M, Condie T, Ganesan P. LSH forest:self-tuning indexes for similarity search: Proceedings of the 14th international conference on World Wide Web,2005 [C]. ACM.
    [11]Lv Q, Josephson W, Wang Z, et al. Multi-probe LSH:efficient indexing for high-dimensional similarity search:Proceedings of the 33rd international conference on Very large data bases,2007[C]. VLDB Endowment.
    [12]Hauptmann A, Baron R V, Chen M, et al. Informedia at TRECVID 2003:Analyzing and searching broadcast news video.2004.
    [13]Liu Z, Liu T, Gibbon D C, et al. Effective and scalable video copy detection:Proceedings of the international conference on Multimedia information retrieval,2010[C]. ACM.
    [14]Shang L, Yang L,Wang F, et al. Real-time large scale near-duplicate web video retrieval: Proceedings of the international conference on Multimedia,2010[C]. ACM.
    [15]Yeh M C, Cheng K T.A compact, effective descriptor for video copy detection:Proceedings of the 17th ACM international conference on Multimedia,2009[C]. ACM.
    [16]Lowe D G. Distinctive image features from scale-invariant keypoints. International journal of computer vision,2004,60(2):91-110.
    [17]Ke Y, Sukthankar R. PCA-SIFT:A more distinctive representation for local image descriptors:Computer Vision and Pattern Recognition,2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on,2004[C]. Ieee.
    [18]Bay H, Tuytelaars T, Van Gool L. Surf:Speeded up robust features. Computer Visiont鈥揈 CCV 2006,2006:404-417.
    [19]Juan L, Gwun O. A comparison of sift, pca-sift and surf. International Journal of Image Processing,2009,3(4):143-152.
    [20]Sivic J, Zisserman A. Video Google:A text retrieval approach to object matching in videos: Computer Vision,2003. Proceedings. Ninth IEEE International Conference on,2003[C]. Ieee.
    [21]Sivic J, Zisserman A. Video Google:Efficient visual search of videos. Toward Category-Level Object Recognition,2006:127-144.
    [22]Pearson W R. Rapid and sensitive sequence comparison with FASTP and FASTA. Methods in enzymology,1990,183:63.
    [23]Tatusova T A, Madden T L. BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences. FEMS microbiology letters,1999,174(2):247-250.
    [24]Kent W J. BLAT:The BLAST-like alignment tool. Genome research,2002,12(4):656-664.
    [25]Katoh K, Misawa K, Kuma K, et al. MAFFT:a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic acids research,2002,30(14):3059-3066.
    [26]Yen M C, Cheng K T. Video copy detection by fast sequence matching:Proceedings of the ACM International Conference on Image and Video Retrieval,2009[C]. ACM.
    [27]Chiu C Y, Wang H M, Chen C S. Fast min-hashing indexing and robust spatio-temporal matching for detecting video copies. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP),2010,6(2):10.
    [28]Hu W, Xie N, Li L, et al. A survey on visual content-based video indexing and retrieval. Systems, Man, and Cybernetics, Part C:Applications and Reviews, IEEE Transactions on, 2011(99):1-23.
    [29]Manning C D, Raghavan P, Schutze H. Introduction to information retrieval.1. Cambridge University Press Cambridge,2008.
    [30]Raghavan H, Madani O, Jones R. Active learning with feedback on features and instances. The Journal of Machine Learning Research,2006,7:1655-1686.
    [31]Gorisse D, Cord M, Precioso F. Scalable active learning strategy for object category retrieval: Proceedings of the 17th IEEE International Conference on Image Processing (ICIP10),2010[C].
    [32]Chen M, Christel M, Hauptmann A, et al. Putting active learning into multimedia applications:dynamic definition and refinement of concept classifiers:Proceedings of the 13th annual ACM international conference on Multimedia,2005[C]. ACM.
    [33]Guttman A. R-trees:a dynamic index structure for spatial searching.14. ACM,1984.
    [34]Bentley J L. K-d trees for semidynamic point sets:Proceedings of the sixth annual symposium on Computational geometry,1990[C]. ACM.
    [35]Berchtold S, Bechm C, Kriegal H P. The pyramid-technique:towards breaking the curse of dimensionality:ACM SIGMOD Record,1998[C]. ACM.
    [36]Yu C, Ooi B C, Tan K L, et al. Indexing the distance:An efficient method to knn processing: Proceedings of the international conference on very large data bases,2001[C].
    [37]Falchi F, Gennaro C, Zezula P. A content-addressable network for similarity search in metric spaces:Proceedings of the 2005/2006 international conference on Databases, information systems, and peer-to-peer computing,2005 [C]. Springer-Verlag.
    [38]Ratnasamy S, Francis P, Handley M, et al. A scalable content-addressable network. ACM SIGCOMM Computer Communication Review,2001,31(4):161-172.
    [39]Banaei-Kashani F, Shahabi C. SWAM:a family of access methods for similarity-search in peer-to-peer data networks:Proceedings of the thirteenth ACM international conference on Information and knowledge management,2004 [C]. ACM.
    [40]. Zhang C, Krishnamurthy A, Wang R Y. Skipindex:Towards a scalable peer-to-peer index service for high dimensional data:Princeton Univ,2004[C].
    [41]Jagadish H V, Ooi B C, Vu Q H, et al. Vbi-tree:A peer-to-peer framework for supporting multi-dimensional indexing schemes:Data Engineering,2006. ICDE'06. Proceedings of the 22nd International Conference on,2006[C]. IEEE.
    [42]Sahin O, Emekeri F, Agrawal D, et al. Content-based similarity search over peer-to-peer systems. Databases, Information Systems, and Peer-to-Peer Computing,2005:61-78.
    [43]Haghani P, Michel S, Aberer K. Distributed similarity search in high dimensions using locality sensitive hashing:Proceedings of the 12th International Conference on Extending Database Technology:Advances in Database Technology,2009[C]. ACM.
    [44]Tao Y, Yi K, Sheng C, et al. Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Transactions on Database Systems (TODS),2010,35(3):20.
    [45]Andoni A. Nearest Neighbor Search:the Old, the New, and the Impossible.2009.
    [46]Dean J, Ghemawat S. MapReduce:Simplified data processing on large clusters. Communications of the ACM,2008,51(1):107-113.
    [47]Stoica I, Morris R, Karger D, et al. Chord:A scalable peer-to-peer lookup service for internet applications. ACM SIGCOMM Computer Communication Review,2001,31(4):149-160.
    [48]Stoica I, Morris R, Liben-Nowell D, et al. Chord:a scalable peer-to-peer lookup protocol for internet applications. Networking, IEEE/ACM Transactions on,2003,11(1):17-32.
    [49]Cates J. Robust and efficient data management for a distributed hash table.2003.
    [50]Sagan H. Space-filling curves.2. Springer-Verlag New York,1994.
    [51]Andoni A, Indyk P. E21sh:Exact euclidean locality-sensitive hashing. Implementation available at,2004.
    [52]Bolettieri P, Esuli A, Falchi F, et al. CoPhIR:a test collection for content-based image retrieval. Arxiv preprint arXiv:0905.4627,2009.
    [53]Karger D R, Ruhl M. Simple efficient load balancing algorithms for peer-to-peer systems: Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures,2004[C]. ACM.
    [54]Rao A, Lakshminarayanan K, Surana S, et al. Load balancing in structured P2P systems. Peer-to-Peer Systems Ⅱ,2003:68-79.
    [55]Dabek F. A cooperative file system.2001.
    [56]Rowstron A, Druschel P. Pastry:Scalable, decentralized object location, and routing for large-scale peer-to-peer systems:Middleware 2001,2001 [C]. Springer.
    [57]Zhao B Y, Kubiatowicz J, Joseph A D. Tapestry:An infrastructure for fault-tolerant wide-area location and routing.2001.
    [58]Baumgart I, Heep B, Krause S. OverSim:A flexible overlay network simulation framework: IEEE Global Internet Symposium,2007,2007[C]. IEEE.
    [59]Baumgart I, Heep B, Krause S. OverSim:A scalable and flexible overlay framework for simulation and real network applications:Peer-to-Peer Computing,2009. P2P'09. IEEE Ninth International Conference on,2009[C]. IEEE.
    [60]Caspi Y, Irani M. Spatio-temporal alignment of sequences. Pattern Analysis and Machine Intelligence, IEEE Transactions on,2002,24(11):1409-1424.
    [61]Kim C, Vasudev B. Spatiotemporal sequence matching for efficient video copy detection. Circuits and Systems for Video Technology, IEEE Transactions on,2005,15(1):127-132.
    [62]Poullot S, Buisson O, Crucianu M. Scaling content-based video copy detection to very large databases. Multimedia Tools and Applications,2010,47(2):279-306.
    [63]Yeo C, Zhu Y W, Sun Q, et al. A framework for sub-window shot detection:Multimedia Modelling Conference,2005. MMM 2005. Proceedings of the 1 lth International,2005 [C]. IEEE.
    [64]Ko K C, Cheon Y, Kim G Y, et al. Video Shot Boundary Detection Algorithm. Computer Vision, Graphics and Image Processing,2006:388-396.
    [65]Cernekova Z, Pitas I, Nikou C. Information theory-based shot cut/fade detection and video summarization. Circuits and Systems for Video Technology, IEEE Transactions on, 2006,16(l):82-91.
    [66]Camara-Chavez G, Precioso F, Cord M, et al. Shot boundary detection by a hierarchical supervised approach:Systems, Signals and Image Processing,2007 and 6th EURASIP Conference focused on Speech and Image Processing, Multimedia Communications and Services. 14th International Workshop on,2007[C]. IEEE.
    [67]Matsumoto K, Naito M, Hoashi K, et al. SVM-based shot boundary detection with a novel feature:Multimedia and Expo,2006 IEEE International Conference on,2006[C]. IEEE.
    [68]Detection S B. TRECVID 2007 by the Brno Group.
    [69]Lo C C, Wang S J. Video segmentation using a histogram-based fuzzy c-means clustering algorithm. Computer Standards & Interfaces,2001,23(5):429-438.
    [70]Sze K W, Lam K M, Qiu G. A new key frame representation for video segment retrieval. Circuits and Systems for Video Technology, IEEE Transactions on,2005,15(9):1148-1155.
    [71]Truong B T, Venkatesh S. Video abstraction:A systematic review and classification. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 2007,3(1):3.
    [72]Besiris D, Laskaris N, Fotopoulou F, et al. Key frame extraction in video sequences:a vantage points approach; Multimedia Signal Processing,2007. MMSP 2007. IEEE 9th Workshop on,2007[C]. IEEE.
    [73]Mukherjee D P, Das S K, Saha S. Key frame estimation in video using randomness measure of feature point pattern. Circuits and Systems for Video Technology, IEEE Transactions on, 2007,17(5):612-620.
    [74]Boussaid L, Mtibaa A, Abid M, et al. A real-time shot cut detector:Hardware implementation. Computer Standards & Interfaces,2007,29(3):335-342.
    [75]Yeh M C, Cheng K T. Fast Visual Retrieval Using Accelerated Sequence Matching. Multimedia, IEEE Transactions on,2010(99):1.
    [76]Shang L, Yang L, Wang F, et al. Real-time large scale near-duplicate web video retrieval: Proceedings of the international conference on Multimedia,2010[C]. ACM.
    [77]Xue-Feng P, Jin-Tao L I, Zhang Y D, et al. Spatiotemporal video copy detection based on visual perception analyses. Chinese Journal of Computers,2009,32(1):108-114.
    [781 Law-To J, Buisson O, Gouet-Brunet V, et al. Robust voting algorithm based on labels of behavior for video copy detection:Proceedings of the 14th annual ACM international conference on Multimedia,2006[C]. ACM.
    [79]Fischler M A, Bolles R C. Random sample consensus:a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 1981,24(6):381-395.
    [80]Avrithis Y, Tolias G, Kalantidis Y. Feature map hashing:sub-linear indexing of appearance and global geometry:Proceedings of the international conference on Multimedia,2010[C]. ACM.
    [81]Law-To J, Chen L, Joly A, et al. Video copy detection:a comparative study:Proceedings of the 6th ACM international conference on Image and video retrieval,2007[C]. ACM.
    [82]Andoni A, Indyk P. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions:Foundations of Computer Science,2006. FOCS'06.47th Annual IEEE Symposium on,2006[C]. Ieee.
    [83]Zhang Z, Cao C, Zhang R, et al. Video copy detection based on speeded up robust features and locality sensitive hashing:Automation and Logistics (ICAL),2010 IEEE International Conference on,2010[C]. IEEE.
    [84]Law-To J, Joly A, Boujemaa N. Muscle-VCD-2007:a live benchmark for video copy detection.2007.
    [85]Jolliffe I T, Myilibrary. Principal component analysis.2. Wiley Online Library,2002.
    [86]Gordo A, Perronnin F. Asymmetric distances for binary embeddings:Computer Vision and Pattern Recognition (CVPR),2011 IEEE Conference on,2011 [C]. IEEE.
    [87]Weiss Y, Torralba A, Fergus R. Spectral hashing,2008[C]. NIPS.
    [88]Kulis B, Grauman K. Kernelized locality-sensitive hashing for scalable image search: Computer Vision,2009 IEEE 12th International Conference on,2009[C]. Ieee.
    [89]Jegou H, Douze M, Schmid C. Product quantization for nearest neighbor search. Pattern Analysis and Machine Intelligence, IEEE Transactions on,2011,33(1):117-128.
    [90]Strecha C, Bronstein A, Bronstein M, et al. LDAHash:Improved matching with smaller descriptors. Pattern Analysis and Machine Intelligence, IEEE Transactions on,2010(99):1.
    [91]Gong Y, Lazebnik S. Iterative quantization:A procrustean approach to learning binary codes: Computer Vision and Pattern Recognition (CVPR),2011 IEEE Conference on,2011 [C]. IEEE.
    [92]Le T L, Boucher A, Thonnat M. AN INTERFACE FOR IMAGE RETRIEVAL AND ITS EXTENSION TO VIDEO RETRIEVAL.2006.
    [93]Chen L H, Chin K H, Liao H Y. An integrated approach to video retrieval:Proceedings of the nineteenth conference on Australasian database-Volume 75,2008[C]. Australian Computer Society, Inc..
    [94]Joachims T. Optimizing search engines using clickthrough data:Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining,2002[C]. ACM.
    [95]Ghosh H, Poornachander P, Mallik A, et al. Learning ontology for personalized video retrieval:Workshop on multimedia information retrieval on The many faces of multimedia semantics,2007[C]. ACM.
    [96]Hopfgartner F, Urban J, Villa R, et al. Simulated testing of an adaptive multimedia information retrieval system:Content-Based Multimedia Indexing,2007. CBMI'07. International Workshop on,2007[C]. IEEE.
    [97]Muneesawang P, Guan L. Automatic relevance feedback for video retrieval:Acoustics, Speech, and Signal Processing,2003. Proceedings.(ICASSP'03).2003 IEEE International Conference on,2003 [C]. IEEE.
    [98]Hauptmann A G, Christel M G, Yan R. Video retrieval based on semantic concepts. Proceedings of the IEEE,2008,96(4):602-622.
    [99]Chapelle O, Haffner P, Vapnik V N. Support vector machines for histogram-based image classification. Neural Networks, IEEE Transactions on,1999,10(5):1055-1064.
    [100]Gosselin P H, Cord M, Philipp-Foliguet S. Combining visual dictionary, kernel-based similarity and learning strategy for image category retrieval. Computer Vision and Image Understanding,2008,110(3):403-417.
    [101]Gorisse D, Cord M, Precioso F. Locality-Sensitive Hashing Scheme for Chi2 Distance. Pattern Analysis and Machine Intelligence, IEEE Transactions on,2012(99):1.
    [102]Hardoon D R, Szedmak S, Shawe-Taylor J. Canonical correlation analysis:An overview with application to learning methods. Neural Computation,2004,16(12):2639-2664.
    [103]Chaudhuri K, Kakade S M, Livescu K, et al. Multi-view clustering via canonical correlation analysis:Proceedings of the 26th annual international conference on machine learning,2009[C]. ACM.