Research on Several Key Techniques and Applications of Visual Attention Mechanisms

Author: 单列
Thesis level: Doctoral
Discipline: Signal and Information Processing
Keywords: visual attention; visual feature extraction; data-driven attention mechanism; task-driven attention mechanism; object detection; image retrieval; MPEG-21
Degree year: 2008
Advisor: 刘政凯
Discipline code: 081002
Degree-granting institution: University of Science and Technology of China
Submission date: 2008-08-01

Abstract

About 80% of the external information that humans perceive comes from vision, which makes visual information a prominent research topic. Since the earliest days of computing, people have hoped that computers would one day understand the world through visual observation, as humans do, and adapt automatically to their environment. A large gap nevertheless remains between computer vision and human vision. To narrow it, researchers have long studied the human visual mechanism and proposed information processing methods modeled on the characteristics of human visual processing. The computational visual attention mechanism is one such image processing method that has emerged in recent years.
     This dissertation explores two questions: how to simulate human visual information processing more accurately and build a complete visual attention model for processing image data; and how to combine the visual attention mechanism with concrete image analysis and understanding tasks, optimizing the model for each visual task so that it is useful across several practical domains.
     The main work and contributions of this dissertation are as follows:
     1. The working principles of the visual attention mechanism are analyzed in depth, providing a theoretical foundation for introducing attention into image information processing.
     2. Established visual attention models are studied and implemented, and their key algorithms are extended and optimized. On this basis, a new region-of-interest extraction model and a multiple-salient-object detection model are proposed. Comparisons show that the new models extract salient regions more accurately and effectively, in a way consistent with human visual habits.
     3. The visual attention mechanism is applied to target search in complex backgrounds. The interference resistance and noise robustness of the attention model are analyzed through comparative examples. An object detection framework that fuses the attention mechanism with polarization information detection is proposed, and it gives good experimental results on man-made object detection.
     4. Building on the scale saliency model, the attention mechanism is introduced into image retrieval. A new building retrieval method based on scale saliency and the earth mover's distance (EMD) is proposed, and a complete building retrieval framework is developed from it.
     5. The application of visual attention in MPEG-21 is studied. Based on an in-depth study of digital item adaptation (DIA) in the MPEG-21 framework, the attention model is applied to adaptive digital item adaptation, and experimental results are given.
     In summary, this dissertation studies the key techniques of the visual attention mechanism and systematically designs and tests several of its applications in image information processing. The algorithms and models were applied to many types of real natural images and gave good experimental results. Visual attention is a promising research area: as the study of human vision advances, new ideas and techniques for attention mechanisms will continue to appear, and research on their applications will broaden. It is hoped that the limited work presented here contributes to the development of this technology.
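
As an illustration of the EMD distance measure named in contribution 4, the following is a minimal sketch and not the dissertation's implementation. The abstract does not say which features, signatures, or ground distance the retrieval method uses, so the sketch assumes the simplest case: two 1-D histograms over the same bins, with adjacent bins one unit apart. In that special case the EMD equals the sum of absolute differences between the two cumulative distributions. The function name `emd_1d` is hypothetical.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth mover's distance between two 1-D histograms on the same bins.

    Assumptions (not taken from the dissertation): both histograms are
    normalized to total mass 1, and adjacent bins are one unit apart.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive total mass")

    # Cumulative distributions of the normalized histograms.
    cdf_p = accumulate(x / total_p for x in p)
    cdf_q = accumulate(x / total_q for x in q)

    # In 1-D, the minimal transport cost is the L1 distance between the CDFs.
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


if __name__ == "__main__":
    # All mass moves two bins to the right.
    print(emd_1d([1, 0, 0], [0, 0, 1]))
```

Unlike a bin-by-bin histogram difference, this measure grows with how far mass has to move, which is the usual argument for EMD in image retrieval. Matching multi-dimensional signatures requires solving a transportation problem instead.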

