Research on the Application of Multi-Instance Learning to Lesion Image Retrieval in Breast Mammography

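English title: The Application of Multi-instance Learning Method for Mass Retrieval in Digitized Mammograms
Author: Lu Pengfei (卢鹏飞)
Degree level: Master's
Discipline: Pattern Recognition and Intelligent Systems
Chinese keywords: 乳腺CAD ; 图像检索 ; 多示例学习 ; 计算机辅助诊断 ; 包生成器
English keywords: breast CAD ; image retrieval ; multi-instance learning ; computer-aided diagnosis ; bag generator
Degree year: 2012
Advisors: Li Lihua (厉力华) ; Liu Wei (刘伟)
Discipline code: 081104
Degree-granting institution: Hangzhou Dianzi University (杭州电子科技大学)
Thesis submission date: 2012-02-01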

Abstract

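Breast cancer is a malignant tumor that seriously threatens the lives and health of middle-aged women, and its incidence in China has been rising in recent years. Early detection, early diagnosis, and early treatment can effectively improve the cure rate of breast cancer and the survival rate of patients. Mammography (molybdenum-target X-ray imaging) has become the most commonly used clinical method for breast cancer detection. Studies show that computer-aided diagnosis (CAD) can effectively help physicians improve diagnostic efficiency, but mass detection in current CAD systems still faces many difficulties. In recent years, many mammography CAD systems have introduced content-based image retrieval (CBIR), and related studies show that CBIR can help physicians improve the accuracy of mass detection.
     In clinical diagnosis, a mass lesion in an image often presents a multi-semantic problem: a single lesion region usually contains both diseased tissue and normal breast tissue. The basic framework of CBIR is query-by-example (QBE). A QBE framework based on feature matching alone cannot adequately address the "semantic gap" in image retrieval, and usually has to be combined with (supervised) machine learning to improve retrieval precision. Because the suspicious lesion images submitted by physicians are ambiguous, traditional supervised learning is not the best choice for mass lesion retrieval. Multi-instance learning (MIL) is a new machine learning framework designed for this kind of ambiguity. Unlike supervised learning, the training set in MIL consists of bags that carry concept labels, while the instances inside the bags are unlabeled. A bag is labeled positive if at least one of its instances is positive; otherwise it is labeled negative. The learning algorithm learns a concept from the training set of labeled bags in order to predict the labels of new bags. When MIL is applied to CBIR, each image is treated as a bag and each segmented region as an instance in that bag. The learning algorithm then learns the concept the user is interested in from the training set and retrieves relevant images that contain similar concepts.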
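The bag-labeling rule stated above can be illustrated with a minimal sketch. It assumes the instance labels are known, which is only true for illustration: in MIL training, only the bag label is observed. The function name and the toy regions are hypothetical and not taken from the thesis.

```python
from typing import Sequence


def bag_label(instance_labels: Sequence[bool]) -> bool:
    """Standard MIL assumption: a bag is positive if at least one
    of its instances is positive, otherwise it is negative."""
    return any(instance_labels)


# Hypothetical example: a lesion image segmented into four regions.
# Only one region actually contains lesion tissue.
roi_with_mass = [False, True, False, False]
# A region of interest that contains only normal tissue.
roi_normal = [False, False, False, False]

print(bag_label(roi_with_mass))
print(bag_label(roi_normal))
```

The learner never sees the per-region flags, which is why a region containing both diseased and normal tissue can still be used as a single labeled training example.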
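     The aim of this thesis is to apply MIL to mass lesion retrieval in mammograms. In a mammogram retrieval system, the query lesion is usually ambiguous and hard to describe because it contains both diseased tissue and normal breast tissue. If the query lesion is treated as an image bag, MIL can be used to handle this ambiguity. This thesis proposes three bag generators and uses MIL algorithms for concept learning; the learned concept is then used for retrieval. Extensive experiments compare the retrieval performance of each bag under different MIL algorithms. The work consists of the following three parts.
     In the first part, three bag generators are proposed for mammographic mass lesion retrieval under the MIL framework: J-Bag, based on JSEG image segmentation; A-Bag, based on a computational model of visual attention; and K-Bag, based on image segmentation with a modified k-means clustering. Each lesion image is ultimately converted into a bag of four instances, each of which is a 4-dimensional feature vector. In the second part, the databases needed for the experiments are built: one is the DDSM database, and the other consists of lesion images collected at Zhejiang Cancer Hospital. In the third part, a number of positive and negative bags are randomly selected from the training data to form the training set; the lesion images are processed with a given bag generator to compute the bags, and the MIL algorithms DD, EM-DD, and BP-MIP are each used for learning. The learned concept is used to retrieve images from the test data set. The experiments compare the performance of the different bag generators and learning algorithms within the MIL framework, and also compare the three proposed bag generators with the SBN algorithm. The results indicate that MIL can be used for mammographic mass lesion image retrieval, that the proposed A-Bag and K-Bag achieve better retrieval performance than the classic SBN bag, and that EM-DD gives the best retrieval performance among the MIL algorithms used.
     Finally, the work of this thesis is summarized and several directions for improvement in future research are outlined.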
Breast cancer is one of the leading causes of death among middle-aged women. In China, the incidence of breast cancer shows persistently high growth. Early diagnosis and treatment can effectively increase the chances of survival for breast cancer patients. Mammography has become one of the most popular approaches for early detection of breast cancer in the current clinical environment. Studies show that computer-aided diagnosis (CAD) techniques can assist radiologists in detecting masses and micro-calcifications in mammograms, but the accuracy of mass detection with current CAD is still poor. Recently, content-based image retrieval (CBIR) techniques have been widely used in various CAD schemes. Relevant studies show that CBIR techniques can help clinicians improve mass detection precision.
     In clinical diagnosis, the benign or malignant lesion and the normal tissue are physically adjacent in a ROI. The classical framework for CBIR is query by example (QBE); however, a QBE framework based only on feature matching cannot solve the "semantic gap" problem well in image retrieval, and often needs to be combined with (supervised) machine learning approaches to improve retrieval precision. The query mass given by clinicians is often ambiguous and difficult to describe, which makes supervised learning approaches a less than ideal choice for the mass retrieval problem. Multi-Instance Learning (MIL) is a new machine learning framework for learning from this kind of ambiguity. Unlike supervised learning, the training set consists of bags and their labels; labels are attached only to bags of instances, not to the individual instances. A bag is labeled positive if at least one instance in that bag is positive; otherwise the bag is labeled negative. The goal of MIL is to predict the labels of new bags based on the labeled bags in the training set. When MIL is applied in CBIR systems, each image is treated as a labeled bag, and the segmented regions in the image correspond to the instances in that bag. The MIL algorithms are then used to learn the concept of interest to the user and to retrieve relevant images containing a similar concept.
     The objective of this thesis is to study the application of MIL techniques to the mass retrieval task. In a mammogram retrieval system, the query mass is ambiguous and difficult to describe because the lesion and the normal tissue are physically adjacent in it. If the query mass can be processed as an image bag, the ambiguity can be tackled by MIL techniques. In this thesis, we propose three image bag generators and use MIL algorithms to learn the target points and perform retrieval. An experimental study compares the retrieval performance of the three bag generators under different MIL algorithms; an existing bag generator called SBN is also compared with the three proposed bag generators. The thesis consists of three parts.
     In the first part, three image bag generators, named J-Bag, A-Bag, and K-Bag, are proposed. J-Bag is based on the JSEG image segmentation algorithm, A-Bag on a saliency-based bottom-up visual attention computational model, and K-Bag on a modified k-means clustering image segmentation algorithm. Each mass image is converted into a corresponding image bag consisting of four 4-dimensional feature vectors. In the second part, two mass databases are created: one from the DDSM database, and the other from images collected at Zhejiang Cancer Hospital in China. In the last part, during the training phase, several positive and several negative query examples are randomly selected for each mass type. A bag generator is then chosen to transform the mass images into image bags, and the target concept is learned by Diverse Density (DD), EM-DD, and BP-MIP, respectively. After the target concept has been learned, the remaining mass images in the test set are ranked by their distance to the learned concept. The experimental results show that MIL techniques can be applied to mammogram retrieval systems, that the proposed bag generators A-Bag and K-Bag achieve better retrieval results than the existing bag generator SBN, and that the EM-DD algorithm achieves the best retrieval performance.
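As a rough illustration of the ranking step described above, the sketch below assumes that the learned concept is a single point in the 4-dimensional feature space with per-feature scale weights (a convention common to Diverse Density style methods), and that a bag's distance to the concept is the distance of its closest instance. The function names, the toy bags, and all numeric values are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Sequence, Tuple

Instance = Sequence[float]
Bag = Sequence[Instance]


def bag_distance(bag: Bag, concept: Instance, scales: Instance) -> float:
    """Scaled squared Euclidean distance from the concept point to the
    closest instance in the bag (assumed bag-to-concept distance)."""
    return min(
        sum((s * (x - t)) ** 2 for x, t, s in zip(instance, concept, scales))
        for instance in bag
    )


def rank_bags(
    bags: Dict[str, Bag], concept: Instance, scales: Instance
) -> List[Tuple[str, float]]:
    """Return (name, distance) pairs sorted from most to least similar."""
    scored = [(name, bag_distance(bag, concept, scales)) for name, bag in bags.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical learned concept and feature scales (4 dimensions).
concept = [0.5, 0.5, 0.5, 0.5]
scales = [1.0, 1.0, 1.0, 1.0]

# Toy test-set bags: four instances each, four features per instance.
test_bags = {
    "roi_b": [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0],
              [0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]],
    "roi_a": [[0.5, 0.5, 0.5, 0.5], [0.0, 0.0, 0.0, 0.0],
              [1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 0.0, 1.0]],
}

ranking = rank_bags(test_bags, concept, scales)
for name, distance in ranking:
    print(name, distance)
```

Using the closest instance means one matching region is enough to rank an image highly, which mirrors the at-least-one-positive-instance rule used for bag labels.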
     Finally, the work is summarized and some areas needing improvement in future work are discussed.

