Medical Care Quality Oriented Association Rule Mining on Medical Record Homepage (面向医疗质量的病案首页数据关联规则挖掘)

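English title: Medical Care Quality Oriented Association Rule Mining on Medical Record Homepage
Author: Zhang Yunyang (张云洋)
Degree level: Master's
Discipline: Computer Application Technology
Chinese keywords: 病案首页 ; 医疗质量 ; 数据挖掘 ; 关联规则 ; Mypriori
English keywords: medical record homepage ; medical care quality ; data mining ; association rule ; Mypriori
Degree year: 2009
Advisor: Zhao Zheng (赵政)
Discipline code: 081203
Degree-granting institution: Tianjin University (天津大学)
Submission date: 2009-05-01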

Abstract

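Medical quality is the lifeblood of a hospital and the perennial theme of hospital management. Since the 1990s, hospitals at all levels in China have gradually adopted a relatively uniform medical record homepage to summarize each inpatient's basic information and course of diagnosis and treatment. Once these homepages were managed by computer, large volumes of homepage data accumulated. Quality indicators extracted from these data are currently the main basis for evaluating medical quality, but indicators obtained by statistics alone cannot reveal deeper medical quality problems.
Given the large volume of medical record homepage data and the need to analyze correlations among medical indicators, this thesis proposes studying medical quality with association rule mining. The main work comprises:
(1) Algorithm improvement. Because medical record homepage data are single-level and multi-dimensional, this thesis proposes Mypriori, an association rule mining algorithm for medical record homepage data based on the classic Apriori algorithm. Mypriori modifies both the frequent itemset discovery and the rule derivation of Apriori. It adds an item-value filtering mechanism when building candidate sets; it adds ordered rightward extension of itemsets and a transaction compression strategy to the frequent itemset iterations; and, during rule generation, it adds constraint-based pruning with prior knowledge and prunes on confidence and lift together. These changes reduce redundancy and improve efficiency.
(2) Data preprocessing. Because medical record homepage data involve personal privacy, the data are first privacy-filtered to protect the rights of the hospital and its patients. The data are then prepared for mining through data cleaning, field extraction, attribute transformation, and discretization. Discretization is the key step: all field attributes are converted to categorical attributes, which includes segmenting continuous numeric attributes using domain knowledge and rolling up multi-level, multi-valued attributes into a small number of categories.
(3) Medical record homepage data mining. Following the basic process of medical data mining and using the Mypriori program, this thesis takes four mining targets: "analysis of diseases related to death and surgery", "analysis of records with a discharge status of death", "analysis of records with a discharge status of suspected death", and "analysis of suspected unplanned returns to the operating room". It mines four years (2004-2007) of medical record homepage data from a hospital in Tianjin and discovers association rules of high interestingness that reveal some of the hospital's medical quality problems.
The experimental results show that the improved algorithm Mypriori can effectively capture significant frequent patterns hidden in medical record homepage data. Compared with Weka's Apriori association rule mining, Mypriori shows good robustness and efficiency on these data.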
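
The discretization step described in the abstract can be illustrated with a short Python sketch. This is not the thesis's code. The field names (`age`, `stay_days`, `diagnosis_code`, `outcome`), the cut points, and the grouping of diagnosis codes by leading letter are all assumptions chosen for illustration, since the abstract does not give the actual bins or hierarchy.

```python
from bisect import bisect_right

# Illustrative cut points and labels only; the thesis does not list its bins.
AGE_CUTS = [18, 45, 65]
AGE_LABELS = ["0-17", "18-44", "45-64", "65+"]
STAY_CUTS = [4, 8, 15]
STAY_LABELS = ["1-3d", "4-7d", "8-14d", "15d+"]

# Hypothetical roll-up table: leading letter of a diagnosis code -> coarse group.
DIAGNOSIS_GROUPS = {"I": "circulatory", "J": "respiratory"}


def bin_value(value, cuts, labels):
    """Return the label of the half-open interval [cut_i, cut_{i+1}) holding value."""
    return labels[bisect_right(cuts, value)]


def roll_up(code, groups, default="other"):
    """Replace a fine-grained code with its coarse group (or a default bucket)."""
    return groups.get(code[:1].upper(), default)


def discretize(record):
    """Turn one raw record into a record whose attributes are all categorical."""
    return {
        "age": bin_value(record["age"], AGE_CUTS, AGE_LABELS),
        "stay": bin_value(record["stay_days"], STAY_CUTS, STAY_LABELS),
        "diagnosis": roll_up(record["diagnosis_code"], DIAGNOSIS_GROUPS),
        "outcome": record["outcome"],
    }


raw = {"age": 72, "stay_days": 9, "diagnosis_code": "I21.0", "outcome": "died"}
print(discretize(raw))
```

After this step every attribute takes one of a few nominal values, so each record can be treated as a set of attribute=value items for association rule mining.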
Medical care quality is the lifeblood of a hospital and the perennial theme of hospital management. Quality indicators extracted from the medical record homepage are the main measure of medical quality, but these indicators alone cannot reveal deeper medical care quality problems.
Considering the large volume of medical record homepage data and the need to analyze correlations among indicators, this thesis proposes using association rule mining to study medical quality. The main work consists of three parts.
Algorithm improvement. Because medical record homepage data are single-level and multi-dimensional, this thesis designs Mypriori, an association rule mining algorithm for medical record homepages based on the classic Apriori algorithm. Mypriori adds an item filtering mechanism, a transaction compression strategy, and ordered rightward extension of itemsets to the iterations; adds constraint-based pruning with prior knowledge to rule generation; and uses confidence and lift together to prune rules. These measures greatly reduce redundancy and improve the algorithm's efficiency.
Data preprocessing. The data are first privacy-filtered to protect the rights of the hospital and its patients. They are then prepared for mining by cleaning the data, extracting fields, and transforming and discretizing attributes. Discretization is the focus: it converts all attributes to nominal ones by cutting continuous numeric attributes into segments and rolling up multi-level, multi-valued attributes into a few nominal values.
Medical record homepage mining. Following the basic procedure of medical data mining and using the Mypriori program, this thesis mines four years of real data against four targets: "analysis of data about diseases related to death or operation", "analysis of death data", "analysis of suspected death data", and "analysis of suspected unplanned returns to the operating room". It finds highly interesting association rules that reflect the hospital's medical care quality problems.
The data mining experiments show that the improved algorithm Mypriori can capture significant frequent patterns in medical record homepage data. Compared with Weka's Apriori, Mypriori shows good robustness and efficiency.
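
The ideas the abstract names for Mypriori (item filtering, ordered rightward extension, transaction compression, a constraint on rules, and joint confidence and lift pruning) can be sketched on top of a plain Apriori loop. The Python below is one possible reading under stated assumptions, not the thesis's implementation, whose details the abstract does not give. The function names `mine_frequent` and `derive_rules`, the parameters `ignored_items` and `target_attr`, the single-item consequents, and the toy records are all hypothetical.

```python
from itertools import combinations


def mine_frequent(records, min_support, ignored_items=()):
    """Apriori-style mining over attribute -> value records.

    An item is an (attribute, value) pair; an itemset is a tuple of items ordered
    by attribute name. Returns ({itemset: count}, number_of_records).
    """
    n = len(records)
    min_count = min_support * n
    ignored = set(ignored_items)  # assumed form of item-value filtering
    transactions = [frozenset(i for i in r.items() if i not in ignored)
                    for r in records]
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    all_frequent = dict(frequent)
    k = 1
    while frequent:
        live = {item for s in frequent for item in s}
        # Transaction compression: drop items no longer in play, then drop
        # transactions too short to contain any (k+1)-itemset.
        transactions = [t & live for t in transactions]
        transactions = [t for t in transactions if len(t) > k]
        # Rightward extension: only append items whose attribute sorts after the
        # last attribute, so each itemset is built once, one value per attribute.
        candidates = {}
        for itemset in frequent:
            for item in live:
                if item[0] > itemset[-1][0]:
                    cand = itemset + (item,)
                    # Apriori pruning: every k-subset must itself be frequent.
                    if all(sub in frequent for sub in combinations(cand, k)):
                        candidates[cand] = 0
        for t in transactions:
            for cand in candidates:
                if t.issuperset(cand):
                    candidates[cand] += 1
        frequent = {s: c for s, c in candidates.items() if c >= min_count}
        all_frequent.update(frequent)
        k += 1
    return all_frequent, n


def derive_rules(all_frequent, n, min_conf, min_lift, target_attr=None):
    """Single-consequent rules kept only if confidence AND lift pass."""
    rules = []
    for itemset, count in all_frequent.items():
        if len(itemset) < 2:
            continue
        for cons in itemset:
            # Constraint (assumed): only keep rules concluding on target_attr.
            if target_attr is not None and cons[0] != target_attr:
                continue
            ante = tuple(i for i in itemset if i != cons)
            conf = count / all_frequent[ante]
            lift = conf / (all_frequent[(cons,)] / n)
            if conf >= min_conf and lift >= min_lift:
                rules.append((ante, cons, count / n, conf, lift))
    return rules


# Toy, already-discretized records (invented for illustration).
records = [
    {"age": "65+", "surgery": "yes", "outcome": "died"},
    {"age": "65+", "surgery": "yes", "outcome": "died"},
    {"age": "65+", "surgery": "no", "outcome": "cured"},
    {"age": "<65", "surgery": "yes", "outcome": "cured"},
    {"age": "<65", "surgery": "no", "outcome": "cured"},
    {"age": "<65", "surgery": "no", "outcome": "cured"},
]
itemsets, n = mine_frequent(records, min_support=0.3)
for ante, cons, sup, conf, lift in derive_rules(itemsets, n, 0.8, 1.2, "outcome"):
    print(ante, "=>", cons, f"support={sup:.2f} conf={conf:.2f} lift={lift:.2f}")
```

On this toy data the only rule concluding in `outcome=died` that passes both thresholds is {age=65+, surgery=yes} => died, with confidence 1.0 and lift 3.0. Rules concluding in `outcome=cured` also reach confidence 1.0 but only lift 1.5, because four of the six records are already cured; this is the kind of difference a confidence-only filter would not show.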

