Creation of a Data Mining Model and Its Application to Traditional Chinese Medicine Literature
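Abstract
The cultural heritage of five thousand years of Chinese civilization is the foundation on which traditional Chinese medicine (TCM) arose and developed. Countless clinical practices and theoretical studies in TCM have accumulated a large body of scientific knowledge, which is contained in ancient TCM texts and in current research literature. Given such a massive volume of TCM data, how to make effective use of these valuable resources has become a problem that the development of TCM must confront.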
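     TCM has its own mode of thinking, characterized by systematicness, holism, complexity, and uncertainty, and is not well suited to study by traditional reductionist methods. Data mining can search massive data for latent regularities and accomplish tasks that ordinary people cannot. Data mining techniques and methods are now fairly mature, and a set of proven approaches exists. Applying data mining to the acquisition of useful patterns and knowledge will therefore accelerate the internationalization, modernization, standardization, and knowledge-oriented development of TCM, and is of great significance for the long-term, stable development of the field.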
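     Data mining (DM) has developed over the past 20 years alongside artificial intelligence and database technology. It is an interdisciplinary field spanning artificial intelligence, databases, statistics, machine learning, and other disciplines. This paper takes the broad view of data mining, treating it as equivalent to knowledge discovery in databases (KDD): the process of mining interesting knowledge from large amounts of data stored in databases, data warehouses, or other information repositories.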
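     Because TCM data are described in diverse and incomplete ways, existing data mining techniques must be improved and extended alongside standardization of the data. Building on the KDD method, this paper creates a human-machine interactive data mining model. Manual work is limited to arranging and specifying, which minimizes the workload of manual record creation; the line-break and word-segmentation noise of the original text data is retained as the object of operation, and the results it produces are analyzed. The program can take text data directly as its processing target. Notably, the basic recognition corpus must be correct for the conclusions to be correct. Standardizing the data, by contrast, is optional; what matters is the precision range required of the conclusions.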
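     The model was applied to selected TCM literature for mining. The results show: (1) The texts can be tagged and indexed in the TCM order of principle, method, formula, and medicinal (li, fa, fang, yao), revealing the same or similar medicinals commonly used by six physicians. (2) Data mining of the formula and medicinal patterns in Fu Qing-zhu's Gynecology (《傅青主女科》) found that danggui (Radix Angelicae Sinensis), renshen (Radix Ginseng), chuanxiong (Rhizoma Chuanxiong), liquor, baishao (Radix Paeoniae Alba), fuling (Poria), and other medicinals, together with their herb pairs and herb groups, are the most commonly used, and that Shenghua Tang is the most frequently used formula, which points to the importance of nourishing and regulating the blood and of tonifying qi and strengthening the spleen. The widespread use of liquor is particularly unusual and has rarely been mentioned in earlier literature studies. (3) A more in-depth study of 487 formulas for yege (噎膈, dysphagia-occlusion) found that qi-regulating medicinals have the highest frequency; chenpi (Pericarpium Citri Reticulatae), muxiang (Radix Aucklandiae), gancao (Radix Glycyrrhizae), rougui (Cortex Cinnamomi Cassiae), renshen, and other medicinals, together with their herb pairs or herb groups, are the most commonly used; and the emphasis on interior-warming medicinals and hezi (Fructus Chebulae) differs considerably from modern clinical practice. In addition, powders and pills are the preferred dosage forms, reflecting the intent of gradual resolution and dispersal; as for administration, taking the medicine at any time, orally, or held in the mouth is preferred, with the aim of prolonging contact between the medicinal and the local lesion to improve efficacy.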
The origin and development of traditional Chinese medicine rest on five thousand years of Chinese cultural heritage. Numerous clinical practices and theoretical studies in the field have accumulated a large amount of scientific knowledge, which is contained in ancient Chinese medical literature and in current research literature. Faced with such massive traditional Chinese medicine data, how to use these valuable resources effectively has become a problem that the development of traditional Chinese medicine must address.
     Traditional Chinese medicine has its own mode of thinking, marked by systematicness, holism, complexity, and uncertainty, which makes it ill-suited to study by traditional reductionist methods. Data mining can seek latent regularities in massive data and complete tasks that ordinary people cannot. At present, the technologies and methods of data mining are fairly mature, and a set of effective approaches exists. Therefore, applying data mining technology to acquire useful patterns and knowledge will accelerate the internationalization, modernization, standardization, and knowledge-oriented development of traditional Chinese medicine, and is of great significance for its long-term, stable development.
     Data mining (DM), which has developed alongside artificial intelligence and database technology over the past 20 years, is an interdisciplinary field involving artificial intelligence, databases, statistics, machine learning, and other areas. In this paper, data mining is taken in its broad sense, as equivalent to knowledge discovery in databases (KDD), namely the process of mining interesting knowledge from massive data stored in databases, data warehouses, or other information repositories.
     Given the diversity and incompleteness of traditional Chinese medicine data descriptions, it is necessary to improve and extend existing data mining techniques. In this paper, we use the KDD method as a basis to create a human-machine interactive data mining model that minimizes manual workload and takes textual data directly as its processing target. Applied to traditional Chinese medicine literature, the method can extract, integrate, and discover knowledge from a large number of documents, allowing the model to process large volumes of literature quickly and to mine knowledge in specific areas.
     Applying the data mining model to traditional Chinese medicine literature shows that: (1) It can tag and index texts in the order of principle, method, formula, and drug used in traditional Chinese medicine, and reveal the same or similar drugs used by six famous doctors. (2) Mining the formula and drug patterns of "Fu Qing-zhu's Gynecology" shows that Radix Angelicae Sinensis, Radix Ginseng, Rhizoma Chuanxiong, liquor, Radix Paeoniae Alba, Poria, and other drugs, together with their compatible drug pairs and drug groups, are the most commonly used, and that Shenghua Tang is the most frequently used formula, which suggests the importance of nourishing and regulating the blood and of invigorating qi and strengthening the spleen. Among these, the widespread use of liquor is highly unusual and is rarely mentioned in previous literature. (3) An in-depth study of 487 prescriptions for yege (dysphagia-occlusion) shows that qi-regulating agents have the highest frequency, and that Pericarpium Citri Reticulatae, Radix Aucklandiae, Radix Glycyrrhizae, Cortex Cinnamomi Cassiae, Radix Ginseng, and other drugs, together with their compatible drug pairs or drug groups, are the most commonly used. However, the emphasis on interior-warming drugs and Fructus Chebulae differs considerably from modern clinical medication. In addition, these prescriptions mostly take the form of powders and pills, reflecting the intent of gradual resolution and dispersal; the preferred methods of administration are taking the drug at any time, orally, or held in the mouth, which is intended to prolong contact between the drug and the local lesion so as to enhance efficacy.
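The frequency findings above (most-used drugs and their pairs) rest on counting how often single drugs and drug combinations occur across prescriptions. The abstract does not describe how the model implements this, so the following is only a minimal sketch of that kind of tally, not the author's method. The function name `herb_and_pair_counts` and the toy prescription list are hypothetical and are not data from the study.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy prescriptions for illustration only; NOT data from the study.
prescriptions = [
    ["danggui", "chuanxiong", "renshen"],
    ["danggui", "chuanxiong", "baishao"],
    ["danggui", "renshen", "fuling"],
    ["chuanxiong", "baishao"],
]


def herb_and_pair_counts(formulas):
    """Count in how many formulas each herb and each unordered herb pair appears."""
    herb_counts = Counter()
    pair_counts = Counter()
    for formula in formulas:
        # De-duplicate and sort so each pair has a single canonical order.
        herbs = sorted(set(formula))
        herb_counts.update(herbs)
        pair_counts.update(combinations(herbs, 2))
    return herb_counts, pair_counts


herb_counts, pair_counts = herb_and_pair_counts(prescriptions)
print(herb_counts.most_common(3))  # most frequent single herbs
print(pair_counts.most_common(3))  # most frequent herb pairs
```

Larger drug groups could be tallied the same way by passing 3 or more to `combinations`, at the cost of a rapidly growing number of candidates; frequent-itemset algorithms exist to prune that search.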