A Key Attribute Extraction Method Based on Fuzzy Option Relations (一种基于模糊选项关系的关键属性提取方法)

English title: A Fuzzy-Option Based Attribute Discriminant Method
Authors: Xiong Xi (熊熙); Qiao Shaojie (乔少杰); Han Nan (韩楠); Yuan Chang'an (元昌安); Zhang Haiqing (张海清); Li Binyong (李斌勇)
English authors: XIONG Xi; QIAO Shao-Jie; HAN Nan; YUAN Chang-An; ZHANG Hai-Qing; LI Bin-Yong
Keywords: option reduction; fuzzy sets; medical data mining; clinical decision-making; attribute extraction
English keywords: option reduction; fuzzy sets; medical data mining; clinical decision-making; attribute discrimination
Chinese journal name: JSJX
English journal name: Chinese Journal of Computers
Institutions: School of Cybersecurity, Chengdu University of Information Technology; School of Management, Chengdu University of Information Technology; School of Computer and Information Engineering, Guangxi Teachers Education University; School of Software Engineering, Chengdu University of Information Technology
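Publication date: 2018-07-19 11:18
Publisher: Chinese Journal of Computers (计算机学报)
Year: 2019
Issue: v.42; No.433
Funding: Supported by the National Natural Science Foundation of China (61772091, 61802035); the Youth Fund for Humanities and Social Sciences Research of the Ministry of Education (17YJCZH202); the Sichuan Science and Technology Program (2018GZ0253, 2018JY0448); the Scientific Research Foundation of Chengdu University of Information Technology (KYTZ201637, KYTZ201715, KYTZ201750); the Research Foundation for Young and Middle-aged Academic Leaders of Chengdu University of Information Technology (J201701); the Chengdu Soft Science Research Project (2017-RK00-00125-ZF, 2017-RK00-00053-ZF); the Innovative Research Team Construction Plan in Universities of Sichuan Province (18TD0027); the Guangxi Natural Science Foundation (2018GXNSFDA138005); and the Guangdong Provincial Key Laboratory Project (2017B030314073)
Language: Chinese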
File name: JSJX201901014
Page count: 13
Issue number: 01
CN: 11-1826/TP
Pages: 192-204

Abstract

Fuzzy analysis methods are widely used in medical practice, including the auxiliary diagnosis of mental disorders. Attribute reduction methods play an important role in filtering redundant information and extracting key information, which makes the clinical decision-making process more accurate and efficient, and the valuable information they extract can reveal deeper medical knowledge from a new perspective. Many untrained participants find it hard to identify the fuzzy boundaries between options in psychometric scales; that is, they struggle to distinguish options that share the same meaning but differ in degree. The intrinsic fuzziness of clinical psychology and the fuzziness of psychometric data both introduce noise. If the attributes in psychometric data are treated as the condition attributes of an information system, dimensionality reduction algorithms can extract the key attributes and thereby simplify the clinical screening of suspected patients. In practice, clinicians can focus on the extracted key attributes, or on attributes with high weights, to quickly locate patients whose key attributes are abnormal and treat them first.

This paper therefore proposes FOAD (Fuzzy-Option based Attribute Discriminant method), a key attribute extraction method based on fuzzy option relations. It comprises three main steps: data acquisition, selection and reduction of fuzzy options, and ranking and extraction of key attributes. Each participant sample contains several physical-symptom attributes, and one degree option is selected for each attribute. Selecting a fuzzy option must take into account both the number of samples that chose the option and the degree that the option denotes. The fuzzy option reduction algorithm, the core of the method, merges fuzzy options into the remaining options to reduce the fuzziness of the options in psychometric data.

Two real clinical datasets are used to evaluate FOAD. Several attribute extraction algorithms are first applied to the test datasets to obtain key attributes; the samples are then classified by logistic regression, with the extracted key attributes as condition attributes and the diagnostic conclusions as class labels. The results show that FOAD generally improves classification accuracy by 3.3% to 14.1% without increasing time complexity. Although option reduction loses some information, merging fuzzy options makes the option distribution clearer. LDA (Linear Discriminant Analysis) under FOAD is sensitive to its parameters, especially the number of retained attributes: the gain in its prediction accuracy rises from 6.7% when the fewest attributes are retained to 14.1% when the most are retained. PCA (Principal Component Analysis) chooses the projection directions that maximize data variance and so retains the most information, but it classifies poorly. FOAD therefore does little to improve the prediction accuracy of PCA, and in a few cases it even lowers it. In addition, LDA based on FOAD achieves higher prediction accuracy than other fuzzy attribute extraction algorithms.

Psychodiagnostic data are markedly fuzzy, and conventional statistical analysis often fails to give the required results. Specialized preprocessing methods, such as recent fuzzy set and rough set techniques, can remove this noise and improve clinical diagnosis.
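
The option-reduction step described in the abstract can be illustrated with a small Python sketch. This is not the FOAD algorithm itself, whose selection and merging rules are not given in this record. It is a toy illustration under stated assumptions: an option counts as "fuzzy" when it is an interior degree level chosen by fewer than a hypothetical `min_share` of samples, and it is merged into whichever adjacent level has more samples. The function names, the threshold and the tie-breaking rule are all invented for illustration.

```python
from collections import Counter


def select_fuzzy_option(column, levels, min_share=0.15):
    """Pick one interior option to treat as fuzzy, or return None.

    Assumption (not from the paper): an option is fuzzy when it is an
    interior degree level chosen by a non-zero share of samples that is
    below min_share. The rarest such option is returned.
    """
    counts = Counter(column)
    total = len(column)
    interior = levels[1:-1]  # assumption: the two extreme options are kept
    candidates = [lv for lv in interior if 0 < counts[lv] / total < min_share]
    if not candidates:
        return None
    return min(candidates, key=lambda lv: counts[lv])


def merge_fuzzy_option(column, levels, fuzzy):
    """Merge `fuzzy` into the adjacent level with more samples.

    Ties go to the lower level (an arbitrary illustrative choice).
    """
    counts = Counter(column)
    i = levels.index(fuzzy)
    lower, upper = levels[i - 1], levels[i + 1]
    target = lower if counts[lower] >= counts[upper] else upper
    return [target if value == fuzzy else value for value in column]


def reduce_options(column, levels, min_share=0.15):
    """Apply one round of selection and merging to a single attribute."""
    fuzzy = select_fuzzy_option(column, levels, min_share)
    if fuzzy is None:
        return list(column)
    return merge_fuzzy_option(column, levels, fuzzy)
```

For example, with degree levels `[0, 1, 2, 3]` and answers `[0, 0, 0, 0, 1, 2, 2, 2, 3, 3]`, level 1 is chosen by 10% of samples, so `reduce_options` merges it into level 0 (the larger neighbour) and returns `[0, 0, 0, 0, 0, 2, 2, 2, 3, 3]`. The reduced columns would then be passed to an attribute extraction method such as LDA or PCA and to a logistic regression classifier, as the abstract describes.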

