Improved Classification for Compositional Data Using the α-transformation



Authors: Michail Tsagris; Simon Preston; Andrew T. A. Wood
Keywords: Compositional data; Classification; α-transformation; α-metric; Jensen-Shannon divergence
Journal: Journal of Classification
Publication year: 2016
Publication date: July 2016
Volume: 33
Issue: 2
Pages: 243-261
Full-text size: 366 KB
Journal category: Mathematics and Statistics
Journal subjects: Statistics
Statistical Theory and Methods
Pattern Recognition
Bioinformatics
Signal, Image and Speech Processing
Psychometrics
Marketing
Publisher: Springer New York
ISSN: 1432-1343

Abstract

In compositional data analysis, an observation is a vector containing nonnegative values, only the relative sizes of which are considered to be of interest. Without loss of generality, a compositional vector can be taken to be a vector of proportions that sum to one. Data of this type arise in many areas including geology, archaeology, biology, economics and political science. In this paper we investigate methods for classification of compositional data. Our approach centers on the idea of using the α-transformation to transform the data and then to classify the transformed data via regularized discriminant analysis and the k-nearest neighbors algorithm. Using the α-transformation generalizes two rival approaches in compositional data analysis, one (when α = 1) that treats the data as though they were Euclidean, ignoring the compositional constraint, and another (when α = 0) that employs Aitchison’s centered log-ratio transformation. A numerical study with several real datasets shows that whether using α = 1 or α = 0 gives better classification performance depends on the dataset, and moreover that using an intermediate value of α can sometimes give better performance than using either 1 or 0.
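
The transform-then-classify idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the formula, so the form used here is an assumption: a power transformation u_i = x_i^α / Σ_j x_j^α followed by the rescaling (D·u − 1)/α, whose limit as α → 0 is the centered log-ratio transformation and which is linear in x at α = 1. The paper's own definition may differ in detail (for example by an additional orthonormal projection), and the function names `alpha_transform` and `knn_predict` are hypothetical. Only k-nearest neighbors is sketched; regularized discriminant analysis is not.

```python
import numpy as np


def alpha_transform(x, alpha):
    """Transform compositions (rows of positive proportions summing to 1).

    Assumed form, not taken from the abstract: (D * u - 1) / alpha with
    u_i = x_i**alpha / sum_j x_j**alpha and D the number of components.
    alpha = 0 is handled as the centered log-ratio (clr) limit, which
    requires strictly positive components.
    """
    x = np.asarray(x, dtype=float)
    d = x.shape[-1]
    if alpha == 0:
        log_x = np.log(x)
        return log_x - log_x.mean(axis=-1, keepdims=True)
    powered = x ** alpha
    u = powered / powered.sum(axis=-1, keepdims=True)
    return (d * u - 1.0) / alpha


def knn_predict(train_x, train_y, test_x, alpha, k=3):
    """Classify test compositions by majority vote among the k nearest
    training compositions, using Euclidean distance after the transform.
    Labels are assumed to be non-negative integers."""
    z_train = alpha_transform(train_x, alpha)
    z_test = alpha_transform(test_x, alpha)
    train_y = np.asarray(train_y)
    # Pairwise Euclidean distances, shape (n_test, n_train).
    dists = np.linalg.norm(z_test[:, None, :] - z_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return np.array([np.bincount(row).argmax() for row in train_y[nearest]])
```

In practice α would be treated as a tuning parameter, for instance chosen by cross-validation over a grid of values between 0 and 1, which matches the abstract's observation that the best value depends on the dataset.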
