Imputation of Missing Values for Compositional Data Based on Principal Component Analysis

    Abstract: Taking into account the special geometric structure of compositional data, this paper proposes two new methods for imputing missing values in compositional data. The first imputes with the mean in the simplex space: the k nearest neighbors of a sample containing missing values are found using the Aitchison distance, and the missing parts are then imputed with the simplex mean of those neighbors, computed with the two basic simplex operations of perturbation (addition) and powering (scalar multiplication). The second is a principal component regression imputation: the data initially imputed by the first method are mapped to ordinary real-valued data by the isometric log-ratio (ilr) transformation, and a second imputation is then carried out by principal component regression in the transformed space. A real-data example and simulation experiments show that the proposed principal component imputation method outperforms both k-nearest-neighbor imputation and iterative least-squares imputation.
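The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names are hypothetical, and two choices are assumptions that the abstract does not confirm: the Aitchison distance to a sample with missing parts is computed on its observed parts only, and the neighbor mean is rescaled so that its ratios to the observed parts are preserved. The ilr basis used here is one standard orthonormal choice, and the second-stage principal component regression is not shown.

```python
import numpy as np

def closure(x):
    """Rescale positive parts so that they sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def perturb(x, y):
    """Perturbation: the 'addition' of the simplex."""
    return closure(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))

def power(x, a):
    """Powering: the 'scalar multiplication' of the simplex."""
    return closure(np.asarray(x, dtype=float) ** a)

def simplex_mean(X):
    """Simplex mean: perturb all rows together, then power by 1/n."""
    total = X[0]
    for row in X[1:]:
        total = perturb(total, row)
    return power(total, 1.0 / len(X))

def clr(x):
    """Centered log-ratio coefficients (invariant to rescaling of x)."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def aitchison_distance(x, y):
    """Aitchison distance as the Euclidean distance between clr vectors."""
    return np.linalg.norm(clr(x) - clr(y))

def ilr(x):
    """ilr coordinates of one composition in a standard orthonormal basis."""
    logx = np.log(np.asarray(x, dtype=float))
    D = logx.shape[-1]
    z = np.empty(D - 1)
    for i in range(1, D):
        z[i - 1] = np.sqrt(i / (i + 1.0)) * (logx[:i].mean() - logx[i])
    return z

def knn_simplex_impute(x, complete, k=3):
    """Fill NaN parts of x from the simplex mean of its k nearest complete rows.

    Assumption: distances use only the observed parts of x, and the neighbor
    mean is rescaled to match the observed parts of x in the log-ratio sense.
    """
    x = np.asarray(x, dtype=float)
    complete = np.asarray(complete, dtype=float)
    obs = ~np.isnan(x)
    dists = [aitchison_distance(x[obs], row[obs]) for row in complete]
    nearest = complete[np.argsort(dists)[:k]]
    center = simplex_mean(nearest)
    scale = np.exp(np.mean(np.log(x[obs]) - np.log(center[obs])))
    filled = x.copy()
    filled[~obs] = center[~obs] * scale
    return filled
```

Because the ilr transformation is an isometry, Euclidean distances between ilr coordinates equal Aitchison distances between the original compositions, which is what allows ordinary multivariate tools such as principal component regression to be applied after the transformation.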

     
