Citation: WANG Hongxia (汪红霞), FANG Liyun (房丽云), BU Shijie (卜士杰), XU Peirong (许佩蓉). Dimension Reduction for Longitudinal Data Based on Martingale Difference Divergence[J]. Chinese Journal of Applied Probability and Statistics (应用概率统计), 2023, 39(1): 132-158. DOI: 10.3969/j.issn.1001-4268.2023.01.009

Dimension Reduction for Longitudinal Data Based on Martingale Difference Divergence

Abstract: Correlation among variables and correlation among repeated observations on the same subject are two inherent features of longitudinal data, and both carry important information. Borrowing the idea of dimension folding for matrix-valued data, we use these two kinds of correlation to propose a sufficient dimension folding method based on martingale difference divergence. In theory, the population version of the criterion recovers the central mean dimension folding subspace, reducing the dimensions of the predictors and of the observation times simultaneously, and the estimator built from its sample version is root-n consistent. For computation, a Kronecker product assumption turns the procedure into a constrained low-dimensional optimization problem that can be solved quickly with existing nonlinear optimization algorithms. A consistent BIC criterion is further proposed to determine the structural dimension adaptively. Simulation studies show that, compared with existing dimension reduction methods in the literature, the proposed method is fast to run and more accurate in both estimating the central mean dimension folding subspace and determining the structural dimension. Finally, an analysis of clinical data on primary biliary cirrhosis illustrates the effectiveness of the proposed method.
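The abstract names martingale difference divergence (MDD) as the building block but does not state a formula. As a rough illustration only, the sketch below computes the commonly used sample MDD for a scalar response and a vector predictor, MDD_n^2 = -(1/n^2) * sum over k, l of (y_k - ybar)(y_l - ybar)|x_k - x_l|. It is an assumption that the paper's criterion builds on this form; the function name `sample_mdd_sq` and the toy data are hypothetical, and this is not the folding estimator proposed in the paper.

```python
import math


def sample_mdd_sq(x, y):
    """Squared sample martingale difference divergence of y given x.

    x: list of predictor vectors (each a list of floats)
    y: list of scalar responses, same length as x
    Assumed form: -(1/n^2) * sum_{k,l} (y_k - ybar)(y_l - ybar) * ||x_k - x_l||
    """
    n = len(y)
    y_bar = sum(y) / n
    centered = [v - y_bar for v in y]  # center the response
    total = 0.0
    for k in range(n):
        for l in range(n):
            # Euclidean distance between the two predictor vectors
            total += centered[k] * centered[l] * math.dist(x[k], x[l])
    return -total / n**2


# Toy data (hypothetical): the response depends on the predictor,
# so the value is strictly positive (it equals 4/9 here).
x_toy = [[0.0], [1.0], [2.0]]
y_toy = [0.0, 1.0, 2.0]
mdd_dependent = sample_mdd_sq(x_toy, y_toy)

# A constant response carries no mean dependence on x, so the value is 0.
mdd_constant = sample_mdd_sq(x_toy, [5.0, 5.0, 5.0])
```

A value near zero suggests the conditional mean of the response does not vary with the predictor, which is the property a mean dimension reduction criterion exploits. The double loop costs O(n^2) distance evaluations, which is acceptable for a sketch but would usually be vectorized in practice.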

     
