Profile Greedy Forward Regression Variable Screening for Ultra-High Dimensional Partially Linear Models
Abstract: In this paper, we consider an ultra-high dimensional partially linear model in which the dimensionality p of the linear component is much larger than the sample size n and can grow exponentially with n. First, using the profile technique from semiparametric regression, we transform the ultra-high dimensional partially linear model into an ultra-high dimensional linear model. Second, to screen the variables of the high-dimensional linear component while accounting for the correlation among the covariates, we combine the greedy algorithm with the forward regression (FR) method and propose a screening procedure for partially linear models called profile greedy forward regression (PGFR). Under some regularity conditions, we prove that PGFR identifies all relevant predictors consistently, that is, it possesses the screening consistency property. We further propose a BIC-type criterion for determining the selected model, so that it contains the true model with probability tending to one. Finally, simulation studies and a real-data application are conducted to examine the finite-sample performance of the proposed PGFR procedure.
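To make the two-stage idea concrete, the following is a minimal sketch, not the authors' PGFR implementation. It assumes the standard partially linear form Y = X^T β + g(T) + ε with a scalar T. The profile step is approximated by partialling out E(Y|T) and E(X|T) with a Gaussian Nadaraya-Watson smoother whose bandwidth `h` is a hypothetical choice. The screening step is plain one-variable-at-a-time forward regression, standing in for the paper's greedy variant. The stopping rule uses an extended-BIC-type penalty, log(RSS/n) + |M|(log n + 2 log p)/n, which is also an assumption; the criterion in the paper may differ. All function names are illustrative.

```python
import numpy as np


def profile_transform(X, Y, T, h):
    """Partial out g(T): return (I - S) X and (I - S) Y, where S is a
    Nadaraya-Watson smoother matrix with a Gaussian kernel and bandwidth h."""
    diff = (T[:, None] - T[None, :]) / h
    K = np.exp(-0.5 * diff ** 2)
    S = K / K.sum(axis=1, keepdims=True)  # each row of S sums to one
    return X - S @ X, Y - S @ Y


def rss(Xs, y):
    """Residual sum of squares of the least-squares fit of y on the columns of Xs."""
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    r = y - Xs @ coef
    return float(r @ r)


def forward_regression(Xt, Yt, max_steps):
    """Plain forward regression: at each step add the column that lowers the RSS most.
    Returns the order in which columns entered and the RSS after each step."""
    p = Xt.shape[1]
    path, remaining, rss_path = [], list(range(p)), []
    for _ in range(max_steps):
        scores = [rss(Xt[:, path + [j]], Yt) for j in remaining]
        k = int(np.argmin(scores))
        path.append(remaining.pop(k))
        rss_path.append(scores[k])
    return path, rss_path


def bic_select(path, rss_path, n, p):
    """Keep the prefix of the path minimising log(RSS/n) + |M| (log n + 2 log p) / n."""
    bic = [np.log(r / n) + (k + 1) * (np.log(n) + 2 * np.log(p)) / n
           for k, r in enumerate(rss_path)]
    return path[: int(np.argmin(bic)) + 1]


# Usage on simulated data: three active covariates, g(T) = sin(2*pi*T).
rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
T = rng.uniform(0.0, 1.0, n)
beta = np.zeros(p)
beta[[0, 1, 2]] = [3.0, -2.0, 1.5]
Y = X @ beta + np.sin(2 * np.pi * T) + 0.5 * rng.standard_normal(n)

Xt, Yt = profile_transform(X, Y, T, h=0.1)
path, rss_path = forward_regression(Xt, Yt, max_steps=10)
selected = bic_select(path, rss_path, n, p)
print(sorted(selected))
```

Running the forward path for a fixed number of steps and then choosing the stopping point by BIC mirrors the abstract's two goals: the path should contain the relevant predictors early, and the BIC-type penalty, which grows with log p, keeps the selected set small in the p >> n regime.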