ZOU Hang, JIANG Yunlu. Overview of Robust Variable Selection Methods for High-Dimensional Linear Regression Model[J]. Chinese Journal of Applied Probability and Statistics, 2024, 40(1): 157-181. DOI: 10.3969/j.issn.1001-4268.2024.01.010

Overview of Robust Variable Selection Methods for High-Dimensional Linear Regression Model

  • Abstract: With the arrival of the big data era, high-dimensional data are frequently collected in fields such as economics, finance, and biomedicine. A defining feature of such data is that the number of variables p grows with the sample size n and usually exceeds it; outliers are also common in high-dimensional data. How to limit the influence of outliers on high-dimensional statistical inference, and thereby obtain a more accurate model, is therefore an active topic in current statistical research. This paper reviews robust variable selection methods for high-dimensional linear models. First, we introduce three measures for evaluating robustness: the influence function, the breakdown point, and the maximum bias. Second, we focus on robust variable selection methods, covering the case where only the response contains outliers, the case where both the response and the covariates contain outliers, and methods that combine a high breakdown point with high efficiency. We then introduce the related algorithms and compare different variable selection methods through simulations and real-data examples. Finally, we briefly discuss open problems for robust and efficient variable selection in high dimensions and possible directions for future work.
