牛勇, 李华鹏, 刘阳惠, 熊世峰, 於州, 张日权. 超高维数据特征筛选方法综述[J]. 应用概率统计, 2021, 37(1): 69-110. DOI: 10.3969/j.issn.1001-4268.2021.01.007
NIU Yong, LI Huapeng, LIU Yanghui, XIONG Shifeng, YU Zhou, ZHANG Riquan. Overview of Feature Screening Methods for Ultra-high Dimensional Data[J]. Chinese Journal of Applied Probability and Statistics, 2021, 37(1): 69-110. DOI: 10.3969/j.issn.1001-4268.2021.01.007

超高维数据特征筛选方法综述

Overview of Feature Screening Methods for Ultra-high Dimensional Data

Abstract: With the rapid improvement of data collection and storage capacity, ultra-high dimensional data [9], that is, data whose dimensionality grows exponentially with the sample size, arise frequently in many scientific fields. In this setting, penalized variable selection methods generally face three challenges: computational expediency, statistical accuracy, and algorithmic stability, which limit their usefulness for ultra-high dimensional problems. Fan and Lv [9] were the first to propose ultra-high dimensional feature screening; the topic has produced a large body of results over more than a decade and has become one of the most active research areas in statistics. This paper reviews the related work from four perspectives: screening methods under parametric model assumptions, screening methods under nonparametric and semiparametric model assumptions, model-free screening methods, and screening methods for special types of data. Finally, we briefly discuss open problems with existing ultra-high dimensional screening methods and possible directions for future research.
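To make the idea of feature screening concrete, the following is a minimal Python sketch of marginal-correlation screening, in the spirit of the sure independence screening idea attributed to Fan and Lv [9]. It is an illustration under stated assumptions and not the procedure of any specific paper reviewed here: the function name `marginal_correlation_screen`, the default retained model size of roughly n / log(n), and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np


def marginal_correlation_screen(X, y, d=None):
    """Rank features by absolute marginal correlation with y and keep the top d.

    Assumptions for this sketch: X is an (n, p) array, y is a non-constant
    length-n vector, and d defaults to floor(n / log(n)).
    """
    n, p = X.shape
    if d is None:
        d = max(1, int(n / np.log(n)))

    # Center the response and every feature column.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    # Column norms; constant columns get an infinite norm so their score is 0.
    x_norm = np.linalg.norm(Xc, axis=0)
    x_norm[x_norm == 0] = np.inf

    # Absolute sample correlation between each feature and the response.
    score = np.abs(Xc.T @ yc) / (x_norm * np.linalg.norm(yc))

    # Indices of the d largest scores, returned in increasing index order.
    keep = np.argsort(score)[::-1][:d]
    return np.sort(keep), score


# Simulated example (hypothetical data): p is much larger than n,
# and only the first three features influence the response.
rng = np.random.default_rng(0)
n, p = 100, 2000
X = rng.standard_normal((n, p))
y = 3 * X[:, 0] - 3 * X[:, 1] + 3 * X[:, 2] + 0.5 * rng.standard_normal(n)

selected, score = marginal_correlation_screen(X, y)
```

In a typical workflow, the reduced feature set `X[:, selected]` would then be passed to a penalized variable selection method, which is now applied to a problem whose dimension is below the sample size.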

     
