Data diagnostics in subset selection of regression

    Abstract: In multiple linear regression, subset (variable) selection depends closely on the model and is strongly affected by influential data. This paper studies the relation between subset selection and the data from the viewpoint of model perturbation, with selection based on the Cp criterion. Using concepts from differential geometry, three measures (the velocity, acceleration and curvature of a curve) are proposed to assess the influence of data on subset selection and thereby to detect influential data. A numerical example shows that the proposed influence measures are effective for diagnosing the influence of data on variable selection.
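    As a rough illustration of how a Cp-based selection criterion can react to a perturbation of a single observation, the sketch below computes Mallows' Cp for a candidate subset under case weights and estimates, by a central finite difference, how fast Cp changes as the weight of one case moves away from 1. This is not the paper's definition of its velocity, acceleration or curvature measures, which the abstract does not spell out. The case-weight perturbation scheme, the function names `mallows_cp` and `cp_slope_at_full_weight`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def mallows_cp(X, y, subset, weights=None):
    """Mallows' Cp for the submodel using columns `subset` of X.

    An intercept is always included. `weights` are optional case weights
    (an assumed perturbation scheme); all ones gives ordinary least squares.
    """
    n = len(y)
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    sw = np.sqrt(w)
    ones = np.ones((n, 1))
    X_full = np.hstack([ones, X])
    X_sub = np.hstack([ones, X[:, list(subset)]])

    def rss(A):
        # Weighted least squares via row scaling by sqrt(w).
        Aw = A * sw[:, None]
        yw = y * sw
        beta, *_ = np.linalg.lstsq(Aw, yw, rcond=None)
        r = yw - Aw @ beta
        return float(r @ r)

    # Error variance is estimated from the full model, as usual for Cp.
    sigma2 = rss(X_full) / (n - X_full.shape[1])
    p = X_sub.shape[1]
    return rss(X_sub) / sigma2 - n + 2 * p


def cp_slope_at_full_weight(X, y, subset, case, h=1e-4):
    """Central-difference slope of Cp with respect to the weight of one case,
    evaluated at the unperturbed model (all weights equal to 1)."""
    n = len(y)
    w_lo = np.ones(n)
    w_hi = np.ones(n)
    w_lo[case] = 1.0 - h
    w_hi[case] = 1.0 + h
    return (mallows_cp(X, y, subset, w_hi) - mallows_cp(X, y, subset, w_lo)) / (2 * h)


if __name__ == "__main__":
    # Simulated data (hypothetical): only the first predictor matters.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 3))
    y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=30)
    y[5] += 8.0  # plant one unusual case

    slopes = [cp_slope_at_full_weight(X, y, [0], i) for i in range(len(y))]
    ranked = np.argsort(np.abs(slopes))[::-1]
    print("cases ranked by |dCp/dw|:", ranked[:5])
```

    Cases with a large absolute slope are those whose weight most strongly moves the Cp value of the chosen subset, which is one simple way to flag observations that may drive the selection.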

     
