Regression variable subset selection is one of the most important aspects of linear model theory. A selection method is preferred if the selected subset is consistent as the sample size tends to infinity and the prediction mean square error is small. The BIC criterion gives a consistent subset, but as the number of variables grows large it involves too much computation. The adaptive lasso has better computational efficiency while retaining consistency. In this paper we propose a new approach to multiple linear regression variable selection which is much simpler than the other variable selection methods while still giving a consistent subset. The new method computes only two passes of ordinary least squares regression: the first pass fits the complete set of variables and selects a variable subset based on the regression coefficient estimates; the second pass regresses on the selected variables.
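The two-pass idea above can be sketched in a few lines. The following is a minimal illustration, not the paper's exact procedure: in particular, the threshold-based selection rule and the `threshold` parameter are assumptions for the sketch, since the abstract does not specify how the subset is chosen from the first-pass estimates.

```python
import numpy as np

def two_pass_ols(X, y, threshold):
    """Illustrative two-pass OLS variable selection.

    Pass 1 fits OLS on the full variable set; variables whose estimated
    coefficients are small (below `threshold` in absolute value -- an
    assumed selection rule for this sketch) are dropped. Pass 2 refits
    OLS on the selected variables; dropped coefficients are set to zero.
    """
    # Pass 1: ordinary least squares on the complete variable set.
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Select variables based on the first-pass coefficient estimates.
    selected = np.flatnonzero(np.abs(beta_full) > threshold)
    # Pass 2: ordinary least squares on the selected variables only;
    # coefficients of dropped variables are defined to be zero.
    beta = np.zeros(X.shape[1])
    if selected.size:
        beta_sub, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        beta[selected] = beta_sub
    return beta, selected
```

Only two least-squares fits are needed, which is what makes the approach cheaper than an exhaustive BIC search over subsets.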
Consider the following regression model:
$$ y = X\beta + \varepsilon, $$
where the indexes of the non-zero elements of $\beta$ are denoted by $S$. Suppose the new method gives a regression variable subset indexed by $\hat S$, and $\hat\beta$ is the regression coefficient estimate obtained by our new method, in which the coefficients of the dropped variables are defined to be zero. We prove that under suitable conditions
$$ \sqrt{n}\,\bigl(\hat\beta_S - \beta_S\bigr) \xrightarrow{d} N\bigl(0,\; c\,\sigma^2\Sigma\bigr), $$
where $\hat\beta_S$ denotes the vector composed of the elements of $\hat\beta$ indexed by $S$, $\sigma^2$ is the error variance, and $\Sigma$ and $c$ are a matrix and a constant depending on the limit of $X^\top X/n$.
Simulation results and application examples show that the new approach has good small-to-medium sample performance, comparable to other methods such as BIC and the adaptive lasso.
Corresponding author:
Chen Jiading
Citation:
Chen Jiading, Li Dongfeng. A Simple Approach in Regression Variable Selection [J]. Chinese Journal of Applied Probability and Statistics, 2015, 31(1): 71-88.