Research on Superpopulation Local Polynomial Regression Model Inference for Web Survey Samples in the Context of Big Data
Abstract: With the rapid development of big data and Internet technology, web surveys are becoming increasingly popular. However, most web survey samples are essentially non-probability samples, so traditional sampling inference theory is difficult to apply to them. Because nonparametric models require relatively few assumptions, a superpopulation local polynomial regression model inference approach for web survey samples is proposed. First, a nonparametric superpopulation local polynomial regression model is fitted to the web survey sample to predict the target variable of the population. Then the propensity score method is used to estimate the prediction error from the web survey sample data, which yields the local polynomial regression estimator of the population mean. Simulation and empirical results show that, compared with the inverse propensity score weighted estimator and the population mean estimator based on a parametric superpopulation linear regression model, the population mean estimator based on the nonparametric superpopulation local polynomial regression model has smaller bias, standard deviation, and mean squared error, and thus performs better.
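The two-step idea summarized in the abstract (predict with a local polynomial fit, then correct the prediction error with propensity weights) can be illustrated with a minimal sketch. This is not the paper's exact estimator. It assumes a single auxiliary variable known for every population unit, a degree-1 (local linear) fit with a Gaussian kernel, propensity scores supplied from outside rather than estimated, and a ratio-normalized (Hájek-type) error correction. The function names, the bandwidth, and the simulated data are all hypothetical.

```python
import numpy as np

def local_linear_predict(x_train, y_train, x_eval, bandwidth):
    """Degree-1 local polynomial fit with a Gaussian kernel (illustrative choice)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_eval = np.asarray(x_eval, dtype=float)
    preds = np.empty(len(x_eval))
    for j, x0 in enumerate(x_eval):
        d = x_train - x0
        w = np.exp(-0.5 * (d / bandwidth) ** 2)    # kernel weights around x0
        X = np.column_stack([np.ones_like(d), d])  # local design matrix [1, x - x0]
        XtW = X.T * w                              # weight each observation
        beta = np.linalg.solve(XtW @ X, XtW @ y_train)
        preds[j] = beta[0]                         # local intercept = fitted value at x0
    return preds

def population_mean_estimate(x_pop, x_web, y_web, propensity_web, bandwidth):
    """Mean of model predictions over the population plus a
    propensity-weighted mean of web-sample residuals (assumed form)."""
    y_web = np.asarray(y_web, dtype=float)
    m_pop = local_linear_predict(x_web, y_web, x_pop, bandwidth)
    m_web = local_linear_predict(x_web, y_web, x_web, bandwidth)
    w = 1.0 / np.asarray(propensity_web, dtype=float)
    correction = np.sum(w * (y_web - m_web)) / np.sum(w)
    return m_pop.mean() + correction

# Hypothetical simulated population, for illustration only.
rng = np.random.default_rng(0)
N = 2000
x_pop = rng.uniform(0, 1, N)
y_pop = np.sin(2 * np.pi * x_pop) + rng.normal(0, 0.2, N)
pi = 0.05 + 0.3 * x_pop                  # assumed participation propensities
in_web = rng.uniform(size=N) < pi        # units that end up in the web sample
est = population_mean_estimate(
    x_pop, x_pop[in_web], y_pop[in_web], pi[in_web], bandwidth=0.1
)
print("estimate:", est)
print("true mean:", y_pop.mean())
print("naive web-sample mean:", y_pop[in_web].mean())
```

In practice the propensity scores would themselves have to be estimated, for example with a logistic model, and the bandwidth would be chosen by a data-driven rule. Both steps are omitted here to keep the sketch short.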