QUAN Zhuojun, ZHENG Ming, YU Wen. SSI for Case-control Binary Data under Possibly Mis-specified Logistic Models[J]. Chinese Journal of Applied Probability and Statistics, 2023, 39(5): 730-746. DOI: 10.3969/j.issn.1001-4268.2023.05.008

SSI for Case-control Binary Data under Possibly Mis-specified Logistic Models

Semi-supervised data consist of a labeled data set containing both responses and covariates and an unlabeled data set containing covariates only. Inference based on semi-supervised data is attracting growing interest in statistics. When the response in the labeled data is binary, case-control sampling is commonly used to alleviate the imbalanced data structure. When the response and the covariates satisfy the logistic model, the slope parameter of the model can be consistently estimated even under case-control sampling. However, when the logistic model is incorrectly specified for the data, the case-control sample cannot be used directly to estimate the population risk minimizer consistently. With the help of the unlabeled data, we derive a consistent estimator for the population proportion of cases. An inverse probability weighted loss function is then developed to obtain a consistent estimator for the population risk minimizer. The proposed estimators are shown to be asymptotically normal, and the limiting variance-covariance matrix can be consistently estimated. Simulation results show that the proposed method performs reasonably well in finite samples. A real data example is also analyzed for illustration.
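The inverse probability weighting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `case_control_weights` and `fit_weighted_logistic` are hypothetical, the case proportion `pi_hat` is assumed to be supplied by the user (the paper's estimator of it from the unlabeled data is not reproduced here), and the weights used are the standard ones for case-control sampling, which may differ in detail from the paper's loss.

```python
import numpy as np


def case_control_weights(y, pi_hat):
    """Inverse probability weights for a case-control sample.

    Assumption: pi_hat is an estimate of the population case proportion.
    Cases get pi_hat / (sample case fraction); controls get
    (1 - pi_hat) / (sample control fraction).
    """
    y = np.asarray(y, dtype=float)
    frac_cases = y.mean()
    return np.where(y == 1, pi_hat / frac_cases,
                    (1.0 - pi_hat) / (1.0 - frac_cases))


def fit_weighted_logistic(X, y, w, n_iter=50, tol=1e-10):
    """Minimize the weighted logistic loss by Newton's method.

    X must already contain an intercept column if one is wanted.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (p - y))                      # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted Hessian
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

With `w = case_control_weights(y, pi_hat)`, the weighted sample mimics the case/control mix of the population, so the minimizer of the weighted loss targets the population risk minimizer even when the logistic model is mis-specified. Passing `w = np.ones(len(y))` gives the ordinary unweighted fit for comparison.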
