Abstract:
Biased sampling is a pervasive problem across disciplines, including econometrics, epidemiology, medicine, survey research, and, more recently, machine learning and artificial intelligence (AI). It arises when the mechanism that selects data points for analysis introduces systematic bias, which can compromise the accuracy and reliability of research outcomes. In this paper, we provide a comprehensive overview of the foundational concepts of biased sampling problems and the associated methods of inference. We also connect biased sampling to the more recent discussion of distribution shift in machine learning, and we review recent advances in biased sampling, particularly in the context of transfer learning and conformal inference for predictive confidence intervals. Our goal is to present this material in a manner accessible to graduate students, enabling them to identify applications of biased sampling problems in their own research. With deep respect and gratitude, we dedicate this paper to the memory of the late Professor Shisong Mao, whose guidance and wisdom have been invaluable throughout the years.