Citation: ZHAO Xiao, LI Muxi, ZHANG Yaowu, ZHU Liping. Improving Naïve Bayes through Rosenblatt Transformations[J]. Chinese Journal of Applied Probability and Statistics, 2024, 40(6): 975-987. DOI: 10.12460/j.issn.1001-4268.aps.2024.2022138

Improving Naïve Bayes through Rosenblatt Transformations

    Abstract: The Naïve Bayes classifier is a simple, practical classification algorithm that is used in many fields. Its conditional independence assumption improves prediction efficiency but sacrifices some prediction accuracy, which has motivated many improved variants. In this paper, we first apply the Rosenblatt transformation to obtain mutually independent covariates and then build a Naïve Bayes classifier from the transformed covariates and the class variable. The mutual independence among the transformed features significantly improves the performance of the resulting classifier. To stabilize and simplify the Rosenblatt transformation, we further propose using the PC algorithm to identify conditional independence structures among the covariates. We call the resulting classification model the Rosenblatt-Naïve Bayes model. This model accounts for the dependence among features while retaining the simple structure of Naïve Bayes. Extensive simulation studies and three real data analyses show that it outperforms Naïve Bayes in both prediction accuracy and model robustness.
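The pipeline described in the abstract (transform the covariates so that they are mutually independent, then fit Naïve Bayes) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: the covariates are treated as jointly Gaussian, so the Rosenblatt transformation reduces to Cholesky whitening followed by the standard normal CDF; a single pooled transformation is fitted to all classes; the PC-algorithm simplification is omitted; and every function name is hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # elementwise error function without SciPy

def fit_gaussian_rosenblatt(X):
    """Estimate the mean and the lower Cholesky factor of the covariance.
    Assumption: the covariates are treated as jointly Gaussian."""
    mu = X.mean(axis=0)
    L = np.linalg.cholesky(np.cov(X, rowvar=False))
    return mu, L

def rosenblatt_transform(X, mu, L):
    """Under the Gaussian assumption, Z[:, k] is the standardized residual of
    X_k given X_1..X_{k-1}, and U = Phi(Z) is the Rosenblatt transform."""
    Z = np.linalg.solve(L, (X - mu).T).T
    U = 0.5 * (1.0 + _erf(Z / math.sqrt(2.0)))
    return Z, U

def fit_gaussian_nb(Z, y):
    """Gaussian Naive Bayes: per-class priors, feature means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    vars_ = np.array([Z[y == c].var(axis=0) + 1e-9 for c in classes])
    return classes, priors, means, vars_

def predict_gaussian_nb(model, Z):
    """Pick the class maximizing log prior plus summed per-feature log densities."""
    classes, priors, means, vars_ = model
    diff = Z[:, None, :] - means[None, :, :]            # shape (n, classes, d)
    log_lik = -0.5 * (np.log(2 * np.pi * vars_)[None]
                      + diff ** 2 / vars_[None]).sum(axis=2)
    return classes[np.argmax(np.log(priors)[None] + log_lik, axis=1)]

# Synthetic two-class data with strongly correlated covariates (illustration only).
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X = np.vstack([rng.multivariate_normal([0.0, 0.0], cov, 1000),
               rng.multivariate_normal([3.0, 3.0], cov, 1000)])
y = np.repeat([0, 1], 1000)

mu, L = fit_gaussian_rosenblatt(X)
Z, U = rosenblatt_transform(X, mu, L)
model = fit_gaussian_nb(Z, y)
acc = np.mean(predict_gaussian_nb(model, Z) == y)  # training accuracy
```

In this sketch the classifier is fitted on `Z`, the normal-scale version of the transformed covariates, because a Gaussian Naïve Bayes model is easiest to write down there; `U` is returned only to show the uniform-scale output of the transformation. The pooled data are a mixture rather than a single Gaussian, so `U` is only approximately uniform here, and a nonparametric estimate of the conditional distributions would be needed to drop the Gaussian assumption.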

     
