Variable Selection with Copula Entropy

MA Jian

doi:10.3969/j.issn.1001-4268.2021.04.006

Abstract

Variable selection is important for classification and regression tasks in machine learning and statistical applications where both predictability and explainability are required. This paper proposes a variable selection method based on copula entropy (CE), which selects variables according to the rank order of their CE values. The method is both model-free and tuning-free. The proposed method was compared with traditional variable selection methods, including distance correlation, the Hilbert-Schmidt independence criterion, stepwise selection, regularized generalized linear models, and adaptive LASSO, on the UCI heart disease data. The experimental results show that the CE-based method selects the 'right' variables more effectively and yields more interpretable models than the traditional methods, without sacrificing accuracy. It is believed that CE-based variable selection can help to build more explainable models.
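The idea of ranking variables by their CE values can be illustrated with a small sketch. It is not the paper's implementation. It relies on the known identity that negative copula entropy equals mutual information, and it assumes a simple binned plug-in estimate on rank-transformed data in place of whatever estimator the paper uses. The function names (`to_ranks`, `negative_copula_entropy`, `rank_variables`), the bin count, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random


def to_ranks(values):
    """Map a sample to normalized ranks in (0, 1): the empirical copula margin."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = (position + 0.5) / len(values)
    return ranks


def negative_copula_entropy(x, y, bins=4):
    """Binned plug-in estimate of -H_c(x, y) (assumption: not the paper's estimator)."""
    n = len(x)
    u, v = to_ranks(x), to_ranks(y)
    counts = {}
    for a, b in zip(u, v):
        cell = (int(a * bins), int(b * bins))
        counts[cell] = counts.get(cell, 0) + 1
    # With (near-)uniform margins, the copula density in a cell is about p * bins**2.
    return sum((c / n) * math.log((c / n) * bins * bins) for c in counts.values())


def rank_variables(columns, response, bins=4):
    """Score each candidate variable against the response and sort, strongest first."""
    scores = {
        name: negative_copula_entropy(column, response, bins)
        for name, column in columns.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic data (hypothetical): the response depends nonlinearly on `signal` only.
random.seed(0)
n = 400
signal = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]
response = [s ** 2 + 0.1 * random.gauss(0, 1) for s in signal]

ranking = rank_variables({"signal": signal, "noise": noise}, response)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because the score is computed on ranks only, it is unchanged by monotone transformations of each variable and it can pick up the quadratic dependence above, which a linear correlation would miss. Ties are broken arbitrarily here, so discrete variables such as those in the heart disease data would need more careful handling.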
