A Distributed Algorithm for Lasso Variable Selection

ZENG Weijia (曾维佳); ZHANG Riquan (张日权)

doi:10.3969/j.issn.1001-4268.2022.01.007


Abstract

Lasso is a variable selection method widely used in machine learning and is well suited to sparse regression problems. When the sample size is very large, or when massive amounts of data are stored on different machines, distributed computing is an important way to reduce computing time and improve efficiency. Starting from an equivalent optimization formulation of the Lasso model, this paper applies the ADMM algorithm to that formulation, whose optimization variables are separable, and constructs a distributed algorithm for Lasso variable selection. The convergence of the algorithm is proved. Numerical experiments compare the proposed distributed algorithm with cyclic coordinate descent and with the ADMM algorithm. The results show that, for sparse regression problems with large sample sets, the proposed algorithm has both a shorter computing time and a smaller error than the other two algorithms.
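The abstract does not spell out the equivalent optimization model or the update rules, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the standard consensus form of ADMM for Lasso, in which the samples are split row-wise into blocks (A_i, b_i), each worker keeps a local copy x_i of the coefficients, and a shared variable z carries the l1 penalty. The function names, the penalty parameter `rho`, and the fixed iteration count are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(v, kappa):
    """Elementwise soft-thresholding, the proximal operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def consensus_admm_lasso(blocks, lam, rho=1.0, n_iter=500):
    """Sketch of consensus ADMM for
        minimize  sum_i 0.5 * ||A_i x - b_i||^2 + lam * ||x||_1
    where `blocks` is a list of (A_i, b_i) pairs, one per (hypothetical) worker.
    """
    n_workers = len(blocks)
    p = blocks[0][0].shape[1]

    # Quantities each worker can precompute from its own data only.
    gram = [A.T @ A + rho * np.eye(p) for A, _ in blocks]
    Atb = [A.T @ b for A, b in blocks]

    x = np.zeros((n_workers, p))  # local coefficient copies
    u = np.zeros((n_workers, p))  # scaled dual variables
    z = np.zeros(p)               # shared (consensus) coefficients

    for _ in range(n_iter):
        # Local step: each worker solves a small ridge-like system.
        # In a real deployment these solves would run in parallel.
        for i in range(n_workers):
            x[i] = np.linalg.solve(gram[i], Atb[i] + rho * (z - u[i]))

        # Global step: average the local results, then soft-threshold.
        z = soft_threshold(x.mean(axis=0) + u.mean(axis=0),
                           lam / (rho * n_workers))

        # Dual step: accumulate each worker's disagreement with z.
        u += x - z

    return z
```

Only the p-dimensional vectors x_i + u_i need to be communicated in each iteration, which is why a splitting of this kind suits data stored on different machines. A production version would replace the fixed iteration count with a stopping rule based on primal and dual residuals.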
