基于Lipschitz条件改进的分布式CM稳健估计方法
Improved Robust CM Estimation Method for Distributed Data under Lipschitz Condition
-
摘要: 本文在中心化的分布式系统框架下, 针对同质数据集和异质数据集的“拜占庭失败”问题分别提出了改进的稳健估计方法. 首先使用Lipschitz条件对经典的按维度取中位数(CM)方法进行改进, 提出适用于同质数据集下的Lipschitz-CM方法, 并证明了其收敛速度形式; 数值实验和实际数据分析结果均表明, Lipschitz-CM方法相比于已有的几何中位数方法和按维度取中位数方法具有更好的稳健性和有效性. 针对异质数据集的情况, 本文使用分桶操作对Lipschitz-CM方法加以改进, 提出分桶Lipschitz-CM 方法, 并证明了分桶Lipschitz-CM方法与Lipschitz-CM 方法具有相同的收敛速度, 但并不会带来额外的计算复杂度; 数值实验和实际数据分析结果表明, 分桶Lipschitz-CM方法在异质数据集上具有更好的表现效果.Abstract: In this paper, we propose improved robust estimated methods for the "Byzantine failure" problem in homogeneous and heterogeneous datasets under the framework of centralized distributed system. Firstly, we use the Lipschitz condition to improve the classic coordinate-wise median method, and propose the Lipschitz-CM method, which is suitable for homogeneous datasets and prove its convergence rate form. The results of numerical experiments of simulated dataset and real dataset also show that the Lipschitz-CM method has better robustness and effectiveness compared with the existing geometric median method and coordinate-wise median method. For the case of heterogeneous datasets, we improve the Lipschitz-CM method by bucketing and propose the Bucketing Lipschitz-CM method, we prove that the convergence rate of the Bucketing Lipschitz-CM method has the same form as the Lipschitz-CM method, which means that bucketing does not bring additional computational complexity. Through the numerical experiments and real dataset, it is verified that the Bucketing Lipschitz-CM method has a better performance on the heterogeneous datasets.