伟德BETVlCTOR1946 Academic Event Series
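Statistics Series Seminar No. 82
Topic: Communication-Efficient Distributed Linear Discriminant Analysis for Binary Classification
Speaker: Prof. Junlong Zhao (赵俊龙), Beijing Normal University
Host: Yunlu Jiang (姜云卢)
Time: Friday, April 2, 2021, 10:00-11:00 a.m.
Platform: Tencent Meeting (ID: 442 734 602; password: 123456)
Abstract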
Large-scale data are common when the sample size n is large, and these data are often stored on k different local machines. Distributed statistical learning is an efficient way to deal with such data. In this study, we consider the binary classification problem for massive data based on linear discriminant analysis (LDA) in a distributed learning framework. The classical centralized LDA requires the transmission of some p-by-p summary matrices to the hub, where p is the dimension of the variates under consideration. This can be a burden when p is large or when communication between the nodes is expensive. We consider two distributed LDA estimators, a two-round estimator and a one-shot estimator, which are communication-efficient because they do not transmit p-by-p matrices. We study the asymptotic relative efficiency of the distributed LDA estimators compared to the centralized LDA using random matrix theory under different settings of k. It is shown that when k is in a suitable range, such as k = o(n/p), these two distributed estimators achieve the same efficiency as the centralized estimator under mild conditions. Moreover, the two-round estimator can relax the restriction on k, allowing kp/n → c ∈ [0, 1) under some conditions. Simulations confirm the theoretical results.
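To make the communication pattern in the abstract concrete, the following is a minimal sketch of one plausible one-shot scheme: each machine computes its own LDA direction and sends only that p-vector to the hub, which averages the k vectors. The abstract does not spell out the estimators, so the averaging rule, the function names (`local_lda_direction`, `one_shot_lda`, `centralized_lda`, `simulate`), and the simulation settings are all assumptions for illustration and may differ from the speaker's definitions. The two-round estimator is not sketched because the abstract gives no details of it.

```python
import numpy as np


def local_lda_direction(X, y):
    """LDA direction from one data set: pooled_cov^{-1} (mean_1 - mean_0)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance: a p-by-p matrix that never leaves the machine.
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (len(X) - 2)
    return np.linalg.solve(S, mu1 - mu0)


def one_shot_lda(shards):
    """Assumed one-shot rule: the hub averages the k local p-vectors."""
    return np.mean([local_lda_direction(X, y) for X, y in shards], axis=0)


def centralized_lda(shards):
    """Baseline: pool all data in one place, then fit a single LDA direction."""
    X = np.vstack([X for X, _ in shards])
    y = np.concatenate([y for _, y in shards])
    return local_lda_direction(X, y)


def simulate(n=20000, p=10, k=20, seed=0):
    """Hypothetical setup: two Gaussian classes with identity covariance."""
    rng = np.random.default_rng(seed)
    delta = np.zeros(p)
    delta[0] = 2.0  # the classes differ only in the first coordinate
    y = rng.integers(0, 2, size=n)
    X = rng.standard_normal((n, p)) + np.outer(y, delta)
    # Split the n samples across k machines.
    shards = list(zip(np.array_split(X, k), np.array_split(y, k)))
    return one_shot_lda(shards), centralized_lda(shards), delta
```

In this sketch each machine transmits p numbers rather than a p-by-p matrix, which is the saving the abstract refers to. Calling `simulate()` returns the averaged direction, the pooled direction, and the true mean difference, so the two estimates can be compared, for example by their cosine similarity.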
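★ About the Speaker ★
Junlong Zhao (赵俊龙) is a professor and doctoral supervisor at the School of Statistics, Beijing Normal University, where he chairs the Department of Applied Statistics. His main research interests include high-dimensional data analysis, robust statistics, and statistical machine learning. He has held visiting positions at the National University of Singapore, Hong Kong Baptist University, and the University of North Carolina at Chapel Hill. He has published more than 40 SCI-indexed papers in statistics journals, including the Journal of the Royal Statistical Society: Series B (JRSSB), The Annals of Statistics (AOS), the Journal of the American Statistical Association (JASA), and Biometrika. He has served as principal investigator on three National Natural Science Foundation of China projects.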