Seminar

Optimal Algorithms for Multiplayer Multi-Armed Bandits

LecturerMr. Po-An Wang (KTH Royal Institute of Technology)
Host: Chi-Jen Lu
Time2019-11-22 (Fri.) 14:00 ~ 17:00
LocationAuditorium106 at IIS new Building

Abstract

The talk addresses various Multiplayer Multi-Armed Bandit (MMAB) problems, where M decision-makers, or players, collaborate to maximize their cumulative reward. We first investigate the MMAB problem where players selecting the same arms experience a collision (and are aware of it) and do not collect any reward. For this problem, we present DPE1 (Decentralized Parsimonious Exploration), a decentralized algorithm that achieves the same asymptotic regret as that obtained by an optimal centralized algorithm. DPE1 is simpler than the state-of-the-art algorithm SIC-MMAB, and yet offers better performance guarantees. We then consider the MMAB problem without collision, where players may select the same arm. Players sit on vertices of a graph, and in each round, they are able to send a message to their neighbors in the graph.

We present DPE2, a simple and asymptotically optimal algorithm that outperforms the state-of-the-art algorithm DD-UCB. Besides, under DPE2, the expected number of bits transmitted by the players in the graph is finite.

BIO

Po-An Wang received his mathematic master's degree at National Tsing-Hua University. Since 2016, he worked in the Institute of Information Science, Academia Sinica as a research assistant. Currently, he is a Ph.D. student at KTH Royal Institute of Technology. His research interests involve reinforcement learning, tensor decomposition, and stochastic learning theory.

Events

Seminar

Abstract

BIO