學術演講

Optimal Algorithms for Multiplayer Multi-Armed Bandits

講者王柏安先生 (KTH Royal Institute of Technology)
邀請人：呂及人
時間2019-11-22 (Fri.) 14:00 ~ 17:00
地點資訊所新館106演講廳

摘要

The talk addresses various Multiplayer Multi-Armed Bandit (MMAB) problems, where M decision-makers, or players, collaborate to maximize their cumulative reward. We first investigate the MMAB problem where players selecting the same arms experience a collision (and are aware of it) and do not collect any reward. For this problem, we present DPE1 (Decentralized Parsimonious Exploration), a decentralized algorithm that achieves the same asymptotic regret as that obtained by an optimal centralized algorithm. DPE1 is simpler than the state-of-the-art algorithm SIC-MMAB, and yet offers better performance guarantees. We then consider the MMAB problem without collision, where players may select the same arm. Players sit on vertices of a graph, and in each round, they are able to send a message to their neighbors in the graph.

We present DPE2, a simple and asymptotically optimal algorithm that outperforms the state-of-the-art algorithm DD-UCB. Besides, under DPE2, the expected number of bits transmitted by the players in the graph is finite.

BIO

Po-An Wang received his mathematic master's degree at National Tsing-Hua University. Since 2016, he worked in the Institute of Information Science, Academia Sinica as a research assistant. Currently, he is a Ph.D. student at KTH Royal Institute of Technology. His research interests involve reinforcement learning, tensor decomposition, and stochastic learning theory.

中央研究院資訊科學研究所

活動訊息

學術演講

Optimal Algorithms for Multiplayer Multi-Armed Bandits

摘要

BIO