您的瀏覽器不支援JavaScript語法,網站的部份功能在JavaScript沒有啟用的狀態下無法正常使用。

中央研究院 資訊科學研究所

活動訊息

友善列印

列印可使用瀏覽器提供的(Ctrl+P)功能

學術演講

:::

Optimal Algorithms for Multiplayer Multi-Armed Bandits

  • 講者王柏安 先生 (KTH Royal Institute of Technology)
    邀請人:呂及人
  • 時間2019-11-22 (Fri.) 14:00 ~ 17:00
  • 地點資訊所新館106演講廳
摘要

The talk addresses various Multiplayer Multi-Armed Bandit (MMAB) problems, where M decision-makers, or players, collaborate to maximize their cumulative reward. We first investigate the MMAB problem where players selecting the same arms experience a collision (and are aware of it) and do not collect any reward. For this problem, we present DPE1 (Decentralized Parsimonious Exploration), a decentralized algorithm that achieves the same asymptotic regret as that obtained by an optimal centralized algorithm. DPE1 is simpler than the state-of-the-art algorithm SIC-MMAB, and yet offers better performance guarantees. We then consider the MMAB problem without collision, where players may select the same arm. Players sit on vertices of a graph, and in each round, they are able to send a message to their neighbors in the graph.

We present DPE2, a simple and asymptotically optimal algorithm that outperforms the state-of-the-art algorithm DD-UCB. Besides, under DPE2, the expected number of bits transmitted by the players in the graph is finite.

 

BIO

Po-An Wang received his mathematic master's degree at National Tsing-Hua University. Since 2016, he worked in the Institute of Information Science, Academia Sinica as a research assistant. Currently, he is a Ph.D. student at KTH Royal Institute of Technology. His research interests involve reinforcement learning, tensor decomposition, and stochastic learning theory.