
Journal of Information Science and Engineering, Vol. 30 No. 3, pp. 727-747 (May 2014)


Strategic Choices in Optimization


CHENG-WEI CHOU1, PING-CHIANG CHOU, JEAN-JOSEPH CHRISTOPHE4,5, ADRIEN COUETOUX4,5, PIERRE DE FREMINVILLE5, NICOLAS GALICHET4, CHANG-SHING LEE2, JIALIN LIU4, DAVID L. SAINT-PIERRE3, MICHELE SEBAG4, OLIVIER TEYTAUD4, MEI-HUI WANG2, LI-WEN WU2 AND SHI-JIM YEN1
1 Department of Computer Science and Information Engineering, National Dong Hwa University, Hualien 974, Taiwan
2 Department of Computer Science and Information Engineering, National University of Tainan, Tainan 700, Taiwan
3 Montefiore Institute, Université de Liège, Belgium
4 TAO, Inria-CNRS-LRI, Univ. Paris-Sud
5 Artelys, 12 Bd 4 Septembre, Paris

Many decision problems have two levels: one for strategic decisions and another for tactical management. This paper focuses on the strategic level, more specifically on the sequential exploration of the possible options and the final selection (recommendation) of the best option. Several sequential exploration and recommendation criteria are considered and empirically compared on real-world problems (board games, card games and energy management problems) in the uniform (1-player) and adversarial (2-player) settings. For the sequential exploration part, we consider the classical upper confidence bound algorithm, the exponential exploration-exploitation algorithm, the successive reject algorithm (designed specifically for simple regret), and Bernstein races. For the recommendation part, the selection is based on the empirically best arm, the most played arm, or lower confidence bounds on the reward distribution, including variants thereof designed for risk control. This paper presents a systematic study comparing the couplings of the sequential exploration and recommendation variants on the considered problems in terms of their simple regret. A secondary contribution is that, to the best of our knowledge, this includes the first ever win of a computer kill-all Go player against professional human players [16].
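The exploration/recommendation split described above can be illustrated with a minimal sketch: a UCB1-style exploration loop over Bernoulli arms, followed by the two simplest recommendation rules mentioned in the abstract (empirically best arm and most played arm), scored by simple regret. This is not the paper's experimental code; the arm means, budget, and seed are illustrative choices.

```python
import math
import random

def ucb1_explore(means, budget, seed=0):
    """Explore Bernoulli arms with UCB1, then recommend an arm.

    Returns (empirically_best_arm, most_played_arm) after `budget` pulls.
    `means` are the (hypothetical) true success probabilities of the arms.
    """
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n        # number of pulls per arm
    sums = [0.0] * n        # cumulative reward per arm
    # Initialize: pull each arm once.
    for a in range(n):
        counts[a] += 1
        sums[a] += 1.0 if rng.random() < means[a] else 0.0
    # UCB1 index: empirical mean + sqrt(2 ln t / n_a).
    for t in range(n, budget):
        ucb = [sums[a] / counts[a]
               + math.sqrt(2.0 * math.log(t + 1) / counts[a])
               for a in range(n)]
        a = max(range(n), key=lambda i: ucb[i])
        counts[a] += 1
        sums[a] += 1.0 if rng.random() < means[a] else 0.0
    emp_best = max(range(n), key=lambda a: sums[a] / counts[a])
    most_played = max(range(n), key=lambda a: counts[a])
    return emp_best, most_played

if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]
    eb, mp = ucb1_explore(means, budget=2000)
    # Simple regret = best true mean minus the recommended arm's true mean;
    # it depends only on the final recommendation, not on exploration cost.
    print("empirically best:", eb, "most played:", mp,
          "simple regret:", max(means) - means[eb])
```

With a generous budget the two recommendation rules agree; the comparisons in the paper concern small budgets, where the choice of exploration criterion and recommendation rule matters.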

Keywords: multi-armed bandit problems, metagaming, strategic choices, investment optimization, upper confidence bounds, simple regret, games

Full Text: Retrieve PDF document (201405_12.pdf)

Received February 28, 2013; accepted June 15, 2013.
Communicated by Hung-Yu Kao, Tzung-Pei Hong, Takahira Yamaguchi, Yau-Hwang Kuo, and Vincent Shin-Mu Tseng.