Zhen Luo, Qi-Xin Cao and Yan-Zheng Zhao
Research Institute of Robotics
Shanghai Jiaotong University
Shanghai 200240, P.R. China
The proposed self-adaptive predictive pursuing policy consists of an action decision-making procedure and a procedure for adjusting the estimate of the evader's action preference. Because a correct estimate of the opponent's intention helps in winning adversarial games, the policy introduces the concept of action preference to model the opponent's decision-making. Since the evader often has different action preferences in different situations, the pursuer divides the situation space into many categories and maintains a set of estimates of the evader's action preference for each category. The pursuer adjusts the estimate for a given situation by observing the evader's actions. The action decision-making procedure consists of situation sorting, computation of possible future states, payoff evaluation, and action selection. Action decisions are based on a decision tree constructed from expected payoffs. Expected payoffs are integrated from single payoffs, which are evaluated from the gains of features that reflect the adversarial situation. A simulation of middle-size soccer robots was carried out and showed that the proposed policy is effective.
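The two procedures described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class `PreferenceModel`, the functions `expected_payoff` and `select_action`, the learning rate `rate`, and the exponential-smoothing update are all assumptions chosen for illustration, and the `single_payoff` callable stands in for the feature-gain evaluation, which the abstract does not specify. The sketch keeps one probability table per situation category, shifts it toward each observed evader action, and picks the pursuer action with the highest expected payoff under the current estimate.

```python
from collections import defaultdict


class PreferenceModel:
    """Hypothetical per-situation estimate of the evader's action preference."""

    def __init__(self, actions, rate=0.2):
        self.actions = list(actions)
        self.rate = rate  # assumed learning rate, not taken from the paper
        uniform = 1.0 / len(self.actions)
        # One probability table per situation category, starting uniform.
        self.pref = defaultdict(lambda: {a: uniform for a in self.actions})

    def update(self, situation, observed_action):
        """Shift the estimate for this situation toward the observed action."""
        table = self.pref[situation]
        for a in self.actions:
            target = 1.0 if a == observed_action else 0.0
            # Exponential smoothing keeps the table summing to 1.
            table[a] += self.rate * (target - table[a])


def expected_payoff(model, situation, pursuer_action, single_payoff):
    """Weight each single payoff by the estimated evader preference."""
    table = model.pref[situation]
    return sum(p * single_payoff(pursuer_action, a) for a, p in table.items())


def select_action(model, situation, pursuer_actions, single_payoff):
    """Choose the pursuer action with the highest expected payoff."""
    return max(
        pursuer_actions,
        key=lambda u: expected_payoff(model, situation, u, single_payoff),
    )
```

For example, with evader actions `"left"` and `"right"` and a placeholder payoff that rewards the pursuer for matching the evader's move, a single observation of `"left"` in a situation category raises the estimated preference for `"left"` there, and `select_action` then returns `"left"` for that category. A fuller version would expand this one-step choice into the decision tree over possible future states that the abstract mentions.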
Received November 24, 2006; revised March 1, 2007; accepted June 1, 2007.
Communicated by Takeshi Tokuyama.