
Journal of Information Science and Engineering, Vol. 30 No. 6, pp. 1807-1823 (November 2014)


A Fast Learning Agent Based on the Dyna Architecture*


YUAN-PAO HSU1 AND WEI-CHENG JIANG2
1Department of Computer Science and Information Engineering
National Formosa University
Yunlin, 632 Taiwan
2Department of Electrical Engineering
National Chung Cheng University
Minhsiung, 621 Taiwan

In this paper, we present a fast learning algorithm called Dyna-QPC. The proposed algorithm requires considerably less training time than Q-learning and the table-based Dyna-Q algorithm, making it applicable to real-world control tasks. Dyna-QPC combines existing learning techniques: CMAC, Q-learning, and prioritized sweeping. In a practical experiment, the Dyna-QPC algorithm is implemented with the goal of minimizing the learning time required for a robot to navigate a discrete state space containing obstacles. The robot learning agent uses Q-learning for policy learning and a CMAC model as an approximator of the system environment. The prioritized sweeping technique manages a queue of previously influential state-action pairs used by a planning function. The planning function runs as a background task, updating the learning policy from previous experience stored in the approximation model. Because background tasks run during CPU idle time, they place no additional load on the system processor. The Dyna-QPC agent switches seamlessly between real and virtual modes to achieve rapid policy learning. A simulated and an experimental scenario have been designed and implemented: the simulated scenario tests the speed and efficiency of the three learning algorithms, while the experimental scenario evaluates the new Dyna-QPC agent. Results from both scenarios demonstrate the superior performance of the proposed learning agent.
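The Dyna-style loop the abstract describes (learning from real experience, recording it in a model, and replaying the most influential state-action pairs during planning) can be sketched as follows. This is a minimal tabular illustration, not the paper's implementation: the CMAC function approximator is replaced by a plain dictionary model, and all class, method, and parameter names are hypothetical.

```python
import heapq

class PrioritizedDynaQ:
    """Sketch of tabular Dyna-Q with prioritized sweeping.

    Real experience updates a learned model; a max-priority queue
    (implemented with negated priorities on Python's min-heap) decides
    which remembered (state, action) pairs are replayed first during
    the background planning step.
    """

    def __init__(self, actions, alpha=0.5, gamma=0.95, theta=1e-4, n_planning=10):
        self.q = {}             # Q-values: (state, action) -> value
        self.model = {}         # learned model: (state, action) -> (reward, next_state)
        self.predecessors = {}  # next_state -> set of (state, action) pairs leading into it
        self.queue = []         # heap entries: (-priority, (state, action))
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.theta, self.n_planning = theta, n_planning

    def _max_q(self, s):
        return max(self.q.get((s, a), 0.0) for a in self.actions)

    def learn(self, s, a, r, s2):
        # Record a real transition, queue it if its TD error is significant,
        # then run the planning function (a background task in the paper).
        self.model[(s, a)] = (r, s2)
        self.predecessors.setdefault(s2, set()).add((s, a))
        p = abs(r + self.gamma * self._max_q(s2) - self.q.get((s, a), 0.0))
        if p > self.theta:
            heapq.heappush(self.queue, (-p, (s, a)))
        self.plan()

    def plan(self):
        # Replay the most influential pairs first, propagating value
        # changes backward to their predecessors.
        for _ in range(self.n_planning):
            if not self.queue:
                break
            _, (s, a) = heapq.heappop(self.queue)
            r, s2 = self.model[(s, a)]
            qsa = self.q.get((s, a), 0.0)
            self.q[(s, a)] = qsa + self.alpha * (r + self.gamma * self._max_q(s2) - qsa)
            # Predecessors of s may now be inaccurate; re-queue them.
            for (ps, pa) in self.predecessors.get(s, ()):
                pr, _ = self.model[(ps, pa)]
                p = abs(pr + self.gamma * self._max_q(s) - self.q.get((ps, pa), 0.0))
                if p > self.theta:
                    heapq.heappush(self.queue, (-p, (ps, pa)))

# Usage on a toy 4-state corridor (goal at state 3, reward 1 on arrival):
agent = PrioritizedDynaQ(actions=[0, 1])
for _ in range(20):
    for s in range(3):
        agent.learn(s, 1, 1.0 if s + 1 == 3 else 0.0, s + 1)
```

Because planning replays the highest-priority pair first, the reward at the goal propagates back to earlier states within a handful of sweeps, which is the effect behind the shortened training times claimed in the abstract.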

Keywords: reinforcement learning, Q-learning, CMAC, prioritized sweeping, dyna agent

Full Text: 201411_08.pdf

Received January 9, 2013; revised March 14 & April 28, 2013; accepted May 22, 2013.
Communicated by Zhi-Hua Zhou.
* Parts of this article were presented at the 2008 Conference on Information Technology and Applications in Outlying Islands and the 2008 SICE Annual Conference, and appear in the respective proceedings.