Institute of Information Science, Academia Sinica




Rethinking Policy Improvement in Reinforcement Learning



  • Lecturer: Prof. Ping-Chun Hsieh (Department of Computer Science, National Yang Ming Chiao Tung University)
    Host: TIGP (SNHCC)
  • Time: 2022-03-21 (Mon.) 14:00 – 16:00
  • Location: Auditorium 106 at IIS New Building

Policy improvement is a central component of any reinforcement learning (RL) algorithm, and the most widely used approach is to leverage the policy gradient (PG) theorem to iteratively improve the learned policy. Despite the success of PG, it can suffer from inefficient training in various settings. In this talk, I will go beyond PG and introduce two new policy improvement frameworks:
(i) First, I will introduce the action-constrained RL problem and discuss the critical “zero-gradient issue” resulting from PG. Then, I will present Frank-Wolfe policy optimization, which is a decoupling framework that completely resolves the challenging zero-gradient issue.
(ii) Next, I will present Hinge policy optimization (HPO), which rethinks policy updates as solving a large-margin classification problem with hinge loss. The HPO framework opens up a whole new family of RL algorithms, including PPO with a clipped surrogate objective (PPO-clip) as a special case. Moreover, we formally prove that HPO attains a globally optimal policy. To our knowledge, this is the first global convergence guarantee for the PPO-clip algorithm.
Finally, I will present experimental results that corroborate the effectiveness of the two frameworks.
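
As a rough illustration of the hinge-loss view in (ii), the short Python sketch below checks numerically that the per-sample PPO-clip surrogate equals a constant minus a hinge loss weighted by the absolute advantage. It is not taken from the talk: the function names, the clipping range eps = 0.2, and the sample values are assumptions chosen for illustration, and the identity shown is elementary algebra on the standard PPO-clip objective rather than the speaker's full HPO framework.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample PPO-clip surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)


def weighted_hinge_loss(ratio, advantage, eps=0.2):
    """Hinge loss with margin eps, weighted by |A| (illustrative name):
    |A| * max(0, eps - sign(A) * (r - 1))."""
    return abs(advantage) * max(0.0, eps - sign(advantage) * (ratio - 1.0))


def objective_via_hinge(ratio, advantage, eps=0.2):
    """PPO-clip surrogate rewritten as a ratio-independent constant minus the hinge loss."""
    constant = advantage * (1.0 + sign(advantage) * eps)
    return constant - weighted_hinge_loss(ratio, advantage, eps)


if __name__ == "__main__":
    # Compare the two forms on a small grid of probability ratios and advantages.
    for advantage in (-2.0, 0.0, 1.5):
        for ratio in (0.5, 0.9, 1.0, 1.1, 1.6):
            lhs = ppo_clip_objective(ratio, advantage)
            rhs = objective_via_hinge(ratio, advantage)
            assert abs(lhs - rhs) < 1e-12
```

Because the constant term does not depend on the probability ratio, maximizing the clipped surrogate and minimizing the weighted hinge loss lead to the same policy update in this sketch. The sign of the advantage plays the role of a class label and eps plays the role of the margin.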


Ping-Chun Hsieh is currently an assistant professor in the Department of Computer Science at National Yang Ming Chiao Tung University (NYCU). He received his B.S. and M.S. in Electrical Engineering from National Taiwan University in 2011 and 2013, respectively, and his Ph.D. in Electrical and Computer Engineering from Texas A&M University (TAMU) in 2018. His research interests include reinforcement learning, multi-armed bandits, and wireless networks. His research received Best Paper Awards at ACM MobiHoc 2020 and ACM MobiHoc 2017. He is a recipient of the Junior Faculty Award from NYCU in 2020, the Young Scholar Fellowship from the Ministry of Science and Technology in 2019, the Outstanding PhD Student Award from the ECE Department at TAMU in 2016, and the Government Scholarship to Study Abroad from the Ministry of Education, Taiwan.