Abstract: Reinforcement learning (RL) is a machine learning technique for solving sequential decision problems, and it has achieved great success in many areas. I will start with a brief overview of RL and related methods and then focus on the convergence of two policy optimization methods for RL. More precisely, the finite iteration convergence of the basic projected policy gradient method and the linear convergence of the Hadamard policy gradient method will be reported.
Bio: WEI Ke is now a professor at the School of Data Science, Fudan University. He obtained his PhD from the Mathematical Institute, University of Oxford. Before joining Fudan, he was a postdoctoral research scholar at HKUST and UC Davis for three years. His current research interests include high-dimensional structured signal processing, reinforcement learning, and numerical optimization.

