Speaker
Sajad Khodadadian
Abstract
Stochastic approximation (SA) is an iterative algorithm to find the fixed point of an operator given noisy samples of this operator. SA appears in many areas such as optimization and Reinforcement Learning (RL). When implemented in practice, the noise that appears in the update of RL algorithms is naturally Markovian. Furthermore, in some settings, such as gradient temporal difference (GTD), SA is employed in a two-time-scale manner. The mix of Markovian noise along with the two-time-scale structure results in an algorithm which is complex to analyze theoretically. In this talk, we characterize a tight convergence bound for the iterations of linear two-time-scale SA with Markovian noise. Our results show the convergence behavior of this algorithm given various choices of step sizes.
Applying our result to the well-known temporal difference with gradient correction (TDC) algorithm, we show the first O(1/ϵ) sample complexity for the convergence of this algorithm, outperforming all the previous work. Similarly, our results can be applied to establish the convergence behavior of a variety of RL algorithms, such as TD-learning with Polyak-Ruppert averaging, GTD, and GTD2.
Bio
Sajad Khodadadian is an Assistant Professor in the Grado Department of Industrial and Systems Engineering in Virginia Tech. Dr. Khodadadian’s research is on design and analysis of Reinforcement Learning algorithms. He received his PhD in Operations Research from School of Industrial and Systems Engineering in Georgia Institute of Technology.