Multifidelity Reinforcement Learning With Gaussian Processes
Model-Based and Model-Free Algorithms

By Varun Suryan, Nahush Gondhalekar, and Pratap Tokekar

We study the problem of reinforcement learning (RL) using as few real-world samples as possible. A naive application of RL can be inefficient in large and continuous state spaces. We present two versions of multifidelity RL (MFRL), model based and model free, that leverage Gaussian processes (GPs) to learn the optimal policy in a real-world environment. In the MFRL framework, an agent uses multiple simulators of the real environment to perform actions. With increasing fidelity in a simulator chain, the number of samples used in successively higher-fidelity simulators can be reduced. By incorporating GPs in the MFRL framework, we empirically observe up to a 40% reduction in the number of samples for model-based RL and a 60% reduction for the model-free version. We examine the performance of our algorithms through simulations and real-world experiments for navigation with a ground robot.

Recent Developments
Recently, there has been significant development in RL for robotics. A major limitation of using RL for planning with robots is the need to obtain a large number of