Towards Tractable Optimism in Model-Based Reinforcement Learning

Pacchiano A.; Ball P.; Parker-Holder J.; Choromanski K.; Roberts S.

The principle of optimism in the face of uncertainty is prevalent throughout sequential decision-making problems such as multi-armed bandits and reinforcement learning (RL). To be successful, an optimistic RL algorithm must over-estimate the true value function (optimism) but not by so much that it is inaccurate (estimation error). In the tabular setting, many state-of-the-art methods produce the required optimism through approaches which are intractable when scaling to deep RL. We re-interpret these scalable optimistic model-based algorithms as solving a tractable noise-augmented MDP. This formulation achieves a competitive regret bound: Õ(|S|H√(|A|T)) when augmenting using Gaussian noise, where T is the total number of environment steps. We also explore how this trade-off changes in the deep RL setting, where we show empirically that estimation error is significantly more troublesome. However, we also show that if this error is reduced, optimistic model-based RL algorithms can match state-of-the-art performance in continuous control problems.

Type: Conference paper

Publication Date: 01/01/2021

Volume: 161

Pages: 1413-1423
