Approximately Optimal Approximate Reinforcement Learning

A new reinforcement learning algorithm, Short Horizon Policy Improvement (SHPI), is developed that approximates policy-induced drift in user behavior across sessions.


Citation: S. Kakade and J. Langford, Approximately Optimal Approximate Reinforcement Learning. Proceedings of the Nineteenth International Conference on Machine Learning.