POLICY ADVICE, NON-CONVEX AND DISTRIBUTED OPTIMIZATION IN REINFORCEMENT LEARNING
MetadataShow full item record
Transfer learning is a method in machine learning that tries to use previous training knowledge to speed up the learning process. Policy advice is a type of transfer learning method where a student agent is able to learn faster via advice from a teacher agent. Here, the agent who provides advice (actions) is called the teacher agent. The agent who receives advice (actions) is the student agent. However, both this and other current reinforcement learning transfer methods have little theoretical analysis. This dissertation formally denes a setting where multiple teacher agents can provide advice to a student and introduces an algorithm to leverage both autonomous exploration and the teacher's advice. Regret bounds are provided and negative transfer is formally dened and studied. On the other hand, policy search is a class of reinforcement learning algorithms for nding optimal policies to control problems with limited feedback. These methods have shown successful applications in high-dimensional problems, such as robotics control. Though successful, current methods can lead to unsafe policy parameters damaging hardware units. Motivated by such constraints, Bhatnagar et al. and others proposed projection based methods for safe policies . These methods, however, can only handle convex policy constraints. In this dissertation, we contribute the rst safe policy search reinforcement learner capable of operating under non-convex policy constraints. This is achieved by observing a connection between non-convex variational inequalities and policy search problems. We provide two algorithms, i.e., Mann and two-step iteration, to solve the above and prove convergence in the nonconvex stochastic setting. Lastly, lifelong reinforcement learning is a framework similar to transfer learning that allows agents to learn multiple consecutive tasks sequentially online. Current methods, however, suer from scalability issues when the agent has to solve a large number of tasks. In this dissertation, we remedy the above drawbacks and propose a novel scalable technique for lifelong reinforcement learning. We derive an algorithm which assumes the availability of multiple processing units and computes shared repositories and local policies using only local information exchange.