
dc.contributor.advisor	Taylor, Matthew E.
dc.creator	Zhan, Yusen
dc.date.accessioned	2017-06-19T16:21:56Z
dc.date.available	2017-06-19T16:21:56Z
dc.date.issued	2016
dc.identifier.uri	http://hdl.handle.net/2376/12014
dc.description	Thesis (Ph.D.), Computer Science, Washington State University	en_US
dc.description.abstract	Transfer learning is a method in machine learning that uses previous training knowledge to speed up the learning process. Policy advice is a transfer learning method in which a student agent learns faster via advice from a teacher agent: the agent that provides advice (actions) is called the teacher agent, and the agent that receives advice is the student agent. However, both this and other current reinforcement learning transfer methods have little theoretical analysis. This dissertation formally defines a setting where multiple teacher agents can provide advice to a student and introduces an algorithm that leverages both autonomous exploration and the teachers' advice. Regret bounds are provided, and negative transfer is formally defined and studied. On the other hand, policy search is a class of reinforcement learning algorithms for finding optimal policies for control problems with limited feedback. These methods have been applied successfully to high-dimensional problems such as robotics control. Though successful, current methods can lead to unsafe policy parameters that damage hardware units. Motivated by such constraints, Bhatnagar et al. and others proposed projection-based methods for safe policies [8]. These methods, however, can handle only convex policy constraints. In this dissertation, we contribute the first safe policy search reinforcement learner capable of operating under non-convex policy constraints. This is achieved by observing a connection between non-convex variational inequalities and policy search problems. We provide two algorithms, Mann iteration and two-step iteration, to solve the above and prove convergence in the non-convex stochastic setting. Lastly, lifelong reinforcement learning is a framework, similar to transfer learning, that allows agents to learn multiple consecutive tasks sequentially online. Current methods, however, suffer from scalability issues when the agent has to solve a large number of tasks. In this dissertation, we remedy these drawbacks and propose a novel, scalable technique for lifelong reinforcement learning: an algorithm that assumes the availability of multiple processing units and computes shared repositories and local policies using only local information exchange.	en_US
dc.description.sponsorship	Washington State University, Computer Science	en_US
dc.language.iso	English
dc.rights	In copyright
dc.rights	Publicly accessible
dc.rights	openAccess
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0
dc.rights.uri	http://www.ndltd.org/standards/metadata
dc.rights.uri	http://purl.org/eprint/accessRights/OpenAccess
dc.subject	Computer science	en_US
dc.subject	Artificial intelligence	en_US
dc.subject	Machine Learning	en_US
dc.subject	Non-convex Optimization	en_US
dc.subject	Reinforcement Learning	en_US
dc.subject	Transfer Learning	en_US
dc.title	Policy Advice, Non-Convex and Distributed Optimization in Reinforcement Learning
dc.type	Text
dc.type	Electronic Thesis or Dissertation

