Courses:

Decision Making in Large Scale Systems >> Content Detail



Study Materials



Readings

Amazon logo Help support MIT OpenCourseWare by shopping at Amazon.com! MIT OpenCourseWare offers direct links to Amazon.com to purchase the books cited in this course. Click on the Amazon logo to the left of any citation and purchase the book from Amazon.com, and MIT OpenCourseWare will receive up to 10% of all purchases you make. Your support will enable MIT to continue offering open access to MIT courses.

Amazon logo Bertsekas = Bertsekas, Dimitri P. Dynamic Programming and Optimal Control. 2 vols. Belmont, MA: Athena Scientific, 2007. ISBN: 9781886529083.

Amazon logo Bertsekas and Tsitsiklis = Bertsekas, Dimitri P., and John N. Tsitsiklis. Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996. ISBN: 9781886529106.


LEC #TOPICSREADINGS
1Markov Decision Processes

Finite-Horizon Problems: Backwards Induction

Discounted-Cost Problems: Cost-to-Go Function, Bellman's Equation
Bertsekas Vol. 1, Chapter 1.
2Value Iteration

Existence and Uniqueness of Bellman's Equation Solution

Gauss-Seidel Value Iteration
Bertsekas Vol. 2, Chapter 1.

Bertsekas and Tsitsiklis, Chapter 2.
3Optimality of Policies derived from the Cost-to-Go Function

Policy Iteration

Asynchronous Policy Iteration
Bertsekas Vol. 2, Chapter 1.

Bertsekas and Tsitsiklis, Chapter 2.
4Average-Cost Problems

Relationship with Discounted-Cost Problems

Bellman's Equation

Blackwell Optimality
Bertsekas Vol. 2, Chapter 4.
5Average-Cost Problems

Computational Methods
Bertsekas Vol. 2, Chapter 4.
6Application of Value Iteration to Optimization of Multiclass Queueing Networks

Introduction to Simulation-based Methods Real-Time Value Iteration
Chen, R. R., and S. P. Meyn. "Value Iteration and Optimization of Multiclass Queueing Networks."Queueing Systems 32 (1999): 65-97.

Bertsekas and Tsitsiklis,
Chapter 5.
7Q-Learning

Stochastic Approximations
Bertsekas and Tsitsiklis, Chapters 4 and 5.
8Stochastic Approximations: Lyapunov Function Analysis

The ODE Method

Convergence of Q-Learning
Bertsekas and Tsitsiklis, Chapters 4 and 5.
9Exploration versus Exploitation: The Complexity of Reinforcement LearningKearns, M. , and S. Singh. "Near-Optional Reinforcement Learning in Polynomial Time." Machine Learning 49, no. 2 (Nov 2002): 209-232.
10Introduction to Value Function Approximation

Curse of Dimensionality

Approximation Architectures
Bertsekas and Tsitsiklis, Chapter 6.
11Model Selection and ComplexityAmazon logo Hastie, Tibshirani, and Friedmann. Chapter 7 in The Elements of Statistical Learning. New York: Springer, 2003. ISBN: 9780387952840.
12Introduction to Value Function Approximation Algorithms

Performance Bounds
Bertsekas and Tsitsiklis, Chapter 6.
13Temporal-Difference Learning with Value Function ApproximationBertsekas and Tsitsiklis, Chapter 6.
14Temporal-Difference Learning with Value Function Approximation (cont.)Bertsekas and Tsitsiklis, Chapter 6.

de Farias, D. P., and B. Van Roy. "On the Existence of Fixed Points for Approximate Value Iteration and Temporal-Difference Learning."
15Temporal-Difference Learning with Value Function Approximation (cont.)

Optimal Stopping Problems

General Control Problems
Bertsekas and Tsitsiklis, Chapter 6.

de Farias, D. P., and B. Van Roy. "On the Existence of Fixed Points for Approximate Value Iteration and Temporal-Difference Learning."

Bertsekas, Borkar, and Nedic. "Improved temporal Difference Methods with Linear Function Approximation."
16Approximate Linear Programmingde Farias, D. P., and B. Van Roy. "The Linear Programming Approach to Approximate Dynamic Programming."
17Approximate Linear Programming (cont.)de Farias, D. P., and B. Van Roy. "The Linear Programming Approach to Approximate Dynamic Programming."
18Efficient Solutions for Approximate Linear Programmingde Farias D. P., and B. Van Roy. "On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming."

Calafiori, and Campi. "Uncertain Convex Programs: Randomized Solutions and Confidence Levels."
19Efficient Solutions for Approximate Linear Programming: Factored MDPsGuestrin, et al. "Efficient Solution Algorithms for Factored MDPs."

Schuurmans, and Patrascu. "Direct Value Approximation for Factored MDPs."
20Policy Search MethodsMarbach, and Tsitsiklis. "Simulation-Based Optimization of Markov Reward Processes." (PDF)
21Policy Search Methods (cont.)Baxter, and Bartlett. "Infinite-Horizon Policy-Gradient Estimation."
22Policy Search Methods for POMDPs

Application: Call Admission Control

Actor-Critic Methods
Baxter, and Bartlett. "Infinite-Horizon Policy-Gradient Estimation."

Baxter, and Bartlett. "Experiments with Infinite-Horizon Policy-Gradient Estimation."


Konda, and Tsitsiklis. "Actor-Critic Algorithms." (PDF)
23Guest Lecture: Prof. Nick Roy

Approximate POMDP Compression
Roy, and Gordon. "Exponential Family PCA for Belief Compression in POMDPs."
24Policy Search Methods: PEGASUS

Application: Helicopter Control
Ng, and Jordan. "PEGASUS: A policy search method for large MDPs and POMDPs."

Ng, et al. "Autonomous Helicopter Flight via Reinforcement Learning."

Complementary Reading

Even-Dar, and Mansour. "Learning Rates for Q-Learning.'' Journal of Machine Learning Research 5 (2003): 1-25.

Barron. "Universal approximation bounds for superpositions of a sigmoidal function." IEEE Transactions on Information Theory 39 (1993): 930-944.

Tesauro. "Temporal-Difference Learning and TD-Gammon'' Communications of the ACM 38, no. 3 (1995).


 








© 2017 Coursepedia.com, by Higher Ed Media LLC. All Rights Reserved.