Spring 2024 (1st Semester) Reinforcement Learning (AIGS627-01) Course Syllabus

1. Course Information

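Course Number: AIGS627 | Section: 01 | Credits: 3.00
Course Category: Major Elective | Course Type: Classroom Lecture | Prerequisites:
POSTECHIAN Core Competencies:
Class Time: Mon, Wed / 14:00 ~ 15:15 / Engineering Building 2, Room 102 | Grade Type: G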

2. Instructor Information

Name: Jungseul Ok (옥정슬) | Department (Major): Graduate School of Artificial Intelligence
Email: jungseul@postech.ac.kr | Homepage: https://sites.google.com/view/jungseulok
Lab: ML-LAB | Phone: 054-279-2242
Office Hours: By appointment via email

3. Course Objectives

This course covers the basic theory and practical algorithms of reinforcement learning (RL), so that students become capable of understanding research papers on RL, applying RL techniques to problems in other fields, and, ideally, formulating and solving research problems in RL.

4. Prerequisites and Course Requirements

Mandatory: machine learning, calculus, probability & statistics
Recommended: optimization, programming

5. Grading

The grade is based on assignments (35%), the midterm exam (20%), a paper review (20%), a proposal (20%), and participation (5%). A 5-strike policy applies to attendance: 5 absences result in an F, no matter what. If an absence is unavoidable, email me in advance.

6. Textbook

Title | Author(s) | Publisher | Year | ISBN
Reinforcement Learning: An Introduction | R. S. Sutton and A. G. Barto | MIT Press | 2018 |

7. References and Materials

There is no official textbook. However, most of the content is based on the following books:
- “Bandit Algorithms”, by T. Lattimore and C. Szepesvári, Cambridge University Press, 2020
- “Reinforcement Learning: An Introduction”, by R. S. Sutton and A. G. Barto, MIT Press, 2018

8. Course Schedule

[Tentative Syllabus]
https://docs.google.com/spreadsheets/d/18p6JMZ76PCBQzcQnlnpOfe2_MuHv31ICtBGkBjMRynU/edit?usp=sharing

1. Introduction to RL
2. Multi-Armed Bandit
3. Regret Analysis in MAB
4. Sample Complexity in MAB
5. Markov Decision Process (MDP)
6. Dynamic Programming
7. Regret Minimization in MDP
8. Sampling Schemes
9. Temporal Difference Learning
10. n-step Bootstrapping
11. Function Approximation
12. Deep Q-Network
13. Eligibility Traces
14/15. Policy Gradient Methods (Buddha's Birthday / Children's Day: video lecture or a make-up class to be scheduled)
16. Midterm Exam (9:30 am ~ 12:30 pm)
17. Optimization Techniques for RL
18. Scaling RL
19. Exploration via Intrinsic Motivation
20. Partially Observable RL
21. Bayesian RL and Meta-RL
22. Multitask and Hierarchical RL
23. Multi-Agent RL
24. Adversarial Search: AlphaGo
25. Imitation Learning and Inverse RL
26. Applications of RL
27-30. Final Project Presentations and Guest Lecture

9. Course Operation

Online.

10. Study Methods and Other Notes


11. Learning Support for Students with Disabilities

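- Coursework: real-time text interpretation (hearing impairment), course assistants (developmental disabilities), note-taking (all types), etc.

- Exams: extended exam time (all types, as needed), enlarged copies of exam papers (visual impairment), etc.

- For any other requests, contact the Center for Students with Disabilities (279-2434).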