Reinforcement Learning (2025-092, 50186472_50186527): Syllabus

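1. Course Information

Course No.	CSED627	Section	01	Credits	3.00
Course Classification	Major Elective	Course Type	Classroom Lecture	Prerequisites
Postechian Core Competencies	Interpersonal Skills, Global Citizenship, Knowledge Inquiry, Digital Literacy, Self-Management, Creative Convergence
Class Time	Tue, Thu / 14:00 ~ 15:15 / Cheongam Library (청암학술정보관) Seminar Room [Room 502]			Grading Type	G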

2. Instructor Information

Name	Hwanjo Yu (유환조)	Department (Major)	Computer Science and Engineering
Email	hwanjoyu@postech.ac.kr	Homepage	http://di.postech.ac.kr/hwanjoyu
Office		Phone	054-279-2388
Office Hours	Tue/Thu 3:15-4:30 pm, or by appointment via email

3. Course Objectives

This course covers the basic theory and practical algorithms of reinforcement learning (RL).
By the end of the course, students are expected to be able to
define and explain the key features of RL,
apply RL to a given application problem,
implement common RL algorithms,
understand theoretical and empirical approaches for evaluating the quality of an RL algorithm, and
ideally, formulate and solve research problems in RL.

4. Prerequisites and Requirements

Required: AI, machine learning, calculus, probability & statistics
Recommended: optimization, programming

5. Grading

The grade is based on
quizzes and class participation (30%),
assignments (30%),
paper presentation (10%), and
project (30%).
If you miss five classes, you will receive an F regardless of your other scores.

6. Textbook

Title	Author	Publisher	Year	ISBN

7. References and Materials

There is no official textbook.
Some references:
- “Reinforcement Learning: An Introduction”, by R. S. Sutton and A. G. Barto, MIT Press, 2020. (http://incompleteideas.net/book/RLbook2020.pdf)
- “Bandit Algorithms”, by T. Lattimore and C. Szepesvári, Cambridge University Press, 2020.

8. Course Schedule

1. Introduction
2. Markov Decision Processes (MDP)
3. Model-Free Evaluation & Control
4. Policy Gradient, PPO
5. Imitation Learning
6. RLHF and DPO
7. Offline RL
8. Multi-Armed Bandit
9. Bayesian Bandit
10. Data Efficient RL
11. Monte-Carlo Tree Search
12. Case studies: AlphaGo, DDPG, GRPO, etc.
13. Research paper presentations

9. Course Format

In person (offline).

10. Study Guidance and Other Notes

.

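11. Learning Support for Students with Disabilities

- Coursework: real-time captioning (hearing impairments), course assistance (developmental disabilities), note-taking (all disability types), etc.

- Exams: extended exam time (all disability types, as needed), enlarged exam papers (visual impairments), etc.

- For any additional requests, contact the Support Center for Students with Disabilities (279-2434).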

2025 Second Semester, Reinforcement Learning (CSED627-01) Syllabus