2025 Second (Fall) Semester, Special Topics: Artificial Intelligence Systems (CSED703O-01) Course Syllabus

1. Course Information

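Course number: CSED703O
Section: 01
Credits: 3.00
Course classification: Major elective
Course type: Classroom lecture
Prerequisite courses:
POSTECHIAN core competencies:
Class time: Mon, Wed / 14:00 ~ 15:15 / Engineering Building 2, Room 107
Grading type: G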

2. Instructor Information

Name: 전명재 (Myeongjae Jeon)
Department (Major): Graduate School of Artificial Intelligence
Email: jmj119@postech.ac.kr
Homepage: https://sites.google.com/site/myeongjae/
Office phone: 054-279-2264
Office hours: By appointment (Mon, Wed)

3. Course Objectives

This class explores key concepts in system support for machine learning, deep learning, and large language model (LLM) workloads. The objectives of this class are threefold:
(1) to understand the key properties of these workloads;
(2) to study the fundamental mechanisms and policies implemented in contemporary training and inference frameworks; and
(3) to examine how these frameworks have evolved to push the boundaries of performance, scalability, and programmability.

4. Prerequisites and Enrollment Requirements

Required prerequisites: Operating Systems, Computer Architecture

5. Grading

The grading breakdown is as follows: Paper reviews (10%), Presentation (20%), Midterm (20%), Final (20%), Term project (30%).
Please note that these weights are subject to change.

6. Textbooks

Title / Author / Publisher / Year of publication / ISBN

7. References and Materials

8. Course Schedule

[Week 1] Introduction
- 9/1: Course Introduction
- 9/3: Basics for Scheduling and Memory Management

[Week 2] Conference Travel (SIGCOMM)
- 9/8: No class
- 9/10: No class

[Week 3] Data Preprocessing
- 9/15: Basics for Data Preprocessing, MinIO (VLDB'21), Revamper (ATC'21)
- 9/17: FastFlow (VLDB'23), FusionFlow (VLDB'24)

[Week 4] Single-GPU Training
- 9/22: Basics for Single-GPU Training
- 9/24: Zico (ATC'21), Nimble (NeurIPS'20)

[Week 5] Multi-GPU & Multi-node Training
- 9/29: Basics for Distributed Training, ZeRO (SC'20), Parallax (EuroSys'19), GPipe (NeurIPS'19)
- 10/1: PipeDream (SOSP'19), Megatron-LM (SC'21)

[Week 6] National Holiday (Chuseok)
- 10/6: No class
- 10/8: No class

[Week 7] Multi-GPU & Multi-node Training
- 10/13: ByteScheduler (SOSP'19), BytePS (OSDI'20)
- 10/15: Alpa (OSDI'22), Metis (ATC'24)

[Week 8] Midterm Exam

[Week 9] Failure Recovery & Reliability
- 10/27: GEMINI (SOSP'23), Universal Checkpointing (ATC'25)
- 10/29: DeepXplore (SOSP'17), TRAINCHECK (OSDI'25)

[Week 10] Memory Oversubscription
- 11/3: Basics for Memory Oversubscription, vDNN (MICRO'16), Checkmate (MLSys'20), Capuchin (ASPLOS'20)
- 11/5: ZeRO-Offload (ATC'21), HUVM (ATC'22)

[Week 11] Scheduler & Cluster Manager
- 11/10: Basics for Scheduler & Cluster Manager, Philly (ATC'19), MLaaS in the Wild (NSDI'22)
- 11/12: Gavel (OSDI'20), Pollux (OSDI'21)

[Week 12] (LLM) Serving Systems
- 11/17: AlpaServe (OSDI'23), DeepPlan (EuroSys'23)
- 11/19: Basics for LLM Serving, FlashAttention (NeurIPS'22)

[Week 13] (LLM) Serving Systems
- 11/24: Orca (OSDI'22), PagedAttention (SOSP'23)
- 11/26: DistServe (OSDI'24), Sarathi-Serve (OSDI'24)

[Week 14] (LLM) Serving Systems
- 12/1: InfiniGen (OSDI'24), FlexGen (ICML'23) or ORBITFLOW (VLDB'26)
- 12/3: No class (University Foundation Day)

[Week 15] Final Exam

[Week 16] Project Presentation

9. Class Format

This course is based on reading papers and engaging in research-oriented discussions. Each student is expected to: (1) give a 30-minute presentation on one of the papers (selected through a paper bidding process) during the semester, (2) solve system design questions in the midterm and final exams, and (3) submit reviews for a small subset of the papers covered during the semester. Students are also required to form groups and work with the instructor to identify a small research topic, implement and evaluate their idea, and write a short research paper.

10. Study Methods and Other Notes

11. Learning Support for Students with Disabilities

- Coursework: real-time captioning (hearing impairment), course assistance (developmental disability), note-taking (all disability types), etc.

- Exams: extended exam time (all disability types, as needed), enlarged copies of exam papers (visual impairment), etc.

- For any additional requests, contact the Support Center for Students with Disabilities (279-2434).