3. Course Objectives
This class will explore key concepts in system support for machine learning, deep learning, and large language model workloads. The primary objectives of this class are:
- Understanding the key properties of these workloads.
- Learning about the state-of-the-art system mechanisms and policies implemented in contemporary training and inference frameworks.
- Examining how past research on traditional big data processing technologies has evolved to improve the performance, scalability, and programmability of training and inference workloads.
4. Prerequisites
Required prerequisites: Operating Systems, Computer Architecture
5. Grading
Attendance (10%), Class Participation (15%), Presentation (25%), Midterm (25%), Final (25%)
Note that these weights are subject to change.
8. Course Schedule
[Week 1] Introduction
- 9/2: Course Introduction
- 9/4: No class
[Week 2] Systems Basics
- 9/9: Scheduling
- 9/11: Memory Systems
[Week 3] Chuseok (Korean Thanksgiving Day)
- 9/16: No class
- 9/18: No class
[Week 4] Data Preprocessing
- 9/23: Basics, MinIO (VLDB'21), Revamper (ATC'21)
- 9/25: FastFlow (VLDB'23), FusionFlow (VLDB'24)
[Week 5] Single-GPU Training
- 9/30: Basics
- 10/2: Zico (ATC'21), ZeRO-Offload (ATC'21)
[Week 6] Multi-GPU & Multi-node Training
- 10/7: Basics
- 10/9: No class (Hangul Day)
[Week 7] Multi-GPU & Multi-node Training
- 10/14: GPipe (NeurIPS'19), PipeDream (SOSP'19)
- 10/16: Megatron-LM (SC'21), ByteScheduler (SOSP'19)
[Week 8] Midterm Exam
[Week 9] Automation / Energy Efficiency
- 10/28: Alpa (OSDI'22), GEMINI (SOSP'23)
- 10/30: Zeus (NSDI'23), EnvPipe (ATC'23)
[Week 10] Memory Oversubscription
- 11/4: Basics
- 11/6: Capuchin (ASPLOS'20), HUVM (ATC'22)
[Week 11] Scheduler & Cluster Manager
- 11/11: Basics
- 11/13: Gandiva (OSDI'18), Pollux (OSDI'21)
[Week 12] Scheduler & Cluster Manager / (LLM) Serving Systems
- 11/18: MLaaS in the Wild (NSDI'22), Oobleck (SOSP'23)
- 11/20: MLPerf (ISCA'20), AlpaServe (OSDI'23)
[Week 13] (LLM) Serving Systems
- 11/25: DeepPlan (EuroSys'23), LLM Serving Basics
- 11/27: Orca (OSDI'22), PagedAttention (SOSP'23)
[Week 14] (LLM) Serving Systems
- 12/2: FlexGen (ICML'23), Sarathi-Serve (OSDI'24)
- 12/4: DistServe (OSDI'24), InfiniGen (OSDI'24)
[Week 15] Edge AI
- 12/9: Basics, CarM (DAC'22), Miro (MobiCom'23)
- 12/11: Sage (MobiSys'22), Ekya (NSDI'22)
[Week 16] Final Exam
9. Course Format
This course is based on paper reading and research-oriented discussion. Each session spends roughly 55 minutes on paper presentations (two or three papers, including a 5-minute break) and 20 minutes on follow-up discussion. Both exams ask students to solve system design questions.
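11. Learning Support for Students with Disabilities
- Coursework: real-time text interpretation (hearing impairment), course assistance (developmental disabilities), note-taking (all types), etc.
- Exams: extended exam time (all types, as needed), enlarged exam papers (visual impairment), etc.
- For any other requests, contact the Center for Students with Disabilities (279-2434).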