Fall 2022 (2nd Semester) Special Lectures III: Mathematical Data Science (MATH439-01) Course Syllabus

1. Course Information

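Course No.: MATH439 | Section: 01 | Credits: 3.00
Course category: Major elective | Course type: Classroom lecture | Prerequisite courses:
POSTECHIAN core competencies:
Class hours: Mon, Wed / 11:00 ~ 12:15 / Mathematical Sciences Building, Room 100 | Grade type: G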

2. Instructor Information

Name: 정재훈 | Department (Major): Mathematics
Email: jung153@postech.ac.kr | Homepage:
Office phone: 054-279-2302
Office Hours Mondays 4:00PM - 5:00PM

3. Course Objectives

Data science is one of the fastest-growing scientific disciplines today. This course introduces the mathematics underlying modern data science. Topics covered this semester include an introduction to data science and big data; data preprocessing; data interpolation; data classification; data generation schemes; text, sound, and image data; stochastic and chaotic characteristics of data; data-driven computing; topological and geometric understanding of data; manifold representations of data; artificial intelligence and data; and data analysis and visualization with Python. Students will also implement what they learn in Python.

4. Prerequisites / Course Requirements

Prior coursework in linear algebra and differential equations is recommended but not required.

5. Grading

Class attendance: 10%
Homework: 30%
Midterm exam: 30%
Final research project: 30%

6. Textbook

Title | Author | Publisher | Year | ISBN

7. References and Materials

Introduction to Mathematical Data Science - by example (instructor's lecture notes), 2021.

8. Course Schedule

1. Basic Python programming
A. Basic grammar
B. SC library
C. Data analysis with Pandas
2. Introduction to data science
A. What is data science?
B. History of data science
C. What can we do with data science?
3. Epidemics - modeling from data
A. Covid-19 data
B. Modeling with differential equations
C. Particle simulation (dynamical system)
D. Cellular automata (discrete system)
4. Random versus chaotic data
A. Paramecium aurelia data
B. Population data
C. Logistic map
D. Chaotic behavior of data - Butterfly effect
E. Random versus chaotic data
5. Population prediction with data
A. Statistics Korea (KOSTAT) website
B. Data interpolation/extrapolation - interpolation theory
C. Regularization (overfitting, L1 regularization)
D. Least squares approximation (regression)
6. Prediction with regression
A. Prediction with data
B. Linear regression & polynomial regression
C. Overfitting & regularization
D. Bias & variance
7. Classification with data
A. Data Source: Titanic data (from Kaggle)
B. Regression (logistic regression, optimization)
C. Linear regression, logistic regression, SVM
D. Deep learning - multiple layers
8. Image classification via convolutional neural network
A. Data source: cat vs. dog (from Kaggle)
B. Introduction: edge detection in image data (jump functions, differential maps)
C. Introduction to CNNs
9. Text data
A. Python NLTK
B. Text analysis (word frequency, Zipf's law)
C. Corpus analysis
10. Unsupervised learning
A. Supervised versus unsupervised learning
B. Clustering
C. Dimensionality reduction (PCA, t-SNE)
11. Sampling of data
A. Sampling from simulation data
B. Monte Carlo versus polynomial chaos
12. Topological data analysis
A. Data analysis using homology and Mapper (MNIST handwritten-digit data)
B. Persistent homology
C. Mapper

9. Class Format

The class consists of lectures and tutorial sessions. Each class covers both the theoretical and the practical (computational) aspects of data science through hands-on tutorials. Homework will be assigned weekly. Students are required to attend class regularly and to submit their homework on time.

10. Study Methods and Other Information

11. Learning Support for Students with Disabilities

- Coursework: real-time text interpretation (hearing impairment), course assistance (developmental disability), note-taking (all types), etc.

- Exams: extended exam time (all types, as needed), enlarged copies of exam papers (visual impairment), etc.

- For any other requests, contact the Center for Students with Disabilities (279-2434).