2023 First Semester: Programming for Data Science (IMEN574-01) Syllabus

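1. Course Information

Course number: IMEN574 / Section: 01 / Credits: 3.00
Course type: Major elective / Format: Classroom lecture / Prerequisite courses:
POSTECHIAN core competencies:
Class time: Tue, Thu / 14:00 ~ 15:15 / Engineering Building 4, Seminar Room [Rooms 302/304] / Grading type: G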

2. Instructor Information

Name: 이혜선 / Department (Major): Industrial and Management Engineering
Email: hyelee@postech.ac.kr / Homepage: http://www.postech.ac.kr/~hyelee
Office: 054-279-8222 / Phone: 279-8222
Office Hours

3. Course Objectives

[This course belongs to the "Industrial Artificial Intelligence (Industrial AI) Program, Foundational Common Area".]
Data science requires the integration of statistical knowledge, computing skills, data-related techniques, and communication skills. This course develops the skills needed to handle complex data science projects and to “think with data” for better decisions. Data analysts must be able to pose the right question, obtain data to address it, and model and explore that data to produce reproducible results. Programming techniques in R and Python are developed throughout the course.

What we learn in this course
● data extraction techniques (data wrangling, SQL query, data features)
● modern data visualization approaches (multiple graphics, spatial data)
● boosting, regularization and clustering
● refined computational approaches (automated machine learning)

4. Prerequisites and Requirements

Prerequisite: Statistics course (IMEN272, Math231, Math230) or equivalent
Recommended (but not required): Basic programming in R (or Python); data mining related courses (IMEN 382, IMEN 472, IMEN 473).

5. Grading

Attendance & short test* 15% (short test given at the end of the semester)
Homework (quiz) 30%
Midterm project 25%
Final project 30%
Total 100%

6. Textbooks

Title / Author(s) / Publisher / Year / ISBN
Modern Data Science with R / Benjamin D. Baumer, Daniel T. Kaplan, and Nicholas J. Horton / - / 2017 / -
Machine Learning Engineering / Andriy Burkov / - / 2020 / -
Python Machine Learning / Sebastian Raschka / Packt / 2015 / -

7. References and Materials

[References]
● Cory Lesmeister, Mastering Machine Learning with R, 3rd ed., Packt Publishing, 2019.
● Rohan Chopra, Mohamed Noordeen Alaudeen, et al., Master Data Science with Python, 2019.
● Ani Adhikari, John DeNero, Computational and Inferential Thinking: The Foundations of Data Science (free online)

8. Course Schedule

1. What is Data Science / Data Science, Open source, Github
2. Computing tools for R / R & RStudio basics, intro to R, intro to visualization
3. Data Wrangling & Data preparation / Essential techniques for data management (dplyr in R)
4. Statistical modeling / Sampling, bootstrap, outliers, modeling with confounding factors
5. Data handling with Python / Python basics, data loading and handling (pandas)
6. Database & SQL / SQL concept and practice
7. Data visualization / matplotlib in Python, building a Google map
8. Data preprocessing / Data preprocessing with scikit-learn
9. Supervised learning : Classifiers / Tuning parameters, Ensemble methods
10. Evaluating methods / Cross-validation, Confusion matrix, ROC curve
11. Unsupervised learning : Clustering / Parallelizing hierarchical clustering
12. Data features / Feature extraction method, Linear/nonlinear/deep learning
13. Imbalanced data / Synthetic Minority Over-sampling Technique (SMOTE)
14. Regularization, Automated machine learning / LASSO and ElasticNet, AutoML

* Weeks 5-8: lectures by Prof. Song MinSeok
* Project presentations will be scheduled in class.

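9. Course Operation

1. In-person classes: Lectures work with shared data from a variety of sources and are run so that students can strengthen their analytical skills as data scientists.
2. Use of ChatGPT: It may be used for data analysis programming (resolving errors, example code for analysis techniques, etc.) and as a tool for learning and knowledge exploration.
3. Assignments and quizzes must be uploaded to PLMS as Word or PPT files by the assignment deadline; late submissions incur a penalty (10%-20%).
4. Teaching assistant: 한유정 (huj1259@postech.ac.kr)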

10. Study Methods and Other Notes

★ More information about the course:
* R and Python code corresponding to each subject in the main textbook is provided.
* For your own project or homework, you may use R or Python, whichever you are more comfortable with.
* Python users are encouraged to try R, and R users to try Python.
* Choose your own dataset and work with this real (public) data through every step of the course.

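11. Learning Support for Students with Disabilities

- Course-related: real-time captioning (hearing impairment), course assistance (developmental disability), note-taking (all types), etc.

- Exam-related: extended exam time (all types, as needed), enlarged exam copies (visual impairment), etc.

- For any additional requests, contact the Support Center for Students with Disabilities (279-2434).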