Materials
Week 1
Readings:
- LDS1 The Data Science Lifecycle
- LDS5 Case Study: Why is my Bus Always Late?
- PDSH2.1 Understanding data types in Python
- PDSH2.2 The basics of NumPy arrays
- PDSH2.4 Aggregations: min, max, and everything in between
Monday: Course introduction [slides]
Lab sections: Orientation to Jupyter notebooks [html]
Wednesday: Data science lifecycle [slides]
Week 2
Readings:
- Wickham (2014). Tidy data. Journal of Statistical Software, 59(10). [link to paper]
- PDSH3.1 Introducing pandas objects
- PDSH3.2 Data indexing and selection
- PDSH3.7 Merge and join
- PDSH3.8 Aggregation and grouping
Assignments:
- HW1, BRFSS case study, due Monday, April 24 [html]
Monday: Tidy data [slides]
Lab sections: Pandas [html]
Wednesday: Dataframe transformations [slides]
Week 3
Readings:
- LDS2.2 Population, frame, sample
- Van Buuren, Flexible Imputation of Missing Data, section 2.2 Concepts in incomplete data
- PDSH3.4 Handling missing data
Assignments:
- Mini project 1, due Monday, May 1 [html]
Monday: Sampling, bias, and missingness [slides]
Lab sections: Exploring sampling bias through simulation [html]
Wednesday: Voter fraud case study [slides] [activity html]
Week 4
Readings:
- Wilke, Fundamentals of Data Visualization Ch. 2-5
- LDS11.1 Choosing scale to reveal structure
- (Recommended) Cook, D., Lee, E. K., & Majumder, M. (2016). Data visualization and statistical graphics in big data analysis. Annual Review of Statistics and Its Application, 3, 133-159. [link to paper]
- (Recommended) Gelman, A., & Unwin, A. (2013). Infovis and statistical graphics: different goals, different looks. Journal of Computational and Graphical Statistics, 22(1), 2-28. [link to paper]
- (Recommended) Iliinsky, N. (2010). On beauty. Beautiful visualization: Looking at data through the eyes of experts, 1-13. [link to chapter]
Assignments:
- HW2, SEDA case study, due Monday, May 8 [html]
Monday: Statistical graphics [slides]
Lab sections: Data visualization [html]
Wednesday: Principles of figure design [slides]
Week 5
Readings:
- LDS 11.2 Smoothing and aggregating data
- Scott, D.W. (2012). Multivariate Density Estimation and Visualization. In: Gentle, J., Härdle, W., Mori, Y. (eds) Handbook of Computational Statistics. [link to chapter]
Monday: Exploratory analysis and density estimation [slides]
Lab sections: Smoothing [html]
Wednesday: Multivariate KDE, mixture models, and scatterplot smoothing [slides] [activity html]
Week 6
Readings:
- LDS 10.2-10.5 Exploratory data analysis
- PDSH5.9 Principal component analysis
Assignments:
- HW3, Diatom paleoclimatology case study, due Monday, May 22 [html]
Monday: Covariance, correlation, and spectral decomposition [slides]
NO lab sections this week
Wednesday: Principal components [slides]
Week 7
Readings:
- LDS 15.1-15.3 Simple linear models
Assignments:
- Mini project 2, due Tuesday, May 30 [html]
Monday: Modeling concepts; least squares [slides]
Lab sections: Principal components [html]
Wednesday: The simple linear regression model [slides]
Week 8
Readings:
- LDS 17.5 Basics of prediction intervals
- LDS 15.4 Multiple regression
Assignments:
- HW4, Discrimination in disability benefit allocation, due Wednesday, June 7 [html]
Monday: Prediction [slides]
Lab sections: Fitting regression models [html]
Wednesday: Multiple regression [slides]
Week 9
Readings:
- LDS 19.1-19.3 and 19.5 Classification
Assignments:
- Course project due Friday, June 16 [html]
No class or lab sections Monday
Wednesday: Classification [slides]
Week 10
No readings or new assignments
No class Monday
Lab sections: Logistic regression (submission is optional) [html]
Wednesday: Clustering [slides]