# Materials

## Week 1

*Readings*:

- LDS1 The Data Science Lifecycle
- LDS5 Case Study: Why is my Bus Always Late?
- PDSH2.1 Understanding data types in Python
- PDSH2.2 The basics of numpy arrays
- PDSH2.4 Aggregations: min, max, and everything in between

**Monday**: Course introduction [slides]

**Lab sections**: Orientation to Jupyter notebooks [html]

**Wednesday**: Data science lifecycle [slides]

## Week 2

*Readings*:

- Wickham (2014). Tidy data. *Journal of Statistical Software* 59(10). [link to paper]
- PDSH3.1 Introducing pandas objects
- PDSH3.2 Data indexing and selection
- PDSH3.7 Merge and join
- PDSH3.8 Aggregation and grouping

*Assignments*:

- HW1, BRFSS case study, due Monday, April 24 [html]

**Monday**: Tidy data [slides]

**Lab sections**: Pandas [html]

**Wednesday**: Dataframe transformations [slides]

## Week 3

*Readings*:

- LDS2.2 Population, frame, sample
- Van Buuren, Flexible Imputation of Missing Data, section 2.2 Concepts in incomplete data
- PDSH3.4 Handling missing data

*Assignments*:

- Mini project 1, due Monday, May 1 [html]

**Monday**: Sampling, bias, and missingness [slides]

**Lab sections**: Exploring sampling bias through simulation [html]

**Wednesday**: Voter fraud case study [slides] [activity html]

## Week 4

*Readings*:

- Wilke, Fundamentals of Data Visualization Ch. 2-5
- LDS11.1 Choosing scale to reveal structure
- (Recommended) Cook, D., Lee, E. K., & Majumder, M. (2016). Data visualization and statistical graphics in big data analysis. *Annual Review of Statistics and Its Application*, 3, 133-159. [link to paper]
- (Recommended) Gelman, A., & Unwin, A. (2013). Infovis and statistical graphics: different goals, different looks. *Journal of Computational and Graphical Statistics*, 22(1), 2-28. [link to paper]
- (Recommended) Iliinsky, N. (2010). On beauty. *Beautiful visualization: Looking at data through the eyes of experts*, 1-13. [link to chapter]

*Assignments*:

- HW2, SEDA case study, due Monday, May 8 [html]

**Monday**: Statistical graphics [slides]

**Lab sections**: Data visualization [html]

**Wednesday**: Principles of figure design [slides]

## Week 5

*Readings*:

- LDS 11.2 Smoothing and aggregating data
- Scott, D.W. (2012). Multivariate Density Estimation and Visualization. In: Gentle, J., Härdle, W., Mori, Y. (eds) Handbook of Computational Statistics. [link to chapter]

**Monday**: Exploratory analysis and density estimation [slides]

**Lab sections**: Smoothing [html]

**Wednesday**: Multivariate KDE, mixture models, and scatterplot smoothing [slides] [activity html]

## Week 6

*Readings*:

- LDS 10.2-10.5 Exploratory data analysis
- PDSH5.9 Principal component analysis

*Assignments*:

- HW3, Diatom paleoclimatology case study, due Monday, May 22 [html]

**Monday**: Covariance, correlation, and spectral decomposition [slides]

**No lab sections this week**

**Wednesday**: Principal components [slides]

## Week 7

*Readings*:

- LDS 15.1-15.3 Simple linear models

*Assignments*:

- Mini project 2, due Tuesday, May 30 [html]

**Monday**: Modeling concepts; least squares [slides]

**Lab sections**: Principal components [html]

**Wednesday**: The simple linear regression model [slides]

## Week 8

*Readings*:

- LDS 17.5 Basics of prediction intervals
- LDS 15.4 Multiple regression

*Assignments*:

- HW4, Discrimination in disability benefit allocation, due Wednesday, June 7 [html]

**Monday**: Prediction [slides]

**Lab sections**: Fitting regression models [html]

**Wednesday**: Multiple regression [slides]

## Week 9

*Readings*:

- LDS 19.1-19.3 and 19.5 Classification

*Assignments*:

- Course project, due Friday, June 16 [html]

**No class or lab sections Monday**

**Wednesday**: Classification [slides]

## Week 10

*No readings or new assignments*

**No class Monday**

**Lab sections**: Logistic regression (submission is optional) [html]

**Wednesday**: Clustering [slides]