```
# Initialize Otter
import otter
grader = otter.Notebook("lab4-smoothing.ipynb")
```

# Lab 4: Smoothing

```
import numpy as np
import pandas as pd
import altair as alt

pd.options.mode.chained_assignment = None # default='warn'

# disable row limit for plotting
alt.data_transformers.disable_max_rows()

# uncomment to ensure graphics display with pdf export
# alt.renderers.enable('mimetype')
```

So far, you’ve encountered a number of visualization techniques for displaying tidy data. In those visualizations, all graphic elements represent the values of a dataset – they are visual displays of actual data.

In general, smoothing means evening out. Visualizations of actual data are often irregular – points are distributed widely in scatterplots, line plots are jagged, bars are discontinuous. When we look at such visuals, we tend to attempt to look past these irregularities in order to discern patterns – for example, the overall shape of a histogram or the general trend in a scatterplot. Showing what a graphic might look like with irregularities evened out often aids the eye in detecting pattern. This is what **smoothing** is: *evening out irregularities in graphical displays of actual data*.

For our purposes, usually smoothing will consist in drawing a line or a curve on top of an existing statistical graphic. From a technical point of view, this amounts to adding derived geometric objects to a graphic that have fewer irregularities than the displays of actual data.

In this lab, you’ll learn some basic smoothing techniques – kernel density estimation, LOESS, and linear smoothing via regression – and how to implement them in Altair.

In Altair, smoothing is implemented via what Altair describes as *transforms* – operations that modify a dataset. Try not to get too attached to this terminology – ‘transform’ and ‘transformation’ are used to mean a variety of things in other contexts. You’ll begin with a brief introduction to Altair transforms before turning to smoothing techniques.

The **sections** of the lab are divided as follows:

- Introduction to Altair transforms
- Histogram smoothing: kernel density estimation
- Scatterplot smoothing: LOESS and linear smoothing
- A neat graphic

And our main **goals** are:

- Get familiar with Altair transforms for dataframe operations: filter, bin, aggregate, calculate.
- 'Handmade' histograms: step-by-step construction.
- Implement kernel density estimation via `.transform_density(...)`.
- Implement LOESS via `.transform_loess(...)`.
- Implement linear smoothing via `.transform_regression(...)`.

You’ll use the same data as last week to stick to a familiar example:

```
# import tidied lab 3 data
data = pd.read_csv('data/lab3-data.csv')
data.head()
```

# Transforms in Altair

In Altair, operations that modify a dataset are referred to as *transforms*. Mostly, these are operations that could be performed manually with ease – the utility of transforms is that they *wrap common operations within plotting commands*, although they also make plotting code more verbose.

Transforms encompass a broad range of types of operations, from relatively simple ones like filtering to more complex ones like smoothing. Here you’ll see a few intuitive transforms in Altair that integrate simple dataframe manipulations into the plotting process.

You’ll focus on the construction of histograms as a sort of case study. This will be a useful primer for histogram smoothing in the next section.

## Filter transform

Last week you saw a way to make histograms. As a quick refresher, to make a histogram of life expectancies across the globe in 2010, one can filter the data and then plot using the following commands:

```
# filter
data2010 = data[data.Year == 2010]

# plot
alt.Chart(data2010).mark_bar().encode(
    x = alt.X('Life Expectancy',
              bin = alt.Bin(step = 2),
              title = 'Life Expectancy at Birth'),
    y = 'count()'
)
```

However, the filtering step can be handled *within the plotting commands* using `.transform_filter()`.

This uses a helper command to specify the filtering condition – in the above example, the filtering condition is that `Year` is equal to `2010`. A filtering condition is referred to in Altair as a 'field predicate'. In the above example:

- filtering field: `Year`
- field predicate: equals `2010`

There are different helpers for different types of field predicates – you can find a complete list in the documentation.
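For intuition, each of these field predicates corresponds to an ordinary pandas boolean mask. Below is a minimal sketch using a tiny made-up dataframe (the predicate helpers named in the comments are real Altair classes; the data is invented purely for illustration):

```python
import pandas as pd

# tiny made-up dataframe standing in for the lab data
df = pd.DataFrame({'Year': [2009, 2010, 2010, 2019],
                   'Life Expectancy': [70.1, 71.5, 68.2, 72.9]})

# alt.FieldEqualPredicate(field = 'Year', equal = 2010)  ~  an equality mask
equal_rows = df[df['Year'] == 2010]

# alt.FieldOneOfPredicate(field = 'Year', oneOf = [2010, 2019])  ~  .isin(...)
oneof_rows = df[df['Year'].isin([2010, 2019])]

# alt.FieldRangePredicate(field = 'Year', range = [2009, 2010])  ~  .between(...)
range_rows = df[df['Year'].between(2009, 2010)]
```

The difference is purely where the filtering happens: `.transform_filter()` defers it to the chart specification, while a boolean mask filters the dataframe up front.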

Here is how to use `.transform_filter()` to make the same histogram shown above, but skipping the step of storing a subset of the data under a separate name:

```
# filter and plot
alt.Chart(data).transform_filter(
    alt.FieldEqualPredicate(field = 'Year', equal = 2010)
).mark_bar().encode(
    x = alt.X('Life Expectancy',
              bin = alt.Bin(step = 2),
              title = 'Life Expectancy at Birth'),
    y = 'count()'
)
```

### Question 1: Filter transform

Construct a histogram of life expectancies across the globe in 2019 using a filter transform as shown above to filter the appropriate rows of the dataset. Use a bin size of three (not two) years.

```
# filter and plot
alt.Chart(data).transform_filter(
    ...
).mark_bar().encode(
    x = ...,
    y = ...
)
```

## Bin transform

The code above provides a sleek way to construct the histogram, handling binning via arguments to `alt.X(...)`. However, binning actually involves an operation: creating a new variable that is a discretization of an existing variable into contiguous intervals of a specified width.

To illustrate, have a look at how the histogram could be constructed 'manually' by the following operations:

1. Bin life expectancies.
2. Count values in each bin.
3. Make a bar plot of counts against bin centers.

Here’s step 1:

```
# bin life expectancies into 20 contiguous intervals
data2010['Bin'] = pd.cut(data2010["Life Expectancy"], bins = 20)
data2010.head()
```

Here’s step 2:

```
# count values in each bin and store midpoints
histdata = data2010.loc[:, ['Life Expectancy', 'Bin']].groupby('Bin').count()
histdata['Bin midpoint'] = histdata.index.values.categories.mid.values
histdata
```

And finally, step 3:

```
# plot histogram
alt.Chart(histdata).mark_bar(width = 10).encode(
    x = 'Bin midpoint',
    y = alt.Y('Life Expectancy', title = 'Count')
)
```

These operations can be articulated as a transform in Altair using `.transform_bin()`:

```
# filter, bin, and plot
alt.Chart(
    data
).transform_filter(
    alt.FieldEqualPredicate(field = 'Year', equal = 2010)
).transform_bin(
    'Life Expectancy at Birth', # name to give binned variable
    field = 'Life Expectancy', # variable to bin
    bin = alt.Bin(step = 2) # binning parameters
).mark_bar(size = 10).encode(
    x = 'Life Expectancy at Birth:Q',
    y = 'count()'
)
```

The plotting code is a little more verbose, but it's much more efficient than performing the manipulations separately in pandas.

### Question 2: Bin transform

Follow the example above and make a histogram of life expectancies across the globe in 2019 using an explicit bin transform to create bins spanning three years.

```
# filter, bin, and plot
alt.Chart(
    data
).transform_filter(
    alt.FieldEqualPredicate(field = 'Year', equal = 2019)
).transform_bin(
    ...,
    field = ...,
    bin = ...
).mark_bar(size = 10).encode(
    x = ...,
    y = 'count()'
)
```

## Aggregate transform

Now, the counting of observations in each bin (implemented via `y = 'count()'`) is *also* an under-the-hood operation in constructing the histogram. You already saw how this was done 'manually' in the example above before introducing the bin transform.

Grouped counting is a form of *aggregation* in the sense discussed in lecture: it produces output that has fewer values than the input by combining multiple values (in this case rows) into one value (in this case a count of the number of rows).
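As a plain-pandas sketch of that idea (made-up data, not the lab dataset): grouped counting collapses many input rows into one row per group, which is exactly what `Count = 'count()'` with a `groupby` does inside the chart specification.

```python
import pandas as pd

# made-up example: six observations falling in three bins
df = pd.DataFrame({'Bin': ['a', 'a', 'b', 'b', 'b', 'c'],
                   'Value': [1, 2, 3, 4, 5, 6]})

# grouped counting: 6 input rows are aggregated into 3 output rows
counts = df.groupby('Bin').size().rename('Count').reset_index()
```

The output has one row per bin with the number of observations in each, fewer rows than the input, as aggregation always produces.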

This operation can also be made explicit using `.transform_aggregate()`. This makes use of Altair's *aggregation shorthands* for common aggregation functions; see the documentation on Altair encodings for a full list of shorthands.

Here is how `.transform_aggregate()` would be used to perform the counting:

```
# filter, bin, count, and plot
alt.Chart(
    data
).transform_filter(
    alt.FieldEqualPredicate(field = 'Year', equal = 2010)
).transform_bin(
    'Life Expectancy at Birth',
    field = 'Life Expectancy',
    bin = alt.Bin(step = 2)
).transform_aggregate(
    Count = 'count()', # altair shorthand operation -- see docs for full list
    groupby = ['Life Expectancy at Birth'] # grouping variable(s)
).mark_bar(size = 10).encode(
    x = 'Life Expectancy at Birth:Q',
    y = 'Count:Q'
)
```

## Calculate transform

By default, Altair’s histograms are displayed on the *count scale* rather than the *density scale*.

The **count scale** means that the y-axis shows *counts of observations in each bin*.

By contrast, on the **density scale**, the y-axis shows *proportions of total bar area* (so that the area of the plotted bars sums to 1).

It might seem like a silly distinction – after all, the two scales differ simply by a proportionality constant (the sample size times the bin width) – but as you will see shortly, the density scale is more useful for statistical thinking about the distribution of values and for direct comparisons of distributions approximated from samples of different sizes.
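That proportionality constant is easy to check numerically. Here is a minimal numpy sketch (with a made-up sample, not the lab data) confirming that dividing counts by the sample size times the bin width makes the total bar area equal 1:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc = 70, scale = 8, size = 157)   # made-up 'life expectancies'

# bin edges chosen to cover the whole sample
bin_width = 2
edges = np.arange(np.floor(sample.min()),
                  np.ceil(sample.max()) + bin_width, bin_width)
counts, _ = np.histogram(sample, bins = edges)

# count scale -> density scale: divide by (sample size x bin width)
density = counts / (len(sample) * bin_width)

# total bar area on the density scale
area = (density * bin_width).sum()
```

Since each bar's area is its height times the bin width, `area` comes out to 1 regardless of the sample size, which is what makes density-scale histograms comparable across samples.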

The scale conversion can be done using `.transform_calculate()`, which computes derived variables using arithmetic operations. In this case, one only needs to divide each count by the sample size times the bin width.

```
# filter, bin, count, convert scale, and plot
alt.Chart(
    data
).transform_filter(
    alt.FieldEqualPredicate(field = 'Year', equal = 2010)
).transform_bin(
    'Life Expectancy at Birth',
    field = 'Life Expectancy',
    bin = alt.Bin(step = 2)
).transform_aggregate(
    Count = 'count()',
    groupby = ['Life Expectancy at Birth']
).transform_calculate(
    Density = 'datum.Count/(2*157)' # divide counts by binwidth x sample size
).mark_bar(size = 10).encode(
    x = 'Life Expectancy at Birth:Q',
    y = 'Density:Q'
)
```

### Question 3: Density scale histogram

Follow the example above and convert your histogram from Question 2 (with the year 2019, the step size of 3, and the usage of `.transform_bin(...)`) to the density scale.

- First, calculate the sample size and store the value as `sample_size`. Store the desired step size as `bin_width`.
- Then calculate the count explicitly using `.transform_aggregate(...)` and convert it to a proportion using `.transform_calculate(...)`. Multiply `sample_size` by `bin_width` to obtain the scaling constant and hardcode it into your implementation.

```
# find scaling factor
sample_size = ...
bin_width = ...
print('scaling factor = ', sample_size*bin_width)

# construct histogram
alt.Chart(data).transform_filter(
    alt.FieldEqualPredicate(field = 'Year', equal = 2019)
).transform_bin(
    ...,
    field = ...,
    bin = ...
).transform_aggregate(
    Count = ...,
    groupby = ...
).transform_calculate(
    # use sample_size*bin_width to rescale - you will need to hardcode this value
    Density = ...
).mark_bar(size = 20).encode(
    x = 'Life Expectancy at Birth:Q',
    y = ...
)
```

`grader.check("q3")`

# Density estimation

Now that you have a sense of how transforms work, we can explore transforms that perform more sophisticated operations. We’re going to focus on a technique known as *kernel density estimation*.

Histograms show the distribution of values in the sample. Let's call the density-scale histogram the *empirical density*. A **kernel density estimate** is simply *a smoothing of the empirical density*. (It's called an 'estimate' because it's often construed as an approximation of the distribution of population values that the sample came from.)

Often the point of visualizing the distribution of a variable is to discern the shape, spread, center, and tails of the distribution to answer certain questions:

- What's a typical value?
- Are there multiple typical values (is the distribution multi-modal)?
- Are there outliers?
- Is the distribution skewed?

Density estimates are often easier to work with in exploratory analysis because it is visually easier to distinguish the shape of a smooth curve than the shape of a bunch of bars (unless you’re really far away).

Kernel density estimates are easy to plot using `.transform_density()`. The cell below generates a density estimate of life expectancies across the globe in 2010. Notice the commented lines explaining the syntax.

```
# plot kernel density estimate of life expectancies in 2010
alt.Chart(
    data
).transform_filter(
    alt.FieldEqualPredicate(field = 'Year', equal = 2010)
).transform_density(
    density = 'Life Expectancy', # variable to smooth
    as_ = ['Life Expectancy at Birth', 'Estimated Density'], # names of outputs
    bandwidth = 3, # how smooth?
    extent = [30, 85], # domain on which the smooth is defined
    steps = 1000 # number of points to generate for plotting the line
).mark_line(color = 'black').encode(
    x = 'Life Expectancy at Birth:Q',
    y = 'Estimated Density:Q'
)
```

This estimate can be layered onto the empirical density to get a better sense of the relationship between the two. The cell below accomplishes this. Notice that the plot elements are constructed as separate *layers*.

```
# base plot
base = alt.Chart(data).transform_filter(
    alt.FieldEqualPredicate(field = 'Year', equal = 2010)
)

# empirical density
hist = base.transform_bin(
    as_ = 'Life Expectancy at Birth',
    field = 'Life Expectancy',
    bin = alt.Bin(step = 2)
).transform_aggregate(
    Count = 'count()',
    groupby = ['Life Expectancy at Birth']
).transform_calculate(
    Density = 'datum.Count/(2*157)'
).mark_bar(size = 10, opacity = 0.8).encode(
    x = 'Life Expectancy at Birth:Q',
    y = 'Density:Q'
)

# kernel density estimate
smooth = base.transform_density(
    density = 'Life Expectancy',
    as_ = ['Life Expectancy at Birth', 'Estimated density'],
    bandwidth = 3,
    extent = [30, 85],
    steps = 1000
).mark_line(color = 'black').encode(
    x = 'Life Expectancy at Birth:Q',
    y = 'Estimated density:Q'
)

# layer
hist + smooth
```

What if you want a different amount of smoothing? That's what the `bandwidth` parameter is for. The smoothing is *local*, in the following sense: at any given point, the kernel density estimate averages nearby bar heights, weighting each bar according to how far it is from the point in question.

The `bandwidth` parameter specifies the size of this smoothing neighborhood, in the units of the variable on the x-axis. For instance, above `bandwidth = 3`, which means that bar heights within roughly 3 years of life expectancy in either direction contribute most to the estimate at each point.

- If the bandwidth is increased, averaging is more global, so the density estimate will get smoother.
- If the bandwidth is decreased, averaging is more local, so the density estimate will get wigglier.

There are some methods out there for automating bandwidth choice, but often it is done by hand. Arguably this is preferable, as it allows the analyst to see a few possibilities and decide what best captures the shape of the distribution.
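If you do want an automated starting point, one simple option is the normal-reference rule of thumb (often attributed to Silverman). The sketch below uses made-up data; treat the result as a first guess to adjust by eye, not a definitive choice.

```python
import numpy as np

def reference_bandwidth(sample):
    """Normal-reference rule of thumb: 1.06 * sd * n^(-1/5)."""
    n = len(sample)
    return 1.06 * np.std(sample, ddof = 1) * n ** (-1 / 5)

rng = np.random.default_rng(2)
sample = rng.normal(70, 8, size = 157)   # made-up data
bw = reference_bandwidth(sample)         # roughly 3 for data like these
```

The rule shrinks the bandwidth slowly as the sample grows, reflecting that more data supports more local averaging.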

### Question 4: Selecting a bandwidth

Modify the plotting code by *decreasing* the bandwidth parameter. Try several values, and then choose one that you feel captures the shape of the distribution well without getting too wiggly.

```
hist + base.transform_density(
    density = 'Life Expectancy',
    as_ = ['Life Expectancy at Birth', 'Estimated density'],
    bandwidth = ...,
    extent = [30, 85],
    steps = 1000
).mark_line(color = 'black').encode(
    x = 'Life Expectancy at Birth:Q',
    y = 'Estimated density:Q'
)
```

## Comparing distributions

The visual advantage of a kernel density estimate for discerning shape is even more apparent when comparing distributions.

A major task in exploratory analysis is understanding how the distribution of a variable of interest changes depending on other variables – for example, you have already seen in the last lab that life expectancy seems to change over time. We can explore this phenomenon from a different angle by comparing distributions in different years.

Multiple density estimates can be displayed on the same plot by passing a grouping variable (or set of variables) to `.transform_density(...)`. For example, the cell below computes density estimates of life expectancies for each of two years.

```
alt.Chart(data).transform_filter(
    alt.FieldOneOfPredicate(field = 'Year', oneOf = [2010, 2019])
).transform_density(
    density = 'Life Expectancy',
    groupby = ['Year'],
    as_ = ['Life Expectancy at Birth', 'Estimated Density'],
    bandwidth = 1.8,
    extent = [25, 90],
    steps = 1000
).mark_line().encode(
    x = 'Life Expectancy at Birth:Q',
    y = 'Estimated Density:Q',
    color = 'Year:N'
)
```
```

Often the area beneath each density estimate is filled in. This can be done by layering a copy of the chart with a `.mark_area()` call appended:

```
p = alt.Chart(data).transform_filter(
    alt.FieldOneOfPredicate(field = 'Year', oneOf = [2010, 2019])
).transform_density(
    density = 'Life Expectancy',
    groupby = ['Year'],
    as_ = ['Life Expectancy at Birth', 'Estimated Density'],
    bandwidth = 1.8,
    extent = [25, 90],
    steps = 1000
).mark_line().encode(
    x = 'Life Expectancy at Birth:Q',
    y = 'Estimated Density:Q',
    color = 'Year:N'
)

p + p.mark_area(opacity = 0.1)
```

Notice that this makes it much easier to compare the distributions between years – you can see a pronounced rightward shift of the smooth for 2019 compared with 2010.

We could make the same comparison based on the histograms, but the shift is a lot harder to make out. Overlaid histograms should be avoided.

```
alt.Chart(data).transform_filter(
    alt.FieldOneOfPredicate(field = 'Year', oneOf = [2010, 2019])
).mark_bar(opacity = 0.5).encode(
    x = alt.X('Life Expectancy', bin = alt.Bin(maxbins = 30), title = 'Life Expectancy at Birth'),
    y = alt.Y('count()', stack = None),
    color = 'Year:N'
)
```

### Question 5: Multiple density estimates

Follow the example above to construct a plot showing separate density estimates of life expectancy for each region in 2010. You can choose whether or not to fill in the area beneath the smooth curves. Be sure to play with the bandwidth parameter and choose a value that seems sensible to you.

```
# construct density estimates
p = alt.Chart(data).transform_filter(
    ...
).transform_density(
    density = ...,
    groupby = ...,
    as_ = ...,
    bandwidth = ...,
    extent = ...,
    steps = ...
).mark_line().encode(
    x = ...,
    y = ...,
    color = ...
)

# add shaded area underneath curves
...
```

### Question 6: Interpretation

Do the distributions of life expectancies seem to differ by region? If so, what is one difference that you notice? Answer in 1-2 sentences.

*Type your answer here, replacing this text.*

### Question 7: Outlier

Notice that little peak way off to the left in the distribution of life expectancies in the Americas. That’s an outlier.

- Which country is it? Check by filtering `data` appropriately and using `.sort_values(...)` to find the lowest life expectancy in the Americas. Save the outlying observation as a one-row dataframe called `lowest_Americas` and print the row.
- What was the life expectancy for that country in other years? Filter the data to examine the life expectancy in the country you identified as the outlier in all years. Save the resulting data frame as `outlier_country`.
- What happened in 2010? Can you explain why the life expectancy was so low in that country for that particular year? (*Hint*: if you don't remember, Google the country name and year in question.)

*Type your answer here, replacing this text.*

```
# examine outlier
lowest_Americas = data[
    ...
].sort_values(
    by = ...
).head(1)
lowest_Americas
```

```
# show all observations for country of interest
outlier_country = ...
outlier_country
```

`grader.check("q7")`

# Scatterplot smoothing

In this brief section you’ll see two techniques for smoothing scatterplots: LOESS, which produces a curve; and regression, which produces a linear smooth.

The next parts will modify the dataframe `data` by adding a column. Create a copy `data_mod1` of the original dataframe `data` to modify, so as to not lose track of previous work:

`data_mod1 = data.copy()`

## LOESS

**Lo**cally w**e**ighted **s**catterplot **s**moothing (LOESS) is a flexible smoothing technique for visualizing trends in scatterplots. The technical details are a little involved but quite similar conceptually to kernel density estimation; we’ll just look at the implementation for now.

To illustrate, consider the scatterplots you made in lab 3 showing the relationship between life expectancy and GDP per capita. The plot for 2000 looked like this:

```
# log transform gdp explicitly
data_mod1['log(GDP per capita)'] = np.log(data_mod1['GDP per capita'])

# scatterplot
scatter = alt.Chart(data_mod1).transform_filter(
    alt.FieldEqualPredicate(field = 'Year', equal = 2000)
).mark_circle(opacity = 0.5).encode(
    x = alt.X('log(GDP per capita)', scale = alt.Scale(zero = False)),
    y = alt.Y('Life Expectancy', title = 'Life Expectancy at Birth', scale = alt.Scale(zero = False)),
    size = alt.Size('Population', scale = alt.Scale(type = 'sqrt'))
)

# show
scatter
```

To add a LOESS curve, simply append `.transform_loess()` to the base plot:

```
# compute smooth
smooth = scatter.transform_loess(
    on = 'log(GDP per capita)', # x variable
    loess = 'Life Expectancy', # y variable
    bandwidth = 0.25 # how smooth?
).mark_line(color = 'black')

# add as a layer to the scatterplot
scatter + smooth
```

Just as with kernel density estimates, LOESS curves have a bandwidth parameter that controls how smooth or wiggly the curve is. In Altair, the LOESS bandwidth is a unitless parameter between 0 and 1.
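For intuition, here is a simplified LOESS-style smoother in numpy: at each evaluation point, fit a weighted straight line using roughly the nearest `bandwidth` fraction of the data, with tricube weights so closer points count more. This is a conceptual sketch on made-up data, not Altair's exact algorithm.

```python
import numpy as np

def loess_point(x0, x, y, bandwidth = 0.3):
    """Value at x0 of a straight line fit by weighted least squares near x0."""
    n = len(x)
    k = max(int(np.ceil(bandwidth * n)), 2)   # neighborhood: a fraction of the data
    dist = np.abs(x - x0)
    window = np.sort(dist)[k - 1]             # distance to the k-th nearest point
    u = np.clip(dist / window, 0, 1)
    w = (1 - u**3)**3                          # tricube weights
    X = np.column_stack([np.ones(n), x])
    # weighted least squares: solve (X'WX) beta = X'Wy
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta[0] + beta[1] * x0

# made-up data following a curved trend
rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, 100))
y = np.sin(x) + rng.normal(0, 0.1, 100)

curve = np.array([loess_point(x0, x, y, bandwidth = 0.4) for x0 in x])
```

Shrinking `bandwidth` makes the neighborhoods smaller, so the fitted curve tracks the data more closely and gets wigglier, mirroring the behavior of `.transform_loess()`.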

### Question 8: LOESS bandwidth selection

Tinker with the bandwidth parameter to see its effect in the cell below. Then choose a value that produces a smoothing you find appropriate for indicating the general trend shown in the scatter.

```
# compute smooth
smooth = scatter.transform_loess(
    on = 'log(GDP per capita)',
    loess = 'Life Expectancy',
    bandwidth = ...
).mark_line(color = 'black')

# add as a layer to the scatterplot
scatter + smooth
```

LOESS curves can also be computed groupwise. For instance, to display separate curves for each region, one need only pass a `groupby = ...` argument to `.transform_loess()`:

```
# scatterplot
scatter = alt.Chart(data_mod1).transform_filter(
    alt.FieldEqualPredicate(field = 'Year', equal = 2000)
).mark_circle(opacity = 0.5).encode(
    x = alt.X('log(GDP per capita)', scale = alt.Scale(zero = False)),
    y = alt.Y('Life Expectancy', title = 'Life Expectancy at Birth', scale = alt.Scale(zero = False)),
    size = alt.Size('Population', scale = alt.Scale(type = 'sqrt')),
    color = 'region'
)

# compute smooth
smooth = scatter.transform_loess(
    groupby = ['region'], # add groupby
    on = 'log(GDP per capita)',
    loess = 'Life Expectancy',
    bandwidth = 0.8
).mark_line(color = 'black')

# add as a layer to the scatterplot
scatter + smooth
```

The curves are a little jagged because there aren’t very many countries in each region.

`data_mod1[data_mod1.Year == 2000].groupby('region').count().iloc[:, [0]]`

## Regression

You will be learning more about linear regression later in the course, but we can introduce regression lines now as a visualization technique. As with LOESS, you don’t need to concern yourself with the mathematical details (yet). From this perspective, regression is a form of *linear* smoothing – a regression smooth is a straight line. By contrast, LOESS smooths have *curvature* – they are not straight lines.
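As a sketch of the computation behind a linear smooth (ordinary least squares on made-up data; not Altair's internals), the line layered by a regression transform is the least-squares fit of y on x:

```python
import numpy as np

# made-up data with a linear trend plus noise
rng = np.random.default_rng(4)
x = rng.uniform(6, 12, 150)                 # e.g. log GDP per capita
y = 5 * x + 20 + rng.normal(0, 2, 150)      # e.g. life expectancy

# least-squares line: the 'linear smooth'
slope, intercept = np.polyfit(x, y, deg = 1)
line = slope * x + intercept
```

With enough data the fitted `slope` and `intercept` recover the underlying trend, which is why a regression smooth summarizes an approximately linear relationship so cleanly.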

In the example above, the LOESS curves don't have much curvature. So it may be a cleaner choice visually to show linear smooths. This can be done using `.transform_regression(...)` with a similar argument structure.

```
# scatterplot
scatter = alt.Chart(data_mod1).transform_filter(
    alt.FieldEqualPredicate(field = 'Year', equal = 2000)
).mark_circle(opacity = 0.5).encode(
    x = alt.X('log(GDP per capita)', scale = alt.Scale(zero = False)),
    y = alt.Y('Life Expectancy', title = 'Life Expectancy at Birth', scale = alt.Scale(zero = False)),
    size = alt.Size('Population', scale = alt.Scale(type = 'sqrt')),
    color = 'region'
)

# compute smooth
smooth = scatter.transform_regression(
    groupby = ['region'],
    on = 'log(GDP per capita)',
    regression = 'Life Expectancy'
).mark_line(color = 'black')

# add as a layer to the scatterplot
scatter + smooth
```

### Question 9: Simple regression line

Based on the example immediately above, construct a scatterplot of life expectancy against log GDP per capita in 2010 with points sized according to population (and no distinction between regions). Layer a single linear smooth on the scatterplot using `.transform_regression(...)`.

(*Hint*: remove the color aesthetic and grouping from the previous plot.)

```
# construct scatterplot
scatter = alt.Chart(data_mod1).transform_filter(
    ...
).mark_circle(opacity = 0.5).encode(
    x = ...,
    y = ...,
    size = ...
)

# construct smooth
smooth = scatter.transform_regression(
    on = ...,
    regression = ...
).mark_line(color = 'black')

# layer
...
```

# A neat trick

Let's combine the scatterplot with a smooth from part 2 and the density estimates from part 1. This is an example of combining multiple plots into one visual.

Why combine? Well, sometimes it's useful to visualize the distribution of the variable of interest *together with* its relationship to another variable. Imagine, for example, that you're interested in seeing both:

- the relationship between life expectancy and GDP per capita by region; and
- the distributions of life expectancies by region.

We can flip the density estimates on their side and append them as a facet to the right-hand side of the scatterplot as follows:

```
# scatterplot with linear smooth
scatter = alt.Chart(data_mod1).transform_filter(
    alt.FieldEqualPredicate(field = 'Year', equal = 2000)
).mark_circle(opacity = 0.5).encode(
    x = alt.X('log(GDP per capita)', scale = alt.Scale(zero = False)),
    y = alt.Y('Life Expectancy', title = 'Life Expectancy at Birth', scale = alt.Scale(zero = False)),
    size = alt.Size('Population', scale = alt.Scale(type = 'sqrt')),
    color = 'region'
)
smooth = scatter.transform_regression(
    groupby = ['region'],
    on = 'log(GDP per capita)',
    regression = 'Life Expectancy'
).mark_line(color = 'black')

# density estimates
p = alt.Chart(data_mod1).transform_filter(
    alt.FieldEqualPredicate(field = 'Year', equal = 2000)
).transform_density(
    density = 'Life Expectancy',
    groupby = ['region'], # change here
    as_ = ['Life Expectancy at Birth', 'Estimated density'],
    bandwidth = 2,
    extent = [40, 85],
    steps = 1000
).mark_line(order = False).encode(
    y = alt.Y('Life Expectancy at Birth:Q',
              scale = alt.Scale(domain = (40, 85)),
              title = '',
              axis = None),
    x = alt.X('Estimated density:Q',
              title = '',
              axis = None),
    color = alt.Color('region:N')
).properties(width = 60)

# facet structure
(scatter + smooth) | (p + p.mark_area(order = False, opacity = 0.1))
```

# Submission

- Save the notebook.
- Restart the kernel and run all cells. (**CAUTION**: if your notebook is not saved, you will lose your work.)
- Carefully look through your notebook and verify that all computations execute correctly and all graphics are displayed clearly. You should see **no errors**; if there are any errors, make sure to correct them before you submit the notebook.
- Download the notebook as an `.ipynb` file. This is your backup copy.
- Export the notebook as PDF and upload to Gradescope.