In this tutorial, we will take a real-world dataset, plot scatter charts for it, and then add regression lines to those charts.
A scatter plot is a type of plot that displays the relationship between two variables in a dataset. Adding a regression line to a scatter plot is a great way to understand the relationship between the two numeric variables.
Altair is a Python library built on the Vega and Vega-Lite grammars. Its declarative style lets you spend more time on analyzing and understanding the data and less on the mechanics of drawing the charts.
We will not need to import `Pandas` or `NumPy` directly. The code in this tutorial only imports `Altair` and `vega_datasets`, and the dataset it loads arrives as a `Pandas` DataFrame that Altair can plot as is.
Implementing a Regression Line on a Scatter Plot using Python Altair
We will start by importing the Altair and vega_datasets libraries, which provide the plotting functions and the dataset we will be working with in the later sections.
import altair as alt
from vega_datasets import data
In this tutorial, we will use the Seattle weather dataset, which ships with vega_datasets and can be loaded using the code below.
seattle_weather_data = data.seattle_weather()
print(seattle_weather_data.head())

We will start by plotting simple scatter charts with the mark_point function using the code below. We will plot three different relationships:
Minimum Temperature and Maximum Temperature
alt.Chart(seattle_weather_data).mark_point().encode(
    x='temp_max',
    y='temp_min'
)

Wind and Minimum Temperature
alt.Chart(seattle_weather_data).mark_point().encode(
    x='temp_min',
    y='wind'
)

Wind and Maximum Temperature
alt.Chart(seattle_weather_data).mark_point().encode(
    x='temp_max',
    y='wind'
)

Plotting Regression Line using Altair
The next and final step is to plot a regression line on each of the plots we have just seen. We can compute the line with the transform_regression function and add it as another layer on top of the scatter plot.
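To make concrete what a straight regression line is, here is a small sketch in plain Python that fits one by ordinary least squares. It is only an illustration of the idea, assuming the default linear method; it is not Altair's own implementation, and the function name `fit_line` and the sample numbers are made up for this example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With real weather data the points will not sit exactly on a line, and the fitted line is the one that minimizes the total squared vertical distance to the points.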
Minimum Temperature and Maximum Temperature
# Scatter plot of the raw points
points = alt.Chart(seattle_weather_data).mark_point().encode(
    x='temp_max',
    y='temp_min'
)
# Fit temp_min against temp_max and layer the fitted line in red on top
points + points.transform_regression('temp_max', 'temp_min').mark_line(color='red')

Wind and Minimum Temperature
# Scatter plot of the raw points
points = alt.Chart(seattle_weather_data).mark_point().encode(
    x='temp_min',
    y='wind'
)
# Fit wind against temp_min and layer the fitted line in red on top
points + points.transform_regression('temp_min', 'wind').mark_line(color='red')

Wind and Maximum Temperature
# Scatter plot of the raw points
points = alt.Chart(seattle_weather_data).mark_point().encode(
    x='temp_max',
    y='wind'
)
# Fit wind against temp_max and layer the fitted line in red on top
points + points.transform_regression('temp_max', 'wind').mark_line(color='red')

Conclusion
I hope you are now clear on how to plot regression lines on basic scatter plots using Altair in Python. Thank you for reading!