Python Altair is a data visualization library that lets you create interactive charts for exploring data.
To become a good data scientist, you need to be able to build plots that convey complex information yet remain easy to understand.
Visualizations are a great way to tell the underlying story of your data.
They illustrate the relationships within the data and expose information that raw numbers alone cannot communicate to the human eye.
But you know what’s even better for data exploration than visualizations? Visualizations that are interactive!
Sadly, for a beginner, building them can seem like a daunting task.
To help you with this task, Python and R both offer a wide range of tools.
In this tutorial, we will introduce you to Altair.
With Altair, you’ll be able to construct meaningful, beautiful, and effective visualizations with only a few lines of code and in very little time. So let’s start now!
Table of Contents
- 1 What is Python Altair?
- 2 Working with the Python Altair Library
- 3 Ending Note
What is Python Altair?
Altair is a Python library intended for statistical visualization. It is declarative by nature: you describe which data columns map to which visual properties (axes, color, size), and the library works out how to draw the result.
It is based on Vega and Vega-Lite, both of which are visualization grammars that let you describe a visualization’s visual appearance and interactive behavior in a JSON format.
As a data scientist, Altair lets you spend your time understanding, analyzing, and visualizing your data rather than writing plotting code.
Working with the Python Altair Library
Let’s now work with the Altair library. We’ll use a dataset from the vega_datasets package here. I’ve shared the link in the datasets section.
1. Installing the Altair module
To install the Python Altair library, we can use the pip package manager:
pip install altair
pip install vega_datasets
I’m using Google Colab, where it’s already installed, so we can import it directly:
import pandas as pd
import altair as alt
from vega_datasets import data as vega_data
2. Preparing the dataset
Today we’ll be using the flights_2k dataset from the vega_datasets package. I chose it because it is small and loads quickly, unlike the much larger flights_3m dataset.
3. Fetching data with Pandas
We can fetch the data with the Python Pandas library by passing the dataset’s url attribute to read_json, as shown on the first line below:
flights_data = pd.read_json(vega_data.flights_2k.url)
flights_data.head(10)
This gives us our data:
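If you cannot download the dataset (for example, when working offline), a small hand-made DataFrame with the same column names can stand in for it while you follow along. The rows below are invented for illustration and are not taken from flights_2k; the column names are the ones used in the plotting code in this tutorial.

```python
import pandas as pd

# Invented rows, NOT real flights_2k records
sample_flights = pd.DataFrame({
    "origin": ["AAA", "AAA", "BBB", "BBB"],
    "destination": ["BBB", "CCC", "AAA", "CCC"],
    "delay": [12, -4, 35, 0],
    "distance": [370, 920, 370, 1450],
})

# The same kind of quick look as flights_data.head(10)
print(sample_flights.head(10))
print(sample_flights.dtypes)
```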
4. Plotting a dataset using Python Altair
Altair is designed around the Pandas DataFrame, which means you can prepare your data with the usual Pandas operations and pass the result straight to a chart.
And while a DataFrame is the most convenient input, there are other ways to supply data, such as a URL that points to a JSON or CSV file.
We use the alt.Chart class to plot:
alt.Chart(flights_data).mark_point().encode(
    alt.X('delay'),
    alt.Y('distance')
)
5. Making plots interactive with Altair
Now we’ll take it to the next level. Let’s add the ability to interact with the plot, including:
- zooming into the plot
- panning across the plot
- viewing information while hovering
Add the tooltip option to show values on hover, then call the interactive method to enable zooming and panning:
alt.Chart(flights_data).mark_point().encode(
    alt.X('delay'),
    alt.Y('distance'),
    tooltip=[
        alt.Tooltip('delay'),
        alt.Tooltip('distance'),
    ]
).interactive()
This will give us:
As you can see, we can zoom into any region of the plot to take a closer look at the data.
Complete implementation of an interactive plot in Python
And that’s all. I’ve made a few more plots in my Colab notebook using this code, so try them out:
import pandas as pd
import altair as alt
from vega_datasets import data as vega_data

# Load the flights_2k dataset into a DataFrame
flights_data = pd.read_json(vega_data.flights_2k.url)
flights_data.head(10)

# Interactive scatter plot with tooltips
alt.Chart(flights_data).mark_point().encode(
    alt.X('delay'),
    alt.Y('distance'),
    tooltip=[
        alt.Tooltip('delay'),
        alt.Tooltip('distance'),
    ]
).interactive()

# Delay per origin airport, with point size scaled by distance
alt.Chart(flights_data).mark_point(filled=True).encode(
    alt.X('origin'),
    alt.Y('delay'),
    alt.Size('distance')
)

# Median delay per origin (select the numeric column before aggregating)
median_delay = flights_data.groupby('origin')['delay'].median()

# Origin against destination, drawn as semi-transparent red points
alt.Chart(flights_data).mark_point(filled=True).encode(
    alt.X('origin'),
    alt.Y('destination'),
    alt.Size('distance')
).configure_mark(
    opacity=0.2,
    color='red'
)