Analyzing the Deaths on the 14 Tallest Mountains Using Python


Hey learner! In this tutorial, we will take a dataset and learn how to analyze it and extract as much information from it as we can. We will use the Mountain Deaths dataset, which is readily available on Kaggle.

Let’s not wait and get started already!


What Does the Dataset Contain?

The dataset we will be using in this tutorial is available on Kaggle. According to its Kaggle page, the dataset is described as follows:

The International Climbing and Mountaineering Federation, commonly known by its French name Union Internationale des Associations d’Alpinisme (UIAA) recognizes 14 mountains that are more than 8,000 meters (26,247 ft) in height above sea level, and are considered to be sufficiently independent of neighboring peaks. These mountains are popularly called eight-thousanders. Even though all eight-thousanders have been summited, more than 1000 people have died trying to make it to the summits of these mountains.

The dataset contains the following columns for each of the 14 mountains:

  1. Date: Date on which the mountaineer died
  2. Name: Name of the deceased
  3. Nationality: The country which the mountaineer belonged to
  4. Cause of death: Reason for the death

Analyzing the Mountain Deaths Using Python

Firstly, we import all of the libraries that we will need for our analysis in the later sections.

import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

The next thing we are going to do is combine all 14 CSV files into a single DataFrame so that we can analyze all the peaks together.

The code for this is below. Make sure that all the CSV files are in the same directory as the code file before running it. All the data is stored in a single variable, DATA.

# Collect every CSV file in the current directory (one file per peak)
all_csv = sorted(f for f in os.listdir('.') if f.endswith('.csv'))

frames = []
for csv_file in all_csv:
    temp_data = pd.read_csv(csv_file)
    # Use the file name without its extension as the peak name
    temp_data['Peak Name'] = os.path.splitext(csv_file)[0]
    frames.append(temp_data)

# DataFrame.append was removed in pandas 2.0, so combine with pd.concat
DATA = pd.concat(frames, ignore_index=True)
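As a variation on the loop above, the same combination can be sketched with pathlib and a small helper function. The function name combine_peak_csvs and its directory parameter are illustrative assumptions, not part of the original code. The sketch also assumes one CSV file per peak, named after the peak.

```python
from pathlib import Path

import pandas as pd


def combine_peak_csvs(directory="."):
    """Read every CSV in `directory` and stack them into one DataFrame.

    Hypothetical helper: assumes each file holds the deaths for one peak
    and that the file name (without extension) is the peak name.
    """
    frames = []
    for csv_path in sorted(Path(directory).glob("*.csv")):
        frame = pd.read_csv(csv_path)
        frame["Peak Name"] = csv_path.stem  # e.g. "everest.csv" -> "everest"
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"No CSV files found in {directory!r}")
    return pd.concat(frames, ignore_index=True)
```

Calling DATA = combine_peak_csvs('.') should give the same kind of combined frame, and the explicit error makes an empty directory easier to spot.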

The data will look something like the image below.

[Figure: Mountain Deaths Dataset]

Some Preliminary Analysis

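The first thing we will look at is the describe function, which summarizes the columns of the dataset. For numeric features it reports the count, mean, standard deviation, min, quartiles, and max; for text-only data it reports the count, the number of unique values, the most frequent value, and its frequency.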

DATA.describe()

[Figure: Mountain Deaths Dataset Description]
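By default, describe skips text columns when at least one numeric column is present. Here is a minimal sketch of asking for a summary of every column, using a tiny made-up sample (the rows are placeholders that only mimic the column layout, not real records from the dataset):

```python
import pandas as pd

# Made-up placeholder rows that only mimic the dataset's column layout
sample = pd.DataFrame({
    "Name": ["Climber A", "Climber B", "Climber C"],
    "Nationality": ["Nepal", "Nepal", "Japan"],
    "Cause of death": ["Avalanche", "Fall", "Avalanche"],
})

# include="all" summarizes text columns too: count, unique, top, freq
summary = sample.describe(include="all")
print(summary)
```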

The count function returns the number of non-missing values in each column.

DATA.count()

[Figure: Mountain Deaths Dataset Count Rows]
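Because count only counts non-missing entries, comparing it with the number of missing entries shows where the gaps are. A small sketch on made-up rows (placeholders, not records from the dataset):

```python
import pandas as pd

# Placeholder rows; one cause of death is deliberately left missing
sample = pd.DataFrame({
    "Name": ["Climber A", "Climber B", "Climber C"],
    "Cause of death": ["Avalanche", None, "Fall"],
})

non_missing = sample.count()      # non-missing values per column
missing = sample.isna().sum()     # missing values per column
print(pd.DataFrame({"non_missing": non_missing, "missing": missing}))
```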

We can also check the data type of every column in the dataset using this syntax:

DATA.dtypes

[Figure: Mountain Deaths Dataset DataTypes]
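Columns read from CSV files often come back as plain text, dates included. Assuming the Date column holds text, here is a hedged sketch of converting it with pd.to_datetime. The date strings and their format are invented for illustration, and the real files may use a different layout.

```python
import pandas as pd

# Invented date strings; the actual files may format dates differently
sample = pd.DataFrame({"Date": ["2001-01-15", "2010-07-04", "unknown"]})

# errors="coerce" turns unparseable entries into NaT instead of raising
sample["Date"] = pd.to_datetime(sample["Date"], errors="coerce")
sample["Year"] = sample["Date"].dt.year

print(sample.dtypes)
```

With a proper datetime column, deaths can then be grouped by year or decade.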

Next, we can use the unique function to find the distinct values in a particular column. Let’s look at the unique values of the ‘Nationality’ column in our dataset.

print(DATA['Nationality'].unique())

[Figure: Mountain Deaths Dataset Unique Nationalities]
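When the list of unique values is long, a simple count is often easier to read. A small sketch using nunique and a sorted view of the unique values, on placeholder rows:

```python
import pandas as pd

# Placeholder rows with one missing nationality
sample = pd.DataFrame({"Nationality": ["Nepal", "Japan", "Nepal", None, "Italy"]})

n_nationalities = sample["Nationality"].nunique()   # missing values are ignored
sorted_names = sorted(sample["Nationality"].dropna().unique())

print(n_nationalities, sorted_names)
```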

Some Basic Visualizations for Mountain Deaths

First, let’s find out which mountain has seen the largest number of deaths, using the code below.

sns.catplot(x='Peak Name',kind='count',data=DATA,height=10,aspect=20/10)
plt.xticks(rotation=90)
plt.show()

[Figure: Mountain Deaths Mountain Death Count]

From the plot, we can clearly see that Everest has had the highest number of deaths!
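A count plot shows the ranking visually; the exact numbers behind it can be read with value_counts. A sketch on placeholder rows (the peak names are real, the counts are made up):

```python
import pandas as pd

# Made-up rows: only the "Peak Name" column matters for this example
sample = pd.DataFrame(
    {"Peak Name": ["everest", "k2", "everest", "annapurna", "everest", "k2"]}
)

deaths_per_peak = sample["Peak Name"].value_counts()   # sorted, largest first
peak_order = deaths_per_peak.index.tolist()

print(deaths_per_peak)
```

Passing order=peak_order to sns.catplot should draw the bars from the most deaths to the fewest, which makes the plot easier to read.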

Next, we can see what the main cause of death has been, using the code below.

sns.catplot(x='Cause of death',kind='count',data=DATA,height=10,aspect=30/10)
plt.xticks(rotation=90)
plt.show()

[Figure: Mountain Deaths Death Cause Count]

We can see that avalanches killed the largest number of climbers, making them the deadliest cause in this dataset. Unfortunately, avalanches are beyond a climber's control; they are a risk climbers accept when they set out.
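Free-text columns such as 'Cause of death' can contain the same cause written in slightly different ways, which splits one bar into several in a count plot. Assuming that happens here (this has not been checked against the actual files), a sketch of light normalization before counting, on made-up values:

```python
import pandas as pd

# Made-up values showing the same cause with inconsistent spelling
sample = pd.DataFrame({
    "Cause of death": ["Avalanche", "avalanche ", "Fall", " Avalanche", "Exposure"],
})

# Trim whitespace and unify capitalization so variants collapse together
cleaned = sample["Cause of death"].str.strip().str.capitalize()
top_causes = cleaned.value_counts().head(3)

print(top_causes)
```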

Lastly, we can look at which nationalities have lost the most climbers over the years, using the code below.

sns.catplot(x='Nationality',kind='count',data=DATA,height=10,aspect=20/10)
plt.xticks(rotation=90)
plt.show()

[Figure: Mountain Deaths Nationality Death Count]

Of all the nationalities, climbers from Nepal account for the highest number of deaths here. Note that this is a count rather than a rate, since the dataset does not record how many climbers from each country attempted the peaks. You may want to dig further to see whether cause of death and nationality are related, and whether any of the causes are preventable.
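One way to start exploring the relationship between nationality and cause of death is a cross-tabulation. A minimal sketch with pd.crosstab on placeholder rows (not real records):

```python
import pandas as pd

# Placeholder rows that only mimic two of the dataset's columns
sample = pd.DataFrame({
    "Nationality": ["Nepal", "Nepal", "Japan", "Japan", "Nepal"],
    "Cause of death": ["Avalanche", "Avalanche", "Fall", "Avalanche", "Fall"],
})

# Rows: nationality, columns: cause, cells: number of deaths
table = pd.crosstab(sample["Nationality"], sample["Cause of death"])
print(table)
```

Adding normalize='index' to the call would turn the counts into shares within each nationality, which is easier to compare across countries with very different totals.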

Conclusion

You now have a basic workflow for exploring a dataset: combine the files, inspect the columns, and plot the counts. Many more visualizations are possible as well!

Keep reading to learn more!

Thank you for reading!
