Value Sorting Using Pandas: 4 Methods To Know


Sorting, in layman’s terms, means arranging the data in a meaningful order so that it is easy to analyze and visualize. Pandas being the go-to tool for data processing, we use sort_values() most of the time to sort the data. You can use the sort_index() function as well, but here our focus will be on value sorting using pandas. Without much intro, let’s discuss some of the key value sorting operations using pandas in Python.
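To make the difference between the two functions concrete, here is a minimal sketch. The tiny `toy` dataframe and its index labels are made up for illustration and are not part of the mtcars data used later.

```python
import pandas as pd

# Hypothetical toy data, not taken from the mtcars file
toy = pd.DataFrame({"mpg": [21.0, 10.4, 22.8]}, index=["b", "c", "a"])

by_value = toy.sort_values("mpg")  # rows ordered by the values in the mpg column
by_index = toy.sort_index()        # rows ordered by the index labels

print(by_value.index.tolist())  # ['c', 'b', 'a']
print(by_index.index.tolist())  # ['a', 'b', 'c']
```

Both calls return a new dataframe and leave `toy` as it was.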


Pandas in Python

  • Pandas is an open-source Python library for data analysis.
  • It provides many functions to process the data.
  • You can install pandas using this command: pip install pandas.
  • You can inspect, merge, slice, sort, and drop values using its many functions.

Some of the key library operations include:

  1. Dataframe
  2. Reading and Writing data
  3. Missing data
  4. Duplicates
  5. Slicing
  6. Reshaping
  7. Indexing
  8. Time-series and more…

That is enough background on routine pandas operations in Python. Next, we will dive into the most useful and important value sorting operations using pandas.


Import the data – Value sorting using pandas

For this whole illustration, we will be using the mtcars dataset. You can download it from here. For your convenience, a glance at the dataset is provided below. Have a look!

#Import pandas and data

import pandas as pd

#data

df = pd.read_csv('mtcars.csv')
[Figure: preview of the mtcars dataset]

Let’s explore the data to examine its shape and variables.

#Shape of the data

df.shape
(32, 12)
#Data attributes 

df.columns 
Index(['model', 'mpg', 'cyl', 'disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am',
       'gear', 'carb'],
      dtype='object')

You can also check a column for duplicated values using the value_counts() function. If no value in the column is repeated, every count will be one.
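As a rough sketch of that duplicate check, the snippet below uses a small hand-typed sample (an assumption, so that it runs without the CSV file) in which one model name is repeated on purpose. It also shows duplicated(), a related pandas method that flags fully repeated rows.

```python
import pandas as pd

# Hypothetical sample with one repeated row, for illustration only
cars = pd.DataFrame({
    "model": ["Mazda RX4", "Valiant", "Mazda RX4"],
    "mpg": [21.0, 18.1, 21.0],
})

counts = cars["model"].value_counts()
repeated = counts[counts > 1]         # values that occur more than once
n_dup_rows = cars.duplicated().sum()  # rows identical to an earlier row

print(repeated.index.tolist())  # ['Mazda RX4']
print(n_dup_rows)               # 1
```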

Tip: For the simple and quick visualizations, use the plot function. Let’s see how we can quickly analyze a data attribute.

Here we will quickly analyze and visualize the 'cyl' attribute of the data. (It’s just an add-on tip. You can skip this section).

#Analyse 

df['cyl'].value_counts()
8    14
4    11
6     7
Name: cyl, dtype: int64
#Visualize

df['cyl'].value_counts().plot(kind ='barh')
[Figure: bar chart of the cyl value counts]
  • Almost half of the cars in the dataset (14 of 32) have 8 cylinders. I hope you find this useful!

1. Sorting single column

First, we will see how we can sort by a single column in the dataset. In the mtcars data, we are going to sort by the mpg attribute, which is miles per gallon, or simply the mileage of the car. Let’s see how we can do this using the sort_values function offered by pandas.

#Sort single column

df.sort_values('mpg', inplace  = True)

#view data

df.head(5)
[Figure: first five rows after sorting by mpg]

Here, you can see that all the values in the mpg attribute are sorted in ascending order. Make sure that you pass the inplace = True argument to make the changes in the original dataframe itself. Otherwise, sort_values returns a new sorted dataframe and leaves the original unchanged.

You can also pass the ascending = False argument to sort the values in descending order.
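The sketch below illustrates the inplace and ascending behaviour on a three-row sample typed by hand (an assumption, so the snippet runs without mtcars.csv).

```python
import pandas as pd

# Small hand-typed sample, for illustration only
cars = pd.DataFrame({
    "model": ["Mazda RX4", "Datsun 710", "Valiant"],
    "mpg": [21.0, 22.8, 18.1],
})

# Without inplace, sort_values returns a new dataframe
descending_copy = cars.sort_values("mpg", ascending=False)
original_order = cars["model"].tolist()  # cars itself is still unsorted

# With inplace=True, cars is modified and the call returns None
cars.sort_values("mpg", inplace=True)

print(original_order)                     # ['Mazda RX4', 'Datsun 710', 'Valiant']
print(descending_copy["model"].tolist())  # ['Datsun 710', 'Mazda RX4', 'Valiant']
print(cars["model"].tolist())             # ['Valiant', 'Mazda RX4', 'Datsun 710']
```

Assigning the result back, as in cars = cars.sort_values("mpg"), is a common alternative to inplace = True.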


2. Reset Index

Did you observe the index values in the previous output?

If not, have a look!

They look messy and shuffled. So, it is a good idea to reset the index after sorting the values. It keeps the data tidy and readable.

#resetting index

df.sort_values('mpg', inplace  = True, ignore_index = True)
#view data

df.head(5)

Here, you can see that our index is reset and now it looks good!

To reset the index, pass the ignore_index = True argument to the function.
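Here is a minimal sketch of the index behaviour, again on hand-typed sample values rather than the real file. It assumes pandas 1.0 or newer, where sort_values accepts ignore_index. It also shows reset_index(drop=True), which gives the same result.

```python
import pandas as pd

# Hand-typed sample values, for illustration only
cars = pd.DataFrame({"mpg": [21.0, 22.8, 18.1]})

shuffled = cars.sort_values("mpg")                           # keeps the old labels
clean = cars.sort_values("mpg", ignore_index=True)           # relabels rows 0..n-1
also_clean = cars.sort_values("mpg").reset_index(drop=True)  # same result in two steps

print(shuffled.index.tolist())    # [2, 0, 1]
print(clean.index.tolist())       # [0, 1, 2]
print(also_clean.index.tolist())  # [0, 1, 2]
```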


3. Sort multiple columns

Now, let’s see how we can sort by multiple columns at once. Sorting one column at a time would take too much time and code when the data has many attributes.

#Sort multiple columns

df = df.sort_values(["mpg","disp"], ignore_index = True, ascending = [True, False])
#view data

df.head(3)
[Figure: first three rows after sorting by mpg and disp]

Well, you can see the output above. Just like this, you can sort by multiple columns at once. You can pass a list of Booleans to ascending to set the sort direction for each column, as shown in the code.
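The second column only matters when the first one has ties. The sketch below uses three hand-typed rows (an assumption for illustration), two of which share the same mpg, to show how disp breaks the tie in descending order.

```python
import pandas as pd

# Hand-typed sample with a tie in mpg, for illustration only
cars = pd.DataFrame({
    "model": ["Lincoln Continental", "Valiant", "Cadillac Fleetwood"],
    "mpg": [10.4, 18.1, 10.4],
    "disp": [460.0, 225.0, 472.0],
})

# mpg ascending first; rows with equal mpg are ordered by disp descending
result = cars.sort_values(
    ["mpg", "disp"], ascending=[True, False], ignore_index=True
)

print(result["model"].tolist())
# ['Cadillac Fleetwood', 'Lincoln Continental', 'Valiant']
```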


4. Never forget Missing values

Yes, the sort_values function offers the na_position argument to control where the missing values appear in the sorted data. But, we don’t have any missing values in our data as of now.

So, we need to create a temp NA value in the data. Then, we will sort the values.

#create Na values

import numpy as np
df.iloc[1:2,1:4] = np.nan
df.head(2)

We have successfully introduced NA values in row 1, columns 1 to 3 (mpg, cyl and disp) of the dataframe. It simply means that we have introduced the NA values through positional indexing.
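If the slice notation feels opaque, this sketch repeats the same iloc assignment on a small hand-typed sample (an assumption for illustration, with numeric columns stored as floats) and counts the missing values per column with isna().

```python
import numpy as np
import pandas as pd

# Hand-typed sample, for illustration only
cars = pd.DataFrame({
    "model": ["Mazda RX4", "Datsun 710", "Valiant"],
    "mpg": [21.0, 22.8, 18.1],
    "cyl": [6.0, 4.0, 6.0],
    "disp": [160.0, 108.0, 225.0],
})

# Row slice 1:2 selects only row 1; column slice 1:4 selects columns 1, 2 and 3
cars.iloc[1:2, 1:4] = np.nan

na_counts = cars.isna().sum()  # number of missing values per column
print(na_counts.tolist())      # [0, 1, 1, 1]
```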

Now, we sort the values of the mpg attribute.

#sorting

df.sort_values('mpg', inplace  = True)

  • You can see that our mpg values are sorted. By default, sort_values places the NA values at the end, so they are easy to miss when you only look at the top rows.
  • How can we bring the NA values into view? Here comes na_position.
#NA position

df.sort_values(["mpg"], na_position="first").head()

We have set na_position to "first" in our code, and our NA values now appear at the top. It is a very handy argument, as it lets us control where the missing values land in the sorted data.
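For a side-by-side view, here is a minimal sketch on a hand-typed sample with one missing mpg (an assumption for illustration). It compares the default placement with na_position="first" and also uses isna() to locate the missing row directly.

```python
import numpy as np
import pandas as pd

# Hand-typed sample with one missing mpg, for illustration only
cars = pd.DataFrame({
    "model": ["Mazda RX4", "Datsun 710", "Valiant"],
    "mpg": [21.0, np.nan, 18.1],
})

na_last = cars.sort_values("mpg")                        # default: NA at the end
na_first = cars.sort_values("mpg", na_position="first")  # NA at the top
missing_rows = cars[cars["mpg"].isna()]                  # rows where mpg is missing

print(na_last["model"].tolist())    # ['Valiant', 'Mazda RX4', 'Datsun 710']
print(na_first["model"].tolist())   # ['Datsun 710', 'Valiant', 'Mazda RX4']
print(missing_rows.index.tolist())  # [1]
```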


Wrapping Up – Value sorting using pandas

As I said earlier, pandas is the go-to tool in Python for data analysis operations. Using the sort_values() function, you can perform many operations that help you in the analysis as well as in assessing the data quality and distribution. I hope you find this tutorial on value sorting using pandas helpful, and it will be great if it can save some time for you!

That’s all for now. Happy Python!

Further reading: Pandas documentation
