3 Easy Ways to Create a Subset of Python Dataframe

Published on August 3, 2022
By Safa Mulani

Hello, readers! In this article, we will look at three ways to create a subset of a Python DataFrame.

So, let us get started!


First, what is a Python Dataframe?

The Python pandas module provides two data structures for storing values: Series and DataFrame.

A DataFrame is a two-dimensional, tabular data structure, i.e. it holds data in labeled rows and columns. We can therefore create and access subsets of it in the following ways:

  • Select specific rows as a subset
  • Select specific columns as a subset
  • Select specific rows together with specific columns as a subset

Now that we know what a DataFrame and a subset are, let us look at the different techniques for creating a subset of a DataFrame.


Creating a Dataframe to work with!

To create subsets of a dataframe, we need to create a dataframe. Let’s get that out of our way first:

import pandas as pd
# Sample data: one list of values per column
data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
}
block = pd.DataFrame(data)
print("Original Data frame:\n")
print(block)

Output:

Original Data frame:

   Roll-num  Age    NAME
0        10   12    John
1        20   14  Camili
2        30   13  Rheana
3        40   12  Joseph
4        50   14  Amanti
5        60   13   Alexa
6        70   15    Siri

Here, we have created a DataFrame using the pandas.DataFrame() constructor. We will be using this dataset throughout the article.

Let us begin!


1. Create a subset of a Python dataframe using loc

The loc indexer lets us form a subset of a DataFrame from specific rows, specific columns, or a combination of both. Although it is often called the loc() function, it is used with square brackets rather than parentheses.

loc works on the basis of labels, i.e. we need to provide it with the labels of the rows/columns to choose in order to create the customized subset.

Syntax:

pandas.DataFrame.loc[]
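
Because loc works on labels rather than positions, the labels do not have to be integers. The following is a minimal sketch of that idea. It assumes a hypothetical copy of the DataFrame, named by_name here (it is not part of the original example), whose rows are labelled by the NAME column:

```python
import pandas as pd

data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
}
block = pd.DataFrame(data)

# Hypothetical variant for illustration: use NAME as the row labels
by_name = block.set_index("NAME")

# A single label returns that row as a Series
row = by_name.loc["Rheana"]
print(row["Age"])  # 13

# A list of row labels plus a list of column labels returns a DataFrame
subset = by_name.loc[["John", "Siri"], ["Age"]]
print(subset)
```

With the default index used in the rest of this article, the row labels happen to be the integers 0 to 6, which is why the examples below pass integers to loc.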

Example 1: Extract data of specific rows of a dataframe

block.loc[[0,1,3]]

Output:

As seen below, we have created a subset which includes all the data of rows 0, 1, and 3.

   Roll-num  Age    NAME
0        10   12    John
1        20   14  Camili
3        40   12  Joseph

Example 2: Create a subset of rows using slicing

block.loc[0:3]

Here, we have extracted all the rows from label 0 to label 3 using the slicing operator with loc. Note that, unlike ordinary Python slicing, a loc slice includes the end label, so row 3 is part of the result.

Output:

   Roll-num  Age    NAME
0        10   12    John
1        20   14  Camili
2        30   13  Rheana
3        40   12  Joseph
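
One detail worth checking when slicing with loc is whether the end of the slice is included. The sketch below rebuilds the same DataFrame and compares a loc slice with the equivalent iloc slice (iloc is covered in section 2) and with a plain Python list slice:

```python
import pandas as pd

data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
}
block = pd.DataFrame(data)

# Label-based slice: both end points are included (rows 0, 1, 2, 3)
by_label = block.loc[0:3]

# Position-based slice: the end point is excluded (rows 0, 1, 2)
by_position = block.iloc[0:3]

# Plain Python list slice, for comparison: the end point is excluded
ages = data["Age"][0:3]

print(len(by_label), len(by_position), len(ages))  # 4 3 3
```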

Example 3: Create a subset of particular columns using labels

block.loc[0:2,['Age','NAME']]

Output:

   Age    NAME
0   12    John
1   14  Camili
2   13  Rheana

Here, we have created a subset which includes rows 0 to 2, but only the columns ‘Age’ and ‘NAME’.
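
The same row/column form of loc can also keep every row, or take a range of columns. As a small sketch on the same data, a bare colon selects all rows, and a slice of column labels selects a run of adjacent columns (like row slices, it includes the end label):

```python
import pandas as pd

data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
}
block = pd.DataFrame(data)

# All rows, two named columns
all_rows = block.loc[:, ["Age", "NAME"]]
print(all_rows.shape)  # (7, 2)

# Rows 0 to 2, columns from 'Roll-num' through 'Age' (end label included)
col_range = block.loc[0:2, "Roll-num":"Age"]
print(list(col_range.columns))  # ['Roll-num', 'Age']
```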


2. Using iloc to create a subset of a dataframe

The iloc indexer lets us create a subset by choosing specific rows and columns based on their integer positions.

That is, unlike loc, which works on labels, iloc works on zero-based positions. We can create a subset of a DataFrame by providing the position numbers of the rows and columns we want. Like loc, it is used with square brackets.

Syntax:

pandas.DataFrame.iloc[]

Example:

block.iloc[[0,1,3,6],[0,2]]

Here, we have created a subset which includes the rows at positions 0, 1, 3, and 6 and the columns at positions 0 and 2, i.e. ‘Roll-num’ and ‘NAME’.

Output:

   Roll-num    NAME
0        10    John
1        20  Camili
3        40  Joseph
6        70    Siri
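
With the default index, row labels and row positions are the same numbers, so the difference between loc and iloc is easy to miss. The sketch below makes it visible. It assumes a hypothetical copy of the DataFrame, named by_roll here (it is not part of the original example), that uses the Roll-num column as its row labels:

```python
import pandas as pd

data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
}
block = pd.DataFrame(data)

# Hypothetical variant for illustration: row labels are now 10, 20, ..., 70
by_roll = block.set_index("Roll-num")

# iloc uses the position: the first row is position 0
first_by_position = by_roll.iloc[0]

# loc uses the label: the first row is labelled 10
first_by_label = by_roll.loc[10]

print(first_by_position.equals(first_by_label))  # True

# There is no row labelled 0, so loc raises a KeyError here
try:
    by_roll.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False
print(label_zero_found)  # False
```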

3. Indexing operator to create a subset of a dataframe

The simplest approach is the indexing operator, i.e. square brackets. Passing a list of column names inside the brackets returns a subset containing only those columns.

Syntax:

dataframe[['col1','col2','colN']]

Example:

block[['Age','NAME']]

Here, we have selected all the data values of the columns ‘Age’ and ‘NAME’.

Output:

   Age    NAME
0   12    John
1   14  Camili
2   13  Rheana
3   12  Joseph
4   14  Amanti
5   13   Alexa
6   15    Siri
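
The double square brackets in the example matter: the outer pair is the indexing operator and the inner pair is a Python list of column names. As a short sketch on the same data, passing a single column name without the inner list returns a Series rather than a DataFrame:

```python
import pandas as pd

data = {
    "Roll-num": [10, 20, 30, 40, 50, 60, 70],
    "Age": [12, 14, 13, 12, 14, 13, 15],
    "NAME": ["John", "Camili", "Rheana", "Joseph", "Amanti", "Alexa", "Siri"],
}
block = pd.DataFrame(data)

# Single brackets with one column name: a one-dimensional Series
age_series = block["Age"]
print(type(age_series).__name__)  # Series

# Double brackets with a one-item list: a DataFrame with one column
age_frame = block[["Age"]]
print(type(age_frame).__name__)  # DataFrame
print(age_frame.shape)  # (7, 1)
```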

Conclusion

With this, we have come to the end of this topic. Feel free to comment below if you have any questions. For more such posts related to Python, stay tuned, and till then, Happy Learning!! :)
