Data Mapping Using Numpy and Pandas in Python


Data manipulation or transformation is a key aspect of any analysis, because raw data rarely yields insights that make sense as it is. You have to transform raw data into meaningful data. You may need to create new variables, bring the data into one consistent form, or rearrange it before you can make sense of it.

This helps you identify anomalies and extract more insights than you might expect. In this article, we will discuss some of the pandas and NumPy functions that help with data mapping and replacement in Python.


1. Create a Data Set

For the data mapping examples, let’s create a simple student grade dataset using the pandas DataFrame constructor. It has 2 columns: one for the student’s name and another for the student’s grade.

#Create a dataset

import pandas as pd

student =  {'Name':['Mike','Julia','Trevor','Brooks','Murphy'],'Grade':[3.5,4,2.1,4.6,3.1]}

df = pd.DataFrame(student)

df

	Name	Grade
0	Mike	3.5
1	Julia	4.0
2	Trevor	2.1
3	Brooks	4.6
4	Murphy	3.1

Now we have a simple student dataset. Let’s see how we can map and replace values as part of the data transformation process.


2. Replacing Values in the data

So, we have data with 5 records and 2 attributes. Now, we get a message from the class teacher that Murphy actually secured a grade of 5 and is the topper in the class. We need to replace his old grade with the new one, as per the teacher’s words.

So, here we go…

#Replacing data

df['Grade'] = df['Grade'].replace([3.1],5)

#Updated data

df

	Name	Grade
0	Mike	3.5
1	Julia	4.0
2	Trevor	2.1
3	Brooks	4.6
4	Murphy	5.0

That’s great! We have successfully replaced the old grade (value) with a new grade (value). This is just a toy example. Here is how the same process shows up in a real-world project.

In a real-world use case, you submit a data quality report to the client and ask for correct values wherever you find that the current data is not good. After you get the revised data from the client, you replace the old records with the corrected values.
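Note that `replace([3.1], 5)` changes every 3.1 in the Grade column, not just Murphy’s. When corrections arrive per student, as in the client scenario above, it is safer to key them by name. Below is a minimal sketch. It assumes the revised values come as a dictionary from student name to grade. The `corrections` dict is hypothetical and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Mike', 'Julia', 'Trevor', 'Brooks', 'Murphy'],
                   'Grade': [3.5, 4, 2.1, 4.6, 3.1]})

# Hypothetical revised grades from the client: student name -> corrected grade
corrections = {'Murphy': 5.0}

# Look up each student's corrected grade. Students without a correction get NaN,
# which fillna() replaces with their existing grade (aligned by row index).
df['Grade'] = df['Name'].map(corrections).fillna(df['Grade'])
print(df)
```

For a single correction, `df.loc[df['Name'] == 'Murphy', 'Grade'] = 5` does the same job. The dictionary form scales better when the client sends many revisions at once.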


More Examples / Instances

  • Now let’s look at some other requirements. First, let’s see how we can replace multiple old values with a set of new values. We store the result in a new column, New_Grades, so that the numeric Grade column stays available for the methods in the later sections.
#Replace multiple values with a new set of values

df['New_Grades'] = df['Grade'].replace([3.5,4.0,2.1,4.6,5.0],['Average','Good','Needs Improvement','Good','Excellent'])

df
	Name	Grade	New_Grades
0	Mike	3.5	Average
1	Julia	4.0	Good
2	Trevor	2.1	Needs Improvement
3	Brooks	4.6	Good
4	Murphy	5.0	Excellent

That’s cool!

We have replaced multiple values with a set of new values. As you can see, all 5 values were replaced at once. Each old value in the first list is swapped for the value at the same position in the second list.

  • Replacing multiple values with a single new value.
#Replacing multiple values with a single new value

df['New_Grades'] = df['New_Grades'].replace(['Average','Needs Improvement','Excellent'],'Good')

df
	Name	Grade	New_Grades
0	Mike	3.5	Good
1	Julia	4.0	Good
2	Trevor	2.1	Good
3	Brooks	4.6	Good
4	Murphy	5.0	Good

That’s it. As simple as that. This is how you can replace multiple values with a new set of values, or with a single new value.
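Instead of two parallel lists, `replace()` also accepts a dictionary that pairs each old value with its new value, which is harder to misalign. `Series.map()` with the same dictionary is the other common way to map values in pandas. The sketch below compares the two on a standalone Series of the grades above. The key difference is what happens to a value that is missing from the dictionary.

```python
import pandas as pd

# The grades from the example above, as a standalone Series
grades = pd.Series([3.5, 4.0, 2.1, 4.6, 5.0])

# Each key is an old value, each value is its replacement
mapping = {3.5: 'Average', 4.0: 'Good', 2.1: 'Needs Improvement',
           4.6: 'Good', 5.0: 'Excellent'}

by_replace = grades.replace(mapping)
by_map = grades.map(mapping)
print(by_replace.tolist())  # ['Average', 'Good', 'Needs Improvement', 'Good', 'Excellent']
print(by_map.tolist())      # the same labels

# The difference shows up for a value that is not in the dictionary
other = pd.Series([3.5, 3.0])
print(other.replace(mapping).tolist())  # 3.0 is left unchanged
print(other.map(mapping).tolist())      # 3.0 becomes NaN
```

Use `replace()` when unlisted values should pass through untouched. Use `map()` when every value is expected to have a mapping, since the NaNs make gaps easy to spot.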


3. Data Mapping Using Pandas Cut function

We have discussed replacing values in several scenarios. Now, we will see how to map values using the pandas cut function in Python.

In the above examples, we replaced the values manually. Here, instead, we will create bins and assign a label to each grade based on the bin it falls into.

#Pandas cut function

# Bins are right-closed by default: (0, 2], (2, 4], (4, 5]
my_bins = [0,2,4,5]
my_comments = ['Poor','Satisfied','Good']
df['New_Grades'] = pd.cut(df['Grade'],my_bins,labels=my_comments)

df
	Name	Grade	New_Grades
0	Mike	3.5	Satisfied
1	Julia	4.0	Satisfied
2	Trevor	2.1	Satisfied
3	Brooks	4.6	Good
4	Murphy	5.0	Good

Excellent! We have mapped new grades into the data.

  • Define the bin edges.
  • Add a label (comment) for each bin range. There is one fewer label than there are bin edges.
  • Map the new variable into the data.
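Two `pd.cut` parameters matter when you define bins. `right` defaults to True, which means each bin includes its right edge. `include_lowest` controls whether the first bin also includes its left edge. The sketch below uses a few boundary grades that are not in our dataset to show how these settings change the labels.

```python
import pandas as pd

# Boundary grades chosen for illustration; they are not in the student dataset
grades = pd.Series([0.0, 2.0, 4.0, 4.6])
bins = [0, 2, 4, 5]
labels = ['Poor', 'Satisfied', 'Good']

# Default right=True: bins are (0, 2], (2, 4], (4, 5], so 0.0 falls outside (NaN)
default_cut = pd.cut(grades, bins, labels=labels)

# include_lowest=True also includes the left edge of the first bin, so 0.0 is 'Poor'
lowest_cut = pd.cut(grades, bins, labels=labels, include_lowest=True)

# right=False gives [0, 2), [2, 4), [4, 5): 2.0 and 4.0 each move up one bin
left_cut = pd.cut(grades, bins, labels=labels, right=False)

print(default_cut.tolist())
print(lowest_cut.tolist())  # ['Poor', 'Poor', 'Satisfied', 'Good']
print(left_cut.tolist())    # ['Poor', 'Satisfied', 'Good', 'Good']
```

In our student data no grade sits exactly on an edge except 4.0 and 5.0. That is why the default right-closed bins put Julia’s 4.0 in 'Satisfied'.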

4. Data Mapping using Numpy.digitize Function

This function does the same mapping as pandas cut. The difference is that np.digitize returns the bin index of each value (1, 2 or 3 here) rather than a label. So we create a dictionary from index to label and map it onto the result.

The bins and bin labels are the same as above.

#Data mapping using numpy

import numpy as np

my_bins = [0,2,4,5]
my_comments = ['Poor','Satisfied','Good']
# Map each bin index to its label: {1: 'Poor', 2: 'Satisfied', 3: 'Good'}
my_dict = dict(enumerate(my_comments,1))

# right=True makes the bins right-closed, (0,2], (2,4], (4,5], as in pd.cut
df['Numpy.digitize'] = np.vectorize(my_dict.get)(np.digitize(df['Grade'], my_bins, right=True))

df
	Name	Grade	New_Grades	Numpy.digitize
0	Mike	3.5	Satisfied	Satisfied
1	Julia	4.0	Satisfied	Satisfied
2	Trevor	2.1	Satisfied	Satisfied
3	Brooks	4.6	Good	Good
4	Murphy	5.0	Good	Good

You can see that the numpy.digitize method produces the same result as the pandas cut function.
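np.digitize uses left-closed bins by default (bins[i-1] <= x < bins[i]), while pd.cut uses right-closed ones. Because of that, a value sitting exactly on an edge lands in a different bin unless you pass `right=True`. A value equal to the last edge also gets an index one past the last bin. The sketch compares both settings on the grades from our data, using the edges [0, 2, 4, 5] from the pandas cut example.

```python
import numpy as np

grades = np.array([3.5, 4.0, 2.1, 4.6, 5.0])  # grades from our dataset
bins = [0, 2, 4, 5]

# Default right=False: bins[i-1] <= x < bins[i]
left_idx = np.digitize(grades, bins)
# right=True: bins[i-1] < x <= bins[i], the same bins as pd.cut's default
right_idx = np.digitize(grades, bins, right=True)

print(left_idx)   # [2 3 2 3 4]  (4.0 moves up a bin, 5.0 gets index 4)
print(right_idx)  # [2 2 2 3 3]
```

Index 4 has no entry in a {1, 2, 3} label dictionary, so `dict.get` would return None for that grade. This is why the bin edges and the `right` setting need to match the labels.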


5. Numpy.select()

To use this method for data mapping, you define a list of conditions and a list of values. Based on your conditions, np.select returns an array in which each element takes the value paired with the first condition that is True for it.

#Numpy.select method

import numpy as np

# Conditions are checked in order; the first True one picks the value
conditions = [df['Grade'].between(0,2),
              df['Grade'].between(2,4,inclusive='right'),
              df['Grade'].between(4,5,inclusive='right')]
values = ['Poor', 'Satisfied', 'Good']
# Rows that match no condition get the default label
df['Numpy_select'] = np.select(conditions, values, default='Unknown')

df
	Name	Grade	New_Grades	Numpy.digitize	Numpy_select
0	Mike	3.5	Satisfied	Satisfied	Satisfied
1	Julia	4.0	Satisfied	Satisfied	Satisfied
2	Trevor	2.1	Satisfied	Satisfied	Satisfied
3	Brooks	4.6	Good	Good	Good
4	Murphy	5.0	Good	Good	Good

np.select checks the conditions in order and uses the value paired with the first condition that is True. Rows that match no condition get the default value.
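Because np.select stops at the first True condition, overlapping conditions are resolved by their order, and the default applies only when nothing matches. Here is a small sketch on a plain NumPy array. The grades 2.0 and 6.0 are illustrative values, not from our dataset.

```python
import numpy as np

# Illustrative grades: 2.0 sits on a shared edge, 6.0 is out of range
grades = np.array([2.0, 3.5, 6.0])
conditions = [(grades >= 0) & (grades <= 2),   # 'Poor' range, includes 2.0
              (grades >= 2) & (grades <= 4)]   # 'Satisfied' range, also includes 2.0
choices = ['Poor', 'Satisfied']

# 2.0 matches both conditions and gets the first one's value; 6.0 matches none
result = np.select(conditions, choices, default='Unknown')
print(result)  # ['Poor' 'Satisfied' 'Unknown']
```

Here the default is a string, so it is the same kind of value as the choices.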


6. User-defined Function

Finally, we are going to create a custom function that does the same job as the pandas cut, numpy.digitize and numpy.select functions.

#User defined function

def user_defined(grade):
    # 0 to 2 is 'Poor', above 2 up to 4 is 'Satisfied', anything else is 'Good'
    if 0 <= grade <= 2:
        return 'Poor'
    elif 2 < grade <= 4:
        return 'Satisfied'
    else:
        return 'Good'


#Using the custom function
df['user_defined'] = df['Grade'].apply(user_defined)

df
	Name	Grade	New_Grades	Numpy.digitize	Numpy_select	user_defined
0	Mike	3.5	Satisfied	Satisfied	Satisfied	Satisfied
1	Julia	4.0	Satisfied	Satisfied	Satisfied	Satisfied
2	Trevor	2.1	Satisfied	Satisfied	Satisfied	Satisfied
3	Brooks	4.6	Good	Good	Good	Good
4	Murphy	5.0	Good	Good	Good	Good

Impressive!

We got the same output using different methods. You are free to use any of them when you are working on data transformation, data mapping or data replacement.
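You can also check that the approaches agree programmatically, rather than comparing the output tables by eye. Below is a self-contained sketch that rebuilds the updated dataset and uses the bin edges [0, 2, 4, 5] with right-closed bins for every method. The helper name `to_label` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Mike', 'Julia', 'Trevor', 'Brooks', 'Murphy'],
                   'Grade': [3.5, 4.0, 2.1, 4.6, 5.0]})
bins = [0, 2, 4, 5]
labels = ['Poor', 'Satisfied', 'Good']

# pandas cut: right-closed bins (0, 2], (2, 4], (4, 5]
cut = pd.cut(df['Grade'], bins, labels=labels).astype(str)

# numpy digitize with right=True uses the same bins; indices 1..3 map to labels
digitize = pd.Series(np.digitize(df['Grade'], bins, right=True)).map(dict(enumerate(labels, 1)))

# numpy select: the first matching condition wins
select = pd.Series(np.select([df['Grade'].between(0, 2),
                              df['Grade'].between(2, 4, inclusive='right'),
                              df['Grade'].between(4, 5, inclusive='right')],
                             labels, default='Unknown'))

# Plain Python function applied value by value
def to_label(grade):
    if 0 <= grade <= 2:
        return 'Poor'
    elif 2 < grade <= 4:
        return 'Satisfied'
    return 'Good'

udf = df['Grade'].apply(to_label)

# All four approaches should give identical labels for these grades
same = cut.tolist() == digitize.tolist() == select.tolist() == udf.tolist()
print(same)  # True
```

On large data, the vectorised options (pd.cut, np.digitize, np.select) are usually preferable to `apply`, which calls the Python function once per value.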


Ending Note – Data Mapping

Data mapping and transformation is a vital part of any analysis. It turns raw data into a form from which you can draw patterns and meaningful insights. I hope you found this tutorial useful and enjoyed playing with the above methods.

That’s all for now! Happy Python 🙂

Further reading: numpy.digitize
