Pandas is a widely used, robust Python library for data manipulation and analysis. It offers hundreds of functions that make the analysis lifecycle both easier and more efficient.
We often update existing features or create new features from existing data to get the results we want. Today, let’s understand how we can apply functions to columns, or features.
Apply Functions to Columns in Python
We will discuss two methods for applying functions to columns: the pandas apply function and NumPy vectorization.
Load the data
Before we move forward, we need to import some data to work with. We will be using the housing dataset for this tutorial. You can download this dataset from the Kaggle website.
#loading dataset
import pandas as pd
data = pd.read_csv('housing.csv')
data.head(5)

We are good to go!
1. Pandas Apply function
The apply function in pandas applies a given function to every value of a particular column.
In our data, we have a column named price, which represents the price of the house based on many factors.
Now, we apply a function to those price values to convert them into millions for easier reading.
#Pandas apply
def measure_update(num):
    #convert a raw price into millions
    return num/1000000
data['price_in_millions'] = data['price'].apply(measure_update)
data.head(5)

Compare the data before and after applying our custom function: the new price_in_millions column holds the price converted to millions. For example, 13300000 becomes 13.3 million.
You can create any custom function based on your needs. This helps in many ways and saves time during data analysis.
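As a minimal sketch of a custom function that takes a parameter, the example below assumes a tiny made-up frame in place of housing.csv. The helper name scale_price and the sample prices other than 13300000 are hypothetical. Series.apply forwards extra keyword arguments to the function, and for plain arithmetic like this, dividing the whole column gives the same values.

```python
import pandas as pd

# Made-up stand-in for the housing data; only the 'price' column is needed here.
data = pd.DataFrame({'price': [13300000, 12250000, 4550000]})

def scale_price(num, divisor):
    # Hypothetical helper: divide a single price by the given divisor.
    return num / divisor

# Extra keyword arguments given to apply are passed on to the function.
data['price_in_millions'] = data['price'].apply(scale_price, divisor=1000000)

# For plain arithmetic, dividing the whole column gives the same values.
data['price_in_millions_direct'] = data['price'] / 1000000

print(data)
```

Passing the divisor as a keyword argument keeps one function reusable for thousands, millions, or any other unit.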
2. Complex Functions
Simple functions cannot serve the purpose all the time. To reduce your code and get optimal results, I suggest using complex functions or functions with multiple conditions.
Let’s walk through an example.
#multiple conditions
def price_range(price_in_millions):
    #label each price by the band it falls into
    if price_in_millions >= 10:
        return 'High'
    elif price_in_millions > 5:
        return 'Affordable'
    else:
        return 'Cheap'
data['price_range'] = data['price_in_millions'].apply(price_range)
data[['price','price_range']].sample(10)

The code above takes the values in the price_in_millions column as input and groups them into labels based on the conditions we set.
After applying the function, it’s good to cross-check the results as shown above. You can easily select the required columns using pandas.
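One way to cross-check the labels, sketched here on made-up prices rather than the real dataset, is to build the same column a second way and compare. The example below uses NumPy's np.select, which is a swapped-in alternative and not part of the original tutorial; the column name price_range_select is hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up prices in millions, chosen to hit each branch and both boundaries.
data = pd.DataFrame({'price_in_millions': [13.3, 10.0, 7.5, 5.0, 3.2]})

def price_range(price_in_millions):
    if price_in_millions >= 10:
        return 'High'
    elif price_in_millions > 5:
        return 'Affordable'
    else:
        return 'Cheap'

data['price_range'] = data['price_in_millions'].apply(price_range)

# np.select checks the conditions in order and uses the first one that is True.
conditions = [data['price_in_millions'] >= 10, data['price_in_millions'] > 5]
choices = ['High', 'Affordable']
data['price_range_select'] = np.select(conditions, choices, default='Cheap')

# Cross-check: both columns should agree, and the counts show the label spread.
print((data['price_range'] == data['price_range_select']).all())
print(data['price_range'].value_counts())
```

Including values that sit exactly on the boundaries (10.0 and 5.0 here) makes it easier to confirm that the conditions behave as intended.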
3. Ratios
Yes, getting the ratio of some columns can be a part of creating a new feature which may help in our analysis. So, let’s see how we can create a ratio column based on our data using pandas.
#ratio
def demo_ratio(bedrooms, bathrooms):
    return bedrooms / bathrooms
#axis=1 passes each row to the lambda
data['ratio'] = data[['bedrooms', 'bathrooms']].apply(lambda row: demo_ratio(row['bedrooms'], row['bathrooms']), axis=1)
data[['bedrooms','bathrooms','ratio']]

That’s cool. Now we have the bedroom-to-bathroom ratio. In our results, a ratio of 2 means there is 1 bathroom for every 2 bedrooms.
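For a simple ratio like this, the same column can also be computed by dividing the two columns directly. The sketch below assumes a small made-up frame instead of housing.csv, and the column name ratio_direct is hypothetical.

```python
import pandas as pd

# Made-up stand-in rows; the real values come from housing.csv.
data = pd.DataFrame({'bedrooms': [4, 3, 2], 'bathrooms': [2, 1, 1]})

def demo_ratio(bedrooms, bathrooms):
    return bedrooms / bathrooms

# Row-wise apply, as in the tutorial: the function runs once per row.
data['ratio'] = data.apply(
    lambda row: demo_ratio(row['bedrooms'], row['bathrooms']), axis=1
)

# Column arithmetic: pandas divides the two columns element by element.
data['ratio_direct'] = data['bedrooms'] / data['bathrooms']

print(data)
```

Row-wise apply is the more general tool, since the function can contain any logic, while column arithmetic is shorter when the operation is a single expression.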
4. Numpy Magic
Yes, you read it right. Numpy’s magic will never get old. You have created a ratio attribute in the above section.
Now, let’s see how we can get the same output using Numpy vectorization. When it comes to numbers, Numpy is unstoppable.
#vectorization
import numpy as np
data['numpy_ratio'] = np.vectorize(demo_ratio)(data['bedrooms'], data['bathrooms'])
data[['bedrooms','bathrooms','ratio','numpy_ratio']]

That’s nasty from Numpy 😛
We got the same output (Ratio) using the Numpy vectorization method. Now, you will believe in NumPy’s magic.
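To confirm that the two approaches agree, here is a self-contained sketch on made-up rows (not the real dataset) that builds both columns and compares them with np.allclose. The variable name vectorized_ratio is hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up stand-in rows; the real values come from housing.csv.
data = pd.DataFrame({'bedrooms': [4, 3, 2], 'bathrooms': [2, 1, 1]})

def demo_ratio(bedrooms, bathrooms):
    return bedrooms / bathrooms

# np.vectorize wraps the function so it accepts whole columns as arguments.
vectorized_ratio = np.vectorize(demo_ratio)
data['numpy_ratio'] = vectorized_ratio(data['bedrooms'], data['bathrooms'])

# The pandas apply version from the previous section, for comparison.
data['ratio'] = data.apply(
    lambda row: demo_ratio(row['bedrooms'], row['bathrooms']), axis=1
)

# True when both columns hold the same values.
print(np.allclose(data['ratio'], data['numpy_ratio']))
```

One caveat worth knowing: the NumPy documentation describes np.vectorize as a convenience rather than a performance feature, since it still calls the Python function once per element.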
Apply Functions To Columns – Conclusion
It’s very easy to apply functions to columns using both pandas and NumPy, as shown here. These methods will come in handy whenever you work on data manipulation and analysis. I hope you learned something new. That’s all for now. Happy Python!!!