If you use Excel for Data manipulation, you might be familiar with the Split text to columns option. You can find this option in the Data tab in Excel. But if you use Python for data analysis, then this function is too good for you. We already know how flexible python is in terms of thousands of functions. Today, In this tutorial, let’s discuss how you can split text into columns using pandas in python.
How it is helpful?
This is the most obvious question I face many times :P. Yeah, it’s good that we came to know about Splitting text into columns, but is it really helpful? if yes, how?.
Consider a situation where you have a column where you have People’s names. But you want to split them as First and Last names, or even middle names. If you remember how automated mails include your first name, you got the whole point of splitting texts.
Take a look at this picture –
Apply the same knowledge for date and time also. If you can get the day, month, and year into separate columns, it will be easy for you to analyze them.
Though this function is very simple, it can save you time in the manipulation of data and analysis.
The best thing is, we can do all this in Python!!!
1. Import Pandas/Setup Jupyter Notebook
First things first. We have to import Pandas library.
In our first example, let’s see how we can split a text into columns using str.split function in pandas.
#Pandas import pandas as pd #data df2 = pd.read_csv('authors.csv') df2
Well, we have a list of Author names. Now, we have to split them into first and second names. Let’s see how it works.
#split df2[['First_name','Second_name']] = df2.Authors.str.split("-",expand=True) df2
That’s Awesome! You were too quick 😛
2. Split Date into Columns
I hope you are good with splitting text into columns as shown in the above section. Now, let’s see how we can split the dates into columns.
For this, we need data with the date as one of the attributes. I am using coffee sales data for this purpose. You can have a look at the data here –
Our date attributes include Year, month, and day. Let’s split that as new columns.
#split date df[['year','month','day']] = df.order_date.str.split("-",expand=True) df
This is pretty much cool!
You can see that now we have 3 more columns (Year, Month, and Day) as we expected.
3. Split Text To Columns – Afterword’s
When you are working with text data or data-time attributes, splitting text into columns using pandas str.split will be very handy. It will help us to create new features with DateTime. I hope you get to know something from this. That’s all for now. Happy Python!
More read: Working on strings using pandas.