Most of the time you work with CSV (Comma Separated Values) files, a widely used format for data storage. So, what is the catch? CSV files tend to take up more space and take longer to load than binary formats. Therefore, we have to find an alternative to overcome this issue. Here, I am introducing the Feather file format, which reads and writes at lightning speed and manages space very efficiently. In the end, companies can save some bucks on storage services.
What is the Feather File Format In Python?
- Feather was first created in the Arrow project as a proof of concept (POC) for fast data frame storage in Python and R.
- It is no longer limited to Python and R. You can use it with all major languages.
- It is also known as a portable file format for storing data frames.
- There are two versions available, Version 1 and Version 2. If a library does not support one of them, you can pass the version argument (for example, version=1) when writing to set a specific version.
Feather File Format Using Python Pandas
You can use this file format as part of the pandas library. You just have to import pandas to save or read data in this format. Note that pandas relies on the pyarrow package for Feather support, so that needs to be installed too.
Here, I will load the mtcars dataset (in CSV format) using the pandas read_csv function. After that, I will save the data from CSV to the Feather file format.
Let’s see how it works!
# Read the CSV data into a DataFrame
import pandas as pd

df = pd.read_csv('mtcars.csv')
df
This is the data saved in CSV format. Let’s save this in Feather file format now.
# Save the data in Feather file format
df.to_feather('d_data.feather')
You have to use the to_feather function to save the data in Feather file format. Given a bare file name like this, the file is saved in your current working directory.
Read the Feather File
Well, we now know how to save a CSV file in Feather file format. But how do we read it in Python?
Do you have any idea?
If not, worry not! Again, it takes just a single line of code to read it, as shown below.
# Read the Feather file
df1 = pd.read_feather('d_data.feather')
That’s it. As simple as it is.
Python also has a dedicated library for Feather. You have to install and import it before using its functions to read and write Feather files.
# Install the library (run this in a terminal)
pip install feather-format

# Load it and write the data into a Feather file
import feather
feather.write_dataframe(df, 'd2_data.feather')
The data will be saved into your local directory in the Feather file format. There is only a small difference between using Feather through pandas and through the dedicated library, and the operation is the same. You can go with either one.
Read Feather File using the Feather library
Just like with pandas, it is the same process. Just call the feather.read_dataframe function to read the Feather file.
# Read the Feather file using the feather library
df3 = feather.read_dataframe('d2_data.feather')
df3
It is the same data, without a single change. So you can confidently make use of the Feather file format, which can be nearly 150 times faster than CSV. It also saves a lot of time and costs less to store.
Feather file format vs. CSV file format
So, we have seen some examples and short tutorials on how to read and write files using both pandas and the feather library. Both methods are good, and it's up to you to choose one of them.
Here are some plots that clearly show which of these two formats to choose. Because data speak better!
From the above plots, it is very clear that native Feather is the best of these file formats for saving time, storage, and money. It cuts the file size roughly in half. How can anything be better than this? 😛
Finally, we have come to the end of the article. I have shown you how Feather can be your one-stop solution for time and storage savings. As we observed, the data itself does not change when you switch between storage file formats. So make a better call next time you work with data.
And, that’s all for now!
Happy Python 🙂
Read more: Feather file format