How to Parse CSV Files in Python

CSV files are widely used for storing tabular data. Data can easily be exported from database tables or Excel files to CSV, and the format is easy for both people and programs to read. In this tutorial, we will learn how to parse CSV files in Python.

What is Parsing?

Parsing a file means reading the data from the file and interpreting its structure. The file may be a plain text file, or it may be a spreadsheet.

What is a CSV file?

CSV stands for Comma-Separated Values, i.e. the values in each record are separated from each other by commas. CSV files are typically created by programs that handle large amounts of data. Data in CSV files can easily be imported into spreadsheets and databases, and exported from them for use by other programs.
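To see why a real parser is useful, here is a minimal sketch that compares naive splitting on commas with the csv module. The sample text is made up for illustration; the second line holds a name that itself contains a comma, so it is wrapped in double quotes.

```python
import csv
import io

# Hypothetical sample data: the quoted field contains a comma.
sample = 'name,branch,year\n"Gosling, James",CSE,2\nLisa,PIE,3\n'

# Naive splitting on commas breaks the quoted field into two pieces.
naive_rows = [line.split(',') for line in sample.splitlines()]

# csv.reader understands the quoting rules and keeps the field intact.
parsed_rows = list(csv.reader(io.StringIO(sample)))

print(naive_rows[1])   # four pieces, quotes still attached
print(parsed_rows[1])  # three fields, as intended
```

The naive version returns four pieces for the second line, while csv.reader returns the three fields that were actually intended.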

Let’s see how to parse a CSV file. Parsing CSV files in Python is quite easy. Python has a built-in csv module that provides functionality for both reading data from and writing data to CSV files. The module supports a variety of CSV formats (called dialects), which makes data processing more convenient.
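As a rough illustration of the formats mentioned above, the sketch below lists the dialects that the csv module registers by default and reads two small samples that do not use a comma as the separator. The sample strings are assumptions made up for this example.

```python
import csv
import io

# Dialects registered by default in the csv module.
print(csv.list_dialects())

# Hypothetical tab-separated and semicolon-separated samples.
tab_data = 'name\tcgpa\nDavid\t7.8\n'
semi_data = 'name;cgpa\nLisa;9.1\n'

# Use a named dialect for tab-separated data.
tab_rows = list(csv.reader(io.StringIO(tab_data), dialect='excel-tab'))

# Or override just the delimiter of the default dialect.
semi_rows = list(csv.reader(io.StringIO(semi_data), delimiter=';'))

print(tab_rows)
print(semi_rows)
```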

Parsing a CSV file in Python

Reading CSV files using the inbuilt Python CSV module.


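import csv

# newline='' lets the csv module handle line endings itself
with open('university_records.csv', 'r', newline='') as csv_file:
    reader = csv.reader(csv_file)

    # each row is returned as a list of strings
    for row in reader:
        print(row)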

Output:

[Image: Python Parse CSV File]
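If the file has a header row, csv.DictReader may be more convenient than csv.reader, because each row comes back as a dictionary keyed by column name. The sketch below builds a small stand-in file first; the header names (name, branch, year, cgpa) are assumptions, since the real layout of university_records.csv is not shown here.

```python
import csv
import os
import tempfile

# Build a small stand-in file; the header names are assumptions.
path = os.path.join(tempfile.mkdtemp(), 'university_records.csv')
with open(path, 'w', newline='') as csv_file:
    csv_file.write('name,branch,year,cgpa\nDavid,MCE,3,7.8\nLisa,PIE,3,9.1\n')

with open(path, newline='') as csv_file:
    reader = csv.DictReader(csv_file)  # the first row becomes the keys
    records = list(reader)

for record in records:
    # All values are read as strings; convert explicitly when needed.
    print(record['name'], float(record['cgpa']))
```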

Writing a CSV file in Python

To write to a file, we have to open it in write mode ('w') or append mode ('a'). Here, we will append data to the existing CSV file.


import csv

row = ['David', 'MCE', '3', '7.8']
row1 = ['Lisa', 'PIE', '3', '9.1']
row2 = ['Raymond', 'ECE', '2', '8.5']

# 'a' appends to the existing file; newline='' avoids blank lines between rows on Windows
with open('university_records.csv', 'a', newline='') as csv_file:
    writer = csv.writer(csv_file)

    writer.writerow(row)
    writer.writerow(row1)
    writer.writerow(row2)

[Image: Python Append To CSV File]
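The difference between write mode and append mode can be sketched as follows. This example uses csv.DictWriter so that a header row is written once when the file is created, and rows are appended afterwards. The file name records_demo.csv and the header names are hypothetical and chosen only for this illustration.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'records_demo.csv')  # hypothetical file
fieldnames = ['name', 'branch', 'year', 'cgpa']              # assumed header

# 'w' creates (or truncates) the file, so the header is written here.
with open(path, 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'name': 'David', 'branch': 'MCE', 'year': '3', 'cgpa': '7.8'})

# 'a' keeps the existing content and adds rows at the end.
with open(path, 'a', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writerows([
        {'name': 'Lisa', 'branch': 'PIE', 'year': '3', 'cgpa': '9.1'},
        {'name': 'Raymond', 'branch': 'ECE', 'year': '2', 'cgpa': '8.5'},
    ])

# Read everything back to check the result.
with open(path, newline='') as csv_file:
    all_rows = list(csv.reader(csv_file))

print(all_rows)
```

Opening the same file with 'w' a second time would have discarded the first row, which is why the later rows are added in append mode.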

Parse CSV Files using Pandas library

There is another way to work with CSV files, one that is very popular for data analysis work: the pandas library.

Pandas is a Python data analysis library. It offers data structures, tools, and operations for working with and manipulating data, mostly in the form of one-dimensional or two-dimensional tables.

Uses and Features of pandas Library

  • Pivoting and reshaping of data sets.
  • Data manipulation with indexing using DataFrame objects.
  • Data filtering.
  • Merge and join operations on data sets.
  • Slicing, indexing, and subsetting of large data sets.
  • Missing data handling and data alignment.
  • Row/column insertion and deletion.
  • One-dimensional (Series) and two-dimensional (DataFrame) data structures.
  • Reading and writing tools for data in various file formats.
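A few of the features listed above can be sketched in a handful of lines. The data below is made up for illustration (the department names in particular are hypothetical), and the example assumes pandas is already installed.

```python
import pandas as pd

students = pd.DataFrame({
    'name': ['David', 'Lisa', 'Raymond'],
    'branch': ['MCE', 'PIE', 'ECE'],
    'cgpa': [7.8, 9.1, None],  # one missing value on purpose
})
branches = pd.DataFrame({
    'branch': ['MCE', 'PIE', 'ECE'],
    'department': ['Dept A', 'Dept B', 'Dept C'],  # hypothetical names
})

# Filtering: keep rows whose cgpa is above 8.
high_scorers = students[students['cgpa'] > 8]

# Missing data handling: replace the missing cgpa with 0.0.
filled = students.fillna({'cgpa': 0.0})

# Merge/join: attach the department name to every student.
merged = students.merge(branches, on='branch', how='left')

# Column insertion and deletion.
merged['passed'] = merged['cgpa'] > 5
trimmed = merged.drop(columns=['branch'])

print(high_scorers)
print(trimmed)
```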

To work with CSV files this way, you need to install pandas. Installing pandas is quite simple: run the command below to install it using pip.


$ pip install pandas

Once the installation is complete, you are good to go.

Reading a CSV file using Pandas Module

Before you can use pandas to import your CSV data, you need to know where the data file is located in your filesystem and what your current working directory is.

I suggest keeping your code and the data file in the same directory so that you do not need to specify the full path, which saves you time and space.


import pandas

result = pandas.read_csv('ign.csv')

print(result)

Output

[Image: Read CSV File using pandas module]
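When the data file is not in the working directory, one option is to build the path explicitly with pathlib. The sketch below writes a small stand-in file (sample.csv, with assumed column names) into a temporary directory and reads it back, once in full and once with the usecols and nrows parameters of read_csv to load only part of it.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in data file; in a real script you might build the path from
# Path(__file__).parent so it does not depend on the working directory.
data_dir = Path(tempfile.mkdtemp())
csv_path = data_dir / 'sample.csv'
csv_path.write_text(
    'name,branch,year,cgpa\n'
    'David,MCE,3,7.8\n'
    'Lisa,PIE,3,9.1\n'
    'Raymond,ECE,2,8.5\n'
)

print(Path.cwd())  # the current working directory

# read_csv accepts a string or a Path object.
result = pd.read_csv(csv_path)

# Load only two columns and the first two data rows.
subset = pd.read_csv(csv_path, usecols=['name', 'cgpa'], nrows=2)

print(result.dtypes)  # numeric columns are converted automatically
print(subset)
```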

Writing a CSV file using Pandas Module

Writing CSV files using pandas is as simple as reading. The only new term used is DataFrame.

A pandas DataFrame is a two-dimensional, heterogeneous tabular data structure (data is arranged in rows and columns).

A pandas DataFrame consists of three main components: the data, the rows, and the columns. Both axes (rows and columns) are labeled.
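The three components can be inspected directly on any DataFrame. Here is a minimal sketch with made-up data; the row labels 'a' and 'b' are hypothetical and only there to make the row axis visible.

```python
import pandas as pd

df = pd.DataFrame(
    {'language': ['Python', 'Java'], 'appeared': [1991, 1995]},
    index=['a', 'b'],  # hypothetical row labels
)

columns = list(df.columns)       # column labels
index = list(df.index)           # row labels
values = df.to_numpy().tolist()  # the data itself, row by row

print(columns)
print(index)
print(values)

# A single cell is addressed by row label and column label.
print(df.loc['b', 'language'])
```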


from pandas import DataFrame

C = {'Programming language': ['Python', 'Java', 'C++'],
     'Designed by': ['Guido van Rossum', 'James Gosling', 'Bjarne Stroustrup'],
     'Appeared': ['1991', '1995', '1985'],
     'Extension': ['.py', '.java', '.cpp'],
     }

df = DataFrame(C, columns=['Programming language', 'Designed by', 'Appeared', 'Extension'])

# index=False leaves out the row index; to_csv returns None when given a path
df.to_csv('program_lang.csv', index=False, header=True)

Output

[Image: Python Pandas Write CSV File]
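To see what the index argument of to_csv changes, the sketch below skips the file and asks to_csv for the CSV text directly (it returns a string when no path is given), then reads that text back with read_csv. The small DataFrame is a shortened version of the one above.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Programming language': ['Python', 'Java'],
    'Appeared': [1991, 1995],
})

# With no path, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # header starts with an empty index column
print(without_index.splitlines()[0])  # header holds only the real columns

# Reading the text back gives the same columns and values.
round_trip = pd.read_csv(io.StringIO(without_index))
print(round_trip)
```

Leaving the index out is usually what you want when the row labels are just 0, 1, 2 and carry no meaning of their own.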

Conclusion

We learned to parse CSV files using the built-in csv module and the pandas library. There are many other ways to parse files, but programmers do not use them as widely for CSV data.

Libraries such as PlyPlus, PLY, and ANTLR are used for parsing text data. Now you know how to use the built-in csv module and the powerful pandas library for reading and writing data in CSV format. The code shown above is very basic and straightforward. It should be understandable to anyone familiar with Python, so I don’t think there is any need for further explanation.

However, manipulating complex data with empty and ambiguous entries is not easy. It requires practice and knowledge of the various tools in pandas. CSV is a simple and widely supported way of saving and sharing tabular data. Pandas is an excellent alternative to the csv module. You may find it difficult in the beginning, but it isn’t so hard to learn. With a little bit of practice, you will master it.
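As a small starting point for handling empty entries, here is a sketch using a made-up sample in which one field is empty and another holds the placeholder text 'unknown'. Treating 'unknown' as missing is an assumption for this example; real data will have its own markers.

```python
import io

import pandas as pd

# Hypothetical sample: one empty field and one placeholder value.
raw = 'name,branch,cgpa\nDavid,MCE,7.8\nLisa,,9.1\nRaymond,ECE,unknown\n'

# Empty fields become NaN by default; na_values adds extra markers.
df = pd.read_csv(io.StringIO(raw), na_values=['unknown'])

# Count the missing entries in each column.
missing_per_column = df.isna().sum()

# Option 1: drop every row that has a missing value.
cleaned = df.dropna()

# Option 2: fill the gaps with a default or a computed value.
filled = df.fillna({'branch': 'UNKNOWN', 'cgpa': df['cgpa'].mean()})

print(missing_per_column)
print(cleaned)
print(filled)
```

Which option is appropriate depends on the data: dropping rows loses information, while filling in values can hide the fact that something was missing.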

Comments

  1. Ankit Rana says:

    Nice tutorial
    In first example of reading csv, we try to close file and are using with statement too.
    With will close your resource so you don’t need to.

    1. Pankaj says:

      Yes, nice catch. I have updated the code snippets.
