Java CSV Parser

Welcome to the Java CSV Parser tutorial. CSV is one of the most widely used formats for passing data from one system to another. Because CSV files open in Microsoft Excel, non-technical users can work with them easily too.

Unfortunately, Java does not have a built-in CSV parser.

If the CSV file is really simple and doesn’t contain any special characters, we can parse it with the Java Scanner class, but most of the time that’s not the case. Rather than writing complicated parsing logic, it’s better to use one of the open-source libraries available for parsing and writing CSV files.
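
To see why simple splitting is not enough, here is a minimal sketch that uses only the JDK. It splits two sample rows on every comma; the class and method names (NaiveSplitDemo, naiveSplit) are made up for this illustration.

```java
import java.util.Arrays;

class NaiveSplitDemo {

    // Splits on every comma and ignores quotes, which is the flaw being demonstrated.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String simple = "2,Lisa,Manager,500USD";
        String quoted = "1,Pankaj Kumar,CEO,\"5,000USD\"";

        // The simple row splits into the expected 4 fields.
        System.out.println(naiveSplit(simple).length + " " + Arrays.toString(naiveSplit(simple)));

        // The quoted salary contains a comma, so this row splits into 5 pieces.
        System.out.println(naiveSplit(quoted).length + " " + Arrays.toString(naiveSplit(quoted)));
    }
}
```

The second row ends up with five pieces instead of four because the comma inside the quoted salary is treated as a separator. Quoted fields like this are what a proper CSV parser handles for us.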

In this tutorial, we will use three open-source libraries for working with CSV.

  1. OpenCSV
  2. Apache Commons CSV
  3. Super CSV

We will look at each of these Java CSV parsers in turn.

Suppose we have a CSV file as:

employees.csv


ID,Name,Role,Salary
1,Pankaj Kumar,CEO,"5,000USD"
2,Lisa,Manager,500USD
3,David,,1000USD

and we want to parse it into a list of Employee objects.


package com.journaldev.parser.csv;

public class Employee {

	private String id;
	private String name;
	private String role;
	private String salary;
	
	public String getId() {
		return id;
	}
	public void setId(String id) {
		this.id = id;
	}
	public String getName() {
		return name;
	}
	public void setName(String name) {
		this.name = name;
	}
	public String getRole() {
		return role;
	}
	public void setRole(String role) {
		this.role = role;
	}
	public String getSalary() {
		return salary;
	}
	public void setSalary(String salary) {
		this.salary = salary;
	}
	
	@Override
	public String toString(){
		return "ID="+id+",Name="+name+",Role="+role+",Salary="+salary+"\n";
	}
}

1. OpenCSV

We will see how to use the OpenCSV parser to read a CSV file into Java objects and then write CSV data from Java objects. Download the OpenCSV library from the SourceForge website and include it in the classpath.

If you are using Maven, include it with the dependency below.


<dependency>
    <groupId>com.opencsv</groupId>
    <artifactId>opencsv</artifactId>
    <version>3.8</version>
</dependency>

To parse a CSV file, we can use CSVReader to read each row and convert it into an object. CSVReader also provides an option to read all the data at once and then process it.

OpenCSV provides the CsvToBean class, which we can use with a mapping strategy such as HeaderColumnNameTranslateMappingStrategy to automatically map the CSV rows to a list of objects.

To write CSV data, we create a List of String arrays and then use the CSVWriter class to write it to a file or any other Writer object.


package com.journaldev.parser.csv;

import java.io.FileReader;
import java.io.IOException;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;
import com.opencsv.bean.CsvToBean;
import com.opencsv.bean.HeaderColumnNameTranslateMappingStrategy;

public class OpenCSVParserExample {

	public static void main(String[] args) throws IOException {

		List<Employee> emps = parseCSVFileLineByLine();
		System.out.println("**********");
		parseCSVFileAsList();
		System.out.println("**********");
		parseCSVToBeanList();
		System.out.println("**********");
		writeCSVData(emps);
	}

	private static void parseCSVToBeanList() throws IOException {
		
		HeaderColumnNameTranslateMappingStrategy<Employee> beanStrategy = new HeaderColumnNameTranslateMappingStrategy<Employee>();
		beanStrategy.setType(Employee.class);
		
		Map<String, String> columnMapping = new HashMap<String, String>();
		columnMapping.put("ID", "id");
		columnMapping.put("Name", "name");
		columnMapping.put("Role", "role");
		// Salary is left unmapped here, so it is null in the output
		//columnMapping.put("Salary", "salary");
		
		beanStrategy.setColumnMapping(columnMapping);
		
		CsvToBean<Employee> csvToBean = new CsvToBean<Employee>();
		CSVReader reader = new CSVReader(new FileReader("employees.csv"));
		List<Employee> emps = csvToBean.parse(beanStrategy, reader);
		reader.close();
		System.out.println(emps);
	}

	private static void writeCSVData(List<Employee> emps) throws IOException {
		StringWriter writer = new StringWriter();
		CSVWriter csvWriter = new CSVWriter(writer,'#');
		List<String[]> data  = toStringArray(emps);
		csvWriter.writeAll(data);
		csvWriter.close();
		System.out.println(writer);
	}

	private static List<String[]> toStringArray(List<Employee> emps) {
		List<String[]> records = new ArrayList<String[]>();
		//add header record
		records.add(new String[]{"ID","Name","Role","Salary"});
		Iterator<Employee> it = emps.iterator();
		while(it.hasNext()){
			Employee emp = it.next();
			records.add(new String[]{emp.getId(),emp.getName(),emp.getRole(),emp.getSalary()});
		}
		return records;
	}

	private static List<Employee> parseCSVFileLineByLine() throws IOException {
		//create CSVReader object
		CSVReader reader = new CSVReader(new FileReader("employees.csv"), ',');
		
		List<Employee> emps = new ArrayList<Employee>();
		//read line by line
		String[] record = null;
		//skip header row
		reader.readNext();
		
		while((record = reader.readNext()) != null){
			Employee emp = new Employee();
			emp.setId(record[0]);
			emp.setName(record[1]);
			emp.setRole(record[2]);
			emp.setSalary(record[3]);
			emps.add(emp);
		}
		
		reader.close();
		
		System.out.println(emps);
		return emps;
	}
	
	private static void parseCSVFileAsList() throws IOException {
		//create CSVReader object
		CSVReader reader = new CSVReader(new FileReader("employees.csv"), ',');

		List<Employee> emps = new ArrayList<Employee>();
		//read all lines at once
		List<String[]> records = reader.readAll();
		
		Iterator<String[]> iterator = records.iterator();
		//skip header row
		iterator.next();
		
		while(iterator.hasNext()){
			String[] record = iterator.next();
			Employee emp = new Employee();
			emp.setId(record[0]);
			emp.setName(record[1]);
			emp.setRole(record[2]);
			emp.setSalary(record[3]);
			emps.add(emp);
		}
		
		reader.close();
		
		System.out.println(emps);
	}

}

When we run the above OpenCSV example program, we get the following output.


[ID=1,Name=Pankaj Kumar,Role=CEO,Salary=5,000USD
, ID=2,Name=Lisa,Role=Manager,Salary=500USD
, ID=3,Name=David,Role=,Salary=1000USD
]
**********
[ID=1,Name=Pankaj Kumar,Role=CEO,Salary=5,000USD
, ID=2,Name=Lisa,Role=Manager,Salary=500USD
, ID=3,Name=David,Role=,Salary=1000USD
]
**********
[ID=1,Name=Pankaj Kumar,Role=CEO,Salary=null
, ID=2,Name=Lisa,Role=Manager,Salary=null
, ID=3,Name=David,Role=,Salary=null
]
**********
"ID"#"Name"#"Role"#"Salary"
"1"#"Pankaj Kumar"#"CEO"#"5,000USD"
"2"#"Lisa"#"Manager"#"500USD"
"3"#"David"#""#"1000USD"

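As you can see, OpenCSV also lets us set the delimiter character when parsing or writing CSV data. The third block prints Salary=null because the Salary column mapping is commented out in parseCSVToBeanList(), so that column is never copied into the bean.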

2. Apache Commons CSV

You can download the Apache Commons CSV binaries or include the dependency using Maven as shown below.


<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>1.3</version>
</dependency>

The Apache Commons CSV parser is simple to use: the CSVParser class parses the CSV data and the CSVPrinter class writes it.

Example code that parses the above CSV file into a list of Employee objects is given below.


package com.journaldev.parser.csv;

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;

public class ApacheCommonsCSVParserExample {

	public static void main(String[] args) throws FileNotFoundException, IOException {
		
		//Create the CSVFormat object
		CSVFormat format = CSVFormat.RFC4180.withHeader().withDelimiter(',');
		
		//initialize the CSVParser object
		CSVParser parser = new CSVParser(new FileReader("employees.csv"), format);
		
		List<Employee> emps = new ArrayList<Employee>();
		for(CSVRecord record : parser){
			Employee emp = new Employee();
			emp.setId(record.get("ID"));
			emp.setName(record.get("Name"));
			emp.setRole(record.get("Role"));
			emp.setSalary(record.get("Salary"));
			emps.add(emp);
		}
		//close the parser
		parser.close();
		
		System.out.println(emps);
		
		//CSV Write Example using CSVPrinter
		CSVPrinter printer = new CSVPrinter(System.out, format.withDelimiter('#'));
		System.out.println("********");
		printer.printRecord("ID","Name","Role","Salary");
		for(Employee emp : emps){
			List<String> empData = new ArrayList<String>();
			empData.add(emp.getId());
			empData.add(emp.getName());
			empData.add(emp.getRole());
			empData.add(emp.getSalary());
			printer.printRecord(empData);
		}
		//close the printer
		printer.close();
	}

}

When we run the above program, we get the following output.


[ID=1,Name=Pankaj Kumar,Role=CEO,Salary=5,000USD
, ID=2,Name=Lisa,Role=Manager,Salary=500USD
, ID=3,Name=David,Role=,Salary=1000USD
]
********
ID#Name#Role#Salary
1#Pankaj Kumar#CEO#5,000USD
2#Lisa#Manager#500USD
3#David##1000USD
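
Notice that 5,000USD is written without quotes in this output: once the delimiter is #, the comma no longer needs protecting. The plain-Java sketch below illustrates that quote-only-when-needed rule. It is not Commons CSV code, and the names CsvQuotingSketch, quoteIfNeeded and toLine are hypothetical.

```java
import java.util.StringJoiner;

class CsvQuotingSketch {

    // Quotes a field only when it contains the delimiter, a double quote, or a line break.
    static String quoteIfNeeded(String field, char delimiter) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.indexOf(delimiter) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        // Embedded double quotes are escaped by doubling them.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins the fields into one record using the given delimiter.
    static String toLine(char delimiter, String... fields) {
        StringJoiner joiner = new StringJoiner(String.valueOf(delimiter));
        for (String field : fields) {
            joiner.add(quoteIfNeeded(field, delimiter));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // prints 1#Pankaj Kumar#CEO#5,000USD
        System.out.println(toLine('#', "1", "Pankaj Kumar", "CEO", "5,000USD"));
        // prints 1,Pankaj Kumar,CEO,"5,000USD"
        System.out.println(toLine(',', "1", "Pankaj Kumar", "CEO", "5,000USD"));
    }
}
```

With a comma delimiter the same salary has to be quoted again, which matches the way it appears in employees.csv.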

3. Super CSV

While searching for good CSV parsers, I saw many developers recommending Super CSV on Stack Overflow, so I decided to give it a try. Download the Super CSV library from the SourceForge website and include the jar file in the project build path.

If you are using Maven, just add the dependency below.


<dependency>
    <groupId>net.sf.supercsv</groupId>
    <artifactId>super-csv</artifactId>
    <version>2.4.0</version>
</dependency>

To parse a CSV file into a list of objects, we need to create an instance of CsvBeanReader. We can set cell-specific rules using a CellProcessor array. With these, we can read directly from a CSV file into a Java bean and vice versa.

Writing CSV data works similarly, using the CsvBeanWriter class.


package com.journaldev.parser.csv;

import java.io.FileReader;
import java.io.IOException;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.constraint.UniqueHashCode;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.CsvBeanWriter;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.io.ICsvBeanWriter;
import org.supercsv.prefs.CsvPreference;

public class SuperCSVParserExample {

	public static void main(String[] args) throws IOException {

		List<Employee> emps = readCSVToBean();
		System.out.println(emps);
		System.out.println("******");
		writeCSVData(emps);
	}

	private static void writeCSVData(List<Employee> emps) throws IOException {
		ICsvBeanWriter beanWriter = null;
		StringWriter writer = new StringWriter();
		try{
			beanWriter = new CsvBeanWriter(writer, CsvPreference.STANDARD_PREFERENCE);
			final String[] header = new String[]{"id","name","role","salary"};
			final CellProcessor[] processors = getProcessors();
            
			// write the header
            beanWriter.writeHeader(header);
            
            //write the bean's data
            for(Employee emp: emps){
            	beanWriter.write(emp, header, processors);
            }
		}finally{
			if( beanWriter != null ) {
                beanWriter.close();
			}
		}
		System.out.println("CSV Data\n"+writer.toString());
	}

	private static List<Employee> readCSVToBean() throws IOException {
		ICsvBeanReader beanReader = null;
		List<Employee> emps = new ArrayList<Employee>();
		try {
			beanReader = new CsvBeanReader(new FileReader("employees.csv"),
					CsvPreference.STANDARD_PREFERENCE);

			// the name mapping provides the basis for the bean setters
			final String[] nameMapping = new String[]{"id","name","role","salary"};
			// read the header first so that it doesn't get mapped to an Employee object
			final String[] header = beanReader.getHeader(true);
			final CellProcessor[] processors = getProcessors();

			Employee emp;
			
			while ((emp = beanReader.read(Employee.class, nameMapping,
					processors)) != null) {
				emps.add(emp);
			}

		} finally {
			if (beanReader != null) {
				beanReader.close();
			}
		}
		return emps;
	}

	private static CellProcessor[] getProcessors() {
		
		final CellProcessor[] processors = new CellProcessor[] { 
                new UniqueHashCode(), // ID (must be unique)
                new NotNull(), // Name
                new Optional(), // Role
                new NotNull() // Salary
        };
		return processors;
	}

}

When we run the above Super CSV example program, we get the output below.


[ID=1,Name=Pankaj Kumar,Role=CEO,Salary=5,000USD
, ID=2,Name=Lisa,Role=Manager,Salary=500USD
, ID=3,Name=David,Role=null,Salary=1000USD
]
******
CSV Data
id,name,role,salary
1,Pankaj Kumar,CEO,"5,000USD"
2,Lisa,Manager,500USD
3,David,,1000USD

As you can see, the Role field is set as Optional because it’s empty in the third row. If we change that to NotNull, we get the following exception.


Exception in thread "main" org.supercsv.exception.SuperCsvConstraintViolationException: null value encountered
processor=org.supercsv.cellprocessor.constraint.NotNull
context={lineNo=4, rowNo=4, columnNo=3, rowSource=[3, David, null, 1000USD]}
	at org.supercsv.cellprocessor.constraint.NotNull.execute(NotNull.java:71)
	at org.supercsv.util.Util.executeCellProcessors(Util.java:93)
	at org.supercsv.io.AbstractCsvReader.executeProcessors(AbstractCsvReader.java:203)
	at org.supercsv.io.CsvBeanReader.read(CsvBeanReader.java:206)
	at com.journaldev.parser.csv.SuperCSVParserExample.readCSVToBean(SuperCSVParserExample.java:66)
	at com.journaldev.parser.csv.SuperCSVParserExample.main(SuperCSVParserExample.java:23)

So Super CSV lets us apply per-column rules and constraints, which the other two parsers, as used above, do not provide. It’s easy to use and the learning curve is gentle.
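
The idea of per-column rules can be sketched in plain Java. This only illustrates the concept and is not how Super CSV is implemented. CellRuleSketch, REQUIRED, OPTIONAL and applyRules are hypothetical names, and the sketch throws IllegalArgumentException rather than Super CSV’s own exception type.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

class CellRuleSketch {

    // Rejects a missing value, similar in spirit to a not-null constraint.
    static final UnaryOperator<String> REQUIRED = value -> {
        if (value == null) {
            throw new IllegalArgumentException("null value encountered");
        }
        return value;
    };

    // Accepts any value, including a missing one.
    static final UnaryOperator<String> OPTIONAL = value -> value;

    // Applies one rule per column, in column order.
    static String[] applyRules(String[] row, List<UnaryOperator<String>> rules) {
        if (row.length != rules.size()) {
            throw new IllegalArgumentException(
                    "expected " + rules.size() + " columns but got " + row.length);
        }
        String[] checked = new String[row.length];
        for (int i = 0; i < row.length; i++) {
            checked[i] = rules.get(i).apply(row[i]);
        }
        return checked;
    }

    public static void main(String[] args) {
        // Third row of the sample file, with the empty Role represented as null.
        String[] row = {"3", "David", null, "1000USD"};

        // Role is optional, so the row passes.
        List<UnaryOperator<String>> lenient = Arrays.asList(REQUIRED, REQUIRED, OPTIONAL, REQUIRED);
        System.out.println(Arrays.toString(applyRules(row, lenient)));

        // Role is required, so the same row is rejected.
        List<UnaryOperator<String>> strict = Arrays.asList(REQUIRED, REQUIRED, REQUIRED, REQUIRED);
        try {
            applyRules(row, strict);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Keeping the rules in a list that parallels the columns mirrors the way the CellProcessor array in the example above lines up with ID, Name, Role and Salary.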

That’s all for the Java CSV parser tutorial. Whether to use OpenCSV, Apache Commons CSV, or Super CSV depends on your requirements; all three are easy to use.

Comments

  1. Ranjeet says:

    comma are available in Cells, while i am using simple CSVParser its working. but data is getting effected when comma are available in CSV cells. how to remove them .

    please assist

  2. Kasun says:

    Hi,

    Is there a way parse a CSV record with line breaks within quotes in Apache CSV ?

    ex:
    Below 3 lines should be one CSV record.

    1, Test, “testing
    testing
    testing”

    Thanks for the great post.

  3. Simran says:

    How will I convert CSV to JSON..??
    Help would be really appreciated..

  4. szh says:

    hi….i need to dump data into csv file from database table…..i have created entity class from database….can u tell me how to write into csv file and export all data into the csv file

  5. Prakaash says:

    May i know, how to omit empty line between each row in CSV file.

    1. Arun says:

      By using csv_ml (http://siara.cc/csv_ml), you could skip empty lines. You could have comments inside your CSV as well. Download demo application from http://siara.cc/csv_ml/csv_ml_swing_demo-1.0.0.jar

      Please look under API section for usage. Complete documentation and source available (https://github.com/siara-cc/csv_ml)

  6. Prakaash says:

    If csv file has content like below, then how your code will work

    ID,Name,Role,Salary
    1,Pankaj Kumar,CEO,”5,000USD”

    2,Lisa,Manager,500USD

    3,David,,1000USD

    between each row empty line is there.

  7. Vineet Mishra says:

    Hi,

    Great explanation.

    I have a scenario where i want to modify existing CSV file. say 3rd rows values I want to replace with some other values.

    how can I achieve this using any of the above 3 methods.

    thanks.

  8. Vinay says:

    Hi ,
    Thanks for the tutorial.

    While writing to CSV file you are using the line.

    CSVPrinter printer = new CSVPrinter(System.out, format.withDelimiter('#'));

    Why use # as a delimiter ? What is its significance?

    My problem is if a cell contains # in it then , in the csv file the whole line is being treated as a cell. If I open it in an editor then the commas will be in correct places but there will be a pair of “” around that line.

    Please help me in resolving this.
    Thanks,
    Vinay.

    1. Pankaj says:

      That is just a delimiter, if you have comma separated file the use comma as delimiter.

      1. Vinay says:

        Thanks for replying.
        If I use comma as a delimiter , then all the data is stored in one cell.(along with the commas meant for separation.)

        Can you tell me why and suggest a solution please?
        Thanks
        Vinay

  9. Chandra Bhanu Rastogi says:

    Pankaj,
    This exception is strange for me
    “Exception in thread “main” org.supercsv.exception.SuperCsvReflectionException: unable to find method setId(java.lang.String)”

    Why SuperCsv trying to find setId(String)? Id is of type int.

    1. Chandra Bhanu Rastogi says:

      It was my mistake. Now implemented successfully for my project. Thanks Pankaj for informative post.

  10. Ferenc Toth says:

    Hi, thanks for the tutorial – seems quite OK for beginners. I would still have question based on this material:
    There seems to be a problem in the method parseCSVToBeanList(), since when I attempt to execute it, it says there is a NullPointerException during the creation of the MappingStrategy. I include the error code below:

    Exception in thread “main” java.lang.RuntimeException: Error parsing CSV!
    at com.opencsv.bean.CsvToBean.parse(CsvToBean.java:95)
    at com.opencsv.bean.CsvToBean.parse(CsvToBean.java:75)
    at com.journaldev.parser.csv.OpenCSVParserExample.parseCSVToBeanList(OpenCSVParserExample.java:64)
    at com.journaldev.parser.csv.OpenCSVParserExample.main(OpenCSVParserExample.java:25)
    Caused by: java.lang.NullPointerException
    at com.opencsv.bean.HeaderColumnNameMappingStrategy.createBean(HeaderColumnNameMappingStrategy.java:170)
    at com.opencsv.bean.CsvToBean.processLine(CsvToBean.java:117)
    at com.opencsv.bean.CsvToBean.processLine(CsvToBean.java:101)
    at com.opencsv.bean.CsvToBean.parse(CsvToBean.java:91)
    … 3 more

    What could be there the problem related to the initialization? Thanks in advance,
    Feri

    1. Ferenc Toth says:

      The problem I saw was in the OpenCSV section – by the way.

  11. Jerry Bakster says:

    Thanks for the article, and I’m sharing one more open-source library for reading/writing/mapping CSV data. Since I used this library in my project, I found out it powerful and flexiable especially parsing big CSV data (such as 1GB+ file or complex processing logic).

    The library provides simplified API for Java developers, and also it provided full features in parsing CSV file.

    Here is a code snippt for using this library:

    public static void main(String[] args) throws FileNotFoundException {
    /**
    * —————————————
    * Read CSV rows into 2-dimensional array
    * —————————————
    */
    // 1st, config the CSV reader, such as line separator, column separator and so on
    CsvParserSettings settings = new CsvParserSettings();
    settings.getFormat().setLineSeparator("\n");

    // 2nd, creates a CSV parser with the configs
    CsvParser parser = new CsvParser(settings);

    // 3rd, parses all rows from the CSV file into a 2-dimensional array
    List resolvedData = parser.parseAll(new FileReader("/examples/example.csv"));

    /**
    * ———————————————
    * Read CSV rows into list of beans you defined
    * ———————————————
    */
    // 1st, config the CSV reader with row processor attaching the bean definition
    BeanListProcessor rowProcessor = new BeanListProcessor(ColumnBean.class);
    settings.setRowProcessor(rowProcessor);
    settings.setHeaderExtractionEnabled(true);

    // 2nd, parse all rows from the CSV file into the list of beans you defined
    parser.parse(new FileReader("/examples/example.csv"));
    List resolvedBeans = rowProcessor.getBeans();
    }

    Also find more details at official Github repository.

  12. Shwetha says:

    How to read the value from CSV based on the header value using OpenCSV?

  13. Pranav Mathur says:

    Hi,

    We’re using the apache commons csv library to parse and validate data in csv files, but, we’re facing an issue now. There are duplicate headers present in the csv file, due to which it fails in parsing itself.
    Any ideas on how to resolve this issue will be very helpful.

    Thanks in advance.

  14. Sharique says:

    Sir,

    If my bean class has reference of other Class then how do I bind csv to my bean class, consider following example :

    public class Employee{
    private int empId;
    private int empName;
    private Address address;

    }

    public class Address {
    private String city;
    private String state;
    }

    And my Employee.csv file is like this :

    EmpId,EmpName,City,State
    101,xyz,abc,def

    Please clarify

    1. mike says:

      i have same situation.

  15. Poohdedoo says:

    Hi
    I tried your sample on opencsv however it gives compilation errors when trying this
    HeaderColumnNameTranslateMappingStrategy beanStrategy = new HeaderColumnNameTranslateMappingStrategy();

    CsvToBean csvToBean = new CsvToBean();

    Error
    – The type HeaderColumnNameTranslateMappingStrategy is not generic; it cannot be parameterized with arguments

  16. Manju says:

    Hi,

    I’ve been given with a task to accept a csv file from UI (say a jsp form using file tag), parse the data of that file into beans, and output a csv file with a ‘Success’ message with respect to every file item. I’m new to java and I dont know how exactly to handle this. Can you please suggest any ideas that can help me doing this?

  17. RJ says:

    Nice Article Pankaj.
    You saved a lot of time for many repeating the same.
    Thanks !!

    In our case the header columns may vary among files. Taking your example the header line could be

    ID,Name,Role,Salary
    or
    Role,SomeUnwantedColumn,ID,Name

    The parser need to always read only ‘ ID,Role,Name’ values only.

    Which of these parser do you think would be a good candidate in terms of specifying which columns need to be parsed. Again its based on the ‘name’ of the columns and not the positional value.
    Although one of the options is to get list of HashMaps using SuperCSV, but I am interested in getting a list of POJO.

    Any suggestions on execution speed? (Need to process around 100,000 rec under 3 seconds on intel i5 windows 7)

  18. James says:

    Hi there, I’m a Super CSV developer. Just a correction – Super CSV does not force you to use a header with the same names as your field names.

    In your code examples above you could have had 2 arrays.

    One for your header:
    String[] header = new String[]{"ID","Name","Role","Salary"}

    And one for the name mapping passed to each read/write call:
    String[] nameMapping = new String[]{"id","name","role","salary"}

    Your header can be completely different – in fact you don’t even need a header!

    1. Pankaj says:

      I don’t see where we can provide header and nameMapping arrays in the read() method call.

      Can you provide a quick snippet that I can use like emp = beanReader.read(Employee.class, header, processors) to get the object directly from bean reader.

      1. James says:

        Here’s your writing example updated showing that the header is completely separate to the nameMapping used to map between CSV and the bean (often you can use the same array if the header and field names match, but if they don’t you have to supply two different arrays).


        beanWriter = new CsvBeanWriter(writer, CsvPreference.STANDARD_PREFERENCE);

        // write the header
        final String[] header = new String[]{"ID","Name","Role","Salary"};
        beanWriter.writeHeader(header);

        // maps columns to field names
        final String[] nameMapping = new String[]{"id","name","role","salary"};

        final CellProcessor[] processors = getProcessors();

        // write the beans data
        for(Employee emp : emps){
        beanWriter.write(emp, nameMapping, processors);
        }

        The same change would apply to the reading example.

        1. James says:

          Or I could have just linked you to this StackOverflow answer 🙂

          http://stackoverflow.com/a/21952522/1068649

          1. Pankaj says:

            Thanks for the clarification and pointing out the error in post, I have corrected it.

  19. subbareddy says:

    nice explanation it will useful
