Monday, 20 July 2015

2 Ways to Parse CSV Files in Java - BufferedReader vs Apache

In last tutorial, you have learned how to parse Excel file in Java and in this Java tutorial, you will learn how to parse CSV file in Java. You can directly parse CSV file in Java without using any third party library, because ultimately its a text file and you can use BufferedReader to read it, but you can also take advantage of good open source library like Apache commons CSV to parse comma separated values. These library makes developer's life easy and provides rich functionality to parse various CSV formats. In real programming world, CSV or comma separated files are used for variety of purpose, including for transporting data from one system to another e.g. FX rates, importing and exporting records from database etc. In CSV files entries are separated by comma, and it may or may not contain header. There are many ways to parse or read CSV files in Java, and if you need to do it on your project, its better not to reinvent the wheel and choose commons csv, but for learning purpose, it's good to know how to do it without using third party library.

In this tutorial, I am going to show you 2 ways to read CSV files in Java. First way is by using java.io.BufferedReader and split() method from java.lang.String class, and second way is by using Apache Commons CSV library's CSVParser class. Commons CSV is new member in rich Apache commons family and has built in support to read most common CSV formats e.g. RFC 4180, Microsoft Excel, MySQL and TDF. You can also create custom format by using fluent style API of Apache commons CSV.  CSVParser is fully functional parser, which can parse different kind of CSV files e.g. XLS CSV file, CSV file without header or CSV file with header. All you need to do is to choose different format for CSVParser, which we will learn in this tutorial. By the way, if you want to learn more about reading/writing files in Java, I suggest to read one of the good Java book like Core Java by Cay S. Horstmann or Java: A Beginner's Guide by Herbert Schildt.



Maven dependency and JAR file required for CSV Parsing

In order to use this library you need to add commons-csv-1.1.jar file into your classpath. If you are using Maven you can also add following dependency in your project file.
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>1.1</version>
</dependency>
Remember, its better to use Maven for managing dependency because it will also download any other JAR file on which this library is dependent, known as transitive dependencies. If you add JAR files manually, you make sure to download any dependent JAR.


CSV Parser to read CSV files in Java

Apache Commons CSV reads and writes files in variations of the Comma Separated Value (CSV) format. For example to parse an Excel CSV file, you need to write following code

Reader in = ...;
Iterable parser = CSVFormat.EXCEL.parse(in);
for (CSVRecord record : parser) {
    ...
}

and to read a normal CSV file with header you need to write :

Reader in = ...;
Iterable parser = CSVFormat.DEFAULT.parse(in);
for (CSVRecord record : parser) {
    ...
}

Currently Apache commons CSV supports following formats :
  • DEFAULT to read standard comma separated format, as for RFC4180 but allowing empty lines.
  • EXCEL to read Excel file format (both XLS and XLSX) (using a comma as the value delimiter).
  • MYSQL to parse default MySQL format used by the SELECT INTO OUTFILE and LOAD DATA INFILE operations.
  • RFC4180 to read comma separated format as defined by RFC 4180.
  • TDF to parse tab-delimited format, with quote; leading and trailing spaces ignored
It's more functional, and should be used in real world project. On the other hand BufferedReader approach is pretty straight forward. You open a CSV file and start reading it line by line, since each line contains a coma separated String, you need to split them using comma (",") and you will get an array of String containing each column. Just do whatever you wants to do with them, if you are creating object, as shown in first example, then create them, otherwise you can simply print them like in second example. You can even use new Java 7 and Java 8 feature to read file more efficiently.

How to read CSV File in Java using BufferedReader


Java Program to Parse or Read CSV File in Java

Here is full code example of how to read CSV file in Java. This program contains two examples, first one read CSV file without using third party library and the second one parse file using Apache commons CSV, a new library for parsing CSV files. Make sure you include commons-csv-1.1.jar file in your CLASSPATH to run this program in your PC.  Here is our sample CSV file, which also contains header and has contains countries detail e.g. name of country, its capital and currency.

Our CSV file - countries.txt
NAME,CAPITAL,CURRENCY
India,New Delhi,INR
USA,Washington,USD
England,London,GBP
Japan,Tokyo,JPY

There are two methods in this program readCSV() and parseCSV(), former uses BufferedReader to read CSV file. We also have a class Country to represent each line of file, which basically contains country specific data. In first method we read the file line by line and then split each line on comma to get a String array containing individual fields. We use this array to create Country object and add them into the List, which is returned by our method. Code of this method is very straight forward and self explanatory, we have ignored the first line because we know its header.

Second method is interesting as it demonstrate how to use apache commons csv library to read csv file. As I said, commons csv supports several csv format directly and we will use CSVFormat.DEFAULT, which also supports header. Here you create an instance of CSVParser by passing it a FileInputStream, which points to your csv file and CSVFormat. This contains several CSVRecord from which you can retrieve individual fields.

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

/**
 * Java Program to parse and read CSV file using traditional BufferedReader
 * approach and by using more functional CSV parser from Apache Commons CSV
 * library. Apache Commons CSV support different CSV format including default
 * one, with or without header, reading EXCEL or XLS CSV file etc.
 *
 * @author
 */
public class CSVReader {
    
    private static class Country {
        private String name;
        private String capital;
        private String currency;

        public Country(String name, String capital, String currency) {
            this.name = name;
            this.capital = capital;
            this.currency = currency;
        }

        public String name() {
            return name;
        };

        public String capital() {
            return capital;
        }

        public String currency() {
            return currency;
        }

        @Override
        public String toString() {
            return "Country [name=" + name + ", capital=" + capital
                    + ", currency=" + currency + "]";
        }
    }

    public static void main(String args[]) throws FileNotFoundException, IOException {
        System.out.println("Reading from CSV file using BufferedReader and String Split");
        List nations = readCSV();
        print(nations);
        System.out.println("Parsing CSV file using CSVParser of Apache commons CSV");
        parseCSV();

    }

    /*
     * Java program to read CVS file using BufferedReader and String split()
     * method
     */
    public static List readCSV() throws FileNotFoundException, IOException {
        List countries = new ArrayList<>();
        BufferedReader br = new BufferedReader(new FileReader("countries.csv"));

        String line = br.readLine(); // Reading header, Ignoring

        while ((line = br.readLine()) != null && !line.isEmpty()) {
            String[] fields = line.split(",");
            String name = fields[0];
            String capital = fields[1];
            String currency = fields[2];
            Country nation = new Country(name, capital, currency);
            countries.add(nation);
        }
        br.close();
        return countries;
    }

    /*
     * Method to read CSV file using CSVParser from Apache Commons CSV
     */
    public static void parseCSV() throws FileNotFoundException, IOException {
        CSVParser parser = new CSVParser(new FileReader("countries.csv"), CSVFormat.DEFAULT.withHeader());

        for (CSVRecord record : parser) {
            System.out.printf("%s\t%s\t%s\n", record.get("NAME"),
                    record.get("CAPITAL"), record.get("CURRENCY"));
        }
        parser.close();
    }

    public static void print(List countries) {
        System.out.println("========================");
        for (Country country : countries) {
            System.out.println(country);
        }
        System.out.println("========================");
    }

}

Output:
Reading from CSV file using BufferedReader and String Split
========================
Country [name=India, capital=New Delhi, currency=INR]
Country [name=USA, capital=Washington, currency=USD]
Country [name=England, capital=London, currency=GBP]
Country [name=Japan, capital=Tokyo, currency=JPY]
========================

Parsing CSV file using CSVParser of Apache commons CSV
India   New Delhi       INR
USA     Washington      USD
England London          GBP
Japan   Tokyo           JPY

You can see output of our program matches with content of our CSV file. So both of our approach is working properly.

That's all folks, Enjoy parsing CSV file with Apache commons CSV parser. Another great utility open source library from Apache. You can also report any issue found while using them. A nice way to support open source projects. I suggest to use this library if you have to process CSV files in your project because its tried and tested and rich in functionality. You can use this library to load CSV data into MySQL database or simply create objects from them.

If you like this tutorial and interested to learn more about parsing and handling files in Java, you can also check out following Java IO tutorials from this blog :
  • How to parse XML file in Java using SAX parser? (guide)
  • How to convert JSON to Object in Java? (example)
  • How to read XML file in Java using JDOM parser? (tutorial)
  • How to parse big JSON file using Jackson Streaming API? (example)
  • How to read a file in one line in Java 8? (example
  • How to copy File in Java? (example)
  • How to generate MD5 checksum for file in Java? (solution)
  • How to read/write RandomAccessFile in Java? (example)


Recommended books for Java programmers 
  • Head First Java 2nd Edition by Kathy Sierra (book)
  • Core Java Volume 1 and 2 9th Edition by Cay S. Horstmann
  • Java : A beginner's guide by Herbert Schildt

No comments:

Post a Comment