Saturday, 7 June 2014

2 Examples to read Zip Files in Java, ZipFile vs ZipInputStream

ZIP is one of the most popular compression formats in the computing world. A Zip file may contain multiple files or folders in compressed form. The Java API provides extensive support for reading Zip files; all classes related to zip file processing are located in the java.util.zip package. One of the most common tasks with a zip archive is to read it, display the entries it contains, and then extract them into a folder. In this tutorial we will learn how to do this in Java.

There are two ways to iterate over all items in a given zip archive: you can use either java.util.zip.ZipFile or java.util.zip.ZipInputStream. A Zip file contains several items, and each of them has a header field containing the size of the item in bytes, which means you can iterate over all entries without actually decompressing the zip file. The ZipFile class accepts a java.io.File or a String file name and opens the ZIP file for reading; by default the UTF-8 charset is used to decode entry names and comments. The main benefit of using ZipFile over ZipInputStream is that it uses random access to move between entries, while ZipInputStream is sequential: because it works with a stream, it cannot move positions freely, and it has to read and decompress each entry's data to reach the end of that entry before it can read the header of the next one. That's why it's better to use ZipFile than ZipInputStream for iterating over all entries of an archive.

We will learn more about how to read a Zip file in Java by following an example. By the way, the code should work with zip files created by any zip utility, e.g. WinZip, WinRAR or any other tool. Keep in mind that the .ZIP format permits multiple compression algorithms, while java.util.zip handles the common STORED and DEFLATED methods. I have tested it with WinZip on Windows 8.
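The idea that entries can be listed from their metadata alone, without decompressing any data, can be sketched as follows. This is only an illustration under some assumptions: the class name ZipListingSketch and the helper createSampleZip() are made up for this sketch, and the archive is a tiny temporary file written by the program itself so that it runs anywhere.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ZipListingSketch {

    // Hypothetical helper: builds a tiny archive so the sketch is self-contained
    static Path createSampleZip() throws IOException {
        Path zip = Files.createTempFile("sample", ".zip");
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zip))) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("docs/b.txt"));
            zos.write("hello zip".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return zip;
    }

    // Collects "name:size" for every entry; only entry metadata is used,
    // no InputStream is opened and no entry data is decompressed here
    static List<String> listEntries(Path zip) throws IOException {
        List<String> result = new ArrayList<>();
        try (ZipFile file = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = file.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                result.add(entry.getName() + ":" + entry.getSize());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path zip = createSampleZip();
        for (String line : listEntries(zip)) {
            System.out.println(line);
        }
        Files.delete(zip);
    }
}
```

For the two sample entries, the listing reports uncompressed sizes of 5 and 9 bytes, taken from the archive's metadata rather than from reading the data.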



How to read Zip Archive in Java
In this example, I have used the ZipFile class to iterate over each file in the Zip archive. The entries() method of ZipFile returns an enumeration of ZipEntry objects, each of which has all the metadata of an entry, including name, size and modified date and time; getEntry() returns the same kind of object for a single entry looked up by name. You can ask ZipFile for the InputStream corresponding to a file entry to extract the real data, which means you only incur the cost of decompression when you really need to. By using java.util.zip.ZipFile, you can check each entry and extract only certain entries, depending upon your logic. ZipFile is good for both sequential and random access to individual file entries. On the other hand, if you are using ZipInputStream then, as with any other InputStream, you will need to process all entries sequentially, as shown in the second example. A key point to remember, especially if you are processing large zip archives, is that Java 6 only supports the classic ZIP format, which is limited to 4GB. Thankfully Java 7 supports the ZIP64 extension, which can be used to process larger zip files.
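As a short aside before the full program, the point about paying for decompression only when you need it can be sketched with getEntry(), which looks up one entry by name. The class name SingleEntrySketch, the helper createSampleZip() and the entry names are assumptions made for this sketch; they are not part of the program discussed in this post.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class SingleEntrySketch {

    // Hypothetical helper: writes a two-entry archive to a temporary file
    static Path createSampleZip() throws IOException {
        Path zip = Files.createTempFile("single", ".zip");
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zip))) {
            zos.putNextEntry(new ZipEntry("first.txt"));
            zos.write("one".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("second.txt"));
            zos.write("two".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return zip;
    }

    // Returns the text of one named entry, or null if the archive has no such entry
    static String readEntry(Path zip, String name) throws IOException {
        try (ZipFile file = new ZipFile(zip.toFile())) {
            ZipEntry entry = file.getEntry(name); // looked up by name
            if (entry == null) {
                return null;
            }
            // Only this entry's data is read and decompressed
            try (InputStream in = file.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) >= 0) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path zip = createSampleZip();
        System.out.println(readEntry(zip, "second.txt"));
        Files.delete(zip);
    }
}
```

The complete program that follows takes the other route and extracts every entry of the archive.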

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Date;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

/**
 * Java program to iterate over and read file entries from a Zip archive.
 * This program demonstrates two ways to retrieve files from a Zip archive: using ZipFile and using ZipInputStream.
 * @author Javin
 */

public class ZipFileReader {

    // This Zip file contains 11 PNG images
    private static final String FILE_NAME = "C:\\temp\\pics.zip";
    private static final String OUTPUT_DIR = "C:\\temp\\Images\\";
    private static final int BUFFER_SIZE = 8192;

    public static void main(String args[]) throws IOException {

        // Prefer ZipFile over ZipInputStream
        readUsingZipFile();
        // readUsingZipInputStream();

    }

    /*
     * Example of reading Zip archive using ZipFile class
     */

    private static void readUsingZipFile() throws IOException {
        final ZipFile file = new ZipFile(FILE_NAME);
        System.out.println("Iterating over zip file : " + FILE_NAME);

        try {
            final Enumeration<? extends ZipEntry> entries = file.entries();
            while (entries.hasMoreElements()) {
                final ZipEntry entry = entries.nextElement();
                System.out.printf("File: %s Size %d  Modified on %TD %n", entry.getName(), entry.getSize(), new Date(entry.getTime()));
                extractEntry(entry, file.getInputStream(entry));
            }
            System.out.printf("Zip file %s extracted successfully in %s", FILE_NAME, OUTPUT_DIR);
        } finally {
            file.close();
        }

    }

    /*
     * Example of reading Zip file using ZipInputStream in Java.
     */

    private static void readUsingZipInputStream() throws IOException {
        BufferedInputStream bis = new BufferedInputStream(new FileInputStream(FILE_NAME));
        final ZipInputStream is = new ZipInputStream(bis);

        try {
            ZipEntry entry;
            while ((entry = is.getNextEntry()) != null) {
                System.out.printf("File: %s Size %d  Modified on %TD %n", entry.getName(), entry.getSize(), new Date(entry.getTime()));
                extractEntry(entry, is);
            }
        } finally {
            is.close();
        }

    }

    /*
     * Utility method to copy the data of one entry from the InputStream
     * into a file in the output directory
     */

    private static void extractEntry(final ZipEntry entry, InputStream is) throws IOException {
        final String extractedFile = OUTPUT_DIR + entry.getName();

        // try-with-resources closes the output file on success and on failure;
        // the input stream is left open because the caller owns it
        try (FileOutputStream fos = new FileOutputStream(extractedFile)) {
            final byte[] buf = new byte[BUFFER_SIZE];
            int length;

            // read() returns -1 at the end of the current entry
            while ((length = is.read(buf, 0, buf.length)) >= 0) {
                fos.write(buf, 0, length);
            }
        }

    }

}

Output:
Iterating over zip file : C:\temp\pics.zip
File: Image  (11).png Size 21294  Modified on 10/24/13
File: Image  (1).png Size 22296  Modified on 11/19/13
File: Image  (2).png Size 10458  Modified on 10/24/13
File: Image  (3).png Size 18425  Modified on 11/19/13
File: Image  (4).png Size 31888  Modified on 11/19/13
File: Image  (5).png Size 27454  Modified on 11/19/13
File: Image  (6).png Size 67608  Modified on 11/19/13
File: Image  (7).png Size 8659  Modified on 11/19/13
File: Image  (8).png Size 40015  Modified on 11/19/13
File: Image  (9).png Size 17062  Modified on 10/24/13
File: Image  (10).png Size 42467  Modified on 10/24/13
Zip file C:\temp\pics.zip extracted successfully in C:\temp\Images\

To run the ZipFileReader program, make sure you have a zip file named pics.zip in C:\temp and that the output directory C:\temp\Images exists; otherwise the program will fail with an exception instead of extracting the files. After a successful run of this program, you can see the contents of the zip file extracted inside the output directory. By the way, as an exercise, you can enhance this program to get the name of the zip file from the user and create an output directory of the same name.
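One possible way to approach that exercise is sketched below. It is not the only solution and rests on a few assumptions: the class name ExtractToNamedDirSketch and the helper createSampleZip() are invented for this sketch, the archive path is taken from the first command-line argument, and the output directory is created next to the archive. The check that rejects entries resolving outside the output directory goes beyond what the exercise asks for; it is included as a precaution.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ExtractToNamedDirSketch {

    // Hypothetical helper: sample archive used when no path is given
    static Path createSampleZip() throws IOException {
        Path zip = Files.createTempFile("pics", ".zip");
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zip))) {
            zos.putNextEntry(new ZipEntry("img/one.txt"));
            zos.write("one".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("two.txt"));
            zos.write("two".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return zip;
    }

    // Extracts the archive into a sibling directory named after it (pics.zip -> pics)
    static Path extractAll(Path zip) throws IOException {
        String fileName = zip.getFileName().toString();
        String baseName = fileName.toLowerCase(Locale.ROOT).endsWith(".zip")
                ? fileName.substring(0, fileName.length() - 4)
                : fileName + "_extracted";
        Path outputDir = zip.toAbsolutePath().getParent().resolve(baseName).normalize();
        Files.createDirectories(outputDir);

        try (ZipFile file = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = file.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                Path target = outputDir.resolve(entry.getName()).normalize();
                // Precaution: refuse names such as "../x" that leave the output directory
                if (!target.startsWith(outputDir)) {
                    throw new IOException("Entry outside output directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                    continue;
                }
                Files.createDirectories(target.getParent());
                try (InputStream in = file.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        return outputDir;
    }

    public static void main(String[] args) throws IOException {
        Path zip = args.length > 0 ? Paths.get(args[0]) : createSampleZip();
        System.out.println("Extracted to " + extractAll(zip));
    }
}
```

Because the output directory is created by the program, a missing directory no longer stops the extraction.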

That's all about how to read a Zip file in Java. We have seen two different approaches to iterate over the file entries in a Zip file and retrieve them. You should prefer ZipFile over ZipInputStream for iterating over each file in an archive. It's also good to know that the java.util.zip package supports the GZIP file format as well, which means you can also read .gz files generated by the gzip command in UNIX from your Java program.
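Reading a .gz file can be sketched with java.util.zip.GZIPInputStream. The class name GzipReadSketch and the helper createSampleGz() are assumptions made for this sketch; the sample file is written by the program itself with GZIPOutputStream instead of the gzip command, so that the example is self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipReadSketch {

    // Hypothetical helper: writes the given text into a temporary .gz file
    static Path createSampleGz(String text) throws IOException {
        Path gz = Files.createTempFile("sample", ".gz");
        try (GZIPOutputStream out = new GZIPOutputStream(Files.newOutputStream(gz))) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return gz;
    }

    // Decompresses the whole .gz file and returns its content as UTF-8 text
    static String readGz(Path gz) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(Files.newInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) >= 0) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path gz = createSampleGz("hello gzip");
        System.out.println(readGz(gz));
        Files.delete(gz);
    }
}
```

Unlike a Zip archive, a .gz file holds a single compressed stream, so there are no entries to iterate over.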
