Java ships with ZipOutputStream to create a zip file and GZIPOutputStream to compress a file with Gzip, but the JDK has no built-in API to create a tar.gz file.

We can use Apache Commons Compress, which is still actively developed, to create a .tar.gz file.

pom.xml

  <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-compress</artifactId>
      <version>1.20</version>
  </dependency>

Notes

  1. Tar collects files into one archive file, aka a tarball, which generally has the suffix .tar
  2. Gzip compresses a file to save space; the result generally has the suffix .gz
  3. A .tar.gz or .tgz file groups all files into one archive file and then compresses it using Gzip.
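
Note 2 can be tried with the JDK alone. The sketch below gzips one block of bytes and restores it with java.util.zip; the class and method names (GzipRoundTrip, gzip, gunzip) are made up for illustration. Gzip handles a single stream of data rather than a set of named entries, which is why several files are grouped with tar first.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only
class GzipRoundTrip {

    // compress a single block of bytes with Gzip
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // restore the original bytes (readAllBytes needs Java 9+)
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream gz =
                     new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        // highly repetitive sample data, so the size reduction is easy to see
        byte[] original = new byte[1000];
        Arrays.fill(original, (byte) 'a');

        byte[] compressed = gzip(original);
        byte[] restored = gunzip(compressed);

        System.out.println(original.length + " bytes -> " + compressed.length + " bytes");
        System.out.println("restored equals original: " + Arrays.equals(original, restored));
    }
}
```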

The code snippet below creates a tar.gz file.

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

//...
try (OutputStream fOut = Files.newOutputStream(Paths.get("output.tar.gz"));
     BufferedOutputStream buffOut = new BufferedOutputStream(fOut);
     GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(buffOut);
     TarArchiveOutputStream tOut = new TarArchiveOutputStream(gzOut)) {

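       // path is the java.nio.file.Path of the file to add;
       // the second argument is the entry name stored in the archive
       TarArchiveEntry tarEntry = new TarArchiveEntry(
               path.toFile(), path.getFileName().toString());

       tOut.putArchiveEntry(tarEntry);

       // copy file to TarArchiveOutputStream
       Files.copy(path, tOut);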

       tOut.closeArchiveEntry();

       tOut.finish();

     }

The code snippet below decompresses a .tar.gz file.

import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

//...
try (InputStream fi = Files.newInputStream(Paths.get("input.tar.gz"));
     BufferedInputStream bi = new BufferedInputStream(fi);
     GzipCompressorInputStream gzi = new GzipCompressorInputStream(bi);
     TarArchiveInputStream ti = new TarArchiveInputStream(gzi)) {

    ArchiveEntry entry;
    while ((entry = ti.getNextEntry()) != null) {

        // create a new path, and remember to check for the zip slip attack
        Path newPath = filename(entry, targetDir);

        //checking

        // copy TarArchiveInputStream to newPath
        Files.copy(ti, newPath);

    }
}

1. Add two files to tar.gz

This example shows how to add two files into a tar.gz file.

TarGzipExample1.java

package com.favtuts.io.howto.compress;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class TarGzipExample1 {

    public static void main(String[] args) {

        try {

            Path path1 = Paths.get("/home/tvt/workspace/favtuts/sitemap.xml");
            Path path2 = Paths.get("/home/tvt/workspace/favtuts/file.txt");
            Path output = Paths.get("/home/tvt/workspace/favtuts/output.tar.gz");

            List<Path> paths = Arrays.asList(path1, path2);
            createTarGzipFiles(paths, output);

        } catch (IOException e) {
            e.printStackTrace();
        }

        System.out.println("Done");

    }

    // tar.gz a few files
    public static void createTarGzipFiles(List<Path> paths, Path output)
        throws IOException {

        try (OutputStream fOut = Files.newOutputStream(output);
             BufferedOutputStream buffOut = new BufferedOutputStream(fOut);
             GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(buffOut);
             TarArchiveOutputStream tOut = new TarArchiveOutputStream(gzOut)) {

            for (Path path : paths) {

                if (!Files.isRegularFile(path)) {
                    throw new IOException("Only regular files are supported: " + path);
                }

                TarArchiveEntry tarEntry = new TarArchiveEntry(
                                                  path.toFile(),
                                                  path.getFileName().toString());

                tOut.putArchiveEntry(tarEntry);

                // copy file to TarArchiveOutputStream
                Files.copy(path, tOut);

                tOut.closeArchiveEntry();

            }

            tOut.finish();

        }

    }

}

Output – It adds sitemap.xml and file.txt into one archive file, output.tar, compresses it using Gzip, and the result is output.tar.gz

$ tar -tvf /home/tvt/workspace/favtuts/output.tar.gz
-rw-r--r-- 0/0          260295 2022-05-20 16:15 sitemap.xml
-rw-r--r-- 0/0              24 2022-05-06 17:14 file.txt

2. Add a directory to tar.gz

This example adds a directory, including its sub-files and sub-directories, into one archive file and compresses it with Gzip into a .tar.gz file.

The idea is to use Files.walkFileTree to walk the file tree and add the files one by one into the TarArchiveOutputStream.
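
Before wiring the walk to the tar stream, it can help to see which entry names it would produce. The following is a minimal sketch using only the JDK; the class and method names (EntryNameWalker, relativeEntryNames) are hypothetical, and converting the separator to '/' is an assumption made here to get portable entry names.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, for illustration only
class EntryNameWalker {

    // returns the path of every regular file under source, relative to source
    static List<String> relativeEntryNames(Path source) throws IOException {
        List<String> names = new ArrayList<>();
        String separator = source.getFileSystem().getSeparator();

        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    // relativize drops the part of the path leading up to source
                    String name = source.relativize(file).toString();
                    names.add(name.replace(separator, "/"));
                }
                return FileVisitResult.CONTINUE;
            }
        });

        // the visiting order is not specified, so sort for a stable result
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // build a small throwaway tree: a.txt and sub/b.txt
        Path root = Files.createTempDirectory("walk-demo");
        Files.createDirectories(root.resolve("sub"));
        Files.write(root.resolve("a.txt"), new byte[]{1});
        Files.write(root.resolve("sub").resolve("b.txt"), new byte[]{2});

        System.out.println(relativeEntryNames(root)); // prints [a.txt, sub/b.txt]
    }
}
```

Each name in the returned list corresponds to what would be passed as the entry name when the file is added to the archive.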

TarGzipExample2.java

package com.favtuts.io.howto.compress;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

import java.io.*;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

public class TarGzipExample2 {

    public static void main(String[] args) {

        try {

            // tar.gz a folder
            Path source = Paths.get("/home/tvt/workspace/favtuts/test");
            createTarGzipFolder(source);

        } catch (IOException e) {
            e.printStackTrace();
        }

        System.out.println("Done");

    }

    // tar.gz a folder; the .tar.gz file is generated
    // in the current working directory
    public static void createTarGzipFolder(Path source) throws IOException {

        if (!Files.isDirectory(source)) {
            throw new IOException("Please provide a directory.");
        }

        // use the folder name as the tar.gz file name
        String tarFileName = source.getFileName().toString() + ".tar.gz";

        try (OutputStream fOut = Files.newOutputStream(Paths.get(tarFileName));
             BufferedOutputStream buffOut = new BufferedOutputStream(fOut);
             GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(buffOut);
             TarArchiveOutputStream tOut = new TarArchiveOutputStream(gzOut)) {

            Files.walkFileTree(source, new SimpleFileVisitor<Path>() {

                @Override
                public FileVisitResult visitFile(Path file,
                                            BasicFileAttributes attributes) {

                    // only copy files, no symbolic links
                    if (attributes.isSymbolicLink()) {
                        return FileVisitResult.CONTINUE;
                    }

                    // get filename
                    Path targetFile = source.relativize(file);

                    try {
                        TarArchiveEntry tarEntry = new TarArchiveEntry(
                                file.toFile(), targetFile.toString());

                        tOut.putArchiveEntry(tarEntry);

                        Files.copy(file, tOut);

                        tOut.closeArchiveEntry();

                        System.out.printf("file : %s%n", file);

                    } catch (IOException e) {
                        System.err.printf("Unable to tar.gz : %s%n%s%n", file, e);
                    }

                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFileFailed(Path file, IOException exc) {
                    System.err.printf("Unable to tar.gz : %s%n%s%n", file, exc);
                    return FileVisitResult.CONTINUE;
                }

            });

            tOut.finish();
        }

    }

}

3. Add String to tar.gz

This example wraps a String in a ByteArrayInputStream and writes it into the TarArchiveOutputStream directly. In other words, it creates a file inside the tar.gz without first saving it to the local disk.

    public static void createTarGzipFilesOnDemand() throws IOException {

        String data1 = "Test data 1";
        String fileName1 = "111.txt";

        String data2 = "Test data 2 3 4";
        String fileName2 = "folder/222.txt";

        String outputTarGzip = "/home/tvt/workspace/favtuts/output.tar.gz";

        try (OutputStream fOut = Files.newOutputStream(Paths.get(outputTarGzip));
             BufferedOutputStream buffOut = new BufferedOutputStream(fOut);
             GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(buffOut);
             TarArchiveOutputStream tOut = new TarArchiveOutputStream(gzOut)) {

            createTarArchiveEntry(fileName1, data1, tOut);
            createTarArchiveEntry(fileName2, data2, tOut);

            tOut.finish();
        }

    }

    private static void createTarArchiveEntry(String fileName,
                                              String data,
                                              TarArchiveOutputStream tOut)
                                              throws IOException {

        // encode explicitly, the platform default charset may vary
        byte[] dataInBytes = data.getBytes(java.nio.charset.StandardCharsets.UTF_8);

        // create a byte[] input stream
        ByteArrayInputStream baIn = new ByteArrayInputStream(dataInBytes);

        TarArchiveEntry tarEntry = new TarArchiveEntry(fileName);

        // the entry size in bytes must be set before writing, else an error is thrown
        tarEntry.setSize(dataInBytes.length);
        // tarEntry.setSize(baIn.available()); alternative

        tOut.putArchiveEntry(tarEntry);

        // copy ByteArrayInputStream to TarArchiveOutputStream
        byte[] buffer = new byte[1024];
        int len;
        while ((len = baIn.read(buffer)) > 0) {
            tOut.write(buffer, 0, len);
        }

        tOut.closeArchiveEntry();

    }
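
The setSize call above takes the length in bytes, not in characters. The two differ as soon as the text contains non-ASCII characters, so a small check like the following sketch can help pick the right value. The class and method names (EntrySizeDemo, utf8Size) are hypothetical, and UTF-8 is assumed as the encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only
class EntrySizeDemo {

    // number of bytes the text occupies once encoded as UTF-8 (assumed encoding)
    static long utf8Size(String data) {
        return data.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "Test data 1";
        String accented = "caf\u00e9"; // "cafe" with an accented e

        // prints "11 chars, 11 bytes"
        System.out.println(ascii.length() + " chars, " + utf8Size(ascii) + " bytes");

        // prints "4 chars, 5 bytes"
        System.out.println(accented.length() + " chars, " + utf8Size(accented) + " bytes");
    }
}
```

The byte count is the value that would go into tarEntry.setSize(...), as long as the same encoding is used when the bytes are written to the stream.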

4. Decompress file – tar.gz

This example shows how to decompress and extract a tar.gz file; it also guards against the zip slip vulnerability.

TarGzipExample4.java

package com.favtuts.io.howto.compress;

import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

import java.io.*;
import java.nio.file.*;

public class TarGzipExample4 {

    public static void main(String[] args) {

        try {

            // decompress .tar.gz
            Path source = Paths.get("/home/tvt/workspace/favtuts/output.tar.gz");
            Path target = Paths.get("/home/tvt/workspace/favtuts/tartest");
            decompressTarGzipFile(source, target);

        } catch (IOException e) {
            e.printStackTrace();
        }

        System.out.println("Done");

    }

    public static void decompressTarGzipFile(Path source, Path target)
        throws IOException {

        if (Files.notExists(source)) {
            throw new IOException("File doesn't exist!");
        }

        try (InputStream fi = Files.newInputStream(source);
             BufferedInputStream bi = new BufferedInputStream(fi);
             GzipCompressorInputStream gzi = new GzipCompressorInputStream(bi);
             TarArchiveInputStream ti = new TarArchiveInputStream(gzi)) {

            ArchiveEntry entry;
            while ((entry = ti.getNextEntry()) != null) {

                // create a new path, zip slip validate
                Path newPath = zipSlipProtect(entry, target);

                if (entry.isDirectory()) {
                    Files.createDirectories(newPath);
                } else {

                    // check parent folder again
                    Path parent = newPath.getParent();
                    if (parent != null) {
                        if (Files.notExists(parent)) {
                            Files.createDirectories(parent);
                        }
                    }

                    // copy TarArchiveInputStream to Path newPath
                    Files.copy(ti, newPath, StandardCopyOption.REPLACE_EXISTING);

                }
            }
        }
    }

    private static Path zipSlipProtect(ArchiveEntry entry, Path targetDir)
        throws IOException {

        Path targetDirResolved = targetDir.resolve(entry.getName());

        // make sure normalized file still has targetDir as its prefix,
        // else throws exception
        Path normalizePath = targetDirResolved.normalize();

        if (!normalizePath.startsWith(targetDir)) {
            throw new IOException("Bad entry: " + entry.getName());
        }

        return normalizePath;
    }
}
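
To see what the zipSlipProtect check accepts and rejects, the same resolve, normalize, and startsWith steps can be tried on plain entry names without any archive. This is a sketch under assumptions: the class and method names (ZipSlipDemo, safeResolve) are hypothetical, the entry name is passed as a String instead of an ArchiveEntry, and the target directory is also normalized before the comparison.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only
class ZipSlipDemo {

    // resolve the entry name against targetDir and reject it
    // if the normalized result ends up outside targetDir
    static Path safeResolve(Path targetDir, String entryName) throws IOException {
        Path base = targetDir.normalize();
        Path resolved = base.resolve(entryName).normalize();

        if (!resolved.startsWith(base)) {
            throw new IOException("Bad entry: " + entryName);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Path target = Paths.get("tartest");
        String[] names = {"folder/222.txt", "../evil.txt", "a/../../evil.txt"};

        for (String name : names) {
            try {
                // only the first name stays inside tartest
                System.out.println("ok       : " + safeResolve(target, name));
            } catch (IOException e) {
                // the two names containing ".." escape tartest and are rejected
                System.out.println("rejected : " + name);
            }
        }
    }
}
```

Path.startsWith compares whole path elements, so a sibling directory such as tartest2 does not count as being inside tartest.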

Further Reading

Please check the official Apache Commons Compress examples.

Download Source Code

$ git clone https://github.com/favtuts/java-core-tutorials-examples

$ cd java-io/howto/compress
