In Java, we have ZipOutputStream to create a zip file and GZIPOutputStream to compress a file using Gzip, but the JDK has no built-in API to create a tar.gz file.
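For context, here is a minimal sketch of what the JDK does offer on its own. GZIPOutputStream compresses a single stream of bytes and has no notion of several named files, which is the part that tar supplies. The class GzipOnlySketch and its helper methods are illustrative names, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: Gzip with the JDK only, no tar involved.
class GzipOnlySketch {

    // Compress one byte array into Gzip format, entirely in memory.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(raw);
        } catch (IOException e) {
            // not expected for in-memory streams
            throw new UncheckedIOException(e);
        }
        // the Gzip stream is closed here, so the trailer has been written
        return bytes.toByteArray();
    }

    // Decompress Gzip data back to the original bytes.
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes(); // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        String restored = new String(gunzip(packed), StandardCharsets.UTF_8);
        System.out.println(restored); // prints the original text again
    }
}
```

The output of gzip() carries the data only. File names, sizes, and directory structure have to come from an archive format such as tar.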
Instead, we can use Apache Commons Compress (still in active development) to create a .tar.gz file.
pom.xml
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.20</version>
</dependency>
Notes
- Tar collects files into one archive file, also known as a tarball, which generally has the suffix .tar
- Gzip compresses files to save space, and the result generally has the suffix .gz
- A .tar.gz or .tgz file groups all files into one archive file and then compresses it using Gzip.
The code snippet below creates a tar.gz file.
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

//...
try (OutputStream fOut = Files.newOutputStream(Paths.get("output.tar.gz"));
     BufferedOutputStream buffOut = new BufferedOutputStream(fOut);
     GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(buffOut);
     TarArchiveOutputStream tOut = new TarArchiveOutputStream(gzOut)) {

    TarArchiveEntry tarEntry = new TarArchiveEntry(file, fileName);

    tOut.putArchiveEntry(tarEntry);

    // copy file to TarArchiveOutputStream
    Files.copy(path, tOut);

    tOut.closeArchiveEntry();

    tOut.finish();
}
The code snippet below decompresses a .tar.gz file.
import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

//...
try (InputStream fi = Files.newInputStream(Paths.get("input.tar.gz"));
     BufferedInputStream bi = new BufferedInputStream(fi);
     GzipCompressorInputStream gzi = new GzipCompressorInputStream(bi);
     TarArchiveInputStream ti = new TarArchiveInputStream(gzi)) {

    ArchiveEntry entry;
    while ((entry = ti.getNextEntry()) != null) {

        // create a new path, remember to check for the zip slip attack
        Path newPath = filename(entry, targetDir);

        //checking

        // copy TarArchiveInputStream to newPath
        Files.copy(ti, newPath);
    }
}
1. Add two files to tar.gz
This example shows how to add two files to a tar.gz file.
TarGzipExample1.java
package com.favtuts.io.howto.compress;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// the public class name must match the file name TarGzipExample1.java
public class TarGzipExample1 {

    public static void main(String[] args) {

        try {

            Path path1 = Paths.get("/home/tvt/workspace/favtuts/sitemap.xml");
            Path path2 = Paths.get("/home/tvt/workspace/favtuts/file.txt");
            Path output = Paths.get("/home/tvt/workspace/favtuts/output.tar.gz");

            List<Path> paths = Arrays.asList(path1, path2);
            createTarGzipFiles(paths, output);

        } catch (IOException e) {
            e.printStackTrace();
        }

        System.out.println("Done");
    }

    // tar.gz a few files
    public static void createTarGzipFiles(List<Path> paths, Path output)
        throws IOException {

        try (OutputStream fOut = Files.newOutputStream(output);
             BufferedOutputStream buffOut = new BufferedOutputStream(fOut);
             GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(buffOut);
             TarArchiveOutputStream tOut = new TarArchiveOutputStream(gzOut)) {

            for (Path path : paths) {

                if (!Files.isRegularFile(path)) {
                    throw new IOException("Only regular files are supported: " + path);
                }

                // the entry name is the file name only, without parent directories
                TarArchiveEntry tarEntry = new TarArchiveEntry(
                        path.toFile(),
                        path.getFileName().toString());

                tOut.putArchiveEntry(tarEntry);

                // copy file to TarArchiveOutputStream
                Files.copy(path, tOut);

                tOut.closeArchiveEntry();
            }

            tOut.finish();
        }
    }
}
Output – It adds sitemap.xml and file.txt into one archive file output.tar, compresses it using Gzip, and the result is output.tar.gz
$ tar -tvf /home/tvt/workspace/favtuts/output.tar.gz
-rw-r--r-- 0/0 260295 2022-05-20 16:15 sitemap.xml
-rw-r--r-- 0/0 24 2022-05-06 17:14 file.txt
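One caveat of createTarGzipFiles: it uses only path.getFileName() as the entry name, so two input files with the same name from different directories would produce two entries with the same name. A rough sketch of a pre-check, using only the JDK, could look like this. EntryNameCheck and duplicateNames are hypothetical names chosen for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: find entry names that would collide in the archive.
class EntryNameCheck {

    // Returns the file names that occur more than once in the given paths.
    static Set<String> duplicateNames(List<Path> paths) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (Path path : paths) {
            String name = path.getFileName().toString();
            // add() returns false when the name was already present
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<Path> paths = List.of(
                Paths.get("reports/2021/summary.txt"),
                Paths.get("reports/2022/summary.txt"),
                Paths.get("notes.txt"));
        System.out.println(duplicateNames(paths)); // prints [summary.txt]
    }
}
```

If the returned set is not empty, one option is to reject the input. Another is to keep part of the parent path in the entry name.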
2. Add a directory to tar.gz
This example adds a directory, including its files and sub-directories, to one archive file and Gzip-compresses it into a .tar.gz file.
The idea is to use Files.walkFileTree to walk the file tree and add the files one by one to the TarArchiveOutputStream.
TarGzipExample2.java
package com.favtuts.io.howto.compress;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

import java.io.*;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

// the public class name must match the file name TarGzipExample2.java
public class TarGzipExample2 {

    public static void main(String[] args) {

        try {

            // tar.gz a folder
            Path source = Paths.get("/home/tvt/workspace/favtuts/test");
            createTarGzipFolder(source);

        } catch (IOException e) {
            e.printStackTrace();
        }

        System.out.println("Done");
    }

    // tar.gz a folder
    // generates the .tar.gz file in the current working directory
    public static void createTarGzipFolder(Path source) throws IOException {

        if (!Files.isDirectory(source)) {
            throw new IOException("Please provide a directory.");
        }

        // use the folder name as the tar.gz file name
        String tarFileName = source.getFileName().toString() + ".tar.gz";

        try (OutputStream fOut = Files.newOutputStream(Paths.get(tarFileName));
             BufferedOutputStream buffOut = new BufferedOutputStream(fOut);
             GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(buffOut);
             TarArchiveOutputStream tOut = new TarArchiveOutputStream(gzOut)) {

            Files.walkFileTree(source, new SimpleFileVisitor<>() {

                @Override
                public FileVisitResult visitFile(Path file,
                                            BasicFileAttributes attributes) {

                    // only copy files, no symbolic links
                    if (attributes.isSymbolicLink()) {
                        return FileVisitResult.CONTINUE;
                    }

                    // entry name: path of the file relative to the source folder
                    Path targetFile = source.relativize(file);

                    try {
                        TarArchiveEntry tarEntry = new TarArchiveEntry(
                                file.toFile(), targetFile.toString());

                        tOut.putArchiveEntry(tarEntry);

                        Files.copy(file, tOut);

                        tOut.closeArchiveEntry();

                        System.out.printf("file : %s%n", file);

                    } catch (IOException e) {
                        System.err.printf("Unable to tar.gz : %s%n%s%n", file, e);
                    }

                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFileFailed(Path file, IOException exc) {
                    System.err.printf("Unable to tar.gz : %s%n%s%n", file, exc);
                    return FileVisitResult.CONTINUE;
                }

            });

            tOut.finish();
        }
    }
}
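To see which entry names the walk produces without writing an archive, the relativize step from visitFile can be tried on its own. The following is a sketch using only the JDK. RelativeEntryNames and entryNames are made-up names, and replacing the platform separator with '/' is an assumption based on the usual tar convention of forward slashes in entry names.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: list the entry names a directory walk would produce.
class RelativeEntryNames {

    static List<String> entryNames(Path source) throws IOException {
        List<String> names = new ArrayList<>();
        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // path of the file relative to the folder being archived
                Path relative = source.relativize(file);
                // assumption: use '/' in entry names, whatever the platform separator is
                String separator = file.getFileSystem().getSeparator();
                names.add(relative.toString().replace(separator, "/"));
                return FileVisitResult.CONTINUE;
            }
        });
        Collections.sort(names); // the visiting order is not guaranteed
        return names;
    }

    public static void main(String[] args) throws IOException {
        // build a small throwaway tree: a.txt and sub/b.txt
        Path root = Files.createTempDirectory("tar-demo");
        Files.createDirectories(root.resolve("sub"));
        Files.write(root.resolve("a.txt"), new byte[] {1});
        Files.write(root.resolve("sub").resolve("b.txt"), new byte[] {2});

        System.out.println(entryNames(root)); // prints [a.txt, sub/b.txt]
    }
}
```

Because the names are relative to the source folder, extracting the archive recreates the folder's contents without the absolute path of the machine that created it.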
3. Add String to tar.gz
This example wraps a String in a ByteArrayInputStream and writes it into the TarArchiveOutputStream directly. In other words, it creates an archive entry from in-memory data, without first saving a file to the local disk.
public static void createTarGzipFilesOnDemand() throws IOException {

    String data1 = "Test data 1";
    String fileName1 = "111.txt";

    String data2 = "Test data 2 3 4";
    String fileName2 = "folder/222.txt";

    String outputTarGzip = "/home/tvt/workspace/favtuts/output.tar.gz";

    try (OutputStream fOut = Files.newOutputStream(Paths.get(outputTarGzip));
         BufferedOutputStream buffOut = new BufferedOutputStream(fOut);
         GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(buffOut);
         TarArchiveOutputStream tOut = new TarArchiveOutputStream(gzOut)) {

        createTarArchiveEntry(fileName1, data1, tOut);
        createTarArchiveEntry(fileName2, data2, tOut);

        tOut.finish();
    }
}

private static void createTarArchiveEntry(String fileName,
                                          String data,
                                          TarArchiveOutputStream tOut)
                                          throws IOException {

    byte[] dataInBytes = data.getBytes();

    // create a byte[] input stream
    ByteArrayInputStream baIn = new ByteArrayInputStream(dataInBytes);

    TarArchiveEntry tarEntry = new TarArchiveEntry(fileName);

    // the entry size must be set before writing, otherwise an error occurs
    tarEntry.setSize(dataInBytes.length);
    // tarEntry.setSize(baIn.available()); // alternative

    tOut.putArchiveEntry(tarEntry);

    // copy ByteArrayInputStream to TarArchiveOutputStream
    byte[] buffer = new byte[1024];
    int len;
    while ((len = baIn.read(buffer)) > 0) {
        tOut.write(buffer, 0, len);
    }

    tOut.closeArchiveEntry();
}
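The size passed to setSize is a byte count, not a character count, and the two differ as soon as the text contains non-ASCII characters. The sketch below illustrates this with the JDK only. EntrySizeSketch and sizeInBytes are hypothetical names, and encoding with UTF-8 explicitly is a suggestion here, since data.getBytes() without arguments relies on the default charset of the environment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the entry size must be counted in bytes, not characters.
class EntrySizeSketch {

    // Number of bytes the text occupies once encoded as UTF-8.
    static int sizeInBytes(String data) {
        return data.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) throws IOException {
        String data = "caf\u00e9"; // "café", written with an escape to avoid source-encoding issues

        System.out.println(data.length());     // prints 4 (characters)
        System.out.println(sizeInBytes(data)); // prints 5 (the accented letter takes two bytes)

        // InputStream.transferTo (Java 9+) can replace a manual read/write loop
        byte[] bytes = data.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            in.transferTo(sink);
        }
        System.out.println(sink.size());       // prints 5
    }
}
```

In createTarArchiveEntry the size comes from dataInBytes.length, which is already a byte count. The point of the sketch is that data.length() would not be a safe substitute.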
4. Decompress file – tar.gz
This example shows how to decompress and extract a tar.gz file, and it also guards against the zip slip vulnerability.
TarGzipExample4.java
package com.favtuts.io.howto.compress;

import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

import java.io.*;
import java.nio.file.*;

// the public class name must match the file name TarGzipExample4.java
public class TarGzipExample4 {

    public static void main(String[] args) {

        try {

            // decompress .tar.gz
            Path source = Paths.get("/home/tvt/workspace/favtuts/output.tar.gz");
            Path target = Paths.get("/home/tvt/workspace/favtuts/tartest");
            decompressTarGzipFile(source, target);

        } catch (IOException e) {
            e.printStackTrace();
        }

        System.out.println("Done");
    }

    public static void decompressTarGzipFile(Path source, Path target)
        throws IOException {

        if (Files.notExists(source)) {
            throw new IOException("File doesn't exist!");
        }

        try (InputStream fi = Files.newInputStream(source);
             BufferedInputStream bi = new BufferedInputStream(fi);
             GzipCompressorInputStream gzi = new GzipCompressorInputStream(bi);
             TarArchiveInputStream ti = new TarArchiveInputStream(gzi)) {

            ArchiveEntry entry;
            while ((entry = ti.getNextEntry()) != null) {

                // create a new path, with zip slip validation
                Path newPath = zipSlipProtect(entry, target);

                if (entry.isDirectory()) {
                    Files.createDirectories(newPath);
                } else {

                    // make sure the parent folder exists
                    Path parent = newPath.getParent();
                    if (parent != null) {
                        if (Files.notExists(parent)) {
                            Files.createDirectories(parent);
                        }
                    }

                    // copy TarArchiveInputStream to Path newPath
                    Files.copy(ti, newPath, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    private static Path zipSlipProtect(ArchiveEntry entry, Path targetDir)
        throws IOException {

        Path targetDirResolved = targetDir.resolve(entry.getName());

        // make sure the normalized path still has targetDir as its prefix,
        // else throw an exception
        Path normalizePath = targetDirResolved.normalize();

        if (!normalizePath.startsWith(targetDir)) {
            throw new IOException("Bad entry: " + entry.getName());
        }

        return normalizePath;
    }
}
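The resolve, normalize, and startsWith steps of zipSlipProtect can be tried in isolation with plain strings as entry names. The sketch below uses only the JDK. ZipSlipSketch and staysInside are illustrative names, and normalizing the target directory first is an added assumption, so that a target such as ./tartest still compares correctly.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo of the normalize-and-prefix check, JDK only.
class ZipSlipSketch {

    // True if the entry name resolves to a location inside targetDir.
    static boolean staysInside(Path targetDir, String entryName) {
        // make the base absolute and free of "." and ".." segments
        Path base = targetDir.toAbsolutePath().normalize();
        // resolve the entry against the base, then collapse any ".." segments
        Path resolved = base.resolve(entryName).normalize();
        // Path.startsWith compares whole name elements, not raw characters
        return resolved.startsWith(base);
    }

    public static void main(String[] args) {
        Path target = Paths.get("tartest");
        System.out.println(staysInside(target, "folder/222.txt"));     // prints true
        System.out.println(staysInside(target, "../../etc/passwd"));   // prints false
        System.out.println(staysInside(target, "a/../../escape.txt")); // prints false
    }
}
```

Comparing Path objects rather than strings matters here: a sibling directory such as tartest2 shares a string prefix with tartest, but Path.startsWith does not treat it as being inside tartest.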
Further Reading
Please check the official Apache Commons Compress examples.
Download Source Code
$ git clone https://github.com/favtuts/java-core-tutorials-examples
$ cd java-io/howto/compress