In Java, the InputStreamReader accepts a charset that it uses to decode a byte stream into a character stream. We can pass StandardCharsets.UTF_8 to the InputStreamReader constructor to read data from a UTF-8 file.

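import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;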

  //...
  try (FileInputStream fis = new FileInputStream(file);
       InputStreamReader isr = new InputStreamReader(fis, StandardCharsets.UTF_8);
       BufferedReader reader = new BufferedReader(isr)
  ) {

      String str;
      while ((str = reader.readLine()) != null) {
          System.out.println(str);
      }

  } catch (IOException e) {
      e.printStackTrace();
  }
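To see what the charset argument actually does, the sketch below decodes the same UTF-8 bytes twice through InputStreamReader: once with StandardCharsets.UTF_8 and once with the wrong charset, StandardCharsets.ISO_8859_1. The class name DecodeDemo and the sample text are made up for illustration. It reads from an in-memory ByteArrayInputStream instead of a file so it runs without any setup.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {

    // Decodes the bytes with the given charset and returns the first line.
    static String firstLine(byte[] bytes, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // U+4F60 U+597D, written as Unicode escapes so the source encoding does not matter.
        // In UTF-8 each of these characters takes 3 bytes, so 6 bytes in total.
        byte[] utf8Bytes = "\u4f60\u597d".getBytes(StandardCharsets.UTF_8);

        String right = firstLine(utf8Bytes, StandardCharsets.UTF_8);
        String wrong = firstLine(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(right.length()); // 2 characters, the original text
        System.out.println(wrong.length()); // 6 characters, one per byte (mojibake)
    }
}
```

With ISO-8859-1, every byte is decoded as its own character, so two Chinese characters turn into six meaningless ones. No exception is thrown, which is why a wrong charset often goes unnoticed.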

Since Java 7, many file-reading APIs accept a charset argument, which makes reading a UTF-8 file very easy.

  // Java 7
  BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);

  // Java 7
  List<String> list = Files.readAllLines(path, StandardCharsets.UTF_8);

  // Java 8
  Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8);

  // Java 11
  String s = Files.readString(path, StandardCharsets.UTF_8);
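One difference between these Files methods and a plain InputStreamReader is worth knowing. InputStreamReader(InputStream, Charset) quietly replaces bytes that are not valid UTF-8 with the replacement character U+FFFD, whereas Files.readAllLines reports them by throwing a MalformedInputException. The minimal sketch below, using the hypothetical class name MalformedDemo, checks both behaviors with the byte 0xFF, which never appears in valid UTF-8.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MalformedDemo {

    // 'a', then 0xFF (never valid in UTF-8), then 'b'
    static final byte[] BAD_BYTES = {'a', (byte) 0xFF, 'b'};

    // InputStreamReader substitutes U+FFFD for malformed input instead of failing.
    static String readWithInputStreamReader() throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(BAD_BYTES), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    // Files.readAllLines throws MalformedInputException on the same bytes.
    static boolean filesThrows() throws IOException {
        Path path = Files.createTempFile("bad-utf8", ".txt");
        try {
            Files.write(path, BAD_BYTES);
            Files.readAllLines(path, StandardCharsets.UTF_8);
            return false;
        } catch (MalformedInputException e) {
            return true;
        } finally {
            Files.deleteIfExists(path);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readWithInputStreamReader().equals("a\uFFFDb")); // true
        System.out.println(filesThrows());                                  // true
    }
}
```

Which behavior is preferable depends on the job: failing fast is usually safer for data files, while replacement is more forgiving for logs or user-supplied text.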

1. UTF-8 File

A UTF-8 encoded file, /home/tvt/workspace/favtuts/unicode.txt, containing some Chinese characters.
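To reproduce the example, a file like this can be created with Files.write and an explicit charset. This is a minimal sketch: the class name CreateUnicodeFile and writing unicode.txt to the current working directory (rather than the path above) are assumptions. The four lines match the output shown at the end of this article.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class CreateUnicodeFile {

    // The last line is written with Unicode escapes so the source file encoding does not matter.
    static final List<String> CONTENT =
            Arrays.asList("line 1", "line 2", "line 3", "\u4f60\u597d,\u4e16\u754c");

    // Writes CONTENT to the given file, encoding every line as UTF-8.
    static Path create(String fileName) throws IOException {
        Path path = Paths.get(fileName);
        return Files.write(path, CONTENT, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path path = create("unicode.txt");
        System.out.println("Wrote " + Files.size(path) + " bytes to " + path.toAbsolutePath());
    }
}
```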

2. Read a UTF-8 file

This example shows a few ways to read a UTF-8 file.

UnicodeRead.java

package com.favtuts.io.howto;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Stream;

public class UnicodeRead {

    public static void main(String[] args) {
        
        String fileName = "/home/tvt/workspace/favtuts/unicode.txt";
        //readUnicodeClassic(fileName);
        //readUnicodeFiles(fileName);
        //readUnicodeJava11(fileName);
        readUnicodeBufferedReader(fileName);

    }


    // Java 7 - Files.newBufferedReader(path, StandardCharsets.UTF_8)
    // Java 8 - Files.newBufferedReader(path) // default UTF-8
    public static void readUnicodeBufferedReader(String fileName) {

        Path path = Paths.get(fileName);

        // Java 8, default UTF-8
        try (BufferedReader reader = Files.newBufferedReader(path)) {

            String str;
            while ((str = reader.readLine()) != null) {
                System.out.println(str);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    // Java 11, adds charset to FileReader
    public static void readUnicodeJava11(String fileName) {

        try (FileReader fr = new FileReader(fileName, StandardCharsets.UTF_8);
             BufferedReader reader = new BufferedReader(fr)) {

            String str;
            while ((str = reader.readLine()) != null) {
                System.out.println(str);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }


    public static void readUnicodeFiles(String fileName) {

        Path path = Paths.get(fileName);

        try {
            
            // Java 11
            String s = Files.readString(path, StandardCharsets.UTF_8);
            System.out.println(s);

            // Java 7
            List<String> list = Files.readAllLines(path, StandardCharsets.UTF_8);
            list.forEach(System.out::println);

            // Java 8, try-with-resources closes the stream and its file handle
            try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
                lines.forEach(System.out::println);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    
    public static void readUnicodeClassic(String fileName) {

        File file = new File(fileName);

        try (FileInputStream fis = new FileInputStream(file);
            InputStreamReader isr = new InputStreamReader(fis, StandardCharsets.UTF_8);
            BufferedReader reader = new BufferedReader(isr)
        ) {
            
            String str;
            while((str = reader.readLine()) != null) {
                System.out.println(str);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Output

line 1
line 2
line 3
你好,世界
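The last line of the output also shows why the charset matters. In UTF-8, each of the four Chinese characters takes three bytes and the ASCII comma takes one, so this 5-character string occupies 13 bytes in the file. A quick check, with the illustrative class name ByteCount:

```java
import java.nio.charset.StandardCharsets;

public class ByteCount {

    // 你好,世界 written with Unicode escapes so the source encoding does not matter
    static final String TEXT = "\u4f60\u597d,\u4e16\u754c";

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(TEXT.length());     // 5 characters (UTF-16 code units)
        System.out.println(utf8Length(TEXT));  // 13 bytes: 4 x 3 + 1

        // Decoding with the same charset restores the original text.
        byte[] bytes = TEXT.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(bytes, StandardCharsets.UTF_8).equals(TEXT)); // true
    }
}
```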

Further Reading
How to write to a UTF-8 file in Java

Download Source Code

$ git clone https://github.com/favtuts/java-core-tutorials-examples

$ cd java-io/howto
