This article shows a few Java examples to get the total number of lines in a file. The steps are similar:

  1. Open the file.
  2. Read line by line, and increment the count by 1 for each line.
  3. Close the file.
  4. Read the count.

Java methods tested:

  1. Files.lines
  2. BufferedReader
  3. LineNumberReader
  4. BufferedInputStream

At the end of the article, we also compare the performance of these approaches on a large file that contains 5 million lines with 1053 characters per line.

1. Files.lines (Java 8)

Files.lines is the most straightforward implementation.

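  import java.io.IOException;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.nio.file.Paths;
  import java.util.stream.Stream;

  public static long countLineJava8(String fileName) {

      Path path = Paths.get(fileName);

      long lines = 0;

      // Files.lines keeps the file open until the stream is closed,
      // so wrap it in try-with-resources
      try (Stream<String> stream = Files.lines(path)) {

          // parallel() is much slower here; this task is better with sequential access
          //lines = stream.parallel().count();

          lines = stream.count();

      } catch (IOException e) {
          e.printStackTrace();
      }

      return lines;

  }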
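Files.lines(Path) decodes the file as UTF-8, so a file in another encoding may fail while the stream is being consumed. The following is a minimal sketch of passing the charset explicitly, assuming an ISO-8859-1 file; the class CharsetLineCount and both of its methods are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper class; the name and methods are not from the article.
final class CharsetLineCount {

    // Counts lines while decoding the file with the given charset.
    static long countLines(Path path, Charset charset) {
        // try-with-resources closes the file handle behind the stream
        try (Stream<String> stream = Files.lines(path, charset)) {
            return stream.count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the bytes to a temporary file, counts its lines, then deletes the file.
    static long countLinesOfBytes(byte[] content, Charset charset) {
        try {
            Path tmp = Files.createTempFile("line-count", ".txt");
            try {
                Files.write(tmp, content);
                return countLines(tmp, charset);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 0xE9 is 'é' in ISO-8859-1 but is not a valid UTF-8 sequence on its own
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9, '\n', 'o', 'k', '\n'};
        System.out.println(countLinesOfBytes(latin1, StandardCharsets.ISO_8859_1)); // prints 2
    }
}
```

ISO-8859-1 maps every byte to a character, so it is a safe choice when only the number of lines matters and the real encoding is unknown.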

2. BufferedReader

This example uses BufferedReader to read line by line and increases the count.

  public static long countLineBufferedReader(String fileName) {

      long lines = 0;
      try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
          while (reader.readLine() != null) lines++;
      } catch (IOException e) {
          e.printStackTrace();
      }
      return lines;

  }

3. LineNumberReader

LineNumberReader is similar to the BufferedReader example above.

  public static long countLineNumberReader(String fileName) {

      File file = new File(fileName);

      long lines = 0;

      try (LineNumberReader lnr = new LineNumberReader(new FileReader(file))) {

          while (lnr.readLine() != null) ;

          lines = lnr.getLineNumber();

      } catch (IOException e) {
          e.printStackTrace();
      }

      return lines;

  }

4. BufferedInputStream

This BufferedInputStream example is copied from a StackOverflow answer.

  public static long countLineFast(String fileName) {

      long lines = 0;

      try (InputStream is = new BufferedInputStream(new FileInputStream(fileName))) {
          byte[] c = new byte[1024];
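          long count = 0; // long, so more than Integer.MAX_VALUE lines cannot overflow
          int readChars = 0; // number of bytes returned by the last read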
          boolean endsWithoutNewLine = false;
          while ((readChars = is.read(c)) != -1) {
              for (int i = 0; i < readChars; ++i) {
                  if (c[i] == '\n')
                      ++count;
              }
              endsWithoutNewLine = (c[readChars - 1] != '\n');
          }
          if (endsWithoutNewLine) {
              ++count;
          }
          lines = count;
      } catch (IOException e) {
          e.printStackTrace();
      }

      return lines;
  }

5. Benchmark

5.1 Create a large file containing 5 million lines of 1053 characters each, for a file size of about 5 GB.

  public static void writeLargeFile() {

      String fileName = "/home/favtuts/large-file.txt";

      // 1053 chars per line
      String content = "Hello 123456 ";
      content = content + content + content;
      content = content + content + content;
      content = content + content + content;
      content = content + content + content;

      System.out.println(content.length()); // 1053

      try (BufferedWriter bw = new BufferedWriter(new FileWriter(fileName))) {

          for (int i = 0; i < 5_000_000; i++) {
              bw.write(content);
              bw.write(System.lineSeparator());
          }

      } catch (IOException e) {
          e.printStackTrace();
      }

  }
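As a quick check of the numbers in 5.1, the sketch below rebuilds the line and estimates the file size. It assumes ASCII content and a one-byte line separator (as on Linux); the class LargeFileSize and its methods are hypothetical helpers, not part of the original code.

```java
// Hypothetical helper, only to check the numbers quoted in this section.
final class LargeFileSize {

    // Builds the same line as writeLargeFile(): a 13-character seed tripled four times.
    static String buildLine() {
        String content = "Hello 123456 ";
        for (int i = 0; i < 4; i++) {
            content = content + content + content;
        }
        return content;
    }

    // Expected size in bytes, assuming one byte per character.
    static long expectedBytes(long lines, int lineLength, int separatorLength) {
        return lines * (lineLength + separatorLength);
    }

    public static void main(String[] args) {
        int length = buildLine().length();
        System.out.println(length);                               // 1053 (13 * 3^4)
        System.out.println(expectedBytes(5_000_000L, length, 1)); // 5270000000
    }
}
```

Under these assumptions the file is about 5.27 GB (roughly 4.9 GiB), which matches the rounded figure of 5G.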

5.2 Run each method 5 to 10 times and take the average. Here are the results:

  1. Files.lines – 6-8 seconds.
  2. BufferedReader – 6-8 seconds.
  3. LineNumberReader – 6-8 seconds.
  4. BufferedInputStream – 4-5 seconds.
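Averages like these could be collected with a small timing loop such as the sketch below. LineCountTimer and averageMillis are hypothetical names, and this is not the harness that produced the numbers above; it measures wall-clock time only and ignores JIT warm-up and file-system caching, so a dedicated benchmark tool would give more reliable figures.

```java
import java.util.function.LongSupplier;

// Hypothetical timing helper; not the harness used for the numbers in this article.
final class LineCountTimer {

    // Runs the task `runs` times and returns the mean wall-clock time in milliseconds.
    static double averageMillis(LongSupplier task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.getAsLong(); // the returned line count is ignored here
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        // Stand-in workload; replace it with e.g. () -> countLineJava8(fileName)
        LongSupplier task = () -> "a\nb\nc\n".chars().filter(c -> c == '\n').count();
        System.out.printf("average: %.3f ms%n", averageMillis(task, 10));
    }
}
```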

The BufferedInputStream approach (from the StackOverflow answer) is the fastest way to count the number of lines in a large file (5G file size and 5 million lines). Still, the difference is not that significant, and the implementation is more complicated and error-prone. If we test with a smaller file, such as 1G file size and 1 million lines, we hardly notice the difference.
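To illustrate the error-prone part, here is a small sketch that applies the reader-based logic and the byte-scanning logic to in-memory strings instead of files. LineCountEdgeCases and its methods are hypothetical names, and the samples are chosen only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical class comparing the two counting strategies on small inputs.
final class LineCountEdgeCases {

    // Same idea as countLineBufferedReader, but reading from a String.
    static long countWithReader(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            long lines = 0;
            while (reader.readLine() != null) lines++;
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same idea as countLineFast: count '\n' bytes, plus one for an unterminated last line.
    static long countWithByteScan(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        long count = 0;
        for (byte b : bytes) {
            if (b == '\n') count++;
        }
        if (bytes.length > 0 && bytes[bytes.length - 1] != '\n') count++;
        return count;
    }

    public static void main(String[] args) {
        String[] samples = {"", "a", "a\n", "a\nb", "a\r\nb\r\n", "a\rb"};
        for (String s : samples) {
            System.out.printf("%-12s reader=%d byteScan=%d%n",
                    s.replace("\r", "\\r").replace("\n", "\\n"),
                    countWithReader(s), countWithByteScan(s));
        }
    }
}
```

The two strategies agree on every sample except "a\rb": readLine treats a lone carriage return as a line terminator and reports 2 lines, while the byte scan only looks for '\n' and reports 1.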

In conclusion, the Java NIO Files.lines is simple to use and its performance is not much different, so it is the best choice for counting the number of lines in a file.

6. wc -l

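On Linux, the command wc -l is the fastest way to count the number of lines in a file. Below, it is run twice; the second run is faster, most likely because the file was already in the operating system's page cache.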

$ time wc -l large-file.txt
5000000 large-file.txt

real	0m2.344s
user	0m0.113s
sys	0m1.306s

$ time wc -l large-file.txt
5000000 large-file.txt

real	0m0.630s
user	0m0.092s
sys	0m0.537s

Any input or ideas on the algorithm behind the wc -l command?
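One known part of the answer is that wc -l reports the number of newline characters, without decoding text or building line strings. The sketch below is a rough Java analogue of that idea, not a port of the wc source; the class NewlineCounter, its methods, and the 64 KB buffer size are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical class; a rough analogue of counting newlines, not wc's actual source.
final class NewlineCounter {

    // Counts the '\n' bytes read from the channel.
    static long countNewlines(ReadableByteChannel channel) {
        ByteBuffer buffer = ByteBuffer.allocate(64 * 1024); // buffer size is an arbitrary choice
        long count = 0;
        try {
            while (channel.read(buffer) != -1) {
                buffer.flip(); // switch from filling the buffer to draining it
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') count++;
                }
                buffer.clear(); // make room for the next read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    // Opens the file read-only and counts its newline bytes.
    static long countNewlines(Path path) {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            return countNewlines(channel);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countNewlines(Paths.get(args[0])));
    }
}
```

Because only newline bytes are counted, a file whose last line has no trailing newline is reported with one line fewer than the readLine-based methods return.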

Download Source Code

$ git clone https://github.com/favtuts/java-core-tutorials-examples

$ cd java-core-tutorials-examples/java-io/howto
