Count bytes while writing to a file using CountingOutputStream from commons io package

Published by Alexander Braun on 03 May 2019 - tagged with Java

Sometimes it is useful to know how many bytes have already been written to an output stream in Java. In this post we will use CountingOutputStream to achieve this goal easily.

Motivation

Sometimes we need to know how many bytes have already been written to a file while processing data. A typical example is a data transfer project where the receiver of the data has a limit on the maximum file size it can process. In that case we may have to start a new output file once the current one reaches the limit. For this we can use CountingOutputStream from the org.apache.commons.io.output package of Apache Commons IO.

It is usually not sufficient to just manually count the number of characters of a String we write to an output stream. There are some considerations to be taken into account, e.g.:

  • New line characters, e.g. \n or \r\n
  • The Charset we use for writing, e.g. UTF-8 or UTF-16
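
To make these two points concrete, here is a minimal sketch using only the JDK. The class name CharsVersusBytes and the sample text "Grüße" are made up for illustration and are not part of the project. It compares the character count of a String with the number of bytes it occupies once encoded.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original project.
class CharsVersusBytes {

    /** Number of bytes the text occupies once encoded with the given charset. */
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Non-ASCII letters are written as Unicode escapes so the
        // encoding of the source file does not matter.
        String unixLine = "Gr\u00fc\u00dfe\n";
        String windowsLine = "Gr\u00fc\u00dfe\r\n";

        System.out.println(unixLine.length());                                  // 6 characters
        System.out.println(encodedLength(unixLine, StandardCharsets.UTF_8));    // 8 bytes
        System.out.println(encodedLength(unixLine, StandardCharsets.UTF_16));   // 14 bytes
        System.out.println(encodedLength(windowsLine, StandardCharsets.UTF_8)); // 9 bytes
    }
}
```

The same six characters take 8 bytes in UTF-8 (the two non-ASCII letters need two bytes each) and 14 bytes in UTF-16 (two bytes per character plus a two-byte byte order mark), and switching the line separator to \r\n adds one more byte.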

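The CountingOutputStream class handles these cases out of the box because it counts the bytes that actually pass through the stream, after the line separators have been appended and the Charset has been applied.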

You can find the complete code for this project on GitHub.

Implementing the example

We will go through two examples, each writing a file with a different Charset.

Initializing CountingOutputStream

We first have to initialize the CountingOutputStream and the corresponding OutputStreamWriter. As you see in the code below, we can do this by first creating a FileOutputStream and then the CountingOutputStream. I usually prefer to be able to set the Charset to be used for writing the file. Thus, we are wrapping the CountingOutputStream with OutputStreamWriter which allows us to set the Charset.

Additionally we store the reference to CountingOutputStream to be able to get the number of bytes written.

public class CoutingOutputStreamDemo {

    private CountingOutputStream countingOutputStream;
    private OutputStreamWriter outputStreamWriter;

    /**
     * Constructor
     * @param file the path to the file to write
     * @param charset the {@link Charset} to use for writing
     * @throws FileNotFoundException if the file cannot be opened for writing
     */
    public CoutingOutputStreamDemo(String file, Charset charset) throws FileNotFoundException {
        // We store a reference to the CountingOutputStream to be able to count the bytes later
        this.countingOutputStream = new CountingOutputStream(new FileOutputStream(file));
        this.outputStreamWriter = new OutputStreamWriter(this.countingOutputStream, charset);
    }

    // More code follows

}
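
If you wonder what the counting wrapper does conceptually, the following is a simplified stand-in written against the JDK only. It is not the Commons IO source, and the class name SimpleCountingStream is made up for this sketch. It only illustrates the decorator idea: pass every byte on to the wrapped stream and increment a counter along the way.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical, simplified stand-in for a counting stream (not the Commons IO class).
class SimpleCountingStream extends FilterOutputStream {

    private long count;

    SimpleCountingStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Write the whole range to the wrapped stream and count it once.
        out.write(b, off, len);
        count += len;
    }

    /** Number of bytes passed through this stream so far. */
    long getByteCount() {
        return count;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (SimpleCountingStream counting = new SimpleCountingStream(sink)) {
            counting.write(new byte[] {1, 2, 3}); // delegates to write(b, 0, 3)
            counting.write(4);
            System.out.println(counting.getByteCount()); // 4
        }
    }
}
```

Because the counter sits below the OutputStreamWriter, it only ever sees encoded bytes, which is exactly what we want to measure.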

Write a UTF-8 file

To create and write to a file, we have to add some helper methods.

  • writeLine() concatenates the content to be written with the lineSeparator and writes it to the OutputStreamWriter
  • getBytesWritten() flushes the OutputStreamWriter and returns the number of bytes written so far using CountingOutputStream.getByteCount()
  • close() simply flushes and closes the OutputStreamWriter
    /**
     * Write a line to the output stream
     * @param content the content to write
     * @throws IOException
     */
    public void writeLine(String content) throws IOException {
        outputStreamWriter.write(content + System.lineSeparator());
    }

    /**
     * Determine the number of bytes written so far
     * @return the number of bytes written so far
     * @throws IOException
     */
    public long getBytesWritten() throws IOException {
        // We first have to flush the buffer to ensure all bytes were actually written
        this.outputStreamWriter.flush();
        return countingOutputStream.getByteCount();
    }
    
    /**
     * Flushes and closes the output stream
     * @throws IOException
     */
    public void close() throws IOException {
        this.outputStreamWriter.flush();
        this.outputStreamWriter.close();
    }
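
The flush in getBytesWritten() matters because OutputStreamWriter accumulates encoded bytes in an internal buffer before handing them to the underlying stream. The following JDK-only sketch shows the effect with a ByteArrayOutputStream as the target. The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original project.
class FlushBeforeCounting {

    /** Returns {bytes that reached the target before flush(), bytes after flush()}. */
    static int[] sizesAroundFlush(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStreamWriter writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(text);
            int before = sink.size(); // bytes may still sit in the writer's buffer
            writer.flush();
            int after = sink.size();  // now everything has reached the target
            return new int[] {before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesAroundFlush("A simple test\n");
        System.out.println("before flush: " + sizes[0]);
        System.out.println("after flush:  " + sizes[1]); // 14
    }
}
```

A counter placed where the ByteArrayOutputStream sits here would report too few bytes for as long as the writer has not been flushed.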

Now let's write a few lines to a file using UTF-8 Charset. I have written the following unit test for this purpose. The class is available here.

    @Test
    public void testWriterUtf8() throws IOException {
        CoutingOutputStreamDemo cos = new CoutingOutputStreamDemo(OUTPUT_FILE_UTF_8, CHARSET_UTF_8);
        cos.writeLine("A simple test");
        cos.writeLine("And another line");
        long bytesWritten = cos.getBytesWritten();
        cos.close();
        long actualFileSize = new File(OUTPUT_FILE_UTF_8).length();
        System.out.println(String.format("Actual file size UTF-8: %s bytes", actualFileSize));
        assertEquals(actualFileSize, bytesWritten);
    }

This is a straightforward test, but let's quickly go through it:

  • We create an instance of our demo class, which internally initializes CountingOutputStream and OutputStreamWriter
  • Write two lines of text
  • Get the number of bytes written from CountingOutputStream
  • Close the OutputStreamWriter
  • Then we get the actual file size using File.length()
  • Finally, we check if the actual file size matches what the CountingOutputStream has counted

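When writing a UTF-8 file the number of bytes written is 31: 13 and 16 bytes for the two lines of text, plus one byte for each of the two line separators (a single \n on the system used here).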

Write a UTF-16 file

As a second test case we write a UTF-16 file using the same content.

    @Test
    public void testWriterUtf16() throws IOException {
        CoutingOutputStreamDemo cos = new CoutingOutputStreamDemo(OUTPUT_FILE_UTF_16, CHARSET_UTF_16);
        cos.writeLine("A simple test");
        cos.writeLine("And another line");
        long bytesWritten = cos.getBytesWritten();
        cos.close();
        long actualFileSize = new File(OUTPUT_FILE_UTF_16).length();
        System.out.println(String.format("Actual file size UTF-16: %s bytes", actualFileSize));
        assertEquals(actualFileSize, bytesWritten);
    }

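In this case CountingOutputStream counts 64 bytes written, slightly more than twice the size of our UTF-8 example. This matches our expectation: each of the 31 characters in our content takes two bytes in UTF-16 (62 bytes), and Java's UTF-16 encoder additionally writes a two-byte byte order mark at the start of the output. Together with the assertion against the actual file size, this shows that CountingOutputStream does in fact count the bytes correctly.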
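
The two numbers can also be double-checked without writing any file. The sketch below assumes \n as the line separator, which is what the counts above imply, and uses a made-up class name. It encodes the same content in memory with the JDK's standard charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original project.
class ExpectedSizes {

    // Same two lines as in the tests, assuming "\n" as line separator.
    static final String CONTENT = "A simple test\nAnd another line\n";

    /** Number of bytes CONTENT occupies in the given charset. */
    static int size(Charset charset) {
        return CONTENT.getBytes(charset).length;
    }

    public static void main(String[] args) {
        System.out.println(size(StandardCharsets.UTF_8));    // 31
        System.out.println(size(StandardCharsets.UTF_16));   // 64 = 2 (byte order mark) + 2 * 31
        System.out.println(size(StandardCharsets.UTF_16BE)); // 62, no byte order mark
    }
}
```

On a system where System.lineSeparator() returns \r\n, both files would be correspondingly larger, which is one more reason to let the stream do the counting.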

That's all for today, I hope you found this post useful.