Count bytes while writing to a file using CountingOutputStream from Apache Commons IO
Sometimes it is useful to know how many bytes have already been written to an output stream in Java. In this post we will use CountingOutputStream to achieve this goal easily.
Motivation
While processing data, we may need to know how many bytes have already been written to a file. A typical example is a data transfer project where the receiver of the data limits the maximum file size it can process. It might be necessary to start a new output file once the current one reaches that limit. In these cases we can use CountingOutputStream from the Apache Commons IO library (package org.apache.commons.io.output).
It is usually not sufficient to just manually count the number of characters of a String we write to an output stream. There are some considerations to be taken into account, e.g.:
- New line characters, e.g. \n or \r\n
- The Charset we use for writing, e.g. UTF-8 or UTF-16
The CountingOutputStream class provides a ready-to-go solution that handles all these situations.
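To make the two considerations above concrete, here is a small JDK-only sketch that compares the character count of a String with its encoded size. The class name EncodedLengthDemo and the sample text are made up for illustration and are not part of the original project.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengthDemo {

    /** Returns the number of bytes the text occupies once encoded with the given charset. */
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // "Grüße" written with Unicode escapes so the source file encoding does not matter
        String unixLine = "Gr\u00fc\u00dfe\n";      // Unix line ending
        String windowsLine = "Gr\u00fc\u00dfe\r\n"; // Windows line ending

        System.out.println(unixLine.length());                                     // 6 (characters)
        System.out.println(encodedLength(unixLine, StandardCharsets.UTF_8));       // 8 (ü and ß take 2 bytes each)
        System.out.println(encodedLength(windowsLine, StandardCharsets.UTF_8));    // 9 (one extra byte for \r)
        System.out.println(encodedLength(unixLine, StandardCharsets.UTF_16LE));    // 12 (2 bytes per character)
    }
}
```

The same text yields four different numbers depending on what we count, which is why counting at the stream level is more reliable than counting characters.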
You can find the complete code for this project on GitHub.
Implementing the example
We will go through two examples, each writing a file with a different Charset.
Initializing CountingOutputStream
We first have to initialize the CountingOutputStream and the corresponding OutputStreamWriter. As you see in the code below, we can do this by first creating a FileOutputStream and then the CountingOutputStream. I usually prefer to be able to set the Charset to be used for writing the file. Thus, we are wrapping the CountingOutputStream with an OutputStreamWriter, which allows us to set the Charset.
Additionally, we store the reference to the CountingOutputStream to be able to get the number of bytes written.
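import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

import org.apache.commons.io.output.CountingOutputStream;

public class CoutingOutputStreamDemo {

    private CountingOutputStream countingOutputStream;
    private OutputStreamWriter outputStreamWriter;

    /**
     * Constructor
     * @param file the path to the file to write
     * @param charset the {@link Charset} to use for writing
     * @throws FileNotFoundException if the file cannot be opened for writing
     */
    public CoutingOutputStreamDemo(String file, Charset charset) throws FileNotFoundException {
        // We store a reference to the CountingOutputStream to be able to count the bytes later
        this.countingOutputStream = new CountingOutputStream(new FileOutputStream(file));
        // The writer encodes characters with the given charset and passes the bytes on to the counting stream
        this.outputStreamWriter = new OutputStreamWriter(this.countingOutputStream, charset);
    }

    // More code follows
}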
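To illustrate the idea behind wrapping a stream in order to count its bytes, here is a JDK-only sketch. SimpleCountingStream and countUtf8Bytes are hypothetical names used for illustration; this is not the Commons IO implementation, only a minimal stand-in that shows the same decorator pattern.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class SimpleCountingStream extends FilterOutputStream {

    private long count;

    SimpleCountingStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++; // a single byte passed through
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        count += len; // a chunk of len bytes passed through
    }

    long getCount() {
        return count;
    }

    /** Encodes the text as UTF-8 through the counting stream and returns the byte count. */
    static long countUtf8Bytes(String text) {
        SimpleCountingStream counting = new SimpleCountingStream(new ByteArrayOutputStream());
        try (Writer writer = new OutputStreamWriter(counting, StandardCharsets.UTF_8)) {
            writer.write(text);
            writer.flush(); // push buffered bytes down to the counting stream
            return counting.getCount();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countUtf8Bytes("A simple test\n")); // 14
    }
}
```

Because the counting happens below the writer, the number reflects the encoded bytes regardless of the charset or line separator used above it.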
Write a UTF-8 file
To create and write to a file, we have to add some helper methods:
- writeLine() concatenates the content to be written with the lineSeparator and writes it to the OutputStreamWriter
- getBytesWritten() flushes the OutputStreamWriter and returns the number of bytes written so far using CountingOutputStream.getByteCount()
- close() simply flushes and closes the OutputStreamWriter
/**
 * Write a line to the output stream
 * @param content the content to write
 * @throws IOException if writing fails
 */
public void writeLine(String content) throws IOException {
    outputStreamWriter.write(content + System.lineSeparator());
}

/**
 * Determine the number of bytes written so far
 * @return the number of bytes written so far
 * @throws IOException if flushing fails
 */
public long getBytesWritten() throws IOException {
    // We first have to flush the buffer to ensure all bytes were actually written
    this.outputStreamWriter.flush();
    return countingOutputStream.getByteCount();
}

/**
 * Flushes and closes the output stream writer
 * @throws IOException if flushing or closing fails
 */
public void close() throws IOException {
    this.outputStreamWriter.flush();
    this.outputStreamWriter.close();
}
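The flush in getBytesWritten() matters because OutputStreamWriter buffers the encoded bytes before handing them to the underlying stream. The following JDK-only sketch shows the effect with a ByteArrayOutputStream as the target; FlushBeforeCountingDemo and sizesAroundFlush are names invented for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class FlushBeforeCountingDemo {

    /** Returns {bytes visible in the target before flush, bytes visible after flush}. */
    static int[] sizesAroundFlush(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(text);
            // For short texts this is typically still 0: the bytes sit in the writer's internal buffer
            int before = sink.size();
            writer.flush();
            // After flushing, all encoded bytes have reached the target stream
            int after = sink.size();
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sizesAroundFlush("A simple test\n");
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```

A counter that sits below the writer can only see what the writer has already passed on, so reading the count without flushing may report too few bytes.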
Now let's write a few lines to a file using the UTF-8 Charset. I have written the following unit test for this purpose. The class is available here.
@Test
public void testWriterUtf8() throws IOException {
    CoutingOutputStreamDemo cos = new CoutingOutputStreamDemo(OUTPUT_FILE_UTF_8, CHARSET_UTF_8);
    cos.writeLine("A simple test");
    cos.writeLine("And another line");
    // Ask the CountingOutputStream for its count before closing
    long bytesWritten = cos.getBytesWritten();
    cos.close();
    // Compare the count with the size of the file on disk
    long actualFileSize = new File(OUTPUT_FILE_UTF_8).length();
    System.out.println(String.format("Actual file size UTF-8: %s bytes", actualFileSize));
    assertEquals(actualFileSize, bytesWritten);
}
This is a straightforward test, but let's quickly go through it:
- We get an instance of CoutingOutputStreamDemo that internally initializes CountingOutputStream and OutputStreamWriter
- Write two lines of text
- Get the number of bytes written from CountingOutputStream
- Close the OutputStreamWriter
- Then we get the actual file size using File.length()
- Finally, we check if the actual file size matches what the CountingOutputStream has counted
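When writing a UTF-8 file, the number of bytes written is 31: 13 and 16 characters of text plus one line separator byte per line. This assumes System.lineSeparator() returns \n; on a system that uses \r\n the result would be 33.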
Write a UTF-16 file
As a second test case we write a UTF-16 file using the same content.
@Test
public void testWriterUtf16() throws IOException {
    CoutingOutputStreamDemo cos = new CoutingOutputStreamDemo(OUTPUT_FILE_UTF_16, CHARSET_UTF_16);
    cos.writeLine("A simple test");
    cos.writeLine("And another line");
    // Ask the CountingOutputStream for its count before closing
    long bytesWritten = cos.getBytesWritten();
    cos.close();
    // Compare the count with the size of the file on disk
    long actualFileSize = new File(OUTPUT_FILE_UTF_16).length();
    System.out.println(String.format("Actual file size UTF-16: %s bytes", actualFileSize));
    assertEquals(actualFileSize, bytesWritten);
}
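In this case CountingOutputStream counts 64 bytes written. This value is slightly more than twice the size of our UTF-8 example, which matches our expectation: every character in this text takes one byte in UTF-8 and two bytes in UTF-16 (31 × 2 = 62), and Java's UTF-16 charset additionally writes a two-byte byte order mark at the start of the output. Together with the comparison against File.length() in the test, we can conclude that CountingOutputStream does in fact count the bytes correctly.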
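The two numbers can be cross-checked without writing any file. The following JDK-only sketch encodes the same two lines in memory, assuming \n as the line separator; ExpectedSizeDemo is a name made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

class ExpectedSizeDemo {

    // The content of the two test lines, assuming "\n" as the line separator
    static final String CONTENT = "A simple test\n" + "And another line\n";

    static int utf8Size() {
        return CONTENT.getBytes(StandardCharsets.UTF_8).length;
    }

    static int utf16Size() {
        // Java's UTF-16 charset prepends a two-byte byte order mark when encoding
        return CONTENT.getBytes(StandardCharsets.UTF_16).length;
    }

    static int utf16BeSize() {
        // UTF-16BE uses the same two bytes per character but writes no byte order mark
        return CONTENT.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Size());    // 31
        System.out.println(utf16Size());   // 64
        System.out.println(utf16BeSize()); // 62
    }
}
```

The difference of two bytes between UTF-16 and UTF-16BE is the byte order mark, which explains why 64 is slightly more than twice 31.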
That's all for today, I hope you found this post useful.