Output Stream to Input Stream

24 April 2015, Rhodri Pugh

I was working on a JVM service we have which does some processing on various document types (PDF, PNG, etc…), returning the transformed result to the client. The way the PDF portion of this works is that the library we use writes its output to an OutputStream, and currently we’re using a ByteArrayOutputStream. I then needed to find an efficient way of getting the data in that object back to the client.

Possible Solutions

The service uses Ring as the HTTP library, which allows returning an input stream as the body of the response. Googling how to convert an OutputStream to an InputStream led to the suggestion of just moving the byte array over, something like…

(ByteArrayInputStream. (.toByteArray output))

This is nice and simple, but toByteArray makes a defensive copy of the data held in the output stream, so that callers can’t mutate its internal buffer. When you’re dealing with many megabytes of document data that’s obviously not going to be ideal. Here’s the source of toByteArray from Java SE 7.

/**
 * Creates a newly allocated byte array. Its size is the current
 * size of this output stream and the valid contents of the buffer
 * have been copied into it.
 *
 * @return the current contents of this output stream, as a byte array.
 * @see java.io.ByteArrayOutputStream#size()
 */
public synchronized byte toByteArray()[] {
    return Arrays.copyOf(buf, count);
}
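
To make the copy concrete, here’s a minimal Java sketch showing that each call to toByteArray returns a fresh array, and that scribbling on that array leaves the stream’s own buffer alone. The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class ToByteArrayCopyDemo {
    // True when two successive toByteArray() calls hand back distinct array objects.
    static boolean returnsFreshCopy(ByteArrayOutputStream output) {
        return output.toByteArray() != output.toByteArray();
    }

    // Overwrites the first byte of a snapshot, then returns the first byte
    // the stream itself still holds.
    static byte firstByteAfterMutatingSnapshot(ByteArrayOutputStream output) {
        byte[] snapshot = output.toByteArray();
        snapshot[0] = 'X';
        return output.toByteArray()[0];
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        output.write(data, 0, data.length);

        System.out.println(returnsFreshCopy(output));                      // prints "true"
        System.out.println((char) firstByteAfterMutatingSnapshot(output)); // prints "h"
    }
}
```

Every one of those calls allocates and fills an array the size of the whole document, which is the cost in question.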

First Solution

This led me to adding a custom implementation of ByteArrayOutputStream that would avoid this copying and create a new stream using the exact same data. Here it is…

package com.owsy.myapp;

import java.io.ByteArrayInputStream;

public class ByteArrayOutputStream extends java.io.ByteArrayOutputStream
{
    public ByteArrayOutputStream()
    {
        super();
    }

    public ByteArrayInputStream asInputStream()
    {
        return new ByteArrayInputStream(
            this.buf,
            0,
            this.size()
        );
    }
}

Which could be used as follows…

(let [output-stream (ByteArrayOutputStream.)]
  ; do writing to output stream, then...
  (.asInputStream output-stream))
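
One thing worth knowing about this approach is that the input stream really does share the buffer, so the output stream shouldn’t be reused while the input stream is still being read. Here’s a small self-contained Java sketch of that, using a hypothetical SharedBufferOutputStream (the same idea as the class above, renamed to avoid clashing with the JDK class) and assuming the second write fits in the existing buffer so that it isn’t reallocated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical name; same idea as the custom ByteArrayOutputStream subclass.
class SharedBufferOutputStream extends ByteArrayOutputStream {
    synchronized ByteArrayInputStream asInputStream() {
        // buf and count are protected fields of java.io.ByteArrayOutputStream,
        // so the input stream wraps the very same array without copying it.
        return new ByteArrayInputStream(buf, 0, count);
    }
}

class SharedBufferDemo {
    static void write(ByteArrayOutputStream out, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }

    static String readAll(ByteArrayInputStream in) {
        byte[] bytes = new byte[in.available()];
        int n = in.read(bytes, 0, bytes.length);
        return new String(bytes, 0, Math.max(n, 0), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SharedBufferOutputStream out = new SharedBufferOutputStream();
        write(out, "first");
        ByteArrayInputStream in = out.asInputStream();

        // Reusing the output stream overwrites the bytes the input stream still points at.
        out.reset();
        write(out, "later");

        System.out.println(readAll(in)); // prints "later"
    }
}
```

In the service each output stream is written once and then handed off as the response body, so the sharing is harmless there.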

This worked well, and avoided the costly duplication of the data when converting from one stream type to another. But it still involved realising all of the data in memory at least once, and I thought we could do better…

Piping ‘Dem Streams

Looking around some more I found people recommending using the PipedInputStream and PipedOutputStream classes to allow writing the output as it’s created. This involves a little more coordination in creating another thread to do the work on, but luckily the ring.util.io namespace already has a helper function for doing this very thing!

Here’s an example of the implementation.

(ns my.app
  (:require [ring.util.io :refer [piped-input-stream]]))

(defn pdf-response [request]
  {:status 200
   :headers {"Content-Type" "application/pdf"}
   :body (piped-input-stream
           (fn [output-stream]
             ; write data to output stream
             ))})

This then allows the output stream to be written to lazily and the data goes direct to the client, without needing to be completely realised up front.

Obviously depending on the work we need to do on the document, we could end up having to hold some stuff in memory, but the facility is at least there for flushing as often as possible.
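
For anyone not using Ring, the same coordination can be sketched directly with the two piped classes. This is a rough Java approximation of what such a helper does, not Ring’s actual source. The PipedStreams class, the StreamWriter interface and both method names are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

class PipedStreams {
    // Hypothetical callback type: whatever produces the document bytes.
    interface StreamWriter {
        void writeTo(OutputStream output) throws IOException;
    }

    // Returns an input stream fed by the writer running on a separate thread.
    static InputStream pipedInputStream(StreamWriter writer) throws IOException {
        PipedInputStream input = new PipedInputStream();
        PipedOutputStream output = new PipedOutputStream(input);
        Thread producer = new Thread(() -> {
            // Closing the output end is what signals end-of-stream to the reader.
            try (OutputStream out = output) {
                writer.writeTo(out);
            } catch (IOException e) {
                // The reader went away or the write failed; nothing more to send.
            }
        });
        producer.setDaemon(true);
        producer.start();
        return input;
    }

    // Drains a stream into a string, standing in for the HTTP client here.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            collected.write(chunk, 0, n);
        }
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        InputStream body = pipedInputStream(
            out -> out.write("hello from the pipe".getBytes(StandardCharsets.UTF_8)));
        System.out.println(readAll(body)); // prints "hello from the pipe"
    }
}
```

The writer blocks whenever the pipe’s small internal buffer is full, so only a bounded amount of output sits in memory while the reader catches up.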