Incremental Processing

How can we prevent the intensive byte-stripping operation from blocking other important concurrent logic?

Node's core abstraction for incremental asynchronous processing has already appeared in the previous recipe, but its merits deserve repetition.

Streams to the rescue!

We're going to convert our recipe once more, this time to use a streaming abstraction (we can find out a lot more about streams in Chapter 4, Using Streams).

First we'll need the third-party strip-bytes-stream package:

$ npm init -y # create package.json if we don't have it 
$ npm install --save strip-bytes-stream
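
If the file.dat from the previous recipe is no longer on disk, something like the following hypothetical helper (not part of the original recipe) can generate one; the 64 MiB size and the proportion of null bytes are arbitrary assumptions:

const fs = require('fs')
const path = require('path')

const out = fs.createWriteStream(path.join(process.cwd(), 'file.dat'))
const chunk = Buffer.alloc(1024 * 1024) // 1 MiB scratch buffer

for (let i = 0; i < 64; i++) { // ~64 MiB in total
  for (let j = 0; j < chunk.length; j++) {
    // roughly a third of the bytes are nulls, the rest are random
    chunk[j] = Math.random() < 0.33 ? 0 : Math.ceil(Math.random() * 255)
  }
  out.write(Buffer.from(chunk)) // copy, so the scratch buffer can be reused
}
out.end()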

Now let's alter our code like so:

// print a dot every 10ms so we can see when the event loop is free;
// unref() stops the timer from keeping the process alive on its own
setInterval(() => process.stdout.write('.'), 10).unref()

const fs = require('fs')
const path = require('path')
const cwd = process.cwd()

const sbs = require('strip-bytes-stream')

fs.createReadStream(path.join(cwd, 'file.dat'))
  .pipe(sbs((n) => n)) // strip the null bytes from each chunk as it passes through
  .on('end', function () { log(this.total) }) // this.total – bytes removed
  .pipe(fs.createWriteStream(path.join(cwd, 'clean.dat')))

function log(total) {
  fs.appendFile(
    path.join(cwd, 'log.txt'),
    new Date() + ' ' + total + ' bytes removed\n',
    (err) => { if (err) throw err } // appendFile requires a completion callback
  )
}

This time we should see around 15 dots, which, over roughly 200ms of execution time, is a much fairer result.

This is because the file is read into the process in chunks. Each chunk is stripped of null bytes and written out to clean.dat, then the old chunk and its stripped result are discarded while the next chunk enters process memory. This all happens over multiple ticks of the event loop (the JavaScript thread runs a continuous loop; each iteration of that loop is a tick), leaving room for the interval timer queue to be processed.
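To make the chunk-by-chunk idea concrete, here is a rough, hand-rolled sketch of the same technique using Node's built-in Transform stream. It's an illustrative stand-in, not the actual implementation of strip-bytes-stream:

const fs = require('fs')
const path = require('path')
const { Transform } = require('stream')
const cwd = process.cwd()

let total = 0

const stripNulls = new Transform({
  transform (chunk, enc, cb) {
    // chunks arrive over multiple ticks of the event loop,
    // so the interval timer gets a chance to run in between
    const cleaned = chunk.filter((byte) => byte !== 0) // drop null bytes
    total += chunk.length - cleaned.length
    cb(null, Buffer.from(cleaned))
  }
})

fs.createReadStream(path.join(cwd, 'file.dat'))
  .pipe(stripNulls)
  .pipe(fs.createWriteStream(path.join(cwd, 'clean.dat')))
  .on('finish', () => console.log(total + ' bytes removed'))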

We'll be delving much deeper into streams in Chapter 4, Using Streams. For the time being, we can see that fs.createReadStream and fs.createWriteStream are, more often than not, the most suitable way to read from and write to files.
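
As a minimal illustration of that point, the same read-pipe-write pattern copies a file of any size without loading it into memory all at once (the filenames here are placeholders):

const fs = require('fs')

fs.createReadStream('source.dat')         // read in small chunks
  .on('error', console.error)             // surface read failures
  .pipe(fs.createWriteStream('dest.dat')) // write each chunk as it arrives
  .on('error', console.error)             // surface write failures
  .on('finish', () => console.log('copy complete'))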