Processing Big Data

Let's dive right into it by looking at a classic Node problem: counting all the modules available on npm. The npm registry exposes an HTTP endpoint from which we can fetch the entire contents of the registry as JSON.

Using the command-line tool curl, which is included with (or at least installable on) most operating systems, we can try it out:

$ curl 'https://skimdb.npmjs.com/registry/_changes?include_docs=true'

This prints a newline-delimited JSON stream of all the modules.

The stream returned by the registry contains one JSON object for each module stored on npm, with each object followed by a newline character.
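
Since the full download is huge, a handy way to get a feel for the format is to pipe the output through head and look at just the first few lines (the -s flag silences curl's progress output):

$ curl -s 'https://skimdb.npmjs.com/registry/_changes?include_docs=true' | head -n 5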

A simple Node program that counts all modules could look like this:

var request = require('request')
var npmDb = 'https://skimdb.npmjs.com'
var registryUrl = `${npmDb}/registry/_changes?include_docs=true`

// The request callback receives (error, response, body); body is the
// entire response buffered into memory as a single string
request(registryUrl, function (err, response, body) {
  if (err) throw err
  // One JSON object per line, so the line count roughly equals the module count
  var numberOfLines = body.split('\n').length
  console.log('Total modules on npm: ' + numberOfLines)
})
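
Assuming we save the program as count.js (the file name here is just an example), we can run it with:

$ node count.js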

If we try to run the preceding program, we'll notice a couple of things.

First, the program takes quite a long time to run. Second, depending on the machine we are using, there is a very good chance it will crash with an out-of-memory error.

Why is this happening?

The npm registry stores a very large amount of JSON data, and our program buffers the entire response in memory before it can count anything. Holding all of that data at once takes more memory than many machines have available.

In this recipe, we'll investigate how we can use streams to improve our program.
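
As a rough preview of where we're headed, here is a minimal sketch (not the final recipe code) that consumes the response as a stream and counts newlines chunk by chunk, so only a small piece of the data is held in memory at any one time:

var request = require('request')
var npmDb = 'https://skimdb.npmjs.com'
var registryUrl = `${npmDb}/registry/_changes?include_docs=true`

var count = 0
request(registryUrl)
  .on('data', function (chunk) {
    // chunk is a Buffer; count the newline bytes (10 === '\n')
    // as they arrive instead of buffering the whole response
    for (var i = 0; i < chunk.length; i++) {
      if (chunk[i] === 10) count++
    }
  })
  .on('end', function () {
    console.log('Total modules on npm: ' + count)
  })
  .on('error', function (err) { throw err })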