CSVParse for Node.js

File system interaction

This recipe illustrates how to read and write to a UTF-8 file with a byte order mark (BOM).

The native Node.js File System module, named fs, is used to read the content of a file. The parser doesn't provide any file access method because that is not its responsibility. Using the native fs module together with csv-parse is easy and natural.

You must first choose the right API. This package exposes multiple APIs, all backed by the same parsing algorithm and supporting the same options. How to choose one API over another is beyond the scope of this page and is documented in the API section.

The easiest way is to use the sync API. Read the file to get its content, then pass this content to the parser, which returns an array of records. The records may then be printed to the console or written to a file, one JSON document per line. The final code looks like:

import assert from 'assert';
import {promises as fs} from 'fs'; // 'fs/promises' not available in node 12
import os from 'os';
import { parse } from 'csv-parse/sync'; // '../lib/sync.js' when run from inside the csv-parse repository

// Prepare the dataset
await fs.writeFile(`${os.tmpdir()}/input.csv`, [
  '\ufeff', // BOM
  'a,1\n', // First record
  'b,2\n' // Second record
].join(''), 'utf8');
// Read the content
const content = await fs.readFile(`${os.tmpdir()}/input.csv`);
// Parse the CSV content, stripping the leading BOM
const records = parse(content, { bom: true });
// Validate the records
assert.deepStrictEqual(records, [
  [ 'a', '1' ],
  [ 'b', '2' ]
]);

Alternatively, you could use the Stream API by piping a file readable stream to the parser transform stream which is itself piped into a writable stream.

Alternative encoding

The parser does not interfere with the file encoding. You can specify the file encoding when calling fs.readFile by passing an options object with the encoding property as the second argument. If the second argument is a string, it specifies the encoding of the source file. In both cases, fs.readFile returns a decoded string instead of a Buffer.

An alternative is to initialize the parser with the encoding option and write bytes to it.

import assert from 'assert';
import {promises as fs} from 'fs';
import os from 'os';
import { parse } from 'csv-parse/sync'; // '../lib/sync.js' when run from inside the csv-parse repository

// Prepare the dataset
await fs.writeFile(`${os.tmpdir()}/input.csv`, Buffer.from([
  '\ufeff',
  `a€b€c`,
  '\n',
  `d€e€f`
].join(''), 'utf16le'));
// Read the content
const content = await fs.readFile(`${os.tmpdir()}/input.csv`);
// Parse the CSV content
const records = parse(content, {
  bom: true, // strip the leading BOM from the first field
  delimiter: '€',
  encoding: 'utf16le'
});
// Validate the records
assert.deepStrictEqual(records, [
  [ 'a', 'b', 'c' ],
  [ 'd', 'e', 'f' ]
]);

At the time of this writing, the list of Node.js supported encodings includes 'utf8', 'ucs2', 'utf16le', 'latin1', 'ascii', 'base64', 'hex'.