CSVParse for Node.js


Option encoding

The encoding option declares the input and output encodings.

The default value is 'utf8', which is also used when the value is true. The values null and false disable string serialization, so the parser returns buffers instead of strings.

Default behavior

The default encoding in Node.js is UTF-8. When using UTF-8, you do not need to specify anything.

When an alternative encoding is used, it can be detected from the BOM (byte order mark) present at the beginning of the input data, or it can be defined with this option.

Working with options

When providing options, values such as the delimiter must match the encoding of the data source. If the value is a string, the parser converts it into a buffer using the encoding defined by this option.

However, if the value is a buffer, you must make sure the buffer was created with the right encoding. Here is an example encoding an option as a buffer, the delimiter option in this case:

const parse = require('csv-parse/lib/sync')
const assert = require('assert')

const data = Buffer.from(`a:b\n1:2`, 'utf16le')
const records = parse(data, {
  encoding: 'utf16le',
  delimiter: Buffer.from(':', 'utf16le')
})
assert.deepEqual(records, [
  ['a', 'b'],
  ['1', '2']
])

BOM automatic detection

The BOM is a special Unicode character sequence at the beginning of a text stream that indicates the encoding. The list of encodings supported by Node.js is available inside its source code. At the time of this writing, it includes 'utf8', 'ucs2', 'utf16le', 'latin1', 'ascii', 'base64', 'hex'.

Because the BOM is specific to Unicode, only the UTF-8 and UTF-16LE encodings are natively detected by the parser. Here is an example detecting the encoding, UTF-16LE in this case:

const parse = require('csv-parse/lib/sync')
const assert = require('assert')

const data = Buffer.from(`\uFEFFa,b,c\n1,2,3`, 'utf16le')
const records = parse(data, {
  bom: true
})
assert.deepEqual(records, [
  [ 'a', 'b', 'c' ],
  [ '1', '2', '3' ]
])

Notice how the BOM is declared as \uFEFF. You can see how it is converted to the bytes FF FE with the command node -e 'console.info(Buffer.from("\ufeff", "utf16le"))'. Refer to the byte order mark by encoding table on Wikipedia for further details.

Buffer output

A value of null or false disables output encoding and returns the raw buffer.

const parse = require('csv-parse/lib/sync')
const assert = require('assert')

const data = Buffer.from(`a,b\n1,2`)
const records = parse(data, {
  encoding: null
})
assert.deepEqual(records, [
  [ Buffer.from('a'), Buffer.from('b') ],
  [ Buffer.from('1'), Buffer.from('2') ]
])