CSV Parse for Node.js

Option cast

The cast option operates at the field level. It can transform a field's value or change its type.

The option expects a function that receives the field value together with a context object describing where the field sits in the input. The function has full control over the value returned for the field.

Usage

The cast user function is called with 2 arguments: the field value and the context object. The user function may return the value as-is or any other value including null and undefined.

The test/option.cast.coffee test provides insight into how to use the option and the functionality it supports. The example below returns the first column untouched, converts the second column to an integer, and returns a new string built from the third column's value.

import assert from "node:assert";
import dedent from "dedent";
import { parse } from "csv-parse/sync";

const data = dedent`
  1,2,3
  4,5,6
`;
const records = parse(data, {
  // The cast option expects a function which
  // is called with two arguments,
  // the parsed value and a context object
  cast: function (value, context) {
    // Index indicates the column position
    if (context.index === 0) {
      // Return the value untouched
      return value;
    } else if (context.index === 1) {
      // Convert the value to an integer
      return parseInt(value, 10);
    } else {
      // Return a different value
      return `Value is ${value}`;
    }
  },
  trim: true,
});
assert.deepStrictEqual(records, [
  ["1", 2, "Value is 3"],
  ["4", 5, "Value is 6"],
]);
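
Because the function may return null or undefined, one natural use is normalising empty fields. The following is a minimal sketch, not taken from the project: castEmptyToNull is a hypothetical name, the function assumes the value arrives as a string, and it is called directly with hand-built context objects rather than through parse.

```javascript
// Hypothetical helper: map empty fields to null and
// plain decimal numbers to the number type.
function castEmptyToNull(value, context) {
  // `context` is accepted for signature compatibility but not used here.
  if (value === "") return null;
  // Optional sign, digits, optional fractional part.
  if (/^-?\d+(\.\d+)?$/.test(value)) return Number(value);
  // Anything else is returned as-is.
  return value;
}

// Direct calls with minimal, hand-built context objects (illustration only).
const row = ["42", "", "abc"].map((value, index) =>
  castEmptyToNull(value, { index }),
);
```

Passing the function as the cast option, for example parse(data, { cast: castEmptyToNull }), would apply the same logic to every field.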

Context

The context object exposes the following properties:

  • column (number|string)
    The column name if the columns option is defined, or the field position.
  • empty_lines (number)
    Internal counter of empty lines encountered until this field.
  • header (boolean)
    A boolean indicating if the provided value is a part of the header.
  • index (number)
    The field position within the current record starting at 0.
  • invalid_field_length (number)
    Number of records with a non-uniform length when relax_column_count is true. It was named skipped_lines until version 3.
  • lines (number)
    The number of lines which have been processed including the current line.
  • quoting (boolean)
    A boolean indicating if the field was surrounded by quotes.
  • records (number)
    The number of records which have been fully parsed. It was named count until version 3.
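
As an illustration of how these properties can drive a decision, the sketch below uses header and quoting to convert only unquoted data fields to numbers, leaving quoted fields as strings. castUnquotedNumbers is a hypothetical name, the context objects are built by hand, and the conversion relies on JavaScript's Number rules, which is an assumption about what counts as numeric.

```javascript
// Hypothetical cast function: quoted fields stay strings,
// unquoted fields become numbers when they parse as one.
function castUnquotedNumbers(value, context) {
  // Leave header fields untouched.
  if (context.header) return value;
  // A quoted field such as "007" is assumed to be intentional text.
  if (context.quoting) return value;
  // Blank fields would otherwise convert to 0, so skip them.
  if (value.trim() === "") return value;
  const number = Number(value);
  return Number.isNaN(number) ? value : number;
}

// Hand-built contexts standing in for what the parser would provide.
const unquoted = castUnquotedNumbers("10", { header: false, quoting: false });
const quoted = castUnquotedNumbers("10", { header: false, quoting: true });
const text = castUnquotedNumbers("abc", { header: false, quoting: false });
```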

The context example transforms the first field into an ISO 8601 date-time string and replaces the second field with the context object itself:

import assert from "node:assert";
import dedent from "dedent";
import { parse } from "csv-parse/sync";

const data = dedent`
  2000-01-01,date1
  2050-11-27,date2
`;
const records = parse(data, {
  // The cast option expects a function which
  // is called with two arguments,
  // the parsed value and a context object
  cast: function (value, context) {
    // You can return any value
    if (context.index === 0) {
      // Such as a string
      return `${value}T05:00:00.000Z`;
    } else {
      // Or the `context` object literal
      return context;
    }
  },
  trim: true,
});
assert.deepStrictEqual(records, [
  [
    "2000-01-01T05:00:00.000Z",
    {
      bytes: 16,
      comment_lines: 0,
      empty_lines: 0,
      invalid_field_length: 0,
      lines: 1,
      records: 0,
      columns: false,
      error: undefined,
      header: false,
      index: 1,
      column: 1,
      quoting: false,
      raw: undefined,
    },
  ],
  [
    "2050-11-27T05:00:00.000Z",
    {
      bytes: 33,
      comment_lines: 0,
      empty_lines: 0,
      invalid_field_length: 0,
      lines: 2,
      records: 1,
      columns: false,
      error: undefined,
      header: false,
      index: 1,
      column: 1,
      quoting: false,
      raw: undefined,
    },
  ],
]);
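
A variation on the same idea returns an actual Date object and uses context.lines to record where a value could not be converted. This is only a sketch under stated assumptions: castDate and invalidDates are hypothetical names, the fixed T05:00:00.000Z suffix is borrowed from the example above, and the function is called directly with hand-built contexts.

```javascript
// Hypothetical collector for values that fail to convert.
const invalidDates = [];

// Hypothetical cast function: first field becomes a Date, others pass through.
function castDate(value, context) {
  if (context.index !== 0) return value;
  const date = new Date(`${value}T05:00:00.000Z`);
  // An unparsable string yields an "Invalid Date" whose time is NaN.
  if (Number.isNaN(date.getTime())) {
    invalidDates.push({ line: context.lines, value });
    return null;
  }
  return date;
}

// Hand-built contexts standing in for what the parser would provide.
const valid = castDate("2000-01-01", { index: 0, lines: 1 });
const invalid = castDate("oops", { index: 0, lines: 2 });
const untouched = castDate("date1", { index: 1, lines: 1 });
```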

Using the cast and columns functions together

The cast function is called for each and every field, whether it is considered a header or not. The columns function is called once the first record is created (if treated as a header). For this reason, cast is executed before columns.

To distinguish a header field from a data field in the cast function, use the context.header property from the second argument to the cast function:

import assert from "node:assert";
import dedent from "dedent";
import { parse } from "csv-parse/sync";

assert.deepEqual(
  parse(
    dedent`
      a,b,c
      1,2,3
      4,5,6
    `,
    {
      cast: (value, context) => {
        if (context.header) return value;
        if (context.column === "B") return Number(value);
        return String(value);
      },
      columns: (header) => {
        return header.map((label) => label.toUpperCase());
      },
      trim: true,
    },
  ),
  [
    { A: "1", B: 2, C: "3" },
    { A: "4", B: 5, C: "6" },
  ],
);

Note that the above example can be rewritten to apply the columns transformation directly inside cast, by setting columns: true and replacing if(context.header) return value; with if(context.header) return value.toUpperCase();:

import assert from "node:assert";
import dedent from "dedent";
import { parse } from "csv-parse/sync";

assert.deepEqual(
  parse(
    dedent`
      a,b,c
      1,2,3
      4,5,6
    `,
    {
      cast: (value, context) => {
        if (context.header) return value.toUpperCase();
        if (context.column === "B") return Number(value);
        return String(value);
      },
      columns: true,
      trim: true,
    },
  ),
  [
    { A: "1", B: 2, C: "3" },
    { A: "4", B: 5, C: "6" },
  ],
);
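
When several columns need different conversions, the chain of if statements can be replaced with a lookup keyed by context.column. The sketch below is one possible arrangement rather than a pattern prescribed by the library: converters and castByColumn are hypothetical names, the column names id, active and created are invented for illustration, and the function is exercised with hand-built context objects.

```javascript
// Hypothetical per-column converters, keyed by column name.
// A Map avoids accidental matches on inherited object properties.
const converters = new Map([
  ["id", (value) => Number.parseInt(value, 10)],
  ["active", (value) => value === "true"],
  ["created", (value) => new Date(value)],
]);

// Hypothetical cast function: header fields pass through,
// data fields use the converter registered for their column, if any.
function castByColumn(value, context) {
  if (context.header) return value;
  const convert = converters.get(context.column);
  return convert ? convert(value) : value;
}

// Hand-built contexts standing in for what the parser would provide.
const id = castByColumn("7", { header: false, column: "id" });
const active = castByColumn("true", { header: false, column: "active" });
const other = castByColumn("hello", { header: false, column: "note" });
```

This relies on context.column holding the column name, which the property list above says is the case when the columns option is defined.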

About

The Node.js CSV project is an open source product hosted on GitHub and developed by Adaltas.