Configuring the CSV data reader

Configure the comma-separated values (CSV) data reader in the business object configuration file to modify the way data is read from CSV source files. You might want to change the default settings of the CSV data reader to better work with the format of your existing source data.

About this task

The CSV data reader reads and processes data from an input CSV file one record at a time until the end of the file is reached. Each record in the CSV file must have the same data structure. The data read from the CSV file can be mapped to a WebSphere Commerce business object by using a business object configuration file. In the configuration file, each column of data in the input CSV file is mapped directly to a property of a WebSphere Commerce business object.

A CSV file can contain multiple data records, with each record spanning multiple columns. Each column value for a record is also known as a token. The CSV file must include delimiter characters to separate tokens within each record and to separate records. The CSV data reader uses these delimiter characters to identify each record and token.
  • Tokens are separated by a tokenDelimiter character. By default, the tokenDelimiter character is a comma ( , ). Each token can be optionally enclosed by tokenValueDelimiter characters, with a tokenValueDelimiter at the beginning and the end of the token. The default tokenValueDelimiter is the double quotation mark character ( " ). If a token value contains a special character, such as the tokenDelimiter or the lineDelimiter character, the token must be enclosed by tokenValueDelimiter characters. As an example, the following string includes commas and is enclosed in tokenValueDelimiter characters:
    "Men's fashions for business, casual, and formal occasions"  
  • Records are separated by a lineDelimiter character, which can also be called a record delimiter. By default, the lineDelimiter character is the new line character. This character indicates the end of a record for an object and the beginning of a new object record.

    Because the default character is the new line character, the CSV data reader reads each line in the file as a separate object record. If the data for a column or record spans multiple lines in your file, you can encounter errors with the load process or with your data. If you want data for a column to span multiple lines, enclose the data within the configured tokenValueDelimiter characters. If you want data for an entire record to span multiple lines, you must configure a different lineDelimiter character to use instead of the new line character to identify the end of each record.
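
For the last case, where an entire record spans multiple lines, the reader configuration might resemble the following fragment. This is a minimal sketch that assumes the semicolon is a suitable record delimiter for your data; the semicolon is only an illustrative choice, and the tokenDelimiter and tokenValueDelimiter parameters are left at their default values.

```xml
<!-- Sketch only: a semicolon, not the new line character, ends each record. -->
<!-- The semicolon is an illustrative choice, not a required value. -->
<_config:DataReader lineDelimiter=";" />
```

With a configuration like this one, any semicolon that belongs to the data itself must be enclosed within tokenValueDelimiter characters, because the lineDelimiter character cannot otherwise appear in the content of a token.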

Procedure

  1. Open the wc-loader-<object>.xml configuration file in edit mode.
    A sample of this file is in the WC_installdir/samples/DataLoad/Catalog directory.
  2. Find the <_config:DataReader> element.
  3. Add any of the following optional parameters as attributes of the <_config:DataReader> element:
    lineDelimiter
    Specifies the line separator character or record separator character. The default value is the new line character. The lineDelimiter character cannot appear in the content of a token unless the token is enclosed within tokenValueDelimiter characters.
    Note: If you want records in a CSV file to span multiple lines, you can configure a custom lineDelimiter character to identify the end of a record. By configuring a different delimiter character, CSV files can include newline characters within object records, instead of having the data reader handle each newline character as the end of a record. For instance, you can configure the lineDelimiter to be a semicolon ( ; ) instead of the newline character. With this new lineDelimiter character configured, the following CSV file is considered to have a single object record instead of two records.
    
    Column1, Column2, Column3, Column4, Column5;
    Value1,Val
    ue2,Value3,Value4,Value5;
    The CSV data reader reads this object record as a single record with the value for Column2 spanning multiple lines.
    tokenDelimiter
    Specifies the token separator character. The default is the comma character (,).
    tokenValueDelimiter
    Specifies the string separator character. The tokenValueDelimiter is used to indicate the beginning and the end of a token. The default tokenValueDelimiter character is the double quotation mark ("). For instance, the following token, which contains commas, can be used for a catalog entry short description:
    "Men's fashions for business, casual, and formal occasions"
    Notes:
    • If you are editing your file with a plain text editor, use the tokenValueDelimiter when your token contains special characters, such as the tokenDelimiter character or the tokenValueDelimiter itself. To use the tokenValueDelimiter character within the token, you must use two tokenValueDelimiter characters. For instance, the following token, which contains commas and quotation marks, can be used for a catalog entry short description:
      "Men's fashions for ""business"", ""casual"", and ""formal"" occasions."
      The output can resemble the following string:
      Men's fashions for "business", "casual", and "formal" occasions.
      These usages of the tokenValueDelimiter apply only when you are using a plain text editor to edit your file.
    • If you want to include column values that span multiple lines within your input file, enclose the column value within tokenValueDelimiter characters. By enclosing the value within these characters, you can include the newline character in the column value without causing the data reader to handle the newline character as the end of the object record.
    charset
    Specifies the character set of the CSV file. The default character set is UTF-8.
    firstLineIsHeader
    Indicates that the first line in the CSV file is column header information. Use this header line for providing the column mappings in the <_config:Data> element in the wc-loader-<object>.xml configuration file. The default value is false.
    useHeaderAsColumnName
    Indicates that the header in the first line of the CSV file is used for the column names. The default value for useHeaderAsColumnName is false. There are four possible combinations of the firstLineIsHeader and useHeaderAsColumnName parameters:
    1. firstLineIsHeader = "false" and useHeaderAsColumnName = "false". In this case, the column mapping configuration in the wc-loader-<object>.xml configuration file is mandatory.
    2. firstLineIsHeader = "false" and useHeaderAsColumnName = "true". In this case, the useHeaderAsColumnName flag is ignored and the column mapping configuration is mandatory.
    3. firstLineIsHeader = "true" and useHeaderAsColumnName = "false". In this case, the column mapping configuration is optional. If the column mapping configuration is defined in the wc-loader-<object>.xml configuration file, the data reader uses it. Otherwise, the data reader uses the CSV header for the column names.
    4. firstLineIsHeader = "true" and useHeaderAsColumnName = "true". In this case, the column mapping configuration is ignored and the data reader always uses the CSV header for the column names.
    Note: The DataReader element can contain nested elements. To add column mappings, you can use the following code as an example:
    <_config:DataReader firstLineIsHeader="false" useHeaderAsColumnName="false">
        <_config:Data>
            <_config:column number="1" name="FIRST" />
            <_config:column number="2" name="SECOND" />
        </_config:Data>
    </_config:DataReader>
  4. Save and close the file.

Example

The following code snippet demonstrates how to use the parameters. This code snippet uses all default values:
<_config:DataReader lineDelimiter="\n" tokenDelimiter="," tokenValueDelimiter='"' 
charset="UTF-8" firstLineIsHeader="false" useHeaderAsColumnName="false" />
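
As a contrast to the default values, the following fragment sketches a reader for a file whose first line supplies the column names. It is an illustration under stated assumptions rather than a recommended configuration: the vertical bar ( | ) token delimiter is a hypothetical choice for source data that contains many commas, and all parameters that are not listed keep their default values.

```xml
<!-- Sketch only: the first line of the CSV file is a header, and the header -->
<!-- names are used as the column names, so no <_config:Data> mapping is given. -->
<!-- The vertical bar token delimiter is a hypothetical choice for illustration. -->
<_config:DataReader tokenDelimiter="|" firstLineIsHeader="true" useHeaderAsColumnName="true" />
```

Because both firstLineIsHeader and useHeaderAsColumnName are set to true, any column mapping configuration that is defined for this reader is ignored, as described for the fourth parameter combination in the procedure.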