Configuring the CSV data reader
Configure the comma-separated values (CSV) data reader in the business object configuration file to modify the way data is read from CSV source files. You might want to change the default settings of the CSV data reader to better work with the format of your existing source data.
About this task
The CSV data reader reads and processes data from an input CSV file one record at a time until the end of the file is reached. Each record in the CSV file must have the same data structure. The data read from the CSV file can be mapped to a WebSphere Commerce business object by using a business object configuration file. Using the configuration file, each column of data in the input CSV file is mapped directly to a property of a WebSphere Commerce business object.
- Tokens are separated by a tokenDelimiter character. By default, the tokenDelimiter character is a comma (,). Each token can optionally be enclosed by tokenValueDelimiter characters, with a tokenValueDelimiter at the beginning and the end of the token. The default tokenValueDelimiter is the double quotation mark character ("). If a token value contains a special character, such as the tokenDelimiter or the lineDelimiter character, the token must be enclosed by tokenValueDelimiter characters. As an example, the following string includes commas and is enclosed in the tokenValueDelimiter:
"Men's fashions for business, casual, and formal occasions"
- Records are separated by a lineDelimiter character, which can also be called a record delimiter. By default, the lineDelimiter character is the new line character. This character indicates the end of a record for an object and the beginning of a new object record.
Since the default character is a new line character, the CSV data reader reads each line in the file as a separate object record. If you include data for a column or record across multiple lines in your file, you can encounter errors or issues with the load process or with your data. If you want data for a column to span multiple lines, enclose the data within the configured tokenValueDelimiter characters. If you want data for an entire record to span multiple lines, you must configure a different lineDelimiter character to use instead of the new line character to identify the end of each record.
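As a minimal sketch of the last case, assuming that a semicolon never appears unquoted in your data, the DataReader element could declare a semicolon as the record delimiter. Only the lineDelimiter attribute is shown here; any other attributes that your configuration file already sets on this element are assumed to stay unchanged.

```xml
<!-- Sketch only: records end at ";" so a newline inside a record
     is no longer handled as the end of that record -->
<_config:DataReader lineDelimiter=";" />
```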
Procedure
- Open the wc-loader-<object>.xml configuration file in edit mode.
A sample of this file is in the WC_installdir/samples/DataLoad/Catalog directory.
- Find the <_config:DataReader> element.
- Add the following optional parameters inside the <_config:DataReader> tag:
- lineDelimiter
- Specifies the line separator character or record separator character. The default value is the new line character. The lineDelimiter character cannot appear in the content of a token unless the token is enclosed within tokenValueDelimiter characters.
Note: If you want records in a CSV file to span multiple lines, you can configure a custom lineDelimiter character to identify the end of a record. By configuring a different delimiter character, CSV files can include newline characters within object records, instead of having the data reader handle each newline character as the end of a record. For instance, you can configure the lineDelimiter to be a semicolon (;) instead of the newline character. With this lineDelimiter character configured, the following CSV file is considered to have a single object record instead of two records:
Column1, Column2, Column3, Column4, Column5;
Value1,Val
ue2,Value3,Value4,Value5;
The CSV data reader reads this object record as a single record with the value for Column2 spanning multiple lines.
- tokenDelimiter
- Specifies the token separator character. The default is the comma character (,).
- tokenValueDelimiter
- Specifies the string separator character. The tokenValueDelimiter is used to indicate the beginning and the end of a token. The default tokenValueDelimiter character is the double quotation mark ("). For instance, the following token, which contains commas, can be used for a catalog entry short description:
"Men's fashions for business, casual, and formal occasions"
Notes:
- If you are editing your file with a plain text editor, use the tokenValueDelimiter when your token contains special characters, such as the tokenDelimiter character or the tokenValueDelimiter itself. To use the tokenValueDelimiter character within the token, you must use two tokenValueDelimiter characters. For instance, the following token, which contains commas and quotation marks, can be used for a catalog entry short description:
"Men's fashions for ""business"", ""casual"", and ""formal"" occasions."
The output can resemble the following string:
Men's fashions for "business", "casual", and "formal" occasions.
These usages of the tokenValueDelimiter apply only when you are using a plain text editor to edit your file.
- If you want to include column values that span multiple lines within your input file, enclose the column value within tokenValueDelimiter characters. By enclosing the value within these characters, you can include the newline character in the column value without causing the data reader to handle the newline character as the end of the object record.
- charset
- Specifies the character set of the CSV file. The default character set is UTF-8.
- firstLineIsHeader
- Indicates that the first line in the CSV file is column header information. Use this header line for providing the column mappings in the <_config:Data> element in the wc-loader-<object>.xml configuration file. The default value is false.
- useHeaderAsColumnName
- Indicates that the first line in the CSV file is used as column information. The default value for useHeaderAsColumnName is false. There are four possible combinations of the firstLineIsHeader and useHeaderAsColumnName parameters:
- firstLineIsHeader = "false" and useHeaderAsColumnName = "false". In this case, the column mappings in the wc-loader-<object>.xml configuration file are mandatory.
- firstLineIsHeader = "false" and useHeaderAsColumnName = "true". In this case, the useHeaderAsColumnName flag is ignored and the column mapping is mandatory.
- firstLineIsHeader = "true" and useHeaderAsColumnName = "false". In this case, the column mapping configuration is optional. If the column mapping configuration is defined in the wc-loader-<object>.xml configuration file, the data reader uses that configuration. If it is not defined, the data reader uses the CSV header for the column names.
- firstLineIsHeader = "true" and useHeaderAsColumnName = "true". In this case, the column mapping configuration is ignored and the CSV header is always used for the column names.
Note: The DataReader element can contain nested elements. To add column mappings, you can use the following code as an example:
<_config:DataReader firstLineIsHeader="false" useHeaderAsColumnName="false">
  <_config:Data>
    <_config:column number="1" name="FIRST" />
    <_config:column number="2" name="SECOND" />
  </_config:Data>
</_config:DataReader>
- Save and close the file.
Example
<_config:DataReader lineDelimiter="\n" tokenDelimiter="," tokenValueDelimiter='"'
charset="UTF-8" firstLineIsHeader="false" useHeaderAsColumnName="false" />
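The preceding example spells out the default values. As a contrasting sketch, the following element assumes a hypothetical pipe-separated source file whose first line holds the column names. The pipe character is an illustrative choice rather than a recommended value, and the attributes that are not listed are assumed to keep the defaults described in the procedure.

```xml
<!-- Sketch only: pipe-separated input; the first line supplies the column names,
     so no nested column mapping is declared -->
<_config:DataReader tokenDelimiter="|" firstLineIsHeader="true" useHeaderAsColumnName="true" />
```

With both flags set to true, any column mapping configuration in the file is ignored and the CSV header is used for the column names, as described in the parameter list.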