The data load order configuration file controls the load order of the Data Load utility.
The file points to the environment settings file, the business object configuration
file, and the input file. You can also define the mode that the Data Load utility uses to load
data.
About this task
Sample load order configuration files are provided for loading different component objects.
These files are included within the component-specific directories under the following directories:
- utilities_root/samples/DataLoad
- WCDE_installdir/samples/DataLoad
Procedure
-
Create a copy of a sample wc-dataload-object.xml file
and rename the file.
For example, copy the wc-dataload-catalog-entry.xml file within the
Catalog component directory.
-
Open the new data load order configuration file for editing.
-
Specify the data load environment configuration file.
-
Find the
<_config:DataLoadEnvironment>
element.
-
Change the value for the
configFile
attribute of this element to be the
relative file path to the data load environment configuration file.
Note: This value is overridden if an environment configuration file is specified in the command line
when you run the Data Load utility from the Utility server Docker container.
Although the environment configuration file that is specified in the command line takes
precedence, the <_config:DataLoadEnvironment> element must still exist in the data load order configuration file.
-
Configure the load order for the data load process.
-
Find the
<_config:LoadOrder>
element.
- Optional:
Specify the value for the
commitCount
attribute. The commit count identifies
the number of rows that are flushed in a single transaction before a commit is issued.
The following values can be set for this attribute.
- 0
- Nothing is committed until the load item finishes processing all of its input data. If an
exception occurs, the records that are successfully processed can be committed. If the exception is
an SQL exception, the utility rolls back all of the data that is processed after the last committed
record. If a different type of exception occurs, the utility rolls back only the data that is not
successfully processed. The load operation commits any successfully processed records.
- N
- Where N is a positive integer value. The commit count specifies how many
lines of records are processed before the utility issues a database commit. The default value is 1.
Note: If errors occur during the data load, an SQL exception causes the database to roll back to the
last committed line. The data after the last committed line, up to and including the error line, is not
committed to the database. Any other error does not cause a rollback. For other errors, only the error
line is not committed to the database; all other data is committed. For example, if the commitCount
is N = 10 and an error occurs in line 25:
- If the error is an SQL exception, lines 1 - 20 are committed to the database. Lines 21 - 25 are not added to the database.
- If the error is not an SQL exception, lines 1 - 24 are committed to the database. Line 25 is not added to the database.
You can include a configuration property "rollbackForAnyException" to
control what data the utility rolls back when an exception occurs. When the property is set to true,
the utility rolls back all data that is processed after the last committed record whenever an
exception occurs. This rollback occurs regardless of the exception type. You can set this property
within the configuration for each load item or within the parent <_config:LoadOrder>
element to apply the property setting to all load items.
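For example, a minimal sketch of a load order that commits every 100 records and rolls back all uncommitted data when any exception occurs (the property value of true is an assumption for illustration):
<_config:LoadOrder commitCount="100" batchSize="1" dataLoadMode="Replace">
   <_config:property name="rollbackForAnyException" value="true" />
   <!-- load items go here -->
</_config:LoadOrder>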
- Optional:
Specify the value for the
batchSize
attribute. Database transactions are
loaded in batches. These batches are kept in the Java memory until there are enough stored for a
flush. Then, the batch contents are stored to the database as a single packet of data. The batches
are stored in the database until the next commit occurs and the database loads the changes.
The following values can be set for this attribute.
- 0
- Uses JDBC batch update. All of the input data for the entire load item is processed in batches.
- N
- Where N is a positive integer value, indicating how many lines of data are
processed before the JDBC batch is executed. The JDBC batch is enabled only if the batch
size is greater than 1. The batchSize value should be less than or equal to the commitCount value.
The default batch size is 1, which means that the JDBC batch is not enabled and the SQL statements
are executed one at a time.
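For example, a sketch that executes the JDBC batch every 50 lines and commits every 100 lines (the values are illustrative; keep the batchSize less than or equal to the commitCount):
<_config:LoadOrder commitCount="100" batchSize="50" dataLoadMode="Replace">
   <!-- load items go here -->
</_config:LoadOrder>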
- Optional:
Specify the value for the
dataLoadMode
attribute. The data load mode can be set to Insert, Replace, Update, or Delete.
- Insert
- All data is inserted into the database. The utility generates insert SQL statements. This mode
is recommended for initial loading of data. If there are any delete flags in your data file, the
flags are ignored.
In insert mode, you can specify a primary key range to use when the object does
not exist in the database and it requires a new generated key. Specify the value within the
<_config:BusinessObjectMediator> element. For example:
startKey="100001" endKey="200001"
A sketch of this setting follows the mode descriptions below. The data
writers that are supported in insert mode:
- JDBC data writer
- Native load data writer
Note: Running the Data Load utility in insert mode can improve performance when you initially
load large amounts of data. However, in insert mode the utility does not check your database before it
loads your input data. The utility does not determine whether the data objects that you are creating
already exist in your database before it attempts to load and create them. This behavior can cause the
load process to fail when the utility attempts to create data objects that already exist in your database.
For example,
suppose that your input file is a CSV file that contains a line that creates a product and a product
description. If the file also contains a second line that loads the same product with a description
in a different language, the load process fails. Because the utility creates the product and
description from the first line, when the utility encounters the second line, it attempts to
create the product again. Since the product already exists, the load fails. To load this product and
description data, you can use one of the following methods:
- Run the Data Load utility in replace mode.
- Run the Data Load utility in insert mode, but load your product and description information
separately. Use one input file to load your product information, and a different input file to load
all of your descriptions.
For more information about configuring the Data Load utility to run in insert
mode, see Scenario: Initial load.
- Replace
- Default: All data is replaced in the database. The utility generates insert, update, or delete
SQL statements based on the data. Replace mode replaces existing data that is contained in the
database with the input data. That is, if some column information is not in the input data, the
column value is updated to null, or to the default value if one is defined. For example:
- If one record (line) in your data file represents a new object, it is inserted.
- If the object is in the database already, it is replaced.
- If there is a flag in the data to indicate that this object is to be deleted, the object is deleted.
In replace mode, do not specify a primary key range value, because it can result in key conflicts
within the database.
The data writers that are supported in replace mode:
- JDBC data writer
- Native load data writer
Note: You can prevent accidentally replacing information in the
database with null data. Modify the original input data that is used in the initial load when you
are replacing a subset of the original data. Do not enter empty fields in your source file, unless
you want the fields to contain null data in the database.
For more information about
configuring the Data Load utility to run in replace mode, see Scenario: Delta load.
- Update
- Specified catalog entry and catalog entry description data is updated in the database. The
utility compares the catalog entry data in the input file with the corresponding data for the
catalog entries in the database. The utility then replaces or adds the new or changed data for only
the columns that are specified in the input file. All other columns remain unchanged.
The update
mode supports only the JDBC data writer.
Note: The Data Load utility can run in update mode for
loading only catalog entry or catalog entry description information.
For more information
about configuring the Data Load utility to run in update mode, see Scenario: Catalog entry update load.
- Delete
- All data that is identified in the input file and that exists in the database is deleted from the
database. The utility ignores the value in the delete column or element in an input file in this
mode. The utility generates the delete SQL statements that are needed to delete the objects in the
input file. When you run the utility in this mode, only the JDBC data writer is supported.
Note: If your site uses
HCL Commerce search, the delta search index might not rebuild
correctly when you delete some catalog objects with the Data Load utility in delete mode. When you
delete a child object of a catalog entry or category with the utility in delete mode, both the child
and parent objects are removed from the delta search index rebuild. This removal can cause the
parent catalog entry or category to no longer be indexed or display correctly in the storefront.
Use the utility in replace mode to delete catalog objects when your site uses HCL Commerce search. To delete objects with the utility in replace mode, include the value
1
for the Delete
column of an object in your input file. If you do
decide to delete catalog objects with the utility in delete mode, run a full index rebuild after the
load operation completes.
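The following sketch shows the insert mode primary key range that is described above. The <_config:LoadOrder> element belongs in the data load order configuration file, while the <_config:BusinessObjectMediator> element belongs in the business object configuration file; the class name that is shown is an illustrative placeholder for the mediator class in your sample file:
<!-- Data load order configuration file: run the load in insert mode -->
<_config:LoadOrder commitCount="100" batchSize="1" dataLoadMode="Insert">
   <!-- load items go here -->
</_config:LoadOrder>

<!-- Business object configuration file: reserve a key range for generated primary keys -->
<_config:BusinessObjectMediator className="com.example.MyObjectMediator"
   startKey="100001" endKey="200001" />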
- Optional:
Specify the value for the
maxError
attribute. The maximum error tolerance controls whether the Data Load utility continues to run when errors occur during the load process.
Note: If you are using the XML data reader class to load XML input files, do not set the
maxError attribute. If you do set this attribute, you must set the value to 1.
If you set a different value, you can encounter unexpected behavior.
If you are using a CSV data reader class, the following values can be set for this attribute.
- 0
- Continue to run the Data Load utility, regardless of how many errors occur.
- N
- Where N is a positive integer value. The max error count specifies the error
tolerance level during the data load process for a load item. The default value is 1.
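For example, a sketch of a load order for CSV input that tolerates up to 100 errors for each load item (the value is illustrative):
<_config:LoadOrder commitCount="100" batchSize="1" dataLoadMode="Replace" maxError="100">
   <!-- load items go here -->
</_config:LoadOrder>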
-
Specify the load item configuration elements for each load item that is required. Specify the
load item information within the
<_config:LoadItem>
element. Within this
element, you must include the name
attribute to identify the name of the load item.
You must also include the businessObjectConfigFile
attribute to identify the
business object configuration file for the load item.
For example:
<_config:LoadItem name="CatalogGroup" businessObjectConfigFile="wc-loader-catalog-group.xml" >
-
Within the
<_config:LoadItem>
element, specify the data source location for
the input file or files that contain the load item information. Specify this information with the
location
attribute in the <_config:DataSourceLocation>
element.
For example, for a specific file:
<_config:DataSourceLocation location="CatalogGroups\CatalogGroups.xml"/>
For example, for a directory:
<_config:DataSourceLocation location="CatalogGroups"/>
If you specify a directory for a load item:
- All files in the directory must be for the same type of load; otherwise, the Data Load utility fails.
- All files in the directory are loaded serially, sorted by file name in ascending order.
- Optional:
Include any other configuration properties that you require.
Include these properties within a <_config:property>
element within either
the <_config:LoadOrder>
or <_config:LoadItem>
elements. If
this property tag is specified within a <_config:LoadOrder>
element, the
property applies to all load items. If the property tag is specified within a
<_config:LoadItem>
element, the property applies to only that specific load
item.
For example, if you want to build the search index, include the
buildSearchIndex
configuration property in the load order configuration
file.
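For example, a minimal sketch that sets this property for all load items (the value of true is an assumption for illustration):
<_config:LoadOrder commitCount="100" batchSize="1" dataLoadMode="Replace">
   <_config:property name="buildSearchIndex" value="true" />
   <!-- load items go here -->
</_config:LoadOrder>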
-
Save and close the file.
The following sample shows a configuration file for catalog
data.
<?xml version="1.0" encoding="UTF-8" ?>
<_config:DataLoadConfiguration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config
../../../../xml/config/xsd/wc-dataload.xsd"
xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">
<_config:DataLoadEnvironment configFile="wc-dataload-env.xml" />
<_config:LoadOrder commitCount="100" batchSize="1" dataLoadMode="Replace">
<_config:property name="firstTwoLinesAreHeader" value="true" />
<_config:LoadItem name="CatalogGroup" businessObjectConfigFile="wc-loader-catalog-group.xml">
<_config:property name="validateCatalogGroup" value="true" />
<_config:DataSourceLocation location="CatalogGroups.csv" />
</_config:LoadItem>
<_config:LoadItem name="CatalogEntry" businessObjectConfigFile="wc-loader-catalog-entry.xml">
<_config:DataSourceLocation location="CatalogEntries.csv" />
</_config:LoadItem>
</_config:LoadOrder>
</_config:DataLoadConfiguration>
- Optional:
If you are configuring the data load order for routine data load processes, you can configure a
file difference preprocessor to run.
You can
run a file difference preprocess for routine data loads to improve the Data Load utility performance
for loading these files. By using this preprocessor, you can compare two input files, such as a
previously loaded file and a new version of that file. The preprocessor generates a difference file
that contains only the records in the new file that are not within the old file or that are changed
from the records in the old file. The Data Load utility can then load this difference file. If your
routinely loaded files contain many previously loaded records, then running this file difference can
result in shorter load times. Running a file difference can reduce the loading time that is required
to load your routine updates to your HCL Commerce database, reduce server usage time, and
improve server performance.
You can configure the Data Load
utility file difference preprocessor to compare files by the values in each column, instead of
entire records, to identify the changed records. You can also configure the file difference
preprocessor to ignore specific columns when the process is comparing files.
For more information about this preprocessor, see Data Load file difference preprocessing.
-
Save and close the file.