Configuring the data load order

The data load order configuration file controls the load order of the Data Load utility. The data load file has a pointer to the environment settings file, business object configuration file, and input file. You can also define the mode that the Data Load utility uses to load data.

About this task

Sample load order configuration files are provided for loading different component objects. These files are included within the component-specific directories in the following directory:

WC_installdir/samples/DataLoad
WCDE_installdir/samples/DataLoad

Procedure

Create a copy of a sample wc-dataload-object.xml file and rename the file.
For example, copy the wc-dataload-catalog-entry.xml file within the Catalog component directory.
Open the new data load order configuration file for editing.
Specify the data load environment configuration file.
1. Find the <_config:DataLoadEnvironment> element.
2. Change the value for the configFile attribute of this element to be the relative file path to the data load environment configuration file.
  Note: This value is overridden if an environment configuration file is specified in the command line when running the Data Load utility. For example:
  - ./dataload.sh ../samples/DataLoad/Catalog/wc-dataload-catalog-entry.xml ../samples/DataLoad/Catalog/environmentfile.xml
  - dataload.cmd ..\samples\DataLoad\Catalog\wc-dataload-catalog-entry.xml ..\samples\DataLoad\Catalog\environmentfile.xml
  - dataload.bat ..\samples\DataLoad\Catalog\wc-dataload-catalog-entry.xml ..\samples\DataLoad\Catalog\environmentfile.xml
  Where environmentfile is the name of the environment file to process.
  Although the environment configuration file that is specified in the command line takes precedence, the element must still exist in the data load environment configuration file.
Configure the load order for the data load process.
1. Find the <_config:LoadOrder> element.
2. Optional: Specify the value for the commitCount attribute. The commit count identifies the number of rows that are flushed in a single transaction before a commit is issued.
  The following values can be set for this attribute.
  
  0
  
  Nothing is committed until the load item finishes processing all of its input data. If an exception occurs, the records that are successfully processed can be committed. If the exception is an SQL exception, the utility rolls back all of the data that is processed after the last committed record. If a different type of exception occurs, the utility rolls back only the data that is not successfully processed. The load operation commits any successfully processed records.
  
  N
  
  Where N is a positive integer value. The commit count specifies how many lines of records that are processed when it calls the database commit. The default value is 1.
  Note: If errors occur during the data load, an SQL exception causes the database to roll back to the last line committed. The data after the last line committed up until the error line does not commit to the database. Any other error does not cause a rollback. For other errors, only the error line data does not commit to the database. All other data is committed. For example, if the commitCount is N = 10 and an error occurs in line 25:
  
  If the error is an SQL exception, the data between line 1 - 20 is committed to the database. The data between line 21 - 25 is not added in the database.
  
  If the error is not an SQL exception, the data between line 1 - 24 is committed to the database. The data in line 25 is not added in the database.
  You can include a configuration property "rollbackForAnyException" to control what data the utility rolls back when an exception occurs. When the property is set to true, the utility rolls back all data that is processed after the last committed record whenever an exception occurs. This roll back occurs regardless of the exception type. You can set this property within the configuration for each load item or within the parent <_config:LoadOrder> element to have the property setting apply to all load items.
3. Optional: Specify the value for the batchSize attribute. Database transactions are loaded in batches. These batches are kept in the Java memory until there are enough stored for a flush. Then, the batch contents are stored to the database as a single packet of data. The batches are stored in the database until the next commit occurs and the database loads the changes.
  The following values can be set for this attribute.
  
  0
  
  Uses JDBC batch update. All batches, for the entire load item, are processed for input data.
  
  N
  
  Where N is a positive integer value, indicating how many lines of data are processed before the JDBC batch is executed. The JDBC batch is enabled if, and only if, the batch size is greater than 1. The batchSize value should be less than or equal to the commitCount value. The default batch size is 1, which means that the JDBC batch is not enabled and the SQL statements are executed one by one directly.
4. Optional: Specify the value for the dataLoadMode attribute. The data load mode can be set to either Insert, Update, or Delete.
  Insert
  All data is inserted into the database. The utility generates insert SQL statements. This mode is recommended for initial loading of data. If there are any delete flags in your data file, the flags are ignored.
  In insert mode, you can specify a primary keyrange to use when the object does not exist in the database and it requires a new generated key. Specify the value within the <_config:BusinessObjectMediator> element. For example:
  startKey="100001" endKey="200001"
  The data writers that are supported in the insert mode:
  
  JDBC data writer
  
  Native load data writer
  
  Note: Running the Data Load utility in insert mode can improve the performance of loading initial loads of large amounts of data. However, when you run the Data Load utility in insert mode the utility does not check your database before the utility loads your input data. The utility does not determine whether the data objects that you are creating exist in your database before the utility attempts to load and create the data objects. This behavior can cause the load process to fail when the utility attempts to create data objects that exist in your database.
  For example, if your input file is a CSV file that contains a line that creates a product and a product description. If the file also contains a second line that loads the same product with a description in a different language, the load process fails. Since the utility creates the product and description in the first line, when the utility encounters the second line, the utility attempts to create the product again. Since the product exists, the load fails. To load this product and description data, you can use one of the following methods:
  
  Run the Data Load utility in replace mode.
  
  Run the Data Load utility in insert mode, but load your product and description information separately. Use one input file to load your product information, and a different input file to load all of your descriptions.
  
  For more information about configuring the Data Load utility to run in insert mode, see Scenario: Initial load.
  Replace
  Default: All data is replaced in the database. The utility generates insert, update, or delete SQL statements based on the data. Replace mode replaces existing data that is contained in the database with the input data. That is, if some column information is not in the input data, the column value is updated to null or the default value if any. For example:
  
  If one record (line) in your data file represents a new object, it is inserted.
  
  If the object is in the database already, it is replaced.
  
  If there is a flag in the data to indicate that this object is to be deleted, the object is deleted.
  
  In replace mode, do not specify a primary keyrange value as it can result in key conflicts within the database.
  The data writers that are supported in the replace mode:
  
  JDBC data writer
  
  Native load data writer
  
  Note: You can prevent accidentally replacing information in the database with null data. Modify the original input data that is used in the initial load when you are replacing a subset of the original data. Do not enter empty fields in your source file, unless you want the fields to contain null data in the database.
  For more information about configuring the Data Load utility to run in replace mode, see Scenario: Delta load.
  Update
  
  Specified catalog entry and catalog entry description data is updated in the database. The utility compares the catalog entry data in the input file with the corresponding data for the catalog entries in the database. The utility then replaces or adds the new or changed data for only the columns that are specified in the input file. All other columns remain unchanged.
  The update mode supports only the JDBC data writer.
  Note: The Data Load utility can run in update mode for loading only catalog entry or catalog entry description information.
  For more information about configuring the Data Load utility to run in update mode, see Scenario: Catalog entry update load.
  
  Delete
  
  All data that is identified in the input file that is in the database is deleted from the database. The utility ignores the value in the delete column or element in an input file in this mode. The utility generates the delete SQL statements that are needed to delete the objects in the input file. When you run the utility in this mode, only the JDBC data writer is supported in this mode.
  Note: If your site uses WebSphere Commerce search, the delta search index might not rebuild correctly when you delete some catalog objects with the Data Load utility in delete mode. When you delete a child object of a catalog entry or category with the utility in delete mode, both the child and parent objects are removed from the delta search index rebuild. This removal can cause the parent catalog entry or category to no longer be indexed or display correctly in the storefront.
  Use the utility in replace mode to delete catalog objects when your site uses WebSphere Commerce search. To delete objects with the utility in replace mode, include the value 1 for the Delete column of an object in your input file. If you do decide to delete catalog objects with the utility in delete mode, run a full index rebuild after the load operation completes.
5. Optional: Specify the value for the maxError attribute. The maximum error tolerance can be set to ensure that the Data Load utility runs, regardless of any errors that occur.
  Note: If you are using the XML data reader class to load XML input files, do not set the maxError attribute. If you do set this attribute, you must set the value for the attribute to be 1. If you set a different value, you can encounter unexpected behavior.
  If you are using a CSV data reader class, the following values can be set for this attribute.
  
  0
  
  Continue to run the Data Load utility, regardless of how many errors occur.
  
  N
  
  Where N is a positive integer value. The max error count specifies the error tolerance level during the data load process for a load item. The default value is 1.
6. Specify the load item configuration elements for each load item that is required. Specify the load item information within the <_config:LoadItem> element. Within this element, you must include the name attribute to identify the name of the load item. You must also include the businessObjectConfigurationFile attribute to identify the business object configuration file for the load item.
  For example,
```
<_config:LoadItem name="CatalogGroup" businessObjectConfigFile=
"wc-loader-catalog-group.xml" >
```
7. Within the <_config:LoadItem> element, specify the data source location for the input file that contains the load item information. Specify this information with the location attribute in the <_config:DataSourceLocation> element.
  For example,
```
<_config:DataSourceLocation location="CatalogGroups.xml"/>
```
8. Optional: Include any other configuration properties that you require.
  Include these properties within a <_config:property> element within either the <_config:LoadOrder> or <_config:Loaditem> elements. If this property tag is specified within a <_config:LoadOrder> element, the property applies to all load items. If the property tag is specified within a <_config:Loaditem> element, then the property applies to only that specific load item.
  
  For example, if you want to build the search index, include the buildSearchIndex configuration property in the load order configuration file.
  
  For more information about the available configuration properties, see Data Load utility configuration properties.
9. Save and close the file.
  The following sample shows a configuration file for catalog data.
```
<?xml version="1.0" encoding="UTF-8" ?> 
<_config:DataLoadConfiguration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config 
../../../../xml/config/xsd/wc-dataload.xsd" 
xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">
  <_config:DataLoadEnvironment configFile="wc-dataload-env.xml" />
  <_config:LoadOrder commitCount="100" batchSize="1" dataLoadMode="Replace">
    <_config:property name="firstTwoLinesAreHeader" value="true" />
    <_config:LoadItem name="CatalogGroup" businessObjectConfigFile="wc-loader-catalog-group.xml">
      <_config:property name="validateCatalogGroup" value="true" />
      <_config:DataSourceLocation location="CatalogGroups.csv" />
    </_config:LoadItem>
    <_config:LoadItem name="CatalogEntry" businessObjectConfigFile="wc-loader-catalog-entry.xml">
      <_config:DataSourceLocation location="CatalogEntries.csv" />
    </_config:LoadItem>
  </_config:LoadOrder>
</_config:DataLoadConfiguration>
```
Optional: If you are configuring the data load order for routine data load processes, you can configure a file difference preprocessor to run.

You can run a file difference preprocess for routine data loads to improve the Data Load utility performance for loading these files. By using this preprocessor that you can compare two input files, such as a previously loaded file and a new version of this file. The preprocessor generates a difference file that contains only the records in the new file that are not within the old file or that are changed from the records in the old file. The Data Load utility can then load this difference file. If your routinely loaded files contain many previous loaded records, then running this file difference can result in shorter load times. Running a file difference can reduce the loading time that is required to load your routine updates to your WebSphere Commerce database, reduce server usage time, and improve server performance.

You can configure the Data Load utility file difference preprocessor to compare files by the values in each column, instead of entire records, to identify the changed records. You can also configure the file difference preprocessor to ignore specific columns when the process is comparing files.

For more information about this preprocessor, see Data Load file difference preprocessing.
Save and close the file.