Introduced in Feature Pack 3

Overview of the Data Extract utility

The Data Extract utility is a command-line utility that you can use to extract data from the WebSphere Commerce database into an output file.

The Data Extract utility can be run in the staging and production environments. However, run the utility in an environment that has all of the information that you need to extract for an object. For example, the staging environment might not have inventory or pricing information for a catalog entry. In this case, run the utility on the production environment.

The Data Extract utility uses the existing Data Load utility framework and follows a similar interaction process:
  1. The configured data reader for the utility reads the data that is to be extracted from the database and returns the data to the business object builder.
  2. The business object builder populates a business object that is based on the data that is passed from the data reader. The business object builder passes the object to the business object mediator.
  3. The business object mediator transforms the business object into a list of map objects that is then passed to the data writer.
  4. The data writer then generates the configured output file and writes the list of CSV or XML objects into the output file.
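
The four steps above can be sketched in a few lines of Python. This is only an illustration of the reader, builder, mediator, and writer hand-off; the `CatalogEntry` class, the function names, and the sample rows are hypothetical and are not part of the WebSphere Commerce API.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Hypothetical business object; not a WebSphere Commerce class."""
    part_number: str
    name: str
    price: float


def read_rows():
    """Step 1 (data reader): yield raw rows. A real reader would query the database."""
    yield {"PARTNUMBER": "SKU-1", "NAME": "Mug", "PRICE": "4.50"}
    yield {"PARTNUMBER": "SKU-2", "NAME": "Lamp", "PRICE": "19.00"}


def build_object(row):
    """Step 2 (business object builder): populate a business object from a row."""
    return CatalogEntry(row["PARTNUMBER"], row["NAME"], float(row["PRICE"]))


def mediate(entry):
    """Step 3 (business object mediator): turn the object into a list of maps."""
    return [{
        "partNumber": entry.part_number,
        "name": entry.name,
        "price": f"{entry.price:.2f}",
    }]


def write_csv(maps, out):
    """Step 4 (data writer): write the maps as CSV records."""
    writer = csv.DictWriter(
        out, fieldnames=["partNumber", "name", "price"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(maps)


def extract(out):
    """Run the four steps in order for every row that the reader returns."""
    maps = []
    for row in read_rows():
        maps.extend(mediate(build_object(row)))
    write_csv(maps, out)


buffer = io.StringIO()
extract(buffer)
output = buffer.getvalue()
```

In the actual utility, each role is filled by a class that is named in the business object configuration file, so a stage can be swapped without changing the others.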

By default, the Data Extract utility supports extracting data only for use with IBM Product Recommendations. When you use this utility to generate Enterprise Product Report (EPR) data for use with IBM Product Recommendations, the extraction process uses a business logic-based extraction. The utility extracts catalog data from your database and generates ECDF and EPCMF files in the correct format to load into IBM Product Recommendations. To generate this data, the configured data reader class uses the catalog service to retrieve data in the catalog business object (noun) format. The business object builder class does not populate any data in this process. Instead, the builder class passes the noun objects from the data reader class to the business object mediator class. The mediator class is then used to extract the data from the business object to build a map object. The data writer then converts the map object into a CSV format and writes the data into the CSV output file.

This approach uses business logic to fetch the data, similar to the behavior of the existing web services. The business logic approach is useful when data cannot be retrieved directly from the database, for example, when complicated business logic is needed to compute the data, such as for extracting pricing data that uses price rules. To extract this pricing data, logic is needed to apply the price rules before the catalog entry prices can be determined, extracted, and written to an output file. Because the extraction reuses the existing business logic, you do not need to reimplement the logic that is used to load or create the data.

This approach, however, has a few disadvantages:
  • The approach can cause the performance of the extraction process to be slow. The logic-based service for retrieving data is intended to retrieve a single business object or a list of business objects. If any of the business objects are large, however, the performance can be slow.
  • Customizing the utility to retrieve custom data, or data that is not supported for extracting by default, requires significant effort. To extract such data, you must implement your own custom services.
Feature Pack 8
The Data Extract utility is updated to use an SQL-based extract process that can improve the performance and flexibility of the utility. The SQL-based process also can reduce the implementation cost for customizing the utility to extract data that is not supported for extracting with the utility by default. The utility is also enhanced to support extracting the following types of data with the SQL-based extraction process:
  • Promotions
  • Feature Pack 8: Commerce Composer objects, such as widgets, layouts, layout templates, and pages
    Note: You must apply the interim fixes for APAR JR53438.fep and APAR JR53438.fp.
  • Feature Pack 8: Marketing objects, such as activities, e-Marketing Spots, content, campaigns, attachments, and customer segments
    Note: You must apply the interim fixes for APAR JR53438.fep and APAR JR53438.fp.
To configure the utility to use an SQL-based extract process instead of the business logic process, use the following classes:
UniqueIdReader
This data reader class adds support for the utility to use SQL statements to retrieve the unique ID value for a business object. The data reader class can then send a map object for the business object to the business object builder.
AssociatedObjectMediator
This business object mediator adds support for the utility to use SQL statements to retrieve the detailed business object information for the map object. The mediator can then send an updated map object that contains the detailed business object information to the configured data writer class.
CSVWriter
A data writer class that can convert the map objects that are sent by the business object mediator into a CSV formatted record. This writer class can then write the record into the configured output CSV file. Use either this data writer class or the XmlWriter data writer class.
XmlWriter
A data writer class that can convert the map objects that are sent by the business object mediator into an XML formatted element. This writer class can then write the element and any subelements into the configured output XML file. Use either this data writer class or the CSVWriter data writer class.
ValueHandler
This interface provides a customization point that you can use when the utility cannot retrieve data directly from the database. You can also use this interface when you need to modify data before the data writer class writes the data into the output file.
For more information about configuring the Data Extract utility to use these classes and the SQL-based extraction process, see Configuring and running the Data Extract utility. When you configure the utility, copy and edit the provided sample configuration files to help you quickly configure and run the utility.
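
The division of work among these classes can be illustrated with a minimal Python sketch that uses an in-memory SQLite database. The table names, columns, sample data, and function names are all hypothetical; they stand in for the roles of the unique ID reader, the associated object mediator, a value handler, and the CSV writer, and do not reflect the real WebSphere Commerce schema or class signatures.

```python
import csv
import io
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE promo (promo_id INTEGER PRIMARY KEY, name TEXT, status INTEGER);
    CREATE TABLE promo_desc (promo_id INTEGER, language_id INTEGER, short_desc TEXT);
    INSERT INTO promo VALUES (10, 'SPRING10', 1), (11, 'FREESHIP', 0);
    INSERT INTO promo_desc VALUES (10, -1, '10% off'), (11, -1, 'Free shipping');
""")


def read_unique_ids(connection):
    """Reader role: one SQL statement returns the unique ID of each object."""
    query = "SELECT promo_id FROM promo ORDER BY promo_id"
    for (promo_id,) in connection.execute(query).fetchall():
        yield {"promo_id": promo_id}


def add_details(connection, record, language_id=-1):
    """Mediator role: further SQL statements fill in the details for one ID."""
    name, status = connection.execute(
        "SELECT name, status FROM promo WHERE promo_id = ?",
        (record["promo_id"],),
    ).fetchone()
    (description,) = connection.execute(
        "SELECT short_desc FROM promo_desc WHERE promo_id = ? AND language_id = ?",
        (record["promo_id"], language_id),
    ).fetchone()
    return {**record, "name": name, "status": status, "description": description}


def status_label(value):
    """Value-handler role: adjust a value before the writer receives it."""
    return "Active" if value == 1 else "Inactive"


out = io.StringIO()
writer = csv.DictWriter(
    out,
    fieldnames=["promo_id", "name", "status", "description"],
    lineterminator="\n",
)
writer.writeheader()
for record in read_unique_ids(conn):
    detailed = add_details(conn, record)
    detailed["status"] = status_label(detailed["status"])
    writer.writerow(detailed)  # Writer role: one CSV record per object.
csv_text = out.getvalue()
```

Because the reader returns only IDs and the mediator fetches the details, supporting a new object type in this sketch means changing SQL statements rather than writing a new service, which is the flexibility that the SQL-based process is meant to provide.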

Configuration files for the Data Extract utility

The Data Extract utility uses three types of configuration files. Samples are provided, but you must update the samples with configuration information specific to your site and environment. These configuration files are based on the Data Load utility configuration files, but they include some extensions.
wc-dataextract.xml
This file is the load order configuration file that you must point to when you run the utility. This file specifies the paths to the environment configuration file and to the business object configuration file.
wc-dataextract-env.xml
The environment configuration file, which includes the environment variables for your WebSphere Commerce instance. These variables include the following information:
  • Business context variables, including the store identifier, catalog identifier, and the default language and currency for your store.
  • Database environment settings, including the database type, name, and schema.
wc-dataextract-business_object.xml
The business object configuration file, which configures how the utility identifies the data to extract for a specific business object. By default, sample business object configuration files are provided for extracting data for the following types of objects:
  • Catalog entries

    The sample configuration files for extracting catalog entry data are configured to generate an EPCMF file for use with IBM Product Recommendations. When you use this sample configuration file, the utility uses the business logic-based data extract process. For more information about configuring the utility to extract this data, see Data extraction utility for dynamic recommendations in IBM Product Recommendations.

  • Categories

    The sample configuration files for extracting category data are configured to generate an ECDF file for use with IBM Product Recommendations. When you use this sample configuration file, the utility uses the business logic-based data extract process. For more information about configuring the utility to extract this data, see Data extraction utility for dynamic recommendations in IBM Product Recommendations.

  • Feature Pack 8: Promotions

    The sample configuration files for extracting promotion data are configured to generate an XML file that can be used with the Data Load utility. When you use this sample configuration file, the utility uses the SQL-based data extract process.

  • Feature Pack 8: Commerce Composer objects
    Sample configuration files are provided for extracting Commerce Composer widgets, layouts, templates, and pages. The files are configured to generate CSV files that can be used with the Data Load utility. When you use these sample configuration files, the utility uses the SQL-based data extract process.
    Note: You must apply the interim fixes for APAR JR53438.fep and APAR JR53438.fp to add the sample configuration files for these objects to your WebSphere Commerce instance.
  • Feature Pack 8: Marketing objects
    Sample configuration files are provided for extracting marketing activities, campaigns, content, attachments, customer segments, and e-Marketing Spots. The files are configured to generate CSV files that can be used with the Data Load utility. When you use these sample configuration files, the utility uses the SQL-based data extract process.
    Note: You must apply the interim fixes for APAR JR53438.fep and APAR JR53438.fp to add the sample configuration files for these objects to your WebSphere Commerce instance.
These files contain the following information:
  • Business context information.
  • Data mappings that are required to transform WebSphere Commerce business objects to the data that can be written in the output file.
  • Definitions for the order in which the utility writes the data to the columns in the file.
  • Pointers to interfaces and implementation classes that the utility uses to extract and transform the data.
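
As a rough illustration of what the data mappings and column order definitions accomplish, the following Python sketch maps keys of a business object map to output columns and writes them in a fixed order. The column names, keys, and sample categories are hypothetical; the real mappings are declared in the XML business object configuration file, not in code.

```python
import csv
import io

# Hypothetical mapping: (output column, key in the business object map).
# The order of this list is the order in which the columns are written.
COLUMN_MAPPING = [
    ("GroupIdentifier", "identifier"),
    ("Name", "name"),
    ("ParentGroupIdentifier", "parent"),
]


def to_record(business_object):
    """Pick the mapped values in column order; unmapped keys are ignored."""
    return [business_object.get(key, "") for _, key in COLUMN_MAPPING]


# Sample data, for illustration only.
categories = [
    {"identifier": "Kitchen", "name": "Kitchen", "parent": ""},
    {"identifier": "Mugs", "name": "Mugs, cups", "parent": "Kitchen", "internal_id": 10002},
]

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow([column for column, _ in COLUMN_MAPPING])  # header row
for category in categories:
    writer.writerow(to_record(category))
text = out.getvalue()
```

Keeping the mapping as data, separate from the code that writes the file, mirrors the reason that the utility keeps it in a configuration file: the output layout can change without changing the extraction classes.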

Best Practices

When you use the Data Extract utility, there are some general configuration recommendations that you can use to ensure that you take advantage of the full capability of the utility. For more information, see Data Extract utility best practices.