Configuring the search preprocessor
In this lesson, you modify the preprocessor configuration file to support custom data.
You add support for your custom data by adding a custom configuration file that references the
required processing Java classes and the temporary customer ranking table information. Preprocessing
tasks are controlled by the wc-dataimport-preprocess XML files. The files
contain table definitions, database schema metadata, and references to the Java classes that are
used in the preprocessing steps.
About this task
- First, the data is loaded into a temporary table.
- Then, internal WebSphere Commerce referential constraints are resolved and the resolved data is loaded into a secondary table. The secondary table data is used for indexing purposes.
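The two stages can be pictured with a small in-memory sketch. This is an illustration only: in the tutorial the work is done in database tables, and the class name, row types, and catalog entry IDs below are hypothetical. It assumes that resolving a referential constraint means looking up the internal catalog entry ID for each external part number, which is what the SQL in the configuration file later in this lesson appears to do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: hypothetical names, not part of the tutorial download.
final class TwoStageSketch {

    /** Stage 1 row: raw rating data keyed by the external part number. */
    static final class TempRow {
        final String partNumber, rtype, rating;
        TempRow(String partNumber, String rtype, String rating) {
            this.partNumber = partNumber;
            this.rtype = rtype;
            this.rating = rating;
        }
    }

    /** Stage 2 row: the same data with the internal catalog entry ID resolved. */
    static final class ResolvedRow {
        final long catentryId;
        final String partNumber, rtype, rating;
        ResolvedRow(long catentryId, TempRow source) {
            this.catentryId = catentryId;
            this.partNumber = source.partNumber;
            this.rtype = source.rtype;
            this.rating = source.rating;
        }
    }

    /** Resolves part numbers to catalog entry IDs; unmatched rows are dropped, as in an inner join. */
    static List<ResolvedRow> resolve(List<TempRow> tempRows, Map<String, Long> catentryIdByPartNumber) {
        List<ResolvedRow> resolved = new ArrayList<>();
        for (TempRow row : tempRows) {
            Long catentryId = catentryIdByPartNumber.get(row.partNumber);
            if (catentryId != null) {
                resolved.add(new ResolvedRow(catentryId, row));
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Stage 1: data as loaded from the input file (sample values).
        List<TempRow> tempRows = new ArrayList<>();
        tempRows.add(new TempRow("AC-01", "quality", "1.7"));
        tempRows.add(new TempRow("NO-SUCH-PART", "quality", "3.0"));

        // Hypothetical catalog lookup: part number -> internal catalog entry ID.
        Map<String, Long> catalog = new HashMap<>();
        catalog.put("AC-01", 10001L);

        // Stage 2: resolved rows that would be used for indexing.
        for (ResolvedRow row : resolve(tempRows, catalog)) {
            System.out.println(row.catentryId + " " + row.partNumber + " " + row.rtype + " " + row.rating);
        }
    }
}
```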
Procedure
- Copy the sample Ratings.xml file into any directory within your development environment. This file is included in the searchindexratings.zip compressed file that you downloaded from the tutorial introduction. The following steps assume that the file is in the WebSphere Commerce bin directory. The file contains sample customer ratings, which you load into your database and use to develop product rankings with WebSphere Commerce search. To include more ratings data, edit the file.
- In your file manager utility, go to the Search_home\pre-processConfig\MC_masterCatalogID\dbtype directory, where dbtype is the database type for your environment, such as DB2.
- In this folder, create an XML file and name the file wc-dataimport-preprocess-custom.xml.
-
Add the following code into your new file.
<?xml version="1.0" encoding="UTF-8"?>
<_config:DIHPreProcessConfig
    xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config ../../xsd/wc-dataimport-preprocess.xsd">

  <!-- load ratings into temp table -->
  <_config:data-processing-config processor="com.mycompany.commerce.preprocess.StaticRatingsDataPreProcessor" batchSize="500">
    <_config:table definition="CREATE TABLE TI_RATING_TEMP ( PART_NUMBER VARCHAR(256),RTYPE VARCHAR(256), RATING VARCHAR(256))" name="TI_RATING_TEMP"/>
    <_config:query sql=""/>
    <_config:mapping>
      <_config:key queryColumn="CATENTRY_ID" tableColumn="CATENTRY_ID"/>
      <_config:column-mapping>
        <_config:column-column-mapping>
          <_config:column-column queryColumn="" tableColumn="" />
        </_config:column-column-mapping>
      </_config:column-mapping>
    </_config:mapping>
    <!-- this property locates the input file instead of hard-coding the path to WC\bin -->
    <_config:property name="inputFile" value="WCDE_installdir\bin\Ratings.xml"/>
  </_config:data-processing-config>

  <!-- resolve catalog entry IDs and load the resolved data into TI_RATING -->
  <_config:data-processing-config processor="com.mycompany.commerce.preprocess.StaticRatingsDataPopulator" batchSize="500">
    <_config:table definition="CREATE TABLE TI_RATING ( CATENTRY_ID BIGINT NOT NULL, PART_NUMBER VARCHAR(256),RTYPE VARCHAR(256), RATING VARCHAR(256))" name="TI_RATING"/>
    <_config:query sql="insert into TI_RATING ( catentry_id,part_number, rating,rtype ) select catentry_id,part_number,rating,rtype from catentry,ti_rating_temp where catentry.partnumber=ti_rating_temp.part_number and catentry.member_id=7000000000000000101"/>
    <_config:mapping>
      <_config:key queryColumn="CATENTRY_ID" tableColumn="CATENTRY_ID"/>
      <_config:column-mapping>
        <_config:column-column-mapping>
          <_config:column-column queryColumn="" tableColumn="" />
        </_config:column-column-mapping>
      </_config:column-mapping>
    </_config:mapping>
  </_config:data-processing-config>
</_config:DIHPreProcessConfig>

Notes:
- Ensure that any member ID values and the inputFile property value are correct for your store and environment. The inputFile property must point to the XML file that includes the customer ranking data.
- The first <_config:data-processing-config> element uses its processor attribute to reference the com.mycompany.commerce.preprocess.StaticRatingsDataPreProcessor Java class, which loads the data. This element defines the first temporary table, TI_RATING_TEMP, by using the <_config:table> subelement. The remaining subelements are unused and are included only to ensure that the XML is well-formed.
- The second <_config:data-processing-config> element references the com.mycompany.commerce.preprocess.StaticRatingsDataPopulator Java class, which reads the data that is produced by the first stage of the preprocessing operation and resolves the internal identifiers. This element defines the secondary temporary table, TI_RATING, which stores the resolved data. The <_config:query> subelement defines the SQL that is used to resolve and load the data.
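One way to double-check the values that the notes mention is to read the configuration file back and print what it declares. The following is a minimal sketch under stated assumptions, not a WebSphere Commerce utility: the PreprocessConfigCheck class and its methods are hypothetical, and it uses only the standard JAXP DOM parser. It lists each processor class name and reports whether the inputFile path exists on the local file system.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not shipped with WebSphere Commerce.
final class PreprocessConfigCheck {

    // Namespace URI taken from the xmlns:_config declaration in the configuration file.
    static final String CONFIG_NS = "http://www.ibm.com/xmlns/prod/commerce/foundation/config";

    /** Parses the configuration with namespace support; fails if the XML is not well-formed. */
    static Document parse(InputSource source) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(source);
        } catch (Exception e) {
            throw new IllegalStateException("Configuration could not be parsed", e);
        }
    }

    /** Returns the processor attribute of every data-processing-config element, in document order. */
    static List<String> processors(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList configs = doc.getElementsByTagNameNS(CONFIG_NS, "data-processing-config");
        for (int i = 0; i < configs.getLength(); i++) {
            result.add(((Element) configs.item(i)).getAttribute("processor"));
        }
        return result;
    }

    /** Returns the value of the first property element with the given name, or null if none exists. */
    static String property(Document doc, String name) {
        NodeList properties = doc.getElementsByTagNameNS(CONFIG_NS, "property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            if (name.equals(property.getAttribute("name"))) {
                return property.getAttribute("value");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("Usage: java PreprocessConfigCheck <path to wc-dataimport-preprocess-custom.xml>");
            return;
        }
        Document doc = parse(new InputSource(Paths.get(args[0]).toUri().toString()));
        System.out.println("Processors: " + processors(doc));
        String inputFile = property(doc, "inputFile");
        boolean exists = inputFile != null && Files.exists(Paths.get(inputFile));
        System.out.println("inputFile: " + inputFile + " (exists: " + exists + ")");
    }
}
```

A check like this confirms only that the file parses and that the path resolves. It does not verify the member ID or that the processor classes are on the class path.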
- Save and close the file.
-
Create the preprocessor Java classes in your environment for preprocessing your custom data.
The following procedure creates sample StaticRatingsDataPopulator.java and
StaticRatingsDataPreProcessor.java files. These files include sample code for
only this tutorial. If you need to index different data, you must define your own custom Java
files.
Note: The RatingXMLReader.java file within the com.mycompany.commerce.preprocess.rating package is a simple Java class that takes an XML file name and parses the file. The implementation of this class depends on the format of the XML; both the format and the parsing approach are left open. For instance, the following code snippet is a sample format for the Ratings.xml file.
<?xml version="1.0" encoding="utf-8"?>
<customInfo>
  <product partNumber="AC-01">
    <rating type="quality">
      <averageRating>1.7</averageRating>
      <reviewCount>60</reviewCount>
    </rating>
  </product>
  <product partNumber="AC-0101">
    <rating type="quality">
      <averageRating>4.6</averageRating>
      <reviewCount>85</reviewCount>
    </rating>
  </product>
</customInfo>
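Because the parsing approach is left open, the following is only a minimal sketch of how a reader for the sample format above might look. It is not the RatingXMLReader class from the tutorial download: the RatingsXmlSketch class and its Rating type are hypothetical, and the values are kept as strings on the assumption that they end up in the VARCHAR columns of the temporary table.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch; not the RatingXMLReader class that the tutorial provides.
final class RatingsXmlSketch {

    /** One rating element, flattened together with the part number of its parent product. */
    static final class Rating {
        final String partNumber;
        final String type;
        final String averageRating;
        final String reviewCount;

        Rating(String partNumber, String type, String averageRating, String reviewCount) {
            this.partNumber = partNumber;
            this.type = type;
            this.averageRating = averageRating;
            this.reviewCount = reviewCount;
        }
    }

    /** Reads every product/rating pair from a document in the sample Ratings.xml format. */
    static List<Rating> parse(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            List<Rating> ratings = new ArrayList<>();
            NodeList products = doc.getElementsByTagName("product");
            for (int i = 0; i < products.getLength(); i++) {
                Element product = (Element) products.item(i);
                NodeList ratingNodes = product.getElementsByTagName("rating");
                for (int j = 0; j < ratingNodes.getLength(); j++) {
                    Element rating = (Element) ratingNodes.item(j);
                    ratings.add(new Rating(
                            product.getAttribute("partNumber"),
                            rating.getAttribute("type"),
                            childText(rating, "averageRating"),
                            childText(rating, "reviewCount")));
                }
            }
            return ratings;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse ratings XML", e);
        }
    }

    /** Convenience overload for XML that is already held in a string. */
    static List<Rating> parse(String xml) {
        return parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Returns the trimmed text of the first descendant element with the given tag, or "" if absent. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<customInfo><product partNumber=\"AC-01\"><rating type=\"quality\">"
                + "<averageRating>1.7</averageRating><reviewCount>60</reviewCount>"
                + "</rating></product></customInfo>";
        for (Rating r : parse(xml)) {
            System.out.println(r.partNumber + " " + r.type + " " + r.averageRating + " " + r.reviewCount);
        }
    }
}
```

A DOM parser keeps the sketch short. For a large ratings file, a streaming parser such as SAX or StAX might be a better fit.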
-
Package the classes within a JAR file in your WC_eardir
directory so that the preprocessor utilities can locate the classes at run time.