
Preprocessing and building the search index
Building search indexes in WebSphere Commerce includes two steps: preprocessing the data and building the search index.
The preprocess utility extracts and flattens WebSphere Commerce data and then outputs the data into a set of temporary tables inside the WebSphere Commerce database. The index building utility then uses the data in the temporary tables to populate the search indexes by using the Data Import Handler (DIH). When there are multiple indexes (for example, when each language uses its own separate index), the index is built multiple times.
Preprocessing data
Preprocessing data involves querying WebSphere Commerce tables and creating a set of temporary tables to hold the data. By default, preprocessing is used for WebSphere Commerce attributes. The default data preprocessors process the data according to the configuration information that is defined in wc-dataimport-preprocess.xml.
The temporary table that is defined in the wc-dataimport-preprocess-fullbuild.xml or wc-dataimport-preprocess-deltaupdate.xml file is loaded first, since the process might be time consuming. Loading this table first helps keep the data consistent between the temporary tables. The two files define the same temporary table; however, the SQL statement that gets the data differs between full index builds and delta index builds. All SQL statements that load other temporary tables, in the other data import preprocessing configuration files, join with the temporary table that is defined in the wc-dataimport-preprocess-fullbuild.xml or wc-dataimport-preprocess-deltaupdate.xml file.
For example, when the utility is run against a master catalog, all the qualified catalog entry IDs for that master catalog are stored in this table. A benefit of this approach is that, whether the build is a full index build or a delta index build, all the other data import preprocessing configuration files remain the same.
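The base-table approach described above can be illustrated with a minimal sketch that uses an in-memory SQLite database. Every table and column name here (catentry, delta_log, ti_base, ti_name) is a hypothetical stand-in and not the actual WebSphere Commerce schema or temporary-table naming; the point is only that the statement loading the base table changes between build types, while the statement that joins with it does not.

```python
import sqlite3

# All table and column names are hypothetical stand-ins, not the
# actual WebSphere Commerce schema or temporary-table names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE catentry (catentry_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE delta_log (catentry_id INTEGER);
    INSERT INTO catentry VALUES (1, 'Chair'), (2, 'Table'), (3, 'Lamp');
    INSERT INTO delta_log VALUES (2);
""")

# Only the statement that loads the base table differs by build type.
BASE_SQL = {
    "full": "SELECT catentry_id FROM catentry",
    "delta": "SELECT catentry_id FROM delta_log",
}

# Any other preprocessing statement joins with the base table, so it is
# identical for full builds and delta builds.
NAME_SQL = """
    CREATE TABLE ti_name AS
    SELECT c.catentry_id, c.name
    FROM catentry c
    JOIN ti_base b ON b.catentry_id = c.catentry_id
"""

def preprocess(build_type):
    """Load the base table first, then the dependent table that joins with it."""
    conn.execute("DROP TABLE IF EXISTS ti_base")
    conn.execute("DROP TABLE IF EXISTS ti_name")
    conn.execute("CREATE TABLE ti_base AS " + BASE_SQL[build_type])
    conn.execute(NAME_SQL)
    return conn.execute(
        "SELECT catentry_id, name FROM ti_name ORDER BY catentry_id"
    ).fetchall()

full_rows = preprocess("full")
delta_rows = preprocess("delta")
print(full_rows)
print(delta_rows)
```

In this sketch the full build flattens all three entries, while the delta build flattens only the entry listed in the hypothetical delta_log table, without any change to NAME_SQL.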
Sample configuration files
Sample configuration files are available in the following directory: WCDE_installdir\components\foundation\samples\dataimport\catalog
The following snippet specifies a batch size of 300,000:
<_config:data-processing-config
processor="com.ibm.commerce.foundation.dataimport.preprocess.CatalogHierarchyDataPreProcessor"
masterCatalogId="10101" batchSize="300000">
...
</_config:data-processing-config>
This provides the ability to cache some of the information that can be reused to determine all of the ancestor catalog groups for each catalog entry, and results in fewer hits to the database to determine this information.
Index-building and the Data Import Handler (DIH)
http://host:port/solr/MasterCatalog_CatalogEntry_en_US/dataimport?command=full-import
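The URL above is a DIH full-import request for a single index core. Because each language can have its own index, a build has to issue one such request per core. The following minimal sketch only assembles the request URLs and does not send them; the host, port, and the fr_FR locale are assumptions made for illustration, and only the en_US core name comes from the example above.

```python
from urllib.parse import urlencode

def dih_url(host, port, core, command="full-import"):
    """Build a Data Import Handler request URL for one Solr core."""
    query = urlencode({"command": command})
    return "http://{}:{}/solr/{}/dataimport?{}".format(host, port, core, query)

# Hypothetical host, port, and locale list; adjust to the actual search server.
locales = ["en_US", "fr_FR"]
urls = [
    dih_url("localhost", 3737, "MasterCatalog_CatalogEntry_" + locale)
    for locale in locales
]

for url in urls:
    print(url)
```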
The index building utility uses DIH to connect to the WebSphere Commerce database through a JDBC connection. It crawls the temporary tables that are populated by the preprocess utility, and then populates the Solr index. The wc-data-config.xml configuration file defines the JDBC configuration and the crawling SQL statements.
Preprocessing and index-building
