Index Load configuration files for indexing from database
Index Load configuration file | Schema definition file |
---|---|
Environment configuration file (wc-indexload-env.xml) | wc-dataload-env.xsd |
Profile configuration file (wc-indexload-profileName.xml) | wc-indexload.xsd |
Profile item configuration file (wc-indexload-businessobject.xml) | wc-indexload-item.xsd |
Environment configuration file (wc-indexload-env.xml)
The wc-indexload-env.xml file contains environment control information and global properties that are required by Index Load, including a common data writer and data source to be used to persist the data.
The wc-indexload-env.xml file does not typically require customization. You can use the default sample file as-is.
Profile configuration file (wc-indexload-profileName.xml)
The wc-indexload-profileName.xml file contains configurable performance attributes and load item configurations.
Profile names that you define in configuration files are then passed as a URL parameter when you call Index Load from a web browser.
The load item configurations are listed under the load order section of this file. They are processed in the same order as they are specified.
The file can contain one or more LoadItem definitions, with each LoadItem specifying its own item configuration and coreName target. Multiple LoadItems are run in parallel, without sequence.
- batchSize
- The threshold when documents are soft committed in memory.
- commitCount
- The threshold when documents are hard committed to disk from memory.
- ThreadLaunchTimeDelay
- The amount of time in milliseconds to wait before starting another new thread to avoid overloading the system at startup.
- OptimizeAfterIndexing
- Indicates whether Index Load performs index optimization after commit. Note: Performing optimization after a full indexing improves runtime performance; however, it increases the overall indexing time.
- StatusRefreshInterval
- The maximum amount of time in seconds to wait before refreshing the current Index Load status and displaying it in the administrative log.
- DocumentSizeSamplingInterval
- The time interval in seconds to calculate the size of the indexed document. Use -1 to disable the service. The default value is 300.
- IndexHeightCacheHint
- A number that the system uses as a hint to size the applicable caches for index height during indexing.
- IndexWidthCacheHint
- A number that the system uses as a hint to size the applicable caches for index width during indexing.
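The relationship between batchSize and commitCount can be pictured with a short sketch. It is illustrative only and is not Index Load code: the function name `plan_commits` is hypothetical, and it assumes that both thresholds are counted in documents and that a hard commit takes the place of a soft commit when both thresholds are reached at the same document.

```python
def plan_commits(total_docs, batch_size, commit_count):
    """Return (document_number, commit_kind) events for one indexing run.

    Assumption: batch_size and commit_count are both document counts, and a
    hard commit replaces the soft commit when both fall on the same document.
    """
    events = []
    for n in range(1, total_docs + 1):
        if n % commit_count == 0:
            events.append((n, "hard"))   # flushed from memory to disk
        elif n % batch_size == 0:
            events.append((n, "soft"))   # committed in memory only
    return events


# Ten documents, soft commit every 2 documents, hard commit every 6 documents.
for doc_number, kind in plan_commits(total_docs=10, batch_size=2, commit_count=6):
    print(doc_number, kind)
```

A smaller batchSize relative to commitCount means more frequent in-memory commits between disk writes; the values that suit a given environment depend on available memory and index size.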
Profile item configuration file (wc-indexload-businessobject.xml)
- ParallelThreads
- Specifies the maximum number of loader threads that the search work manager can dispatch. The loader threads read data in parallel and share the data writer.
- ParallelLowerRangeSQL
- SQL queries that get the first keys, that is, the lower bound of the key range to load.
- ParallelUpperRangeSQL
- SQL queries that get the end keys, that is, the upper bound of the key range to load.
- ParallelNextRangeSQL
- An SQL statement that determines the next available identifier when an empty range ID is detected from the parallel range. Typically, the nextStartKey value is the firstKey, and the nextEndKey is the firstKey+prefetchSize-1.
- ParallelLowerRange
- A hardcoded value that tracks the lower range keys. If defined, it is an absolute number for the lower range and overrides the value of ParallelLowerRangeSQL.
- ParallelUpperRange
- A hardcoded value that tracks the upper range keys. If defined, it is an absolute number for the upper range and overrides the value of ParallelUpperRangeSQL.
- ParallelPrefetchSize
- Determines how much data the reader fetches in one run when it queries the database. If defined, the run time breaks up the entire data range into fragments to avoid overloading the database sort heap with an overly large query result set.
- ParallelDeltaUpdate
- Determines whether the SQL result set is merged into an existing indexed document that contains a matching primary key. This delta update operation is equivalent to the Atomic Update feature provided by Solr.
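The way a prefetch size breaks a key range into fragments can be sketched as follows. This is a minimal illustration, not the Index Load implementation: `split_ranges` is a hypothetical helper, and the key values are made up. It uses the same arithmetic that is described for ParallelNextRangeSQL, where each fragment ends at its start key plus the prefetch size minus one.

```python
def split_ranges(lower, upper, prefetch_size):
    """Split the inclusive key range [lower, upper] into fragments that
    each cover at most prefetch_size keys."""
    if prefetch_size <= 0:
        raise ValueError("prefetch_size must be a positive integer")
    ranges = []
    start = lower
    while start <= upper:
        # End key follows startKey + prefetchSize - 1, capped at the upper bound.
        end = min(start + prefetch_size - 1, upper)
        ranges.append((start, end))
        start = end + 1
    return ranges


# Hypothetical keys 1000..1024 read 10 keys at a time.
print(split_ranges(1000, 1024, 10))
# [(1000, 1009), (1010, 1019), (1020, 1024)]
```

In this picture, each fragment corresponds to one bounded query, and the fragments can be handed to the parallel loader threads independently.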
- com.ibm.commerce.foundation.server.services.indexload.reader.solr.SolrIndexLoadQueryReader
- A simple SQL loader that reads the original physical data from the data source in parallel as specified by the configuration files.
- com.ibm.commerce.foundation.server.services.indexload.reader.solr.SolrIndexLoadQueryMultiplexReader
- Requires the index entity to define the KeyFieldName property and to have only one primary key field. The database column that maps to this primary key index field is used as the identifier for the index entry. It is used in the following way:
- The KeyFieldName property is the index field name for the primary key.
- The query tag is the database SQL query to be used, and must be ordered by the primary key field.
- Multiple ColumnMapping tags can be used, with each one mapping to a database table column (name) with an index field name (value).
- The DynamicFields section allows a list of dynamic fields to be defined. Multiplexing is applied to each dynamic field: the field name is the value resolved from dynamicFieldName, and the field value is the value resolved from dynamicFieldValue. In addition, dynamicFieldName and dynamicFieldValue can be used as templates in which other field variable names are declared. An optional parameter, indexingMode, which defaults to replace, defines how multiple values in the dynamic column are handled. The other supported operations are append, which handles multi-value index fields, and sum, which adds up all the values.
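The multiplexing behavior described above can be approximated with a small sketch. It is a conceptual model only, not the SolrIndexLoadQueryMultiplexReader implementation: the function `multiplex`, the column names (`CATENTRY_ID`, `NAME`, `VALUE`), and the sample rows are all hypothetical, and template resolution for dynamicFieldName and dynamicFieldValue is left out. It assumes that rows arrive ordered by the primary key, as the query requirement states.

```python
from itertools import groupby
from operator import itemgetter


def multiplex(rows, key_column, key_field, name_column, value_column,
              indexing_mode="replace"):
    """Fold rows that share a primary key into one document per key.

    rows must already be ordered by key_column. Each row contributes one
    dynamic field: its name comes from name_column, its value from value_column.
    """
    docs = []
    for key, group in groupby(rows, key=itemgetter(key_column)):
        doc = {key_field: key}
        for row in group:
            field, value = row[name_column], row[value_column]
            if indexing_mode == "replace":
                doc[field] = value                      # last value wins
            elif indexing_mode == "append":
                doc.setdefault(field, []).append(value)  # multi-value field
            elif indexing_mode == "sum":
                doc[field] = doc.get(field, 0) + value   # running total
            else:
                raise ValueError(f"unsupported indexingMode: {indexing_mode}")
        docs.append(doc)
    return docs


# Hypothetical result set, ordered by the primary key column.
rows = [
    {"CATENTRY_ID": 10001, "NAME": "price_USD", "VALUE": 5},
    {"CATENTRY_ID": 10001, "NAME": "price_USD", "VALUE": 7},
    {"CATENTRY_ID": 10002, "NAME": "price_EUR", "VALUE": 3},
]
for mode in ("replace", "append", "sum"):
    print(mode, multiplex(rows, "CATENTRY_ID", "catentry_id", "NAME", "VALUE", mode))
```

The ordering requirement matters in this model because rows are grouped only while consecutive rows share the same key; an unordered result set would produce more than one document for the same primary key.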