Index Load
Index Load is an indexing service that uses the Data Load framework to load data in parallel into one or more search extension indexes.
It is used to populate contract prices when your site requires a separate Price extension index for performance reasons. For example, use Index Load with a Price extension index if your site contains more than 1000 contracts or if you use an external source to populate prices.
Index Load provides the following benefits:
- It improves indexing performance by leveraging local binding (embedded mode) on the search server to avoid making remote HTTP calls using HTTPClient.
- The data feed is streamed directly into one or more index columns and no temporary tables are needed. This programming model allows precise data conversion and is easier to customize.
- A multithreaded parallel indexing technique makes sharding possible within and across multiple extension indexes.
- Metrics can be displayed using the Index Load status command while indexing to help refine tuning parameters and improve performance throughput.
Index Load uses profiles to control the indexing behavior and characteristics for a search extension index. Index Load profiles are defined in the Index Load configuration file.
A profile name can be passed in through a URL parameter, named profile, when calling Index Load. The value of the profile parameter is then substituted into a file name pattern to resolve the actual name of the file to load from the predefined configuration directory. Both the pattern name and the Index Load configuration directory are defined as servlet initialization parameters in the web.xml of the Index Load servlet.
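The profile lookup described above can be sketched as follows. This is a minimal illustration, not the servlet's actual implementation: the class name, the `{profile}` placeholder syntax, the sample pattern, and the sample directory are all assumptions made for the example. The real pattern and directory come from the servlet initialization parameters in web.xml.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of the Index Load API.
class ProfileResolver {

    // Substitutes the profile value into the file name pattern and resolves
    // the result against the configuration directory.
    static Path resolve(String configDir, String pattern, String profile) {
        // Reject values that could escape the configuration directory.
        if (profile == null || !profile.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("Invalid profile name: " + profile);
        }
        // "{profile}" is an assumed placeholder syntax for this sketch.
        String fileName = pattern.replace("{profile}", profile);
        return Paths.get(configDir).resolve(fileName);
    }

    public static void main(String[] args) {
        // Assumed sample values standing in for the web.xml init parameters.
        Path config = resolve("/opt/search/indexload", "wc-indexload-{profile}.xml", "price");
        System.out.println(config);
    }
}
```

Validating the profile value before substitution is a design choice of this sketch. Because the value arrives through a URL parameter, restricting it to a simple character set keeps it from pointing outside the configuration directory.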
For more detailed information about how data flows through the multithreaded indexing application and which tuning parameters can be used, see Tuning Index Load.
Index Load contains the following components:
- Index Load Servlet (SolrIndexLoadServlet)
- The Index Load interface. It accepts commands with input information such as profile, catalog, and store. The input information is used to look up the specified configuration files.
- Loader Interface
- Creates loader units to run based on the configured load item (loaditem). Only one loader exists, which can use several load items. Each load item includes a reader and zero or more mediators.
- Loader Item
- The runnable unit for Index Load. Multiple loader items can run in parallel, where each loader item is an independent load unit that is controlled by a single data loader.
A loader contains a data reader, which can read data in multiple threads, optional mediators, and a single data writer. The mediators form a chain, where the output of one mediator is the input of the next mediator. The targets of multiple loader items can be the same or different core instances.
- Reader
- Reads original physical data from data sources in parallel and passes it to the mediator. The SolrIndexLoadQueryReader is used by default to read data from relational databases as specified by the configuration files.
- Mediator
- The BusinessObjectMediator defines a common interface that takes the input from the reader and transforms it according to the conversion pattern specified in the configuration files. There can be zero or more mediators, where the output of one mediator is the input for the following mediator. When all the mediators finish transforming, the physical data writer persists the physical objects into Solr by calling the SolrJ interface.
- Batch Service
- Adds Solr documents and commits them to the Solr server. There is only one batch service for each unique Solr core, and it can interact with multiple index writers. The batch service contains an internal queue for buffering unfinished documents from various writers. When an input document is ready for indexing, it is dispatched to the Solr runtime.
The batch service is used by default to populate the Price extension index when indexing contract prices using Index Load.
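The way these components fit together can be sketched in plain Java. This is an illustration under assumptions, not the Index Load implementation: the class names, the row columns (catentry, contract, price), and the batch size are all hypothetical. A list of rows stands in for the reader, a simple list stands in for the Solr runtime, and the reader's multithreading is omitted to keep the flow visible. Each row passes through a chain of mediators and is then buffered by a batch service that dispatches documents in batches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.UnaryOperator;

// Illustrative only: none of these types are part of the Index Load API.
class IndexLoadPipelineSketch {

    // Builds one raw row as a reader might return it (hypothetical columns).
    static Map<String, Object> row(String catentry, String contract, String price) {
        Map<String, Object> r = new HashMap<>();
        r.put("catentry", catentry);
        r.put("contract", contract);
        r.put("price", price);
        return r;
    }

    // Mediator chain: the output of one mediator is the input of the next.
    static Map<String, Object> mediate(Map<String, Object> record,
            List<UnaryOperator<Map<String, Object>>> mediators) {
        Map<String, Object> current = record;
        for (UnaryOperator<Map<String, Object>> mediator : mediators) {
            current = mediator.apply(current);
        }
        return current;
    }

    // Stand-in batch service: one instance per target core, with an internal
    // queue that buffers documents until a full batch can be dispatched.
    static class BatchService {
        private final BlockingQueue<Map<String, Object>> queue = new LinkedBlockingQueue<>();
        private final int batchSize;
        // Records dispatched batches in place of a real Solr runtime.
        final List<List<Map<String, Object>>> flushed = new ArrayList<>();

        BatchService(int batchSize) {
            this.batchSize = batchSize;
        }

        // Synchronized so that several writers could share one service.
        synchronized void add(Map<String, Object> doc) {
            queue.add(doc);
            if (queue.size() >= batchSize) {
                flush();
            }
        }

        synchronized void flush() {
            List<Map<String, Object>> batch = new ArrayList<>();
            queue.drainTo(batch);
            if (!batch.isEmpty()) {
                flushed.add(batch);
            }
        }
    }

    // Runs rows through two sample mediators, then into the batch service.
    static BatchService run(List<Map<String, Object>> rows, int batchSize) {
        List<UnaryOperator<Map<String, Object>>> mediators = new ArrayList<>();
        // Mediator 1: convert the price column from text to a number.
        mediators.add(in -> {
            Map<String, Object> out = new HashMap<>(in);
            out.put("price", Double.parseDouble((String) in.get("price")));
            return out;
        });
        // Mediator 2: derive a document id from two columns.
        mediators.add(in -> {
            Map<String, Object> out = new HashMap<>(in);
            out.put("id", in.get("catentry") + "_" + in.get("contract"));
            return out;
        });

        BatchService service = new BatchService(batchSize);
        for (Map<String, Object> r : rows) {
            service.add(mediate(r, mediators));
        }
        service.flush(); // dispatch whatever remains in the queue
        return service;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            rows.add(row("1000" + i, "4000", "9.99"));
        }
        BatchService service = run(rows, 2);
        System.out.println("batches dispatched: " + service.flushed.size());
    }
}
```

Each mediator in the sketch returns a new map rather than modifying its input, so the mediators stay independent of one another and can be reordered or removed without side effects.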
Limitations
- Index Load supports only extension indexes. Index Load does not support the Product, Category, or Unstructured indexes.