Merging indexes by shard using a single JVM
When additional storage devices are available and indexing takes too long to finish, consider distributing the indexing workload across multiple devices. You can do this by adding index shards to your existing Search indexing server.
Before you begin
To distribute an index, you divide it into price shards, which are index cores within the same Solr instance. They are used only for indexing and merging. After all of the index shards are successfully created, they can be merged into one optimized index, to be used with your storefront for sorting, faceting, and filtering.
In order to perform index sharding with the Search server, first set up your shard environment using the following guidelines.
- Determine the number of shards that you will be using, based on the available physical storage devices on your server. It is recommended that you assign separate storage devices for read and write operations, so that read performance is not affected by write operations.
- Ensure that you have run the SetupSearchIndex command to create the Price index, using the indexSubType option. For detailed instructions on setting up the Price index, see step 2 of Indexing contract prices using Index Load.
- In your authoring environment, create additional shards on your existing Master Search indexing server using the following command.
http://hostname:3737/solr/admin/cores?action=create&name=MC_catalogId_CatalogEntry_PriceN_generic&instanceDir=MC_catalogId/generic/CatalogEntry/Price&dataDir=shardN
where
- hostname: The host name of the Master indexing server.
- catalogId: The master catalog ID of the index.
- N: The shard number.
- Allocate enough heap memory for each of your index shards. Refer to WebSphere Commerce Search performance tuning for recommendations on how to configure the solrconfig.xml configuration file.
- Edit the solrconfig.xml configuration file for the Price core. Change the lockType parameter to "single". To speed up the merge operation, also consider increasing the ramBufferSizeMB value in the same configuration file. The default value is 64 (megabytes), and the maximum value is 2048. After you make these changes, restart the Master Search server to activate them.
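When several shards are needed, the core-creation URL above has to be issued once per shard number. The following is a minimal sketch, not part of the product, that only builds those URLs from the documented pattern. The function name `create_core_url`, the example host `search.example.com`, and the catalog ID `10001` are illustrative assumptions; port 3737 is taken from the command shown above.

```python
from urllib.parse import urlencode


def create_core_url(hostname: str, catalog_id: str, shard: int, port: int = 3737) -> str:
    """Build the core-creation URL for one Price shard.

    Hypothetical helper: it follows the URL pattern in this topic and
    does not contact the Search server.
    """
    params = {
        "action": "create",
        "name": f"MC_{catalog_id}_CatalogEntry_Price{shard}_generic",
        "instanceDir": f"MC_{catalog_id}/generic/CatalogEntry/Price",
        "dataDir": f"shard{shard}",
    }
    # safe="/" keeps the slashes in instanceDir readable, as in the documented URL.
    return f"http://{hostname}:{port}/solr/admin/cores?" + urlencode(params, safe="/")


if __name__ == "__main__":
    # Example values are assumptions: three shards for master catalog 10001.
    for n in range(1, 4):
        print(create_core_url("search.example.com", "10001", n))
```

Each printed URL can then be opened in a browser or passed to an HTTP client of your choice, one per shard.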
After you set up your shard environment, you can index your data into each shard using Index Load. The following diagram shows the two stages involved in sharding and rebuilding the index.
Procedure
- In the first stage, you prepare the data from the existing indexes.
  - Split your business data into equal catentry_id ranges for use with your index shards.
  - Set up Index Load configuration files for each of your shards. For detailed instructions, see Index Load configuration files for indexing from database.
  - Use Index Load to index your data into each index shard. For more information, see Index Load.
- In the second stage, you merge the indexes using Index Load. You will need merge configuration files that specify your source and target directories. To create these files, follow the instructions in Index Load configuration files for merging indexes.
  - Run Index Load Merge against all shard index data directories. Index Load Merge processes your data in two steps, an index merge step and an optimization step.
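The range split in the first stage can be sketched as follows. This is an illustrative example only, assuming you already know the lowest and highest catentry_id values in your data; the function name `split_ranges` and the sample IDs are hypothetical. Note that equal ID ranges yield equal shard sizes only if the IDs are roughly evenly distributed.

```python
def split_ranges(min_id: int, max_id: int, shards: int) -> list[tuple[int, int]]:
    """Split the inclusive range [min_id, max_id] into contiguous, near-equal ranges.

    Returns one (start, end) pair per shard, both ends inclusive.
    """
    total = max_id - min_id + 1
    if shards < 1 or total < shards:
        raise ValueError("need at least one ID per shard and at least one shard")

    base, extra = divmod(total, shards)
    ranges = []
    start = min_id
    for i in range(shards):
        # The first `extra` shards each take one additional ID.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    # Sample values are assumptions, not taken from a real catalog.
    for shard, (lo, hi) in enumerate(split_ranges(10001, 10010, 3), start=1):
        print(f"shard{shard}: catentry_id {lo} to {hi}")
```

Each resulting pair can serve as the lower and upper catentry_id bound when you prepare the Index Load configuration for the corresponding shard.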