Using the Solr atomic update feature with Search
Atomic update, also known as partial update, enables you to make index updates on specified stored fields in an existing document. This approach is especially useful when a core has many fields and only a small number of them have been changed between index builds.
Solr supports the following atomic update operations:
- set: Sets or replaces a particular value, or removes the value if null is specified as the new value.
- add: Adds an additional value to a list.
- remove: Removes a value (or a list of values) from a list.
- removeregex: Removes all values from a list that match the given Java regular expression.
- inc: Increments a numeric value by a specific amount. Use a negative value to decrement.
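IndexLoad issues these operations for you, but it can help to see how they look in a raw Solr request. The following sketch is a Solr XML update message, of the kind sent to a core's /update handler, that applies each operation through the update attribute of a field element. It assumes that catentry_id is the unique key of the core and that the inventory fields are numeric. The multivalued fields keywords, colors, and sizes, and all values shown, are hypothetical and chosen only for illustration.

```xml
<!-- Sketch only: catentry_id is assumed to be the uniqueKey.
     keywords, colors, and sizes are hypothetical multivalued fields. -->
<add>
  <doc>
    <!-- Identifies the existing document to update -->
    <field name="catentry_id">10044</field>
    <!-- set: replace the current value -->
    <field name="inv_strlocqty_1" update="set">400</field>
    <!-- set with null: remove the value of the field -->
    <field name="inv_strlocqty_3" update="set" null="true"/>
    <!-- add: append a value to a multivalued field -->
    <field name="keywords" update="add">clearance</field>
    <!-- remove: delete a specific value from a multivalued field -->
    <field name="colors" update="remove">red</field>
    <!-- removeregex: delete every value that matches the Java regular expression -->
    <field name="sizes" update="removeregex">X+L</field>
    <!-- inc: add to a numeric value; a negative amount decrements -->
    <field name="inv_strlocqty_2" update="inc">-5</field>
  </doc>
</add>
```

Fields that are not listed in the message keep their stored values, which is why atomic update requires the fields of the document to be stored.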
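Example
Consider an existing Solr document for catalog entry 10044 that stores the available inventory for three stores: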
catentry_id "10044"
inv_strlocqty_1 100
inv_strlocqty_2 200
inv_strlocqty_3 300
indexedTime "2018-11-28T14:51:58.042Z"
An inventory update occurs that changes the available inventory in Store 1 to 400 and the available inventory in Store 2 to 500:
catentry_id | store_id | availquantity |
---|---|---|
10044 | 1 | 400 |
10044 | 2 | 500 |
The CSV file for IndexLoad therefore contains only the changed fields:
catentry_id,inv_strlocqty_1,inv_strlocqty_2
10044,400,500
After IndexLoad runs, only the fields in the CSV file have changed; inv_strlocqty_3 keeps its previous value. The document in Solr looks like this:
catentry_id "10044"
inv_strlocqty_1 400
inv_strlocqty_2 500
inv_strlocqty_3 300
indexedTime "2018-11-28T15:51:38.033Z"
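For comparison, the change in this example corresponds to the following Solr XML atomic update. This is a sketch of the request shape under the assumption that catentry_id is the unique key of the target core, not necessarily the exact request that IndexLoad sends. Only the two changed fields are included, so Solr keeps the stored value of inv_strlocqty_3.

```xml
<!-- Sketch: atomic update equivalent to the inventory example.
     Assumes catentry_id is the uniqueKey of the target core. -->
<add>
  <doc>
    <field name="catentry_id">10044</field>
    <!-- Store 1 inventory: 100 becomes 400 -->
    <field name="inv_strlocqty_1" update="set">400</field>
    <!-- Store 2 inventory: 200 becomes 500 -->
    <field name="inv_strlocqty_2" update="set">500</field>
    <!-- inv_strlocqty_3 is omitted, so its stored value (300) is kept -->
  </doc>
</add>
```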
Procedure to use atomic update with a CSV file
- Create the environment configuration file workspace_dir\workspace\search-config-ext\src\index\indexload\wc-indexload-profileName-csv.xml, where profileName is the URL parameter that you use when you call IndexLoad in a web browser. In the following scenarios, price-delta is used as the profileName for the CSV scenario, and inventory-delta is used as the profileName for the SQL scenario.
The wc-indexload-profileName-csv.xml file contains environment control information and global properties that IndexLoad requires. For example, it specifies the data mapping between each CSV column and the corresponding Solr field. You can leave the data mapping empty if the CSV column names match the Solr field names. This file also specifies the DataReader and the BusinessObjectMediator. To load from a CSV file, specify com.ibm.commerce.search.indexload.reader.SearchIndexLoadCSVReader as the DataReader and com.ibm.commerce.search.indexload.mediator.SearchIndexLoadCSVMediator as the BusinessObjectMediator. The wc-indexload-profileName-csv.xml file typically does not require customization; you can use the following sample file as-is.
<_config:DataLoader className="com.ibm.commerce.search.indexload.loader.SearchIndexLoadCSVLoader" >
  <_config:property name="FirstLineIsHeader" value="true" />
  <_config:property name="Charset" value="UTF-8" />
  <_config:property name="TokenDelimiter" value="," />
  <_config:DataReader className="com.ibm.commerce.search.indexload.reader.SearchIndexLoadCSVReader" />
  <_config:BusinessObjectBuilder>
    <_config:DataMapping>
    </_config:DataMapping>
    <_config:BusinessObjectMediator className="com.ibm.commerce.foundation.internal.server.services.indexload.mediator.SolrIndexLoadBusinessObjectMediator"/>
    <_config:BusinessObjectMediator className="com.ibm.commerce.search.indexload.mediator.SearchIndexLoadCSVMediator" />
  </_config:BusinessObjectBuilder>
</_config:DataLoader>
- Create the profile configuration file workspace_dir\workspace\search-config-ext\src\index\indexload\wc-indexload-profileName.xml.
The wc-indexload-profileName.xml file contains configurable performance attributes and one or more load item definitions. It also specifies the CSV file location and the target core name. The profile name that you define here is the profileName URL parameter that you use when you call IndexLoad in a web browser. The load item configurations are listed in the load order section of this file. Each LoadItem definition specifies a particular load item configuration, such as coreName or the data source location, and multiple load items run in parallel. Each load item configuration must specify the environment configuration file wc-indexload-profileName-csv.xml in its fileName attribute. The profile configuration file also contains the DataWriter configuration; keep the original com.ibm.commerce.search.indexload.writer.SearchIndexLoadBatchService as the writer. The CSV file needs to contain only the changed field values, because IndexLoad uses the Solr atomic update API to update only the specified stored fields.
Example: wc-indexload-price-delta.xml
<_config:LoadItem name="ExternalPrice-1" fileName="wc-indexload-externalprice-csv.xml">
  <_config:property name="coreName" value="MC_10001_CatalogEntry_Price1_generic" />
  <_config:property name="groupName" value="1" />
  <_config:DataSourceLocation location="resources/search/index/indexload/contract-price-example1.csv" />
</_config:LoadItem>
- Run IndexLoad in POST mode with the profileName defined in the previous step. For example, if the profile configuration file is named wc-indexload-price-delta.xml, send a POST request to the following URL, where #MASTER_CATALOG_ID is the ID of your master catalog:
https://searchMaster:3738/search/admin/resources/indexload/profile/price-delta/start?catalogId=#MASTER_CATALOG_ID
- After IndexLoad has run successfully, run WCB to build the package and deploy the package into the Search Docker container. For more information, see Packaging customized code for deployment.
Procedure to use atomic update via SQL
- Create the environment configuration file workspace_dir\workspace\search-config-ext\src\index\indexload\wc-indexload-profileName-sql.xml.
This SQL version of the environment configuration file specifies the parallel indexing configuration, which the SolrIndexLoadQueryLoader uses to split the dataset evenly across multiple threads. It also contains the SQL statement that retrieves the data from the specified data source.
This configuration file also specifies the data reader. Two DataReader classes are available:
  - com.ibm.commerce.search.indexload.reader.SearchIndexLoadQueryReader: Reads unique records from the database so that they can be saved into the index.
  - com.ibm.commerce.search.indexload.reader.SearchIndexLoadQueryMultiplexReader: Transforms multiple rows from a database table into a single index document with multiple dynamic index fields.
The following sample DataReader entry retrieves the inventory that has been updated since a specific time. Because the INVAVL table can contain multiple records for each catentry_id (one for each store), the example uses com.ibm.commerce.search.indexload.reader.SearchIndexLoadQueryMultiplexReader to accumulate these rows into one document. Each row becomes a dynamic field named inv_strlocqty_%storeId% whose value is the available quantity.
<_config:DataReader className="com.ibm.commerce.search.indexload.reader.SearchIndexLoadQueryMultiplexReader">
  <_config:DynamicFields>
    <_config:DynamicField dynamicFieldName="inv_strlocqty_%storeId%" dynamicFieldValue="%quantity%" indexingMode="replace" />
  </_config:DynamicFields>
  <_config:property name="KeyFieldName" value="catentry_id" />
  <_config:property name="ExcludeFieldNames" value="storeId,quantity" />
  <_config:property name="minDelta" value="5"/>
  <_config:Query>
    <_config:SQL>
      SELECT invavl.catentry_id, invavl.STORE_ID, INVAVL.AVAILQUANTITY
      FROM INVAVL, CATGPENREL
      WHERE CATGPENREL.CATALOG_ID = 10001
        AND INVAVL.CATENTRY_ID = CATGPENREL.CATENTRY_ID
        AND INVAVL.QUANTITYMEASURE = 'C62'
        AND INVAVL.LASTUPDATE BETWEEN '2018-11-25 16:45:24.000' AND current timestamp
      ORDER BY INVAVL.CATENTRY_ID
      WITH UR
    </_config:SQL>
    <_config:ColumnMapping columnName="CATENTRY_ID" indexFieldName="catentry_id" />
    <_config:ColumnMapping columnName="STORE_ID" indexFieldName="storeId" />
    <_config:ColumnMapping columnName="AVAILQUANTITY" indexFieldName="quantity" />
  </_config:Query>
</_config:DataReader>
- Create the profile configuration file workspace_dir\workspace\search-config-ext\src\index\indexload\wc-indexload-profileName.xml. As with the CSV file approach, specify the SQL environment configuration file in the load item section:
<_config:LoadItem name="Inventory-Delta" fileName="wc-indexload-dom-delta-inventory-sql.xml">
  <_config:property name="coreName" value="MC_10001_CatalogEntry_Inventory_generic" />
  <_config:property name="groupName" value="I" />
</_config:LoadItem>
- Run IndexLoad with the profileName that you defined. For example, if the profile configuration file created in the previous step is named wc-indexload-inventory-delta.xml, call the following URL:
https://searchMaster:3738/search/admin/resources/indexload/profile/inventory-delta/start?catalogId=#MASTER_CATALOG_ID
- After IndexLoad has run successfully, run WCB to build the package and deploy the package into the Search Docker container. For more information, see Packaging customized code for deployment.