Extending the Dataload indexer mediator for WebSphere Commerce search
The default Dataload indexer generic mediator enables indexing indexer-ready data directly into WebSphere Commerce search. A new mediator can be created to support new data types.
There are two requirements for the data file:
- The data file schema structure must match the index schema structure. That is, the external name of XML elements must match the internal name of the index field.
- All the required index fields are provided in the data file.
If the external names differ from the internal index names, a mapping can be done in a configuration file. If the data file does not contain all the required fields, the mediator must be extended to compute the required fields.
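The two requirements can be pictured as a rename step followed by a completeness check. The following is a minimal sketch in plain Java, not part of the WebSphere Commerce API: the class name IndexFieldCheck, its method names, and the sample values are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not a WebSphere Commerce class. */
final class IndexFieldCheck {

    /** Renames external field names to internal index names; unmapped names pass through unchanged. */
    static Map<String, Object> toInternalNames(Map<String, Object> row,
                                               Map<String, String> externalToInternal) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : row.entrySet()) {
            String internalName = externalToInternal.getOrDefault(entry.getKey(), entry.getKey());
            result.put(internalName, entry.getValue());
        }
        return result;
    }

    /** Returns the required index fields that are absent (or null) in the row. */
    static List<String> missingRequired(Map<String, Object> row, List<String> requiredFields) {
        List<String> missing = new ArrayList<>();
        for (String field : requiredFields) {
            if (row.get(field) == null) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Sample row as it might arrive from a data file (values are made up)
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("external_field_A", "10001");
        row.put("external_field_B", "storeA");

        Map<String, String> mapping = new HashMap<>();
        mapping.put("external_field_A", "internal_field_A");
        mapping.put("external_field_B", "internal_field_B");

        Map<String, Object> mapped = toInternalNames(row, mapping);
        List<String> required = Arrays.asList(
                "internal_field_A", "internal_field_B", "internal_compositeUniqueKey");

        // Any field reported here would have to be computed, for example by a custom mediator
        System.out.println("Missing required fields: " + missingRequired(mapped, required));
    }
}
```

In this sketch, a non-empty result from missingRequired corresponds to the case where the generic mediator is not sufficient and the missing fields must be computed elsewhere.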
A new mediator must be created for each data type. The new mediator extends the com.ibm.commerce.foundation.dataimport.dataload.mediator.AbstractSolrInputDocumentMediator class and implements the transform() method. The logic to resolve any missing data from the data file can be added to the new mediator class.
The Dataload framework is enhanced to support indexing data directly into the WebSphere Commerce search server. A SolrJ Java client is used to index a flat data structure from an external data source in XML or CSV format. The Dataload processing is done in multiple components: the reader, the business object builder, the mediator, and the writer, where:
- The reader parses the input file, and constructs a name-value-pair object. There are generic readers for CSV and XML data formats available by default.
- The business object builder builds a map object of the name-value-pairs read by the reader.
- A Solr document object is created in the mediator. The document is constructed and populated according to the index schema structure. A generic Solr mediator is provided by default that can be used to index indexer-ready data.
  Note: If the input data is not indexer-ready data, a custom mediator must process the data to make it ready for indexing.
- The constructed Solr document is inserted into the Solr server using the SolrJ Java client.
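The first two stages in the list above (reader, then business object builder) can be sketched in plain Java. This is only an illustration of the name-value-pair idea under simplifying assumptions: CsvNameValueSketch is a hypothetical class, not the default CSVReader or MapObjectBuilder, and the naive comma split does not handle quoted values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the reader and builder stages; not a WebSphere Commerce class. */
final class CsvNameValueSketch {

    /**
     * Pairs each header name with the value in the same column of a data line.
     * Assumption: fields are separated by plain commas with no quoting or escaping.
     */
    static Map<String, String> toNameValuePairs(String headerLine, String dataLine) {
        String[] names = headerLine.split(",", -1);
        String[] values = dataLine.split(",", -1);
        if (names.length != values.length) {
            throw new IllegalArgumentException(
                    "Expected " + names.length + " values but found " + values.length);
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            pairs.put(names[i].trim(), values[i].trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        // The first line plays the role of the CSV header row
        Map<String, String> pairs =
                toNameValuePairs("external_field_A,external_field_B", "10001,storeA");
        // The resulting map is what a mediator would receive as its data object
        System.out.println(pairs);
    }
}
```

A mediator would then turn such a map into a Solr document, and the writer would send that document to the search server.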
The reader, builder, mediator, and other configurations are specified in the loader file.
In the loader file examples that follow, within the <_config:DataMapping> section:
- internal_field_A is the Solr index field name.
- external_field_A is the field name in the CSV or XML file.
For example, the loader file for XML input, which uses the XmlReader data reader with the NVPXmlHandler handler:
<?xml version="1.0" encoding="UTF-8"?>
<_config:DataloadBusinessObjectConfiguration
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config ../xsd/wc-dataload-businessobject.xsd"
    xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">
  <_config:DataLoader className="com.ibm.commerce.foundation.dataload.BusinessObjectLoader">
    <_config:DataReader className="com.ibm.commerce.foundation.dataload.datareader.XmlReader">
      <_config:XmlHandler className="com.ibm.commerce.foundation.dataload.xmlhandler.NVPXmlHandler" />
    </_config:DataReader>
    <_config:BusinessObjectBuilder className="com.ibm.commerce.foundation.dataload.businessobjectbuilder.MapObjectBuilder">
      <_config:DataMapping>
        <_config:mapping xpath="internal_field_A" value="external_field_A" />
        <_config:mapping xpath="internal_field_B" value="external_field_B" />
        <_config:mapping xpath="internal_compositeUniqueKey" value="" valueFrom="Fixed">
          <_config:ValueHandler className="com.ibm.commerce.foundation.dataload.config.OrderedConcatenateValueHandler">
            <_config:Parameter name="1" value="external_field_A" />
            <_config:Parameter name="2" value="_" valueFrom="Fixed" />
            <_config:Parameter name="3" value="external_field_B" />
          </_config:ValueHandler>
        </_config:mapping>
        <_config:mapping xpath="" value="delete" deleteValue="true" />
      </_config:DataMapping>
      <_config:BusinessObjectMediator className="com.mycompany.commerce.foundation.dataimport.dataload.mediator.myCompanySolrInputDocumentMediator">
        <!-- idFieldName value should match the index uniqueKey value -->
        <_config:property name="idFieldName" value="internal_compositeUniqueKey" />
      </_config:BusinessObjectMediator>
    </_config:BusinessObjectBuilder>
  </_config:DataLoader>
</_config:DataloadBusinessObjectConfiguration>
For example, the loader file for CSV input, which uses the CSVReader data reader:
<?xml version="1.0" encoding="UTF-8"?>
<_config:DataloadBusinessObjectConfiguration
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config ../xsd/wc-dataload-businessobject.xsd"
    xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">
  <_config:DataLoader className="com.ibm.commerce.foundation.dataload.BusinessObjectLoader">
    <_config:DataReader className="com.ibm.commerce.foundation.dataload.datareader.CSVReader" firstLineIsHeader="true" />
    <_config:BusinessObjectBuilder className="com.ibm.commerce.foundation.dataload.businessobjectbuilder.MapObjectBuilder">
      <_config:DataMapping>
        <_config:mapping xpath="internal_field_A" value="external_field_A" />
        <_config:mapping xpath="internal_field_B" value="external_field_B" />
        <_config:mapping xpath="internal_compositeUniqueKey" value="" valueFrom="Fixed">
          <_config:ValueHandler className="com.ibm.commerce.foundation.dataload.config.OrderedConcatenateValueHandler">
            <_config:Parameter name="1" value="external_field_A" />
            <_config:Parameter name="2" value="_" valueFrom="Fixed" />
            <_config:Parameter name="3" value="external_field_B" />
          </_config:ValueHandler>
        </_config:mapping>
        <_config:mapping xpath="" value="delete" deleteValue="true" />
      </_config:DataMapping>
      <_config:BusinessObjectMediator className="com.mycompany.commerce.foundation.dataimport.dataload.mediator.myCompanySolrInputDocumentMediator">
        <!-- idFieldName value should match the index uniqueKey value -->
        <_config:property name="idFieldName" value="internal_compositeUniqueKey" />
      </_config:BusinessObjectMediator>
    </_config:BusinessObjectBuilder>
  </_config:DataLoader>
</_config:DataloadBusinessObjectConfiguration>
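Both loader files build internal_compositeUniqueKey from three numbered parameters: a field reference, a fixed "_" separator, and a second field reference. The following plain-Java sketch shows one way such an ordered concatenation could behave. It is not the OrderedConcatenateValueHandler implementation: the OrderedConcatSketch class, its Part type, the numeric ordering of parameter names, and the choice to treat a missing field as an empty string are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration; this is not the OrderedConcatenateValueHandler implementation. */
final class OrderedConcatSketch {

    /** One configured parameter: a fixed literal, or the name of an input field to look up. */
    static final class Part {
        final String value;
        final boolean fixed;

        Part(String value, boolean fixed) {
            this.value = value;
            this.fixed = fixed;
        }
    }

    /** Concatenates the parts in ascending order of their numeric parameter name. */
    static String concatenate(Map<Integer, Part> parts, Map<String, String> row) {
        StringBuilder key = new StringBuilder();
        for (Part part : new TreeMap<Integer, Part>(parts).values()) {
            if (part.fixed) {
                key.append(part.value);
            } else {
                // Assumption: a field that is missing from the row contributes an empty string
                String fieldValue = row.get(part.value);
                key.append(fieldValue == null ? "" : fieldValue);
            }
        }
        return key.toString();
    }

    public static void main(String[] args) {
        Map<Integer, Part> parts = new HashMap<>();
        parts.put(1, new Part("external_field_A", false));
        parts.put(2, new Part("_", true));
        parts.put(3, new Part("external_field_B", false));

        Map<String, String> row = new HashMap<>();
        row.put("external_field_A", "10001");
        row.put("external_field_B", "storeA");

        System.out.println(concatenate(parts, row)); // prints 10001_storeA
    }
}
```

Whatever the actual handler does internally, the value it produces is the one that the mediator's idFieldName property must point to, because that field is the unique key of the index.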
To extend the Dataload indexer mediator:
- Create a myCompanySolrInputDocumentMediator class that extends the com.ibm.commerce.foundation.dataimport.dataload.mediator.AbstractSolrInputDocumentMediator class.
- Implement the transform method, which takes the dataObject object and the deleteFlag boolean as input parameters. The dataObject object is a Java representation of an NVP mapping passed from the builder. The deleteFlag boolean is a flag that indicates whether the dataObject passed is to be deleted.
Example
The following snippet represents the pseudocode for this class:
public class myCompanySolrInputDocumentMediator extends AbstractSolrInputDocumentMediator {

    protected void transform(Object dataObject, boolean deleteFlag) throws DataLoadException {
        // The builder passes the name-value pairs as a map object
        Map data = (Map) dataObject;
        ArrayList readProductAttribute1 = null;
        ArrayList readProductAttribute2 = null;

        // Read the data object and assign to local variables
        if (data != null && !data.isEmpty()) {
            readProductAttribute1 = (ArrayList) data.get(PRODUCT_ATTRIBUTE1_DATA_NAME);
            readProductAttribute2 = (ArrayList) data.get(PRODUCT_ATTRIBUTE2_DATA_NAME);
        }

        // Compute Product Attribute 3 from read attributes 1 and 2
        computedProductAttribute3 = ....

        // Create the SolrInputDocument object
        SolrInputDocument doc = new SolrInputDocument();

        // Read from the loader configuration file the unique id name of the Solr document
        String idFieldName = getSolrIdFieldName();

        // Add the product attribute field names and their values to the Solr doc object
        if (idFieldName != null) {
            doc.addField(PRODUCTATTRIBUTE1_SOLR_FIELD_NAME, readProductAttribute1);
            doc.addField(PRODUCTATTRIBUTE2_SOLR_FIELD_NAME, readProductAttribute2);
            doc.addField(PRODUCTATTRIBUTE3_SOLR_FIELD_NAME, computedProductAttribute3);
        }

        // Create the SolrDocumentDataObject object, and set the operation mode:
        // 'D' when the data object is to be deleted, 'U' otherwise
        SolrDocumentDataObject solrDocDO = new SolrDocumentDataObject(idFieldName, doc);
        if (deleteFlag) {
            solrDocDO.setOpMode('D');
        } else {
            solrDocDO.setOpMode('U');
        }

        // Add the SolrDocumentDataObject to the list of SolrDocumentDataObject objects
        addSolrInputDocDataObjects(solrDocDO);
    }
}
What to do next
To work with an inventory index in WebSphere Commerce search, complete the following tutorial: Tutorial: Indexing external inventory data in WebSphere Commerce search.