Enabling WebSphere Commerce Search as a stand-alone unstructured content search engine

You can enable WebSphere Commerce Search to search unstructured content exclusively so that the content is indexed and retrieved more efficiently. A stand-alone search engine for unstructured content helps offset the potentially intensive processing load of searching two different content types (structured and unstructured) in one engine.

Before you begin

Ensure that the following prerequisites are met:

WebSphere Commerce Search is deployed. For more information, see Deploying WebSphere Commerce Search.
Your database contains unstructured content.

Procedure

Design the schema for the new core.

Because the core holds unstructured content only, it requires at least one dynamic text field to map the output of Solr Cell.

For example, the following snippet is a sample schema for the new core:


<!-- Tokenized text for search -->
    <fieldType name="wc_text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
</fieldType>
…
<field name="unstructured_id" type="wc_text" indexed="true" stored="true" required="true" multiValued="false"/>
…
<dynamicField name="tika_*" type="wc_text" indexed="true" stored="true" multiValued="true"/>
…
<uniqueKey>unstructured_id</uniqueKey>

Where unstructured_id is the unique key field that identifies each unstructured document.

Post the unstructured content to the core by selecting one of the following methods:

Use curl, or a similar tool, to post the content to the core by using the Solr Cell interface.

For more information, see ExtractingRequestHandler.
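As a minimal sketch of this method, the following Java class assembles the Solr Cell request URL that a tool such as curl would post a file to. It uses only the standard library. The core URL and the document ID are hypothetical placeholders; the parameters mirror the ones that are used in the SolrJ sample in this topic, with commit=true added as an assumption so that the document becomes searchable immediately.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrCellUrlBuilder {

    // Builds the /update/extract URL for one unstructured document.
    // coreUrl is a hypothetical base URL of the unstructured content core.
    public static String buildExtractUrl(String coreUrl, String id) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.unstructured_id", id);   // unique key of the document
        params.put("uprefix", "tika_");              // prefix for fields not in the schema
        params.put("fmap.content", "tika_content");  // map extracted body text
        params.put("commit", "true");                // assumption: commit right away

        StringBuilder url = new StringBuilder(coreUrl).append("/update/extract");
        char separator = '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            url.append(separator)
               .append(encode(entry.getKey()))
               .append('=')
               .append(encode(entry.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    // URL-encodes a single parameter name or value as UTF-8.
    public static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
    }

    public static void main(String[] args) {
        // Hypothetical host, port, core name, and document ID.
        System.out.println(buildExtractUrl("http://localhost:8983/solr/unstructured", "doc-1001"));
    }
}
```

With curl, the resulting URL is typically combined with a multipart file upload, for example by passing the document with the -F option.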

Use SolrJ to post the content to the core.

For example:


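import java.io.IOException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.ContentStream;
import org.apache.solr.common.util.NamedList;

protected void postContentDirectly(ContentStream stream, String name, String id, String catentryId) throws IOException, SolrServerException {
		// Send the document to the Solr Cell (ExtractingRequestHandler) endpoint.
		ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
		up.addContentStream(stream);
		// Index the extracted content instead of only returning it.
		up.setParam("extractOnly", "false");
		// Set the unique key of the document.
		up.setParam("literal.unstructured_id", id);
		// Prefix fields that are not defined in the schema so that they match the tika_* dynamic field.
		up.setParam("uprefix", "tika_");
		// Map the extracted body text to the tika_content field.
		up.setParam("fmap.content", "tika_content");
		// Commit immediately, and wait for the flush and for the new searcher.
		up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
		NamedList<Object> result = query.getUnstructureSolrServer().request(up);
	}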

Where ContentStream is the stream that contains the binary data of the unstructured content.

For more information, see ExtractingRequestHandler.

Perform a search by using the core.

After all the unstructured content is posted to Solr Cell and indexed, you can use a search URL to start searching.
1. Enter a search URL.
  For example:
```
"q=tika_content:KEYWORD&fl=unstructured_id,tika_content&hl=true&hl.fl=id&hl.fl=tika_content&hl.fragsize=100"
```
2. Typically, extra logic for the unstructured content and wrapper logic are required for optimal results. You can use the SolrJ API to run the equivalent of the preceding sample query. For more information, see Client Libraries / Bindings.
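
As one possible starting point for such wrapper logic, the following sketch rebuilds the sample query string for a given keyword. It deliberately uses only the Java standard library rather than SolrJ so that it runs without extra dependencies; with SolrJ, the same parameters would be set on a SolrQuery object instead. The class and method names are hypothetical, and the sketch assumes a single-word keyword with no Solr special characters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UnstructuredSearchQuery {

    // Builds the query string of the sample search URL for one keyword.
    // Assumption: keyword is a single term; Solr query syntax is not escaped here.
    public static String buildQuery(String keyword) {
        StringBuilder query = new StringBuilder();
        append(query, "q", "tika_content:" + keyword);       // search the extracted text
        append(query, "fl", "unstructured_id,tika_content"); // fields to return
        append(query, "hl", "true");                         // turn on highlighting
        append(query, "hl.fl", "id");                        // highlight fields, as in the sample
        append(query, "hl.fl", "tika_content");
        append(query, "hl.fragsize", "100");                 // highlight fragment size
        return query.toString();
    }

    // Appends one name=value pair, URL-encoding the value as UTF-8.
    private static void append(StringBuilder query, String name, String value) {
        if (query.length() > 0) {
            query.append('&');
        }
        try {
            query.append(name).append('=').append(URLEncoder.encode(value, "UTF-8"));
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
    }

    public static void main(String[] args) {
        // "warranty" is a hypothetical search keyword.
        System.out.println(buildQuery("warranty"));
    }
}
```

Encoding each value keeps characters such as the colon and the comma from being misread as URL syntax; the search server decodes them before it parses the query.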