Example: Sample schema for Catalog attachments

The following sample snippet defines the index schema fields for Catalog attachments in WebSphere Commerce.

<fields>
   <!--
   Attachments' basic attributes:
   -->
   <field name="attachmentrel_id" type="string" indexed="true" stored="true" required="true" multiValued="false"/> 
   <field name="attachment_id" type="long" indexed="true" stored="true" required="true" multiValued="false"/> 
   <field name="catentry_id" type="long" indexed="true" stored="true" required="true" multiValued="false"/> 
   <field name="name" type="wc_text" indexed="true" stored="true" required="false" multiValued="false"/> 
   <field name="path" type="wc_text" indexed="false" stored="true" required="false" multiValued="false"/> 
   <field name="mimetype" type="wc_text" indexed="true" stored="true" required="false" multiValued="false"/> 
   <field name="image" type="wc_text" indexed="false" stored="true" required="false" multiValued="false"/> 
   <field name="rulename" type="wc_text" indexed="true" stored="true" required="false" multiValued="false"/> 
   <field name="identifier" type="wc_text" indexed="true" stored="true" required="false" multiValued="false"/> 
   <field name="shortdesc" type="wc_text" indexed="true" stored="true" required="false" multiValued="false"/> 
   <field name="longdesc" type="wc_text" indexed="true" stored="true" required="false" multiValued="false"/> 
   
   <!--
   Tika's default dynamic field: matches all metadata fields generated by Tika
   -->
   <dynamicField name="tika_*" type="wc_text" indexed="true" stored="true" multiValued="true"/>
   <!--
   Spell check field
   -->
   <field name="spellCheck" type="wc_textSpell" indexed="true" stored="false" multiValued="true"/>

</fields>

Note: In Feature Pack 2, the attachmentrel_id field is defined with type="long" instead of type="string".

Where: The values of the following fields are obtained from the WebSphere Commerce database:

attachmentrel_id
The schema change from type="long" to type="string" gives the index the flexibility to store more types of content. Additional Web content can be indexed without any impact on runtime functionality. When you upgrade an existing schema, you must run a full reindex of the previously indexed unstructured content while deploying the new schema. To distinguish HTML content from attachment content, the attachmentrel_id value of HTML content is prefixed with HTML_, and both its attachment_id and catentry_id values are -1.
attachment_id
catentry_id
name
path
mimetype
image
rulename
identifier
shortdesc
longdesc
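To make the HTML_ convention concrete, the following sketch shows one HTML Web content entry next to one regular attachment entry, written in Solr's XML update message format purely for illustration. All values, including the numeric suffix after HTML_, are hypothetical; this is not output captured from WebSphere Commerce.

```xml
<add>
  <!-- Hypothetical HTML Web content entry: the HTML_ prefix marks it as
       HTML content, and attachment_id and catentry_id are both -1 -->
  <doc>
    <field name="attachmentrel_id">HTML_10001</field>
    <field name="attachment_id">-1</field>
    <field name="catentry_id">-1</field>
    <field name="name">Return policy</field>
    <field name="mimetype">text/html</field>
  </doc>
  <!-- Hypothetical catalog attachment entry, shown for comparison -->
  <doc>
    <field name="attachmentrel_id">10002</field>
    <field name="attachment_id">20002</field>
    <field name="catentry_id">30002</field>
    <field name="name">Assembly instructions</field>
    <field name="mimetype">application/pdf</field>
  </doc>
</add>
```

Because a value such as HTML_10001 is not numeric, it could not be stored in a field of type="long", which is consistent with the change of attachmentrel_id to type="string".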

The values of the following dynamic field are obtained from the output of the Tika framework:

tika_*
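One way Tika metadata can end up in tika_* fields is through the uprefix parameter of Solr Cell (ExtractingRequestHandler), which prefixes every extracted field name that is not defined in the schema. The following solrconfig.xml fragment is a sketch under that assumption only; it is not the configuration that WebSphere Commerce ships, and the handler path and the tika_content target name are hypothetical.

```xml
<!-- Sketch only: maps Tika output onto the tika_* dynamic field.
     Handler path and tika_content target are hypothetical. -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
                startup="lazy">
  <lst name="defaults">
    <!-- Lowercase metadata names, e.g. Content-Type becomes content_type -->
    <str name="lowernames">true</str>
    <!-- Prefix fields not defined in schema.xml so they match tika_* -->
    <str name="uprefix">tika_</str>
    <!-- Send the extracted body text to a hypothetical tika_content field -->
    <str name="fmap.content">tika_content</str>
  </lst>
</requestHandler>
```

With this sketch, a metadata entry such as Content-Type would be indexed as tika_content_type and analyzed with the wc_text type declared on the dynamic field.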

The dynamic field and the other text fields use the same wc_text data type that is introduced in the structured content index schema. The unstructured content Solr core is deployed under the related entity core folder, and reuses the stopwords.txt, synonyms.txt, and protwords.txt files of its parent entity core configuration. For example, the wc_text field type references these files through relative paths:


<fieldType name="wc_text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="../../conf/stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="../../conf/protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="../../conf/synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="../../conf/stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="../../conf/protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
</fieldType>

Each language has its own set of data and configuration files for its content.