
Troubleshooting: Preventing external URLs from being requested
You can prevent external URLs from being requested for the unstructured search index for performance benefits. This is done by excluding the attachmentURL entity in requests.
Problem
Indexing performance suffers if your site requests external URLs for the unstructured search index.
Solution
To exclude external URLs:
- Update the wc-dataimport-preprocess-unstructured-content.xml file, noting
the following changes in bold:
<_config:data-processing-config processor="com.ibm.commerce.foundation.dataimport.preprocess.StaticAttributeDataPreProcessor" masterCatalogId="10001" batchSize="500"> <_config:table definition="CREATE TABLE TI_CEATCHTU_0_#lang_tag# (ATCHREL_ID VARCHAR(128) NOT NULL, ATCHAST_ID BIGINT NOT NULL, CATENTRY_ID BIGINT NOT NULL, IDENTIFIER VARCHAR(128) NOT NULL, NAME VARCHAR(128), SHORTDESCRIPTION VARCHAR(254), LONGDESCRIPTION VARCHAR(4000), ATCHASTPATH VARCHAR(520), DIRECTORY VARCHAR(254), DIRECTORYPATH VARCHAR(128), MIMETYPE VARCHAR(254), LANGUAGE_ID INTEGER, IMAGE VARCHAR(254), RULENAME VARCHAR(254), PRIMARY KEY (ATCHREL_ID))" name=" TI_CEATCHTU_0_#lang_tag#"/> <_config:query sql="SELECT rtrim(cast(ATCHREL.atchrel_id as char(23))) ||'_'|| rtrim(cast(ATCHAST.atchast_id as char(23))) as atchrel_id, CE.CATENTRY_ID, ATCHAST.atchast_id, ATCHTGT.identifier, ATCHRELDSC.name, ATCHRELDSC.shortdescription, ATCHRELDSC.longdescription, rtrim(ATCHAST.atchastpath) atchastpath, STORE.directory, ATCHAST.directorypath, ATCHAST.mimetype, ATCHASTLG.language_id, ATCHRLUS.Image, ATCHRLUS.identifier rulename FROM TI_CATENTRY_0 CE JOIN ATCHREL ON ATCHREL.BIGINTOBJECT_ID = CE.CATENTRY_ID JOIN ATCHOBJTYP ON (ATCHREL.ATCHOBJTYP_ID = ATCHOBJTYP.ATCHOBJTYP_ID AND ATCHOBJTYP.IDENTIFIER = 'CATENTRY') LEFT JOIN ATCHTGT on (ATCHREL.atchtgt_id = ATCHTGT.atchtgt_id ) JOIN ATCHAST on (ATCHAST.atchtgt_id = ATCHTGT.atchtgt_id AND ATCHAST.MIMETYPEENCODING = 'URL') LEFT JOIN ATCHASTLG on (ATCHASTLG.atchast_id = ATCHAST.atchast_id) LEFT JOIN ATCHRELDSC on (ATCHRELDSC.atchrel_id = ATCHREL.atchrel_id AND ATCHRELDSC.language_id=?language_id?) JOIN ATCHRLUS ON (ATCHREL.ATCHRLUS_ID = ATCHRLUS.ATCHRLUS_ID) LEFT JOIN STORE on (ATCHAST.storeent_id = STORE.store_id) WHERE (ATCHASTLG.atchastlg_id is null or ATCHASTLG.language_id=?language_id?) order by ATCHREL.sequence, ATCHREL.atchrel_id"/> <_config:mapping> <_config:key queryColumn="ATCHREL_ID" tableColumn="ATCHREL_ID"/> <_config:column-mapping> <_config:column-column-mapping> <_config:column-column queryColumn="ATCHAST_ID" tableColumn="ATCHAST_ID" /> <_config:column-column queryColumn="CATENTRY_ID" tableColumn="CATENTRY_ID" /> <_config:column-column queryColumn="IDENTIFIER" tableColumn="IDENTIFIER" /> <_config:column-column queryColumn="NAME" tableColumn="NAME" /> <_config:column-column queryColumn="SHORTDESCRIPTION" tableColumn="SHORTDESCRIPTION" /> <_config:column-column queryColumn="LONGDESCRIPTION" tableColumn="LONGDESCRIPTION" /> <_config:column-column queryColumn="ATCHASTPATH" tableColumn="ATCHASTPATH" /> <_config:column-column queryColumn="DIRECTORY" tableColumn="DIRECTORY" /> <_config:column-column queryColumn="DIRECTORYPATH" tableColumn="DIRECTORYPATH" /> <_config:column-column queryColumn="MIMETYPE" tableColumn="MIMETYPE" /> <_config:column-column queryColumn="LANGUAGE_ID" tableColumn="LANGUAGE_ID" /> <_config:column-column queryColumn="IMAGE" tableColumn="IMAGE" /> <_config:column-column queryColumn="RULENAME" tableColumn="RULENAME" /> </_config:column-column-mapping> </_config:column-mapping> </_config:mapping> </_config:data-processing-config>
- Update the TI_CEATCHT_0 table, noting the following change in bold:
JOIN ATCHAST on (ATCHAST.atchtgt_id = ATCHTGT.atchtgt_id AND ATCHAST.mimetype <> '' AND ATCHAST.MIMETYPEENCODING <> 'URL')
- Update the wc-data-config.xml file, noting the following changes in bold:
<entity name="AttachmentURL-test" dataSource="WC database" query="SELECT I_CATENTATCHT_1.CATENTRY_ID, I_CATENTATCHT_1.ATCHAST_ID, I_CATENTATCHT_1.ATCHREL_ID, I_CATENTATCHT_1.IDENTIFIER, I_CATENTATCHT_1.ATCHASTPATH, I_CATENTATCHT_1.MIMETYPE, I_CATENTATCHT_1.NAME, I_CATENTATCHT_1.IMAGE, I_CATENTATCHT_1.RULENAME, I_CATENTATCHT_1.SHORTDESCRIPTION, I_CATENTATCHT_1.LONGDESCRIPTION, I_CATENTATCHT_1.DIRECTORY FROM TI_CEATCHTU_0_1 I_CATENTATCHT_1" deltaImportQuery="SELECT I_CATENTATCHT_1.CATENTRY_ID, I_CATENTATCHT_1.ATCHAST_ID, I_CATENTATCHT_1.ATCHREL_ID, I_CATENTATCHT_1.IDENTIFIER, I_CATENTATCHT_1.ATCHASTPATH, I_CATENTATCHT_1.MIMETYPE, I_CATENTATCHT_1.NAME, I_CATENTATCHT_1.IMAGE, I_CATENTATCHT_1.RULENAME, I_CATENTATCHT_1.SHORTDESCRIPTION, I_CATENTATCHT_1.LONGDESCRIPTION, I_CATENTATCHT_1.DIRECTORY FROM TI_CEATCHTU_0_1 I_CATENTATCHT_1" deltaQuery="SELECT ATCHREL_ID FROM TI_CEATCHTU_0_1 FETCH FIRST 1 ROWS ONLY" transformer="script:isWriteToFile" > <field column="ATCHREL_ID" name="attachmentrel_id" /> <field column="ATCHAST_ID" name="attachment_id" /> <field column="CATENTRY_ID" name="catentry_id" /> <field column="IDENTIFIER" name="identifier" /> <field column="NAME" name="name" /> <field column="IMAGE" name="image" /> <field column="RULENAME" name="rulename" /> <field column="SHORTDESCRIPTION" name="shortdesc" /> <field column="LONGDESCRIPTION" name="longdesc" /> <field column="ATCHASTPATH" name="path" /> <field column="MIMETYPE" name="mimetype" /> </entity>
To filter based on MIME type, for example
images:
SELECT I_CATENTATCHT_1.CATENTRY_ID, I_CATENTATCHT_1.ATCHAST_ID,
I_CATENTATCHT_1.ATCHREL_ID, I_CATENTATCHT_1.IDENTIFIER,
I_CATENTATCHT_1.ATCHASTPATH, I_CATENTATCHT_1.MIMETYPE,
I_CATENTATCHT_1.NAME, I_CATENTATCHT_1.IMAGE, I_CATENTATCHT_1.RULENAME,
I_CATENTATCHT_1.SHORTDESCRIPTION, I_CATENTATCHT_1.LONGDESCRIPTION,
I_CATENTATCHT_1.DIRECTORY
FROM TI_CEATCHT_0_1 I_CATENTATCHT_1 WHERE MIMETYPE NOT LIKE
'image/%' with ur;
The new Solr document reflects the updated exclusions and increases the performance of the preprocess and build index scripts.