Limiting search terms and characters from the search query
Procedure
-
Removing unimportant words from the search query:
Stop words are common words that are typically unimportant to a search, such as the, and, or, and for. They are defined in the stopwords.txt file. For example, if a shopper searches for "the shirt" in the storefront, the word "the" is skipped by Solr.
To activate the stop words feature:
- Copy the solrhome/MC_masterCatalogID/locale/CatalogEntry/conf/stopwords.txt file to a location that will be accessible within the Search server's container.
- Add the value
stopwords=stopwords_file_path
to the CONFIG column of the SRCHCONFEXT database table, where stopwords_file_path is the relative path to the file, discoverable in the container. The command to insert the data is
update srchconfext set config='stopwords_en=stopwords_file_path' where srchconfext_id=x;
where stopwords_file_path is the path to the stopwords.txt file, and x is the ID of the record that you want to update. Normally, this is the record for the "Structured" index subtype with a certain language.
- Restart the HCL Commerce Search server.
To create a language-specific stop words list, add the language code to the stopwords parameter of the database entry. This version of the value uses the form
stopwords_lang=stopwords_lang_file_path
where stopwords_lang_file_path is the path to the language-specific stop words file. For example, if you want to add your own French stop words, add
stopwords_fr=stopwords_fr_file_path
to the SRCHCONFEXT table's CONFIG column.
Stop words are considered at both indexing and querying time. If you are using the AND search type, no search results are returned, since "the" is defined in the stopwords.txt file. For more information, see StopFilterFactory.
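The effect of a stop words list on a query can be sketched outside of Solr. The following Python snippet is only an illustration, not the StopFilterFactory implementation: the function names are hypothetical, and it assumes a stopwords.txt-style layout of one word per line with # comment lines and case-insensitive matching.

```python
def load_stop_words(lines):
    """Collect stop words from stopwords.txt-style lines.

    Assumption: one word per line; blank lines and lines starting
    with '#' are ignored.
    """
    words = set()
    for line in lines:
        word = line.strip()
        if word and not word.startswith("#"):
            words.add(word.lower())
    return words


def remove_stop_words(query, stop_words):
    """Return the query tokens that are not stop words (case-insensitive)."""
    return [token for token in query.split() if token.lower() not in stop_words]


# Hypothetical file content; the real list ships in stopwords.txt.
sample_lines = ["# common English words", "the", "and", "", "or", "for"]
stop_words = load_stop_words(sample_lines)
remaining = remove_stop_words("the shirt", stop_words)
```

With the sample list above, only the token shirt remains for the query "the shirt".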
-
Preventing stemming:
To protect certain words from being stemmed, add them to the protwords.txt file.
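The idea behind a protected words list can be illustrated with a short Python sketch. It is an assumption-laden toy: toy_stem is a hypothetical suffix stripper that stands in for whatever stemmer the index uses, and the protected set stands in for the contents of protwords.txt.

```python
def toy_stem(token):
    """Very naive stemmer (hypothetical): strip a few common English suffixes."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def stem_tokens(tokens, protected):
    """Stem every token except those listed in the protected set."""
    protected_lower = {word.lower() for word in protected}
    return [
        token if token.lower() in protected_lower else toy_stem(token)
        for token in tokens
    ]


# Hypothetical protected list; the real one lives in protwords.txt.
protected_words = {"jeans"}
stemmed = stem_tokens(["shirts", "jeans"], protected_words)
```

In this sketch, shirts is reduced to shirt, while jeans is left unchanged because it appears in the protected set.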
-
Disabling wildcard and other character searches:
Wildcard searching is enabled by default, but if necessary, you can disable it for runtime performance or security reasons:
- Performance might be impacted, as a wildcard search that uses a common term might return many documents from the search index.
- Security might be a consideration, as Solr does not analyze and apply filters to wildcard searches.
A prohibited words list stops a matching search request from being processed further, and is configurable in the wc-component.xml file.
For example, by default, a search for * is routed to the Prohibited Characters store page.
The default configuration is:
<_config:property name="StopPatterns" value="\*,~,\?,'',"",.*\\.*,.*/.*,.*\|.*" />
To disable wildcard (*) searches or searches that contain other characters, update this configuration by using regular expression format.
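To get a feel for what the default patterns block, they can be tried against sample search terms. The following Python sketch makes two assumptions that the product documentation should confirm: that the StopPatterns value is a comma-separated list of regular expressions, and that each expression must match the whole search term. The function name is hypothetical.

```python
import re

# The default StopPatterns value, split on commas (assumption: the
# comma is a separator and each entry is one regular expression).
STOP_PATTERNS = [
    r"\*",      # a lone asterisk
    r"~",       # a lone tilde
    r"\?",      # a lone question mark
    r"''",      # two single quotation marks
    r'""',      # two double quotation marks
    r".*\\.*",  # any term that contains a backslash
    r".*/.*",   # any term that contains a forward slash
    r".*\|.*",  # any term that contains a vertical bar
]


def is_prohibited(term, patterns=STOP_PATTERNS):
    """Return True if the whole term matches any pattern (assumed semantics)."""
    return any(re.fullmatch(pattern, term) for pattern in patterns)


blocked = [t for t in ["*", "shirt", "a/b", "a|b", "shirt*"] if is_prohibited(t)]
```

Under these assumptions, the terms *, a/b, and a|b are blocked, while shirt and shirt* are not. Blocking every term that contains an asterisk would need an additional expression such as .*\*.* in the list.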