Customizing the stopwords.txt file
In this lesson, you edit configuration files to influence the behavior of the Solr 7.3.1 search engine. The particular example is customization of the stopwords.txt file.
The stopwords.txt file is a configuration file that lists the words used by the Solr stop filter. In HCL Commerce Version 9, you can change the behavior of the stop filter by pointing the engine at your own stopwords.txt file.
In the following tutorial, you will customize the English stopwords.txt file and verify that you have successfully changed the behavior of the Solr search engine.
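As a rough illustration of what a stop filter does, the following Python sketch drops stop words from a query, ignoring case. It is a simplified model, not Solr's implementation: the function name remove_stop_words, the whitespace tokenization, and the sample stop list are assumptions made for this example.

```python
def remove_stop_words(query, stop_words, ignore_case=True):
    """Return the query tokens that are not in the stop word list."""
    if ignore_case:
        stop_words = {word.lower() for word in stop_words}
    kept = []
    for token in query.split():
        key = token.lower() if ignore_case else token
        if key not in stop_words:
            kept.append(token)
    return kept


# Hypothetical stop list; the real list comes from stopwords.txt.
STOP_WORDS = {"a", "the", "will", "was"}

print(remove_stop_words("The red dress", STOP_WORDS))
# Adding "can" to the list removes it from queries as well.
print(remove_stop_words("can opener", STOP_WORDS | {"can"}))
```

With "can" in the list, a query for "can" alone leaves no tokens to match, which is the kind of behavior change this lesson asks you to verify in the storefront.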
Before you begin
- Ensure that you are working on the correct version of the stopwords.txt file. The default file is solrhome/v3/CatalogEntry/conf/stopwords.txt, but it may have been extended, as described in Limiting search terms and characters from the search query. Locate the extended file, or create a new one to work on.
To ensure that your system is referring to the default stopwords.txt file or its extended counterpart:
- Determine what the content of the name field is in either the default solrhome/v3-index/CatalogGroup/conf/schema.xml or extended solrhome/v3-index-ext/CatalogGroup/x-schema.xml file. Look for a definition similar to the following:
<field name="name" type="wc_text_${lang:en}" indexed="true" stored="true" multiValued="false"/>
- In this example, the en language code has been assigned to name. This language code will be used as part of the reference to the stopwords.txt file, making its name stopwords_en. You can find the path that this name is associated with by looking in the solrhome/v3/common/schema-field-types.xml file. Look for the target of the solr.StopFilterFactory filter. It will resemble the following:
<filter class="solr.StopFilterFactory" ignoreCase="true" words="${stopwords_en:../../common/stopwords.txt}"/>
In this case, the stopwords_en name has been associated with ../../common/stopwords.txt. If stopwords_en is not otherwise specified in SCHCONFIG, this will be the default file.
- Add the parameter stopwords=stopwords_file_path to the CONFIG column of the SRCHCONFEXT database table, where stopwords_file_path is the path to your customized stopwords.txt file. In the container environment, you would use an SQL command similar to the following:
update SRCHCONFEXT set CONFIG='stopwords=/opt/WebSphere/Liberty/usr/servers/default/resources/search/index/managed-solr/config/v3-index-ext/common/stopwords.txt, original_config' where indextype='CatalogEntry' and indexscope=masterCatalogId and indexsubtype='Structured';
Where original_config is the original CONFIG value for the record, and masterCatalogId is replaced with your own master catalog ID.
- You can add stop words for specific languages. To make a stopwords.txt file language-specific, add the line stopwords_lang=stopwords_lang_file_path to the CONFIG column of the SRCHCONFEXT table, where lang is the language code. For example, if you want to add your own French stop words, add the line stopwords_fr=stopwords_fr_file_path to the SRCHCONFEXT table CONFIG column, where stopwords_fr_file_path is the path to the French stop words file.
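The ${name:default} placeholders in the definitions above use the default when the named property is not set. The following Python sketch models that lookup. It is only an illustration under stated assumptions: the comma-separated key=value parsing of the CONFIG value is assumed from the SQL example above, the helper names parse_config and resolve are hypothetical, and /custom/stopwords_en.txt is a made-up path.

```python
import re

# Matches ${name} or ${name:default}.
PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def parse_config(config):
    """Split 'k1=v1, k2=v2' into a dict (assumed layout of the CONFIG column)."""
    props = {}
    for pair in config.split(","):
        if "=" in pair:
            key, value = pair.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def resolve(text, props):
    """Replace each placeholder with props[name], or with its default if name is unset."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(name)
    return PLACEHOLDER.sub(substitute, text)


words_attr = "${stopwords_en:../../common/stopwords.txt}"

# No override: the default path after the colon is used.
print(resolve(words_attr, {}))
# Override supplied through a CONFIG-style string (hypothetical path).
print(resolve(words_attr, parse_config("stopwords_en=/custom/stopwords_en.txt, other=value")))
# The field type name resolves the same way.
print(resolve("wc_text_${lang:en}", {}))
```

The sketch only shows why an entry in the CONFIG column takes precedence over the path written in schema-field-types.xml; the actual resolution is performed by the search server.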
Procedure
- In the storefront, search for the string "can". You should see a result similar to the following:
- Copy the solrhome/MC_masterCatalogID/locale/CatalogEntry/conf/stopwords.txt file to the directory workspace_dir\search-config-ext\src\index\managed-solr\config\v3\common. Open the file in an editor.
- The file contains words such as "will" and "was" that help filter out unhelpful clauses in search queries. As an example that will be easy to test, add the word "can" at the bottom of the file. If you have copied the default file, the result should look something like the following:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# a couple of test stopwords to test that the words are really being
# configured from this file:
stopworda
stopwordb

# Standard english stop words taken from Lucene's StopAnalyzer
a
an
and
are
as
at
be
but
by
for
if
in
into
is
it
no
not
of
on
or
such
that
the
their
then
there
these
they
this
to
was
will
with
can
- Add the value stopwords=stopwords_file_path to the CONFIG column of the SRCHCONFEXT database table, where stopwords_file_path is the relative path to the file discoverable in the container. The following SQL command will insert the data:
update SRCHCONFEXT set CONFIG='stopwords=workspace_dir\search-config-ext\src\index\managed-solr\config\v3\common\stopwords.txt, original_config' where srchconfext_id=1;
- Restart the HCL Commerce Search server.
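Before restarting, it can help to confirm that the edited file really contains the new word. The following Python sketch reads a stopwords.txt file, skipping blank lines and lines that start with #, and appends a word only if it is missing. It is a convenience script, not part of HCL Commerce: the helper names load_stop_words and add_stop_word are hypothetical, the parsing is a simplified reading of the file format shown above, and the demonstration uses a temporary file rather than a real Solr directory.

```python
import os
import tempfile


def load_stop_words(path):
    """Return the stop words in the file, skipping blank lines and # comments."""
    words = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            entry = line.strip()
            if entry and not entry.startswith("#"):
                words.append(entry)
    return words


def add_stop_word(path, word):
    """Append word on its own line unless it is already listed; return True if added."""
    if word in load_stop_words(path):
        return False
    with open(path, "a", encoding="utf-8") as handle:
        # Leading newline guards against a file that lacks a trailing newline.
        handle.write("\n" + word + "\n")
    return True


# Demonstration on a throwaway copy instead of the real configuration file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "stopwords.txt")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("# sample stop words\nwas\nwill\nwith")
    print(add_stop_word(sample, "can"))   # first call appends the word
    print(add_stop_word(sample, "can"))   # second call finds it already listed
    print(load_stop_words(sample))
```

To use it on your own copy, pass the path of the stopwords.txt file you placed under search-config-ext to load_stop_words and check that "can" appears in the returned list.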