How Portal Search handles special characters when indexing | HCL Digital Experience
Learn how Portal Search handles special characters during indexing.
Portal Search indexes words that are composed of consecutive literals, that is, letters, digits, and special characters. The special characters include the following:
- The hash or pound sign (#).
- The percent sign (%).
- The plus sign (+).
- The asterisk (*).
During indexing, special characters are handled as follows:
- Blank or white space; this includes the tab
  - Blanks separate words and are not indexed. Example: The string key board is indexed as two separate words, key and board.
- Line break or new line
  - Line breaks separate words and are not indexed unless they are preceded by a dash (-). Examples:
    - The string key, followed by a line break and board, is indexed as two separate words, key and board.
    - The string key-, followed by a line break and board, is indexed as one word, keyboard.
- Dot or sentence end period (.) and comma (,)
  - Dots and commas separate words and are not indexed, unless they are both preceded and followed by a letter or digit. Example: The string www.ibm.com is indexed as www.ibm.com and not as three separate words.
- Question mark (?) and exclamation mark (!)
  - Question marks and exclamation marks separate words and are not indexed unless they are followed by a letter.
- Other punctuation: ( ) { } [ ] < > ; : / \ | " _ -
  - These characters separate words and are not indexed.
- Other characters
  - All other characters are removed from the strings in which they appear but do not separate words.
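The rules above can be read as a small tokenizer. The following Python sketch is only an illustration of the documented behavior, not Portal Search's actual implementation. The function name `tokenize` and the constant names are hypothetical. It also assumes that a question mark or exclamation mark followed by a letter stays inside the word, and that a dash directly before a line break is dropped so that the two word parts are joined.

```python
# Illustrative sketch of the indexing rules described above.
# Not the Portal Search implementation; all names here are hypothetical.

# Special characters that can be part of an indexed word.
INDEXED_SPECIALS = set("#%+*")
# "Other punctuation": these characters separate words and are not indexed.
SEPARATORS = set('(){}[]<>;:/\\|"_-')


def tokenize(text: str) -> list[str]:
    words: list[str] = []
    current: list[str] = []

    def flush() -> None:
        # Close the word being built, if any.
        if current:
            words.append("".join(current))
            current.clear()

    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < n else ""

        if ch == "-" and nxt in ("\n", "\r"):
            # Dash before a line break: drop both and join the word parts.
            i += 1
            while i < n and text[i] in "\r\n":
                i += 1
            continue

        if ch.isalnum() or ch in INDEXED_SPECIALS:
            current.append(ch)
        elif ch in ".,":
            # Kept only when surrounded by letters or digits.
            if prev.isalnum() and nxt.isalnum():
                current.append(ch)
            else:
                flush()
        elif ch in "?!":
            # Assumption: kept inside the word when followed by a letter.
            if nxt.isalpha():
                current.append(ch)
            else:
                flush()
        elif ch.isspace() or ch in SEPARATORS:
            flush()
        # Any other character is removed without splitting the word.
        i += 1

    flush()
    return words
```

For example, under these assumptions `tokenize("key-\nboard")` yields a single word, while `tokenize("key\nboard")` yields two.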
Notes:
- All characters that split words are discarded during indexing and searching.
- The previous statements apply to indexing. However, in a search query all characters that can be part of the search syntax are treated in that capacity and not as part of the search query. These are the plus (+) and minus (-) signs, double quotation marks ("), and the asterisk wildcard character (*). If users want to include such characters in their search query, they must enclose them in double quotation marks. For example, "+hello" searches for the string +hello; "*Hello*" searches for the string *Hello*.
- The less than (<) and greater than (>) symbols are special HTML characters that Search cannot handle.
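The quoting rule in the second note can be applied on the client side before a query is submitted. The following Python sketch is a hypothetical helper, not part of any Portal Search API. It wraps a term in double quotation marks when the term contains a plus sign, minus sign, or asterisk. The source does not say how to search for a term that itself contains a double quotation mark, so the sketch rejects such input rather than guess.

```python
# Hypothetical helper illustrating the quoting rule from the notes above.
# It is not part of Portal Search.

# Query syntax characters that must be quoted to be searched literally.
SYNTAX_CHARS = set("+-*")


def quote_if_needed(term: str) -> str:
    """Wrap term in double quotation marks if it contains syntax characters."""
    if '"' in term:
        # The documentation does not cover this case, so it is not handled.
        raise ValueError("embedded double quotation marks are not supported")
    if any(ch in SYNTAX_CHARS for ch in term):
        return f'"{term}"'
    return term
```

With this helper, `quote_if_needed("+hello")` produces `"+hello"` including the quotation marks, and a plain term such as `hello` is returned unchanged.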