How Portal Search handles special characters when indexing
Portal Search indexes words that are composed of consecutive literals, that is, letters, digits, and special characters. This section describes how Portal Search handles special characters during indexing.
The special characters that can be part of a word include the following:
- The hash or pound sign (#)
- The percent sign (%)
- The plus sign (+)
- The asterisk (*)
During indexing, special characters are handled as follows:
- Blank or white space, including the tab
  Blanks separate words and are not indexed. Example: The string key board is indexed as two separate words: key and board.
- Line break or new line
  Line breaks separate words and are not indexed. If a line break is preceded by a dash (-), the dash and the line break are removed and the text on either side of them is indexed as one word. Examples:
  - The string key, followed by a line break and then board, is indexed as two separate words: key and board.
  - The string key-, followed by a line break and then board, is indexed as one word: keyboard.
- Dot or sentence-ending period (.) and comma (,)
  Dots and commas separate words and are not indexed, unless they are both preceded and followed by a letter or digit. Example: The string www.ibm.com is indexed as the single word www.ibm.com and not as three separate words.
- Question mark (?) and exclamation mark (!)
  Question marks and exclamation marks separate words and are not indexed, unless they are followed by a letter.
- Other punctuation: ( ) { } [ ] < > ; : / \ | " _ -
  These characters separate words and are not indexed.
- Other characters
  All other characters are removed from the strings in which they appear, but they do not separate words.
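The indexing rules above can be approximated by a small tokenizer. The following Python sketch is only an illustration, not the Portal Search implementation: the function name tokenize and the character sets are hypothetical, and it assumes that "letter or digit" corresponds to Python's str.isalnum(), that a dash joins words only when it is immediately followed by a line break, and that a question mark or exclamation mark followed by a letter is kept as part of the word.

```python
import re

# Special characters that can be part of a word (assumed from the list above).
WORD_SPECIALS = frozenset("#%+*")
# "Other punctuation" that separates words and is not indexed.
SEPARATORS = frozenset('(){}[]<>;:/\\|"_-')


def tokenize(text: str) -> list[str]:
    """Split text into index words following the rules above (hypothetical sketch)."""
    # A dash directly before a line break ("\n" or "\r\n") joins the two halves.
    text = re.sub(r"-\r?\n", "", text)
    words: list[str] = []
    current: list[str] = []

    def flush() -> None:
        # End the current word, if any, and record it.
        if current:
            words.append("".join(current))
            current.clear()

    n = len(text)
    for i, ch in enumerate(text):
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < n else ""
        if ch.isalnum() or ch in WORD_SPECIALS:
            current.append(ch)
        elif ch in ".,":
            # Kept only when surrounded by letters or digits, as in www.ibm.com.
            if prev_ch.isalnum() and next_ch.isalnum():
                current.append(ch)
            else:
                flush()
        elif ch in "?!":
            # Kept only when followed by a letter.
            if next_ch.isalpha():
                current.append(ch)
            else:
                flush()
        elif ch.isspace() or ch in SEPARATORS:
            # Blanks, tabs, line breaks, and other punctuation split words.
            flush()
        # Any other character is dropped without splitting the word.
    flush()
    return words


print(tokenize("Visit www.ibm.com, or search C++ and 100%."))
# prints ['Visit', 'www.ibm.com', 'or', 'search', 'C++', 'and', '100%']
```

Under these assumptions, tokenize("key-\nboard") yields the single word keyboard, and an apostrophe, which is not listed in any category, is dropped, so don't becomes dont.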
Note
- All characters that split words are discarded during indexing and searching.
- The previous statements apply to indexing. In a search query, however, all characters that can be part of the search syntax are interpreted as syntax and not as part of the search terms. These are the plus (+) and minus (-) signs, double quotation marks ("), and the asterisk wildcard character (*). To include such characters in a search query, users must enclose them in double quotation marks. For example, "+hello" searches for the string +hello, and "*Hello*" searches for the string *Hello*.
- Portal Search cannot handle the less than (<) and greater than (>) symbols, because they are special HTML characters.
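An application that builds search queries from user input might quote terms that contain query-syntax characters before submitting them. The Python sketch below is hypothetical: quote_term is not part of any Portal Search API, it assumes that enclosing the whole term in double quotation marks is sufficient, and it rejects terms that contain a double quotation mark because this section does not describe how to escape one.

```python
# Query-syntax characters that must be quoted to be searched literally.
QUERY_SYNTAX_CHARS = frozenset("+-*")


def quote_term(term: str) -> str:
    """Return term enclosed in double quotation marks if it contains query syntax."""
    if '"' in term:
        # No escape for a literal double quotation mark is described, so refuse.
        raise ValueError("terms containing a double quotation mark are not supported")
    if any(ch in QUERY_SYNTAX_CHARS for ch in term):
        return f'"{term}"'
    return term


for term in ["+hello", "*Hello*", "portal"]:
    print(quote_term(term))
# prints:
# "+hello"
# "*Hello*"
# portal
```

Terms without syntax characters are passed through unchanged, so ordinary words are not affected.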