Stopword analyzer
The Stopword analyzer removes stopwords and converts text to lowercase tokens that contain only alphabetic characters.
Use the Stopword analyzer when you want to exclude stopwords from the index and ignore non-alphabetic characters.
The Stopword analyzer processes text in the following ways:
- Each word is processed into a separate token.
- Alphabetic characters are converted to lowercase.
- Numeric and special characters are treated as whitespace, so they separate tokens and never appear in them.
- Stopwords are not indexed.
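The processing rules above can be approximated with a short Python sketch. This is an illustration only, not the analyzer's actual implementation: the function name stopword_analyze and the STOPWORDS set are hypothetical, the stopword list is a small sample rather than the analyzer's real list, and the sketch assumes that only ASCII letters count as alphabetic characters.

```python
import re

# Hypothetical sample list for illustration; the analyzer's real
# stopword list is not specified in this document.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

def stopword_analyze(text, stopwords=STOPWORDS):
    """Return lowercase alphabetic tokens with stopwords removed."""
    # Each run of letters becomes a candidate token; digits and special
    # characters act as separators, like whitespace.
    # Assumption: only ASCII letters are treated as alphabetic.
    words = re.findall(r"[A-Za-z]+", text)
    # Lowercase each token, then drop any token that is a stopword.
    tokens = [word.lower() for word in words]
    return [token for token in tokens if token not in stopwords]

print(stopword_analyze("The Quick Brown Fox Jumped Over The Lazy Dog"))
```

Matching runs of letters, rather than splitting on whitespace first, is what makes a digit or symbol inside a word behave like a separator.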
Examples
In these examples, the input string is shown on the first line and the resulting tokens are shown on the second line, each surrounded by square brackets.
In the following example, stopwords are removed and the letters are converted to lowercase:
The Quick Brown Fox Jumped Over The Lazy Dog
[quick] [brown] [fox] [jumped] [over] [lazy] [dog]
In the following example, the @ symbol and period are treated as white spaces:
xyz@example.com
[xyz] [example] [com]
In the following example, digits are dropped and only the alphabetic characters remain in the tokens:
1abc 12abc abc1 abc12
[abc] [abc] [abc] [abc]