eSoundex analyzer

The eSoundex, or Extended Soundex, analyzer uses the Soundex algorithm to convert words into codes based on the English pronunciation of their consonants.

Vowel sounds are not included unless the vowel is the first letter of the word. The eSoundex analyzer is similar to the Soundex analyzer, except that its codes can have fewer or more than four characters, depending on the length of the word. The eSoundex analyzer is useful if you want to search text based on how words sound. Because the text is converted to codes, you cannot perform proximity and range searches or specify a thesaurus.
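The following Python sketch illustrates the coding step described above. It is not the Informix implementation. It assumes the standard American Soundex digit groups. It treats a, e, i, o, u, and y as letters that are dropped but let a repeated digit be written again. It drops h and w entirely, and it does not compare the first letter's digit with the next digit. These choices reproduce the codes shown in the examples in this topic, but they remain assumptions. The helper fixed_length_code is hypothetical: it pads or truncates the code to four characters to contrast a fixed-length code with a variable-length one.

```python
# Standard American Soundex digit groups (an assumption, not taken from Informix).
DIGIT_GROUPS = {
    "1": "bfpv",
    "2": "cgjkqsxz",
    "3": "dt",
    "4": "l",
    "5": "mn",
    "6": "r",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in DIGIT_GROUPS.items() for letter in letters
}
# Assumption: these letters are dropped but allow a repeated digit to be written again.
SEPARATORS = set("aeiouy")


def esoundex(word: str) -> str:
    """Return a variable-length Soundex-style code for a word of ASCII letters."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = [letters[0]]  # the first letter is always kept, even a vowel
    last_digit = ""      # assumption: the first letter's digit is not compared
    for c in letters[1:]:
        digit = LETTER_TO_DIGIT.get(c)
        if digit is not None:
            if digit != last_digit:  # collapse adjacent letters with the same digit
                code.append(digit)
            last_digit = digit
        elif c in SEPARATORS:
            last_digit = ""          # a vowel separates repeated digits
        # h and w fall through: dropped, and last_digit is unchanged
    return "".join(code)


def fixed_length_code(word: str, length: int = 4) -> str:
    """Hypothetical fixed-length variant: pad with zeros or truncate."""
    return esoundex(word).ljust(length, "0")[:length]


for w in ["accept", "acceptable", "acceptance"]:
    print(w, esoundex(w), fixed_length_code(w))
# accept a213 a213
# acceptable a21314 a213
# acceptance a21352 a213
```

With the fixed-length variant, all three words share the code a213. The variable-length codes keep them apart.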

The eSoundex analyzer processes text characters in the following ways:

Stopwords are not indexed.
Numbers and special characters are ignored.
The colon (:) character is treated as whitespace, so the characters on either side of it are treated as separate words.
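The following is a minimal Python sketch of the processing rules above. It makes the following assumptions. Words are split on whitespace and colons. Every character other than the letters a to z is removed from each word. Words that appear in a stopword set are skipped. The STOPWORDS set is a hypothetical stand-in, not the analyzer's actual stopword list. The order in which the real analyzer applies these rules is also an assumption.

```python
import re

# Hypothetical stopword subset for illustration only; not the analyzer's real list.
STOPWORDS = {"a", "an", "and", "of", "the", "to"}


def words_to_encode(text: str) -> list[str]:
    """Split text into the words an eSoundex-style analyzer would convert to codes."""
    words = []
    # Whitespace and colons both end a word.
    for raw in re.split(r"[\s:]+", text):
        # Numbers and special characters are removed rather than treated as breaks,
        # so "XY&Z" becomes "xyz" and "xyz@example.com" stays one word.
        word = re.sub(r"[^a-z]", "", raw.lower())
        if word and word not in STOPWORDS:  # skip empty strings and stopwords
            words.append(word)
    return words


print(words_to_encode("c:/informix"))       # ['c', 'informix']
print(words_to_encode("XY&Z Corporation"))  # ['xyz', 'corporation']
```

Each returned word would then be converted to its code. Because the letters are lowercased first, the stopword check in this sketch is case-insensitive.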

Examples

In these examples, the input string is shown on the first line and the resulting tokens are shown on the second line, each surrounded by square brackets.

In the following example, neither occurrence of "the" is converted to a token because "the" is a stopword. The remaining words are converted to eSoundex codes that begin with the first letter of the word:

The Quick Brown Fox Jumped Over The Lazy Dog
[q2] [b65] [f2] [j513] [o16] [l2] [d2]

In the following example, the colon is treated as whitespace and the slash (/) is ignored:

c:/informix 
[c] [i51652]

In the following example, the ampersand is ignored:

XY&Z Corporation 
[x2] [c61635]

In the following example, the at sign (@) and the period are ignored, so the e-mail address is treated as one word:

xyz@example.com
[x2251425]

In the following example, numbers are ignored:

1abc 12abc abc1 abc12
[a12] [a12] [a12] [a12]

In the following examples, three words with the same stem have different codes because eSoundex codes are not limited to four characters:

accept
[a213]
acceptable
[a21314]
acceptance
[a21352]