Attachment full-text indexing with conversion filters
When conversion filters are used for attachment full-text indexing, the Domino® server and the Notes® standard client use the open source Apache Tika 2.4.1 conversion filters to extract text from attachments for full-text searches. With these filters, you can:
- Filter a wide range of formats.
- Filter ASCII text files that contain UTF-8 encoding.
Tika runs as a Java™ process when you start the Notes® standard client or Domino®. The process calls tika-server.jar, which starts an HTTP server that listens for text extraction requests on port 9998 by default. If you upgrade the Notes® standard client or Domino® to version 10 or later, full-text indexes that previously used KeyView filters to extract text are rebuilt using the Tika filters.
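The following Java sketch shows what a text extraction request to a Tika server can look like, for example to check by hand that the Tika process is answering on its port. It assumes the standard Apache Tika server REST endpoint `PUT /tika` with an `Accept: text/plain` header. How Domino itself formats its requests is not described here, and the class name `TikaExtractSketch` is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical diagnostic client; not part of Domino or Notes.
public class TikaExtractSketch {
    // Default port the Tika server listens on.
    static final int DEFAULT_TIKA_PORT = 9998;

    // Builds a PUT /tika request (standard Tika server endpoint, assumed)
    // that sends the raw document bytes and asks for plain text back.
    static HttpRequest buildExtractRequest(String host, int port, byte[] document) {
        return HttpRequest.newBuilder(URI.create("http://" + host + ":" + port + "/tika"))
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path of an attachment file to extract text from
        byte[] doc = Files.readAllBytes(Path.of(args[0]));
        HttpRequest request = buildExtractRequest("localhost", DEFAULT_TIKA_PORT, doc);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
        // Print the HTTP status and the text Tika extracted.
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body());
    }
}
```

With Java 11 or later, `java TikaExtractSketch.java sample.pdf` runs the sketch directly from source and prints the status code followed by the extracted text.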
For the list of file formats supported by Tika 2.4.1, see the Apache Tika web site.
Be aware that full-text searches sometimes don't return the expected results for some documents with PDF attachments: the results can include false negatives or false positives. For a workaround, see the article Full Text Index: some PDFs are not tokenized correctly using Tika default settings on the HCL Support site.
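To have Tika listen on a port other than the default 9998, add the TIKA_PORT setting to the notes.ini file. For example:
TIKA_PORT=9997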
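As a rough illustration of how the port is chosen, the following Java sketch scans notes.ini lines for TIKA_PORT and falls back to the default port 9998 when the setting is absent or is not a valid port number. The class and method names are hypothetical, and the case-insensitive key match and the handling of invalid values are assumptions, not documented Domino behavior.

```java
import java.util.List;

// Hypothetical helper; Domino's own parsing of notes.ini may differ.
public class TikaPortSketch {
    static final int DEFAULT_TIKA_PORT = 9998;

    // Returns the first valid TIKA_PORT value found in the notes.ini lines,
    // or 9998 if the setting is missing or invalid (assumed fallback).
    static int resolveTikaPort(List<String> notesIniLines) {
        for (String line : notesIniLines) {
            int eq = line.indexOf('=');
            if (eq < 0) continue; // not a KEY=VALUE line
            String key = line.substring(0, eq).trim();
            // Case-insensitive key match is an assumption.
            if (key.equalsIgnoreCase("TIKA_PORT")) {
                try {
                    int port = Integer.parseInt(line.substring(eq + 1).trim());
                    if (port > 0 && port <= 65535) return port;
                } catch (NumberFormatException e) {
                    // Not a number: keep looking, then use the default.
                }
            }
        }
        return DEFAULT_TIKA_PORT;
    }

    public static void main(String[] args) {
        System.out.println(resolveTikaPort(List.of("[Notes]", "TIKA_PORT=9997"))); // 9997
        System.out.println(resolveTikaPort(List.of("[Notes]")));                   // 9998
    }
}
```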
The Notes® basic client does not use Tika filters to index attachments in local databases. Users of the Notes® basic client can choose to index attachments in local databases, but only ASCII text attachments are indexed and searchable.