Troubleshooting: CAS Ingest with 500+ stores loses Natural Language Processing (NLP) data
Catalog Asset Store Ingest with 500+ stores loses NLP data due to the Ingest process running out of scroll space in Elasticsearch.
Problem
"connector": "auth.reindex.cas",
"location": "NLP Service, Parse NLP Messages",
"run": "i-4372abc7-3b10-415a-9684-6cd8d018535d",
"type": "log",
"attributes": "[cas.index.model:true, invokehttp.tx.id:759d7f49-c524-41d7-8c5d-bbdcd8a66a2f, transfer-encoding:chunked,
master.catalog:11001, split.id:, logger.message.severity:E, param.locale:, invokehttp.request.duration:5,
uuid:c42fad00-350e-46e1-bace-ee05137d8804, es.pageSize:2000, invokehttp.request.url:http://elasticsearch-master.elastic.svc.cluster.local:9200/_search/scroll, path:./,
index.scroll.uri:_search/scroll, environment.namespace:demoqa, content-type:application/json; charset=UTF-8, launch.connector.name:,
stage.start.time:1696389007635, invokehttp.response.url:http://elasticsearch-master.elastic.svc.cluster.local:9200/_search/scroll,
logger.message.text:Exceeded maximum number of retries when sending NLP data to Elasticsearch, cache.namespace:demoqaauth,
s2s.address:_gateway:31756, es.bucketSize:2000, invokehttp.status.code:404, flow.name:NLPService, mime.type:application/json;
charset=UTF-8, invokehttp.response.body:{\"error\":{\"root_cause\":[{\"type\":\"search_context_missing_exception\",\"reason\":\"No search context found for id [16499]\"},
{\"type\":\"search_context_missing_exception\",\"reason\":\"No search context found for id [11639]\"},
{\"type\":\"search_context_missing_exception\",\", param.storeId:12001, cache.name:services/cache/nifi/NLP,
param.assetStoreId:12001, stage.name:NLP Service, Parse NLP Messages, time.id:202310031507, filename:0c2ab2fa-0c34-4e7d-8c09-e08e60fad04c,
environment.name:auth, connector.name:auth.reindex.cas, flow.NRT:false, run.id:i-4372abc7-3b10-415a-9684-6cd8d018535d,
param.allStoreIds:11,21,31,715842384,715842385,715842386,715842387,715842388,715842389,715842390,715842391,715842392,715842393,715842884,
715842885,715842886,715842887,715842888,715842889,715842890,715842891,715842892,715842893,715842894,715842895,715842896,715842897,715842898,
715842899,715842900,715842901,715842902,715842903
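This error indicates that the Ingest process is running out of scroll space, a condition that causes Elasticsearch to attempt to free older or obsolete buffers more quickly. The error occurs when Elasticsearch cannot do this quickly enough to keep up with the incoming data rate. The key indicators in the log are the 404 status code (invokehttp.status.code:404) on the _search/scroll request and the search_context_missing_exception entries in the response body.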
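To check a larger log export for the same condition, the signature described above can be matched programmatically. The following is a minimal sketch, not part of the product: the function name find_lost_scroll_contexts is hypothetical, and it assumes the log text contains the same attribute and error strings as the excerpt above.

```python
import re
from typing import List

# Signature strings copied from the log excerpt in this topic.
MISSING_CONTEXT = "search_context_missing_exception"
STATUS_404 = "invokehttp.status.code:404"

# Matches messages such as "No search context found for id [16499]".
CONTEXT_ID_PATTERN = re.compile(r"No search context found for id \[(\d+)\]")


def find_lost_scroll_contexts(log_text: str) -> List[str]:
    """Return the scroll context ids reported missing, in order of appearance.

    Returns an empty list when the text does not carry both parts of the
    error signature (the 404 status and the exception type).
    """
    if MISSING_CONTEXT not in log_text or STATUS_404 not in log_text:
        return []
    return CONTEXT_ID_PATTERN.findall(log_text)
```

A non-empty result suggests that scroll contexts were dropped before the Ingest process finished reading them, which is the condition that the Solution section addresses.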
Solution
You can eliminate the error by raising the value of scroll.duration. In the Ingest service, this value is the length of time that the search engine retains search result data for scrolling. For more information about this variable, including how to set it, see the Fetch data from a database with SQL scroll section of Optimizing index build and overall flow.
See the Elasticsearch documentation for full details.
The scroll context pool in Elasticsearch has a limited size (500 by default). When the pool is full, the least recently used (LRU) scroll context is dropped even though it has not yet expired. In normal situations, each scroll (page) should take no more than one minute to process. If processing time approaches one minute per page, the page is probably too large, and you should reduce the page size instead. However, if you need to keep the page size and increase the scroll retention time, it is recommended that you also increase the scroll context pool size in Elasticsearch, in addition to increasing the scroll timeout parameter. For information on how to do this, see the scroll-search-results topic in the Elasticsearch documentation.
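As an illustration of the pool size change, the following minimal sketch builds a cluster settings update for the Elasticsearch search.max_open_scroll_context setting. It makes several assumptions: the host is the one that appears in the log excerpt, the cluster accepts unauthenticated HTTP requests, the function names are hypothetical, and the value 1000 in the usage note is only an example, not a recommendation. Confirm the setting name and behavior against the Elasticsearch documentation for your version before applying it.

```python
import json
import urllib.request

# Assumption: host taken from the log excerpt; adjust for your cluster.
ES_URL = "http://elasticsearch-master.elastic.svc.cluster.local:9200"


def build_scroll_pool_request(
    max_open_contexts: int, es_url: str = ES_URL
) -> urllib.request.Request:
    """Build (but do not send) a cluster settings update for the scroll pool."""
    if max_open_contexts < 1:
        raise ValueError("max_open_contexts must be positive")
    # "persistent" keeps the setting across full cluster restarts.
    body = {"persistent": {"search.max_open_scroll_context": max_open_contexts}}
    return urllib.request.Request(
        url=f"{es_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def apply_scroll_pool_size(max_open_contexts: int, es_url: str = ES_URL) -> dict:
    """Send the update and return the parsed Elasticsearch response."""
    request = build_scroll_pool_request(max_open_contexts, es_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, apply_scroll_pool_size(1000) would request a pool of 1000 open scroll contexts. Separating the request builder from the sender makes it possible to inspect the payload before anything is sent to the cluster.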