Troubleshooting: Slow Elasticsearch-based Query API
Troubleshoot and resolve slow response times for the Elasticsearch-based Query API.
Problem
Search requests to the Elasticsearch-based Query API take longer than expected.
Delays can occur in different components of the search architecture, including:
- Elasticsearch
- The Query service
- Supporting services, such as configuration lookup services
Solution
Use the following steps to identify the source of the delay and resolve it.
- Locate the bottleneck
  - Use monitoring tools, such as Grafana, to find where the slowdown occurs. Check for CPU saturation, high heap memory usage, and latency spikes, and determine whether the delay is in Elasticsearch, the Query service, or another component.
- Enable DEBUG logging in the Query service
  - If monitoring shows that the delay is in the Query service, enable DEBUG logging and reproduce the slow search. In the logs, review the elapsed processing time of each sub-stage and investigate any stage that takes longer than 200 ms.
- Enable TRACE logging to capture the Elasticsearch query
  - After you identify the slow operation, enable TRACE logging in the Query service and run a single-user test that calls the Search API. The TRACE log records the generated Elasticsearch query and its "explain" setting. Run this query directly against Elasticsearch to analyze its performance there.
- Determine the corrective action
  - If Elasticsearch consumes most of the processing time, analyze and optimize the Elasticsearch query.
  - If the delay is within the Query service, identify the responsible logic and apply a fix.
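To complement the dashboards used when locating the bottleneck, you can also read CPU and heap figures directly from Elasticsearch. The following Python sketch calls the Elasticsearch nodes stats API ("_nodes/stats/jvm,os") and flags nodes above a threshold. The 90% CPU and 85% heap thresholds, the function names, and the cluster address are illustrative assumptions, not product defaults, and authentication is omitted.

```python
# Sketch: flag Elasticsearch nodes that show CPU or heap pressure.
import json
import urllib.request

# Assumed thresholds (not product defaults); tune them to your own baseline.
CPU_LIMIT_PERCENT = 90
HEAP_LIMIT_PERCENT = 85


def fetch_node_stats(base_url: str) -> dict:
    """Call the Elasticsearch nodes stats API for the jvm and os metrics."""
    # Add authentication and TLS settings here if your cluster requires them.
    with urllib.request.urlopen(f"{base_url}/_nodes/stats/jvm,os") as resp:
        return json.load(resp)


def find_pressured_nodes(stats: dict,
                         cpu_limit: int = CPU_LIMIT_PERCENT,
                         heap_limit: int = HEAP_LIMIT_PERCENT) -> list[str]:
    """Return one message for every node at or above the CPU or heap threshold."""
    findings = []
    for node in stats.get("nodes", {}).values():
        name = node.get("name", "?")
        cpu = node["os"]["cpu"]["percent"]
        heap = node["jvm"]["mem"]["heap_used_percent"]
        if cpu >= cpu_limit:
            findings.append(f"{name}: CPU at {cpu}%")
        if heap >= heap_limit:
            findings.append(f"{name}: heap at {heap}%")
    return findings
```

For example, pass the result of fetch_node_stats("http://localhost:9200") (a hypothetical address) to find_pressured_nodes and print each finding. Sustained high values on these nodes point toward Elasticsearch rather than the Query service.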
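For the DEBUG logging step, a small script can pick out the sub-stages that take longer than 200 ms. This sketch assumes a hypothetical log line format, "stage=<name> elapsedMs=<number>"; the actual DEBUG output of the Query service may look different, so adjust the regular expression to match your logs.

```python
# Sketch: list Query service sub-stages slower than 200 ms in a DEBUG log.
import re
from typing import Iterable

# Hypothetical log format; adapt this pattern to your real DEBUG lines.
STAGE_PATTERN = re.compile(r"stage=(?P<stage>\S+)\s+elapsedMs=(?P<ms>\d+(?:\.\d+)?)")
THRESHOLD_MS = 200.0  # the 200 ms guideline from this article


def slow_stages(lines: Iterable[str],
                threshold_ms: float = THRESHOLD_MS) -> list[tuple[str, float]]:
    """Return (stage, elapsed_ms) pairs above the threshold, slowest first."""
    slow = []
    for line in lines:
        match = STAGE_PATTERN.search(line)
        # Lines that do not match the assumed format are skipped.
        if match and float(match["ms"]) > threshold_ms:
            slow.append((match["stage"], float(match["ms"])))
    return sorted(slow, key=lambda item: item[1], reverse=True)
```

Because the function accepts any iterable of strings, you can pass an open log file directly, for example slow_stages(open("query-service.log")), where the file name is hypothetical.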
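Once the TRACE log gives you the generated Elasticsearch query, one way to analyze it is to rerun it with the Elasticsearch Search Profile API by adding "profile": true to the request body. The sketch below does that and lists the most expensive query components by time. The cluster address, index name, and helper names are assumptions. A parent component's time includes its children, so the root query usually ranks first; look further down the list for the child that accounts for most of it.

```python
# Sketch: rerun a captured query with profiling and rank its components.
import copy
import json
import urllib.request


def with_profiling(query_body: dict) -> dict:
    """Return a copy of the captured search body with profiling enabled."""
    body = copy.deepcopy(query_body)  # leave the captured query untouched
    body["profile"] = True
    return body


def run_search(base_url: str, index: str, body: dict) -> dict:
    """POST the body to the _search endpoint and return the parsed response."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def slowest_components(response: dict, top_n: int = 5) -> list[tuple[str, float]]:
    """Walk every shard's query profile tree and return the costliest nodes in ms."""
    results = []

    def walk(node: dict) -> None:
        label = f'{node["type"]}: {node["description"]}'
        results.append((label, node["time_in_nanos"] / 1e6))
        for child in node.get("children", []):
            walk(child)

    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for root in search.get("query", []):
                walk(root)
    results.sort(key=lambda item: item[1], reverse=True)
    return results[:top_n]
```

Keep the "explain" setting as it appears in the captured query so that the profiled request matches what the Query service actually sends.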
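To make the corrective-action decision more concrete, you can compare the "took" value that Elasticsearch reports in every search response (its own processing time, in milliseconds) with the end-to-end time measured at the Query service. This sketch treats "most of the processing time" as more than half of the total, which is an assumption rather than a documented rule. The remainder covers network transfer and serialization as well as Query service logic.

```python
# Sketch: decide which component dominates a slow request's elapsed time.


def dominant_component(total_ms: float, es_took_ms: float, share: float = 0.5) -> str:
    """Return "elasticsearch" if its "took" time exceeds `share` of the total.

    The 0.5 default is an assumed reading of "most of the processing time".
    """
    if total_ms <= 0:
        raise ValueError("total_ms must be positive")
    if es_took_ms / total_ms > share:
        return "elasticsearch"  # analyze and optimize the Elasticsearch query
    return "query-service"  # look for the responsible logic in the Query service
```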