HCL Commerce Version 9.1.20.0 or later

Troubleshooting: Slow Elasticsearch-based Query API

Troubleshoot and resolve slow response times for the Elasticsearch-based Query API.

Problem

Search requests to the Elasticsearch-based Query API take longer than expected. Delays can occur in different components of the search architecture, including:
  • Elasticsearch
  • The Query service
  • Supporting services (such as configuration lookup services)
When you investigate slow search responses, you might observe high CPU usage, high heap memory usage, increased garbage collection activity, or elevated response times in the Query service logs.

Solution

Use the following process to identify the source of the delay and determine the corrective action.

Locate the bottleneck
Use monitoring tools, such as Grafana, to determine where the slowdown occurs. Check for CPU saturation, high heap memory usage, and latency spikes. Determine whether the delay is in Elasticsearch, the Query service, or another component.
Enable DEBUG trace in the Query service
If monitoring indicates that the delay is in the Query service, enable a DEBUG trace and reproduce the slow search scenario. Review the logs for the elapsed processing time of each sub-stage. Investigate any processing stage that exceeds 200 ms.
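As a minimal sketch of this review step, the following Python script scans trace output for stages that exceed the 200 ms threshold. The log line format (`stage=<name> elapsed=<n>ms`) and the stage names are hypothetical assumptions for illustration; the actual Query service trace format may differ, so adjust the regular expression to match your logs.

```python
import re

# Hypothetical line format, assumed for illustration only:
#   "... stage=<name> elapsed=<milliseconds>ms"
# Adjust this pattern to match the actual Query service trace output.
STAGE_PATTERN = re.compile(r"stage=(?P<stage>\S+)\s+elapsed=(?P<ms>\d+)ms")

THRESHOLD_MS = 200  # investigation threshold from the step above


def find_slow_stages(lines, threshold_ms=THRESHOLD_MS):
    """Return (stage, elapsed_ms) pairs that exceed threshold_ms, slowest first."""
    slow = []
    for line in lines:
        match = STAGE_PATTERN.search(line)
        if match is None:
            continue  # not a stage-timing line
        elapsed = int(match.group("ms"))
        if elapsed > threshold_ms:
            slow.append((match.group("stage"), elapsed))
    return sorted(slow, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Sample lines with made-up stage names.
    sample = [
        "DEBUG stage=resolveProfile elapsed=12ms",
        "DEBUG stage=buildQuery elapsed=340ms",
        "DEBUG stage=executeSearch elapsed=215ms",
    ]
    for stage, elapsed in find_slow_stages(sample):
        print(f"{stage}: {elapsed} ms")
```

Sorting the results slowest first puts the most likely bottleneck at the top of the list.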
Enable TRACE to capture the Elasticsearch query
After you identify the slow operation, enable TRACE logging in the Query service and run a single-user test that invokes the Search API. The TRACE log captures the generated Elasticsearch query and the "explain" setting. Use this query to analyze performance in Elasticsearch.
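One way to analyze the captured query is the Elasticsearch Profile API, which returns per-component timings when the search request body includes `"profile": true`. The sketch below is a minimal example under stated assumptions: `captured_query` is a placeholder, not an actual query generated by the Query service, and the helper function names are illustrative. Sending the request to your cluster is left out.

```python
import copy


def with_profiling(query_body):
    """Return a copy of the search body with Elasticsearch profiling enabled."""
    body = copy.deepcopy(query_body)
    body["profile"] = True
    return body


def slowest_query_components(response, top_n=5):
    """List (type, description, milliseconds) for the top-level profiled
    query components across all shards, slowest first."""
    components = []
    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for node in search.get("query", []):
                millis = node["time_in_nanos"] / 1_000_000
                components.append((node["type"], node["description"], millis))
    components.sort(key=lambda item: item[2], reverse=True)
    return components[:top_n]


if __name__ == "__main__":
    # Placeholder query for illustration only; paste the query from the
    # TRACE log here instead.
    captured_query = {"query": {"match": {"name": "shirt"}}}
    print(with_profiling(captured_query))
```

Submit the profiled body to the `_search` endpoint of the relevant index, then pass the JSON response to `slowest_query_components` to see which query clauses take the most time.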
Determine the corrective action
  • If Elasticsearch consumes most of the processing time, analyze and optimize the Elasticsearch query.
  • If the delay is within the Query service, identify the responsible logic and apply a fix.
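To make this decision concrete, you can compare the `took` value that Elasticsearch returns in each search response (the time, in milliseconds, that Elasticsearch spent executing the request) with the total elapsed time measured at the Query service. The following is a minimal sketch; the function name and the 50% cutoff for "most of the processing time" are assumptions chosen for illustration, not values defined by the product.

```python
def locate_delay(total_ms, elasticsearch_took_ms, cutoff=0.5):
    """Classify where most of the request time was spent.

    total_ms: elapsed time for the whole request at the Query service.
    elasticsearch_took_ms: the "took" value from the Elasticsearch response.
    cutoff: assumed share of total time above which Elasticsearch is
            treated as the main contributor.
    """
    if total_ms <= 0:
        raise ValueError("total_ms must be positive")
    share = elasticsearch_took_ms / total_ms
    return "elasticsearch" if share > cutoff else "query-service"


if __name__ == "__main__":
    print(locate_delay(total_ms=800, elasticsearch_took_ms=650))
    print(locate_delay(total_ms=800, elasticsearch_took_ms=90))
```

Because `took` does not include network transfer or response serialization time, treat the result as an approximate indicator and confirm it against your monitoring data.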