Tuning knobs
Elasticsearch exposes several configuration settings that you can adjust to optimize performance and resource allocation. The following are some of the most important tuning knobs to consider.
-
- Heap Size
-
The heap size is one of the most critical tuning parameters for Elasticsearch. It determines the amount of memory allocated to Elasticsearch's JVM and affects various operations, including caching, indexing, and search. Note: It is recommended to allocate around 50% of available memory to the heap, up to a maximum of 30 GB. The rest should stay free because Elasticsearch also uses native memory for various caches and buffers in intra- and inter-pod communications.
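The sizing rule above can be sketched as a small helper. This is a minimal illustration, assuming memory is given in whole gigabytes; the function names are hypothetical and not part of any Elasticsearch tooling. The `-Xms` and `-Xmx` flags are the standard JVM options for the initial and maximum heap size.

```python
def recommended_heap_gb(total_ram_gb, cap_gb=30):
    """Return about half of the available RAM, capped at cap_gb (minimum 1 GB)."""
    half = total_ram_gb * 0.5
    return max(1, int(min(half, cap_gb)))


def jvm_heap_options(total_ram_gb):
    """Build matching -Xms/-Xmx flags so the heap is not resized at runtime."""
    heap = recommended_heap_gb(total_ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


if __name__ == "__main__":
    for ram in (16, 64, 128):
        print(ram, "GB RAM:", " ".join(jvm_heap_options(ram)))
```

Setting both flags to the same value is a common convention; whether these lines go into `jvm.options` or an environment variable depends on how your deployment is packaged.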
-
- Thread Pools
- Elasticsearch uses various thread pools for different operations,
such as indexing, searching, and merging. You can tune the thread
pool settings to control the number of threads allocated for each
type of operation and adjust the queue size for pending requests.
This can help balance the allocation of resources based on your
specific workload.
For more information, see Thread pools.
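As a rough illustration of how pool sizes relate to CPU count, the sketch below reproduces the default sizing formulas as I understand them from the Thread pools reference: the search pool uses `int((processors * 3) / 2) + 1` threads and the write pool uses one thread per allocated processor. Treat these formulas as assumptions that may differ between versions. The helper names are hypothetical; `thread_pool.write.queue_size` is the setting name used in `elasticsearch.yml`.

```python
def default_pool_sizes(allocated_processors):
    """Return the assumed default sizes of the search and write thread pools."""
    if allocated_processors < 1:
        raise ValueError("allocated_processors must be at least 1")
    return {
        "search": (allocated_processors * 3) // 2 + 1,
        "write": allocated_processors,
    }


def write_queue_setting(queue_size):
    """Build an elasticsearch.yml line that sets the write pool's queue size."""
    if queue_size < 1:
        raise ValueError("queue_size must be positive")
    return f"thread_pool.write.queue_size: {queue_size}"


if __name__ == "__main__":
    print(default_pool_sizes(8))
    print(write_queue_setting(2000))
```

A larger queue absorbs bursts but also delays the point at which clients see rejections, so it is worth changing only after observing rejected requests under real load.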
-
- Circuit Breakers
- Circuit breakers protect Elasticsearch against excessive memory
usage. You can configure circuit breaker settings to control how
much memory operations may use, so that Elasticsearch rejects a
request instead of failing with an out-of-memory error.
For more information, see Circuit breaker settings.
-
- Cache Size
-
Elasticsearch uses various caches, such as the field data and query cache, to improve search performance. You can adjust the cache size settings to optimize memory usage based on your query patterns and data size.
-
- Field Data Cache
- The field data cache is an in-memory cache that stores the field
values of indexed documents in a compressed and optimized format.
By keeping frequently accessed field values in memory, Elasticsearch
avoids loading them from disk for each query. This can improve
search performance considerably, especially for aggregations,
sorting, and scripting operations.
For more information, see Field data cache settings.
-
- Field Data Loading Circuit Breaker
- The field data loading circuit breaker protects against excessive
memory usage by field data caches. You can configure the
indices.breaker.fielddata.limit setting to
control the memory allocated for field data caches and prevent
out-of-memory errors.
For more information, see Field data circuit breaker.
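A minimal sketch of how the limit could be expressed: the first helper builds a request body intended for the cluster settings API (`PUT _cluster/settings`), and the second converts a percentage limit into bytes for a given heap. The helper names and the example percentage are assumptions for illustration; only the setting name `indices.breaker.fielddata.limit` comes from the text above.

```python
def fielddata_breaker_settings(limit_percent):
    """Build a cluster settings body that sets the field data breaker limit."""
    if not 0 < limit_percent <= 100:
        raise ValueError("limit_percent must be in the range 1..100")
    return {
        "persistent": {
            "indices.breaker.fielddata.limit": f"{limit_percent}%"
        }
    }


def breaker_limit_bytes(heap_bytes, limit_percent):
    """Convert a percentage-of-heap limit into an absolute byte count."""
    return heap_bytes * limit_percent // 100


if __name__ == "__main__":
    print(fielddata_breaker_settings(30))
    print(breaker_limit_bytes(8 * 1024**3, 30), "bytes of an 8 GiB heap")
```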
-
- Query Caching
- Elasticsearch supports query caching, which can improve query
performance by caching the results of frequently executed queries
that run in a filter context. To optimize performance, you can
enable or disable query caching and adjust the cache size.
For more information, see Node query cache settings.
-
- Hardware Resources
- Elasticsearch's performance is heavily influenced by the hardware resources available. Consider tuning the hardware configuration, such as CPU, memory, disk I/O, and network settings, to match your workload requirements and optimize performance.
-
- File Descriptors and Process Limits
- Elasticsearch is a resource-intensive application requiring sufficient file descriptors and process limits to function correctly. You can increase the maximum number of open file descriptors and adjust process limits to accommodate the needs of your Elasticsearch cluster.
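The sketch below checks the open file descriptor limit of the current process on a Unix-like system using Python's standard `resource` module. The threshold of 65535 is an assumption based on commonly cited Elasticsearch guidance; verify it against the documentation for your version. The function names are hypothetical.

```python
import resource

# Assumed minimum for the Elasticsearch user; confirm for your version.
RECOMMENDED_NOFILE = 65535


def nofile_limit_ok(soft_limit, required=RECOMMENDED_NOFILE):
    """Return True when the soft limit is unlimited or at least `required`."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= required


def current_nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = current_nofile_limits()
    print("soft:", soft, "hard:", hard, "ok:", nofile_limit_ok(soft))
```

Note that this reports the limits of the process running the script, which match those of Elasticsearch only if both run as the same user under the same service configuration.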
-
- Indexing Buffer Settings
- Elasticsearch uses memory buffers to stage data before it is written
to disk. You can tune the indexing buffer size to optimize indexing
performance. The
indices.memory.index_buffer_size setting
controls the total size of the buffer, which is shared across all
shards on a node.
For more information, see Indexing Buffer settings and Tune for indexing speed.
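To get a feel for the numbers, this sketch computes the effective indexing buffer for a node. It assumes the buffer is specified as a percentage of the heap, defaults to 10%, and has a 48 MB floor; these defaults are my understanding of the Indexing Buffer settings reference and should be confirmed for your version. The function name is hypothetical.

```python
MIB = 1024 * 1024


def effective_index_buffer_bytes(heap_bytes, percent=10, min_bytes=48 * MIB):
    """Return the node-wide indexing buffer: percent of heap, but at least min_bytes."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    return max(heap_bytes * percent // 100, min_bytes)


if __name__ == "__main__":
    for heap_gib in (1, 8, 30):
        size = effective_index_buffer_bytes(heap_gib * 1024 * MIB)
        print(f"{heap_gib} GiB heap -> {size // MIB} MiB indexing buffer")
```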
-
- Query-Time Filters
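- Elasticsearch lets you apply filters at query time, for example in the filter clause of a bool query. Filter clauses do not contribute to relevance scoring and their results can be cached, so using filters can improve query performance by reducing the amount of data that needs to be processed.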
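As an illustration, the helper below builds a search request body that combines a scored full-text clause with non-scoring filters in a `bool` query. The field names `message`, `status`, and `@timestamp` are hypothetical placeholders; substitute the fields from your own mapping.

```python
def filtered_search_body(text, status, since):
    """Build a bool query: `must` is scored, `filter` only narrows the result set."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"message": text}}],
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"@timestamp": {"gte": since}}},
                ],
            }
        }
    }


if __name__ == "__main__":
    import json

    print(json.dumps(filtered_search_body("timeout", "error", "now-1d"), indent=2))
```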
-
- Refresh and Flush Intervals
-
Elasticsearch periodically refreshes its index to make new data searchable. You can adjust the refresh interval to balance indexing performance and search latency.
Similarly, the flush interval controls how often Elasticsearch writes data from memory to disk. Adjusting these intervals can impact indexing throughput and resource usage. An Elasticsearch flush performs a Lucene commit and starts a new translog generation. Flushes are performed automatically in the background to ensure the translog does not grow too large, which would make replaying its operations take considerable time during recovery.
For more information, see Index Modules and Translog.
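A minimal sketch of adjusting the refresh interval: the helper builds a body for the update index settings API (`PUT /<index>/_settings`) using the `index.refresh_interval` setting. The function name is hypothetical, and treating `None` as "disable refresh" (the value `-1`) is a convention of this sketch only.

```python
def refresh_interval_settings(seconds):
    """Build an index settings body; pass None to disable periodic refresh."""
    if seconds is None:
        value = "-1"
    elif seconds < 1:
        raise ValueError("seconds must be at least 1, or None to disable")
    else:
        value = f"{seconds}s"
    return {"index": {"refresh_interval": value}}


if __name__ == "__main__":
    # A longer interval favors indexing throughput over search freshness.
    print(refresh_interval_settings(30))
    # Disabling refresh is sometimes done for the duration of a bulk load.
    print(refresh_interval_settings(None))
```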
-
- Translog Durability
-
The translog is a transaction log that ensures data durability in case of node failures. Adjust the translog durability settings to balance data safety and indexing performance. For more information, see Translog.
When a document is indexed or updated in Elasticsearch, it is first written to the translog before being written to the index. This allows Elasticsearch to recover the changes in case of node failures or restarts. The translog acts as a buffer, storing the changes temporarily until they are flushed to disk and become part of the index.
-
- Request Durability
- With request durability, every indexing or update request is synced to disk before a response is sent back to the client. This ensures that the changes are immediately durable but can impact performance due to the disk synchronization overhead.
-
- Translog Durability Settings
-
Elasticsearch provides configuration settings to control the durability of the translog. These settings include:
-
- translog.sync_interval
- Specifies the time interval between sync operations. Changes in the translog are periodically synced to disk based on this interval.
-
- translog.durability
- Controls the translog's durability level. It can be set to request or async to balance performance and durability.
-
By default, Elasticsearch uses the request durability mode, where the translog is synced to disk after every request before a response is returned. In the async durability mode, the changes are instead synced to disk periodically but not necessarily after each request, which trades some durability for better indexing performance. For more information, see Translog.
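The two settings can be combined in one index settings body, sketched below. The full setting names `index.translog.durability` and `index.translog.sync_interval` are my understanding of the Translog reference; the helper name and the example interval are assumptions for illustration.

```python
VALID_DURABILITY = ("request", "async")


def translog_settings(durability, sync_interval="5s"):
    """Build an index settings body for translog durability.

    sync_interval only matters when durability is "async".
    """
    if durability not in VALID_DURABILITY:
        raise ValueError(f"durability must be one of {VALID_DURABILITY}")
    translog = {"durability": durability}
    if durability == "async":
        translog["sync_interval"] = sync_interval
    return {"index": {"translog": translog}}


if __name__ == "__main__":
    print(translog_settings("request"))
    print(translog_settings("async", sync_interval="10s"))
```

With `async`, operations acknowledged since the last sync can be lost if a node fails, so this mode suits data that can be re-indexed from another source.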
-
-
- Aggregations
- Elasticsearch provides powerful aggregation capabilities, but
complex aggregations can impact performance. You can tune
aggregation settings, such as
search.max_buckets and
indices.breaker.total.limit, to control the
memory usage and limit the number of buckets aggregations
produce.
For more information, see Aggregations.
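To see how quickly nested aggregations approach the bucket limit, the sketch below computes a rough upper bound on the number of buckets. It assumes every parent bucket contains the full number of child buckets, which overestimates real responses, and it assumes a `search.max_buckets` value of 65536; check the actual value configured on your cluster. The function names are hypothetical.

```python
def estimate_buckets(bucket_counts):
    """Upper-bound the total buckets for nested aggregations.

    bucket_counts lists the buckets per level, outermost first.
    """
    total = 0
    per_level = 1
    for count in bucket_counts:
        per_level *= count  # buckets at this nesting level
        total += per_level
    return total


def fits_max_buckets(bucket_counts, max_buckets=65536):
    """Return True when the estimate stays within the assumed limit."""
    return estimate_buckets(bucket_counts) <= max_buckets


if __name__ == "__main__":
    # 100 terms, each with a daily histogram over one year.
    print(estimate_buckets([100, 365]), fits_max_buckets([100, 365]))
    # 1000 terms with the same histogram exceeds the assumed limit.
    print(estimate_buckets([1000, 365]), fits_max_buckets([1000, 365]))
```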
-
- Shard Size
- Each shard in Elasticsearch comes with some overhead, so having a large number of small shards can impact performance. It is essential to balance the number of shards and the size of each shard based on your data volume and hardware resources.
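A simple way to reason about this balance is to pick a target shard size and derive the number of primary shards from the expected data volume. The target of 30 GB used below is an assumption chosen for illustration, not a value from this document; pick a target that suits your hardware and query patterns.

```python
import math


def primary_shard_count(total_data_gb, target_shard_gb=30):
    """Return the number of primary shards needed to keep shards near the target size."""
    if total_data_gb < 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(total_data_gb / target_shard_gb))


if __name__ == "__main__":
    for volume in (10, 300, 301):
        print(f"{volume} GB -> {primary_shard_count(volume)} primary shard(s)")
```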
-
- Shard Allocation
- Shards are the basic units of data distribution in Elasticsearch. By default, Elasticsearch tries to distribute shards evenly across nodes. However, you can control shard allocation settings to ensure balanced resource usage and optimize cluster performance.
-
- Shard Routing
- Elasticsearch routes each document to a shard based on a hash of its routing value, which defaults to the document ID. You can also influence where the shards themselves are placed by customizing the shard allocation process using shard allocation filters and allocation awareness settings. This can help balance data distribution and improve cluster performance.
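The two mechanisms mentioned above can be sketched as settings bodies. The first is meant for the cluster settings API and the second for the update index settings API. The attribute names `rack_id` and `box_type` are hypothetical custom node attributes that would have to be defined on your nodes; the helper names are also assumptions.

```python
def awareness_settings(attributes):
    """Build a cluster settings body enabling shard allocation awareness."""
    if not attributes:
        raise ValueError("at least one attribute is required")
    return {
        "persistent": {
            "cluster.routing.allocation.awareness.attributes": ",".join(attributes)
        }
    }


def require_filter(attribute, value):
    """Build an index settings body that requires a node attribute value."""
    return {f"index.routing.allocation.require.{attribute}": value}


if __name__ == "__main__":
    print(awareness_settings(["rack_id"]))
    print(require_filter("box_type", "hot"))
```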
-
- Network Settings
- Adjusting network settings, such as TCP configurations, can impact the performance and responsiveness of Elasticsearch. To ensure efficient network communication, you can optimize settings like TCP keep-alive, socket buffers, and connection timeouts.
-
- Data Serialization and Compression
- Elasticsearch allows configuring data serialization and compression
options, such as using a more efficient binary format (like
SMILE or CBOR) or
enabling compression for network communication. These settings can
improve storage efficiency and reduce network overhead.
See Save space and money with improved storage efficiency in Elasticsearch 7.10 for more information.
For more information, see Index Modules.
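As one concrete example of a compression option, the sketch below builds a create-index body that selects the `best_compression` codec through the `index.codec` setting. To my knowledge this is a static setting that is normally chosen when the index is created; the helper name is hypothetical, and the list of accepted values is limited to the two this sketch knows about.

```python
KNOWN_CODECS = ("default", "best_compression")


def create_index_body(codec="best_compression"):
    """Build a create-index request body that selects the stored-field codec."""
    if codec not in KNOWN_CODECS:
        raise ValueError(f"codec must be one of {KNOWN_CODECS}")
    return {"settings": {"index": {"codec": codec}}}


if __name__ == "__main__":
    print(create_index_body())
```

Higher compression reduces disk usage at the cost of slower stored-field reads, so it tends to fit indices that are written once and queried rarely.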
All these tuning knobs provide flexibility for optimizing Elasticsearch based on your specific workload, hardware resources, and performance requirements. It is essential to carefully monitor the impact of any changes and conduct performance testing to ensure optimal results. Additionally, always refer to the official Elasticsearch documentation and consider the recommendations provided by Elastic for tuning and optimization.