Elasticsearch index field types
A guide to the field types, parameters, and usage of HCL Commerce Elasticsearch index fields.
For an overview of the HCL Commerce implementation of Elasticsearch, see Using the HCL Commerce Search service. To see how these index fields are used, see Building the Elasticsearch index.
Type Name | Parameters | Usage | Description |
text | analyzer, boost, eager_global_ordinals, fielddata, fielddata_frequency_filter, fields, index, index_options, index_prefixes, index_phrases, norms, position_increment_gap, store, search_analyzer, search_quote_analyzer, similarity, term_vector | boosting, searching | A field to index full-text values, such as the body of an email or the description of a product. Text fields are analyzed; that is, they are passed through an analyzer that converts the string into a list of individual terms before indexing. The analysis process allows Elasticsearch to search for individual words within each full-text field. Text fields are not used for sorting and are seldom used for aggregations. |
keyword | boost, doc_values, eager_global_ordinals, fields, ignore_above, index, index_options, norms, null_value, store, similarity, normalizer, split_queries_on_whitespace | boosting, displaying, exact match, filtering, sorting | A field to index structured content such as email addresses, hostnames, status codes, zip codes, or tags. Keyword fields are typically used for filtering, sorting, and aggregations. They are searchable only by their exact value, and matching is case-sensitive unless a normalizer is configured. |
long, integer, byte, double, float | coerce, boost, doc_values, ignore_malformed, index, null_value, store | boosting, displaying, filtering, searching, sorting | The supported numeric field types. |
date | boost, doc_values, format, locale, ignore_malformed, index, null_value, store | boosting, displaying, filtering, searching, sorting | JSON has no date datatype, so dates in Elasticsearch can be strings containing formatted dates (for example, "2015-01-01" or "2015/01/01 12:10:30"), long numbers representing milliseconds-since-the-epoch, or integers representing seconds-since-the-epoch (when the field uses the epoch_second format). Internally, dates are converted to UTC (if the time zone is specified) and stored as a long number representing milliseconds-since-the-epoch. Queries on dates are internally converted to range queries on this long representation, and the results of aggregations and stored fields are converted back to strings according to the date format associated with the field. Dates are always rendered as strings, even if they were initially supplied as a long in the JSON document. Date formats can be customised; if no format is specified, the default "strict_date_optional_time||epoch_millis" is used. This means that the field accepts dates with optional timestamps that conform to the formats supported by strict_date_optional_time, or milliseconds-since-the-epoch. |
boolean | boost, doc_values, index, null_value, store | boosting, displaying, filtering, searching, sorting | A field that accepts the JSON values true and false, as well as strings that are interpreted as either true or false. |
binary | doc_values, store | filtering, sorting | A field that accepts a binary value as a Base64-encoded string. The field is not stored by default and is not searchable. |
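integer_range, float_range, long_range, double_range, date_range, ip_range | coerce, boost, index, store | boosting, displaying, filtering, searching, sorting | The supported range types, which represent a continuous range of values between a lower and an upper bound (for example, the integers 0 to 9). They can be used for boosting, searching, filtering, and displaying. |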
object | dynamic, enabled, properties | displaying | JSON documents are hierarchical in nature: a document may contain inner objects, which in turn may contain inner objects themselves. Internally, such a document is indexed as a simple, flat list of key-value pairs. |
nested | dynamic, properties | displaying, filtering, searching, sorting | The nested type is a specialised version of the object datatype that allows arrays of objects to be indexed so that each object can be queried independently of the others. If you need to index arrays of objects and maintain the independence of each object in the array, use the nested datatype instead of the object datatype. Internally, each object in the array is indexed as a separate hidden document, so each nested object can be queried independently of the others with the nested query. |
completion | Term suggester options: text, field, analyzer, size, sort, suggest_mode, max_edits, prefix_length, min_word_length, shard_size, max_inspections, min_doc_freq, max_term_freq, string_distance | spell correction | The term suggester suggests terms based on edit distance. The provided suggest text is analyzed before terms are suggested, and suggestions are provided per analyzed token of the suggest text. The term suggester does not take into account the query that is part of the request. It provides a convenient API for accessing word alternatives on a per-token basis within a certain string distance: each token in the stream can be accessed individually, while the selection of suggestions is left to the API consumer. |
completion | Phrase suggester options: field, gram_size, real_word_error_likelihood, confidence, max_errors, separator, size, analyzer, shard_size, text, highlight, collate, stupid_backoff, laplace, linear_interpolation | spell correction | Applications often need pre-selected suggestions to present to the end user. The phrase suggester adds logic on top of the term suggester to select entire corrected phrases, weighted by n-gram language models, instead of individual tokens. In practice, this suggester is able to make better decisions about which tokens to pick based on co-occurrence and frequencies. |
completion | Completion suggester options: field, size, skip_duplicates, fuzzy | suggestion | The completion suggester provides auto-complete (search-as-you-type) functionality. This is a navigational feature that guides users to relevant results as they type, improving search precision. It is not meant for spell correction or did-you-mean functionality like the term or phrase suggesters. Auto-complete should ideally keep pace with the user's typing to provide instant feedback relevant to what has already been typed, so the completion suggester is optimized for speed. It uses data structures that enable fast lookups but are costly to build and are stored in memory. The completion suggester considers all documents in the index, but it is often desirable to filter or boost suggestions by some criteria; for example, you might want to suggest song titles filtered by certain artists, or boost song titles based on their genre. To filter or boost suggestions, add context mappings when configuring a completion field. A completion field can have multiple context mappings, each with a unique name and a type. A context must be provided when indexing and querying a context-enabled completion field. |
search_as_you_type | max_shingle_size, analyzer, index, index_options, norms, store, search_analyzer, search_quote_analyzer, similarity, term_vector | suggestion | The search_as_you_type field type is a text-like field that is optimized to provide out-of-the-box support for queries that serve an as-you-type completion use case. It creates a series of subfields that are analyzed to index terms that can be efficiently matched by a query that partially matches the entire indexed text value. Both prefix completion (that is, matching terms starting at the beginning of the input) and infix completion (that is, matching terms at any position within the input) are supported. |
token_count | analyzer, enable_position_increments, boost, doc_values, index, null_value, store | filtering | A field of type token_count counts the number of tokens in a string. It is an integer field that accepts string values, analyzes them, and then indexes the number of tokens in the string. |
percolator | none | percolating | The percolator field type parses a JSON structure into a native query and stores that query, so that the percolate query can use it to match provided documents. This field type is generally used for anomaly detection and alerting. The normal workflow for Elasticsearch is to store documents (as JSON data) in an index and execute searches (also JSON data) to ask the index about those documents. Percolation reverses that: you store searches and use documents to ask the index about those searches. Under the hood, indexes with percolator fields keep a hidden, in-memory index. The documents supplied in your percolate queries are first put into that index, and then a normal query is executed against it to see whether the original percolator-field-bearing document matches. An important point to remember is that this hidden index gets its mappings from the original percolator index, so indexes used for percolate queries need mappings appropriate for both the original data and the query document data. |
join | none | filtering, grouping, searching | The join datatype is a special field that creates a parent/child relation between documents in the same index. The relations section defines a set of possible relations within the documents, each relation being a parent name and a child name. The has_child query and the children aggregation can provide functionality similar to Solr's result grouping. Only one join field mapping is allowed per index, and parent and child documents must be indexed on the same shard. Internally, the parent-join creates one field to index the name of the relation within the document. It also creates one field per parent/child relation, named after the join field followed by # and the name of the parent in the relation. |
flattened | boost, depth_limit, doc_values, eager_global_ordinals, ignore_above, index, index_options, null_value, split_queries_on_whitespace | boosting, filtering, searching, sorting | By default, each subfield in an object is mapped and indexed separately. The flattened type provides an alternative approach, where the entire object is mapped as a single field: given an object, the flattened mapping parses out its leaf values and indexes them into one field as keywords. The object’s contents can then be searched through simple queries and aggregations. This datatype can be useful for indexing objects with a large or unknown number of unique keys. Because only one field mapping is created for the whole JSON object, it can help prevent a mappings explosion caused by too many distinct field mappings. On the other hand, flattened object fields present a trade-off in terms of search functionality: only basic queries are allowed, with no support for numeric range queries or highlighting. |
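
As a rough illustration of how several of these field types fit together, the following Python sketch builds the JSON body you might send with an Elasticsearch create-index request (PUT /<index-name>). The field names (description, partNumber, attributes, and so on) and the product/sku relation are hypothetical and are not taken from the HCL Commerce index schema. The sketch also includes a helper, to_epoch_millis, that approximates how a date value is stored internally as UTC milliseconds since the epoch; it uses Python's ISO 8601 parser as a stand-in for Elasticsearch's own date-format handling.

```python
import json
from datetime import datetime, timedelta, timezone


def build_mapping() -> dict:
    """Return a create-index body using several field types from the table.

    All field names are hypothetical; they are not the HCL Commerce schema.
    """
    return {
        "mappings": {
            "properties": {
                # text: analyzed for full-text search.
                "description": {"type": "text", "analyzer": "english"},
                # keyword: exact match, filtering, and sorting.
                "partNumber": {"type": "keyword"},
                # Numeric: coerce (on by default) accepts strings such as "19.99".
                "price": {"type": "double", "coerce": True},
                # date: the default format, spelled out explicitly.
                "createdDate": {
                    "type": "date",
                    "format": "strict_date_optional_time||epoch_millis",
                },
                # nested: each attribute object can be queried independently.
                "attributes": {
                    "type": "nested",
                    "properties": {
                        "name": {"type": "keyword"},
                        "value": {"type": "keyword"},
                    },
                },
                # flattened: arbitrary keys indexed as one field.
                "extraData": {"type": "flattened"},
                # join: "product" parents with "sku" children (one join field per index).
                "relation": {"type": "join", "relations": {"product": "sku"}},
                # completion: a category context must be supplied when indexing and querying.
                "suggest": {
                    "type": "completion",
                    "contexts": [{"name": "category", "type": "category"}],
                },
            }
        }
    }


def to_epoch_millis(value: str) -> int:
    """Approximate how a date string is stored: UTC milliseconds since the epoch.

    Python's ISO 8601 parser stands in for Elasticsearch's format parsing.
    """
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        # Treat a value without a time zone as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # timedelta // timedelta yields an exact integer count of milliseconds.
    return (dt - epoch) // timedelta(milliseconds=1)


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
    print(to_epoch_millis("2015-01-01T12:10:30+02:00"))  # 1420107030000
```

Only the request body is built here; no Elasticsearch client is involved. Because the suggest field declares a category context, every document indexed into such an index would also need to supply a category value for that field.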
- In Elasticsearch, arrays do not require a dedicated field datatype. Any field can contain zero or more values by default; however, all values in the array must be of the same datatype.
- It is often useful to index the same field in different ways for different purposes. This is the purpose of multi-fields, which are declared with the fields parameter. For instance, a string field could be mapped as a text field for full-text search and as a keyword field for sorting or aggregations. Alternatively, you could index a text field with the standard analyzer, the english analyzer, and the french analyzer.
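
The following Python sketch illustrates both points with a hypothetical mapping: the name field is indexed with the standard analyzer and, through multi-fields, also as name.raw (keyword), name.english, and name.french, while tags is an ordinary keyword field that receives an array. The field names, the sample document, and the subfield_paths helper are assumptions for illustration; the search body uses the standard match query and sort syntax of the Elasticsearch query DSL.

```python
import json

# Hypothetical mapping: the "name" value is also indexed three other ways
# through multi-fields, and "tags" is a plain keyword field.
mapping = {
    "mappings": {
        "properties": {
            "name": {
                "type": "text",  # standard analyzer (the default)
                "fields": {
                    "raw": {"type": "keyword"},  # exact value for sorting and aggregations
                    "english": {"type": "text", "analyzer": "english"},
                    "french": {"type": "text", "analyzer": "french"},
                },
            },
            # No array type is declared: a keyword field accepts one value or many.
            "tags": {"type": "keyword"},
        }
    }
}

# "name" is supplied once; "tags" is an array whose values share one datatype.
document = {"name": "Running Shoes", "tags": ["sale", "new", "footwear"]}

# Full-text match on an analyzed subfield, sort on the keyword subfield.
search = {
    "query": {"match": {"name.english": "running shoe"}},
    "sort": [{"name.raw": "asc"}],
}


def subfield_paths(mapping: dict, field: str) -> list:
    """Hypothetical helper: list a field and the paths of its multi-fields."""
    definition = mapping["mappings"]["properties"][field]
    return [field] + [f"{field}.{sub}" for sub in definition.get("fields", {})]


if __name__ == "__main__":
    print(subfield_paths(mapping, "name"))
    # ['name', 'name.raw', 'name.english', 'name.french']
    print(json.dumps(search))
```

The document supplies name only once; Elasticsearch populates the subfields from that single value, so a search can match on the analyzed name.english subfield while sorting on the exact name.raw value.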