Ingest configuration via REST
You can configure all your Ingest connectors through a single REST endpoint. All connectors inherit a set of configurations, such as global attributes, and make them available as flowfile attributes inside NiFi. With access to this endpoint, you can add or remove NiFi capabilities from outside the main NiFi process flow.
Accessible Ingest features
The following global attributes are available, together with their default values. You can use them to quickly
activate and include a variety of Ingest capabilities.
- alias.keep.backup
- This option is used when an index alias is created. It determines the number of old index copies to retain for future recovery. The default value is 2.
- cache.invalidation.duration
- Defines the maximum number of seconds for the base cache invalidation to complete. The default is 60 seconds.
- cache.invalidation.threshold
- Defines the maximum number of entries for which incremental cache invalidation is performed. A full cache invalidation is performed when this threshold is exceeded during an ingest operation. The default is 10000.
- flow.basic.nlp.category.search
- Defines whether to include category names while performing a term search. Default is disabled. Note: enabling this option may result in a much larger search result set, increasing the overall response time.
- flow.disable.basic.nlp
- When set to true, NLP processing is not performed in NiFi for Basic NLP. This configuration switch has no effect when you are using Advanced NLP.
- flow.index.non-facetable
- This option defines whether to include non-facetable attributes and their values in the Attribute index. The default value is true. Disable this option only when there are many non-facetable attributes in the Attribute Dictionary; doing so improves the overall ingest time. Otherwise, keep it enabled: the additional non-facetable attribute data in the search index can improve the accuracy of the NLP part-of-speech classification, which makes term search results more relevant.
- flow.language.fallback
- This feature enables Ingest to index the text from the store's default language for any language that does not have a translation. Enabling this feature may have a significant impact on the overall indexing time. Its default value is true.
- flow.database.listagg
- This option controls whether the Ingest runtime uses the database version of LISTAGG. When it is set to false, the application-level LISTAGG implementation is used instead. Although the database version is faster, it has a 32K length limitation. Its default value is true. For more information, see LISTAGG() and Serialize.
- flow.payload.strategy
- This option defines the strategy for managing flowfile content size when sending content to Elasticsearch for indexing. It can be set to one of the following values:
- Threshold
- (Default): Allows NiFi to use a threshold size limit to restrict the size of flowfile contents in a dataflow. If a flowfile exceeds this limit, it is split into smaller flowfiles before being sent to Elasticsearch.
- Adaptive
- Allows NiFi to dynamically redistribute flowfile content by merging or splitting it into multiple flowfiles.
- None
- No change is made to the flowfile content in any dataflow.
The threshold value used in the Adaptive and Threshold strategies can be either:
- Preset to a number of bytes in the FlowFile Size Threshold property of the Split Bulk Request processor in each Bulk Service process group, or, if not preset,
- Dynamically calculated by NiFi based on your current Elasticsearch system settings.
- flow.retry.partial
- This option allows retrying only the failed entries in a bulk request instead of the entire flowfile. The default is false.
- flow.marketplace
- This feature enables Ingest to not include Marketplace capabilities in the Search indices. Its default value is false.
- flow.wait.strategy
- Defines the default strategy for WaitLink:
- Bulk: only unblocked by a signal sent from Bulk Service
- Any: same as Bulk, but can also be unblocked by inactivity detected in SQL or Bulk. This is the default.
- matchmaker.proximity
- This number specifies the range of proximity that the MatchMaker uses for approximation when performing searches. The default value is 0.2, which means +/- 20%.
- flow.price.copy
- This option allows smart copying of prices from the price index to the product index in the given environment type(s). The default is "auth, live".
- flow.inventory.copy
- This option allows smart copying of inventories from the inventory index to the product index in the given environment type(s). The default is "auth, live".
- flow.concurrent.postindex
- This option defines whether to execute the post-index connector in the background. The default is false.
- cluster.index.nodegroup
- This option defines which Elasticsearch nodegroup setup is being used: single or dual. The default is single.
- flow.index.attribute.name
- Under the default settings, if the search phrase matches an attribute name, the search is also conducted on attribute names. This may result in several product matches for the attribute name. To deactivate the attribute name search, use the configuration endpoint on the ingest node to set the flow.index.attribute.name property to false. Note: Also set the index.nlp.attribute.name variable to false in the NLPService process group inside NiFi.
- flow.product.rollupAssociations
- From HCL Commerce version 9.1.15.2 onwards, when flow.product.rollupAssociations is enabled, the parent product also reflects all merchandising associations added to its SKUs, in addition to its existing merchandising associations.
- flow.index.flattened
- From HCL Commerce version 9.1.15.0 onwards, the flow.index.flattened ingest option is available to prevent mapping explosion. However, this option comes with a performance cost during faceting and filtering, especially when the facetable attributes contain many entries. By default, attributes are not indexed with the "flattened" data type; that is, the default value is "false". Enable this option (by setting it to "true") when you encounter very long ingest times while loading many attributes into a product document.
- Use flow.database.schema to define the custom database schema name to be used for indexing.
- Use custom.table.catentry to provide a custom CATENTRY table to refine the base scope of catalog entry SQLs.
- Use custom.where.catentry to provide a custom WHERE clause of the custom CATENTRY table.
- Use custom.table.catgroup to provide a custom CATGROUP table to refine the base scope of catalog group SQLs.
- Use custom.where.catgroup to provide a custom WHERE clause of the custom CATGROUP table.
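The Threshold strategy of flow.payload.strategy and the partial retry of flow.retry.partial both operate on Elasticsearch bulk requests. The following is a minimal, hypothetical Python sketch of the two ideas, not the NiFi Split Bulk Request processor itself. It assumes the bulk body is NDJSON made only of "index" actions (so each action line is followed by exactly one document line) and that the threshold is a plain byte count. The function names split_bulk_body and failed_pairs are illustrative only.

```python
import json


def split_bulk_body(body, threshold_bytes):
    """Split an NDJSON bulk body into chunks of at most threshold_bytes.

    Assumes index-only actions: every action line is followed by one
    document line, and such a pair is never separated. A single pair
    larger than the threshold is emitted on its own.
    """
    lines = body.splitlines()
    pairs = ["\n".join(lines[i:i + 2]) + "\n" for i in range(0, len(lines), 2)]
    chunks, current, size = [], [], 0
    for pair in pairs:
        pair_size = len(pair.encode("utf-8"))
        # Close the current chunk if adding this pair would exceed the limit.
        if current and size + pair_size > threshold_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(pair)
        size += pair_size
    if current:
        chunks.append("".join(current))
    return chunks


def failed_pairs(chunk, response):
    """Return only the action/document pairs whose bulk item failed.

    Elasticsearch returns one entry per action in response["items"], in
    request order; a failed entry carries an "error" key.
    """
    if not response.get("errors"):
        return ""
    lines = chunk.splitlines()
    retry = []
    for i, item in enumerate(response["items"]):
        result = next(iter(item.values()))  # e.g. the value of {"index": {...}}
        if "error" in result:
            retry.append(lines[2 * i] + "\n" + lines[2 * i + 1] + "\n")
    return "".join(retry)


# Example: four small documents split with a hypothetical 120-byte threshold.
docs = [{"id": str(n), "name": "p" * 20} for n in range(4)]
body = "".join(
    json.dumps({"index": {"_id": d["id"]}}) + "\n" + json.dumps({"name": d["name"]}) + "\n"
    for d in docs
)
chunks = split_bulk_body(body, 120)
print(len(chunks))  # 2 chunks of two documents each
```

Retrying only the output of failed_pairs, rather than the whole chunk, mirrors the idea behind flow.retry.partial: successfully indexed entries are not sent again.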
To set a custom value for an Ingest feature, issue the following REST call:
PATCH /search/resources/api/v2/configuration?nodeName=ingest&envType=auth
with the following content:
{
  "global": {
    "connector": [
      {
        "name": "attribute",
        "property": [
          { "name": "<name_of_ingest_feature>",
            "value": "<value_of_this_property>" }
        ]
      }
    ]
  }
}
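The REST call above can be scripted. The following minimal Python sketch uses only the standard library to build the PATCH request with the payload structure shown. BASE_URL is a hypothetical placeholder for your search server, no authentication headers are included, and property values are sent as strings, as in the examples in this section. Passing several properties in one request assumes the "property" array accepts multiple entries.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://search.example.com"  # hypothetical host; replace with your own
CONFIG_PATH = "/search/resources/api/v2/configuration"


def build_payload(properties):
    """Wrap name/value pairs in the global "attribute" connector structure."""
    return {
        "global": {
            "connector": [
                {
                    "name": "attribute",
                    "property": [
                        {"name": name, "value": value}
                        for name, value in properties.items()
                    ],
                }
            ]
        }
    }


def build_request(properties, env_type="auth"):
    """Create (but do not send) the PATCH request for the ingest node."""
    query = urllib.parse.urlencode({"nodeName": "ingest", "envType": env_type})
    return urllib.request.Request(
        f"{BASE_URL}{CONFIG_PATH}?{query}",
        data=json.dumps(build_payload(properties)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


def send(request):
    """Send the request and return the HTTP status code (no auth handling)."""
    with urllib.request.urlopen(request) as response:
        return response.status


req = build_request({"flow.retry.partial": "true"})
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the payload before calling send() against a real environment.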
To disable the database version of LISTAGG and use the application-level implementation instead, include the
following payload in the request:
{
"global": {
"connector": [
{
"name": "attribute",
"property": [
{ "name": "flow.database.listagg",
"value": "false" }
]
}
]
}
}
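When flow.database.listagg is set to false, as in the payload above, LISTAGG-style aggregation happens in application code rather than in the database. The following hypothetical Python sketch shows only the general idea (group rows by key, then concatenate each group's values); it is not the HCL Commerce implementation, and the row shape of (key, value) pairs is an assumption.

```python
from collections import defaultdict


def listagg(rows, separator=","):
    """Group (key, value) rows by key and join each group's values in order."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(str(value))
    return {key: separator.join(values) for key, values in groups.items()}


rows = [(10001, "Red"), (10001, "Blue"), (10002, "Small")]
print(listagg(rows))  # {10001: 'Red,Blue', 10002: 'Small'}

# In application code the concatenated string is not capped at 32K.
long_value = listagg([(1, "x" * 1000)] * 40)[1]
print(len(long_value))  # 40039
```

This trade-off matches the option description: the database version is faster, while the application-level version avoids the 32K length limitation.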
For example, to enable flow.product.rollupAssociations (available from HCL Commerce 9.1.15.2 onwards) so that parent products also reflect the merchandising associations added to their SKUs, issue the following request:
PATCH /search/resources/api/v2/configuration?nodeName=ingest&envType=auth
{
"global": {
"connector": [
{
"name": "attribute",
"property": [
{ "name": "flow.product.rollupAssociations", "value": "true" }
]
}
]
}
}
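Conceptually, flow.product.rollupAssociations merges the merchandising associations of a parent product's SKUs into the parent's own list. The following hypothetical Python sketch illustrates that merge. It assumes associations are modelled as (association_type, target_id) tuples and that duplicates are dropped; the real Ingest data model and de-duplication rules may differ, and the product and SKU identifiers are made up.

```python
def rollup_associations(parent_assocs, sku_assocs_by_sku):
    """Return the parent's associations followed by any new ones from its SKUs.

    Hypothetical model: each association is an (association_type, target_id)
    tuple, and an association already present on the parent is not repeated.
    """
    merged = list(parent_assocs)
    seen = set(merged)
    for sku_assocs in sku_assocs_by_sku.values():
        for assoc in sku_assocs:
            if assoc not in seen:
                seen.add(assoc)
                merged.append(assoc)
    return merged


parent = [("X-SELL", "P200")]
skus = {
    "SKU-1": [("X-SELL", "P200"), ("UPSELL", "P300")],
    "SKU-2": [("ACCESSORY", "P400")],
}
print(rollup_associations(parent, skus))
# [('X-SELL', 'P200'), ('UPSELL', 'P300'), ('ACCESSORY', 'P400')]
```

Keeping the parent's existing associations first reflects the option description, where SKU associations are added alongside the existing ones rather than replacing them.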