Ingest configuration via REST

You can configure all your Ingest connectors through a single REST endpoint. All connectors inherit a common set of configurations, such as global attributes, and expose them as flowfile attributes inside NiFi. With access to this endpoint, you can add or remove NiFi capabilities from outside the main NiFi process flow.

Accessible Ingest features

The default global attribute values are as follows. You can use them to quickly activate and include a variety of Ingest capabilities.

alias.keep.backup

This option applies when an index alias is created. It sets the number of copies of old indexes to retain for recovery purposes. The default value is 2.

cache.invalidation.duration

Defines the maximum number of seconds for the base cache invalidation to complete. Default is 60 seconds.

cache.invalidation.threshold

Defines the maximum number of entries to perform incremental cache invalidation. A full cache invalidation will be performed when this threshold has been exceeded during an ingest operation. Default is 10000.
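
The threshold rule above can be pictured with a short sketch. This is an illustration in Python only, not the Ingest implementation; the function name and the "incremental" and "full" labels are hypothetical.

```python
# Illustrative only: mirrors the documented rule, not the actual Ingest code.
DEFAULT_THRESHOLD = 10000  # documented default for cache.invalidation.threshold


def invalidation_mode(changed_entries: int, threshold: int = DEFAULT_THRESHOLD) -> str:
    """Return "incremental" while the entry count stays within the threshold,
    and "full" once the threshold has been exceeded."""
    if changed_entries < 0:
        raise ValueError("changed_entries must not be negative")
    return "full" if changed_entries > threshold else "incremental"


print(invalidation_mode(2500))    # within the threshold
print(invalidation_mode(10001))   # threshold exceeded
```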

flow.basic.nlp.category.search

Defines whether to include category names while performing a term search. Default is disabled. Note: enabling this option may result in a much larger search result set, increasing the overall response time.

flow.disable.basic.nlp

When set to true, NLP processing is not performed in NiFi for Basic NLP. This configuration switch has no effect when you are using Advanced NLP.

flow.index.non-facetable

This option defines whether to include non-facetable attributes and their values in the Attribute index. The default value is true. The additional non-facetable attribute data in the search index can help improve the accuracy of the NLP part-of-speech classification, so that term search results are more relevant. Disable this option only when the Attribute Dictionary contains many non-facetable attributes; doing so improves the overall ingest time.

flow.language.fallback

This feature enables Ingest to index the text from the store's default language for any language that does not have a translation. Enabling this feature may have a significant impact on the overall indexing time. Its default value is true.

flow.database.listagg

This option controls whether the Ingest runtime uses the database version of LISTAGG or the application-level LISTAGG implementation. Although the database version is faster, it has a 32K length limitation. The default value is true, which uses the database version. For more information, see LISTAGG() and Serialize.

flow.payload.strategy

This option defines the strategy for managing flowfile content size when sending to Elasticsearch for indexing. It can be set to one of the following values:

Threshold (default): Allows NiFi to use a threshold size limit to restrict the size of flowfile contents in a dataflow. If a flowfile exceeds this limit, it is split into smaller flowfiles before they are sent to Elasticsearch.
Adaptive: Allows NiFi to dynamically redistribute flowfile content by merging or splitting it into multiple flowfiles.
None: No change is made to the flowfile content in any dataflow.

The threshold value used in the Adaptive and Threshold strategies can either be:

Preset to a number of bytes in the FlowFile Size Threshold property of the Split Bulk Request processor from each Bulk Service process group, or, if not preset,
Dynamically calculated by NiFi based on your current Elasticsearch system settings.
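
The idea behind the Threshold strategy can be sketched as follows. This is a conceptual illustration in Python, not how the Split Bulk Request processor is implemented; the function name, the batching rule, and the sample sizes are all assumptions made for the example.

```python
from typing import Iterable, List


def split_by_threshold(docs: Iterable[bytes], threshold: int) -> List[List[bytes]]:
    """Group documents into batches whose combined size stays within `threshold` bytes.

    A single document larger than the threshold is placed in a batch of its own.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    batches: List[List[bytes]] = []
    current: List[bytes] = []
    current_size = 0
    for doc in docs:
        # Close the current batch before it would grow past the threshold.
        if current and current_size + len(doc) > threshold:
            batches.append(current)
            current, current_size = [], 0
        current.append(doc)
        current_size += len(doc)
    if current:
        batches.append(current)
    return batches


# Hypothetical payload sizes in bytes: three small documents and one large one.
docs = [b"a" * 40, b"b" * 40, b"c" * 40, b"d" * 200]
for batch in split_by_threshold(docs, threshold=100):
    print([len(d) for d in batch])
```

An Adaptive strategy would, in addition, merge undersized batches; that part is not shown here.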

flow.retry.partial

This function allows retrying only failed entries in a bulk request instead of the entire flowfile. The default is false.
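
As a rough sketch of what retrying only failed entries involves: an Elasticsearch bulk response contains an "errors" flag and an "items" list in the same order as the submitted actions, and each failed item carries an "error" object. The helper below is hypothetical and is not the NiFi processor logic; it only shows how the failed entries could be picked out.

```python
from typing import Any, Dict, List, Tuple

# One bulk operation: (action line, document source).
Operation = Tuple[Dict[str, Any], Dict[str, Any]]


def failed_operations(operations: List[Operation],
                      bulk_response: Dict[str, Any]) -> List[Operation]:
    """Return only the operations whose bulk response item reports an error."""
    if not bulk_response.get("errors"):
        return []
    failed = []
    # Response items are returned in the same order as the submitted actions.
    for op, item in zip(operations, bulk_response["items"]):
        result = next(iter(item.values()))  # for example item["index"]
        if "error" in result:
            failed.append(op)
    return failed


# Hypothetical sample data for illustration.
operations = [
    ({"index": {"_id": "1"}}, {"name": "A"}),
    ({"index": {"_id": "2"}}, {"name": "B"}),
]
response = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 429,
                   "error": {"type": "es_rejected_execution_exception"}}},
    ],
}
retry = failed_operations(operations, response)
print([action["index"]["_id"] for action, _ in retry])
```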

flow.marketplace

This feature enables Ingest to not include Marketplace capabilities into the Search indices. Its default value is false.

flow.wait.strategy

Defines the default strategy for WaitLink:

Bulk: only unblocked by signal sent from Bulk Service
Any: same as Bulk, but can also be unblocked by inactivity detected in SQL or Bulk. This is the default.

matchmaker.proximity

This number specifies the range of proximity that the MatchMaker uses for approximation when performing searches. The default value is 0.2, which means +/- 20%.
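
To make the +/- 20% arithmetic concrete, here is a small worked sketch. How the MatchMaker applies the value internally is not described here, so the function below is hypothetical and only computes the bounds for a sample numeric value.

```python
from typing import Tuple


def proximity_range(value: float, proximity: float = 0.2) -> Tuple[float, float]:
    """Return the (low, high) bounds that lie within +/- `proximity` of `value`."""
    if proximity < 0:
        raise ValueError("proximity must not be negative")
    delta = abs(value) * proximity
    return (value - delta, value + delta)


# With the default of 0.2, a value of 50 is matched within roughly 40 to 60.
low, high = proximity_range(50.0)
print(low, high)
```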

flow.price.copy

This option allows smart copying of prices from price index to product index in the given environment type(s). The default is set to "auth, live".

flow.inventory.copy

This option allows smart copying of inventories from inventory index to product index in the given environment type(s). The default is set to "auth, live".

flow.concurrent.postindex

This option defines whether to execute the post-index connector in the background. The default is set to false.

cluster.index.nodegroup

This option defines which Elasticsearch nodegroup setup is being used - single or dual. The default is single.

flow.index.attribute.name

By default, when the search phrase matches an attribute name, the search is also conducted on attribute names. This may return several products that match on the attribute name. To deactivate the attribute name search, use the configuration endpoint on the ingest node to set the flow.index.attribute.name property to false.

Note: Also set the index.nlp.attribute.name variable to false in the NLPService process group inside NiFi.

flow.product.rollupAssociations

From HCL Commerce version 9.1.15.2 onwards, when flow.product.rollupAssociations is enabled, the parent product also reflects all the merchandising associations added to its SKUs, along with its existing merchandising associations.

flow.index.flattened

From HCL Commerce version 9.1.15.0 onwards, the flow.index.flattened ingest option is available to prevent mapping explosion. However, this option comes at a performance cost during faceting and filtering, especially when the facetable attributes contain many entries. The default value is "false", meaning that data is not indexed with the "flattened" data type. Set this option to "true" when you encounter excessively long ingest times while loading many attributes into a product document.

You can ingest catalog data from your own database table for CATENTRY and CATGROUP, and use this as the basis for all downstream ingest operations. Use the following global attributes to define this custom base table condition.

Use flow.database.schema to define the custom database schema name to be used for indexing.
Use custom.table.catentry to provide a custom CATENTRY table to refine the base scope of catalog entry SQLs.
Use custom.where.catentry to provide a custom WHERE clause for the custom CATENTRY table.
Use custom.table.catgroup to provide a custom CATGROUP table to refine the base scope of catalog group SQLs.
Use custom.where.catgroup to provide a custom WHERE clause for the custom CATGROUP table.
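
As an illustration, several of these attributes could be set together using the payload shape of the configuration request described later in this section. The sketch below is in Python; the schema name, table name, and WHERE clause are hypothetical placeholder values, and sending more than one property in a single request is an assumption based on "property" being an array.

```python
import json
from typing import Dict


def global_properties_payload(properties: Dict[str, str]) -> dict:
    """Wrap name/value pairs in the payload shape used by the configuration endpoint."""
    return {
        "global": {
            "connector": [
                {
                    "name": "attribute",
                    "property": [
                        {"name": name, "value": str(value)}
                        for name, value in properties.items()
                    ],
                }
            ]
        }
    }


# All values below are hypothetical placeholders, not documented defaults.
payload = global_properties_payload({
    "flow.database.schema": "MYSCHEMA",
    "custom.table.catentry": "X_CATENTRY_SCOPE",
    "custom.where.catentry": "X_CATENTRY_SCOPE.ACTIVE = 1",
})
print(json.dumps(payload, indent=4))
```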

To set a custom value for an Ingest feature, issue the following REST call:

PATCH /search/resources/api/v2/configuration?nodeName=ingest&envType=auth

with the following content:

{
    "global": {
        "connector": [
            {
                "name": "attribute",
                "property": [
                    {
                        "name": "name_of_ingest_feature",
                        "value": "value_of_this_property"
                    }
                ]
            }
        ]
    }
}
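
For readers who script this call, the following Python sketch builds the PATCH request with the standard library. It is a minimal example under stated assumptions: the host "http://ingest.example.com" is a placeholder, the helper name is hypothetical, and any authentication your environment requires is not shown.

```python
import json
import urllib.parse
import urllib.request


def build_config_request(base_url: str, name: str, value: str,
                         node_name: str = "ingest",
                         env_type: str = "auth") -> urllib.request.Request:
    """Build (but do not send) the PATCH request for one global Ingest property."""
    query = urllib.parse.urlencode({"nodeName": node_name, "envType": env_type})
    url = f"{base_url.rstrip('/')}/search/resources/api/v2/configuration?{query}"
    payload = {
        "global": {
            "connector": [
                {"name": "attribute", "property": [{"name": name, "value": value}]}
            ]
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


# Placeholder host: replace it with the address of your own Ingest service.
request = build_config_request("http://ingest.example.com", "flow.retry.partial", "true")
print(request.get_method(), request.full_url)
# To send it, call urllib.request.urlopen(request) against a reachable Ingest service.
```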

To disable the default database version of LISTAGG and use the application-level implementation instead, include the following payload in the request:

{
    "global": {
        "connector": [
            {
                "name": "attribute",
                "property": [
                    {
                        "name": "flow.database.listagg",
                        "value": "false"
                    }
                ]
            }
        ]
    }
}

To enable flow.product.rollupAssociations (available from HCL Commerce 9.1.15.2 onwards), so that the parent product also reflects the merchandising associations added to its SKUs, issue the following REST call and payload:

PATCH /search/resources/api/v2/configuration?nodeName=ingest&envType=auth

{
    "global": {
        "connector": [
            {
                "name": "attribute",
                "property": [
                    {
                        "name": "flow.product.rollupAssociations",
                        "value": "true"
                    }
                ]
            }
        ]
    }
}