Using List Aggregation with Ingest
You can issue SQL calls to retrieve database data from inside the NiFi pipeline. This data is converted from the two-dimensional tabular format in which the database stores it to the one-dimensional string that the Elasticsearch process expects. Normally this list aggregation takes place inside the database. However, each database imposes a limit on how long a returned string can be; for DB2, for example, the limit is 32k. If the SQL tries to serialize a longer string, the result is truncated.
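As an illustration of the conversion and of the truncation risk, here is a minimal sketch in Python. It is not the HCL Commerce or NiFi implementation: the function names, the sample rows, and the tiny 10-character limit are all assumptions chosen to make the effect visible.

```python
from collections import defaultdict


def list_aggregate(rows, delimiter=","):
    """Collapse (key, value) rows into one delimited string per key."""
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(str(value))
    return {key: delimiter.join(values) for key, values in grouped.items()}


def simulate_db_limit(aggregated, max_len):
    """Mimic a database that cuts each aggregated string at max_len characters.

    Hypothetical stand-in for the database-side limit described in the text.
    """
    return {key: text[:max_len] for key, text in aggregated.items()}


# Tabular ("2d") input: one row per key/value pair (made-up sample data).
rows = [
    (10001, "red"),
    (10001, "green"),
    (10001, "blue"),
    (10002, "small"),
]

full = list_aggregate(rows)               # aggregation done in the application
cut = simulate_db_limit(full, max_len=10)  # what a length-limited database would return

print(full[10001])  # red,green,blue
print(cut[10001])   # red,green,
```

The second result silently loses the last value, which is the failure mode that application-level aggregation is meant to avoid.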
To avoid this problem, HCL Commerce Search provides an application-level function that performs the serialization instead of the database. You control this behavior with a variable that you can set at the flow level or, as a global switch, at the level of the reindex link or the NRT link. This variable is flow.database.listagg, and its default value is True. Setting it in ReindexLink, NRTLink, or DataloadLink defines the property globally for the entire dataflow, while setting it against a connector pipe scopes it to that stage only.
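The scoping described above can be pictured as a lookup that prefers the narrower setting. The sketch below is only a model: the dictionaries stand in for link-level and pipe-level properties, the helper names are hypothetical, and the assumption that a pipe-level value takes precedence over a link-level one is not confirmed by this page.

```python
PROPERTY_NAME = "flow.database.listagg"
DEFAULT_LISTAGG = True  # default value stated in the text


def parse_flag(raw):
    """Interpret a property value such as 'True' or 'false' as a boolean."""
    return str(raw).strip().lower() == "true"


def resolve_listagg(link_props, pipe_props):
    """Return the effective setting for one stage.

    Assumed precedence: connector pipe, then link (global), then default.
    """
    if PROPERTY_NAME in pipe_props:
        return parse_flag(pipe_props[PROPERTY_NAME])
    if PROPERTY_NAME in link_props:
        return parse_flag(link_props[PROPERTY_NAME])
    return DEFAULT_LISTAGG


# Global setting left on, one stage switched to application-level aggregation.
link_props = {PROPERTY_NAME: "true"}
pipe_props = {PROPERTY_NAME: "false"}

print(resolve_listagg(link_props, {}))          # True
print(resolve_listagg(link_props, pipe_props))  # False
```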
In general, use the database LISTAGG function and disable it only when the limit is reached. This avoids the unnecessary overhead of performing the list aggregation inside the NiFi application.
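One way to judge whether the limit is likely to be reached is to estimate the size of each aggregated value before relying on the database. The following sketch assumes the limit is measured in UTF-8 bytes and that values are joined with a single delimiter; both are assumptions, and the function name and sample data are hypothetical.

```python
def groups_exceeding_limit(rows, limit_bytes, delimiter=","):
    """Return the keys whose aggregated value would be longer than limit_bytes.

    rows is an iterable of (key, value) pairs; sizes are counted in UTF-8 bytes.
    """
    delimiter_size = len(delimiter.encode("utf-8"))
    sizes = {}
    for key, value in rows:
        value_size = len(str(value).encode("utf-8"))
        if key in sizes:
            # Every value after the first also costs one delimiter.
            sizes[key] += delimiter_size + value_size
        else:
            sizes[key] = value_size
    return sorted(key for key, size in sizes.items() if size > limit_bytes)


# Made-up data with a deliberately small limit for demonstration.
sample_rows = [("a", "x" * 20), ("a", "y" * 20), ("b", "z")]
oversized = groups_exceeding_limit(sample_rows, limit_bytes=32)
print(oversized)  # ['a']
```

If a check like this returns no keys for representative data, leaving the database LISTAGG enabled is consistent with the guidance above.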