Statistics service for the Indexing build summary
The statistics service gathers data on catalog dimensions. You can store this data in the HCL Cache for use in creating Ingest run summaries, and for tuning parameter values.
Using the service
The statistics service runs once per day by default. If you want to change the run frequency, update the Run Schedule variable under the Generate FlowFile Process inside the Statistics Auth or Statistics Live stage of the indexing service. It gathers data on all relevant catalog related dimensions, such as products or items, categories, attributes and their values, facets, Search Term Associations, contracts, and prices. The service adds this information to the Indexing Build summary document. Once the statistics have been collected, you can store them in the HCL Cache or even an index of their own, so that they can be looked up when creating an Ingest Run summary. This data is also needed when calculating certain tuning parameter values.
If you are using a single environment configuration, the service
gathers dimensions related only to that environment. For example, if you run the
service in the Auth environment the setup service will fail to
gather data for the Live JDBC and an error will be written to
the logs, which you can access using using GET
/log/_search?q=severity:E
with error code
(logger.message.code
) DI2150E
and
DI2151E
for Live and
Auth respectively. This can be resolved by creating a dual
environment setup, as described in Elasticsearch with dedicated Auth and Live nodes. After
changing your environment, run the service again to get the latest Catalog data
profile summary. For more information about dual
The data gathered by the service can be accessed through the index status API for all reindex operations.
SQL count
method
is used to gather the dimensions. Each SQL statement requires a unique identifier
and the location (file system address) of the SQL code file that contains it. Some
examples are provided below to illustrate the kinds of dimensions to be collected.
You can adjust this SQL to account for other use cases. SQL CODE (ingest.database.sql) | IDENTIFIER (ingest.database.identifier) |
---|---|
SELECT COUNT(*) COUNTER FROM STORE; |
total.stores |
|
total.catalog.asset.stores |
|
total.b2c.stores |
|
total.b2b.stores |
|
total.b2c.storefront.asset.stores |
|
total.b2b.storefront.asset.stores |
|
total.supported.languages |
SELECT COUNT(*) COUNTER FROM STORECAT;
|
total.catalogs |
|
total.master.catalogs |
|
|
|
total.catentries |
|
total.products |
|
total.items |
|
total.other.catenties |
SELECT COUNT(*) COUNTER FROM CONTRACT WHERE STATE =
3; |
total.contracts |
SELECT COUNT(*) COUNTER FROM ATTR; |
total.attributes |
SELECT COUNT(*) COUNTER FROM ATTR WHERE ATTRUSAGE =
1; |
total.allowed.value.attributes |
|
total.assigned.value.attributes |
SELECT COUNT(*) COUNTER FROM ATTR WHERE FACETABLE =
1; |
total.facetable.attributes |
SELECT COUNT(*) COUNTER FROM ATTRVAL;
|
total.attribute.values |
|
total.allowed.attribute.values |
|
total.assigned.attribute.values |
SELECT COUNT(*) COUNTER FROM CATENTRYATTR;
|
total.catentry.attributes |
|
total.catentry.defining.attributes |
|
total.catentry.descriptive.attributes |
|
total.pricelists |
|
total.offers |
SELECT COUNT(*) COUNTER FROM OFFERPRICE;
|
total.offerprices |
SELECT COUNT(*) COUNTER FROM CATGRPTPC;
|
total.offer.pricelists |
SELECT COUNT(*) COUNTER FROM INVENTORY;
|
total.inventories |
The service is used to capture statistics for tuning purpose. It is not a real-time Catalog data profile summary.
If there have been significant changes in Catalog size or you have customized the SQL, re-run the service by changing the Run Schedule in the Generate FlowFile Process inside the Statistics Auth or Statistics Live stage of the indexing process, and do a full reindexing run to generate a real-time Catalog data profile summary.
The Ingest summary is a record of the dimension sizes at the time that it was run. It is not updated in real-time, so any changes made to the underlying tables will not be registered until the next time you run the report.
Adding custom SQL to the service
{
"identifier" : "total.stores",
"location" : "sql/statistics/TotalStores.sql"
},
- In the SQL statement, store the value extracted by the query in COUNTER. This value corresponds to the value in the key-value pair used to store the dimension in HCL-Cache.
- Create a file with the same name as you will be using for the identifier in the SQL code inside sql/statistics and use the SQL file name of the IDENTIFIER key. Examples of the key can be found in the second column of the Default dimension queries table.
- Once you have created the SQL file and its corresponding identifier, add
the identifier/location pair in JSON format to the Replacement
Value variable inside the Catalog
Flowfile processor. Use the convention
Identifier-SQL JSON for the identifier and
sql/statistics/sql_filename.sql
for the SQL file. For example, the default dimension identifiers and their
corresponding SQL code files are defined as
follows:
[ { "identifier" : "total.stores", "location" : "sql/statistics/TotalStores.sql" }, { "identifier" : "total.catalog.asset.stores", "location" : "sql/statistics/TotalCatalogAssetStores.sql" }, { "identifier" : "total.b2c.stores", "location" : "sql/statistics/Totalb2cStores.sql" }, { "identifier" : "total.b2b.stores", "location" : "sql/statistics/Totalb2bStores.sql" }, { "identifier" : "total.b2c.storefront.asset.stores", "location" : "sql/statistics/Totalb2cStorefrontAssetStores.sql" }, { "identifier" : "total.b2b.storefront.asset.stores", "location" : "sql/statistics/Totalb2bStorefrontAssetStores.sql" }, { "identifier" : "total.supported.languages", "location" : "sql/statistics/TotalSupportedLanguages.sql" }, { "identifier" : "total.catalogs", "location" : "sql/statistics/TotalCatalogs.sql" }, { "identifier" : "total.master.catalogs", "location" : "sql/statistics/TotalMasterCatalogs.sql" }, { "identifier" : "total.categories", "location" : "sql/statistics/TotalCategories.sql" }, { "identifier" : "total.catentries", "location" : "sql/statistics/TotalCatentries.sql" }, { "identifier" : "total.products", "location" : "sql/statistics/TotalProducts.sql" }, { "identifier" : "total.items", "location" : "sql/statistics/TotalItems.sql" }, { "identifier" : "total.other.catenties", "location" : "sql/statistics/TotalOtherCatenties.sql" }, { "identifier" : "total.contracts", "location" : "sql/statistics/TotalContracts.sql" }, { "identifier" : "total.attributes", "location" : "sql/statistics/TotalAttributes.sql" }, { "identifier" : "total.allowed.value.attributes", "location" : "sql/statistics/TotalAllowedValueAttributes.sql" }, { "identifier" : "total.assigned.value.attributes", "location" : "sql/statistics/TotalAssignedValueAttributes.sql" }, { "identifier" : "total.facetable.attributes", "location" : "sql/statistics/TotalFacetableAttributes.sql" }, { "identifier" : "total.attribute.values", "location" : "sql/statistics/TotalAttributeValues.sql" }, { "identifier" : "total.allowed.attribute.values", "location" : "sql/statistics/TotalAllowedAttributeValues.sql" }, { "identifier" : "total.assigned.attribute.values", "location" : "sql/statistics/TotalAssignedAttributeValues.sql" }, { "identifier" : "total.catentry.attributes", "location" : "sql/statistics/TotalCatentryAttributes.sql" }, { "identifier" : "total.catentry.defining.attributes", "location" : "sql/statistics/TotalCatentryDefiningAttributes.sql" }, { "identifier" : "total.catentry.descriptive.attributes", "location" : "sql/statistics/TotalCatentryDescriptiveAttributes.sql" }, { "identifier" : "total.pricelists", "location" : "sql/statistics/TotalPricelists.sql" }, { "identifier" : "total.offers", "location" : "sql/statistics/TotalOffers.sql" }, { "identifier" : "total.offerprices", "location" : "sql/statistics/TotalOfferprices.sql" }, { "identifier" : "total.offer.pricelists", "location" : "sql/statistics/TotalOfferPricelists.sql" }, { "identifier" : "total.inventories", "location" : "sql/statistics/TotalInventories.sql" } ]