Exposing metrics to monitor your workload

To control and monitor your workload, you can have HCL Workload Automation expose a number of metrics that provide insight into the state, health, and performance of your workload environment and infrastructure. By analyzing these values further with a data analytics tool, such as AI Data Advisor (AIDA), you can detect anomalies and anticipate failures or degradation.

For more information about AIDA and how to use it, see AI Data Advisor (AIDA) User's Guide.

Collecting these metrics is useful for several purposes:

Generating alerts and addressing problems before they occur
Monitoring and analyzing trends
Comparing historical data
Detecting anomalies

Table 1 lists the exposed metrics, along with their descriptions.

See Accessing and visualizing the metrics to find out where and how to find the metrics.

Table 1. Workload Automation exposed metrics
Metric Display Name	Metric name	Description
Monitoring	application_wa_licence_uncountedJobs	The number of jobs that ran but were not counted.
Workload	application_wa_JobsInPlanCount_jobs	Workload by job status: WAITING, READY, HELD, BLOCKED, CANCELED, ERROR, RUNNING, SUCCESSFUL, SUPPRESS, UNDECIDED
	application_wa_JobsByWorkstation	Job status by workstation
	application_wa_JobsByFolder_jobs	Job status by folder
	application_wa_JobsInPlanCount_jobs	Workload throughput (jobs/minute)
Critical Jobs	application_wa_criticalJob_incompletePredecessor	Incomplete predecessors
	application_wa_criticalJob_potentialRisk_boolean	Risk level: potential risk
	application_wa_criticalJob_highRisk_boolean	Risk level: high risk
	application_wa_criticalJob_estimateEnd_seconds	Estimated end
	application_wa_criticalJob_confidence_factor	Confidence factor
WA Server - Internal Message Queues	application_wa_msgFileFill_percent	Internal message queue usage for Appserverbox.msg, Courier.msg, mirrorbox.msg, Mailbox.msg, Monbox.msg, Moncmd.msg, auditbox.msg, clbox.msg, planbox.msg, Intercom.msg, pobox messages, and server.msg
Workstation Status	application_wa_workstation_running	Workstations running
Workstation Status	application_wa_workstation_linked_boolean	Workstations linked
Database Connection Status	application_wa_DB_connected_boolean	1 - connected, 0 - not connected
WA Server and Console - Liberty	memory_usedHeap_bytes	Heap usage percentage
	session_activeSessions	Active sessions
	session_liveSessions	Live sessions
	threadpool_activeThreads	Active threads
	threadpool_size	Threadpool size
	gc_time_seconds	Time per garbage collection cycle moving average
WA Server and Console - Connection Pools (Liberty)	connectionpool_inUseTime_total_seconds	Average time usage per connection in milliseconds
	connectionpool_managedConnections	Managed connections
	connectionpool_freeConnections	Free connections
	connectionpool_connectionHandles	Connection handles
	connectionpool_destroy_total	Created and destroyed connections

Accessing and visualizing the metrics

If you use AIDA, you can use the metrics exposed by HCL Workload Automation to detect anomalies in your workload and prevent problems.

For more information about AIDA and how to use it, see AI Data Advisor (AIDA) User's Guide.

You can also use other monitoring tools that support the OpenMetrics standard, such as Grafana, Prometheus, and Splunk.

If you use Grafana, you have access to an out-of-the-box preconfigured dashboard. The dashboard, named Grafana Dashboard: Distributed Environments, is available from Automation Hub for use in your on-premises deployments, including Docker.

A separate preconfigured dashboard, named Grafana Dashboard: Kubernetes Environments, is available for cluster monitoring, including monitoring pods. Automation Hub gives you access to the downloadable JSON file on the Grafana web site. The dashboard visualizes the metrics for observability.

The metrics are exposed so that any monitoring tool supporting the OpenMetrics standard can display them. To access the metrics:

From the master domain manager (or server in a cloud environment):

You can view the metrics from any browser by accessing the /metrics endpoint. The product REST APIs retrieve and expose the metrics data through the following address:

https://MDM_HOST:MDM_PORT_HTTP/metrics

where:

MDM_HOST: Represents the hostname or IP address of the master domain manager.
MDM_PORT_HTTP: Represents the HTTP port number of the master domain manager.
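As an alternative to a browser, the endpoint can be read from a script. The following is a minimal sketch using only the Python standard library. The host name and port in the usage comment are placeholders, the function names are hypothetical, and the sketch assumes that the server certificate is trusted by the client and that no additional authentication step is needed; adapt it if your environment differs.

```python
import ssl
import urllib.request
from typing import Optional


def metrics_url(host: str, port: int) -> str:
    """Build the /metrics address from MDM_HOST and MDM_PORT_HTTP."""
    return f"https://{host}:{port}/metrics"


def fetch_metrics(url: str,
                  timeout: float = 10.0,
                  context: Optional[ssl.SSLContext] = None) -> str:
    """Return the raw text served by the /metrics endpoint.

    Pass an ssl.SSLContext if the server uses a certificate that is
    not in the default trust store (an assumption about your setup).
    """
    with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
        return resp.read().decode("utf-8")


# Usage (placeholder values, replace with your own):
#   text = fetch_metrics(metrics_url("mdm.example.com", 9443))
#   print(text[:500])
```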

From the Dynamic Workload Console (or console in a cloud environment):

You can view the metrics from any browser by accessing the /metrics endpoint with the credentials of the user defined in the authentication_config.xml file. The product REST APIs retrieve and expose the metrics data through the following address:

https://DWC_HOST:DWC_PORT_HTTP/metrics

where:

DWC_HOST: Represents the hostname or IP address of the console.
DWC_PORT_HTTP: Represents the HTTP port number of the console.

You can also filter the metrics by scope, for example:

https://DWC_HOST:DWC_PORT_HTTP/metrics?scope=SCOPE

where:

SCOPE: Represents the scope of the metric: vendor, base, or application.
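The scope filter and the returned text can also be handled from a script. The sketch below builds a scoped address and turns the plain-text response into a dictionary of sample values. It assumes the response uses the usual line-oriented exposition layout (`name{labels} value`, with `#` comment lines); the function names and the `file` label in the usage comment are illustrative, not taken from the product.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

VALID_SCOPES = ("vendor", "base", "application")


def scoped_metrics_url(host: str, port: int, scope: Optional[str] = None) -> str:
    """Build the /metrics address, optionally filtered by scope."""
    base = f"https://{host}:{port}/metrics"
    if scope is None:
        return base
    if scope not in VALID_SCOPES:
        raise ValueError(f"scope must be one of {VALID_SCOPES}, got {scope!r}")
    return f"{base}?{urlencode({'scope': scope})}"


def parse_samples(text: str) -> Dict[str, float]:
    """Map each sample line (name plus labels) to its numeric value.

    Comment lines (starting with '#') and blank lines are skipped.
    A trailing timestamp, if present, is ignored.
    """
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labels may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            key, rest = line[:end], line[end:].split()
        else:
            parts = line.split()
            key, rest = parts[0], parts[1:]
        if rest:
            samples[key] = float(rest[0])
    return samples


# Usage with made-up sample text:
#   parse_samples('application_wa_DB_connected_boolean 1.0\n')
```

Keying the dictionary on the full name-plus-labels string keeps the sketch short; a real consumer would more likely use an OpenMetrics-aware client library.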

Prometheus is an open-source monitoring and alerting solution. It is particularly useful for collecting time series data that can be easily queried. Prometheus pulls (scrapes) data from configured targets at regular intervals, stores it as time series, and makes it available for querying through its own host address.

Prometheus integrates with monitoring tools like Grafana to visualize the metrics collected. Grafana uses the Prometheus system as a datasource and all of the HCL Workload Automation metrics can be accessed and added to dashboards.
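Once Prometheus is scraping the endpoint, the collected values can also be read back through the Prometheus HTTP API, which is what a data source such as Grafana does behind the scenes. The sketch below issues an instant query against `/api/v1/query` and extracts the values from the JSON response. The Prometheus address is a placeholder, the function names are hypothetical, and the sketch handles only instant-vector results.

```python
import json
import urllib.request
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def instant_query_url(prometheus_base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{prometheus_base.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def extract_values(payload: dict) -> List[Tuple[Dict[str, str], float]]:
    """Return (labels, value) pairs from an instant-vector response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    # Each result item carries its labels and a [timestamp, "value"] pair.
    return [(item["metric"], float(item["value"][1]))
            for item in payload["data"]["result"]]


def run_query(prometheus_base: str, promql: str,
              timeout: float = 10.0) -> List[Tuple[Dict[str, str], float]]:
    """Run an instant query and return the extracted values."""
    url = instant_query_url(prometheus_base, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_values(json.load(resp))


# Usage (placeholder address, replace with your Prometheus server):
#   run_query("http://localhost:9090", "application_wa_DB_connected_boolean")
```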

Dashboards display information such as:

Middleware metrics (WebSphere Application Server Liberty)
HCL Workload Automation infrastructure (message files)
Workload statistics (jobs per status, total count or grouped by folder or by workstation)
Critical job information (risk level, confidence factor, incomplete predecessors, estimated end)
Workstation status (running, linked)