Exposing metrics to monitor your workload

To control and monitor your workload, you can have HCL Workload Automation expose a number of metrics that provide insight into the state, health, and performance of your workload environment and infrastructure. By further analyzing these values through a data analytics tool, such as AI Data Advisor (AIDA), you can detect anomalies and anticipate failures or degradations.

For more information about AIDA and how to use it, see AI Data Advisor (AIDA) User's Guide.

Collecting these metrics can be useful for many reasons:
  • Generating alerts and addressing problems before they occur
  • Monitoring and analyzing trends
  • Comparing historical data
  • Detecting anomalies

Table 1 lists the exposed metrics, along with their descriptions.

See Authorizing access to the metrics for preliminary steps to viewing the metrics.

See Accessing and visualizing the metrics to find out where and how to find the metrics.

See Applying a fix pack for information about the steps to be performed when you apply a fix pack.

Table 1. Workload Automation exposed metrics
Metric Display Name | Metric name | Description
Flexera Monitoring | application_wa_licence_uncountedJobs | The number of jobs that ran but that Flexera was unable to count.
Workload | application_wa_JobsInPlanCount_jobs | Workload by job status: WAITING, READY, HELD, BLOCKED, CANCELED, ERROR, RUNNING, SUCCESSFUL, SUPPRESS, UNDECIDED
Workload | application_wa_JobsByWorkstation | Job status by workstation
Workload | application_wa_JobsByFolder_jobs | Job status by folder
Workload | application_wa_JobsInPlanCount_jobs | Workload throughput (jobs/minute)
Critical Jobs | application_wa_criticalJob_incompletePredecessor | Incomplete predecessors
Critical Jobs | application_wa_criticalJob_potentialRisk_boolean | Risk level: potential risk
Critical Jobs | application_wa_criticalJob_highRisk_boolean | Risk level: high risk
Critical Jobs | application_wa_criticalJob_estimateEnd_seconds | Estimated end
Critical Jobs | application_wa_criticalJob_confidence_factor | Confidence factor
WA Server - Internal Message Queues | application_wa_msgFileFill_percent | Internal message queue usage for Appserverbox.msg, Courier.msg, mirrorbox.msg, Mailbox.msg, Monbox.msg, Moncmd.msg, auditbox.msg, clbox.msg, planbox.msg, Intercom.msg, pobox messages, and server.msg
Workstation Status | application_wa_workstation_running | Workstations running
Workstation Status | application_wa_workstation_linked_boolean | Workstations linked
Database Connection Status | application_wa_DB_connected_boolean | 1 - connected, 0 - not connected
WA Server and Console - Liberty | base_memory_usedHeap_bytes | Heap usage percentage
WA Server and Console - Liberty | vendor_session_activeSessions | Active sessions
WA Server and Console - Liberty | vendor_session_liveSessions | Live sessions
WA Server and Console - Liberty | vendor_threadpool_activeThreads | Active threads
WA Server and Console - Liberty | vendor_threadpool_size | Threadpool size
WA Server and Console - Liberty | base_gc_time_seconds | Time per garbage collection cycle moving average
WA Server and Console - Connection Pools (Liberty) | vendor_connectionpool_inUseTime_total_seconds | Average time usage per connection in milliseconds
WA Server and Console - Connection Pools (Liberty) | vendor_connectionpool_managedConnections | Managed connections
WA Server and Console - Connection Pools (Liberty) | vendor_connectionpool_freeConnections | Free connections
WA Server and Console - Connection Pools (Liberty) | vendor_connectionpool_connectionHandles | Connection handles
WA Server and Console - Connection Pools (Liberty) | vendor_connectionpool_destroy_total | Created and destroyed connections
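
As an illustration of how the metrics in Table 1 might be consumed, the following Python sketch parses a text payload in the Prometheus-style exposition format and reports message queues whose fill level exceeds a threshold. It assumes the endpoint returns that text format; the label name msgFile, the sample values, and the function names are hypothetical and are not taken from the product.

```python
import re

# One sample line: metric name, optional {labels}, then a numeric value.
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def queues_above(samples, threshold_percent):
    """Return the label sets of message queues filled above the threshold."""
    return [
        labels
        for name, labels, value in samples
        if name == "application_wa_msgFileFill_percent" and value > threshold_percent
    ]


# Hypothetical payload: the label name "msgFile" and all values are invented.
SAMPLE_TEXT = """\
# TYPE application_wa_msgFileFill_percent gauge
application_wa_msgFileFill_percent{msgFile="Mailbox.msg"} 12.5
application_wa_msgFileFill_percent{msgFile="Intercom.msg"} 91.0
# TYPE application_wa_DB_connected_boolean gauge
application_wa_DB_connected_boolean 1
"""

if __name__ == "__main__":
    parsed = parse_metrics(SAMPLE_TEXT)
    print(queues_above(parsed, 80.0))
```

The parser is deliberately small: it ignores comment lines and any trailing timestamp, which is enough for a quick check but not a substitute for a full monitoring tool.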

Authorizing access to the metrics

To allow a user to access the metrics from the server, you must grant the administrator role to the user from the configured user registry.

If you access the metrics from the console, no configuration is required. See Accessing and visualizing the metrics.

Create a configuration file named prometheus.xml in which you define a user with administrator privileges to access the metrics data. Perform the following procedure to authorize a user to access the metrics with a basic user registry:

  1. To grant the default user specified in the authentication_config.xml file the administrator role to access the metrics, create a file named prometheus.xml with the following content:
    <server>
      <featureManager>       
         <feature>mpMetrics-2.3</feature>
         <feature>cdi-1.2</feature>
      </featureManager> 
      <administrator-role> 
        <user>${user.twsuser.id}</user> 
      </administrator-role>
    </server>
    
  2. Save the file prometheus.xml in the following location:
    On UNIX operating systems master domain manager
    TWA_DATA_DIR/usr/servers/engineServer/configDropins/overrides
    On Windows operating systems master domain manager
    TWA_home\usr\servers\engineServer\configDropins\overrides
  3. If, instead, you want to use a user different from the default user to access metrics from the server, you must update the authentication_config.xml file with this user.
    1. Make a backup copy of the existing authentication_config.xml file located in the following path:
      On UNIX operating systems
      master domain manager
      TWA_DATA_DIR/usr/servers/engineServer/configDropins/overrides
      On Windows operating systems
      master domain manager
      TWA_home\usr\servers\engineServer\configDropins\overrides
    2. Edit the existing authentication_config.xml file to add the user to whom you want to grant access to the metrics. The following example configuration enables a user other than the default user, here OTHERUSER, in the configured user registry to access the metrics:
      <server description="basicRealm">
        <basicRegistry id="basic" realm="TWSRealm">
          <!--
            This user is defined in wauser_variables.xml,
            and it is the user used by liberty to run, if you remove
            this user please set another valid user and password
            defined into the user registry in wauser_variables.xml.
          -->
          <user name="${user.twsuser.id}" password="${user.twsuser.password}"/>
          <user name="OTHERUSER" password="OTHERUSERPASSWORD"/>
        </basicRegistry>
      </server>
    3. Save the authentication_config.xml file with the changes.
  4. Dynamic Workload Console users defined in the authentication_config.xml on the console can access the metrics data from the console. To authorize additional users, add them to the authentication_config.xml as follows:
    1. Make a backup copy of the existing authentication_config.xml file located in the following path:
      On UNIX operating systems
      Dynamic Workload Console
      DWC_DATA_dir/usr/servers/dwcServer/configDropins/templates/authentication
      On Windows operating systems
      Dynamic Workload Console
      DWC_home\usr\servers\dwcServer\configDropins\templates\authentication
    2. Edit the existing authentication_config.xml file to add the user to whom you want to grant access to the metrics. The following example configuration enables a user other than the default user, here OTHERUSER, in the configured user registry to access the metrics:
      <server description="basicRealm">
        <basicRegistry id="basic" realm="TWSRealm">
          <!--
            This user is defined in wauser_variables.xml,
            and it is the user used by liberty to run, if you remove
            this user please set another valid user and password
            defined into the user registry in wauser_variables.xml.
          -->
          <user name="${user.twsuser.id}" password="${user.twsuser.password}"/>
          <user name="OTHERUSER" password="OTHERUSERPASSWORD"/>
        </basicRegistry>
      </server>
    3. Save the authentication_config.xml file with the changes.
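
The backup, edit, and save steps for authentication_config.xml can also be scripted. The following Python sketch is one possible approach, not a product tool: it copies the file to a .bak backup, appends a user entry to the basicRegistry element, and writes the file back. The function name and the .bak suffix are hypothetical, the path is passed as a string, and the password is stored as clear text exactly as in the example configuration.

```python
import shutil
import xml.etree.ElementTree as ET


def add_registry_user(config_path, name, password):
    """Back up config_path, then append a <user> entry to its <basicRegistry>.

    Returns True if the user was added, False if it was already defined.
    """
    shutil.copy2(config_path, config_path + ".bak")  # backup copy first

    # insert_comments=True keeps the existing XML comments (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(config_path, parser)
    registry = tree.getroot().find("basicRegistry")
    if registry is None:
        raise ValueError("no <basicRegistry> element in " + config_path)

    # Leave the file untouched when the user is already present.
    if any(user.get("name") == name for user in registry.findall("user")):
        return False

    ET.SubElement(registry, "user", {"name": name, "password": password})
    tree.write(config_path, encoding="utf-8")
    return True
```

A call such as add_registry_user(path, "OTHERUSER", "OTHERUSERPASSWORD") would mirror the manual example; whitespace in the rewritten file may differ slightly from the original.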

Accessing and visualizing the metrics

If you use AIDA, you can use the metrics exposed by HCL Workload Automation to detect anomalies in your workload and prevent problems.

For more information about AIDA and how to use it, see AI Data Advisor (AIDA) User's Guide.

You can also use other monitoring tools that support the OpenMetrics standard, such as Grafana, Prometheus, and Splunk.

If you use Grafana, you have access to an out-of-the-box preconfigured dashboard. You can access the preconfigured dashboard named Grafana Dashboard: Distributed Environments from Automation Hub to use in your on-premises deployments, including Docker.

A separate preconfigured dashboard named Grafana Dashboard: Kubernetes Environments is available for cluster monitoring, including monitoring pods. Automation Hub gives you access to the downloadable JSON file on the Grafana web site. The dashboard visualizes the metrics for observability.

The metrics are exposed so that any monitoring tool supporting the OpenMetrics standard can display them. To access the metrics:
From the master domain manager (or server in a cloud environment):
You can view the metrics from any browser by accessing the /metrics endpoint with the credentials of the user defined in the prometheus.xml file. The product REST APIs retrieve and expose the metrics data through the following address:
https://MDM_HOST:MDM_PORT_HTTP/metrics
where:
MDM_HOST
Represents the hostname or IP address of the master domain manager.
MDM_PORT_HTTP
Represents the HTTP port number of the master domain manager.
From the Dynamic Workload Console (or console in a cloud environment):
You can view the metrics from any browser by accessing the /metrics endpoint with the credentials of the user defined in the authentication_config.xml file. The product REST APIs retrieve and expose the metrics data through the following address:
https://DWC_HOST:DWC_PORT_HTTP/metrics
where:
DWC_HOST
Represents the hostname or IP address of the console.
DWC_PORT_HTTP
Represents the HTTP port number of the console.
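
Besides a browser, the /metrics endpoint can be read from a script. The following Python sketch builds the address shown above and requests it with the user's credentials. It assumes that the endpoint accepts HTTP Basic authentication and that the server certificate is trusted by the client; the function names, and any host, port, and credential values you pass in, are placeholders.

```python
import base64
import urllib.request


def metrics_url(host, port):
    """Build the /metrics address for the given host and port."""
    return f"https://{host}:{port}/metrics"


def basic_auth_header(user, password):
    """Return the HTTP Basic Authorization header value for the credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def fetch_metrics(host, port, user, password, timeout=10):
    """Request the endpoint and return the response body as text."""
    request = urllib.request.Request(
        metrics_url(host, port),
        headers={"Authorization": basic_auth_header(user, password)},
    )
    # Uses the default TLS settings, so an untrusted certificate raises an error.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The same function works for the master domain manager and for the console; only the host, port, and user differ.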

Prometheus is an open-source monitoring and alerting solution. It is particularly useful for collecting time series data that can be easily queried. Prometheus pulls (scrapes) data from its targets at regular, configurable intervals and makes the collected metrics available for querying through its own host address.

Prometheus integrates with monitoring tools like Grafana to visualize the metrics collected. Grafana uses the Prometheus system as a data source, and all HCL Workload Automation metrics can be accessed and added to dashboards.
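
Once Prometheus has collected the metrics, they can also be read back programmatically. The following Python sketch uses the Prometheus HTTP API instant-query endpoint (/api/v1/query) to fetch the current value of one metric. The Prometheus base address is an assumption about your own deployment, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(prometheus_base, expression):
    """Build a Prometheus instant-query URL for the given expression."""
    query_string = urllib.parse.urlencode({"query": expression})
    return prometheus_base.rstrip("/") + "/api/v1/query?" + query_string


def vector_values(payload):
    """Extract (labels, value) pairs from an instant-query JSON response."""
    if payload.get("status") != "success":
        raise RuntimeError("query failed: " + str(payload.get("error")))
    return [
        (item["metric"], float(item["value"][1]))
        for item in payload["data"]["result"]
    ]


def query(prometheus_base, expression, timeout=10):
    """Run an instant query against Prometheus and return (labels, value) pairs."""
    url = instant_query_url(prometheus_base, expression)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return vector_values(json.load(response))
```

For example, query("http://localhost:9090", "application_wa_DB_connected_boolean") would return the database connection status for each scraped server, assuming Prometheus runs at that address.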

Dashboards display information such as:
  • Middleware metrics (WebSphere Application Server Liberty Base)
  • HCL Workload Automation infrastructure (message files)
  • Workload statistics (jobs per status, total count or grouped by folder or by workstation)
  • Critical job information (risk level, confidence factor, incomplete predecessors, estimated end)
  • Workstation status (running, linked)

Applying a fix pack

If you apply a fix pack to your environment, perform the following steps on the master domain manager before you start exposing metrics again:
  1. Move the prometheus.xml file outside of the overrides folder.
  2. Update the master domain manager.
  3. In the prometheus.xml file, change the cdi feature version from 1.2 to 2.0, that is, from:
    <server>
      <featureManager>
          <feature>mpMetrics-2.3</feature>
          <feature>cdi-1.2</feature>
      </featureManager>
    
      <mpMetrics authentication="false" />
    
    </server>
    to
    <server>
      <featureManager>
          <feature>mpMetrics-2.3</feature>
          <feature>cdi-2.0</feature>
      </featureManager>
    
      <mpMetrics authentication="false" />
    
    </server>
  4. Move the prometheus.xml file back to the overrides folder.
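
The version change in step 3 can be applied with a small script instead of a manual edit. The following Python sketch rewrites only the cdi feature entry and leaves the rest of the file text unchanged. The function name is hypothetical, and reading and writing the file, as well as moving it in and out of the overrides folder, are left to the caller.

```python
import re


def bump_cdi_feature(xml_text, old="1.2", new="2.0"):
    """Replace <feature>cdi-OLD</feature> with <feature>cdi-NEW</feature>."""
    pattern = re.compile(
        r"(<feature>\s*cdi-)" + re.escape(old) + r"(\s*</feature>)"
    )
    # Keep the surrounding tag text and swap only the version number.
    return pattern.sub(lambda m: m.group(1) + new + m.group(2), xml_text)
```

Running the function on a file that already declares cdi-2.0 returns the text unchanged, so it is safe to apply more than once.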