Leveraging Instana for advanced observability

You can integrate HCL Workload Automation with Instana, a platform that provides deep observability into microservices and containerized applications, giving you a complete view of your system's health and performance.

Instana's core capabilities are fully automated and include:

Application Performance Monitoring (APM) to track service health.
Root cause analysis to quickly identify the source of failures.
Anomaly detection to proactively find and address unusual behavior.

Integrating HCL Workload Automation with Instana enhances your monitoring capabilities by helping you track and manage business process objectives. Through detailed analytics and historical trends, you can easily identify patterns, troubleshoot issues, and optimize performance over time. Key benefits of this integration include the ability to:

Gain end-to-end visibility into job processing across all systems and applications.
Receive real-time alerts for job delays, failures, or performance degradations.
Automatically correlate workload jobs with underlying infrastructure and application metrics.
Accelerate root cause analysis with contextual data and intelligent alerts, reducing investigation time.
Create custom dashboards and define KPIs to track what matters most to your business.

Integrating Instana

The integration of HCL Workload Automation with Instana includes the installation of the Instana agent and, optionally, the OpenTelemetry configuration:

Instana agent installation

This step is required to start integrating Instana with HCL Workload Automation. After you install the Instana agent, you can start monitoring the metrics.

You can deploy the Instana agent directly using standard methods such as Docker, Kubernetes, or system packages. The configuration process is streamlined and requires only minimal details, such as your API key and environment information. After deployment, the agent automatically discovers all services, containers, and infrastructure components, enabling immediate tracing and metrics collection.

To ensure your HCL Workload Automation environment is detected, navigate to the Infrastructure page on the Instana user interface and verify that the virtual machine containing the master domain manager is listed. If the virtual machine is not listed, you must install an agent on that machine.

For the full installation procedure of the Instana agent, see Installing host agents.

To download the Instana dashboard, see Instana Dashboard.

OpenTelemetry configuration

You can leverage OpenTelemetry to analyze the traces. The configuration requires three steps:
1. Enable the OpenTelemetry SDK
  
  Follow the steps described in Enabling observability with OpenTelemetry to enable the OpenTelemetry SDK.
2. Install the OpenTelemetry Collector
  
  Install the OpenTelemetry Collector.
  OpenTelemetry is available by default on each master domain manager and Dynamic Workload Console in a fresh installation. You can also enable it after upgrading to the current version.
  After the installation or upgrade is complete, perform the following steps to enable OpenTelemetry:
  
  Install and configure a tracing tool of your choice, for example Jaeger, Prometheus, or Splunk.
  
  Stop WebSphere Application Server Liberty, as described in Application server - starting and stopping.
  
  Browse to the following paths:
  master domain manager
  
  On UNIX operating systems
  
  TWA_home/usr/servers/engineServer
  
  On Windows operating systems
  
  TWA_home\usr\servers\engineServer
  
  Dynamic Workload Console
  
  On UNIX operating systems
  
  DWC_home/usr/servers/dwcServer
  
  On Windows operating systems
  
  DWC_home\usr\servers\dwcServer
  
  Edit the following properties in the server.env configuration file, based on the specifics of your environment:
  OTEL_EXPORTER_OTLP_ENDPOINT=http://{OPENTELEMETRY_HOSTNAME}:{OPENTELEMETRY_PORT}
  OTEL_EXPORTER_OTLP_TRACES_ENDPOINT={OPENTELEMETRY_HOSTNAME}:{OPENTELEMETRY_PORT}
  OTEL_SDK_DISABLED=false
  OTEL_TRACES_EXPORTER
  OTEL_EXPORTER_OTLP_PROTOCOL
  where
  
  OTEL_EXPORTER_OTLP_ENDPOINT
  
  A base endpoint URL for any signal type, with an optionally specified port number
  
  OTEL_EXPORTER_OTLP_TRACES_ENDPOINT
  
  Endpoint URL for trace data only, with an optionally specified port number
  
  OTEL_SDK_DISABLED
  
  Disables the SDK for all signals when set to true. Set it to false to enable OpenTelemetry.
  
  OTEL_TRACES_EXPORTER
  
  Trace exporter to be used
  
  OTEL_EXPORTER_OTLP_PROTOCOL
  
  OTLP transport protocol. Supported values are as follows:
  
  grpc
  
  for protobuf-encoded data using gRPC wire format over HTTP/2 connection
  
  http/protobuf
  
  for protobuf-encoded data over HTTP connection
  
  http/json
  
  for JSON-encoded data over HTTP connection
  
  For more information about the properties in the server.env file, see OpenTelemetry documentation.
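As a minimal sketch of the properties described above, the following Python helper renders the five server.env lines for a given collector host and port. The function name `render_server_env`, the host `otel-collector.example.com`, the port `4317`, and the values `otlp` and `grpc` for `OTEL_TRACES_EXPORTER` and `OTEL_EXPORTER_OTLP_PROTOCOL` are assumptions chosen for illustration; use the values that match your own environment.

```python
SUPPORTED_PROTOCOLS = ("grpc", "http/protobuf", "http/json")


def render_server_env(host, port, protocol="grpc", traces_exporter="otlp"):
    """Return the OpenTelemetry lines for server.env as a single string.

    Hypothetical helper: the defaults are illustrative assumptions only.
    """
    if protocol not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"unsupported OTLP protocol: {protocol!r}")
    lines = [
        # Base endpoint for any signal type.
        f"OTEL_EXPORTER_OTLP_ENDPOINT=http://{host}:{port}",
        # Trace-only endpoint, written in the same form as the template above.
        f"OTEL_EXPORTER_OTLP_TRACES_ENDPOINT={host}:{port}",
        # false keeps the SDK enabled.
        "OTEL_SDK_DISABLED=false",
        f"OTEL_TRACES_EXPORTER={traces_exporter}",
        f"OTEL_EXPORTER_OTLP_PROTOCOL={protocol}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_server_env("otel-collector.example.com", 4317))
```

Rejecting unknown protocol values early is a design choice of this sketch: it surfaces a typo before the application server is restarted.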
  
  Set the following properties in the jvm.options file, one option per line:
  -Dotel.resource.attributes=service.name=<service_name>
  -Dotel.metrics.exporter=none
  -javaagent:<TWA_DIR>/usr/servers/engineServer/opentelemetry-javaagent.jar
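The JVM options above can be sketched as a small Python helper that fills in the two placeholders. It assumes that each option is written on its own line with no spaces around the equals sign, which is how JVM options are normally passed. The function name `render_jvm_options`, the service name `wa-engine`, and the directory `/opt/wa/TWA` are hypothetical values used only for illustration.

```python
def render_jvm_options(service_name, twa_dir):
    """Return the three JVM option lines with the placeholders filled in.

    Hypothetical helper: assumes one option per line and a service name
    that contains no whitespace.
    """
    if not service_name or any(ch.isspace() for ch in service_name):
        raise ValueError("service_name must be non-empty and contain no whitespace")
    return [
        # Name under which the traces are reported.
        f"-Dotel.resource.attributes=service.name={service_name}",
        # Metrics export through the Java agent is switched off.
        "-Dotel.metrics.exporter=none",
        # Path of the OpenTelemetry Java agent, as given in the step above.
        f"-javaagent:{twa_dir}/usr/servers/engineServer/opentelemetry-javaagent.jar",
    ]


if __name__ == "__main__":
    print("\n".join(render_jvm_options("wa-engine", "/opt/wa/TWA")))
```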
  
  Start WebSphere Application Server Liberty, as described in Application server - starting and stopping.
  
  Results: You have now configured OpenTelemetry to work with HCL Workload Automation. The resulting telemetry data are displayed on the workstation you specified in the server.env file.
  Note: OpenTelemetry generates a substantial amount of data, which might affect system performance, especially on AIX operating systems.
3. Configure the OpenTelemetry Collector
  
  A default configuration is available after the installation of the OpenTelemetry Collector, but you can customize the parameters according to your needs by editing the config.yaml file.
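When you customize config.yaml, one common mistake is a pipeline that refers to a component that is never defined. The following sketch checks for that, assuming the upstream OpenTelemetry Collector layout in which `receivers`, `processors`, and `exporters` are top-level sections and `service.pipelines` lists the components each pipeline uses. It also assumes that the file has already been parsed into a Python dictionary with a YAML parser of your choice. The function name `undefined_components` and the example configuration, including the `debug` exporter, are illustrative assumptions and not the default configuration shipped with the product.

```python
def undefined_components(config):
    """Return (pipeline, section, name) for each pipeline entry with no definition."""
    problems = []
    pipelines = (config.get("service") or {}).get("pipelines") or {}
    for pipeline_name, pipeline in pipelines.items():
        for section in ("receivers", "processors", "exporters"):
            defined = config.get(section) or {}
            for name in pipeline.get(section) or []:
                if name not in defined:
                    problems.append((pipeline_name, section, name))
    return problems


# Hypothetical minimal configuration: OTLP in, debug output out.
example_config = {
    "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
    "exporters": {"debug": {}},
    "service": {
        "pipelines": {
            "traces": {"receivers": ["otlp"], "exporters": ["debug"]},
        }
    },
}

if __name__ == "__main__":
    print(undefined_components(example_config))
```

An empty result means that every component named in a pipeline has a matching definition. It does not validate the settings of the components themselves.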

Visualizing HCL Workload Automation health and performance with Instana

The Instana dashboard download file on Automation Hub contains a default dashboard configuration that you can import into Instana. This dashboard configuration offers a functional layout that highlights the most relevant HCL Workload Automation performance metrics and environment status indicators, arranged to support your monitoring activities.

Dashboard widgets help you quickly identify issues and areas of concern. With their easily readable graphs, you can instantly monitor your environment and configure alerts for critical business events. The clarity of the data visualizations helps any team member, regardless of their technical background, to gain valuable insights at a glance without needing specific Instana training.

This base configuration can be further extended: you can design custom widgets to display additional data, such as metrics exposed directly by the server, or information gathered via OpenTelemetry, ensuring total control over your monitoring activities. For the complete list of all the available metrics, see Exposing metrics to monitor your workload.

The default HCL Workload Automation dashboard configuration on Instana contains the following widgets:

Database connection

The database connection metric monitors the stability of the connection between the HCL Workload Automation server and the database. If the value displayed is 0, the database is not connected; if the value displayed is 1, the database is connected.
Job status overtime

The Job status overtime widget shows how many jobs were in Canceled, Error, Ready, Successful, or Waiting status over a specific period of time.
Job status pie chart

The job status pie chart aggregates the jobs according to their status, which can be Canceled, Error, Ready, Running, Successful, or Waiting.
Message queue usage

The message queue usage widget displays the percentage of message queues in use. If the percentage is 0 or close to 0, the environment is healthy.
CPU utilization

The CPU utilization widget displays the percentage of the CPU that is in use during the selected period of time.
HEAP utilization

The HEAP utilization widget displays the percentage of heap memory that is in use during the selected period of time.
Workstation link status

The Workstation link status widget shows whether the workstations are linked during a specific period of time.
Workstation running status

The Workstation running status widget shows whether the workstations are running during a specific period of time.
Critical jobs information

Five widgets are dedicated to the monitoring of critical jobs:

Incomplete predecessors

Monitors the number of incomplete predecessors for each critical job.

High risk

Indicates whether a critical job must be considered to be at high risk or not. If the value is 0, the job is not at high risk; if the value is 1, the job is at high risk.

Potential risk

Indicates whether a critical job must be considered to be potentially at risk or not. If the value is 0, the job is not potentially at risk; if the value is 1, the job is potentially at risk.

Confidence factor

Indicates, as a percentage, the confidence that the critical job will meet its deadline.

Estimated end

Displays a calculated end time for each job, based on an analysis of all available performance metrics.
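Several of the widgets above encode their state as 0 or 1, or as a percentage. The following sketch shows one way to turn those raw values into readable labels, for example in a script that post-processes exported metric values. The function names, the "on track" label, and the 5 percent threshold used for "close to 0" are assumptions made for illustration; they are not defined by HCL Workload Automation or Instana.

```python
def describe_database_connection(value):
    """Map the database connection value: 1 is connected, 0 is not connected."""
    return "connected" if value == 1 else "not connected"


def message_queues_healthy(percent_in_use, threshold=5.0):
    """Treat usage at or near 0 as healthy; the threshold is an assumed cut-off."""
    return 0 <= percent_in_use <= threshold


def classify_critical_job(high_risk, potential_risk):
    """Combine the High risk and Potential risk flags into one label."""
    if high_risk == 1:
        return "high risk"
    if potential_risk == 1:
        return "potential risk"
    return "on track"  # hypothetical label for jobs with both flags at 0


if __name__ == "__main__":
    print(describe_database_connection(1))
    print(message_queues_healthy(0.8))
    print(classify_critical_job(high_risk=0, potential_risk=1))
```

Checking the high-risk flag first is a design choice of this sketch, so that a job with both flags set is reported with the more severe label.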