Leveraging Instana for advanced observability
You can integrate HCL Workload Automation with Instana, a platform that provides deep observability into microservices and containerized applications, giving you a complete view of your system's health and performance.
Its core capabilities are fully automated and include:
- Application Performance Monitoring (APM) to track service health.
- Root cause analysis to quickly identify the source of failures.
- Anomaly detection to proactively find and address unusual behaviour.
By integrating Instana with HCL Workload Automation, you can:
- Gain end-to-end visibility into job processing across all systems and applications.
- Receive real-time alerts for job delays, failures, or performance degradations.
- Automatically correlate workload jobs with underlying infrastructure and application metrics.
- Accelerate root cause analysis with contextual data and intelligent alerts, reducing investigation time.
- Create custom dashboards and define KPIs to track what matters most to your business.
Integrating Instana
-
- Instana agent installation
- This step is required before you can integrate Instana with HCL Workload Automation. After you install the Instana agent, you can monitor the metrics.
-
- OpenTelemetry configuration
- You can use OpenTelemetry to analyze traces. The configuration requires three steps:
-
- Enable the OpenTelemetry SDK
- Follow the steps described in Enabling observability with OpenTelemetry to enable the OpenTelemetry SDK.
-
- Install the OpenTelemetry Collector
- Install the OpenTelemetry Collector.
OpenTelemetry is available by default on each master domain manager and Dynamic Workload Console installed with a fresh installation. You can also enable it after upgrading to the current version.
After the installation or upgrade has completed, perform the following steps to enable OpenTelemetry:
- Install and configure a tracing tool of your choice, for example Jaeger, Prometheus, or Splunk.
- Stop WebSphere Application Server Liberty, as described in Application server - starting and stopping.
- Browse to the following paths:
- master domain manager
- On UNIX operating systems: TWA_home/usr/servers/engineServer
- On Windows operating systems: TWA_home\usr\servers\engineServer
- Dynamic Workload Console
- On UNIX operating systems: DWC_home/usr/servers/dwcServer
- On Windows operating systems: DWC_home\usr\servers\dwcServer
- Edit the following properties in the server.env configuration file, based on the specifics of your environment:
OTEL_EXPORTER_OTLP_ENDPOINT=http://{OPENTELEMETRY_HOSTNAME}:{OPENTELEMETRY_PORT}
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT={OPENTELEMETRY_HOSTNAME}:{OPENTELEMETRY_PORT}
OTEL_SDK_DISABLED=false
OTEL_TRACES_EXPORTER
OTEL_EXPORTER_OTLP_PROTOCOL
where
- OTEL_EXPORTER_OTLP_ENDPOINT
- A base endpoint URL for any signal type, with an optionally-specified port number
- OTEL_EXPORTER_OTLP_TRACES_ENDPOINT
- Endpoint URL for trace data only, with an optionally-specified port number
- OTEL_SDK_DISABLED
- Disables the SDK for all signals when set to true. Keep it set to false to enable OpenTelemetry.
- OTEL_TRACES_EXPORTER
- Trace exporter to be used
- OTEL_EXPORTER_OTLP_PROTOCOL
- OTLP transport protocol. Supported values are as follows:
- grpc: protobuf-encoded data using the gRPC wire format over an HTTP/2 connection
- http/protobuf: protobuf-encoded data over an HTTP connection
- http/json: JSON-encoded data over an HTTP connection
For more information about the properties in the server.env file, see OpenTelemetry documentation.
- Set the following properties in the jvm.options file:
-Dotel.resource.attributes=service.name=<service_name>
-Dotel.metrics.exporter=none
-javaagent:<TWA_DIR>/usr/servers/engineServer/opentelemetry-javaagent.jar
- Start WebSphere Application Server Liberty, as described in Application server - starting and stopping.
Results: You have now configured OpenTelemetry to work with HCL Workload Automation. The resulting telemetry data are displayed on the workstation you specified in the server.env file.
Note that OpenTelemetry generates a substantial amount of data, which might impact system performance, especially on AIX operating systems.
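The property lists above can be sketched as a small Python helper that builds the server.env and JVM option lines from a few inputs and rejects an unsupported protocol before you restart the application server. This is a minimal sketch, not part of the product: the function names, the example host name, and the example installation path are hypothetical, and the default `otlp` value for OTEL_TRACES_EXPORTER is an assumption, because the source leaves that property without a value. The JVM options are written without spaces around the equals sign, on the assumption that each line is passed to the JVM as a single argument.

```python
# Protocol values listed in the OTEL_EXPORTER_OTLP_PROTOCOL description.
SUPPORTED_PROTOCOLS = ("grpc", "http/protobuf", "http/json")


def build_server_env(host, port, protocol, traces_exporter="otlp"):
    """Return the OpenTelemetry lines for server.env.

    traces_exporter="otlp" is an assumed default; the source does not
    state a value for OTEL_TRACES_EXPORTER.
    """
    if not host or any(ch.isspace() for ch in host):
        raise ValueError(f"invalid host name: {host!r}")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    if protocol not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"unsupported OTLP protocol: {protocol!r}")
    return [
        f"OTEL_EXPORTER_OTLP_ENDPOINT=http://{host}:{port}",
        # Mirrors the source, which shows this value without a scheme.
        f"OTEL_EXPORTER_OTLP_TRACES_ENDPOINT={host}:{port}",
        "OTEL_SDK_DISABLED=false",
        f"OTEL_TRACES_EXPORTER={traces_exporter}",
        f"OTEL_EXPORTER_OTLP_PROTOCOL={protocol}",
    ]


def build_jvm_options(service_name, twa_dir):
    """Return the three JVM option lines, one option per line."""
    if not service_name:
        raise ValueError("service_name must not be empty")
    agent = f"{twa_dir}/usr/servers/engineServer/opentelemetry-javaagent.jar"
    return [
        f"-Dotel.resource.attributes=service.name={service_name}",
        "-Dotel.metrics.exporter=none",
        f"-javaagent:{agent}",
    ]


if __name__ == "__main__":
    # Hypothetical values: replace them with those of your environment.
    print("\n".join(build_server_env("collector.example.com", 4317, "grpc")))
    print("\n".join(build_jvm_options("wa-engine", "/opt/wa/TWS")))
```

Printing the lines instead of writing them to disk keeps the sketch free of side effects, so you can review the output and paste it into the files yourself while the application server is stopped.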
-
- Configure the OpenTelemetry Collector
- A default configuration is available after the installation of the OpenTelemetry Collector, but you can customize the parameters according to your needs by editing the config.yaml file.
-
Visualizing HCL Workload Automation health and performance with Instana
The Instana dashboard download file on Automation Hub contains a default dashboard configuration that you can import into Instana. This configuration offers a functional layout designed to highlight the most relevant HCL Workload Automation performance metrics and environment status indicators, arranged to support your monitoring activities.
Dashboard widgets help you quickly identify issues and areas of concern. With their easily readable graphs, you can instantly monitor your environment and configure alerts for critical business events. The clarity of the data visualizations helps any team member, regardless of their technical background, to gain valuable insights at a glance without needing specific Instana training.
This base configuration can be further extended: you can design custom widgets to display additional data, such as metrics exposed directly by the server, or information gathered via OpenTelemetry, ensuring total control over your monitoring activities. For the complete list of all the available metrics, see Exposing metrics to monitor your workload.
-
- Database connection
- The database connection metric monitors the stability of the connection between the HCL Workload Automation server and the database. If the value displayed is 0, the database is not connected; if the value displayed is 1, the database is connected.
-
- Job status over time
- The job status over time metric monitors how many jobs, over a specific period of time, have been in Canceled, Error, Ready, Successful, and Waiting status.
-
- Job status pie chart
- The job status pie chart aggregates the jobs according to their status, which can be Canceled, Error, Ready, Running, Successful, or Waiting.
-
- Message queue usage
- The message queue usage widget displays the percentage of message queues in use. If the percentage is 0 or close to 0, the environment is healthy.
-
- CPU utilization
- The CPU utilization widget displays the percentage of the CPU that is in use during the selected period of time.
-
- HEAP utilization
- The HEAP utilization widget displays the percentage of heap memory that is in use during the selected period of time.
-
- Workstation link status
- The Workstation link status widget monitors whether the workstations are linked during a specific period of time.
-
- Workstation running status
- The Workstation running status widget monitors whether the workstations are running during a specific period of time.
-
- Critical jobs information
- Five widgets are dedicated to the monitoring of critical jobs:
- Incomplete predecessors
- Monitors the number of incomplete predecessors for each critical job.
- High risk
- Indicates whether a critical job must be considered to be at high risk or not. If the value is 0, the job is not at high risk; if the value is 1, the job is at high risk.
- Potential risk
- Indicates whether a critical job must be considered to be potentially at risk or not. If the value is 0, the job is not potentially at risk; if the value is 1, the job is potentially at risk.
- Confidence factor
- Indicates, as a percentage, the confidence that the critical job will meet its deadline.
- Estimated end
- Displays a calculated end time for each job, based on an analysis of all available performance metrics.
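The 0 and 1 indicators described above can be read programmatically, for example when you decide which conditions deserve an alert. The following Python sketch shows one possible interpretation. It is an illustration under stated assumptions: the class, the function names, the field names, and both thresholds (`min_confidence` and `queue_warn_pct`) are hypothetical and do not correspond to actual metric names or product defaults. Only the meaning of the 0 and 1 values comes from the widget descriptions.

```python
from dataclasses import dataclass


@dataclass
class CriticalJob:
    # Field names are hypothetical; they mirror the widget descriptions.
    name: str
    incomplete_predecessors: int
    high_risk: int             # 0 = not at high risk, 1 = at high risk
    potential_risk: int        # 0 = not potentially at risk, 1 = potentially at risk
    confidence_factor: float   # percentage from 0 to 100


def classify(job, min_confidence=80.0):
    """Return a coarse status for a critical job.

    min_confidence is an assumed threshold, not a product default.
    """
    if job.high_risk == 1:
        return "high risk"
    if job.potential_risk == 1:
        return "potential risk"
    if job.confidence_factor < min_confidence:
        return "low confidence"
    return "on track"


def environment_issues(database_connection, queue_usage_pct, queue_warn_pct=50.0):
    """List environment problems based on two widget values.

    database_connection: 0 = not connected, 1 = connected.
    queue_warn_pct is an assumed threshold; the source only states that
    a value of 0 or close to 0 indicates a healthy environment.
    """
    issues = []
    if database_connection == 0:
        issues.append("database not connected")
    if queue_usage_pct >= queue_warn_pct:
        issues.append(f"message queue usage at {queue_usage_pct:.0f}%")
    return issues


if __name__ == "__main__":
    job = CriticalJob("PAYROLL", 2, 0, 1, 64.0)
    print(job.name, classify(job))
    print(environment_issues(database_connection=1, queue_usage_pct=3.0))
```

The high risk indicator is checked before the potential risk indicator so that the more severe condition takes precedence when both are set.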