Leveraging Instana for advanced observability

You can integrate HCL Workload Automation with Instana, a platform that provides deep observability into microservices and containerized applications, giving you a complete view of your system's health and performance.

Its core capabilities are fully automated and include:

  • Application Performance Monitoring (APM) to track service health.
  • Root cause analysis to quickly identify the source of failures.
  • Anomaly detection to proactively find and address unusual behavior.
Integrating HCL Workload Automation with Instana enhances your monitoring capabilities by helping you track and manage business process objectives. Through detailed analytics and historical trends, you can easily identify patterns, troubleshoot issues, and optimize performance over time. Key benefits of this integration include the ability to:
  • Gain end-to-end visibility into job processing across all systems and applications.
  • Receive real-time alerts for job delays, failures, or performance degradations.
  • Automatically correlate workload jobs with underlying infrastructure and application metrics.
  • Accelerate root cause analysis with contextual data and intelligent alerts, reducing investigation time.
  • Create custom dashboards and define KPIs to track what matters most to your business.

Integrating Instana

The integration of HCL Workload Automation with Instana includes the installation of the Instana agent and, optionally, the OpenTelemetry configuration:
  • Instana agent installation
    This step is required to start integrating Instana with HCL Workload Automation. After you install the Instana agent, you can start monitoring metrics.
    You can directly deploy the Instana agent using standard methods like Docker, Kubernetes, or other system packages. The configuration process is streamlined, requiring only minimal details such as your API key and environment information. After the deployment, the agent automatically discovers all services, containers, and infrastructure components, enabling immediate tracing and metrics collection.
    To ensure your HCL Workload Automation environment is detected, navigate to the Infrastructure page on the Instana user interface and verify that the virtual machine containing the master domain manager is listed. If the virtual machine is not listed, you must install an agent on that machine.
    For the full installation procedure of the Instana agent, see Installing host agents.
    To download the Instana dashboard, see Instana Dashboard.
  • OpenTelemetry configuration
    You can leverage OpenTelemetry to analyze the traces. The configuration requires three steps:
    1. Enable the OpenTelemetry SDK
      Follow the steps described in Enabling observability with OpenTelemetry to enable the OpenTelemetry SDK.
    2. Install the OpenTelemetry Collector
      Install the OpenTelemetry Collector.

      OpenTelemetry is available by default on each master domain manager and Dynamic Workload Console installed with a fresh installation. You can also enable it after upgrading to the current version.

      After the installation or upgrade has completed, perform the following steps to enable OpenTelemetry:
      1. Install and configure a tracing tool of your choice, for example Jaeger, Prometheus, or Splunk.
      2. Stop WebSphere Application Server Liberty, as described in Application server - starting and stopping.
      3. Browse to the following paths:

        master domain manager
          On UNIX operating systems
            TWA_home/usr/servers/engineServer
          On Windows operating systems
            TWA_home\usr\servers\engineServer

        Dynamic Workload Console
          On UNIX operating systems
            DWC_home/usr/servers/dwcServer
          On Windows operating systems
            DWC_home\usr\servers\dwcServer
      4. Edit the following properties in the server.env configuration file, based on the specifics of your environment:
         
        OTEL_EXPORTER_OTLP_ENDPOINT=http://{OPENTELEMETRY_HOSTNAME}:{OPENTELEMETRY_PORT}
        OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://{OPENTELEMETRY_HOSTNAME}:{OPENTELEMETRY_PORT}
        OTEL_SDK_DISABLED=false
        OTEL_TRACES_EXPORTER
        OTEL_EXPORTER_OTLP_PROTOCOL
        where
        OTEL_EXPORTER_OTLP_ENDPOINT
          A base endpoint URL for any signal type, with an optionally specified port number.
        OTEL_EXPORTER_OTLP_TRACES_ENDPOINT
          Endpoint URL for trace data only, with an optionally specified port number.
        OTEL_SDK_DISABLED
          Disables the SDK for all signals when set to true. Set it to false to enable OpenTelemetry.
        OTEL_TRACES_EXPORTER
          Trace exporter to be used.
        OTEL_EXPORTER_OTLP_PROTOCOL
          OTLP transport protocol. Supported values are as follows:
          grpc
            for protobuf-encoded data using gRPC wire format over HTTP/2 connection
          http/protobuf
            for protobuf-encoded data over HTTP connection
          http/json
            for JSON-encoded data over HTTP connection

        For more information about the properties in the server.env file, see OpenTelemetry documentation.

      5. Set the following properties in the jvm.options file as described below:
        
        -Dotel.resource.attributes=service.name=<service_name>
        -Dotel.metrics.exporter=none
        -javaagent:<TWA_DIR>/usr/servers/engineServer/opentelemetry-javaagent.jar
        
      6. Start WebSphere Application Server Liberty, as described in Application server - starting and stopping.

      Results: You have now configured OpenTelemetry to work with HCL Workload Automation. The resulting telemetry data are displayed on the workstation you specified in the server.env file.

      Enabling OpenTelemetry generates a substantial amount of data, which might impact system performance, especially on AIX operating systems.

    3. Configure the OpenTelemetry Collector
      A default configuration is available after the installation of the OpenTelemetry Collector, but you can customize the parameters according to your needs by editing the config.yaml file.
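
Before restarting the application server, it can help to sanity-check the OpenTelemetry properties and confirm that something is listening at the configured endpoint. The following Python sketch is one possible way to do that. It is not part of HCL Workload Automation: the function names (parse_env, validate_otel, host_port, is_reachable) are hypothetical, and the only product details it relies on are the property names and the supported protocol values listed in the procedure above. A successful TCP connection shows only that a listener exists at that address, not that it is an OpenTelemetry Collector.

```python
import socket
from urllib.parse import urlsplit

# Transport protocols listed for OTEL_EXPORTER_OTLP_PROTOCOL in the procedure.
SUPPORTED_PROTOCOLS = {"grpc", "http/protobuf", "http/json"}


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def validate_otel(props):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # The SDK is only active when OTEL_SDK_DISABLED is not set to true.
    if props.get("OTEL_SDK_DISABLED", "false").lower() == "true":
        problems.append("OTEL_SDK_DISABLED is true, so the SDK is disabled")
    protocol = props.get("OTEL_EXPORTER_OTLP_PROTOCOL")
    if protocol is not None and protocol not in SUPPORTED_PROTOCOLS:
        problems.append(f"unsupported OTEL_EXPORTER_OTLP_PROTOCOL: {protocol}")
    return problems


def host_port(endpoint):
    """Extract (host, port) from an endpoint written with or without a scheme."""
    target = endpoint if "://" in endpoint else "//" + endpoint
    parts = urlsplit(target)
    if not parts.hostname or parts.port is None:
        raise ValueError(f"endpoint needs an explicit host and port: {endpoint}")
    return parts.hostname, parts.port


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A typical use would be to read the server.env file into a string, pass it to parse_env, print whatever validate_otel returns, and then call is_reachable with the result of host_port for the OTEL_EXPORTER_OTLP_ENDPOINT value.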

Visualizing HCL Workload Automation health and performance with Instana

The Instana dashboard download file on Automation Hub contains a default dashboard configuration that you can import into Instana. This dashboard configuration offers a functional layout designed to highlight the most relevant HCL Workload Automation performance metrics and environment status indicators, arranged to support your monitoring activities.

Dashboard widgets help you quickly identify issues and areas of concern. With their easily readable graphs, you can instantly monitor your environment and configure alerts for critical business events. The clarity of the data visualizations helps any team member, regardless of their technical background, to gain valuable insights at a glance without needing specific Instana training.

This base configuration can be further extended: you can design custom widgets to display additional data, such as metrics exposed directly by the server, or information gathered via OpenTelemetry, ensuring total control over your monitoring activities. For the complete list of all the available metrics, see Exposing metrics to monitor your workload.

The default HCL Workload Automation dashboard configuration on Instana contains the following widgets:
  • Database connection
    The database connection metric monitors the stability of the connection between the HCL Workload Automation server and the database. If the value displayed is 0, the database is not connected; if the value displayed is 1, the database is connected.
  • Job status overtime
    This widget monitors how many jobs were in the Canceled, Error, Ready, Successful, or Waiting status over a specific period of time.
  • Job status pie chart
    The job status pie chart aggregates the jobs according to their status, which can be Canceled, Error, Ready, Running, Successful, or Waiting.
  • Message queue usage
    The message queue usage widget displays the percentage of message queues in use. If the percentage is 0 or close to 0, the environment is healthy.
  • CPU utilization
    The CPU utilization widget displays the percentage of the CPU that is in use during the selected period of time.
  • HEAP utilization
    The HEAP utilization widget displays the percentage of heap memory that is in use during the selected period of time.
  • Workstation link status
    The Workstation link status widget monitors whether the workstations are linked during a specific period of time.
  • Workstation running status
    The Workstation running status widget monitors whether the workstations are running during a specific period of time.
  • Critical jobs information
    Five widgets are dedicated to the monitoring of critical jobs:
    Incomplete predecessors
      Monitors the number of incomplete predecessors for each critical job.
    High risk
      Indicates whether a critical job is considered to be at high risk. If the value is 0, the job is not at high risk; if the value is 1, the job is at high risk.
    Potential risk
      Indicates whether a critical job is considered to be potentially at risk. If the value is 0, the job is not potentially at risk; if the value is 1, the job is potentially at risk.
    Confidence factor
      Indicates, as a percentage, the confidence that the critical job will meet its deadline.
    Estimated end
      Displays a calculated end time for each job, based on an analysis of all available performance metrics.
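
Several widgets above report 0 or 1 rather than text. If you read these values outside the dashboard, for example in a custom report, a small mapping can make them easier to interpret. The following Python sketch illustrates the idea under stated assumptions: the CriticalJobSample class, its field names, the "no risk flagged" and "unknown" labels, and the sample job name are all hypothetical and are not defined by HCL Workload Automation or Instana. Only the meaning of the 0 and 1 values comes from the widget descriptions.

```python
from dataclasses import dataclass


@dataclass
class CriticalJobSample:
    """Hypothetical container for the widget values of one critical job."""
    name: str
    incomplete_predecessors: int
    high_risk: int            # 0 = not at high risk, 1 = at high risk
    potential_risk: int       # 0 = not potentially at risk, 1 = potentially at risk
    confidence_factor: float  # percentage, 0 to 100


def risk_label(sample):
    """Map the two 0/1 risk indicators to one label; high risk takes priority."""
    if sample.high_risk == 1:
        return "high risk"
    if sample.potential_risk == 1:
        return "potential risk"
    return "no risk flagged"


def database_label(value):
    """Map the database connection value (0 or 1) to a readable state."""
    return {0: "not connected", 1: "connected"}.get(value, "unknown")


if __name__ == "__main__":
    job = CriticalJobSample("PAYROLL_END", 2, 0, 1, 87.5)
    print(job.name, risk_label(job), f"{job.confidence_factor}% confidence")
    print("database:", database_label(1))
```

Giving high risk priority over potential risk is a design choice of this sketch, made so that each job gets a single label; the dashboard itself shows the two indicators in separate widgets.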