Frequently asked questions about metrics and monitoring

A list of questions and answers about metrics and monitoring.

  • Q: What is OpenMetrics in HCL Workload Automation?

    A: It is an HTTP endpoint exposed by HCL Workload Automation that returns system and business metrics in a standard format compatible with monitoring and analytics tools.

  • Q: How are the metrics generated and collected?

    A: Metrics are not read directly from the database upon every request. They are generated internally within the HCL Workload Automation code, stored in an in-memory cache, and returned quickly when the collector makes a request.
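
    The cache-then-serve pattern described in this answer can be sketched as follows. This is only an illustration, not the product's actual implementation: the class MetricsCache, its methods, and the metric names are hypothetical. The only external fact assumed is that an OpenMetrics text response ends with a "# EOF" line.

```python
import threading
import time


class MetricsCache:
    """Hypothetical in-memory cache: a producer refreshes it, scrapes only read it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}
        self._updated_at = None

    def refresh(self, new_values):
        # Called by the internal metric generator, never by a scrape request.
        with self._lock:
            self._values = dict(new_values)
            self._updated_at = time.monotonic()

    def render(self):
        # Called on each scrape: formats the cached values without touching a database.
        with self._lock:
            items = sorted(self._values.items())
        body = "".join(f"{name} {value}\n" for name, value in items)
        return body + "# EOF\n"


cache = MetricsCache()
cache.refresh({"hypothetical_jobs_running": 4, "hypothetical_plan_state": 1})
print(cache.render())
```

    Because render() only copies values that are already in memory, the cost of a scrape stays small regardless of how expensive the metrics were to compute.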

  • Q: Where is the metrics endpoint located?

    A: The endpoint is exposed on the HCL Workload Automation component IP address and port. For the master domain manager, this is https://<MDM_IP>:<PORT>/metrics.

  • Q: Is authentication required to access the metrics?

    A: Yes, access is protected in enterprise environments. An authentication mechanism, for example a token, a client certificate, or HTTP Basic Auth, is typically required and must be configured in the scrape job of your monitoring and analytics tools.
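
    As a minimal sketch of what an authenticated request to the endpoint might look like, the following helper builds (but does not send) an HTTPS request with either a token or HTTP Basic credentials. The function name, host, port, and the use of the Bearer scheme for tokens are assumptions for illustration; check your environment for the scheme that is actually configured.

```python
import base64
import urllib.request


def build_metrics_request(host, port, token=None, username=None, password=None):
    """Build a request for https://<host>:<port>/metrics (hypothetical helper)."""
    request = urllib.request.Request(f"https://{host}:{port}/metrics")
    if token is not None:
        # Assumption: the token is presented with the Bearer scheme.
        request.add_header("Authorization", f"Bearer {token}")
    elif username is not None and password is not None:
        # HTTP Basic Auth: base64 of "username:password".
        credentials = f"{username}:{password}".encode("utf-8")
        encoded = base64.b64encode(credentials).decode("ascii")
        request.add_header("Authorization", f"Basic {encoded}")
    return request


# Placeholder host and port; nothing is sent over the network here.
request = build_metrics_request("mdm.example.com", 8443, token="example-token")
print(request.full_url)
```

    The request could then be sent with urllib.request.urlopen, passing an ssl.SSLContext when a client certificate or a private certificate authority is involved.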

  • Q: How often is the data updated (data freshness)?

    A: Data freshness is determined by two factors:
    • How often the HCL Workload Automation backend collector generates new metrics (usually every few seconds).
    • The scrape interval configured in your monitoring and analytics tools (typically every 5-15 seconds).

  • Q: What is the maximum end-to-end latency?

    A: The maximum perceived latency, that is, the longest time between an event and its appearance in the metrics tool, is determined mainly by the scrape interval of the metrics tool. If your scrape interval is 15 seconds, the maximum latency is about 15 seconds, plus the few seconds the backend collector may need to refresh its cached metrics.
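
    The arithmetic can be written down as a small helper. It is a rough sketch that assumes the backend refresh and the scrape run independently, so a conservative upper bound is the sum of the two intervals. When the refresh interval is negligible, the bound reduces to the scrape interval alone. The function name and parameters are hypothetical, and any delay added by dashboards is ignored.

```python
def worst_case_latency(scrape_interval_s, refresh_interval_s=0.0):
    """Conservative upper bound, in seconds, between an event and its appearance."""
    if scrape_interval_s <= 0 or refresh_interval_s < 0:
        raise ValueError("intervals must be positive (refresh may be zero)")
    # Worst case: the event just misses a cache refresh, and the refreshed
    # value then just misses a scrape.
    return scrape_interval_s + refresh_interval_s


print(worst_case_latency(15))      # scrape interval only
print(worst_case_latency(15, 5))   # scrape interval plus a 5-second refresh
```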

  • Q: Does accessing metrics impact HCL Workload Automation server performance?

    A: The impact is minimal. Because responses are served from the in-memory cache, HCL Workload Automation spends few resources on each metrics request and avoids intensive, time-consuming queries against its database.

  • Q: What are the crucial labels (metadata) for HCL Workload Automation metrics?

    A: Critical labels for data correlation include instance (the master domain manager or agent hostname), job_stream_name, workstation, and status (success, failure, abend).
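
    To show how these labels appear on a sample line in the text exposition format, here is a small parsing sketch. The metric name, hostnames, and label values in SAMPLE_LINE are invented for illustration, and the parser is deliberately simplified: it assumes no escaped quotes in label values and no trailing timestamp.

```python
import re

# Hypothetical sample line; the metric name and values are invented.
SAMPLE_LINE = (
    'hypothetical_jobs_total{instance="mdm01.example.com",'
    'job_stream_name="PAYROLL",workstation="WS_AGENT1",status="abend"} 3'
)

_LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Split one sample line into (metric name, labels dict, float value)."""
    match = _LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a metric sample line: {line!r}")
    labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


name, labels, value = parse_sample(SAMPLE_LINE)
print(name, labels["job_stream_name"], labels["status"], value)
```

    In practice the monitoring tool performs this parsing itself; the sketch only makes the role of each label visible.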

  • Q: What should I check if a metric is missing or incorrect?

    A: Check the following:
    • Check the scrape status: verify in your monitoring and analytics tools that the HCL Workload Automation target is up and running.
    • Review the HCL Workload Automation logs: check them for errors related to metric generation.
    • Verify authentication: confirm that the service account or configuration used by your monitoring and analytics tools has the necessary permissions.
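
    Alongside the checks above, it can help to confirm which metric names are actually present in a raw response from the endpoint. The helper below is a hypothetical sketch that works on response text you have already retrieved; the metric names are invented for illustration.

```python
def missing_metrics(exposition_text, expected_names):
    """Return the expected metric names that do not appear in the response text."""
    present = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment or metadata lines
        # The metric name ends at the first "{" or at the first whitespace.
        parts = line.split("{", 1)[0].split()
        if parts:
            present.add(parts[0])
    return sorted(set(expected_names) - present)


sample = (
    "hypothetical_plan_state 1\n"
    'hypothetical_jobs_total{status="abend"} 3\n'
    "# EOF\n"
)
print(missing_metrics(sample, ["hypothetical_jobs_total", "hypothetical_agent_cpu"]))
```

    An empty result suggests that the metric is being exposed and that the problem lies in the scrape configuration or in the query on the monitoring side.
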
  • Q: Is there a difference between master domain manager and agent metrics?

    A: Yes. The master domain manager exposes control and planning metrics, for example plan state and number of dependencies. The dynamic agents expose execution and health metrics, for example local agent CPU and memory usage and local job run count.