Planning for recovery and restart

This chapter describes how to plan for a job that fails. You can have the job restarted automatically, or you can examine the job log first and then restart the job from the Modify Current Plan (MCP) dialog.

The job log is the system data written to the SYSOUT class defined by the job message class. It is used to build information needed for restart and cleanup.

HCL Workload Automation for Z has a number of tools to help you restart jobs:

Job completion checker (JCC)

This function reads job output and can set an error code. This is useful when return codes or abend codes alone do not tell you whether, or how, a job must be restarted, and you need to check the output for specific messages.

Tracker platforms supported: z/OS®

Job log retrieval

This function can fetch the job log for a job (even one that has not failed) so that you can browse it.

Tracker platforms supported: All

Restart and cleanup

This function checks whether a job is restartable and, if required, tailors the JCL to run again from one named step to another, or to the end of the job. It can also back out changes to catalogs and delete data sets that were created by the failed job. This matters for JCL such as the following DD statement, which fails on a rerun: the first run catalogs the new data set even if the step ends abnormally, so the data set already exists when the job runs again.

//OUTDS   DD   DSN=NEW.DATA.SET,DISP=(NEW,CATLG,CATLG),…

Tracker platforms supported: z/OS®
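For comparison, the cleanup that this function can perform is often coded by hand as an extra step that deletes the leftover data set before the job creates it again. The following is a minimal sketch of that manual approach using the standard IEFBR14 utility, not the product's own cleanup mechanism. The step name CLEANUP, the DD name OLDDS, and the UNIT and SPACE values are assumptions for illustration; only the data set name comes from the example above.

```jcl
//* Hypothetical manual cleanup step (names and UNIT/SPACE are assumed).
//* DISP=(MOD,DELETE,DELETE) deletes NEW.DATA.SET if it exists; if it
//* does not exist, MOD allocates it as new and then deletes it, so the
//* step completes either way.
//CLEANUP  EXEC PGM=IEFBR14
//OLDDS    DD   DSN=NEW.DATA.SET,
//            DISP=(MOD,DELETE,DELETE),
//            UNIT=SYSDA,SPACE=(TRK,(1,1))
```

A step like this must be maintained in every job that creates data sets, which is the overhead that the restart and cleanup function is intended to remove.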

Automatic recovery

The //*%OPC RECOVER statement in the job's JCL controls automatic recovery. Parameters on this statement specify the recovery actions, for example whether HCL Workload Automation for Z should start other occurrences or delete steps. If data set cleanup is required, it is invoked before the rerun when immediate cleanup is specified. Otherwise, automatic recovery is not done.

Tracker platforms supported: All (restart and cleanup functions are available only on z/OS® systems)
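As an illustration of the automatic recovery statement described above, the following sketch shows two RECOVER statements. The step names STEP1 and STEP2 and the abend codes are hypothetical, and the parameter names (JOBCODE, ERRSTEP, RESSTEP, RESTART) should be checked against the RECOVER statement reference for your release before use.

```jcl
//* Assumed example: if STEP2 ends with abend SB37, restart from STEP1.
//*%OPC RECOVER JOBCODE=SB37,ERRSTEP=STEP2,RESSTEP=STEP1
//* Assumed example: for abend S0C4, do not restart the job.
//*%OPC RECOVER JOBCODE=S0C4,RESTART=N
```

Because the statements begin with //*, the operating system treats them as JCL comments; they are interpreted only by the scheduler.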

RECOVERY statement

This statement in the SCRPTLIB defines the options for running HCL Workload Automation recovery for jobs on distributed agents. The recovery actions can be followed by one of the recovery options, set with the OPTION parameter: stop, continue, or rerun. The RECOVERY statement is ignored if it is used with a job that runs a centralized script.