Planning for recovery and restart
This chapter describes how to plan for a job that fails. You can restart it automatically, or examine the job log first and then restart the job from the MCP dialog.
The job log is the system data written to the SYSOUT class defined by the job message class. It is used to build information needed for restart and cleanup.
HCL Workload Automation for Z has a number of tools to help you restart jobs:
- Job completion checker (JCC)
- This function reads job output and can set an error code. This
is useful if you cannot tell from return or abend codes alone whether
or how a job must be restarted: you sometimes need to check for specific
messages.
Tracker platforms supported: z/OS®
- Job log retrieval
- This function can fetch the job log for a job (even
one that has not failed) so that you can browse it.
Tracker platforms supported: All
- Restart and cleanup
- This function checks whether a job is restartable, and tailors
the JCL, if required, to run again from one named step to another,
or to the end of the job. Also, this function can back out changes
to catalogs and can delete data sets that are created by the failed
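job. The following example shows a DD statement that causes a job to fail on reruns unless the data set is deleted first, because the first run already created and cataloged the data set: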
//OUTDS DD DSN=NEW.DATA.SET,DISP=(NEW,CATLG,CATLG),…
Tracker platforms supported: z/OS®
- Automatic recovery
- The //*%OPC RECOVER job statement controls automatic recovery. Parameters on this statement specify whether HCL Workload Automation for Z should start other occurrences, delete steps, and so on. If data set cleanup is required, it is invoked before the rerun when immediate cleanup is specified; otherwise, automatic recovery is not done.
Tracker platforms supported: All (restart and cleanup functions are available only on z/OS® systems)
- RECOVERY statement
- This statement in the SCRPTLIB defines the options to run the HCL Workload Automation recovery for jobs on distributed agents. The recovery actions can be followed by one of the recovery options (the OPTION parameter): stop, continue, or rerun. The RECOVERY statement is ignored if it is used with a job that runs a centralized script.
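
The idea behind the job completion checker described above (scan the job output for specific messages and set an error code when return or abend codes alone are not enough) can be illustrated with a small sketch. This is not the JCC implementation or its configuration syntax. The message IDs, the error codes, and the names RULES and check_job_output are all hypothetical and chosen only for illustration.

```python
import re
from typing import Iterable, Optional

# Hypothetical rules: (pattern to look for in the job output, error code to set).
# Neither the message IDs nor the error codes come from the product.
RULES = [
    (re.compile(r"\bAPPL0042E\b"), "DBER"),
    (re.compile(r"RECORDS REJECTED:\s*[1-9]\d*"), "REJ1"),
]


def check_job_output(lines: Iterable[str]) -> Optional[str]:
    """Return the error code for the first line that matches a rule, or None."""
    for line in lines:
        for pattern, code in RULES:
            if pattern.search(line):
                return code
    return None


if __name__ == "__main__":
    # A job that ended with return code 0 but still reported a problem.
    job_output = [
        "STEP1 ENDED RC=0000",
        "APPL0042E DATABASE UNAVAILABLE",
    ]
    print(check_job_output(job_output))
```

In this sketch a job whose steps all end with return code 0 is still flagged, because one of its output lines matches a rule. That is the situation in which message checking is useful for deciding whether or how to restart a job.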