Procedure to resolve event-reapplying problems at controller startup
When the controller starts and a current plan already
exists, you must ensure that no JT events were lost while the controller
was stopped. JTOPTS CURRPLAN is the statement used by the controller
to determine from which current plan to start. You can set it to one
of the following values:
- Set CURRPLAN(NEW) to start from the latest valid new current plan
(NCP) and apply all the events logged in EQQJTARC and in the current
EQQJTxx.
- OR -
- Set CURRPLAN(CURRENT) to start from the inactive current plan and apply the events logged in EQQJTxx with a date and time later than the inactive current plan backup.
Note: For detailed information about the JTOPTS statement, see Customization
and Tuning.
However, if you set the CURRPLAN(CURRENT) option and the controller finds some inconsistencies, the option is not applied and you are requested to set CURRPLAN(NEW). The restarting process then runs based on the checkpoint data set EQQCKPT, which contains information about the current status of HCL Workload Automation for Z.
If
problems occur during the restart, you can understand what went wrong
by analyzing the initial values stored in the current plan VSAM headers,
checkpoint data set, and job-tracking files. This data, however, might
have changed or been lost during the restarting process and be difficult
to reproduce. To prevent this problem, HCL Workload Automation for Z logs
the following messages and provides you with the following modify
command:
- Message EQQN140I, to show the checkpoint data set content at controller startup and stop.
- Message EQQN141I, to show the checkpoint data set content when the restart processing begins.
- Modify command
/F subsys,DCKPT
to show the checkpoint data set header.
The following scenario describes how these messages and command help you to understand what happened when the controller restarts.
Understanding the restarting process
The
following scenario describes how you can use EQQMLOG and modify command
/F subsys,DCKPT
to
understand what happens when the controller restarts. Suppose that
you:- Start the controller after a REFRESH function. EQQMLOG stores
message EQQN140I to show the checkpoint data set header at startup.
The value for PLAN EXISTING is N, meaning that a current plan does
not exist (because the restart occurred after a REFRESH). Note: In this scenario JTOPTS is set to CURRPLAN(CURRENT).
... 03/20 17.16.22 EQQZ015I INIT STATEMENT: CURRPLAN(CURRENT) .... 03/20 17.16.22 EQQN140I CKPT FILE AT START TIME 03/20 17.16.22 EQQN140I --------------------------------------------- 03/20 17.16.22 EQQN140I CKPT FILE JUST INITIALIZED: N 03/20 17.16.22 EQQN140I BACKUP IN PROGRESS : N 03/20 17.16.22 EQQN140I INACTIVE PLAN DATA SET : EQQCP2DS 03/20 17.16.22 EQQN140I CURRENT JT NUMBER : 05 03/20 17.16.22 EQQN140I LAST ARCHIVED JT NUMBER : 04 03/20 17.16.22 EQQN140I PLAN EXISTING : N 03/20 17.16.22 EQQN140I ACTIONS ON NCP : NO ACTIONS NEEDED 03/20 17.16.22 EQQN140I ---------------------------------------------
Because a current plan does not exist, the following message EQQN141I does not provide information that is useful for the controller restarting decision.03/20 17.16.24 EQQN141I CKPT DATA AFTER RESTART DECISION 03/20 17.16.24 EQQN141I ----------------------------------- 03/20 17.16.24 EQQN141I INACTIVE PLAN DATA SET : EQQCP2DS 03/20 17.16.24 EQQN141I OCP LAST BACKUP TIMESTAMP : 03/20 17.16.24 EQQN141I USED LAST BACKUP TIMESTAMP : 03/20 17.16.24 EQQN141I EVENTS FOUND : N 03/20 17.16.24 EQQN141I JTARC MAY BE TRUNCATED : N 03/20 17.16.24 EQQN141I ----------------------------------
Message EQQN105W provides you with even more detailed information:03/20 17.16.24 EQQN105W NO VALID CURRENT PLAN EXISTS. CURRENT PLAN VSAM I/O IS NOT POSSIBLE
Message EQQN012I informs you that, as soon as you create a plan, all job-tracking events will be stored in file EQQJT05 (this is the JT number 05 shown in the previous message EQQN140I):
To look at the entire content of the checkpoint data set, enter the following modify command:03/20 17.16.24 EQQN012I JOB TRACKING EVENTS ARE NOW BEING LOGGED ON FILE EQQJT05
/F subsys,DCKPT EQQZ024I LAST CKPT HEADER GET/PUT EQQZ024I START ADDRESS: 0007FF68 TIME: 130320 1716488 EQQZ024I 000000 F0F0F0F0 F0F0F0F0 F0F00000 00000000 *0000000000...* EQQZ024I 000010 00004040 40404040 40404040 C5D8D8C3 *.. E* EQQZ024I 000020 D7F1C4E2 C5D8D8C3 D7F2C4E2 F0F5F0F4 *P1DSEQQCP2DS0* EQQZ024I 000030 F0F5D500 00000000 00000000 D5D5D5C5 *05N.........N* EQQZ024I 000040 D8D8D1E2 F1C4E2D5 F1F0C8E6 E2F9F1F0 *QQJS1DSN10HWS* EQQZ024I 000050 F040D500 00000000 404040D6 00640000 *0 N..... O.* EQQZ024I 000060 00000000 00000000 *........ * EQQZ048I A MODIFY COMMAND HAS BEEN PROCESSED. MODIFY CWSG,DCKPT
Note: The layout is mapped by the OPC - OPCSTAT record. - Create a new current plan by running a DP batch. EQQMLOG records
that a new current plan (NCP) is created, named EQQCP2DS. The inactive
plan is EQQCP1DS, the current job-tracking file is EQQJT01, and EQQJT05
was archived in EQQJTARC.
03/20 17.29.27 EQQN121I A DAILY PLANNING ACTIVITY WAS STARTED 03/20 17.29.57 EQQN120I DP BATCH NCP CREATION PHASE ENDED 03/20 17.29.57 EQQN116I A NEW NCP HAS BEEN CREATED ... 03/20 17.29.58 EQQN056I A CURRENT PLAN COPY PROCESS HAS STARTED 03/20 17.29.58 EQQN057I A CURRENT PLAN DATA SET WAS SUCCESSFULLY COPIED: FROMDD=EQQNCPDS, TODD=EQQCP1DS ... 03/20 17.29.59 EQQN051I A CURRENT PLAN BACKUP PROCESS HAS STARTED. TRIGGER WAS: DP END ... 03/20 17.30.00 EQQN012I JOB TRACKING EVENTS ARE NOW BEING LOGGED ON FILE EQQJT01 ... 03/20 17.30.04 EQQN056I A CURRENT PLAN COPY PROCESS HAS STARTED 03/20 17.30.05 EQQN057I A CURRENT PLAN DATA SET WAS SUCCESSFULLY COPIED: FROMDD=EQQCP1DS, TODD=EQQCP2DS 03/20 17.30.05 EQQN090I THE JOB TRACKING LOG DATA SET DEFINED BY DDNAME EQQJT05 HAS BEEN COPIED TO THE JOB TRACKING LOG ARCHIVE DATA SET
- Stop the controller by using the
/P subsys
command. EQQMLOG shows the following messages:03/20 18.09.08 EQQZ000I A STOP COMMAND HAS BEEN RECEIVED ... 03/20 18.09.14 EQQN051I A CURRENT PLAN BACKUP PROCESS HAS STARTED. TRIGGER WAS: NMM END
Before closing, the controller creates a current plan backup. EQQMLOG shows you the checkpoint data set information recorded before the CP backup, as follows:03/20 18.09.14 EQQN140I CKPT FILE AT NMM ENDING: BACKUP 03/20 18.09.14 EQQN140I --------------------------------------------- 03/20 18.09.14 EQQN140I CKPT FILE JUST INITIALIZED: N 03/20 18.09.14 EQQN140I BACKUP IN PROGRESS : N 03/20 18.09.14 EQQN140I INACTIVE PLAN DATA SET : EQQCP1DS 03/20 18.09.14 EQQN140I CURRENT JT NUMBER : 01 03/20 18.09.14 EQQN140I LAST ARCHIVED JT NUMBER : 05 03/20 18.09.14 EQQN140I PLAN EXISTING : Y 03/20 18.09.14 EQQN140I ACTIONS ON NCP : NO ACTIONS NEEDED 03/20 18.09.14 EQQN140I --------------------------------------------- 03/20 18.09.19 EQQN056I A CURRENT PLAN COPY PROCESS HAS STARTED 03/20 18.09.19 EQQN057I A CURRENT PLAN DATA SET WAS SUCCESSFULLY COPIED: FROMDD=EQQCP2DS, TODD=EQQCP1DS 03/20 18.09.19 EQQN090I THE JOB TRACKING LOG DATA SET DEFINED BY DDNAME EQQJT01 HAS BEEN COPIED TO THE JOB TRACKING LOG ARCHIVE DATA SET
After the CP backup completes, the checkpoint data set information is updated. EQQMLOG shows that the JT data set was archived, the inactive plan is EQQCP2DS, the current JT data set has become EQQJT02, and the latest archived JT data set is EQQJT01. For more detailed information about the archiving process, see Job-tracking-log archiver (JL).03/20 18.09.19 EQQN090I COPIED TO THE JOB TRACKING LOG ARCHIVE DATA SET 03/20 18.09.19 EQQN140I CKPT FILE AT NMM ENDING: LOG ARCHIVE 03/20 18.09.19 EQQN140I --------------------------------------------- 03/20 18.09.19 EQQN140I CKPT FILE JUST INITIALIZED: N 03/20 18.09.19 EQQN140I BACKUP IN PROGRESS : N 03/20 18.09.19 EQQN140I INACTIVE PLAN DATA SET : EQQCP2DS 03/20 18.09.19 EQQN140I CURRENT JT NUMBER : 02 03/20 18.09.19 EQQN140I LAST ARCHIVED JT NUMBER : 01 03/20 18.09.19 EQQN140I PLAN EXISTING : Y 03/20 18.09.19 EQQN140I ACTIONS ON NCP : NO ACTIONS NEEDED 03/20 18.09.19 EQQN140I ---------------------------------------------
- Restart the controller. Message EQQN140I shows exactly the same information that was recorded in
the latest run.
03/20 19.53.55 EQQZ015I INIT STATEMENT: CURRPLAN(CURRENT) ... 03/20 19.53.55 EQQN140I CKPT FILE AT START TIME 03/20 19.53.55 EQQN140I --------------------------------------------- 03/20 19.53.55 EQQN140I CKPT FILE JUST INITIALIZED: N 03/20 19.53.55 EQQN140I BACKUP IN PROGRESS : N 03/20 19.53.55 EQQN140I INACTIVE PLAN DATA SET : EQQCP2DS 03/20 19.53.55 EQQN140I CURRENT JT NUMBER : 02 03/20 19.53.55 EQQN140I LAST ARCHIVED JT NUMBER : 01 03/20 19.53.55 EQQN140I PLAN EXISTING : Y 03/20 19.53.55 EQQN140I ACTIONS ON NCP : NO ACTIONS NEEDED 03/20 19.53.55 EQQN140I --------------------------------------------- ... 03/20 19.53.56 EQQN141I CKPT DATA AFTER RESTART DECISION 03/20 19.53.56 EQQN141I --------------------------------------------- 03/20 19.53.56 EQQN141I INACTIVE PLAN DATA SET : EQQCP2DS 03/20 19.53.56 EQQN141I OCP LAST BACKUP TIMESTAMP : 130320/17091869 03/20 19.53.56 EQQN141I USED LAST BACKUP TIMESTAMP : 130320/17091869 03/20 19.53.56 EQQN141I EVENTS FOUND : N 03/20 19.53.56 EQQN141I JTARC MAY BE TRUNCATED : N 03/20 19.53.56 EQQN141I --------------------------------------------- ...
Suppose that the controller runs one day and performs all the jobs scheduled in the CP and then you run a CP backup. EQQMLOG shows that the inactive plan is now EQQCP1DS and the latest archived JT data set is EQQJT02.
03/21 15.48.50 EQQN051I A CURRENT PLAN BACKUP PROCESS HAS STARTED. TRIGGER WAS: BACKUP CMD 03/21 15.48.50 EQQN012I JOB TRACKING EVENTS ARE NOW BEING LOGGED ON FILE EQQJT03 03/21 15.48.53 EQQN056I A CURRENT PLAN COPY PROCESS HAS STARTED 03/21 15.48.53 EQQN057I A CURRENT PLAN DATA SET WAS SUCCESSFULLY COPIED: FROMDD=EQQCP1DS, TODD=EQQCP2DS 03/21 15.48.54 EQQN090I THE JOB TRACKING LOG DATA SET DEFINED BY DDNAME EQQJT02 HAS BEEN COPIED TO THE JOB TRACKING LOG ARCHIVE DATA SET
You can verify this information also by issuing the modify command/F subsys,DCKPT
:EQQZ024I LAST CKPT HEADER GET/PUT EQQZ024I START ADDRESS: 0007FF68 TIME: 130321 1552432 EQQZ024I 000000 F0F0F0F0 F0F0F0F0 F0F00000 00000000 *0000000000... EQQZ024I 000010 0000F4F1 F0F3F2F0 F2F4F0F0 C5D8D8C3 *..4103202400E EQQZ024I 000020 D7F2C4E2 C5D8D8C3 D7F1C4E2 F0F3F0F2 *P2DSEQQCP1DS0 EQQZ024I 000030 F0F5D500 00000000 00000000 E8D5D5C5 *05N.........Y EQQZ024I 000040 D8D8D1E2 F1C4E2D5 F1F0C8E6 E2F9F1F0 *QQJS1DSN10HWS EQQZ024I 000050 F040D500 00000000 404040D6 00640000 *0 N..... O. EQQZ024I 000060 00000000 00000000 *........ EQQZ048I A MODIFY COMMAND HAS BEEN PROCESSED. MODIFY CWSG,DCKPT
- As you continue to work, you add new occurrences to the plan. You then force the controller to immediately stop with a CANCEL function.
- When you restart the controller, EQQMLOG shows you that the inactive plan is EQQCP1DS and the
latest archived JT data set is EQQJT02:
The controller uses the information shown in message EQQN141I to decide how to restore a correct current plan. The inactive plan EQQCP1DS is used as the start plan and the events stored in EQQJT03 after the backup date and time 130321/13485048 are applied. The following messages are recorded to show that the inactive plan EQQCP1DS is copied to EQQCP2DS:03/21 15.55.46 EQQN140I CKPT FILE AT START TIME 03/21 15.55.46 EQQN140I --------------------------------------------- 03/21 15.55.46 EQQN140I CKPT FILE JUST INITIALIZED: N 03/21 15.55.46 EQQN140I BACKUP IN PROGRESS : N 03/21 15.55.46 EQQN140I INACTIVE PLAN DATA SET : EQQCP1DS 03/21 15.55.46 EQQN140I CURRENT JT NUMBER : 03 03/21 15.55.46 EQQN140I LAST ARCHIVED JT NUMBER : 02 03/21 15.55.46 EQQN140I PLAN EXISTING : Y 03/21 15.55.46 EQQN140I ACTIONS ON NCP : NO ACTIONS NEEDED 03/21 15.55.46 EQQN140I ---------------------------------------------
03/21 15.55.49 EQQN056I A CURRENT PLAN COPY PROCESS HAS STARTED 03/21 15.55.49 EQQN057I A CURRENT PLAN DATA SET WAS SUCCESSFULLY COPIED: FROMDD=EQQCP1DS, TODD=EQQCP2DS
The following messages show that EQQJT03 is used to update the plan before archiving it, EQQJT04 becomes the current JT data set, and EQQCP1DS becomes the current plan after the CP backup completes. You are also shown that the catch-up process identified and applied all the events required to restore a correct plan:03/21 15.55.50 EQQN058I THE SCHEDULER WILL RECOVER THE CP DATA SET FROM THE JTLOGS 03/21 15.55.50 EQQN058I PLEASE BE PATIENT. THIS WILL TAKE A WHILE 03/21 15.55.50 EQQN014I THE JOB TRACKING LOG DATA SET, DDNAME EQQJT03 , HAS BEEN USED TO UPDATE THE CURRENT PLAN. 2 EVENTS HAVE BEEN APPLIED TO DD-NAME EQQCP2DS 03/21 15.55.50 EQQN051I A CURRENT PLAN BACKUP PROCESS HAS STARTED. TRIGGER WAS: CATCH UP 03/21 15.55.53 EQQN056I A CURRENT PLAN COPY PROCESS HAS STARTED 03/21 15.55.54 EQQN057I A CURRENT PLAN DATA SET WAS SUCCESSFULLY COPIED: FROMDD=EQQCP2DS, TODD=EQQCP1DS 03/21 15.55.54 EQQN012I JOB TRACKING EVENTS ARE NOW BEING LOGGED ON FILE EQQJT04 03/21 15.55.54 EQQN013I JOB TRACKING IS NOW ACTIVE AND CURRENT PLAN DD-NAME IS EQQCP1DS
Description of messages EQQN140 and EQQN141
Messages
EQQN140 and EQQN141 are informational. EQQN140 is issued at controller
startup and stop, and shows the following information:
CKPT FILE AT: &phase
---------------------------------------------
CKPT FILE JUST INITIALIZED: &NEWCKPT
BACKUP IN PROGRESS : &BACKUP
INACTIVE PLAN DATA SET : &OLDCP
CURRENT JT NUMBER : &CURRJT
LAST ARCHIVED JT NUMBER : &LASTJT
PLAN EXISTING : &VALCP
ACTIONS ON NCP : &NCPSTAT
---------------------------------------------
Where:- CKPT FILE AT
- It can be one of the following values:
- START TIME
- Controller has just started.
- NMM ENDING: BACKUP
- Controller is stopping and the current plan is being backed up.
- NMM ENDING: LOG ARCHIVE
- Controller is stopping, the current plan is backed up, and the latest JT data set is archived.
- CKPT FILE JUST INITIALIZED
- It can be one Y or N. Y means that the checkpoint data set has just been initialized with the default values. This might occur the first time that the controller starts, when the checkpoint data set is empty.
- BACKUP IN PROGRESS
- It can be Y or N. Y means that when the controller was stopped, a backup plan was in progress, therefore the controller could not complete switching the CP and JT data set.
- INACTIVE PLAN DATA SET
- The DD name of the inactive plan (the old current plan).
- CURRENT JT NUMBER
- The name of the current JT data set.
- LAST ARCHIVED JT NUMBER
- The name of the latest archived JT data set.
- PLAN EXISTING
- It can be Y or N. Y means that a current plan exists.
- ACTION ON NCP
- The status related to the creation of CP. Possible values are:
- NCP TO BE USED AS CURRENT PLAN
- The new current plan is to be used by the controller as the current plan.
- NCP UNDER CREATION, DP BATCH IS WORKING ON IT
- A DP batch command is running to create a new plan.
- NO ACTIONS NEEDED
- No action to be done on the NCP.
EQQN141 is issued after the restarting decision was taken and shows the following information:
CKPT DATA &phase
---------------------------------------------
INACTIVE PLAN DATA SET : &OLDCP
OCP LAST BACKUP TIMESTAMP : &DTCP
USED LAST BACKUP TIMESTAMP : &DTUS
EVENTS FOUND : &EVENT
JTARC MAY BE TRUNCATED : &JTARCT
---------------------------------------------
Where:- CKPT DATA
- It always has the value AFTER RESTART DECISION.
- INACTIVE PLAN DATA SET
- The DD name of the inactive plan (the old current plan).
- OCP LAST BACKUP TIMESTAMP
- Date and time of the latest backup in the inactive CP data set.
- USED LAST BACKUP TIMESTAMP
- Date and time that the latest backup was actually used. Normally it is the same as OCP LAST BACKUP TIMESTAMP. However, if a backup was in progress at stop time and the backup is not reliable, the controller considers the date and time of the first event in the current JT data set as the latest backup. This date is used to decide which events are to be applied.
- EVENTS FOUND
- Events to be applied are found in the JT data set.
- JTARC MAY BE TRUNCATED
- It can be Y or N. Y means that the latest archived JT data set was not complete and that data might have been lost.