MCIT Technical Newsletter #6

Batch Job Fault Tolerance

        The HealthQuest PM/PA/MR system was designed to abend whenever an
    out-of-sync condition is encountered.  The normal fix for this
    condition is to remove the offending record(s) from the appropriate
    input file(s) and rerun or restart the job.  On occasion, programs have
    been modified to skip records for specific patients when removing
    records is not an option, for example when a job extracts directly
    from the VSAM file(s).  These solutions are acceptable since there are relatively
    few instances of these anomalies.  A better approach to the problem is
    to make the batch programs fault tolerant.  Fault tolerance can be
    implemented easily by modifying the batch cycle programs to skip errant
    records, within a predefined limit, and write a fault-bypassed record to
    a permanent disk file.  The predefined limit should be a low value, 5 for
    example; when it is reached, the job is forced to abend and the on-call
    programmer is notified.  The only modifications required to make a batch
    program fault tolerant are the addition of the fault record copybook
    (YWFLTRC), a counter and a limit variable, and a call to the PM module
    YWFLTBC.
    It is the batch program’s responsibility to build the communication
    block for YWFLTBC and keep a count of faults to force an abend when its
    limit has been reached.  Note that no JCL changes are required.  The
    Fault Log File, 'NPPM3.PROD.M.FAULTLOG(0)' for PM, is dynamically allocated
    by YWFLTBC.  The actual file allocated depends on the name of the job
    encountering the fault.  Jobs named PPM*, PPA* and PMR* will allocate
    NPPM3..., NPPA3... and NPMR3... data sets respectively.  VP* jobs will
    allocate the appropriate NT*.QVER. data set and TP* jobs will allocate a
    TEST data set.
    For testing using your individual hospital TSO ids, the data set
    'NTAPPL.TEST.M.FAULTLOG' will be allocated.

        After the batch cycle is complete, a separate job will have to
    interrogate this file to determine whether any fault records were
    written during the cycle.  If any records are found for the current
    date, that job should abend and the on-call programmer should be
    notified.  The job should be scheduled for execution after 7am and be
    triggered by the last regular cycle job.  Attached are a sample of the
    implementation of fault tolerance in a PM production program, a sample
    of the fault records, and a sample COBOL program that shows how to call
    YWFLTBC.
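
        The post-cycle check described above is not one of the attached
    samples.  As a rough illustration only, it might be sketched along the
    following lines.  The program name YWFLTCK, the DD name FAULTLOG, the
    240-byte record length and the use of return code 16 in place of an
    abend call are all assumptions made for this sketch.  The only thing
    taken from the sample fault records is that each record appears to
    begin with a YYYYMMDDHHMMSS time stamp.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. YWFLTCK.
      * SKETCH ONLY.  YWFLTCK, THE FAULTLOG DD NAME AND THE 240-BYTE
      * RECORD LENGTH ARE ASSUMPTIONS, NOT PART OF THE PM SYSTEM.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT FAULT-LOG ASSIGN TO FAULTLOG
               FILE STATUS IS FAULT-LOG-STATUS.
       DATA DIVISION.
       FILE SECTION.
       FD  FAULT-LOG
           RECORDING MODE IS F.
       01  FAULT-LOG-RECORD.
      * SAMPLE RECORDS APPEAR TO START WITH YYYYMMDDHHMMSS.
           02  FLR-FAULT-DATE          PIC X(08).
           02  FLR-FAULT-TIME          PIC X(06).
      * REST OF THE RECORD IS NOT EXAMINED; ITS LENGTH IS ASSUMED.
           02  FILLER                  PIC X(226).
       WORKING-STORAGE SECTION.
       01  FAULT-LOG-STATUS            PIC X(02) VALUE SPACES.
           88  FAULT-LOG-OK            VALUE '00'.
           88  FAULT-LOG-EOF           VALUE '10'.
       01  TODAY-YYYYMMDD              PIC X(08).
       01  TODAYS-FAULTS               PIC 9(04) VALUE 0.
       PROCEDURE DIVISION.
       0000-MAIN.
           MOVE FUNCTION CURRENT-DATE (1:8) TO TODAY-YYYYMMDD.
           OPEN INPUT FAULT-LOG.
           IF NOT FAULT-LOG-OK
               DISPLAY 'FAULT LOG OPEN FAILED, STATUS '
                       FAULT-LOG-STATUS
               MOVE 16 TO RETURN-CODE
               GOBACK
           END-IF.
      * COUNT THE RECORDS WHOSE DATE STAMP MATCHES TODAY.
           PERFORM UNTIL NOT FAULT-LOG-OK
               READ FAULT-LOG
                   AT END
                       CONTINUE
                   NOT AT END
                       IF FLR-FAULT-DATE = TODAY-YYYYMMDD
                           ADD 1 TO TODAYS-FAULTS
                       END-IF
               END-READ
           END-PERFORM.
           IF NOT FAULT-LOG-EOF
               DISPLAY 'FAULT LOG READ FAILED, STATUS '
                       FAULT-LOG-STATUS
               MOVE 16 TO RETURN-CODE
           ELSE
               IF TODAYS-FAULTS > 0
                   DISPLAY 'FAULTS LOGGED TODAY: ' TODAYS-FAULTS
      * STAND-IN FOR THE ABEND; A REAL JOB WOULD ABEND HERE.
                   MOVE 16 TO RETURN-CODE
               ELSE
                   MOVE 0 TO RETURN-CODE
               END-IF
           END-IF.
           CLOSE FAULT-LOG.
           GOBACK.
```

        In practice the job would force an abend through the
    installation's own abend routine rather than set a return code, and
    the FAULTLOG DD would point at the fault log data set for the cycle
    being checked.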

Sample implementation in a PM production program

    WORKING-STORAGE SECTION.
    01  FAULT-TOLERANCE-FIELDS-WS.
        02  FAULT-COUNT             PIC 9(04) VALUE 0.
        02  FAULT-LIMIT             PIC 9(04) VALUE 5.
        02  FAULT-PROGRAM           PIC X(08) VALUE 'YA050BC'.
        02  YWFLTBC                 PIC X(08) VALUE 'YWFLTBC'.
        COPY YWFLTRC.
    PROCEDURE DIVISION.
        ADD 1 TO FAULT-COUNT.
        MOVE FAULT-PROGRAM   TO FCB-PROGRAM-NAME.
        MOVE '0120-READ-APM' TO FCB-FAULT-PARAGRAPH.
        MOVE WS-CPI          TO FCB-PATIENT-CPI.
        MOVE WS-VISIT        TO FCB-PATIENT-VISIT.
        MOVE FAULT-COUNT     TO FCB-FAULT-COUNT.
        MOVE FAULT-LIMIT     TO FCB-FAULT-LIMIT.
        MOVE 'YISEGMD'       TO FCB-FAULT-FILE.
        MOVE 'BAD RETURN CODE ( ) FROM YUI02XC, BYPASSING ABEND'
            TO FCB-FAULT-MESSAGE.
        MOVE COMM-PM-RTN-CD  TO FCB-FAULT-MESSAGE (18:1).
        MOVE MSG-CODE        TO FCB-FAULT-MESSAGE (51:6).
        MOVE ABEND-MSG       TO FCB-FAULT-MESSAGE (58:6).
        CALL YWFLTBC USING FCB.
        IF FAULT-COUNT > FAULT-LIMIT
            CALL 'KUCANXA' USING PROG-ID MSG-CODE ABEND-MSG
        END-IF.

Sample fault records

    Menu  Utilities  Compilers  Help
    BROWSE    H308.TEST.FAULTLOG            Line 00000000 Col 001 068
    Command ===>
    *************************** Top of Data ****************************
    19981210105832PPM3$RP1 STEP11 YA050BC YWCPIMD C0244653518336
    19981210110529PPM3$RP1 STEP11 YA050BC YWCPIMD C0273495428331
    19981210134640PPM3$RP1 STEP11 YA050BC YWCPIMD C0293527328339
    19981210140102PPM3$RP5STEP13 PROC01 YA060BC YWCPIMD C0293527328339
    ************************** Bottom of Data **************************

    Menu  Utilities  Compilers  Help
    BROWSE    H308.TEST.FAULTLOG            Line 00000000 Col 069 136
    Command ===>
    ************************** Top of Data *****************************
    000100050120-READ-APM BAD RETURN CODE (F) FROM YUI02XC, B
    000200050120-READ-APM BAD RETURN CODE (F) FROM YUI02XC, B
    000300050120-READ-APM BAD RETURN CODE (F) FROM YUI02XC, B
    000100050100-INSERT-CPI DUPLICATE CPI FOUND DURING INSERT A
    ************************** Bottom of Data **************************

    Menu  Utilities  Compilers  Help
    BROWSE    H308.TEST.FAULTLOG            Line 00000000 Col 137 204
    Command ===>
    *************************** Top of Data ****************************
    YPASSING ABEND 0120-1 YISEGMD - RECORD NOT FOUND- KEY = C02446535183
    YPASSING ABEND 0120-1 YISEGMD - RECORD NOT FOUND- KEY = C02734954283
    YPASSING ABEND 0120-1 YISEGMD - RECORD NOT FOUND- KEY = C02935273283
    ************************** Bottom of Data **************************

Sample program calling YWFLTBC

    IDENTIFICATION DIVISION.
    PROGRAM-ID.    YWFAULT.
    AUTHOR.        DON RITTER.
    DATE-WRITTEN.  DEC 1998.
    DATE-COMPILED.
    ENVIRONMENT DIVISION.
    CONFIGURATION SECTION.
    SOURCE-COMPUTER. IBM-370.
    OBJECT-COMPUTER. IBM-370.
    INPUT-OUTPUT SECTION.
    FILE-CONTROL.
    DATA DIVISION.
    FILE SECTION.
    WORKING-STORAGE SECTION.
    01  CALLED-MODULES.
        02  YWFLTBC                 PIC X(08) VALUE 'YWFLTBC'.
    01  FCB.
        02  FCB-PROGRAM-NAME        PIC X(08).
        02  FCB-FAULT-FILE          PIC X(08).
        02  FCB-PATIENT-CPI         PIC X(10).
        02  FCB-PATIENT-VISIT       PIC X(04).
        02  FCB-FAULT-COUNT         PIC X(04).
        02  FCB-FAULT-LIMIT         PIC X(04).
        02  FCB-FAULT-PARAGRAPH     PIC X(30).
        02  FCB-FAULT-MESSAGE       PIC X(134).
    PROCEDURE DIVISION.
        MOVE 'DUPLICATE CPI FOUND DURING INSERT ATTEMPT'
            TO FCB-FAULT-MESSAGE.
        MOVE 'YWFAULT '        TO FCB-PROGRAM-NAME.
        MOVE 'YWCPIMD '        TO FCB-FAULT-FILE.
        MOVE 'C029352732'      TO FCB-PATIENT-CPI.
        MOVE '8839'            TO FCB-PATIENT-VISIT.
        MOVE '0001'            TO FCB-FAULT-COUNT.
        MOVE '0005'            TO FCB-FAULT-LIMIT.
        MOVE '0100-INSERT-CPI' TO FCB-FAULT-PARAGRAPH.
        CALL YWFLTBC USING FCB.
        GOBACK.