
Focus v7.6.8. Database load sometimes issues a U4038 error to the operating system (z/OS). Where can I find the reason that error message was issued?


Richard Merz


It has been a long time since I've worked on FOCUS for z/OS, but I might have some things you can check.

  

Are there messages from the focexecs, displaying in the SYSOUT, that can give you a clue as to what FOCUS is doing at the time of the ABEND?

So this process loads a FOCUS database, and sometimes it works and sometimes it doesn't, and you aren't changing any code...

Can you check that the DCB specified in the JCL ties to the actual LRECL/BLKSIZE of each of the files involved in the process?

Is it possible that sometimes files are empty, and that is when you get the U4038?

Is it possible that sometimes files have different record layouts, from what the code is expecting, e.g., the code expects alphanumeric data in positions 1 - 4, but the file contains hexadecimal?
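The DCB, empty-file, and record-layout questions above could be checked offline against a binary copy of each input file. This is only a minimal Python sketch, not anything FOCUS or z/OS provides. It assumes fixed-length records (RECFM=FB) transferred in binary with no record delimiters, EBCDIC code page 037, and a key in positions 1 - 4 that should hold only letters, digits, or blanks. The function name `check_records` and its parameters are hypothetical.

```python
import string

# Characters accepted in the key field (an assumption for this sketch).
ALLOWED = set(string.ascii_letters + string.digits + " ")


def check_records(data: bytes, lrecl: int, key_start: int = 0, key_len: int = 4):
    """Return a list of problems found in a fixed-length record image.

    Assumes RECFM=FB data transferred in binary and EBCDIC code page 037.
    """
    problems = []
    if not data:
        # An empty input file is one of the suspected triggers.
        problems.append("file is empty")
        return problems
    if len(data) % lrecl != 0:
        # The file size does not agree with the LRECL coded in the JCL.
        problems.append(f"size {len(data)} is not a multiple of LRECL {lrecl}")
    for n in range(len(data) // lrecl):
        record = data[n * lrecl:(n + 1) * lrecl]
        raw_key = record[key_start:key_start + key_len]
        key = raw_key.decode("cp037")  # cp037 maps every byte, so this cannot fail
        if not all(ch in ALLOWED for ch in key):
            problems.append(f"record {n + 1}: unexpected key bytes {raw_key.hex()}")
    return problems
```

An empty list means none of these three conditions were found; it says nothing about other causes of the abend.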


It's been forever for me and z/OS too.

I don't think we know for sure that he's loading a FOCUS database. He may just be referring to the version of FOCUS there. If you're using another database, can you post its name, and do you get any extra abend info from that?

These errors are usually something like reading past the end of a file or some sort of classic abend like that. It could also be a user written subroutine that chokes on a particular piece of data.

It sounds like sometimes it runs, and sometimes it doesn't. Can you tell anything useful from that? Perhaps, for example, your data has alphas every now and then in a column that's defined as numeric in a master file.

It might help you home in on the problem to put in a -RUN after each step of your job and -EXIT at places along the way to get closer to your actual problem.

Watch for something that could be data related if it usually loads, but sometimes doesn't.
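The "alphas in a numeric column" case could be screened for before the load runs. The sketch below is a rough Python illustration under stated assumptions, not part of FOCUS: the records are byte strings from a binary copy of the input, the code page is EBCDIC 037, and the field is unsigned zoned decimal (plain display digits). Signed zoned or packed-decimal fields would need a different test. The name `find_non_numeric` and its parameters are hypothetical.

```python
def find_non_numeric(records, start, length, codec="cp037"):
    """Yield (record_number, raw_bytes) for fields that are not all digits.

    Assumes unsigned zoned-decimal (display) digits in the given codec.
    """
    for number, record in enumerate(records, start=1):
        raw = record[start:start + length]
        text = raw.decode(codec)
        # A short record or any non-digit character counts as suspect.
        if len(text) != length or not all(ch in "0123456789" for ch in text):
            yield number, raw
```

Here `start` is a zero-based offset, so a field described in the master file as beginning in position 1 would be passed as `start=0`.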


Follow up #1 - Thank you for your responses. I wasn't sure if this community was active or not. Initially I posted the question about the U4038 message hoping to find out where Focus put its actual error message. Now I'm not sure there is a better error message; I think the program just bombed.

 

More facts:

 

The job is loading an 'XFOCUS' database (our data warehouse) weekly. First the DB is deleted, then re-created using the Focus CREATE command. Loading of the DB is done in two JCL steps. The first step loads payments, the second step loads deductions. A list of 'facts' follows.

 

F01: This job has been running for years. I recently added an additional department, and the problems started. This department was added in the same manner as the last one that I added. The new dept. worked in the test env.

 

F02: Sometimes the job aborts with the U4038 error. We fix the problem by re-running the job from the delete DB step. The input data files are NOT changed. And the job runs successfully.

 

F03: Sometimes the job completes successfully (no error indicator). But the database is corrupt, as shown with the "? file DB" command. Same fix, re-run from the delete DB step as above.

 

F04: Using a copy of the production input files (in the test env) the job runs successfully. We only got the error in our test env once. SYSOUT is below.

 

F05: The U4038 message is the result of an application program (Focus) calling for the error, not z/OS.

https://www.ibm.com/docs/en/zos/2.1.0?topic=codes-u4038-xfc6

 

F06: I assume Focus is written in C++ because this looks like a C++ error.

https://cplusplus.com/reference/csignal/raise/

 

F07: This is what is written to the MODIFYCS (job step) SYSOUT when the job was aborted in TEST with RC U4038:

 

EDC6006E THE RAISE() FUNCTION WAS ISSUED FOR THE SIGNAL SIGABRT.

FROM ENTRY POINT xfilexChainAlloc AT COMPILE UNIT OFFSET +0000015A AT ENTRY OFFSET +0000015A AT ADDRESS 27770062.

<> LEAID ENTERED (LEVEL 06/15/2011 AT 18.20)

<> LEAID PROCESSING COMPLETE. RC=4

 

That's all I can think of to share. I think that Focus is coming to some condition that "shouldn't happen", and so it's aborting.

 

Richard (Ric) Merz | Information Technology Specialist I

Office of State Controller Betty T. Yee

Information System Division, Business Systems Bureau

300 Capital Mall Suite 701

Sacramento, CA 95814 | (916) 445-5135


Hi Ric

I can tell you've been digging in here. Also I can say you have a tough one on your hands based on what you've tried and not had luck.

I think these 2 facts are the most interesting and rule out a lot of options:

F02: Sometimes the job aborts with the U4038 error. We fix the problem by re-running the job from the delete DB step. The input data files are NOT changed. And the job runs successfully.

F03: Sometimes the job completes successfully (no error indicator). But the database is corrupt, as shown with the "? file DB" command. Same fix, re-run from the delete DB step as above.

My first idea was simply that you were getting too big of a dataset for FOCUS to read or write. But - F02 and F03 basically say, just run it again and all will be well. If you were exceeding some boundary, I would expect that to be repeatable. So - it's not that.

And worth noting is that your TEST environment does work most of the time but it DOES still fail sometimes. This is interesting. I don't know what to do with that info yet.

The only thing we know that seems related is adding the new department. And - I think we can infer that there's something different about TEST and PROD. Either environmental or the way FOCUS is configured is different because TEST works more consistently than PROD.

I think FOCUS is indeed the culprit: it gets some non-zero return code from an OS operation and decides to abend.

You're at a tough spot. It would help if you had a dataset that you know will cause the error. Then you could just log the heck out of it and see when (perhaps with a certain sort of record) the problem occurs.

First, it's been a VERY long time since I worked for the State of CA and it was core FOCUS back then too. I was only there a couple of months with IBI consulting before they moved me down to County of LA. That was way back when the US launched Operation Desert Storm and I watched CNN from a Residence Inn there. But - San Jose used to be your branch. This is getting tough enough you might need to ask for help. I suggest you open a case with IBI and be sure to point out this is core FOCUS you're talking about so you get to the right support group.

If you want to pursue it on your own though, I'll pass along things to try.

Your job has 2 parts right? Maybe we can narrow down the problem to which of those 2 parts is breaking. For example, you might decide to break this up into 2 totally separate jobs - each with their own JCL and so forth. You seem very knowledgeable about core FOCUS code and techniques. Add a ? FILE DB to the end of each job to see if the databases are corrupted.

The goal here is not to really fix the problem - it's more to isolate it if you can. If your ? FILE shows your database has blown pointers after the payments run, then you can start working there.

Next, I think I'd allocate a dataset and add some logging to my MODIFY to actually write out each record to a log file. You could set up your MODIFY to have a flag you set at the beginning to indicate whether it should be logging or not. But - since we're talking about making up 2 fake jobs anyway to test, you might just want to copy your MODIFY code elsewhere and add the TYPE ON DDNAMEs to your test focexecs.

Adding the ? FILE to the end and adding the TYPE ON DDNAMEs to send messages to yourself should help you get closer to knowing where problems are.

Do you think it would help to REBUILD/REBUILD in between the Payments and Deductions? I just wondered if that might help out with the pointers.

I still don't have a good idea of what's happening to you. Maybe we can get closer to finding the problem using the ideas I posted above.

Currently guessing it's data related (like non-printable characters or something), or it could be some resource-related thing on the OS.

Can you strip down to just the data from the new department? I didn't think to ask that. If we say the only new thing is the department, then maybe a syncsort kind of run to only run the new department might help too (this may be the way the data comes to you already so this might be pointless).

Keep us posted. Officially I recommend opening a case or contacting your local branch (if they even still exist).


Maybe check to see if the issue lies within your input file?

One check you can do is to add the START n and STOP n commands to your MODIFY.

Something like:

MODIFY FILE FILENAME
START 1
STOP 100
rest of your MODIFY code continues here..

This would read in only the first 100 input records.

If all is well try 101 to 200, and so on...


Appreciate the thought, but since the process only fails when it wants to, I could easily go past a potential error on one of the runs.

 

BTW, the payment load is ~16m records, the deduction load is ~24m records. Makes it tough. Plus a reload using the same data has always worked.

 



Gotcha.

Same input files, same code (JCL/FOCUS), and sometimes you get the error U4038 and sometimes you don't.

The messages in SYSOUT suggest that the error is in the C/C++ code that FOCUS is running.

Perhaps if you open a case, someone at the Helpdesk can diagnose those C/C++ messages?

The only other thing I can suggest: Is there a difference in z/OS-land between the runs?

Meaning does one submission run with different/higher memory, with different DASD device geometry, and so on?


Holy cow - 16m and 24m! That's a lot of records.

Maybe consider moving this to a bulk load of some sort to some SQL Database?

Meantime, it sounds like the best you can do is try to save yourself some processing.

After each load, you could add a ? FILE and have FOCUS send back a return code to the OS?

-SET &THETIME=HHMMSS('A8');
? FILE CAR
-RUN
-IF &RETCODE EQ 0 GOTO THEEND;
-* If we fall in here, the FOCUS db is corrupted.
-TYPE &THETIME ? FILE failed. FOCUS DB is corrupted. Exiting RC: 8
-QUIT FOCUS 8
-THEEND
-TYPE &THETIME ? FILE success.

Maybe if this step in the job hits a non-zero return code, you could run a step that tries a REBUILD, REBUILD (which may not be as fast as the way you do it now to just CREATE FILE and start over).

That's a tough one Richard. Let us know if you come up with any news on it.

Toby

