Richard Merz, posted December 19, 2022

That code (U4038) is issued by a process when it wants to abort. I need to determine why FOCUS is issuing the error. Is the actual reason buried in some file?

Thanks,
David Briars, posted December 19, 2022

It has been a long time since I've worked on FOCUS for z/OS, but I might have some things you can check. Are there messages from the focexecs, displayed in SYSOUT, that give you a clue as to what FOCUS is doing at the time of the ABEND?

So this process loads a FOCUS database. Sometimes it works and sometimes it doesn't, and you aren't changing any code...

Can you check that the DCB specified in the JCL matches the actual LRECL/BLKSIZE of each of the files involved in the process?
Is it possible that the files are sometimes empty, and that is when you get the U4038?
Is it possible that the files sometimes have a different record layout from what the code expects? For example, the code expects alphanumeric data in positions 1-4, but the file contains hexadecimal.
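A minimal sketch of the DCB and empty-file check, with these assumptions:

- A binary copy of a RECFM=FB input dataset, with no record descriptor words, has been brought off the mainframe.
- The LRECL passed in is the one coded in the JCL.
- The function name is hypothetical.

If the file size is not an exact multiple of LRECL, the data does not match the declared record length. A size of zero flags the empty-file case.

```python
def check_fixed_length(size_in_bytes: int, lrecl: int) -> dict:
    """Compare a file size against the record length declared for it.

    Assumes RECFM=FB data copied in binary, so every record is exactly
    `lrecl` bytes and there are no record descriptor words.
    """
    if lrecl <= 0:
        raise ValueError("lrecl must be a positive record length")
    records, leftover = divmod(size_in_bytes, lrecl)
    return {
        "empty": size_in_bytes == 0,   # the "file is sometimes empty" case
        "records": records,            # number of complete records
        "leftover_bytes": leftover,    # non-zero => size does not match LRECL
    }
```

Usage would be something like `check_fixed_length(os.path.getsize("payments.bin"), 200)`. Both the file name and the LRECL of 200 are placeholders.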
Toby Mills, posted December 20, 2022

It's been forever for me and z/OS too. I don't think we know for sure that he's loading a FOCUS database; he may just be referring to the version of FOCUS there. If you're using another database, can you post its name, and do you get any extra abend info from it?

These errors are usually something like reading past the end of a file, or some other classic abend like that. It could also be a user-written subroutine that chokes on a particular piece of data.

It sounds like sometimes it runs and sometimes it doesn't. Can you tell anything useful from that? Perhaps, for example, your data has alphas every now and then in a column that's defined as numeric in a master file. To home in on the problem, it might help to put a -RUN after each step of your job and a -EXIT at places along the way. Watch for something that could be data-related if it usually loads but sometimes doesn't.
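One way to check Toby's idea of alphas showing up in a numeric column is to stream a binary copy of the input file one fixed-length record at a time. The sketch below reports every record whose field is not all digits. It assumes:

- RECFM=FB data.
- EBCDIC code page 037 (Python's `cp037` codec).
- An unsigned zoned-decimal field.

The function name, column positions and record length are hypothetical; the real values would come from your own master file. The raw bytes are returned so that non-printable characters can be inspected in hex.

```python
from typing import BinaryIO, Iterator, Tuple

DIGITS = set("0123456789")

def find_bad_numeric(f: BinaryIO, lrecl: int, start: int, length: int,
                     codec: str = "cp037") -> Iterator[Tuple[int, bytes]]:
    """Yield (record_number, raw_field_bytes) for records whose field is not numeric.

    `start` is a 1-based column position; records are read one at a time,
    so multi-million-record files are not loaded into memory.
    """
    recno = 0
    while True:
        rec = f.read(lrecl)
        if len(rec) < lrecl:
            break  # end of file (a short trailing fragment is ignored here)
        recno += 1
        raw = rec[start - 1:start - 1 + length]
        text = raw.decode(codec).strip()   # EBCDIC -> str, drop blank padding
        if not text or not set(text) <= DIGITS:
            yield recno, raw
```

For example: `with open("payments.bin", "rb") as f: for recno, raw in find_bad_numeric(f, 200, 1, 4): print(recno, raw.hex())`. The file name and layout values are placeholders.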
Richard Merz (author), posted December 21, 2022

Follow-up #1: Thank you for your responses. I wasn't sure whether this community was active. I originally posted the question about the U4038 code hoping to find out where FOCUS puts its actual error message. Now I'm not sure there is a better error message; I think the program just bombed.

More facts: the job loads an XFOCUS database (our data warehouse) weekly. First the DB is deleted, then it is re-created using the FOCUS CREATE command. The DB is loaded in two JCL steps: the first step loads payments, the second loads deductions. The list of facts follows.

F01: This job has been running for years. I recently added an additional department, and the problems started. This department was added in the same manner as the last one I added. The new department worked in the test environment.
F02: Sometimes the job aborts with the U4038 error. We fix the problem by re-running the job from the delete-DB step. The input data files are NOT changed, and the job runs successfully.
F03: Sometimes the job completes successfully (no error indicator), but the database is corrupt, as shown by the "? FILE DB" command. Same fix: re-run from the delete-DB step as above.
F04: Using a copy of the production input files (in the test environment), the job runs successfully. We got the error in our test environment only once; that SYSOUT is shown in F07.
F05: The U4038 code is the result of an application program (FOCUS) calling for the abend, not z/OS. https://www.ibm.com/docs/en/zos/2.1.0?topic=codes-u4038-xfc6
F06: I assume FOCUS is written in C++, because this looks like a C++ error. https://cplusplus.com/reference/csignal/raise/
F07: This is what was written to the MODIFYCS (job step) SYSOUT when the job aborted in TEST with U4038:

    EDC6006E THE RAISE() FUNCTION WAS ISSUED FOR THE SIGNAL SIGABRT. FROM ENTRY POINT xfilexChainAlloc AT COMPILE UNIT OFFSET +0000015A AT ENTRY OFFSET +0000015A AT ADDRESS 27770062.
    <> LEAID ENTERED (LEVEL 06/15/2011 AT 18.20)
    <> LEAID PROCESSING COMPLETE. RC=4

That's all I can think of to share. I think FOCUS is hitting some condition that "shouldn't happen", and so it's aborting.

Richard (Ric) Merz | Information Technology Specialist I
Office of State Controller Betty T. Yee
Information System Division, Business Systems Bureau
300 Capital Mall Suite 701 Sacramento, CA 95814 | (916) 445-5135
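Since the only diagnostic so far is the message in F07, a low-cost step is to capture the same fields from every failing run and compare them. If the abend always comes from the same entry point and offset (here `xfilexChainAlloc` at +0000015A), that is useful evidence to include in a support case.

A minimal sketch, assuming the SYSOUT has been saved as text:

- The regular expressions are written only against the EDC-prefixed (C/C++ runtime) message format quoted in F07. They may need adjusting for other messages.
- The function name is hypothetical.

```python
import re

# Message IDs such as EDC6006E: "EDC", four digits, a severity letter.
MSG_ID = re.compile(r"\bEDC\d{4}[A-Z]\b")
# Entry-point name and entry offset, tolerating line breaks between words.
ENTRY = re.compile(
    r"FROM\s+ENTRY\s+POINT\s+(\S+)\s.*?AT\s+ENTRY\s+OFFSET\s+\+([0-9A-F]+)",
    re.S,
)

def summarize_le_messages(sysout: str) -> dict:
    """Pull the message IDs, entry point and entry offset out of SYSOUT text."""
    m = ENTRY.search(sysout)
    return {
        "message_ids": MSG_ID.findall(sysout),
        "entry_point": m.group(1) if m else None,
        "entry_offset": int(m.group(2), 16) if m else None,  # hex -> int
    }
```

Running this over the SYSOUT of each failed run, and comparing the dictionaries, shows whether every failure stops at the same place.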
Toby Mills, posted December 21, 2022

Hi Ric, I can tell you've been digging in here. I can also say you have a tough one on your hands, based on what you've tried without luck. I think these two facts are the most interesting and rule out a lot of options:

F02: Sometimes the job aborts with the U4038 error. We fix the problem by re-running the job from the delete DB step. The input data files are NOT changed. And the job runs successfully.
F03: Sometimes the job completes successfully (no error indicator). But the database is corrupt, as shown with the "? file DB" command. Same fix, re-run from the delete DB step as above.

My first idea was simply that the dataset was getting too big for FOCUS to read or write. But F02 and F03 basically say: just run it again and all will be well. If you were exceeding some limit, I would expect that to be repeatable. So it's not that.

Also worth noting is that your TEST environment works most of the time, but it DOES still fail sometimes. This is interesting; I don't know what to do with that info yet. The only thing we know that seems related is adding the new department. And I think we can infer that something differs between TEST and PROD, either in the environment or in the way FOCUS is configured, because TEST works more consistently than PROD.

I think FOCUS is indeed the culprit: it gets some non-zero return code from an OS operation and decides to abend. You're in a tough spot. It would help if you had a dataset that you know will cause the error. Then you could log the heck out of it and see when (perhaps with a certain sort of record) the problem occurs.

First, it's been a VERY long time since I worked for the State of CA, and it was core FOCUS back then too. I was only there a couple of months with IBI consulting before they moved me down to County of LA. That was way back when the US launched Operation Desert Storm, and I watched CNN from a Residence Inn there.
But San Jose used to be your branch. This is getting tough enough that you might need to ask for help. I suggest you open a case with IBI. Be sure to point out that this is core FOCUS you're talking about, so you get to the right support group.

If you want to pursue it on your own, though, here are some things to try.

Your job has two parts, right? Maybe we can narrow the problem down to which of those two parts is breaking. For example, you might break this up into two totally separate jobs, each with its own JCL and so forth. You seem very knowledgeable about core FOCUS code and techniques. Add a ? FILE DB to the end of each job to see whether the database is corrupted. The goal here is not really to fix the problem; it's to isolate it if you can. If your ? FILE shows that your database has blown pointers after the payments run, then you can start working there.

Next, I think I'd allocate a dataset and add some logging to my MODIFY to write each record out to a log file. You could set up your MODIFY with a flag, set at the beginning, that indicates whether it should log or not. But since we're talking about making two test jobs anyway, you might just want to copy your MODIFY code elsewhere and add the TYPE ON DDNAMEs to your test focexecs. Adding the ? FILE to the end, and adding TYPE ON DDNAMEs to send messages to yourself, should help you get closer to knowing where the problems are.

Do you think it would help to run a REBUILD/REBUILD between the payments and deductions loads? I just wondered if that might help with the pointers.

I still don't have a good idea of what's happening to you. Maybe we can get closer to finding the problem using the ideas above. My current guess is that it's data-related (like non-printable characters or something), or it could be some resource-related thing on the OS. Can you strip down to just the data from the new department? I didn't think to ask that.
If we say the only new thing is the department, then maybe a SyncSort-type run that processes only the new department might help too. The data may already come to you that way, so this might be pointless.

Keep us posted. Officially, I recommend opening a case or contacting your local branch (if they even still exist).
David Briars, posted December 22, 2022

Maybe check whether the issue lies within your input file? One check you can do is to add the START n and STOP n commands to your MODIFY. Something like:

    MODIFY FILE FILENAME
    START 1
    STOP 100
    ...rest of your MODIFY code continues here...

This reads in only the first 100 input records. If all is well, try 101 to 200, and so on...
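If a failing input could be reproduced, the START/STOP windows David describes could be generated rather than typed by hand. A minimal Python sketch follows. The helper names are hypothetical.

The sketch assumes, as David's "try 101 to 200" suggests, that STOP takes the last record number rather than a record count. That assumption is worth checking against the FOCUS MODIFY documentation.

For a ~16M-record file, windows of 100 would mean roughly 160,000 runs, so a much larger window is more practical.

```python
from typing import Iterator, Tuple

def start_stop_windows(total: int, size: int) -> Iterator[Tuple[int, int]]:
    """Yield 1-based, inclusive (start, stop) record ranges covering `total` records."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(1, total + 1, size):
        yield start, min(start + size - 1, total)

def modify_header(filename: str, start: int, stop: int) -> str:
    """Build the first lines of a MODIFY request for one window (hypothetical helper)."""
    return f"MODIFY FILE {filename}\nSTART {start}\nSTOP {stop}"
```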
Richard Merz (author), posted December 22, 2022

I appreciate the thought, but since the process only fails when it wants to, I could easily go past a potential error on one of the runs. BTW, the payment load is ~16M records and the deduction load is ~24M records. That makes it tough. Plus, a reload using the same data has always worked.
David Briars, posted December 23, 2022

Gotcha. Same input files, same code (JCL/FOCUS), and sometimes you get the U4038 error and sometimes you don't.

The messages in SYSOUT suggest that the error is in the C/C++ code that FOCUS is running. Perhaps if you open a case, someone at the help desk can diagnose those C/C++ messages?

The only other thing I can suggest: is there a difference in z/OS-land between the runs? For example, does one submission run with different or higher memory, on a different DASD device geometry, and so on?
Toby Mills, posted December 27, 2022

Holy cow - 16m and 24m! That's a lot of records. Maybe consider moving this to a bulk load of some sort into a SQL database?

Meanwhile, it sounds like the best you can do is try to save yourself some processing. After each load, you could add a ? FILE and have FOCUS send a return code back to the OS:

    -SET &THETIME=HHMMSS('A8');
    ? FILE CAR
    -RUN
    -IF &RETCODE EQ 0 GOTO THEEND;
    -* If we fall in here, the FOCUS db is corrupted.
    -TYPE &THETIME ? FILE failed. FOCUS DB is corrupted. Exiting RC: 8
    -QUIT FOCUS 8
    -THEEND
    -TYPE &THETIME ? FILE success.

Then, if this step in the job hits a non-zero return code, you could run a step that tries a REBUILD, REBUILD. That may not be as fast as your current approach of just doing CREATE FILE and starting over.

That's a tough one, Richard. Let us know if you come up with any news on it.

Toby