Jump to content

Recommended Posts

Posted
We have some customer data that need to be anonymized after it is copied from our PROD server to DEV/UAT. The data is held in Focus databases - does anyone have any handy routines that might do this ie. for customer names and bank account numbers
Posted

Perhaps in your ETL process you can shuffle characters from your PROD extract before you load to DEV

DEFINE FILE GGSALES

PRODUCT_SHUFFLED/A16 = PRODUCT;

SRCHFOR/A64 = '-''0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

REPLWITH/A64 = '-''1$3*5@7#9%MN@PQRSTUVWXYZABCDEFGHIJKLmnop*%stuvwxyzabcdefghijkl';

-* Iterate through PRODUCT and shuffle characters.

-REPEAT LOOP_DESC FOR &I FROM 1 TO 16

CURRCHAR/A1 = SUBSTR(16, PRODUCT_SHUFFLED, &I, &I, 1, 'A1');

SRCHFORPOS/I3 = POSIT(SRCHFOR, 64, CURRCHAR, 1, 'I3');

REPLWITHC/A1 = SUBSTR(64, REPLWITH, SRCHFORPOS, SRCHFORPOS, 1, 'A1');

PRODUCT_SHUFFLED/A35 = OVRLAY(PRODUCT_SHUFFLED, 35, REPLWITHC, 1, &I, 'A35');

-LOOP_DESC

-*

END

TABLE FILE GGSALES

SUM PRODUCT_SHUFFLED AS 'Product'

BY PRODUCT_SHUFFLED NOPRINT

ON TABLE SET PAGE OFF

END

Posted
Thanks David but whilst your ETL routine looks like something that could be of use it would take some time to convert the data. What I am thinking is that we just have various Master File Define statements like what you have suggested to scramble the data as it is used. This can be done on a field by field basis.
Posted

If it isnt required to store the shuffled data in the DEV/TEST databases, you absolutely can shuffle the data at report time.

A centralized location for the DEFINEs is the way to go.

Have fun shuffling the deck; sounds like it is time to deal.

Posted

Sorry to add my 2 cents while its already solved.

I thought it would be a good place to use DEFINE FUNCTION.

Lets try this :

 

DEFINE FUNCTION SCRAMBLE

DESCRIPTION Scramble letters in name

(INPUT_NAME/A50)

OUTPUT_NAME/A50=INPUT_NAME;

-* OVRLAY(source_string, length, substring, sublen, position, output);

-* SUBSTRING(string, position, length)

-* PRDUNI(seed, output)

LIST_SCRAMBLE /A62 = ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz;

-REPEAT :END_LOOP FOR &POSITION FROM 1 TO 50 STEP 1

INPUT_SCRAMBLE_&POSITION.EVAL /A01 = SUBSTRING(INPUT_NAME, &POSITION, 1);

SHIFT_SCRAMBLE_&POSITION.EVAL /I03 = ((62 * PRDUNI(&POSITION, D12.10)) + 0 ) ;

OUTPUT_SCRAMBLE_&POSITION.EVAL /A01 = IF INPUT_SCRAMBLE_&POSITION.EVAL EQ

THEN

ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_&POSITION.EVAL, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_&POSITION.EVAL, 1, &POSITION, A50);

-:END_LOOP

SCRAMBLE /A50 = OUTPUT_NAME;

END

TABLE FILE xyz

PRINT IN_NAME

COMPUTE OUT_NAME/A50 = SCRAMBLE(IN_NAME);

END

Posted
If you dont scramble the data as it is loaded to Dev, the raw unscrambled data will still be available to anyone who has edit access to the MFD - sample data or data profiling. Even if the fields are hidden so they dont display in reporting tools and only the DEFINES are exposed, a reasonably clever developer could get to the raw data
  • 2 months later...
Posted
Hi David, just looking into this a little more. Do you think your technique with the DEFINE statements with loop could be added to the the Master File Description
Posted

How secure do the names need to be from a resourceful developer

With the -REPEAT loop approach, you could possibly use a Master File Profile to get it to work within an MFD.

Heres an approach that should be safe from resourceful developers, doesnt use -REPEAT and can be done in the MFD.

First create the following fex in baseapp or other app path folder It defines a function, ANON, that changes e, t and a (most common letters) to a random letter. It also changes random letters to other random letters ten times. You could tweak this as you see fit.

myfunc.fex

DEFINE FUNCTION ANON(STRING/A100)

STRING/A100 = LOWER(STRING);

STRING/A100 = REPLACE(STRING, 'e', CHAR(RDUNIF('P9')*26+97));

STRING/A100 = REPLACE(STRING, 't', CHAR(RDUNIF('P9')*26+97));

STRING/A100 = REPLACE(STRING, 'a', CHAR(RDUNIF('P9')*26+97));

STRING/A100 = REPLACE(STRING, CHAR(RDUNIF('P9')*26+97), CHAR(RDUNIF('P9')*26+97));

STRING/A100 = REPLACE(STRING, CHAR(RDUNIF('P9')*26+97), CHAR(RDUNIF('P9')*26+97));

STRING/A100 = REPLACE(STRING, CHAR(RDUNIF('P9')*26+97), CHAR(RDUNIF('P9')*26+97));

STRING/A100 = REPLACE(STRING, CHAR(RDUNIF('P9')*26+97), CHAR(RDUNIF('P9')*26+97));

STRING/A100 = REPLACE(STRING, CHAR(RDUNIF('P9')*26+97), CHAR(RDUNIF('P9')*26+97));

STRING/A100 = REPLACE(STRING, CHAR(RDUNIF('P9')*26+97), CHAR(RDUNIF('P9')*26+97));

STRING/A100 = REPLACE(STRING, CHAR(RDUNIF('P9')*26+97), CHAR(RDUNIF('P9')*26+97));

STRING/A100 = REPLACE(STRING, CHAR(RDUNIF('P9')*26+97), CHAR(RDUNIF('P9')*26+97));

STRING/A100 = REPLACE(STRING, CHAR(RDUNIF('P9')*26+97), CHAR(RDUNIF('P9')*26+97));

STRING/A100 = REPLACE(STRING, CHAR(RDUNIF('P9')*26+97), CHAR(RDUNIF('P9')*26+97));

ANON/A100V = LCWORD3(100, STRING, 'A100');

END

 

For the MFD I tried it out with empdata, adding two defines at the bottom:

FILENAME=EMPDATA, SUFFIX=FOC, REMARKS='Legacy Metadata Sample: empdata',$

SEGNAME=EMPDATA, SEGTYPE=S1

FIELDNAME=PIN, ALIAS=ID, FORMAT=A9, INDEX=I, $

FIELDNAME=LASTNAME, ALIAS=LN, FORMAT=A15, $

FIELDNAME=FIRSTNAME, ALIAS=FN, FORMAT=A10, $

FIELDNAME=MIDINITIAL, ALIAS=MI, FORMAT=A1, $

FIELDNAME=DIV, ALIAS=CDIV, FORMAT=A4, $

FIELDNAME=DEPT, ALIAS=CDEPT, FORMAT=A20, $

FIELDNAME=JOBCLASS, ALIAS=CJCLAS, FORMAT=A8, $

FIELDNAME=TITLE, ALIAS=CFUNC, FORMAT=A20, $

FIELDNAME=SALARY, ALIAS=CSAL, FORMAT=D12.2M, $

FIELDNAME=HIREDATE, ALIAS=HDAT, FORMAT=YMD, $

$

DEFINE AREA/A13=DECODE DIV (NE 'NORTH EASTERN' SE 'SOUTH EASTERN'

CE 'CENTRAL' WE 'WESTERN' CORP 'CORPORATE' ELSE 'INVALID AREA');$

DEFINE FNAME/A20 = DF.myfunc.ANON(FIRSTNAME); $

DEFINE LNAME/A20 = DF.myfunc.ANON(LASTNAME); $

 

Run a test:

TABLE FILE empdata

PRINT FIRSTNAME

LASTNAME

FNAME

LNAME

END

 

 

image.png634608 53.5 KB

 

One possible drawback, a value that occurs more than once, e.g. WHITE, doesnt get the same scrambled value. This might be an issue for something line a bank account number. I have an idea to handle that too.

Posted

Hi David, your suggestion looks good and I am playing around with it now. I will get back to you. Just one question though is the technique you describe going to produce different results each time it is run as uses the RDUNIF function

I assume it will.

Posted

David, tried your suggestion above and I cant get this to work as REPLACE function isnt recognized. The concept of putting a call to a defined function seems to work ok generally as long as you dont have Dialog Manager commands in the function with REPEAT loops. When I did this it said it couldnt find my fex. I have attached what I tried and couldnt get to work

****master ***

SEGMENT=SET_ARNG,SEGTYPE=S1,PARENT=CURRENCY,

FIELD=EBNK_ACTY_CD ,ALIAS=EBAT_CDE ,FORMAT=A2 ,$

FIELD=EBANK_CODE ,ALIAS=EBNK_CDE ,FORMAT=A4 ,$

FIELD=ESERV_CD_DB ,ALIAS=ESERV_DB ,FORMAT=A2 ,$

FIELD=ESERV_CD_CR ,ALIAS=ESERV_CR ,FORMAT=A2 ,$

FIELD=EBRANCH_CODE ,ALIAS=EBR_CODE ,FORMAT=A9 ,$

FIELD=EBAC_NUMBER ,ALIAS=EBAC_NO ,FORMAT=A34 ,$

FIELD=EBAC_NAME ,ALIAS=EBAC_NAM ,FORMAT=A50 ,$

FIELD=EADDR_USE_CD ,ALIAS=EADUS_CD ,FORMAT=A2 ,$

FIELD=EVAL_MSTFILE ,ALIAS=VL_MAST ,FORMAT=A1 ,$

FIELD=ESET_ARNGKEY ,ALIAS=SA_KEY ,FORMAT=A17 ,FIELDTYPE=I, $

FIELD=EBANK_CURT ,ALIAS=BANK_CT ,FORMAT=A5 ,FIELDTYPE=I, $

FIELD=EBANK_BR_CT ,ALIAS=BKBR_CT ,FORMAT=A14 ,FIELDTYPE=I, $

FIELD=ECURRENT ,ALIAS=ECURT ,FORMAT=A1 ,$

FIELD=ECHG_DATE ,ALIAS=ECHG ,FORMAT=DMYY ,$

FIELD=ESWIFT_CODE ,ALIAS=SWIFT ,FORMAT=A11 ,$

FIELD=EINTER_BANK ,ALIAS=INTER ,FORMAT=A50 ,$

FIELD=EINTER_SWIFT ,ALIAS=INT_SWFT ,FORMAT=A11 ,$

FIELD=EINTER_CODE ,ALIAS=INT_CODE ,FORMAT=A20 ,$

FIELD=EBEN_ACTY_CD ,ALIAS=BEN_ACTY ,FORMAT=A2 ,$

DEFINE EBAC_NMEANON/A50 = DF.ANONNAM.SCRAMBLE(EBAC_NAME);

DESCRIPTION = Calls SCRAMBLE to Anonymise data,$

*** function in focexec ANONNAM.fex***

DEFINE FUNCTION SCRAMBLE

DESCRIPTION Scramble letters in name

(STRING/A50)

OUTPUT_NAME/A50=STRING;

-* OVRLAY(source_string, length, substring, sublen, position, output);

-* SUBSTRING(string, position, length)

-* PRDUNI(seed, output)

LIST_SCRAMBLE /A62 = ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz;

-REPEAT :END_LOOP FOR &POSITION FROM 1 TO 1 STEP 1

INPUT_SCRAMBLE_&POSITION.EVAL /A01 = SUBSTRING(STRING, &POSITION, 1);

SHIFT_SCRAMBLE_&POSITION.EVAL /I03 = ((62 * PRDUNI(&POSITION, D12.10)) + 0 ) ;

OUTPUT_SCRAMBLE_&POSITION.EVAL /A01 = IF INPUT_SCRAMBLE_&POSITION.EVAL EQ THEN

ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_&POSITION.EVAL, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_&POSITION.EVAL, 1, &POSITION, A50);

-:END_LOOP

SCRAMBLE /A50 = OUTPUT_NAME;

END

*** calling code ***

DEFINE NSACHOLD

-RUN

WHENCE ANONNAM FOCEXEC

-RUN

TABLE FILE NSACHOLD

PRINT EBAC_NAME

EBAC_NMEANON

-*COMPUTE OUT_NAME/A50 = SCRAMBLE(EBAC_NAME);

IF RECORDLIMIT EQ 200

END

Posted

I wasnt expecting the DEFINE FUNCTION to work with Dialogue Manager. I was able to get it to work this way, eliminating the -REPEAT. Also, it works a bit better if the final assignment of SCRAMBLE is varchar.

DEFINE FUNCTION SCRAMBLE

DESCRIPTION 'Scramble letters in name'

(STRING/A50)

OUTPUT_NAME/A50 = STRING;

LIST_SCRAMBLE /A62 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz';

INPUT_SCRAMBLE_1 /A01 = SUBSTRING(STRING, 1, 1);

SHIFT_SCRAMBLE_1 /I03 = ((62 * PRDUNI(1, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_1 /A01 = IF INPUT_SCRAMBLE_1 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_1, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_1, 1, 1, 'A50');

INPUT_SCRAMBLE_2 /A01 = SUBSTRING(STRING, 2, 1);

SHIFT_SCRAMBLE_2 /I03 = ((62 * PRDUNI(2, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_2 /A01 = IF INPUT_SCRAMBLE_2 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_2, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_2, 1, 2, 'A50');

INPUT_SCRAMBLE_3 /A01 = SUBSTRING(STRING, 3, 1);

SHIFT_SCRAMBLE_3 /I03 = ((62 * PRDUNI(3, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_3 /A01 = IF INPUT_SCRAMBLE_3 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_3, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_3, 1, 3, 'A50');

INPUT_SCRAMBLE_4 /A01 = SUBSTRING(STRING, 4, 1);

SHIFT_SCRAMBLE_4 /I03 = ((62 * PRDUNI(4, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_4 /A01 = IF INPUT_SCRAMBLE_4 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_4, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_4, 1, 4, 'A50');

INPUT_SCRAMBLE_5 /A01 = SUBSTRING(STRING, 5, 1);

SHIFT_SCRAMBLE_5 /I03 = ((62 * PRDUNI(5, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_5 /A01 = IF INPUT_SCRAMBLE_5 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_5, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_5, 1, 5, 'A50');

INPUT_SCRAMBLE_6 /A01 = SUBSTRING(STRING, 6, 1);

SHIFT_SCRAMBLE_6 /I03 = ((62 * PRDUNI(6, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_6 /A01 = IF INPUT_SCRAMBLE_6 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_6, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_6, 1, 6, 'A50');

INPUT_SCRAMBLE_7 /A01 = SUBSTRING(STRING, 7, 1);

SHIFT_SCRAMBLE_7 /I03 = ((62 * PRDUNI(7, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_7 /A01 = IF INPUT_SCRAMBLE_7 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_7, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_7, 1, 7, 'A50');

INPUT_SCRAMBLE_8 /A01 = SUBSTRING(STRING, 8, 1);

SHIFT_SCRAMBLE_8 /I03 = ((62 * PRDUNI(8, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_8 /A01 = IF INPUT_SCRAMBLE_8 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_8, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_8, 1, 8, 'A50');

INPUT_SCRAMBLE_9 /A01 = SUBSTRING(STRING, 9, 1);

SHIFT_SCRAMBLE_9 /I03 = ((62 * PRDUNI(9, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_9 /A01 = IF INPUT_SCRAMBLE_9 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_9, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_9, 1, 9, 'A50');

INPUT_SCRAMBLE_10 /A01 = SUBSTRING(STRING, 10, 1);

SHIFT_SCRAMBLE_10 /I03 = ((62 * PRDUNI(10, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_10 /A01 = IF INPUT_SCRAMBLE_10 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_10, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_10, 1, 10, 'A50');

INPUT_SCRAMBLE_11 /A01 = SUBSTRING(STRING, 11, 1);

SHIFT_SCRAMBLE_11 /I03 = ((62 * PRDUNI(11, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_11 /A01 = IF INPUT_SCRAMBLE_11 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_11, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_11, 1, 11, 'A50');

INPUT_SCRAMBLE_12 /A01 = SUBSTRING(STRING, 12, 1);

SHIFT_SCRAMBLE_12 /I03 = ((62 * PRDUNI(12, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_12 /A01 = IF INPUT_SCRAMBLE_12 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_12, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_12, 1, 12, 'A50');

INPUT_SCRAMBLE_13 /A01 = SUBSTRING(STRING, 13, 1);

SHIFT_SCRAMBLE_13 /I03 = ((62 * PRDUNI(13, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_13 /A01 = IF INPUT_SCRAMBLE_13 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_13, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_13, 1, 13, 'A50');

INPUT_SCRAMBLE_14 /A01 = SUBSTRING(STRING, 14, 1);

SHIFT_SCRAMBLE_14 /I03 = ((62 * PRDUNI(14, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_14 /A01 = IF INPUT_SCRAMBLE_14 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_14, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_14, 1, 14, 'A50');

INPUT_SCRAMBLE_15 /A01 = SUBSTRING(STRING, 15, 1);

SHIFT_SCRAMBLE_15 /I03 = ((62 * PRDUNI(15, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_15 /A01 = IF INPUT_SCRAMBLE_15 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_15, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_15, 1, 15, 'A50');

INPUT_SCRAMBLE_16 /A01 = SUBSTRING(STRING, 16, 1);

SHIFT_SCRAMBLE_16 /I03 = ((62 * PRDUNI(16, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_16 /A01 = IF INPUT_SCRAMBLE_16 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_16, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_16, 1, 16, 'A50');

INPUT_SCRAMBLE_17 /A01 = SUBSTRING(STRING, 17, 1);

SHIFT_SCRAMBLE_17 /I03 = ((62 * PRDUNI(17, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_17 /A01 = IF INPUT_SCRAMBLE_17 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_17, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_17, 1, 17, 'A50');

INPUT_SCRAMBLE_18 /A01 = SUBSTRING(STRING, 18, 1);

SHIFT_SCRAMBLE_18 /I03 = ((62 * PRDUNI(18, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_18 /A01 = IF INPUT_SCRAMBLE_18 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_18, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_18, 1, 18, 'A50');

INPUT_SCRAMBLE_19 /A01 = SUBSTRING(STRING, 19, 1);

SHIFT_SCRAMBLE_19 /I03 = ((62 * PRDUNI(19, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_19 /A01 = IF INPUT_SCRAMBLE_19 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_19, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_19, 1, 19, 'A50');

INPUT_SCRAMBLE_20 /A01 = SUBSTRING(STRING, 20, 1);

SHIFT_SCRAMBLE_20 /I03 = ((62 * PRDUNI(20, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_20 /A01 = IF INPUT_SCRAMBLE_20 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_20, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_20, 1, 20, 'A50');

INPUT_SCRAMBLE_21 /A01 = SUBSTRING(STRING, 21, 1);

SHIFT_SCRAMBLE_21 /I03 = ((62 * PRDUNI(21, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_21 /A01 = IF INPUT_SCRAMBLE_21 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_21, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_21, 1, 21, 'A50');

INPUT_SCRAMBLE_22 /A01 = SUBSTRING(STRING, 22, 1);

SHIFT_SCRAMBLE_22 /I03 = ((62 * PRDUNI(22, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_22 /A01 = IF INPUT_SCRAMBLE_22 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_22, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_22, 1, 22, 'A50');

INPUT_SCRAMBLE_23 /A01 = SUBSTRING(STRING, 23, 1);

SHIFT_SCRAMBLE_23 /I03 = ((62 * PRDUNI(23, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_23 /A01 = IF INPUT_SCRAMBLE_23 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_23, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_23, 1, 23, 'A50');

INPUT_SCRAMBLE_24 /A01 = SUBSTRING(STRING, 24, 1);

SHIFT_SCRAMBLE_24 /I03 = ((62 * PRDUNI(24, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_24 /A01 = IF INPUT_SCRAMBLE_24 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_24, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_24, 1, 24, 'A50');

INPUT_SCRAMBLE_25 /A01 = SUBSTRING(STRING, 25, 1);

SHIFT_SCRAMBLE_25 /I03 = ((62 * PRDUNI(25, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_25 /A01 = IF INPUT_SCRAMBLE_25 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_25, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_25, 1, 25, 'A50');

INPUT_SCRAMBLE_26 /A01 = SUBSTRING(STRING, 26, 1);

SHIFT_SCRAMBLE_26 /I03 = ((62 * PRDUNI(26, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_26 /A01 = IF INPUT_SCRAMBLE_26 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_26, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_26, 1, 26, 'A50');

INPUT_SCRAMBLE_27 /A01 = SUBSTRING(STRING, 27, 1);

SHIFT_SCRAMBLE_27 /I03 = ((62 * PRDUNI(27, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_27 /A01 = IF INPUT_SCRAMBLE_27 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_27, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_27, 1, 27, 'A50');

INPUT_SCRAMBLE_28 /A01 = SUBSTRING(STRING, 28, 1);

SHIFT_SCRAMBLE_28 /I03 = ((62 * PRDUNI(28, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_28 /A01 = IF INPUT_SCRAMBLE_28 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_28, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_28, 1, 28, 'A50');

INPUT_SCRAMBLE_29 /A01 = SUBSTRING(STRING, 29, 1);

SHIFT_SCRAMBLE_29 /I03 = ((62 * PRDUNI(29, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_29 /A01 = IF INPUT_SCRAMBLE_29 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_29, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_29, 1, 29, 'A50');

INPUT_SCRAMBLE_30 /A01 = SUBSTRING(STRING, 30, 1);

SHIFT_SCRAMBLE_30 /I03 = ((62 * PRDUNI(30, 'D12.10')) + 0 ) ;

OUTPUT_SCRAMBLE_30 /A01 = IF INPUT_SCRAMBLE_30 EQ ' ' THEN ' ' ELSE SUBSTRING(LIST_SCRAMBLE, SHIFT_SCRAMBLE_30, 1);

OUTPUT_NAME /A50 = OVRLAY(OUTPUT_NAME, 50, OUTPUT_SCRAMBLE_30, 1, 30, 'A50');

INPUT_SCRAMBLE_31 /A01 = SUBSTRING

Posted

While using random functions to anonymize data might be required it is sometime not necessary to go that far and using a function that is not a bijection and cannot be inverted is enough

Here is one I use sometime that would convert SPEEDY GONZALEZ to SSSSSS SSSSSSSS which has the advantage of keeping sort order on the first letter. The function assume you only only have upper case caracters (otherwise you may want to replace a and 9 for numeric)

DEFINE FUNCTION MASKNAME( LIBELLE/A120 )

PATLIB/A120=PATTERNS(LIBELLE);

FCH/A1=SUBSTRING(LIBELLE,1,1);

MASKNAME/A120=REPLACE(PATLIB, A, FCH );

END

DEFINE FILE MYLASTER

MSK_LABEL/A120=MASKNAME( LABEL );

END

This is something that canbe translated to SQL and gives

REPLACE(

REGEXP_REPLACE(

REGEXP_REPLACE(

REGEXP_REPLACE(

REGEXP_REPLACE(

T1.LABEL,

[[:digit:]],

9

),

[[:upper:]],

A

),

[[:lower:]],

a

),

[[:cntrl:]],

X

),

A,

SUBSTRING(T1.LABEL,

CAST(1 AS INTEGER), 1)

) AS MSK_LABEL,

Posted
Looks like a good solution too Jean-Claude. The Lloyds IT group directive is to now anonymise data on loading into our DEV environment (when refreshing from PROD) rather than on the fly every time the database is accessed for reports.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
  • Create New...