Todd Van Valkenburg Posted December 12, 2023 Share Posted December 12, 2023 I have a requirement to create report with distinct counts by date. But the distinct count is a running count (the population increases as the dates increase).Perhaps easier to explain visually. So given a data set of customer and date like:Date Customer12/01/2024 A12/01/2024 B12/01/2024 A12/01/2024 A12/01/2024 A12/02/2024 C12/02/2024 C12/02/2024 A12/02/2024 A12/02/2024 A12/03/2024 B12/03/2024 B12/03/2024 B12/03/2024 B12/03/2024 BWhat I would be looking for, as a start, is a report like this:Date Running Customer Distinct Count12/01/2024 212/02/2024 312/03/2024 3Notes:12/1 has 2 distinct customers A, B12/1-12/2 has 3 distinct customers A, B, C12/1-12/3 has 3 distinct customers A, B, CI hope my question makes sense. Link to comment Share on other sites More sharing options...
David Beagan Posted December 13, 2023 Share Posted December 13, 2023 I think it makes sense. This is what I came up with. You can try this code for yourself to better understand.-* Step 0. Make the sample data file. SQL CREATE TABLE DATA ( DATE DATE, CUSTOMER CHAR(1) ) ; END MODIFY FILE DATA MATCH DATE CUSTOMER ON NOMATCH INCLUDE ON MATCH INCLUDE DATA20241201,A,$20241201,B,$20241201,A,$20241201,A,$20241201,A,$20241202,C,$20241202,C,$20241202,A,$20241202,A,$20241202,A,$20241203,B,$20241203,B,$20241203,B,$20241203,B,$20241203,B,$END -* Step 1. Hold distinct counts for date, customer. SET ASNAMES=ON, HOLDLIST=PRINTONLY TABLE FILE DATA COUNT DST.CUSTOMER AS 'DISTCOUNTER' BY DATE BY CUSTOMER ON TABLE HOLD AS DATADIST END -* Step 2. Create a hold file of every combination of date and customer. TABLE FILE DATADIST BY DATE ON TABLE HOLD AS DATECUST END TABLE FILE DATADIST BY CUSTOMER ON TABLE HOLD AS DATACUST END JOIN FILE DATECUST AT DATE TO ALL FILE DATACUST AT CUSTOMER AS J1 END -* Step 3. Left outer join file with every combination to file of distince counts. JOIN LEFT_OUTER DATE AND CUSTOMER IN DATECUST TO ALL DATE AND CUSTOMER IN DATADIST AS J2 END -* Step 4. Calculate the running distinct count for each customer and date. TABLE FILE DATECUST SUM COMPUTE RUNDC/I9 = IF LAST CUSTOMER EQ CUSTOMER THEN MAX(DISTCOUNTER, LAST RUNDC) ELSE DISTCOUNTER; BY CUSTOMER BY DATE ON TABLE HOLD AS DATACNT END -* Step 5. Sum up the running distinct counts for each date. TABLE FILE DATACNT SUM RUNDC BY DATE END I have an example with the ggsales file that I used to develop and validate this technique. Let me know if you would like see it. Link to comment Share on other sites More sharing options...
Todd Van Valkenburg Posted December 13, 2023 Author Share Posted December 13, 2023 Wow. Thank you! Let me "play" with what you have to see if I can make sense of it. I really appreciate it! I am not familiar with the "TO ALL" in your joins and the MAX(DISTCOUNTER, LAST RUNDC) so that will be interesting to check out as well. Link to comment Share on other sites More sharing options...
Solution David Beagan Posted February 12 Solution Share Posted February 12 In importing this answer over from the TIBCO Community all of the line breaks got stripped off the code that was posted. Here is the code with the line breaks. -* Step 0. Make the sample data file. SQL CREATE TABLE DATA ( DATE DATE, CUSTOMER CHAR(1) ) ; END MODIFY FILE DATA MATCH DATE CUSTOMER ON NOMATCH INCLUDE ON MATCH INCLUDE DATA 20241201,A,$ 20241201,B,$ 20241201,A,$ 20241201,A,$ 20241201,A,$ 20241202,C,$ 20241202,C,$ 20241202,A,$ 20241202,A,$ 20241202,A,$ 20241203,B,$ 20241203,B,$ 20241203,B,$ 20241203,B,$ 20241203,B,$ END -* Step 1. Hold distinct counts for date, customer. SET ASNAMES=ON, HOLDLIST=PRINTONLY TABLE FILE DATA COUNT DST.CUSTOMER AS 'DISTCOUNTER' BY DATE BY CUSTOMER ON TABLE HOLD AS DATADIST END -* Step 2. Create a hold file of every combination of date and customer. TABLE FILE DATADIST BY DATE ON TABLE HOLD AS DATECUST END TABLE FILE DATADIST BY CUSTOMER ON TABLE HOLD AS DATACUST END JOIN FILE DATECUST AT DATE TO ALL FILE DATACUST AT CUSTOMER AS J1 END -* Step 3. Left outer join file with every combination to file of distince counts. JOIN LEFT_OUTER DATE AND CUSTOMER IN DATECUST TO ALL DATE AND CUSTOMER IN DATADIST AS J2 END -* Step 4. Calculate the running distinct count for each customer and date. TABLE FILE DATECUST SUM COMPUTE RUNDC/I9 = IF LAST CUSTOMER EQ CUSTOMER THEN MAX(DISTCOUNTER, LAST RUNDC) ELSE DISTCOUNTER; BY CUSTOMER BY DATE ON TABLE HOLD AS DATACNT END -* Step 5. Sum up the running distinct counts for each date. TABLE FILE DATACNT SUM RUNDC BY DATE END 1 Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now