Jump to content

Possible to create a running distinct count report?


Todd Van Valkenburg
Go to solution Solved by David Beagan ,

Recommended Posts

I have a requirement to create report with distinct counts by date. But the distinct count is a running count (the population increases as the dates increase).

Perhaps easier to explain visually.

So given a data set of customer and date like:

Date    Customer

12/01/2024 A

12/01/2024 B

12/01/2024 A

12/01/2024 A

12/01/2024 A

12/02/2024 C

12/02/2024 C

12/02/2024 A

12/02/2024 A

12/02/2024 A

12/03/2024 B

12/03/2024 B

12/03/2024 B

12/03/2024 B

12/03/2024 B

What I would be looking for, as a start, is a report like this:

Date Running Customer Distinct Count

12/01/2024 2

12/02/2024 3

12/03/2024 3

Notes:

12/1 has 2 distinct customers A, B

12/1-12/2 has 3 distinct customers A, B, C

12/1-12/3 has 3 distinct customers A, B, C

I hope my question makes sense.

Link to comment
Share on other sites

I think it makes sense. This is what I came up with. You can try this code for yourself to better understand.

-* Step 0. Make the sample data file. SQL CREATE TABLE DATA ( DATE DATE, CUSTOMER CHAR(1) ) ; END  MODIFY FILE DATA MATCH DATE CUSTOMER ON NOMATCH INCLUDE ON MATCH INCLUDE DATA20241201,A,$20241201,B,$20241201,A,$20241201,A,$20241201,A,$20241202,C,$20241202,C,$20241202,A,$20241202,A,$20241202,A,$20241203,B,$20241203,B,$20241203,B,$20241203,B,$20241203,B,$END -* Step 1. Hold distinct counts for date, customer. SET ASNAMES=ON, HOLDLIST=PRINTONLY TABLE FILE DATA COUNT DST.CUSTOMER AS 'DISTCOUNTER' BY DATE BY CUSTOMER ON TABLE HOLD AS DATADIST END -* Step 2. Create a hold file of every combination of date and customer. TABLE FILE DATADIST BY DATE ON TABLE HOLD AS DATECUST END TABLE FILE DATADIST BY CUSTOMER ON TABLE HOLD AS DATACUST END JOIN FILE DATECUST AT DATE TO ALL FILE DATACUST AT CUSTOMER AS J1 END -* Step 3. Left outer join file with every combination to file of distince counts. JOIN LEFT_OUTER DATE AND CUSTOMER IN DATECUST TO ALL DATE AND CUSTOMER IN DATADIST AS J2 END -* Step 4. Calculate the running distinct count for each customer and date. TABLE FILE DATECUST SUM COMPUTE RUNDC/I9 = IF LAST CUSTOMER EQ CUSTOMER THEN MAX(DISTCOUNTER, LAST RUNDC) ELSE DISTCOUNTER; BY CUSTOMER BY DATE ON TABLE HOLD AS DATACNT END -* Step 5. Sum up the running distinct counts for each date. TABLE FILE DATACNT SUM RUNDC BY DATE END

 I have an example with the ggsales file that I used to develop and validate this technique. Let me know if you would like see it.

Link to comment
Share on other sites

  • 1 month later...
  • Solution

In importing this answer over from the TIBCO Community all of the line breaks got stripped off the code that was posted. Here is the code with the line breaks.

-* Step 0. Make the sample data file.
 SQL CREATE TABLE DATA 
   ( DATE DATE,
     CUSTOMER CHAR(1) ) ;
 END
 
 MODIFY FILE DATA
 MATCH DATE CUSTOMER
 ON NOMATCH INCLUDE
 ON MATCH INCLUDE
 DATA
20241201,A,$
20241201,B,$
20241201,A,$
20241201,A,$
20241201,A,$
20241202,C,$
20241202,C,$
20241202,A,$
20241202,A,$
20241202,A,$
20241203,B,$
20241203,B,$
20241203,B,$
20241203,B,$
20241203,B,$
END
 
-* Step 1. Hold distinct counts for date, customer.
 SET ASNAMES=ON, HOLDLIST=PRINTONLY
 
 TABLE FILE DATA
 COUNT DST.CUSTOMER AS 'DISTCOUNTER'
   BY DATE
   BY CUSTOMER
   ON TABLE HOLD AS DATADIST
 END
 
-* Step 2. Create a hold file of every combination of date and customer.
 TABLE FILE DATADIST 
 BY DATE
 ON TABLE HOLD AS DATECUST
 END
 
 TABLE FILE DATADIST 
 BY CUSTOMER
 ON TABLE HOLD AS DATACUST
 END
 
 JOIN   FILE DATECUST AT DATE 
 TO ALL FILE DATACUST AT CUSTOMER AS J1
 END
 
-* Step 3. Left outer join file with every combination to file of distince counts.
 JOIN LEFT_OUTER DATE AND CUSTOMER IN DATECUST 
          TO ALL DATE AND CUSTOMER IN DATADIST AS J2
 END
 
-* Step 4. Calculate the running distinct count for each customer and date.
 TABLE FILE DATECUST
 SUM 
 COMPUTE RUNDC/I9 = IF LAST CUSTOMER EQ CUSTOMER THEN MAX(DISTCOUNTER, LAST RUNDC) ELSE DISTCOUNTER; 
 BY CUSTOMER
 BY DATE
 ON TABLE HOLD AS DATACNT
 END
 
-* Step 5. Sum up the running distinct counts for each date.
 TABLE FILE DATACNT
 SUM RUNDC 
  BY DATE
 END

 

  • Like 1
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
  • Create New...