Date Posted:
Product: TIBCO Spotfire®
Product: TIBCO Spotfire®
Problem:
How to Calculate the Top Contributing Groups to the Top N% of Aggregated Results
Solution:
It is sometimes needed to calculate the groups who contribute most to a total, and specifically those groups whose aggregated results make up the top % of the total. Assuming a data set like the the BASEBALL demo data set, for example, you can calculate the % of total with an expression like:
Sum([HOME_RUNS]) OVER ([TEAM]) / Sum([HOME_RUNS])
Which can give you the Team's % of the Total:
TEAM Team Total Team's % of Total N.Y. 322.00 9.16 % Chi. 251.00 7.14 % Min. 190.00 5.40 % Det. 183.00 5.20 % Tex. 177.00 5.03 % Bal. 164.00 4.66 % Cle. 153.00 4.35 % Sea. 153.00 4.35 % Tor. 152.00 4.32 % Oak. 148.00 4.21 % ...
This article explains a method to further calculate which team's make up the Top N% of that total. So top 30% of HOME_RUNS came from N.Y., Chi., Min., Det., Tex (totaling 31.94% of all home runs). In order to calculate which team's make up the Top N% of that total, you will need to:
- Sort that team's % of total, descending
- Create a cumulative sum (over that descending order) of all previous team's total %
- Identify those teams where the cumulative sum is under N%
This can be accomplished with the following calculated columns:
1. Select File > Add Data Tables... Type: Information Link Path: /Information Models/dbo/BASEBALL Id: 49ccc04a-76fe-46f1-9982-5f65c51d5135 Last reload: 5/31/2017 4:10 PM 2. Insert > Calculated Column... Column name: % of Total Expression: Sum([HOME_RUNS]) OVER ([TEAM]) / Sum([HOME_RUNS]) 3. Insert > Calculated Column... Column name: DescendingIndex Expression: DenseRank([% of Total],"desc") 4. Insert > Calculated Column... Column name: Cumulative Sum Expression: Sum(If(Rank(baserowid(),"asc",[TEAM])=1,[% of Total])) OVER (AllPrevious([DescendingIndex])) 5. Insert > Calculated Column... Column name: Contributes to Top N % Expression: case when [Cumulative Sum]<=0.3 then "Top N % Contributor" when ([Cumulative Sum]>0.3) and (Min([Cumulative Sum]) OVER (Previous([DescendingIndex]))<0.3) then "Top N % Contributor" else "Nope" end
Resulting in:
TEAM Team Total Team's % of Total Cumulative % Sum Teams Contributing to the Top 30% N.Y. 16874.53 9.16 % 9.16 % Top N % Contributor Chi. 11841.67 7.14 % 16.30 % Top N % Contributor Min. 5272.00 5.40 % 21.70 % Top N % Contributor Det. 5473.81 5.20 % 26.91 % Top N % Contributor Tex. 3423.50 5.03 % 31.94 % Top N % Contributor Bal. 6935.00 4.66 % 36.60 % Nope Cle. 5805.00 4.35 % 45.31 % Nope Sea. 3107.50 4.35 % 45.31 % Nope Tor. 6422.50 4.32 % 49.63 % Nope Oak. 4315.00 4.21 % 53.84 % Nope
See example attached 'Top Contributing Groups.dxp' file. KB: Identifying duplicate rows using custom expressions.
Comments
0 comments
Article is closed for comments.