Product: TIBCO Spotfire®
Identifying duplicate rows using custom expressions in TIBCO Spotfire
Very often it is required to exclude duplicate rows from a calculation in order to get an accurate result. You can identify duplicates within a specified group using a custom expression that uses the Rank() function, to which you also pass the columns you want to use for grouping. The 'duplicate' values can be determined based on any number of specified columns (i.e., if 3 columns are unique, consider this a duplicate). You can use this method to include or exclude duplicates, depending upon your requirements. Use the following custom expression to identify duplicates within your data set:
If(Rank(Baserowid(),"asc",[myGroup1])=1,"First row of group","Duplicate")
If you want to group by multiple columns, then include those columns in the call to the Rank() function. For example:
If(Rank(Baserowid(),"asc",[myGroup1],[myGroup2],[myGroup3])=1,"First row of group","Duplicate")
You can also use the If() and Rank() logic shown above directly within calculations to include or exclude duplicates, as required:
Avg(if(Rank(baserowid(),"asc",[myGroup1])=1,[Value],Null))
Also, starting in Spotfire version 10.1, there is a "Filter Rows" transformation which allows you to remove any duplicates from the data table completely and only retain one row per defined group. This can be accomplished via:
- Data > Transform data > Transformation: Filter Rows > Insert
Rank(baserowid(),"asc",[myGroup1])=1
EXAMPLE
As an example, say you have the data set below and you want to perform a calculation using only a single row per group:
myGroup | Value |
Yellow | 6 |
Yellow | 6 |
Yellow | 6 |
Blue | 3 |
Red | 9 |
Red | 9 |
Using just an "Avg([Value])" custom expression, the Avg() function will average all rows resulting in:
(6+6+6+3+9+9)/6 = 6.5
To find only a single row per group, you can use the above mentioned custom expression:
if(Rank(baserowid(),"asc",[myGroup])=1,"First row of group","Duplicate")
To result in the following data:
myGroup | Value | Duplicate? |
Yellow | 6 | First row of group |
Yellow | 6 | Duplicate |
Yellow | 6 | Duplicate |
Blue | 3 | First row of group |
Red | 9 | First row of group |
Red | 9 | Duplicate |
Using this technique you can perform an average that will only look at a single value per grouping. For example:
myGroup | Value |
Yellow | 6 |
Blue | 3 |
Red | 9 |
And the average you would like to calculate would be:
(6+3+9)/3 = 6
The following custom expression will then average only those rows that are tagged as the first within the group and will not count subsequent rows within that group:
Avg(if(Rank(baserowid(),"asc",[myGroup])=1,[Value],Null))
This will result in the following calculation, as desired:
(6+3+9)/3 = 6Doc: TIBCO Spotfire Expression Language Functions Overview Doc: Details on Filter Rows Transformation (10.1 and higher)
Comments
0 comments
Article is closed for comments.