Product: TIBCO Spotfire®
The order of embedded data can affect Spotfire file size.
You may notice that the size of .dxp files can vary a great deal even though they have essentially the same data (same number of rows and columns, and same values).
Whether or not a data set is sorted can directly affect the size of a Spotfire .dxp file in which that data is embedded. An unsorted data set can produce a .dxp file that is roughly 10 times larger, or more, than a .dxp file built from the same data set in sorted order. The reason is that the sort order of the data directly affects how well it compresses. A Spotfire .dxp file is an archive, similar to a .zip file, so better compression of the embedded data means a smaller Spotfire file.
You can observe this behavior by compressing two text files into .zip archives: one containing an unsorted data set and one containing exactly the same data set in sorted order. The two .zip files will differ considerably in size. Spotfire compresses its embedded data in much the same way, so the sizes of the resulting Spotfire .dxp files differ for the same reason.
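The .zip experiment described above can be approximated with Python's standard `zipfile` module. This is a minimal sketch, not the procedure used to produce the numbers in this article: the team abbreviations are taken from the article's sample data, but the row count (200,000), the random seed, and the file names `sorted.txt` / `unsorted.txt` are hypothetical choices made for illustration, and `ZIP_DEFLATED` is assumed to be comparable to whatever archiving tool was originally used.

```python
import os
import random
import tempfile
import zipfile

# Team abbreviations taken from the sample rows in this article.
TEAMS = ["Atl.", "Det.", "Sea.", "Tor.", "N.Y.", "K.C.",
         "Mon.", "Pit.", "Chi.", "Phi.", "Min."]


def make_rows(n_rows: int, seed: int = 0) -> list[str]:
    """Build a hypothetical single-column TEAM data set in random order."""
    rng = random.Random(seed)
    return [rng.choice(TEAMS) for _ in range(n_rows)]


def zipped_size(rows: list[str], directory: str, name: str) -> int:
    """Write rows to a text file, add it to a .zip archive, and return the archive size in bytes."""
    txt_path = os.path.join(directory, name + ".txt")
    zip_path = os.path.join(directory, name + ".zip")
    with open(txt_path, "w", encoding="ascii", newline="\n") as f:
        f.write("TEAM\n")  # same header as the sample files
        f.write("\n".join(rows) + "\n")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(txt_path, arcname=name + ".txt")
    return os.path.getsize(zip_path)


def compare(n_rows: int = 200_000) -> tuple[int, int]:
    """Return (sorted .zip size, unsorted .zip size) for the same values."""
    unsorted_rows = make_rows(n_rows)
    sorted_rows = sorted(unsorted_rows)  # identical values, only the order differs
    with tempfile.TemporaryDirectory() as tmp:
        return (zipped_size(sorted_rows, tmp, "sorted"),
                zipped_size(unsorted_rows, tmp, "unsorted"))


if __name__ == "__main__":
    sorted_zip, unsorted_zip = compare()
    print(f"sorted .zip:   {sorted_zip:,} bytes")
    print(f"unsorted .zip: {unsorted_zip:,} bytes")
```

Sorting places identical values next to each other, so DEFLATE can encode long runs as short back-references; in random order, each row has to be encoded much more expensively.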
Attached are two data sets which contain the same data, only one of which is sorted:
- Data-Sorted.txt
  TEAM
  Atl.
  Atl.
  Atl.
  Atl.
  Atl.
  Atl.
  Atl.
  Atl.
  Atl.
  Atl.
  ...
- Data-Unsorted.txt
  TEAM
  Det.
  Sea.
  Tor.
  N.Y.
  K.C.
  Mon.
  Pit.
  Chi.
  Phi.
  K.C.
  Min.
  ...
The table below shows the size of the files when this data is added to a .zip archive and when it is loaded into Spotfire, which demonstrates this behavior. When the data is unsorted, the Spotfire .dxp file is about 16.4 times larger (486,540 bytes vs. 29,681 bytes):
Measurement | Sorted Data | Unsorted Data
---|---|---
Raw data size in bytes | 3,893,782 | 3,893,782
Archive size in bytes (.zip) | 7,893 | 405,810
Spotfire file size in bytes (.dxp) | 29,681 | 486,540
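The size ratios follow directly from the byte counts in the table. The snippet below only does arithmetic on those published figures; it does not re-measure anything.

```python
# Byte counts copied from the table (sorted vs. unsorted data).
sizes = {
    "zip": {"sorted": 7_893, "unsorted": 405_810},
    "dxp": {"sorted": 29_681, "unsorted": 486_540},
}

for kind, s in sizes.items():
    ratio = s["unsorted"] / s["sorted"]
    print(f".{kind}: unsorted is {ratio:.1f}x the size of sorted")

# Prints:
# .zip: unsorted is 51.4x the size of sorted
# .dxp: unsorted is 16.4x the size of sorted
```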
This can explain why some Spotfire reports are much larger than others even though their data is essentially identical, and may even look identical once it is sorted within Spotfire.
Note: Normally you do not need to think about sorting your data. Most data sets are more complex and have many columns, so sorting on a single column brings less benefit: the other columns remain effectively unsorted. Sorting has a larger impact on "tall and skinny" data sets, which have many rows but very few columns, or only one.
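The diminishing benefit for wider data can be illustrated with Python's `zlib` module. This is a hypothetical sketch: the second column of random numbers stands in for any column that is unrelated to TEAM, the row count and seed are arbitrary, and `zlib` (DEFLATE) is used only as an approximation of archive compression; Spotfire's own storage may behave differently.

```python
import random
import zlib

# Team abbreviations taken from the sample rows in this article.
TEAMS = ["Atl.", "Det.", "Sea.", "Tor.", "N.Y.", "K.C.",
         "Mon.", "Pit.", "Chi.", "Phi.", "Min."]


def compressed_size(lines: list[str]) -> int:
    """Bytes needed to store the newline-joined lines after DEFLATE compression."""
    return len(zlib.compress("\n".join(lines).encode("ascii")))


def sort_gain(n_rows: int = 200_000, seed: int = 1) -> tuple[float, float]:
    """Return how many times smaller the compressed data gets when sorted on TEAM,
    first for a TEAM-only data set, then for TEAM plus one unrelated column."""
    rng = random.Random(seed)
    # Hypothetical rows: a team plus a random number that is unrelated to the team.
    rows = [(rng.choice(TEAMS), rng.randrange(1_000_000)) for _ in range(n_rows)]
    # Sort on TEAM only. Python's sort is stable, so within each team the
    # second column keeps its original random order.
    by_team = sorted(rows, key=lambda row: row[0])

    def gain(render) -> float:
        unsorted_size = compressed_size([render(r) for r in rows])
        sorted_size = compressed_size([render(r) for r in by_team])
        return unsorted_size / sorted_size

    gain_single = gain(lambda r: r[0])             # "tall and skinny": TEAM only
    gain_multi = gain(lambda r: f"{r[0]},{r[1]}")  # TEAM plus an unrelated column
    return gain_single, gain_multi


if __name__ == "__main__":
    single, multi = sort_gain()
    print(f"TEAM only:             sorting shrinks the output {single:.1f}x")
    print(f"TEAM + another column: sorting shrinks the output {multi:.1f}x")
```

With a single column, sorting turns the data into a few long runs. Once an unrelated column is added, every row still carries that column's unsorted values, so the relative saving from sorting on TEAM shrinks sharply.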