Product: TIBCO Spotfire®
General Memory and Hardware Information for S+
Here is some general memory information on S-PLUS that applies to both versions 7.0 and 8.x:
S-PLUS 7 Professional Edition does not have an absolute upper limit on the size of datasets it can work with. The practical limit depends on the amount of memory available on your machine and, to a lesser extent, on the data types of the columns in your data. S-PLUS is a 32-bit program, so each process can access only about 2 GB of memory (if that much is available). Assuming the machine has sufficient RAM and swap space, this is the only real limit on the size of data you can work with. S-PLUS uses a combination of physical memory and virtual memory (swap space on your hard disk) as dynamic memory to store data. The underlying S language that is part of S-PLUS will make several temporary copies of a dataset in memory; in S-PLUS versions 6.x and 7, the number of copies is about 4.5.
For more information on Memory Allocation and Performance in S-PLUS, including tips on maximizing memory performance, please visit:
http://www.insightful.com/insightful_faq/dsp_article.asp?articleID=218
In general, here is a good approximate formula to determine how much total memory (physical + virtual) you will need to import a dataset of a given size:
Dynamic memory needed in S-PLUS 6.x and 7 (in bytes) = No. of rows * No. of columns * 8 * 4.5
For example, a dataset with 98672 rows and 507 columns will require about 1.8 GB of total memory (physical + virtual):
98672 * 507 * 8 * 4.5 = ~1.8 GB
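The formula above can be sketched as a small helper for checking a dataset's dimensions before attempting an import. This is only an illustration in Python, not S-PLUS code: the function name and the constant are hypothetical, the factor of 8 is assumed to be the size in bytes of one double-precision value, and "about 2 GB" is assumed to mean 2 * 1024^3 bytes.

```python
# Assumption: "about 2 GB" per 32-bit process is taken as 2 * 1024**3 bytes.
PROCESS_LIMIT_BYTES = 2 * 1024**3


def estimate_dynamic_memory_bytes(rows, cols, bytes_per_value=8, copies=4.5):
    """Approximate total memory (physical + virtual) needed, in bytes.

    bytes_per_value=8 is assumed to be one double-precision value;
    copies=4.5 is the approximate number of temporary copies quoted
    for S-PLUS 6.x and 7.
    """
    return rows * cols * bytes_per_value * copies


needed = estimate_dynamic_memory_bytes(98672, 507)
print(f"{needed:,.0f} bytes = {needed / 1e9:.2f} GB")
print("Under the per-process limit:", needed < PROCESS_LIMIT_BYTES)
```

As noted in this article, the result is only an approximation; other S-PLUS functions may need more memory than this estimate suggests.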
In addition to calculating the required dynamic memory, it is important to know the dimensions of the file: a dataset with 65000 rows and 20 columns will probably import, whereas a dataset with 20 rows and 65000 columns probably will not. S-PLUS does not handle a large number of columns well. Note also that other S-PLUS functions may require more memory than the formula above approximates.
The Big Data library provides functionality for efficiently manipulating and analyzing out-of-memory data. The Big Data library functions have no limits on the number of rows in the data. As summary information is computed and stored for each column, the number of columns is slightly limited, with the current implementation supporting tens of thousands (10,000s) of columns on a typical machine. The new "pipeline architecture" works by streaming large data sets instead of reading the entire data set into memory at once. S-PLUS is now designed to allow users to analyze gigabytes of data on their existing hardware, without needing to purchase additional RAM or migrate to 64-bit operating systems.
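The general idea of streaming data instead of loading it all at once can be illustrated with a short sketch. This is not the Big Data library's implementation, and it is written in Python rather than S-PLUS; the function names, the chunk size, and the sample data are all hypothetical. It reads rows in fixed-size chunks and keeps only per-column running sums between chunks, so memory use depends on the chunk size and the number of columns, not on the number of rows.

```python
import csv
import io


def _fold_chunk(chunk, sums):
    """Add one chunk of numeric rows into the running per-column sums."""
    for row in chunk:
        for i, value in enumerate(row):
            sums[i] += value
    return len(chunk)


def stream_column_means(lines, chunk_rows=1000):
    """Compute per-column means while holding at most chunk_rows rows in memory."""
    reader = csv.reader(lines)
    header = next(reader)
    sums = [0.0] * len(header)
    count = 0
    chunk = []
    for row in reader:
        chunk.append([float(v) for v in row])
        if len(chunk) == chunk_rows:
            count += _fold_chunk(chunk, sums)
            chunk = []  # release the processed chunk
    if chunk:  # fold in the final, possibly partial chunk
        count += _fold_chunk(chunk, sums)
    if count == 0:
        return {}
    return {name: total / count for name, total in zip(header, sums)}


# Hypothetical in-memory sample standing in for a large file on disk.
sample = io.StringIO("x,y\n1,10\n2,20\n3,30\n")
means = stream_column_means(sample, chunk_rows=2)
print(means)
```

Because one summary value is kept per column, this sketch also shows why the number of columns, and not the number of rows, is the dimension that eventually limits a streaming approach.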
There is no predetermined limit on the number of rows allowed in a big data object (or the number of elements in a big data vector). Provided there is enough room available on the disk to create the data cache, the big data object may be created and processed by any scalable function. The speed of most operations is proportional to the number of rows in the data set: if the number of rows doubles, then the processing time will also double. There is likewise no predetermined limit on the number of columns allowed in a big data object. However, the processing time of some operations (especially statistical modeling functions) grows at a greater than linear rate as the number of columns increases, so doubling the number of columns can much more than double the processing time. This is important to bear in mind if processing time is an issue.
You may also be interested in reading the following whitepaper on the bigdata library:
http://www.insightful.com/insightful_doclib/document.asp?id=167
The following two FAQs cover programming techniques that may help with memory and speed:
Code Efficient Simulations : http://www.insightful.com/insightful_faq/dsp_article.asp?articleID=321
Vectorized Calculations : http://www.insightful.com/insightful_faq/dsp_article.asp?articleID=192
Here are some additional suggestions that may help you get the most out of your S-PLUS installation:
1) When running an S-PLUS job with very large data objects, create a working directory specifically for it that contains only the data objects you will need for the specific job/script. If you are working in a large working directory that contains many objects, this can also cause S+ to run slower and to use more memory, as all of those data objects are loaded into memory. You can read more on setting up separate working data directories in Chapter 7: Working With Objects and Databases (section "ORGANIZING YOUR WORK" beginning on page 408) in the S+8.1 User's Guide available at:
http://www.insightful.com/support/splus81/uguide.pdf
2) If you do not need to run any of the GUI commands, you can open and run the S+ scripts/functions in the S+ Console (which does not load the S+ GUI, so it consumes less memory at startup).
3) Run the S+ scripts in S+ Batch mode. There is a user interface to make this easier under Start -> Programs -> TIBCO -> Spotfire S+8.1 -> TIBCO Spotfire S+ BATCH. You can enter the script file you would like the batch program to run as well as output and error files. Then, once you have verified that the whole batch program runs successfully, you can build a .bat file using the command given in the lower box of the dialog. For example, using files on my machine, you could set up this batch program using the batch dialog box:
"D:\Program Files\TIBCO\splus81\Splus.bat" START -cwd D:\Data\Splusdata -input D:\Data\Splusdata\input.ssc -output D:\Data\Splusdata\out.txt -logfile D:\Data\Splusdata\error.log -nobigdata
4) You should also try to run memory-intensive script files when S-PLUS is the only program open and running on the machine. This makes all of the machine's available resources available to it.
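If you need to generate the batch invocation from suggestion 3 for several scripts, the command can be assembled programmatically. The following is a minimal sketch in Python; the helper name is hypothetical, and it uses only the flags and example paths that appear in the command shown above. It builds the argument list but does not run anything.

```python
def build_splus_batch_command(splus_bat, cwd, input_script, output_file,
                              log_file, nobigdata=True):
    """Return the S+ batch invocation as an argument list.

    Only the flags from the example command in this article are used:
    START, -cwd, -input, -output, -logfile and -nobigdata.
    """
    cmd = [
        splus_bat, "START",
        "-cwd", cwd,
        "-input", input_script,
        "-output", output_file,
        "-logfile", log_file,
    ]
    if nobigdata:
        cmd.append("-nobigdata")
    return cmd


# Paths copied from the example in this article; adjust for your machine.
cmd = build_splus_batch_command(
    r"D:\Program Files\TIBCO\splus81\Splus.bat",
    r"D:\Data\Splusdata",
    r"D:\Data\Splusdata\input.ssc",
    r"D:\Data\Splusdata\out.txt",
    r"D:\Data\Splusdata\error.log",
)
print(cmd)
```

On Windows, a list like this could be passed to `subprocess.run`, which quotes arguments containing spaces, such as the path under Program Files.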
We do not have any recommendations regarding hardware other than to suggest a relatively fast processor with around 4 GB of RAM available.
Regarding CPUs and hyperthreading, currently, S-PLUS is a single-threaded application. There is an existing enhancement request in our bug database to allow S-PLUS to use multiple processors.
The following piece of information on multi-processor machines may also be useful:
S-PLUS can run on a dual-processor machine, but it doesn't take advantage of more than one processor. It can, however, take advantage of a math co-processor.
S-PLUS checks to see whether your system is an Intel Pentium processor, and if so, uses Intel's Math Kernel Library BLAS routines. These routines are optimized for Intel Pentiums and thus significant speed-up should be observed in certain S-PLUS operations (such as matrix multiplication) that call BLAS routines. Significant speed-up of certain operations can be obtained when using a Pentium multi-processor machine. The operations for which S-PLUS can take advantage of the additional processors are those (such as matrix multiplication) in which the BLAS routines of the Intel Math Kernel Library are used. See intelmkl.use for more information.
You can declare the number of processors in the following function:
> intelmkl.use(set=T, number.of.processors=1, allow.warnings=T)
intelmkl.use() allows you to change which set of BLAS routines is used. (To use the S-PLUS engine BLAS, use set=F.) If Intel's Math Kernel Library BLAS routines are to be used (set=T), number.of.processors allows you to specify how many processors of a multi-processor machine should be used (if not specified, any previous specification remains in effect; the default is 1).
For additional details, please refer to Chapter 16 (pp. 472-473) in the S-PLUS 8.0 Programmer's Guide available under Help -> Online Manuals.
If one were to compare the performance of compute-intensive S-PLUS sessions on a single-CPU Windows workstation versus a dual-CPU Windows workstation (both with math co-processors), there would be very little difference in performance. The primary difference would be that on a dual-CPU workstation, the second CPU would be available for use by other processes such as a word processor, spreadsheet, or e-mail program. With the single-CPU workstation, S-PLUS would use up most of the available cycles.
Regarding Windows Server 2003 Enterprise, the only real advantage of having 32 GB of RAM and many processors is the ability to run many S-PLUS sessions simultaneously. A single S-PLUS session won't necessarily benefit from the extra RAM and multi-core processors compared to a machine with 4 GB of RAM and a single processor with only S-PLUS running on it. Unfortunately, the Windows limit of 2 GB of memory per application is the same on a machine with more RAM available.