INQUA Working Group on Data-Handling Methods

Newsletter 4: July 1990

MVSP: A MULTIVARIATE STATISTICAL PACKAGE

Warren L. Kovach
Palynological Research Centre
Institute of Earth Studies, University College of Wales
Aberystwyth, Wales SY23 3DB U.K.
Bitnet/Janet: WLK@ABERYSTWYTH.AC.UK

Until fairly recently the use of computers, particularly for statistical analyses, was not for the faint-hearted. In the mainframe way of doing things, programs were run in batch mode, with a stack of 'cards' containing numbers in a very strict format that defined the options the user wished to invoke. Interactive use of programs involved typing these 'cards' individually at a prompt or answering a cascade of questions as to which options were desired. In much scientific computing, it was necessary for users to write their own programs. Those who felt comfortable with computers made great advances in the use of various computer-based techniques in their fields. For many, however, the computer was a very formidable foe.

The explosion of personal computing and the drive to make programs as easy to use as possible have led to the democratization of computing. Much emphasis is now placed on developing programs that are intuitive and simple to use, so that anyone can easily begin taking advantage of the wide variety of analytical tools that computers provide. Making programs easy to use also reduces the time required for learning a program and for running particular analyses. This leaves more time for investigating the implications of the results as well as the finer points of the methods.

MVSP (A MultiVariate Statistical Package) is a program I have been distributing for the past four years that performs a variety of ordination and cluster analyses. It was written with the basic premise in mind that multivariate methods can be useful in many areas of biology and geology and that, to promote their use in everyday research, there should be a program that is easily available and easy to use. The program should also be flexible enough that the user is not locked into one particular form of an analysis but can choose other options that might suit his or her data better. Based on many of the letters I have received over the past few years, MVSP seems to have succeeded in this goal. Many people have told me that the complexity of the larger mainframe programs has put them off doing numerical analyses, since so much effort is often required to perform simple tasks. Some of the options provided in MVSP that are not found in most other programs, such as uncentered PCA, have also proved very useful in some people's studies.

There is always the danger that by making a numerical program too easy to use many users will take a 'black box' approach to the analyses, feeding numbers in and getting numbers out without understanding what it is all about. Throughout the manual for MVSP I strongly urge users to sit down and read about the methods before using them and I provide references to a number of books and papers that I think explain the methods clearly at differing levels of mathematical sophistication.

I have been adding new features to MVSP, slowly but surely, and will soon be releasing a new version. Most importantly, MVSP ver. 2.0 addresses the main criticism of the earlier version, the limited size of the data matrices. The new version uses a virtual memory scheme in which any data that cannot fit in memory are temporarily dumped to disk, so that matrix size is limited only by available disk space.
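The idea of spilling matrix data to disk can be illustrated with a minimal sketch in modern Python. This is not MVSP's own code or its actual scheme: the class name `DiskMatrix` and its methods are hypothetical, and for simplicity the sketch keeps every row on disk, whereas the scheme described above only moves out the data that do not fit in memory.

```python
import struct
import tempfile


class DiskMatrix:
    """Hypothetical row-major matrix of doubles stored in a temporary file."""

    def __init__(self, n_rows, n_cols):
        self.n_rows = n_rows
        self.n_cols = n_cols
        # "<" selects standard sizes, so each double occupies exactly 8 bytes.
        self._row_fmt = f"<{n_cols}d"
        self._row_bytes = struct.calcsize(self._row_fmt)
        self._file = tempfile.TemporaryFile()
        # Pre-fill the file with zeros so every row has a fixed offset.
        zero_row = struct.pack(self._row_fmt, *([0.0] * n_cols))
        for _ in range(n_rows):
            self._file.write(zero_row)

    def _seek_row(self, i):
        if not 0 <= i < self.n_rows:
            raise IndexError("row index out of range")
        self._file.seek(i * self._row_bytes)

    def get_row(self, i):
        """Read one row back from disk as a list of floats."""
        self._seek_row(i)
        data = self._file.read(self._row_bytes)
        return list(struct.unpack(self._row_fmt, data))

    def set_row(self, i, values):
        """Overwrite one row on disk."""
        if len(values) != self.n_cols:
            raise ValueError("row has the wrong number of columns")
        self._seek_row(i)
        self._file.write(struct.pack(self._row_fmt, *values))

    def close(self):
        self._file.close()


matrix = DiskMatrix(4, 3)
matrix.set_row(2, [1.0, 2.0, 3.0])
row_two = matrix.get_row(2)
```

Only one row is held in memory at a time here, so the matrix dimensions are bounded by disk space rather than RAM, at the cost of slower access.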

The user interface has been improved to allow for even easier running of analyses. When an analysis such as PCA is chosen and an input file selected, a menu with all possible options and their default values is presented. These can easily be changed if necessary and then saved to a configuration file, so that next time you run the program those new default options will be reinstated. In this way an analysis can be run with as few as a half dozen keystrokes. The user may also define a number of defaults relating to the output format, such as column width and the number of decimal places to display on the printouts. There are a number of options for data manipulation, including a spreadsheet-like data editor and transformations by logarithms, square roots, or Aitchison's logratio formula for proportional data. There is a context-sensitive help system so that pressing F1 will provide a help screen on the currently highlighted option.
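The three kinds of transformation mentioned above can be sketched in a few lines of Python. This is an illustration only, not MVSP's implementation. The function names are hypothetical, the `offset` parameter is an assumed convenience for data containing zeros, and the centred form of Aitchison's logratio is assumed here because the text does not say which variant is meant.

```python
import math


def log_transform(values, base=10.0, offset=0.0):
    """Logarithm of each value; `offset` is an assumed guard against zeros."""
    return [math.log(v + offset, base) for v in values]


def sqrt_transform(values):
    """Square root of each value."""
    return [math.sqrt(v) for v in values]


def centred_logratio(proportions):
    """Log of each part relative to the geometric mean of all parts."""
    if any(p <= 0 for p in proportions):
        raise ValueError("the logratio needs strictly positive parts")
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Hypothetical sample: three taxa expressed as proportions of the total count.
sample = [0.5, 0.3, 0.2]
clr = centred_logratio(sample)
```

The centred logratio values always sum to zero, which removes the constant-sum constraint that makes raw proportions awkward for ordination.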

Of course the numerical procedures have been enhanced as well. The program performs three eigenanalysis ordinations: principal components analysis (PCA), principal coordinates analysis (PCO), and correspondence analysis (CA). The trade-off between accuracy and speed may now be controlled by the user, and other options let the user tailor the analyses to their needs (standardization and centering of the PCA, different weighting in the CA, etc.). There are 18 different distance or similarity measures, including Gower's general similarity measure and four binary coefficients. Seven clustering strategies are available, and any of them can be run as a stratigraphically constrained clustering. Diversity indices may also be calculated on ecological data; these include Simpson's, Shannon's, and Brillouin's indices.
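As an illustration of the three diversity indices named above, here is a minimal Python sketch using their common textbook definitions. It is not taken from MVSP: the function names are hypothetical, natural logarithms are assumed, and Simpson's index is given in its 1 - sum(p^2) form, which is only one of several conventions in use.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson's index in the assumed form 1 - sum(p^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def brillouin(counts):
    """Brillouin index (ln N! - sum(ln n_i!)) / N, via lgamma(n + 1) = ln n!."""
    total = sum(counts)
    log_total_fact = math.lgamma(total + 1)
    log_part_facts = sum(math.lgamma(c + 1) for c in counts)
    return (log_total_fact - log_part_facts) / total


# Hypothetical sample: four taxa with ten individuals each.
even_sample = [10, 10, 10, 10]
h_shannon = shannon(even_sample)
d_simpson = simpson(even_sample)
h_brillouin = brillouin(even_sample)
```

For a perfectly even sample of four taxa, the Shannon index equals ln 4 and this form of Simpson's index equals 0.75; the Brillouin index, which treats the counts as a complete collection rather than a sample, comes out lower than the Shannon value.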

There is now the option to have scattergrams of the ordination results either plotted using text characters or drawn on the screen in graphics mode, with CGA, EGA, VGA, Hercules, and AT&T 6300 graphics modes supported. I am also including a copy of Chris Meacham's excellent PLOTGRAM program. This was developed for drawing cladograms but can be used for plotting dendrograms from MVSP. These can be plotted on the screen, pen plotters, laser printers, or dot matrix printers (the latter thanks to improvements by Joe Felsenstein).

I am still in the process of rewriting the manual and doing further testing of the program, but I hope to begin distributing the program at the end of the summer. As before, MVSP will be distributed as shareware, so that copies of the basic version may be freely copied and given to colleagues and students. There will be an enhanced version available for those who make a voluntary monetary contribution to the programming effort. The enhanced version will differ in three ways: the matrix size will be unlimited (the shareware version will be limited to 100x100 matrices), a special version compiled to take advantage of the 80x87 math coprocessor will be available, and a printed manual will be provided (the shareware version will have a somewhat abbreviated manual on disk). The level of the contribution hasn't been set yet but will be well below $100, depending on the cost of producing the manual.

I will be sending notification of the release of the new version to all those who have contacted me directly about MVSP. Anyone who wishes to receive notification may send me their name and address and I will place them on my mailing list. Happy computing! [Address at head of article, Ed.]


Copyright © 1990 Warren L. Kovach