INQUA Working Group on Data-Handling Methods

Newsletter 1: June 1988

RECENT PUBLICATIONS IN MULTIVARIATE DATA ANALYSIS AND PRACTICAL COMPUTING OF RELEVANCE TO PALAEOECOLOGISTS

H.J.B. Birks, Botanical Institute, University of Bergen, Norway.

In the last few years many books have appeared on multivariate data analysis, written for either ecologists or applied statisticians, and on practical aspects of computing. Several of these are of direct relevance and importance to palaeoecologists, but because of their titles or intended readership they might be missed. This would be unfortunate, as several of these books are not only excellent but also of great value to all palaeoecologists interested in quantitative analysis of their data using appropriate, robust, and powerful methods. I have found the following eight books to be particularly useful and relevant.

P.G.N. Digby and R.A. Kempton 1987 Multivariate analysis of ecological communities. Chapman and Hall, London and New York, 206 pp. ISBN 0-412-24640-6 (Hb), ISBN 0-412-24650-3 (Pb).
This is written by two associates of John Gower, one of the leading British applied statisticians, based at Rothamsted. Gower has developed several important techniques such as principal co-ordinates analysis, skew-symmetry analysis, and Procrustes rotation. The book is, not surprisingly, particularly strong on recent 'Rothamsted-Gower' approaches, especially geometrical scaling or ordination methods, comparison of ordinations by Procrustes rotation, and analysis of asymmetric association matrices. It provides some of the first intermediate-level mathematical accounts of biplots, Procrustes rotation, and skew-symmetry analysis. It is rather weak on classification; for example, it barely discusses two-way indicator species analysis and it does not mention sum-of-squares minimum-variance clustering at all.

It is a very useful book that neatly complements A.D. Gordon's (1981) excellent Classification book (Chapman and Hall).

R.H.G. Jongman, C.J.F. ter Braak, and O.F.R. van Tongeren 1987 Data analysis in community and landscape ecology. Pudoc, Wageningen, 299 pp. ISBN 90-220-0908-4.
This is an excellent and extremely valuable book. Although written primarily for community and landscape ecologists, much of it is directly relevant to palaeoecologists. The largest chapter is on ordination (by Cajo ter Braak), in which he builds up from the indirect gradient analysis methods of correspondence analysis, detrended correspondence analysis, principal components analysis, and biplots to the new direct gradient analysis techniques of canonical correspondence analysis and redundancy analysis. Important and ecologically critical distinctions are made throughout between linear and unimodal responses of organisms to their environment, and the emphasis throughout is on ecological realism, interpretation, and robustness. There are also excellent chapters on regression, with a clear and simple explanation of logit regression (regression for presence/absence data) and of generalized linear models; on calibration and response functions; and on the analysis of spatial data, with clear introductions to spatial autocorrelation, spatial semivariance, and kriging (a powerful spatial-interpolation technique). Other chapters concern data collection, classification, and case studies.

The book provides a wonderfully stimulating introduction to many new and powerful methods of data analysis that are of direct applicability in palaeoecology. It is written for those "who want to understand better the methods they are using and are eager to learn new, more powerful methods". It is a must for any quantitative palaeoecologist.

I.T. Jolliffe 1986 Principal component analysis. Springer-Verlag, New York, 271 pp. ISBN 0-387-96269-7.
When I wrote my first principal components analysis program 16 years ago, I would never have guessed or believed that one day there would be a whole book devoted to PCA! PCA has become such a widely used (and misused) technique in so many disciplines that there is renewed interest in the theory and applications of PCA amongst applied statisticians. This book reflects this interest, as it is written by a statistician primarily for statisticians. It is a specialized text. Because of its strong mathematical form the book is unlikely to be read by those who should read it most unless they are prepared to take the trouble to understand the mathematics, whereas those who can easily follow the text will probably learn little from it. It contains new and useful material for palaeoecologists, including discussions of principal components in regression analysis, biplots, robust estimation, outlier detection, analysis of time series data, and analysis of closed, proportional data. Surprisingly it gives little discussion on how to interpret PCA results. The examples that are discussed tend to be interpreted solely in terms of contrasts between variables with high, extreme loadings. It is a valuable reference work for all who use PCA or related techniques.

M.J. Greenacre 1984 Theory and applications of correspondence analysis. Academic Press, London and Orlando, 364 pp. ISBN 0-12-299050-1.
The general technique of correspondence analysis has been frequently reinvented or rediscovered, and given different names such as dual or optimal scaling, reciprocal averaging, canonical analysis of contingency tables, and analyses of correspondences. The geometric approach of correspondence analysis has primarily been developed in France by Jean-Paul Benzecri and his school of French data-analysts. Benzecri's work, although increasingly quoted, is poorly known amongst Anglo-American statisticians. In this book Michael Greenacre gives the first extensive exposition of the Benzecri approach to correspondence analysis. It is a particularly valuable book for all who have tried (unsuccessfully in my case) to understand Benzecri's texts but who want to understand the theory underlying widely used techniques such as reciprocal averaging or detrended correspondence analysis. Because the Benzecri school primarily developed their approach in the context of linguistics, Greenacre emphasizes the use of correspondence analysis to summarize data in contingency tables. He does not develop the unimodal/weighted averaging basis of correspondence analysis that is so important in ecology and palaeoecology and that ter Braak has developed so strongly and so effectively in Jongman et al. (1987). Greenacre's book is certainly useful, but its importance to palaeoecologists has largely been superseded by ter Braak's chapter in Jongman et al. (1987).

Pierre Legendre and Louis Legendre 1987 Developments in numerical ecology. Springer-Verlag, Berlin, 585 pp. ISBN 3-540-16086-8.
This massive book presents the proceedings of a NATO Advanced Research Workshop on Numerical Ecology held in France in June 1986. It comprises a series of invited lectures given primarily by applied statisticians or data analysts in related disciplines on new (sometimes very new, sometimes not so new) approaches to the analysis of ecological data, followed by a series of fascinating reports by working groups of ecologists and mathematicians on the possible application of these new approaches in six broad branches of ecology. The new approaches discussed are scaling methods, including non-linear scaling, unfolding techniques, and two- and three-way multidimensional scaling; fuzzy set clustering; constrained and conditional clusterings; fractal theory; path analysis; and spatial point patterns and spatial autocorrelation. The possible value (or otherwise!) of these novel approaches is critically considered for microbial, marine benthic, marine pelagic, limnology, terrestrial plant, and terrestrial animal ecology in the working group reports. Unfortunately palaeoecology is not considered. Some of the 'new' approaches are well established in palaeoecology (e.g. constrained clustering). Some have obvious, potential palaeoecological applications (e.g. fuzzy set clustering, spatial autocorrelation, constrained scalings and unfolding techniques, path analysis). Others such as fractal theory and various elaborations of multidimensional scaling appear to have little or no relevance to palaeoecologists (or to terrestrial animal ecologists if Dan Simberloff's working group report on Dirty data and clean questions is representative!).

There is a lot of exciting and thoughtful material in this volume. Only time will tell whether the choice of new approaches made by the Legendre brothers will turn out to be useful to numerical ecologists. Certainly the volume is useful to quantitative palaeoecologists in highlighting novel techniques that we should think about and even try.

J. Aitchison 1986 The statistical analysis of compositional data. Chapman and Hall, London and New York, 416 pp. ISBN 0-412-18060-4.
In 1897 Karl Pearson showed the dangers of using percentages or proportions in many statistical analyses, commonly resulting in 'spurious correlations'. Despite contributions in the 1960s and 1970s from statisticians and quantitative geologists (e.g. J.E. Mosimann, F. Chayes, J.C. Butler), the problems of analyzing closed data remained unresolved until a series of papers by John Aitchison appeared between 1981 and 1984. Now in this important book Aitchison has built on these papers to provide the first major contribution to many of the problems associated with closed data, induced spurious correlations, and constant-sum constraints. Aitchison's approach usually involves logarithms of ratios and extends from standard techniques of regression, principal components analysis, and canonical correlation analysis to questions such as complete subcompositional independence, subcompositional invariance, and partition independence, all of which are unique to percentage data. Many standard multivariate techniques are not appropriate with closed data, but Aitchison provides a diverse and challenging armoury of modelling and statistical testing techniques appropriate solely for percentages.
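
The log-ratio idea can be illustrated with a minimal sketch. It is written in Python rather than the BASIC of the CODA programs, it is not taken from Aitchison's book or software, and the function names and pollen counts are hypothetical, chosen only for illustration. It shows two commonly used log-ratio transformations and the property that motivates them: percentages change when a taxon is dropped from the sum, but the log-ratio between any two retained taxa does not.

```python
import math

def closure(parts):
    """Rescale positive parts so that they sum to 1 (a 'closed' composition)."""
    total = sum(parts)
    return [p / total for p in parts]

def alr(composition):
    """Additive log-ratio: log of each part relative to the last part."""
    divisor = composition[-1]
    return [math.log(p / divisor) for p in composition[:-1]]

def clr(composition):
    """Centred log-ratio: log of each part relative to the geometric mean."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical pollen counts for four taxa (illustrative values only).
counts = [120, 60, 15, 5]

full = closure(counts)       # proportions based on all four taxa
sub = closure(counts[:3])    # subcomposition: last taxon dropped, then re-closed

# The proportion of the first taxon depends on which taxa are in the sum ...
print(full[0], sub[0])

# ... but the log-ratio of the first two taxa is the same in both.
ratio_full = math.log(full[0] / full[1])
ratio_sub = math.log(sub[0] / sub[1])
print(ratio_full, ratio_sub)

# Transformed values: alr has one fewer element than the composition,
# and clr values sum to zero (within rounding error).
print(alr(full))
print(clr(full))
```

All parts must be strictly positive for the logarithms to exist, so zero counts need separate treatment before any such transformation is applied.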

The book is clearly written for statisticians and Aitchison presents many proofs, properties, and definitions for his statistical colleagues. However, in view of the central importance of the whole book to quantitative palaeoecology, the time and effort involved in understanding the text are worthwhile. A series of BASIC programs called CODA for IBM PC-compatibles is available that solves the many examples given in the text. The programs are particularly useful in working through the book and helping to understand the new techniques.

Aitchison emphasizes that his log-ratios, log-linear contrasts, and additive logistic normal distribution are not the last word on this topic (see, for example, Gower's paper in Legendre and Legendre 1987!), but modestly suggests that his approach can produce results that "surprise many geologists". We are all in for surprises when we read this book and begin to apply Aitchison's methods in quantitative palaeoecology!

R. Gittins 1985 Canonical analysis - A review with applications in ecology. Springer-Verlag, Berlin, 351 pp. ISBN 3-540-13617-7.
This book, like Greenacre's on correspondence analysis and Jolliffe's on principal components analysis, provides an in-depth and very detailed review of one multivariate technique, namely canonical analysis, including canonical correlation analysis and canonical variates analysis (= multiple discriminant analysis). About one-third of the book deals with theory and mathematical relationships. One-third presents a series of detailed ecological case studies. The remainder of the book concerns an assessment of canonical analysis and future developments.

The book is primarily about canonical correlation analysis, a technique that has obvious appeal to ecologists as a means of studying relationships between two sets of multivariate data (e.g. vegetation and soils). It has, however, a crippling set of assumptions, in particular linear relationships between variables. As a result it has never been a particularly useful or appropriate technique in ecology. Within ecology and palaeoecology, canonical correlation analysis is now largely superseded by ter Braak's canonical correspondence analysis that assumes unimodal responses between biological and environmental variables (see Jongman et al. 1987).

Although there is an enormous amount of information about multivariate data analysis in general in Gittins' book, its main concern, canonical correlation analysis, is today only really of theoretical interest to ecologists and palaeoecologists. Gittins' view that "canonical analysis exists primarily to be used" seems excessively pragmatic when we know that many of its assumptions are biologically unrealistic. It is surely better to refrain from using techniques that exist but are inappropriate than to use techniques simply because they exist. For many quantitative palaeoecologists Gittins' book is now largely replaced by Jongman et al. (1987), where canonical correlation analysis is critically discussed and evaluated in the context of other canonical, constrained ordination techniques.

Considerable time and effort are needed to read and understand Gittins' book. The time is perhaps better spent with Jongman et al. (1987) or Digby and Kempton (1987).

W.H. Press, B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling 1986. Numerical recipes - the art of scientific computing. Cambridge University Press, Cambridge, 818 pp. ISBN 0-521-30811-9.
This is a wonderful and much needed book. It presents and discusses efficient and reliable algorithms, with FORTRAN and PASCAL source listings, for all the main aspects of numerical analysis: solution of linear algebraic equations and matrix manipulations; interpolation, extrapolation and splines; integration of functions; evaluation of functions; derivation of special functions such as factorials, gamma functions, etc.; random numbers; sorting; roots and non-linear equations; function optimization; eigenvalues and eigenvectors; Fourier transforms; basic statistics; modelling; integration of differential equations; partial differential equations; and two point boundary value problems. Its 200+ subroutines are available on 5 1/4" diskettes (FORTRAN ISBN 0-521-30957-3; PASCAL ISBN 0-521 309854-9) along with example books and programs (FORTRAN ISBN 0-521-31330-9 and ISBN 0-521-30958-1; PASCAL ISBN 0-521-30956-5 and ISBN 0-521-20955-7). The FORTRAN subroutines are in FORTRAN 77 and work without difficulty on an IBM AT computer with either the Professional FORTRAN 1.0 compiler or the Microsoft FORTRAN 3.3 compiler. I have no experience of the PASCAL versions.

I know of no other book like this that covers so much material about difficult but important programming problems in a clear and concise way written for scientists rather than professional computer scientists. I only wish it had been available 10-15 years ago - many, many hours of frustration would have been saved! Strongly recommended.


Copyright © 1988 H.J.B. Birks