In the last few years many books have appeared on multivariate data analysis written either for ecologists or applied statisticians and on practical aspects of computing. Several of these are of direct relevance and importance to palaeoecologists but because of their title or intended readership they might be missed by palaeoecologists. This would be unfortunate as several of these books are not only excellent but also are of great value to all palaeoecologists interested in quantitative analysis of their data using appropriate, robust, and powerful methods. I have found the following eight books to be particularly useful and relevant.
P.G.N. Digby and R.A. Kempton 1987 Multivariate analysis of ecological
communities. Chapman and Hall, London and New York, 206 pp.
ISBN 0-412-24640-6 (Hb), ISBN 0-412-24650-3 (Pb).
This is written by two associates of John Gower, one of the leading
British applied statisticians based at Rothamsted. Gower has developed
several important techniques such as principal co-ordinates analysis, skew-
symmetry analysis, and Procrustes rotation. The book is, not surprisingly,
especially strong on recent 'Rothamsted-Gower' approaches, particularly
geometrical scaling or ordination methods, comparison of ordinations by
Procrustes rotation, and analysis of asymmetric association matrices. It
provides some of the first intermediate-level mathematical accounts of
biplots, Procrustes rotation, and skew-symmetry analysis. It is rather weak
on classification; for example it barely discusses two-way indicator species
analysis and it does not mention sum-of-squares minimum-variance clustering at
all.
It is a very useful book that neatly complements A.D. Gordon's (1981) excellent Classification book (Chapman and Hall).
R.H.G. Jongman, C.J.F. ter Braak, and O.F.R. van Tongeren 1987 Data analysis in
community and landscape ecology. Pudoc, Wageningen, 299 pp. ISBN 90-220-0908-
4.
This is an excellent and extremely valuable book. Although written
primarily for community and landscape ecologists, much of it is directly
relevant to palaeoecologists. The largest chapter is on ordination (by Cajo
ter Braak) in which he builds up from indirect gradient analysis methods of
correspondence analysis, detrended correspondence analysis, principal
components analysis, and biplots to the new direct gradient analysis
techniques of canonical correspondence analysis and redundancy analysis.
Important and ecologically critical distinctions are made between linear
and unimodal responses of organisms to their environment, and the
emphasis throughout is on ecological realism, interpretation, and robustness.
There are also excellent chapters on regression, with a clear and simple
explanation of logit regression (regression for presence/absence data) and of
generalized linear models, on calibration and response functions, and on the
analysis of spatial data, with clear introductions to spatial autocorrelation,
spatial semivariance, and kriging (a powerful spatial-interpolation
technique). Other chapters concern data collection, classification, and case
studies.
The book provides a wonderfully stimulating introduction to many new and powerful methods of data analysis that are of direct applicability in palaeoecology. It is written for those "who want to understand better the methods they are using and are eager to learn new, more powerful methods". It is a must for any quantitative palaeoecologist.
I.T. Jolliffe 1986 Principal component analysis. Springer-Verlag, New York,
271 pp. ISBN 0-387-96269-7.
When I wrote my first principal components analysis program 16 years ago,
I would never have guessed or believed that one day there would be a whole
book devoted to PCA! PCA has become such a widely used (and misused)
technique in so many disciplines that there is renewed interest in the theory
and applications of PCA amongst applied statisticians. This book reflects
this interest, as it is written by a statistician primarily for statisticians.
It is a specialized text. Because of its strong mathematical form the book is
unlikely to be read by those who should read it most unless they are prepared
to take the trouble to understand the mathematics, whereas those who can
easily follow the text will probably learn little from it. It contains new
and useful material for palaeoecologists, including discussions of principal
components in regression analysis, biplots, robust estimation, outlier
detection, analysis of time series data, and analysis of closed, proportional
data. Surprisingly it gives little discussion on how to interpret PCA
results. The examples that are discussed tend to be interpreted solely in
terms of contrasts between variables with high, extreme
loadings. It is a valuable reference work for all who use PCA or related
techniques.
M.J. Greenacre 1984 Theory and applications of correspondence analysis.
Academic Press, London and Orlando, 364 pp. ISBN 0-12-299050-1.
The general technique of correspondence analysis has been frequently
reinvented or rediscovered, and given different names such as dual or optimal
scaling, reciprocal averaging, canonical analysis of contingency tables, and
analyses of correspondences. The geometric approach of correspondence
analysis has primarily been developed in France by Jean-Paul Benzecri and his
school of French data-analysts. Benzecri's work, although increasingly
quoted, is poorly known amongst Anglo-American statisticians. In this book
Michael Greenacre gives the first extensive exposition of the Benzecri
approach to correspondence analysis. It is a particularly valuable book for
all who have tried (unsuccessfully in my case) to understand Benzecri's texts
but who want to understand the theory underlying widely used techniques such
as reciprocal averaging or detrended correspondence analysis. Because the
Benzecri school primarily developed their approach in the context of
linguistics, Greenacre emphasizes the use of correspondence analysis to
summarize data in contingency tables. He does not develop the
unimodal/weighted averaging basis of correspondence analysis that is so
important in ecology and palaeoecology and that ter Braak has developed so
strongly and so effectively in Jongman et al. (1987). Greenacre's book is
certainly useful, but its importance to palaeoecologists has largely been
superseded by ter Braak's chapter in Jongman et al. (1987).
Pierre Legendre and Louis Legendre 1987 Developments in numerical ecology.
Springer-Verlag, Berlin, 585 pp. ISBN 3-540-16086-8.
This massive book presents the proceedings of a NATO Advanced Research
Workshop on Numerical Ecology held in France in June 1986. It comprises a
series of invited lectures given primarily by applied statisticians or data
analysts in related disciplines on new (sometimes very new, sometimes not so
new) approaches to the analysis of ecological data, followed by a series of
fascinating reports by working groups of ecologists and mathematicians on the
possible application of these new approaches in six broad branches of ecology.
The new approaches discussed are scaling methods including non-linear scaling,
unfolding techniques, and two- and three-way multidimensional scaling, fuzzy
set clustering, constrained and conditional clusterings, fractal theory, path
analysis, and spatial point patterns and spatial autocorrelation. The
possible value (or otherwise!) of these novel approaches is critically
considered for microbial, marine benthic, marine pelagic, limnology,
terrestrial plant, and terrestrial animal ecology in the working group
reports. Unfortunately palaeoecology is not considered. Some of the 'new'
approaches are well established in palaeoecology (e.g. constrained
clustering). Some have obvious, potential palaeoecological applications (e.g.
fuzzy set clustering, spatial autocorrelation, constrained scalings and
unfolding techniques, path analysis). Others such as fractal theory and
various elaborations of multidimensional scaling appear to have little or no
relevance to palaeoecologists (or to terrestrial animal ecologists if Dan
Simberloff's working group report on 'Dirty data and clean questions' is
representative!).
There is a lot of exciting and thoughtful material in this volume. Only time will tell whether the choice of new approaches made by the Legendre brothers will turn out to be useful to numerical ecologists. Certainly the volume is useful to quantitative palaeoecologists in highlighting novel techniques that we should think about and even try.
J. Aitchison 1986 The statistical analysis of compositional data. Chapman and
Hall, London and New York, 416 pp. ISBN 0-412-18060-4.
In 1897 Karl Pearson showed the dangers of using percentages or
proportions in many statistical analyses, commonly resulting in 'spurious
correlations'. Despite contributions in the 1960s and 1970s from
statisticians and quantitative geologists (e.g. J.E. Mosimann, F. Chayes, J.C.
Butler) the problems of analyzing closed data remained unresolved until a
series of papers by John Aitchison appeared between 1981 and 1984. Now in
this important book Aitchison has built on these papers to provide the first
major contribution to many of the problems associated with closed data,
induced spurious correlations, and constant-sum constraints. Aitchison's
approach usually involves logarithms of ratios and extends from standard
techniques of regression, principal components analysis and canonical
correlation analysis to questions such as complete subcompositional
independence, subcompositional invariance, and partition independence, all of
which are unique to percentage data. Many standard multivariate techniques
are not appropriate with closed data, but Aitchison provides a diverse and
challenging armoury of modelling and statistical testing techniques
appropriate solely for percentages.
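As a rough illustration of what "logarithms of ratios" means in practice, the sketch below computes one common form, the centred log-ratio, in which each part is divided by the geometric mean of all parts before taking logarithms. It is a minimal sketch only: the function names and the pollen counts are hypothetical, and it is not taken from Aitchison's text or from the CODA programs. It assumes every part is strictly positive, since the logarithm of zero is undefined.

```python
import math

def closure(parts):
    """Rescale positive parts so that they sum to 1 (a 'closed' composition)."""
    total = sum(parts)
    return [p / total for p in parts]

def clr(parts):
    """Centred log-ratio: log of each part divided by the geometric mean of all parts.

    Assumes every part is strictly positive; zeros need separate treatment.
    """
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]

# Hypothetical counts for four taxa in one sample (illustrative values only).
counts = [120.0, 45.0, 30.0, 5.0]
percentages = [100.0 * p for p in closure(counts)]

# The transform depends only on ratios between parts, so raw counts and
# percentages give the same log-ratio values.
print(clr(counts))
print(clr(percentages))
```

Because only the ratios between parts enter the calculation, the result is the same whether counts or percentages are supplied, and the transformed values always sum to zero.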
The book is clearly written for statisticians and Aitchison presents many proofs, properties, and definitions for his statistical colleagues. However, in view of the central importance of the whole book to quantitative palaeoecology, the time and effort involved in understanding the text are worthwhile. A series of BASIC programs called CODA, for IBM PC-compatibles, is available that solves the many examples given in the text. The programs are particularly useful in working through the book and helping to understand the new techniques.
Aitchison emphasizes that his log-ratios, log-linear contrasts, and additive logistic normal distribution are not the last word on this topic (see, for example, Gower's paper in Legendre and Legendre 1987!), but modestly suggests that his approach can produce results that "surprise many geologists". We are all in for surprises when we read this book and begin to apply Aitchison's methods in quantitative palaeoecology!
R. Gittins 1985 Canonical analysis - A review with applications in ecology.
Springer-Verlag, Berlin, 351 pp. ISBN 3-540-13617-7.
This book, like Greenacre's on correspondence analysis and Jolliffe's on
principal components analysis, provides an in-depth and very detailed review
of one multivariate technique, namely canonical analysis, including
canonical correlation analysis and canonical variates analysis (= multiple
discriminant analysis). About one-third of the book deals with theory and
mathematical relationships. Another third presents a series of detailed
ecological case studies. The remainder of the book concerns an
assessment of canonical analysis and future developments.
The book is primarily about canonical correlation analysis, a technique that has obvious appeal to ecologists as a means of studying relationships between two sets of multivariate data (e.g. vegetation and soils). It has, however, a crippling set of assumptions, in particular linear relationships between variables. As a result it has never been a particularly useful or appropriate technique in ecology. Within ecology and palaeoecology, canonical correlation analysis is now largely superseded by ter Braak's canonical correspondence analysis that assumes unimodal responses between biological and environmental variables (see Jongman et al. 1987).
Although there is an enormous amount of information about multivariate data analysis in general in Gittins's book, its main concern, canonical correlation analysis, is today only really of theoretical interest to ecologists and palaeoecologists. Gittins's view that "canonical analysis exists primarily to be used" seems excessively pragmatic when we know that many of its assumptions are biologically unrealistic. It is surely better to refrain from using techniques that exist but are inappropriate than to use techniques simply because they exist. For many quantitative palaeoecologists Gittins's book is now largely replaced by Jongman et al. (1987) where canonical correlation analysis is critically discussed and evaluated in the context of other canonical, constrained ordination techniques.
Considerable time and effort are needed to read and understand Gittins's book. The time is perhaps better spent with Jongman et al. (1987) or Digby and Kempton (1987).
W.H. Press, B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling 1986.
Numerical recipes - the art of scientific computing. Cambridge University
Press, Cambridge, 818 pp. ISBN 0-521-30811-9.
This is a wonderful and much needed book. It presents and discusses
efficient and reliable algorithms and FORTRAN and PASCAL source listings for
all the main aspects of numerical analysis - solution of linear algebraic
equations and matrix manipulations; interpolation, extrapolation and splines;
integration of functions; evaluation of functions; derivation of special
functions such as factorials, gamma functions, etc.; random numbers; sorting;
roots and non-linear equations; function optimization; eigenvalues and
eigenvectors; Fourier transforms; basic statistics; modelling; integration of
differential equations; partial differential equations; and two point boundary
value problems. Its 200+ subroutines are available on 5 1/4" diskettes
(FORTRAN ISBN 0-521-30957-3; PASCAL ISBN 0-521 309854-9) along with example
books and programs (FORTRAN ISBN 0-521-31330-9 and ISBN 0-521-30958-1; PASCAL
ISBN 0-521-30956-5 and ISBN 0-521-20955-7). The FORTRAN subroutines are in
FORTRAN 77 and work without difficulty on an IBM AT computer with either the
Professional FORTRAN 1.0 compiler or the Microsoft FORTRAN 3.3 compiler. I
have no experience of the PASCAL versions.
I know of no other book like this, one that covers so much material about difficult but important programming problems in a clear and concise way and is written for scientists rather than professional computer scientists. I only wish it had been available 10-15 years ago - many, many hours of frustration would have been saved! Strongly recommended.