INQUA-COMMISSION FOR THE STUDY OF THE HOLOCENE Working Group on Data-Handling Methods Newsletter 9, January, 1993 NOTE FROM THE COORDINATOR The Newsletter deals with a number of very different topics this time. John Birks agreed to contribute an article on computer-intensive techniques--a topic he discussed at Aix last September--which appear to open up a number of possibilities to those studying the Holocene. Malcolm Clark provides the second of his two articles on correlation by the technique of slotting. Robert Webb, Jonathan Overpeck, David Anderson, and Bruce Bauer discuss the establishment of the World Data Center-A (WDC-A) for Paleoclimatology at Boulder, Colorado, and outline what they can do for us. David Green discusses the world electronic databases and provides timely information that should be of interest to many readers. John Birks provides his useful Bookshelf 6. There is a list of the material we now have available for anonymous ftp at geology.wisc.edu. I discuss SLOTDEEP.EXE, a program that is fun and also useful. And Dr. Triage is back as well. During the 1993-94 academic year (Sept-May) I am to be on sabbatical to learn as much as I can about the Internet, electronic databases, e-mail, and computer uses in paleoecology. I hope to do some travelling during the year. If you are doing something you would like to share--or if I could help you with a problem of yours--please let me know, and I will try to schedule a visit if I am anywhere near your base of operations. I ask the readers of the Newsletter to send me information on any of the data-handling techniques that you have used which could be helpful to others. Please check your regular and e-mail addresses for accuracy. Send any corrections/suggestions to: Louis J. Maher, Jr. Department of Geology & Geophysics University of Wisconsin 1215 W. 
Dayton Street Madison, WI 53706 USA Phone: (608) 262-9595 FAX: (608) 262-0693 E-mail: maher@geology.wisc.edu IMPACT OF COMPUTER-INTENSIVE PROCEDURES IN TESTING PALAEOECOLOGICAL HYPOTHESES H.J.B. Birks Botanical Institute, University of Bergen, Allégaten 41, N-5007 BERGEN Norway E-mail: birks@cc.uib.no Introduction. Applied statistics is currently undergoing a major revolution (e.g. Noreen, 1989; Efron & Tibshirani, 1991; Manly, 1991) with the ever-increasing adoption of computer-intensive procedures (CIP) for statistical inference and estimation. This, in turn, has resulted from the extraordinarily rapid development of increasingly powerful personal computers. Efron (1979) suggested that one major impact of the increasing availability of personal computing resources for statisticians would be "thinking the unthinkable," namely that statisticians could begin to consider data-analytical methods of, say, 500 numbers that would require over 10^8 basic computations. Such ideas were totally unthinkable even 25 years ago when some of the earliest papers on quantitative palaeoecology began to appear. Computers are revolutionising applied statistics in at least three ways (Manly, 1991): (1) calculations can now be done faster and on larger and larger data sets than ever before, (2) several standard statistical techniques are being replaced by CIP that make fewer assumptions of the data but are more demanding computationally, and (3) problems for which there was no satisfactory analytical solution are now being solved by CIP (see Efron & Tibshirani, 1991). Is this revolution having any impact on quantitative palaeoecology? In this article, I will outline what CIP are, discuss their uses, and evaluate their potential impact in palaeoecology. What are computer-intensive procedures? They are data-analytical procedures that involve a huge number of often highly repetitive computations and thus they may take several hours to run, even on a 486 PC-machine. 
There are two major types. (1) Monte Carlo permutation methods for testing a specific statistical hypothesis by determining the significance level of some test-statistic calculated for a data-set through comparing this observed statistic with an empirical distribution of values for the statistic obtained by numerous (often 99 or more) permutations of the data under some assumed model. If the model implies all data arrangements are equally likely, the procedure is a randomisation test with random sampling of the randomisation distribution (Noreen, 1989; Manly, 1991). (2) Computer-resampling procedures (e.g. bootstrapping) to estimate the standard error, bias, or confidence interval for an estimate, e.g. estimated July temperature for 6000 B.P., by mimicking, by computer, sampling variation in the estimation procedure (Diaconis & Efron, 1983). Monte Carlo permutation tests. To illustrate the basic ideas of such tests, I will discuss work André Lotter (Bern) and I have recently completed (Lotter & Birks, 1993) on the impacts of the Laacher See volcanic ash of late Allerød age (ca. 11,000 B.P.) on late-glacial biota in lakes on acid-sensitive granitic bedrock in the Black Forest. We have biological data (e.g. pollen and diatom counts) as 'response' variables, environmental factors (e.g. volcanic tephra) of primary interest as 'predictors' or explanatory variables, and other environmental factors (e.g. sample age, sediment lithology, long-term climatic change from the Allerød to the Younger Dryas) as 'covariables', the effects of which we wish to consider and partial out before testing for any tephra impacts. We used partial redundancy analysis (RDA) (ter Braak, 1990), a constrained form of principal components analysis, to find how much variance in the terrestrial pollen data is explained by the occurrence of tephra after the effects of climatic change and sample age have been partialled out. 
The eigenvalue of the first RDA axis is our test statistic of interest. At Rotmeer we find an eigenvalue of 0.11. Is it statistically significant? Alternatively, is it large enough to reject the null hypothesis that the Laacher See ash had no effect on the terrestrial vegetation around the site when the effects of other explanatory variables have been considered? The biological data are closed compositional data and they are in a regular stratigraphical order. The statistical properties of the eigenvalues for a partial RDA with such data are totally unknown. There is no way that classical statistical inference can help in evaluating the significance of the observed eigenvalue. It is here that Monte Carlo permutation tests can assist. The basic idea is to exchange (i.e. permute) many times the residuals of the biological data after fitting the covariables and explanatory variables, thereby giving many (n = 99) randomised data-sets (ter Braak, 1992). Because the data are in stratigraphical order a restricted permutation is required (ter Braak, 1990). The eigenvalue (actually an F-ratio, namely the eigenvalue divided by the residual sum-of-squares; ter Braak, 1990) is calculated for the 99 randomised data-sets. The observed value for the real data is then compared with the same test statistic based on 99 permuted data-sets and an exact Monte Carlo significance value is calculated. We find that the observed value is not statistically significant (p=0.09), and we conclude that the Laacher See tephra had no effect on the terrestrial vegetation around Rotmeer (Lotter & Birks, 1993). Monte Carlo permutation tests provide a means of dealing with an otherwise statistically intractable problem and thus of testing specific palaeoecological hypotheses. We are, in a way, using modern computing power in place of theoretical analysis. 
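The logic of a Monte Carlo permutation test can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it uses an unrestricted shuffle of group labels and a simple difference of means as the test statistic, not the restricted permutation of residuals from a partial RDA described above. The function names and the data are hypothetical. The significance value is taken as (number of permuted statistics at least as large as the observed one, plus one) divided by (number of permutations plus one), so that with 99 permutations the smallest attainable value is 0.01.

```python
import random

def permutation_p_value(values, labels, statistic, n_perm=99, seed=0):
    """Monte Carlo permutation test (unrestricted shuffling of labels).

    Returns the observed statistic and the Monte Carlo significance value.
    """
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # one random rearrangement of the labels
        if statistic(values, shuffled) >= observed:
            as_extreme += 1
    # The observed arrangement counts as one of the (n_perm + 1) arrangements.
    return observed, (as_extreme + 1) / (n_perm + 1)

def mean_difference(values, labels):
    """Absolute difference between the means of group 1 and group 0."""
    group1 = [v for v, g in zip(values, labels) if g == 1]
    group0 = [v for v, g in zip(values, labels) if g == 0]
    return abs(sum(group1) / len(group1) - sum(group0) / len(group0))

if __name__ == "__main__":
    # Hypothetical data: a response measured in 5 samples without tephra (0)
    # and 5 samples with tephra (1).
    response = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
    tephra = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    obs, p = permutation_p_value(response, tephra, mean_difference)
    print("observed statistic:", obs, " Monte Carlo p:", p)
```

For stratigraphically ordered data the call to `rng.shuffle` would have to be replaced by a permutation scheme that respects the ordering, which is the point of the restricted permutation mentioned in the text.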
Permutation tests and constrained ordinations can be used as robust, distribution-free alternatives to, for example, multiple discriminant analysis (MDA), partial MDA or one-way multivariate analysis of covariance, multivariate analysis of variance, and multiple regression for percentage, stratigraphically ordered data. Palaeoecological examples of such tests include Turner & Hodgson (1983), Walker et al. (1991), Gaillard et al. (1992), Lotter et al. (1992), and Odgaard (1992). Monte Carlo permutation techniques can also be used to derive significance levels for rate-of-change statistics and for modern-fossil analogue measures (Birks & Line, unpublished). I believe that there is considerable potential for the imaginative and critical use of permutation tests in quantitative palaeoecology, especially as palaeoecological data are invariably complex and statistically intractable and hence not really amenable to classical statistical testing. Computer re-sampling procedures. Suppose we have reconstructed lake-water pH 2000 years ago from fossil diatom data to be 5.7. Is it 5.7 ± 0.2, ± 1.0, or ± 2.5 pH units? How can we derive reliable standard errors of prediction for inferred palaeoenvironmental estimates? In the physical sciences, standard errors are usually estimated by replication. Extensive replication is not currently practical in such a labour-intensive subject as palaeoecology. We can, however, use computer re-sampling procedures to achieve replication. We have, for example, a modern training or calibration set of 248 modern diatom samples and associated lake-water pH data. We use this training set to estimate modern ecological parameters (e.g. species optima and tolerances) of the species in relation to pH by some form of regression. We then apply these estimates to fossil samples to infer past lake-water pH. 
It is commonly assumed that the standard error of the differences between observed pH and inferred modern pH is the standard error of prediction not only for the modern training set but also for the fossil samples. It is usually a gross under-estimate (ter Braak & van Dam, 1989). How can we derive more reliable estimates of the prediction error? Two approaches, both involving computer-intensive re-sampling, are the jackknife (Tukey, 1958; Efron & Gong, 1983) and the bootstrap (Efron & Gong, 1983; Efron & Tibshirani, 1986; Léger et al., 1992). In jackknifing, the reconstruction is done many times. In the first cycle, sample 1 is left out and all other 247 modern samples are included. In the second, sample 1 and samples 3-248 are included, and so on. 248 reconstructions are thus made, each using a training set of size 247. From these we can derive a jackknife estimate of pH, its variance, and standard error of prediction. A jackknife, outside statistics, is a knife with a mass of small pull-out tools (e.g. the Swiss army knife is an impressive jackknife!) that can be used to solve many small tasks without any additional tools. Tukey (1958) called this statistical approach for deriving standard errors and for testing hypotheses 'jackknifing,' simply because no additional tools or better methods are easily used. The jackknife can be regarded as an approximation to a more basic approach, termed the bootstrap and developed by Efron and associates (e.g. Efron & Gong, 1983). The name 'bootstrap' reflects the idea that its use is analogous to someone pulling themselves up by their bootlaces (Manly, 1991). In the context of palaeoenvironmental reconstructions, bootstrapping can be used to draw at random a bootstrap training set of 248 samples from the modern training set. As sampling is with replacement, the same sample can be selected more than once. Some modern samples will not be selected and they thus form an independent test set. 
pH is inferred by an appropriate regression and calibration procedure for both the modern test set and the fossil samples. The process is repeated many times (say 500-1000 bootstrap cycles) and bootstrap error estimates are obtained. Birks et al. (1990) show how to derive mean square errors of prediction for palaeoenvironmental reconstructions that incorporate the different components of the total prediction error estimated by bootstrapping. Computer re-sampling procedures provide means of deriving robust, reliable, and relatively unbiased standard errors of prediction for palaeoenvironmental reconstructions. They are thus of considerable potential in many palaeoecological problems. The recent review by Léger et al. (1992) provides a useful overview. However, bootstrap estimates are not explicitly available in any of the major computer packages such as GENSTAT, SAS, Minitab, SPSS, or BMDP, although bootstrap algorithms can be programmed in some of these packages. At present, bootstrap estimation relevant in palaeoecology is generally done by specific purpose-written software (e.g. WACALIB 3.1; Line, ter Braak & Birks, unpublished program). Considerable work is in progress by applied statisticians and theoretical statisticians on efficient bootstrap computations, on the derivation of confidence intervals by bootstrapping, and asymptotic theory for the bootstrap (see Efron & Tibshirani, 1991). It is an extremely active and rapidly developing area of research within statistics today, and with time these methods will surely become available in standard packages. Jöckel et al. (1992) provide an overview of the current range of bootstrap applications in statistics and their impact on statistical methodology. The bootstrapping approach can be used not only for statistical estimation but also, in specific instances, for statistical inference and hypothesis testing (e.g. Hall & Wilson, 1991). 
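The bootstrap cycle described above (draw a training set with replacement, treat the samples that were not drawn as an independent test set, repeat many times) can be sketched as follows. This is a minimal illustration and not the procedure of Birks et al. (1990) or WACALIB: a simple straight-line regression on a single hypothetical predictor stands in for the regression and calibration step, only the test-set component of the prediction error is accumulated, and all names and data are assumptions made for the example.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def bootstrap_rmsep(xs, ys, n_boot=500, seed=1):
    """Root mean squared error of prediction from bootstrap test sets."""
    rng = random.Random(seed)
    n = len(xs)
    squared_errors = []
    for _ in range(n_boot):
        # Bootstrap training set: n samples drawn with replacement.
        picked = [rng.randrange(n) for _ in range(n)]
        chosen = set(picked)
        # Samples never drawn in this cycle form the independent test set.
        test = [i for i in range(n) if i not in chosen]
        train_x = [xs[i] for i in picked]
        if not test or len(set(train_x)) < 2:
            continue  # degenerate cycle: nothing to test or no slope to fit
        slope, intercept = fit_line(train_x, [ys[i] for i in picked])
        squared_errors.extend(
            (ys[i] - (intercept + slope * xs[i])) ** 2 for i in test
        )
    return math.sqrt(sum(squared_errors) / len(squared_errors))

if __name__ == "__main__":
    # Hypothetical training set: one predictor and an observed pH.
    noise = random.Random(42)
    predictor = [i / 10 for i in range(40)]
    observed_ph = [5.0 + 0.5 * x + noise.gauss(0, 0.2) for x in predictor]
    print("bootstrap RMSEP:", round(bootstrap_rmsep(predictor, observed_ph), 3))
```

On average about 37% of the samples are left out of each bootstrap training set, which is why the test sets are rarely empty.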
In palaeoecology, we may wish to test the null hypothesis that the rate of stratigraphical change in, say, lake-water pH at a particular time is no different from the rates of change at other times. We can test the observed rate of change of interest by means of a time-duration or elapsed-time bootstrap test (Kitchell et al., 1987). This involves bootstrap sampling our stratigraphical time-series randomly with replacement many times to create ordered data sequences of the same time duration as the one of interest. These provide an empirical probability distribution for comparison with the observed rate of change (see Birks et al., 1990 for applications). Clearly, this type of bootstrap analysis has much potential in testing specific hypotheses about observed stratigraphical patterns. Possible impact. Before discussing the possible impact of CIP in quantitative palaeoecology, it is useful to recognise that a descriptive, non-experimental subject such as palaeoecology develops through three phases (Birks, 1992) - (1) the descriptive phase where patterns are detected and described, (2) the narrative phase where plausible explanations are presented for the observed patterns, e.g. reconstructions of the past environment, and (3) the analytical phase where specific testable hypotheses are proposed, tested, and rejected. The bulk (? over 95%) of palaeoecology is descriptive or narrative in character. Why is there so little analytical hypothesis testing in palaeoecology? I suspect a possible reason is that it is often so difficult to attempt hypothesis testing in palaeoecology. It is in a subject like palaeoecology that permutation tests are most valuable because they can be valid without random samples, they can be developed to take account of the special properties of stratigraphical or spatial data, and they can use 'non-standard' test statistics. 
All a permutation test can tell us is that a certain pattern in our data could or could not have arisen by chance (Manly, 1991). The test is completely specific to the data-set of interest. Permutation tests are thus ideal for many palaeoecological problems. The challenge is to devise (a) testable hypotheses and (b) appropriate Monte Carlo permutation procedures. If this challenge can be met, these techniques could have an important impact because they would contribute to the development of an analytical phase in palaeoecology. Computer-resampling procedures similarly could make an important contribution. More and more quantitative reconstructions of past environment are being compared with simulation results from climatic, geophysical, or geochemical models. The reconstructions used for comparison rarely have any standard errors of prediction and quantitative comparisons between the reconstructed and simulated values are rarely attempted. Bootstrapping and randomisation tests for spatial or temporal data could clearly help in testing the validity of such comparisons. CIP are likely to have an important impact in quantitative palaeoecology because they provide a means of doing statistical analysis and associated testing of data that are expressed as percentages, have the objects in a fixed order, have non-random and non-independent samples, have many variables with many zero values, and are highly skewed. Prior to CIP there were no really satisfactory solutions for analysing such data using classical statistical methods. CIP provide the palaeoecologist with the chance to test specific palaeoecological hypotheses in a statistical way, i.e. to do analytical palaeoecology, and not solely to use numerical methods for descriptive and narrative purposes. As Noreen (1989) suggests "The next few years are likely to be an exciting period for those involved in testing hypotheses. 
Recent dramatic decreases in the costs of computing now make revolutionary methods for testing hypotheses available to anyone with access to a personal computer. These methods are easy to understand, very general, and can avoid troublesome assumptions that are required with conventional methods." Acknowledgements. This article is based on a lecture I gave at the symposium on Computers and Palynologists at the Eighth International Palynological Conference held in Aix-en-Provence in September 1992. I am grateful to Lou Maher for the invitation to contribute to that symposium and to John Line, Andy Lotter, and Cajo ter Braak for valuable discussions about computer-intensive procedures in palaeoecology. References. Birks, H.J.B., 1992. Some reflections on the application of numerical methods in Quaternary palaeoecology. Publ. Karelian Inst., Univ. of Joensuu 102, 7-20. Birks, H.J.B., Line, J.M., Juggins, S., Stevenson, A.C. & ter Braak, C.J.F., 1990. Diatoms and pH reconstruction. Phil. Trans. R. Soc. Lond. B 327, 263-278. Diaconis, P. & Efron, B., 1983. Computer-intensive methods in statistics. Scientific American 248(5), 96-109. Efron, B., 1979. Computers and the theory of statistics: Thinking the unthinkable. SIAM Review 21, 460-480. Efron, B. & Gong, G., 1983. A leisurely look at the bootstrap, the jackknife, and cross-validation. Amer. Statist. 37, 36-48. Efron, B. & Tibshirani, R., 1986. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science 1, 54-77. Efron, B. & Tibshirani, R., 1991. Statistical data analysis in the computer age. Science 253, 390-395. Gaillard, M.J., Birks, H.J.B., Emanuelsson, U. & Berglund, B.E., 1992. Modern pollen/land-use relationships as an aid in the reconstruction of past land-uses and cultural landscapes: an example from south Sweden. Veget. Hist. Archaeobot. 1, 3-17. Hall, P. & Wilson, S.R., 1991. Two guidelines for bootstrap hypothesis testing. 
Biometrics 47, 757-762. Jöckel, K.-H., Rothe, G. & Sendler, W. (Eds.), 1992. Bootstrapping and related techniques. Springer Verlag, Berlin, 245 pp. Kitchell, J.A., Estabrook, G. & MacLeod, N., 1987. Testing for equality of rates of evolution. Paleobiology 13, 272-285. Léger, C., Politis, D.N. & Romano, J.P., 1992. Bootstrap technology and applications. Technometrics 34, 378-398. Lotter, A.F. & Birks, H.J.B., 1993. The impact of the Laacher See Tephra on terrestrial and aquatic ecosystems at two sites in the Black Forest (Southern Germany). (Manuscript in preparation). Lotter, A.F., Eicher, U., Birks, H.J.B. & Siegenthaler, U., 1992. Late-glacial climatic oscillations as recorded in Swiss lake sediments. J. Quat. Sci. 7, 187-204. Manly, B.F.J., 1991. Randomization and Monte Carlo methods in biology. Chapman & Hall, London, 281 pp. Noreen, E.W., 1989. Computer intensive methods for testing hypotheses. An introduction. J. Wiley & Sons, New York, 229 pp. Odgaard, B.V., 1992. The fire history of Danish heathland areas as reflected by pollen and charred particles in lake sediments. The Holocene 2, 218-226. ter Braak, C.J.F., 1990. CANOCO - a FORTRAN program for CANOnical COmmunity ordination by (partial) (detrended) (canonical) correspondence analysis, principal components analysis and redundancy analysis, version 3.10. Microcomputer Power, Ithaca, New York. ter Braak, C.J.F., 1992. Permutation versus bootstrap significance tests in multiple regression and ANOVA. In Bootstrapping and related techniques (Eds. K.-H. Jöckel et al.), pp. 79-85. Springer Verlag, Berlin. ter Braak, C.J.F. & van Dam, H., 1989. Inferring pH from diatoms: a comparison of old and new calibration methods. Hydrobiologia 178, 209-223. Tukey, J.W., 1958. Bias and confidence in not quite large samples. Annls. Math. Stat. 29, 614. Turner, J. & Hodgson, J., 1983. Studies in the vegetational history of the Northern Pennines. III. Variations in the composition of the mid-Flandrian forests. J. Ecol. 
71, 95-118. Walker, I.R., Smol, J.P., Engstrom, D.R. & Birks, H.J.B., 1991. An assessment of Chironomidae as quantitative indicators of past climatic change. Can. J. Fish. Aquat. Sci. 48, 975-987. ASSESSMENT OF SEQUENCE-SLOTTING Malcolm Clark Department of Mathematics Monash University Clayton, Victoria Australia, 3168 E-mail: rmc@monu1.cc.monash.edu.au 1. Introduction. In the previous Newsletter (Clark, 1992), I discussed the problem of combining, in an optimal fashion, two ordered sequences of data into a single combined sequence, subject to various order constraints. For example, the sequences could correspond to pollen data from two cores A and B, with pollen counts made at horizons A1, A2, ..., Am in Sequence A and at horizons B1, B2, ..., Bn in Sequence B. A measure of dissimilarity (or "distance") is computed between the pollen counts at any two horizons from either core. Then the total dissimilarity (or "combined path length" CPL) for any proposed combined sequence of A's and B's is obtained by adding the distances between adjacent or consecutive horizons in the combined sequence. The problem then is to find the best combined sequence, i.e. the one with minimum total dissimilarity or minimum CPL. This optimal combined sequence (and corresponding minimum CPL, denoted here by C*) can be readily found using algorithms based on dynamic programming techniques. Since there are only a finite number of possible combined sequences, all these sequence-slotting algorithms will produce an answer, no matter what data they are applied to. This means that the results of any algorithm need to be assessed carefully. Questions which may be asked are: (1) Is there a unique optimal slotting? (2) Which parts of the combined slotting are most reliable, and which are least reliable? (3) Is the optimal slotting greatly superior to other possible ones? (4) Do some individual horizons have a big influence on the outcome? 
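The minimum CPL defined above can be found with a small dynamic programme. The sketch below is an illustration only and is not the algorithm used in CONSLOT or PCSLOT: it handles a single variable per horizon, uses the absolute difference as the distance, and ignores block-length and other constraints. The function name and the data are hypothetical. The table entry `best[j][k][s]` holds the shortest path length of any combined sequence built from the first j horizons of A and the first k horizons of B that ends in an A (s = 0) or in a B (s = 1).

```python
import math

def min_cpl(a, b, dist=lambda x, y: abs(x - y)):
    """Minimum combined path length C* for slotting sequence b into a."""
    m, n = len(a), len(b)
    inf = math.inf
    best = [[[inf, inf] for _ in range(n + 1)] for _ in range(m + 1)]
    if m:
        best[1][0][0] = 0.0  # combined sequence consisting of A1 only
    if n:
        best[0][1][1] = 0.0  # combined sequence consisting of B1 only
    for j in range(m + 1):
        for k in range(n + 1):
            if j + k < 2:
                continue
            if j >= 1:
                # Last horizon is A_j; the one before is A_(j-1) or B_k.
                cost = inf
                if j >= 2:
                    cost = min(cost, best[j - 1][k][0] + dist(a[j - 2], a[j - 1]))
                if k >= 1:
                    cost = min(cost, best[j - 1][k][1] + dist(b[k - 1], a[j - 1]))
                best[j][k][0] = cost
            if k >= 1:
                # Last horizon is B_k; the one before is B_(k-1) or A_j.
                cost = inf
                if k >= 2:
                    cost = min(cost, best[j][k - 1][1] + dist(b[k - 2], b[k - 1]))
                if j >= 1:
                    cost = min(cost, best[j][k - 1][0] + dist(a[j - 1], b[k - 1]))
                best[j][k][1] = cost
    return min(best[m][n])

if __name__ == "__main__":
    # Hypothetical single-variable data for two short cores.
    core_a = [1, 2, 5]
    core_b = [3, 4]
    print("minimum CPL:", min_cpl(core_a, core_b))
```

Because the table is filled whatever the data look like, the sketch always returns some minimum, which is exactly why the assessment questions listed above matter.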
The first three questions are answered to some extent by the so-called "H-matrix", devised by Gordon et al. (1988), which gives a graphical representation of the optimal slotting(s) and some alternative slottings. 2. Geometric Interpretation. To understand the H-matrix, we first note that any combined sequence or slotting may be represented geometrically as a path across a certain (m + 1) * (n + 1) grid. This grid has (m + 1) rows, labelled 0,1,2,... up to m, numbered from the top down, corresponding to the horizons in Sequence A. Similarly, it has (n + 1) columns, labelled 0,1,2,... up to n, numbered from left to right, corresponding to the horizons in sequence B. Each possible slotting can be represented by a path across the grid, starting at the top left (or north-west) corner, with co-ordinates (0,0), and ending at the bottom right (or south-east) corner, with co-ordinates (m,n). The path is constructed step-by-step, according to the following rules. (1) At any stage, the next segment of the path is either one step down or one step right from the current point. (2) If the next horizon in the combined sequence is an A, the path goes down one step; if the next horizon is a B, it goes right one step. If the path goes through the point on the grid with co-ordinates (j,k) (the intersection of j-th row and k-th column), the first (j + k) elements in the corresponding combined sequence comprise A1, A2,... up to Aj and B1, B2, ... up to Bk. Furthermore, Aj and Bk must be immediately adjacent. The combined sequence may start with a block of consecutive A's, or a block of B's. This is why the grid contains an extra row, labelled 0, and an extra column, labelled 0. With this method of construction, any slotting may be represented by a path, and any path corresponds to a possible slotting (provided the path obeys rules (1) and (2) above). For example, the combined sequence A1 A2 B1 A3 B2 A4 A5 B3 is represented geometrically by the following path. 
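Rules (1) and (2) translate directly into a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the crude character plot (an asterisk for each grid point on the path, a dot elsewhere) is not a reproduction of the original text figure.

```python
def slotting_to_path(sequence):
    """Convert a combined sequence such as ['A1', 'A2', 'B1'] to grid points.

    An A moves the path one step down (row j increases); a B moves it one
    step right (column k increases). The path starts at (0, 0).
    """
    j = k = 0
    path = [(0, 0)]
    for horizon in sequence:
        if horizon.startswith("A"):
            j += 1
        elif horizon.startswith("B"):
            k += 1
        else:
            raise ValueError("unknown horizon label: " + horizon)
        path.append((j, k))
    return path

def render_path(path):
    """Draw the (m + 1) by (n + 1) grid, rows for A and columns for B."""
    m, n = path[-1]
    on_path = set(path)
    return "\n".join(
        "".join("*" if (j, k) in on_path else "." for k in range(n + 1))
        for j in range(m + 1)
    )

if __name__ == "__main__":
    example = "A1 A2 B1 A3 B2 A4 A5 B3".split()
    path = slotting_to_path(example)
    print(path)
    print(render_path(path))
```

For the example sequence the path ends at (5,3), as required for m = 5 and n = 3.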
[Text Figure here] 3. The H-matrix. The H-matrix essentially assigns a numerical value to each point on the above (m + 1) * (n + 1) grid. For each combination (j,k), we define H*(j,k) as the minimum CPL subject to the additional constraint that the path passes through the point (j,k) on the grid. In other words, in the corresponding combined sequence, horizons Aj and Bk must be immediately adjacent. Then H(j,k) is simply a scaled-down version of H*(j,k), obtained by dividing the latter by C*, the minimum CPL for the original problem. Both H*(j,k) and C* take account of all other constraints, such as block-length constraints. The (m + 1) * (n + 1) numbers H(j,k) may be stored in a table or a matrix, known as the H-matrix. In principle, this matrix could be printed out in the same row by column format as the grid described in Section 2. The H-matrix has the following properties. (1) All points on the optimal slotting(s) have H-value of 1. (2) All other H-values are greater than 1. (3) It is easy to compute a lower bound on the CPL for any combined sequence; simply find the maximum of the H-values along the path corresponding to that specific sequence. For example, if this maximum is 1.2 say, then the CPL for that combined sequence must be at least 1.2 times C*, the optimum CPL. The H-matrix can be computed using a relatively straightforward extension of the basic dynamic programming algorithm. This calculation is done automatically in my CONSLOT and PCSLOT programs (Clark, 1992). The H-matrix is displayed in a coded semi-graphical form, with all H-values of 1 printed as asterisks, and increasing H-values represented by successive letters of the alphabet. The threshold values for determining this latter coding are equally spaced on either a linear or logarithmic scale, as selected by the user. This coded form of the H-matrix may be visualised as a crude contour map of a canyon. 
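One way to obtain H*(j,k) is to run the dynamic programme twice, once forwards from (0,0) and once backwards from (m,n), and to add the two partial path lengths at each grid point. The Python sketch below does this for a single variable with absolute-difference distances and no additional constraints. It is an assumption-laden illustration, not the CONSLOT or PCSLOT code: the function names are hypothetical, C* is assumed to be greater than zero, and the logarithmic coding in `code_h` (a fixed number of letter classes spread between 1 and the largest H-value) is only one possible reading of how the thresholds might be spaced.

```python
import math

def h_matrix(a, b, dist=lambda x, y: abs(x - y)):
    """Return (C*, H) where H[j][k] = H*(j,k) / C*. Assumes C* > 0."""
    m, n = len(a), len(b)
    inf = math.inf

    def last(j, k, s):
        # Most recently placed horizon in state (j, k, s): A_j or B_k.
        return a[j - 1] if s == 0 else b[k - 1]

    fwd = [[[inf, inf] for _ in range(n + 1)] for _ in range(m + 1)]
    bwd = [[[inf, inf] for _ in range(n + 1)] for _ in range(m + 1)]
    # Forward pass: shortest path length from the start to each state.
    for j in range(m + 1):
        for k in range(n + 1):
            for s in (0, 1):
                pj, pk = (j - 1, k) if s == 0 else (j, k - 1)
                if pj < 0 or pk < 0:
                    continue  # state (j, k, s) does not exist
                if pj + pk == 0:
                    fwd[j][k][s] = 0.0  # first horizon of the sequence
                    continue
                for ps in (0, 1):
                    if (pj if ps == 0 else pk) == 0:
                        continue  # previous state does not exist
                    step = dist(last(pj, pk, ps), last(j, k, s))
                    fwd[j][k][s] = min(fwd[j][k][s], fwd[pj][pk][ps] + step)
    # Backward pass: shortest path length from each state to the end.
    for j in range(m, -1, -1):
        for k in range(n, -1, -1):
            for s in (0, 1):
                if (j if s == 0 else k) == 0:
                    continue
                if j == m and k == n:
                    bwd[j][k][s] = 0.0
                    continue
                cur = last(j, k, s)
                if j < m:
                    bwd[j][k][s] = min(bwd[j][k][s], dist(cur, a[j]) + bwd[j + 1][k][0])
                if k < n:
                    bwd[j][k][s] = min(bwd[j][k][s], dist(cur, b[k]) + bwd[j][k + 1][1])
    c_star = min(fwd[m][n])
    h = [[min(fwd[j][k][s] + bwd[j][k][s] for s in (0, 1)) / c_star
          for k in range(n + 1)] for j in range(m + 1)]
    h[0][0] = 1.0  # every path passes through the starting corner
    return c_star, h

def code_h(value, h_max, n_classes=18, eps=1e-4):
    """Code an H-value as '*', a letter from A onwards, or '+' (prohibited)."""
    if math.isinf(value):
        return "+"
    if value <= 1.0 + eps:
        return "*"
    step = math.log(h_max) / n_classes  # equal spacing on a log scale
    return chr(ord("A") + min(int(math.log(value) / step), n_classes - 1))

if __name__ == "__main__":
    c_star, h = h_matrix([1, 2, 5], [3, 4])  # hypothetical data
    h_max = max(max(row) for row in h)
    for row in h:
        print("".join(code_h(v, h_max) for v in row))
```

Printing the coded rows gives a miniature version of the canyon picture: the asterisks trace the optimal slotting and the letters rise away from it.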
The line of asterisks, representing the optimal slotting(s), can be thought of as a river, running from the north-west to south-east corner. The letters of the alphabet represent the walls of the canyon; the further down the alphabet, the higher the wall. The steeper the walls of the canyon, the more tightly is the optimum slotting constrained, and the more reliable is that part of the slotting. Conversely, in regions where the bottom of the canyon is fairly flat, as indicated by an expanse of A's in the coded H-matrix, the optimum slotting is not strongly determined. This is because a change in the slotting in that region would make only a small increase in the CPL. To put it another way, those paths marked with an A are only slightly worse than the optimal path, while those marked with an R are very much worse. If there is a unique optimal slotting, it will be represented by the "river" of asterisks which must be only one asterisk wide. Conversely, if at any stage the river widens out into a "lake", then there are multiple paths or slottings with the same minimum CPL. These alternative slottings can be reconstructed by taking alternative paths across each lake. 4. An Example. These ideas are well illustrated by the following example involving artificial data from two sequences each with 40 horizons. For simplicity, it is assumed that only one pollen variable is measured at each horizon. The data are plotted in Fig. 1. [Figure 1 plotted here] The Y-variable is nearly constant over the last 10 horizons in both sequences. This implies that the distances between these horizons will be relatively small, and alternative slottings of these last 10 horizons will give almost the same total distance or dissimilarity. Hence this bottom part of any optimal slotting will not be very reliable. Conversely, the slotting will be much more reliable in regions where the Y-variable is changing rapidly, at about horizon 30 in A and 27 in B for example. Fig. 
2 shows the corresponding coded H-matrix as produced by PCSLOT, which confirms our predictions. The large "flat" area at the south-east corner, indicated by the array of consecutive A's and B's, confirms that this lower part of the slotting is not very reliable. Conversely, there are regions (such as j = 29, k = 28), where the canyon rises steeply from the river, indicated by C's or D's next to asterisks. This means that the corresponding portion of the optimal slotting is well-defined, in the sense that only a minor change in the path (i.e. combined sequence) will produce a substantial increase in the CPL.

STANDARDISED H-MATRIX
LOG. SCALE ... 0.9000...*... 1.0001...A... 1.0347...B 1.8474...+
TEST SEQUENCE B COLUMNS 0 TO 40 INCLUSIVE

     0         0         0         0         0
     0         1         2         3         4
     +    -    +    -    +    -    +    -    +
   0 *AGGIKKKLOPPPPPPPPPPPPPPPPPPPRRRRRRRRRRRR
T    *AGGIKKKLOPPPPPPPPPPPPPPPPPPPRRRRRRRRRRRR
E    **GGIJJJKOPPPPPPPPPPPPPPPPPPPRRRRRRRRRRRR
S    A*EFHIIIJNOOOOOOOOOOOOOOOOOOPRRRRRRRRRRRR
T    B*BBDEFFGKLLLLLLLLLLLLLLLNNOPRRRRRRRRRRRR
    -F***BCDDEIJJJJJJJJJJKKKLLNNOPRRRRRRRRRRRR
S    HBB**ABBCGIIIIIIIIIIKKKLLNNOPRRRRRRRRRRRR
E    JDDB*AAACGHIIIIIIIIIKKKLLNNOPRRRRRRRRRRRR
Q    JDDB*****BDDDDDDDEGHKKKLLNNOPRRRRRRRRRRRR
U    MHHFEEEC**AABCCCDEGHKKKLLNNOPRRRRRRRRRRRR
E 10+MHHHHGGFB*AABCCCDEGHKKKLLNNOPRRRRRRRRRRRR
N    MHHHHGGFB**ABCCCDEGHKKKLLNNOPRRRRRRRRRRRR
C    MHHHHHHGBA*ABCCCDEGHKKKLLNNOPRRRRRRRRRRRR
E    MHHHHHHGBA**BCCCDEGHKKKLLNNOPRRRRRRRRRRRR
     MHHHHHHGCA**BCCCDEGHKKKLLNNOPRRRRRRRRRRRR
A   -MHHHHHHGCA**BCCCDEGHKKKLLNNOPRRRRRRRRRRRR
     MHHHHHHGCA**BCCCDEGHKKKLLNNOPRRRRRRRRRRRR
     MHHHHHHGCA**BCCCDDGGJKKKKMNOORRRRRRRRRRRR
     MHHHHHHGCAA**AAABBEFIIIJJLLNNQQQQQQQQQQQQ
     MHHHHHHGCCCA****ABDEHIIIILLMNPPPPPPPPPPPP
  20+MHHHHHHGDDDBAAA***CDGGHHHKKLMOOOOOOOOOOOO
     MHHHHHHGEEECCBBAA*ABEFFGGIIKKNNNNNNNNNNNN
     MHHHHHHGGGGEDDDCC***CDDEEGHIJMMMMMMMMMMMM
     MHHHHHHHHHHFEEEDDA**CCDDDGGIILLLLLLLLLLLL
     MHHHHHHHHHHFEEEDDA**CCCDDGGHILLLLLLLLLLLL
    -MHHHHHHHHHHFEEEDDA**AABBBEFGHKKKKKKKKKKKK
     MHHHHHHHHHHFEEEDDA*****AADDFGJJJJJJJJJJJJ
     MHHHHHHHHHHFEEEDDA*****AADDFGJJJJJJJJJJJJ
     PLKJHHHHHHHFEEEDDA*******AACDGGGGGGGGGGGG
     PNNNNNNNNNNMLLLKKIHEDDDD*****CCCCCCCCCCCC
  30+QQQQQQQQQQQPOOONNLLIHHHHEECB*BBBBBBBBBBBB
     RRRRRRRRRRRQPPPOOMMJJIIIFFDC*BBBBBBBBBBBB
     RRRRRRRRRRRQPPPOOMMJJIIIFFED*AAAAAAAABBBB
     RRRRRRRRRRRQPPPPONMJJJIIGGED*AAAAAAAABBBB
     RRRRRRRRRRRQQPPPPNMKJJJIGGED*AAAAAAAABBBB
    -RRRRRRRRRRRQQQQPPNNKKJJJHGFE******AAABBBB
     RRRRRRRRRRRRQQQPPNNKKKJJHGFEAAAA**AAABBBB
     RRRRRRRRRRRRQQQPPNNKKKJJHGFEAAAA****AABBB
     RRRRRRRRRRRRQQQPPNNKKKJJHGFEAAAAAAA*AAAAA
     RRRRRRRRRRRRQQQPPNNKKKJJHGFEAAAAAAA*AAAAA
  40+RRRRRRRRRRRRQQQPPNNKKKJJHGFEAAAAAAA******

Figure 2

In this example, the threshold values for the coding of the H-matrix are equally spaced on a logarithmic scale. All those points on the matrix coded with an A have an H-value between 1.0001 and 1.0347, those coded B lie between 1.0347 and (1.0347)^2, and so on. The maximum H-value is 1.8474. More details on the various options associated with the H-matrix are given in the READ.ME file available with PCSLOT. Multiple solutions are the rule rather than the exception when only one variable is measured, as here. The H-matrix contains two lakes, a narrow one ranging from horizons 13 to 18 in A, and a larger triangular one bounded by horizon 28 in A. This particular boundary is no coincidence, as the Y-variable for this horizon was deliberately mis-recorded as 0.876, rather than the "correct" 0.476. There are 5 different paths across or along the upper lake, and at least 15 alternative paths across the lower lake. Each combination of paths represents a slotting with the minimum CPL, so there are at least 75 alternative optimal slottings! The PCSLOT program gives just one of these, shown graphically in Fig. 3. When there are multiple solutions, PCSLOT generally chooses one corresponding to a path along the shore of each lake. [Figure 3 plotted here] 5. Additional Constraints. 
PCSLOT has the facility for the user to specify additional constraints on the combined sequence, for example that horizons A5 and B11 must be immediately adjacent. This constraint implies, for example, that B13 must not be immediately adjacent to A5, remembering that the horizons within each sequence must be kept in their correct order. In geometric terms, any path on the grid which passes through the point (5,13) corresponds to a prohibited slotting, since it violates the condition that A5 and B11 must be adjacent. Accordingly, H(5,13), the value of the H-matrix at (5,13), is set to infinity, and in PCSLOT the corresponding point on the coded H-matrix is printed as a `+'. In fact, there will be a band of points on the grid corresponding to similar invalid or prohibited slottings, for example (5,14), (5,15), ... (7,11), (8,11), ..., (3,12), (4,12) to name just a few. There will be a corresponding band of + signs in the coded H-matrix. In this case, this diagram may be visualised as a contour map of the Grand Canyon, with the +'s indicating the surrounding plateau. Each of the six different types of additional constraints available in PCSLOT will produce a similar band of + signs, indicating invalid slottings with infinite H-value.

In an extreme case, the imposed constraints could be mutually contradictory. In this case, the H-matrix would consist entirely of + signs, but PCSLOT stops beforehand, and prints a warning message to indicate there is no slotting which meets all the imposed constraints, let alone an optimal slotting.

6. Sensitivity Analysis.

The fourth question posed in the Introduction can be answered by a sensitivity analysis which involves leaving out one horizon at a time, and seeing what happens. For each horizon Aj in Sequence A in turn, PCSLOT temporarily deletes that horizon, and finds the optimal slotting between the remainder of Sequence A and the full sequence B.
This is then compared with the optimal slotting based on the full data, by means of three summary statistics, which it calls CSTAT, MSTAT and NSTAT. CSTAT is the reduction in optimal CPL when horizon Aj is omitted from Sequence A, while MSTAT and NSTAT give different measures of how much the A's and B's are shuffled about when Aj is omitted. NSTAT counts up the number of times an A is swapped for a B, and vice versa, while MSTAT gives the total distance the A's have moved.

Consider the following simple example in which we consider the effect of omitting A5. The first line gives the optimal combined slotting based on the full data (m = 6, n = 4) but for the purpose of the subsequent comparison, A5 is not shown. The second line shows the optimal combined sequence for the reduced sequence {A1,A2,A3,A4,A6} against sequence B.

     A1 A2 B1 B2 A3 B3 A4 B4 A6
     A1 A2 A3 B1 B2 B3 B4 A4 A6

There are a total of 4 positions in the combined sequence (the third, fifth, seventh and eighth) where there is an A in one sequence and a B in the other, so NSTAT = 4. Horizons A1 and A2 are in the same position in both combined sequences, but A3 has been moved by 2 positions (to the left), and A4 has been moved 1 position (to the right). In this case, MSTAT = 2 + 1 = 3.

Large values of CSTAT indicate that the corresponding horizon Aj has a significant effect on the CPL. In such a case, MSTAT and NSTAT need not necessarily be large as well. Indeed, it is possible to have a large value of CSTAT, but zero values of both MSTAT and NSTAT, indicating that the omission of Aj has no effect on the actual slotting.

Fig. 4 shows part of the Sensitivity Analysis produced by PCSLOT for the test data used in Section 3. Only the columns headed J, CSTAT, MSTAT and NSTAT should be looked at. This output confirms that the erroneous Y-value for A28 has a big influence on the CPL, as judged by CSTAT = 0.5567. All the remaining CSTAT values (except two) are identically zero, as might be expected from the smooth curve in Fig. 3.
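
The NSTAT and MSTAT counts in the simple A5 example above can be checked with a few lines of Python. This is only a sketch based on the verbal definitions given in the text; it is not PCSLOT's own code, and the function names nstat and mstat are hypothetical. It assumes the two combined sequences have the same length and that the omitted horizon has already been left out of the first one.

```python
def nstat(full, reduced):
    """Count positions where one combined sequence has an A and the other a B."""
    return sum(1 for x, y in zip(full, reduced) if x[0] != y[0])


def mstat(full, reduced):
    """Total number of positions by which the A horizons have moved."""
    position_in_reduced = {name: i for i, name in enumerate(reduced)}
    return sum(
        abs(i - position_in_reduced[name])
        for i, name in enumerate(full)
        if name.startswith("A")
    )


# Combined sequences from the worked example (A5 not shown in either line).
full = "A1 A2 B1 B2 A3 B3 A4 B4 A6".split()
reduced = "A1 A2 A3 B1 B2 B3 B4 A4 A6".split()

print("NSTAT =", nstat(full, reduced))  # NSTAT = 4
print("MSTAT =", mstat(full, reduced))  # MSTAT = 3
```

The printed values agree with the hand count: four A/B mismatches, and A3 and A4 displaced by 2 and 1 positions respectively.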
In situations where there is just one Y-variable (as here), the MSTAT and NSTAT figures should be ignored, because they vary according to which of the alternative multiple solutions was selected.

In rare cases, CSTAT, the reduction in CPL, can actually be negative, implying that the path length is longer when there is one less point to go through! This apparent anomaly can occur when additional order constraints are imposed.

PCSLOT does the Sensitivity Analysis for Sequence A only. To do it for Sequence B, interchange the sequences, so that what was A becomes B and vice versa. Any additional order constraints must be changed accordingly. For example, if horizon A50 in the original Sequence A must precede horizon B58 in the original Sequence B, then with the new interchanged A's and B's, B50 must now precede A58.

SENSITIVITY ANALYSIS     TEST SEQUENCE A
TOLERANCE FOR MULTIPLE SOLUTIONS = 0.10E-03

  J   CSTAT  MSTAT  NSTAT  JD: (Type, Opt.-K)....
  1  0.0226      7      6  1: 2 0;
  2  0.0000      7      6  1: 1 0; 4 1;
  3  0.0000      7      6  1: 1 0; 4 1;
 ..  ......    ...    ...  ...................
 ..  ......    ...    ...  ...................
 25  0.0000      3      4  1: 2 18;
 26  0.0000      3      4  1: 2 18;
 27  0.1592      2      2  2: 1 18; 2 18; 4 19;
 28  0.5567     10      4  1: 1 22;
 29  0.0000      2      2  2: 1 18; 1 19; 1 20; 1 21;
 ..  ......     ..    ...  .....................
 ..  ......     ..    ...  .....................
 38  0.0000      0      0  2: 1 32; 1 33; 4 35;
 39  0.0000      0      0  1: 2 35;
 40  0.0000      0      0  1: 1 35;

Figure 4

In conclusion, correlating or cross-matching two sequences of data by means of sequence-slotting is a complex process, and the results of any sequence-slotting algorithm should always be assessed carefully. The H-matrix provides a simple but informative graphical method for assessing the reliability of the optimal solution, for comparing near-optimal matchings, and for showing any multiple solutions. The sensitivity analysis complements this, by finding those observations having a big influence on the final result.

References.

Clark, R.M. 1992.
Sequence comparisons and sequence-slotting. INQUA - Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 8:3-6.

Gordon, A.D., Thompson, R. and Clark, R.M. 1988. The use of constraints in sequence-slotting. Data Analysis and Informatics, V (ed E. Diday), North Holland. pp. 353-364.

A DESCRIPTION OF THE NOAA PALEOCLIMATOLOGY PROGRAM AND WORLD DATA CENTER-A FOR PALEOCLIMATOLOGY

Robert S. Webb rsw@mail.ngdc.noaa.gov
Jonathan T. Overpeck j.overpeck@omnet.nasa.gov
David M. Anderson dma@mail.ngdc.noaa.gov
Bruce A. Bauer bab@mail.ngdc.noaa.gov

The establishment of the World Data Center-A (WDC-A) for Paleoclimatology was announced recently at a WDC meeting in Beijing, following recommendations by the U.S. Committee on Geophysical Data, and with the endorsement of the International Council of Scientific Unions (ICSU) and the International Geosphere Biosphere Programme (IGBP). With this announcement, the U.S. National Oceanic and Atmospheric Administration (NOAA) National Geophysical Data Center (NGDC) Paleoclimatology Program has been expanded in scope to serve as a coordination center for paleoenvironmental data management activities needed by the international research community.

An immediate goal of the WDC-A for Paleoclimatology in Boulder is to join IGBP Past Global Changes (PAGES) in coordinating the design and implementation of a global, science-driven data management system that integrates all types of paleoenvironmental data needed by the international global change community to identify the patterns and the causes of past climatic and environmental change. Data are the basic material for this research and must be integrated into a form that can be easily accessed and used by all who need them.
The Paleoclimatology Program is working closely with the PAGES Core Project Office in Bern, Switzerland, as well as the IGBP-Data and Information System (DIS) to develop an easily accessible international data system for the acquisition, management, and distribution of paleoenvironmental data. These data include primary data (e.g., raw tree-ring measurements, fossil counts, isotopic measurements); secondary data developed from the raw data (e.g., tree-ring chronologies, fossil percentages, isotopic ratios as a function of age); and tertiary information inferred from the primary and secondary data (e.g., paleoclimate estimates, sea-surface temperature or paleovegetation reconstructions). Also archived are some modern calibration data needed to convert primary and secondary data into quantitative estimates of past climate, ocean, or biosphere conditions; time series of hypothesized climate forcing (e.g., solar, volcanic, trace-gas, or astronomical changes); climate boundary conditions through time (e.g., ice extent and height, land surface characteristics); and output from atmosphere, ocean, and biosphere models. On the national front, paleoclimatic data management efforts between NOAA, the U.S. National Science Foundation (NSF), and the U.S. Geological Survey (USGS) are being coordinated so that all data generated with U.S. federal assistance can be placed in the public domain quickly and in a form that is easy to share. The program actively encourages formal efforts to coordinate with specific proxy data communities to build new or expanded databases, including those for fossil pollen data, packrat midden data, plant macrofossil data, ice-core data, coral data, tree-ring data, paleosol data, paleovegetation data, past sea-surface data, loess data, lake level data, and climate model boundary conditions and simulations.
WDC-A and PAGES efforts also serve to provide the long records of past environmental change that are needed to separate natural from human induced (i.e., greenhouse) climatic change. Another important application of paleoenvironmental data is in the area of model development and validation. The paleoclimatic record is integral to understanding the mechanisms of climatic change. This understanding must be built into predictive models. Furthermore, a critical test of the ability of these models to simulate realistic change is to attempt to simulate past change. Toward this end, a large international effort has been established to compare the ability of 12 major climate models (GCMs) to simulate known climatic conditions at selected times in the past. The NATO-sponsored Paleoclimate Model Intercomparison Project (PMIP) has recently designated the Paleoclimatology Program to archive and distribute digital boundary conditions files (e.g., sea surface temperatures, terrestrial ice sheet height and extent, land surface properties) for GCM simulations of 6000 and 18,000 yr B.P. The data center also will work closely with PMIP to provide paleoclimate estimates that can be used in assessing the simulations. The World Data Center-A contains an ever-growing collection of paleoclimate datasets. The goal in archiving these datasets is to make them easily available to the international community. The WDC-A is working to promote data exchange and sharing at the international level and can provide the assistance to make data sharing easy. Within the U.S., most funding agencies (e.g., NOAA, NSF, USGS) already require that research results (data) be made available to the public within a reasonable period of time. 
The list of paleoclimate datasets previously archived includes the International Tree-Ring Databank (ITRDB), western North American climate reconstructions, the CLIMAP data, the SPECMAP Archive #1, orbital variations and insolation data, select lake varve sediment data, ice core data, fossil planktonic foraminifera abundances, and marine carbonate stratigraphies. Recent updates have been received by the WDC-A to the ITRDB, to the orbital forcing and insolation dataset, and to the Quelccaya Ice Cap data. Newly contributed datasets include: the COHMAP eastern North America fossil pollen data; SPECMAP Archive #2; the Vostok Ice Core carbon dioxide, methane, dust, and temperature reconstructions; selected time series from the Bradley and Jones book "Climate Since A.D. 1500"; Russian fossil planktonic foraminifera abundances; high resolution ENSO coral records; the Barbados sea level record; and the Barbados U/Th-14C calibration data. On the horizon, the WDC-A already has commitments to receive additional paleoclimate datasets including: western North America fossil packrat midden pollen and macrofossil data; the Oxford/COHMAP lake level database; terrestrial ice sheet extent (size and height) and shorelines at the last glacial maximum; and NASA/GISS GCM simulations for 6K and 18K. Workshops and data cooperatives have been funded by NOAA to help focus the international scientific community on compiling other important paleoenvironmental datasets including high resolution coral records, global ice core data, North American macrofossil data, paleolimnological data, information on the last interglacial in the Arctic and sub-Arctic, global vegetation at the last glacial maximum, and late Pleistocene paleosols. The goal of the Paleoclimatology Program is to distribute data in easy-access formats at a minimal cost. 
The center remains committed to ensuring that data are available to all interested users and therefore will continue to distribute data in ASCII format on diskettes for DOS, UNIX, and Macintosh machines at the cost of reproduction and distribution. The center will continue to try to accept contributions of digital paleoenvironmental data in any logical file format from Macintosh, DOS, or UNIX machines. Data on magnetic media or over FTP/INTERNET are acceptable. Preferred formats for contributing data to the WDC-A are: 1) standard ITRDB ("Tucson format") for tree ring data, 2) Tilia Graph for pollen data, 3) CLIMAP structured ASCII files for deep sea fauna, 4) tab or space delimited ASCII, 5) commercial spreadsheets (e.g., LOTUS, EXCEL, with complete documentation). The WDC-A for Paleoclimatology will try hard to make data submission easy. The WDC-A solicits contributions of all paleoenvironmental datasets for archiving and distribution and welcomes any suggestions on how to make the international sharing of paleoclimate data easier.

The Paleoclimatology Program is also in the process of developing data access software, PaleoVu, to let users browse and visualize all archived paleoclimate datasets. PaleoVu is a graphical tool to display and access data on a variety of different platforms, and is near completion for Microsoft Windows, with versions soon to follow for the Macintosh and OPEN LOOK for the X Window System (Sun SPARC Stations). PaleoVu will display data geographically as maps of site locations and/or mapped reconstructions of paleoenvironmental conditions for select time intervals. The user will be able to select data through a variety of filters as a function of data type, region, or temporal coverage, preview data as graphs, and then extract data for export in user-prescribed formats. A prototype of PaleoVu will be available in 1993.
A major advance in data accessibility has been the establishment of an ANONYMOUS FTP / INTERNET server that can be used to obtain all datasets free of charge. Data distribution utilizing Internet or other linked wide area networks permits users to browse and download data archived by the NOAA/NGDC Paleoclimatology Program / World Data Center-A (WDC-A) for Paleoclimatology.

To LOGON:
     FTP NGDC1.NGDC.NOAA.GOV -or- FTP 192.149.148.121
     Enter anonymous as your login name, and your e-mail address as password
     You should now see an FTP> prompt

Sample commands:
     FTP> cd paleo               (change to /paleo directory)
     FTP> ls                     (display data file list on screen)
     FTP> cd climap18            (change to the CLIMAP 18,000 BP data directory)
     FTP> get climap18.readme    (copy readme file to your computer)
     FTP> mget sst*              (copy all SST files to your computer)
     FTP> cd                     (change back to the 'root' directory)
     FTP> cd pub                 (change to the public directory)
     FTP> put mydata             (send to NGDC your file "mydata")
     FTP> quit                   (end FTP session)

For information on the program or to be added to the mailing list contact Mrs. Mildred England, NOAA National Geophysical Data Center, Paleoclimatology Program/World Data Center-A for Paleoclimatology, 325 Broadway, E/GC Boulder, CO 80303 USA. [Telephone: (303) 497-6227; E-mail: mke@mail.ngdc.noaa.gov]

DATABASING THE WORLD

David G. Green
Centre for Information Science Research
Australian National University
GPO Box 4 Canberra 2601 AUSTRALIA
E-mail: david.green@anu.edu.au

Computers are forever challenging us with new ways of doing science. Now that computers are a familiar sight in the laboratory, the next challenge is to adapt to thousands of computers all joined together. Even by conservative estimates, thousands of institutions and perhaps millions of researchers are now served by Internet (Krol, 1992), a vast communications web that links together computers all around the world. The services and information available on Internet are astounding.
Access to world-wide electronic mail and electronic newsgroups covering hundreds of topics is just the beginning. Being connected to Internet means having the resources of literally thousands of computers at your fingertips. Recognizing the advantages of free information exchange, many computer sites now allow guest logins by users over the network. What is more, they make available various data, software and services that can be freely copied or used. The following examples can only hint at the incredible range of information already available:

* On-line access to telephone directories, bibliographies and library catalogs in many parts of the world.
* Free software - many sites maintain libraries of public domain software. The Free Software Foundation at MIT develops and distributes high-quality, free software under its GNU Project.
* Molecular biology databases, software and bibliographies - the Australian National Genomic Information Service (ANGIS) at the University of Sydney maintains up-to-date copies of the major databases.
* Satellite and weather data - the University of New Mexico alone makes available 90 gigabytes worth!
* Geographic data - electronic atlases, census data and summaries such as the CIA World Databank and Factbook (maps, facts and figures about every country in the world).
* Electronic texts - Project Gutenberg, a public domain project, produces electronic versions of English language texts, ranging from Roget's Thesaurus and the Complete Works of Shakespeare to the CIA World Factbook and US Census.

[Figures 1 and 2 plotted here]

For several years now the basic means of accessing files across the network has been FTP ("File Transfer Protocol"). Network archives use the "anonymous ftp" protocol. For example

====================================================
ftp life.anu.edu.au              (logging in to the site LIFE at)
                                 (the Australian National University)
Connected to life.anu.edu.au.
220 life FTP server (SunOS 4.1) ready.
Name (life.anu.edu.au:david): anonymous
331 Guest login ok, send ident as password.
Password:                        (give your electronic mail address)
230 Guest login ok, access restrictions apply.
ftp> ls                          (gives you a directory listing)
ftp> cd /pub/biomathematics      (changes directory to /pub/biomathematics)
ftp> bin                         (changes mode to binary)
ftp> get polsta.zip              (retrieves the file polsta.zip)
====================================================

The number of network archives has grown rapidly, so that finding information, or even knowing what is available, among the thousands of sites is extremely time-consuming. ARCHIE resolves this problem by providing a database of the contents of all known sites. These databases are provided at several major sites, such as archie.au (Australia), archie.funet.fi (Finland) and archie.mcgill.ca (Canada). They can be queried either by logging in directly via Telnet (using the name "archie"), or by electronic mail (e.g. to archie@archie.au) with the message consisting of keywords (e.g. "help"). For example

====================================================
telnet archie.au                 (connecting to the local archie server)
Trying 139.130.4.6 ...
Connected to archie.au.
Escape character is '^]'.
SunOS UNIX (plaza.aarnet.EDU.AU)
login: archie                    (log in name is "archie", no password)
YOU ARE RUNNING ON ARCHIE.AU (sometimes known as plaza.aarnet.EDU.AU)
If you have any problems with archie, send mail to ccw@archie.au
This machine is a brand spanking new SparcStation 2 purchased by AARNet
funds to further serve the AARNet community. The machine lives directly
on the AARNet backbone so should provide excellent connectivity to all
points of AARNet.
archie> help                     (asking for help)
Help gives you information about various topics, including all the
commands that are available and how to use them.
... etc.
archie> quit                     (finishing a session)
====================================================

User-friendly interfaces, such as XARCHIE (Fig.
1), now make it possible to locate and retrieve files at the touch of a button. Recently, several other protocols have appeared that allow a more systematic approach to searching the network. They include WAIS (Wide Area Information Servers), World Wide Web and Gopher. In Gopher, which has spread the fastest, the user retrieves files by selecting from a menu. Since the menu normally includes links to other Gopher servers, it is possible to hop from site to site. The recent introduction of an indexing system (Veronica) means that users can create and use customized menus "on the fly" (Fig. 2).

Network publication. With publication delays often running into years, researchers are increasingly turning to Internet to distribute their results quickly. Furthermore the sheer number of journals means that published work is often missed by other researchers. Electronic collections of papers and references provide a way to communicate research results and innovations. Network publications (e.g. electronic journals) need not be limited to the text and figures of traditional paper publications. Other material can include bibliographies, databases, and software. For instance, the fastest and simplest way to distribute software is to make it freely available on Internet. At present the main drawback to electronic publication on Internet is lack of formal recognition. However, librarians, publishers and site managers are now working on such issues as registering electronic publications and establishing repositories for electronic publications.

Coordinating research. Perhaps the most profound effect of Internet on science has been to usher in an era of cooperative science on a scale never seen before. In some areas of research, notably molecular biology, distribution of information over Internet has grown explosively. The most visible result is the appearance of international, public-domain databases such as Genbank and EMBL.
As these databases become ever more enormous, working scientists are coming to rely on them as sources of reference. In molecular biology it is already standard practice to compare newly derived sequences against existing ones in the major databanks. Many journals (e.g. Nature) now demand that results be submitted to one of these network databases as a precondition for publication. Contributing to network databases makes both economic and scientific sense. We cannot afford the luxury of carrying out research in piecemeal fashion. Given limited resources, it is essential to make maximum use of every piece of available data. Data that is used only once is like a disposable soft drink bottle - good things come out of it, but thereafter it is junk. The archives of the world's institutions are full of this refuse from uncoordinated research. Ideally, the results of every piece of research should not only answer an immediate question, but also contribute data to a larger scientific jigsaw. Many topical issues, such as biodiversity, are crying out for cooperative databases to support both research and decision-making. Cooperative databases convey many advantages. Previously difficult studies become easy. Completely new kinds of study become possible and there is a significant serendipity effect that emerges as data are combined in new ways. For instance, comparative studies of molecular databases have already yielded new insights about gene families and the mechanisms of evolution. There is every reason to expect that databases in other fields of biology will prove equally as fruitful. Potential network projects in Quaternary science. There are several possible kinds of public domain databases that could be set up on the network to serve Quaternary studies. 
They include
- compilations of useful software;
- electronic databases of scientists working in the field;
- pollen identification keys (including images);
- annotated bibliographies of relevant publications;
- abstracts of recent publications;
- compilations of data (e.g. complete pollen site records).
Each of the above kinds of database would contribute materially to existing Quaternary research projects.

Public domain databases usually conform to IAFA standards (Internet Anonymous FTP Archive). They are normally characterized by the following features:

COORDINATION - There is a controlling agency or organization that manages the database, receives and processes new entries, and communicates relevant news to its users.

PARTICIPATION - Anyone may contribute data to the database. Major databases announce new entries via special newsgroups or mailing lists.

ACCESS - Anyone may access, copy or use the database at any time. Normally access is via a computing network using a standard protocol.

STANDARDS - Contributors must use standard fields and attributes in submissions (e.g. Croft, 1989). This standard must be well-defined and should be publicized as widely as possible (see below). Usually it is expressed as a submission form (electronic, printed, or both) that is filled in by contributors.

FORMAT - Textual data (including bibliographies, mailing lists etc) are normally submitted and stored as ASCII files in tagged field format (see Appendix). The database may be compressed, using standard utilities, to simplify network transfer. Images should be in one of the common formats in use, such as GIF (Graphic Interchange Format).

QUALITY CONTROL - Users need some guarantee that data provided in a database are both valid and accurate (Green, 1991, 1992). Quality control checks can be applied by database contributors, coordinators, or users - preferably all three.

ACKNOWLEDGEMENT - Every entry should include an acknowledgement of its contributor.
This is essential to the notion that contributions are a form of publication.

AGREEMENTS - There should be an explicit list of terms and conditions that contributors and users must agree to. Notably, users agree to acknowledge the project and to waive liability for any use they make of the data. Contributors agree to place their data in the public domain.

LIFE at the Australian National University. The Australian National University Bioinformatics Facility provides a wide range of biological information and software through its Internet anonymous FTP archive:

     site       life.anu.edu.au
     login      anonymous
     password   (your email address)
     directory  /pub and its subdirectories

Current topics include biodiversity, bioinformation, complex systems, landscape ecology, molecular biology, and neurophysiology. For instance, we are developing prototype network information systems and protocols for the International Organization for Plant Information (IOPI), which aims to document the distributions of the world's plants. Freely available to all pollen analysts, for instance, is the program POLSTA (/pub/biomathematics/polsta_ .zip), described in previous issues of this newsletter, which is an interactive PC package that provides tools for analysis and modelling of pollen time series.

References.

Deutsch, P. (1992). Publishing Information on the Internet with Anonymous FTP. IAFA DOC II.

Green, D.G. (in prep.). Databasing diversity - a distributed, public-domain approach. In preparation.

Krol, E. (1992). The Whole Internet. O'Reilly and Associates.

APPENDIX

% ---------- < START : Cut here > ----------
#####
% Part 1 CONTACT REGISTRATION
%
% Please complete this registration form about
% the source of this dataset.
% This information is needed for the following
% reasons:
% - identifying who contributed the dataset
% - identifying who produced and/or maintains
%   this dataset
% - telling users whom to contact regarding
%   this dataset
% - linking together information from the
%   same source
%
SOURCE        Name of person or organization who produced the dataset
CONTACT       Name of person or organization to contact about the dataset
EMAIL         Electronic mail address for queries about the dataset
ADDRESS       Postal address for correspondence about this dataset
PHONE         International telephone number
FAX           International fax number
% Part 2 DATASET REGISTRATION
%
% Please complete this registration form about
% the data set. This information is crucial
% for the following
% - Demonstrating the validity of the data
% - Defining the methodology & data lineage
%   for future users
% - Identifying this study for all records
TITLE         Give a short descriptive name for the database.
DATE          When was the dataset last revised?
PURPOSE       Why were the data collected? and how?
SOURCES       For compilations, indicate the original datasets
COMPILER      Who is responsible for compiling/upkeeping the data?
STANDARD      What standard format (if any) does the dataset conform to? (e.g. Genbank entry)
PROGRAMS      Name any special software used to read/manipulate the dataset.
REFERENCES    Give details of relevant publications (e.g. methods, uses)
  AUTHOR      Name(s) of the author(s)
  TITLE       Name of the book or article
  PUBLICATION Details of book, journal, publisher, volume & pages.
VALIDATION    What checks were applied to ensure that data are correct?
ASSOCIATIONS  Name any other related data sets. (e.g. t001.dat etc)
COMMENTS      Mention any important issues not covered above.
[*p.16 / p.17*] % Part 3 Methodology - repeat as many times as % necessary TAXON Start description of methods CODE Taxon code to use in the data records FAMILY GENUS SPECIES % Part 4 DATA RECORDS - repeat as many times % as necessary RECORD Start of new record DATE TAXA List of taxon codes SITES Landscape units for this record ##### ---------- < STOP : Cut here > ---------- NEW BOOKSHELF 6 H. J. B. Birks E-mail: birks@cc.uib.no The following recently published books may be of interest to readers of this Newsletter. American Statistical Association 1991 Proceedings of the Section on Statistics and the Environment. American Statistical Association, Alexandria. 242 pp. Paperback. F. L. Bookstein 1991 Morphometric tools for landmark data geometry and biology. Cambridge University Press, Cambridge. 435 pp. D. E. G. Briggs & P. R. Crowther (Eds.) 1992 Palaeobiology. A synthesis. Blackwell, Oxford. 583 pp. Paperback. P. J. Brockwell & R. A. Davis 1991 ITMS: an interactive time series modelling package for the PC. Springer-Verlag, New York. 104 pp. Paperback with diskette. J. C. Duplessy, A. Pons, & R. Fantechi 1991 Climate and global change. Commission of the European Communities, Brussels. 357 pp. Paperback. R. C. Eberhart & R. W. Dobbins (Eds.) 1990 Neural network PC tools - a practical guide. Academic Press, San Diego. 414 pp. C. N. Hewitt (Ed.) 1992 Methods of environmental data analysis. Elsevier Applied Sciences, London & New York. 309 pp. R. A. Johnson & D. W. Wichern 1992 Applied multivariate statistical analysis (Third edition). Prentice Hall, London. 642 pp. Paperback. M. Kent & P. Coker 1992 Vegetation description and analysis. A practical approach. CRC Press, Boca Raton. 363 pp. D. G. Kleinbaum 1991 Logistic regression module series. University of North Carolina at Chapel Hill. 6 parts. Paperback. G. V. Middleton 1991 Nonlinear dynamics, chaos and fractals with applications to geological systems. Geological Association of Canada Short Course Notes 9, 235 pp. 
Paperback with diskette.

National Research Council 1991 Spatial statistics and digital image analysis. National Academy Press, Washington DC. 234 pp. Paperback.

L. O'Brien 1992 Introducing quantitative geography: Measurement, methods and general linear models. Routledge, London & New York. 356 pp. Paperback.

L. Orloci 1991 CONAPACK: program for canonical analysis of classification tables. SPB Academic Publishing, The Hague. 126 pp. Paperback.

R. L. Peters & T. E. Lovejoy (Eds.) 1992 Global warming and biological diversity. Yale University Press, New Haven and London. 386 pp.

M. Reeves 1989 Microcomputer graphics for geoscientists. Geological Association of Canada, Short Course Notes 5, 149 pp. 9 diskettes. Available from Geological Association of Canada, Department of Earth Sciences, Memorial University of Newfoundland, St John's, Newfoundland, Canada A1B 3X5.

H. H. Shugart, R. Leemans, & G. B. Bonan 1992 A systems analysis of the global boreal forest. Cambridge University Press, Cambridge. 565 pp.

[*p.17 / p.18*]
A. T. Walden & P. Guttorp (Eds.) 1992 Statistics in the environmental and earth sciences. Edward Arnold, London. 306 pp.

MATERIAL AVAILABLE BY ANONYMOUS FTP FROM GEOLOGY.WISC.EDU

A number of you have visited the new INQUA file boutique at geology.wisc.edu that was announced in Juggins et al. (1992). I want to provide a skeleton listing of what is there at the start of 1993. The computer's Internet address number has changed (it is now 144.92.137.14) owing to a shuffle on our campus network, so I am stating again how it can be reached.

1. Make an Internet connection for file transfer protocol (ftp):
       ftp geology.wisc.edu   (or ftp 144.92.137.14)
2. Type anonymous as the userid, and YOUR e-mail address as your password.
3. The files are in /pub/inqua, so change to that directory:
       cd pub/inqua
4. Type help to see the allowable commands.
5. You can get any individual file ending in *.TXT as an ASCII text file; ASCII is the mode you are in when you log on.
   If you wish to obtain an executable file ending in *.exe you must toggle the system to binary mode by typing the command binary.
6. The complete listing of available files can be found in README.TXT. Transfer a file using get:
       get readme.txt   -- etc.
7. The command quit ends the ftp session.

The binary *.EXE files are self-extracting ZIPPED files. Once you get the file on your IBM PC or clone, simply type the file's name, and it will self-extract automatically.

Contents of Files in /pub/inqua (January 1, 1993)

Bibliographic data from Norway: bot1intr.txt, botbib1.txt, and botbib2.txt

H. J. B. Birks, Hazel Juggins and Magne Sætersdalan 1990. Annotated bibliography of numerical methods in Quaternary pollen analysis 1985-1989: Botanical Institute, University of Bergen, Allegaten 41, N-5007 Bergen, Norway. (References 1 to 660: ASCII text files: bot1intr.txt and botbib1.txt)

H. J. B. Birks and Heather A. Austin 1992. An annotated bibliography of numerical methods in Quaternary pollen analysis 1990-1991: Botanical Institute, University of Bergen, Allegaten 41, N-5007 Bergen, Norway. (References 661 to 910: ASCII text file: botbib2.txt)

BTA.EXE (Binary To ASCII utility by Eric C. Grimm. See INQUA-Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 8, July 1992. This is an executable program that can encode binary files to ASCII so that they can be sent by regular email. The receiver also needs a copy of BTA.EXE to decode the ASCII version back to the original binary file.)

MVSP2E.EXE (MultiVariate Statistical Package by Warren L. Kovach. See INQUA-Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 4, July 1990.)

NEWLTR-1.TXT, NEWLTR-2.TXT, etc. (ASCII copies (without figures) of the INQUA-Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletters #1 - #8.)

PALYPLT.EXE (Self-extracting package of PALYPLOT, Craig A.
Chumbley's programs for producing a printed pollen diagram. The program produces a batch file that can be imported into Generic CADD Level 3, Generic CADD 5, or Generic CADD 6. See INQUA-Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 5, January 1991, p. 2-4. Obtain a copy of README.TXT by ftp; it explains how to install PALYPLOT on your system.)

PCSLOT.EXE (Self-extracting package of the PC-version of the SLOTTING programs by Malcolm Clark. See INQUA-Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 8, July 1992, p. 3-6.)

POL-EGA.EXE   for IBM PC with EGA Graphics
POL-HERC.EXE  for IBM PC with Hercules cards
POL-VGA.EXE   for IBM PC with VGA Graphics
POL-VGAM.EXE  for IBM PC with monochrome VGA Graphics
[*p.18 / p.19*]
(Self-extracting packages of palynology programs of Louis J. Maher. See INQUA-Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletters 4 [July 1990], 5 [January 1991], 7 [January 1992].)

POLCNTPK.EXE (Self-extracting package of program by Louis J. Maher that turns an IBM PC into a bank of 100 counters for Pollen, Microfossils, etc. See INQUA-Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 8, July 1992.)

POLISH.EXE (Self-extracting package of PAL, the Palynology Database. See Ralska-Jasiewiczowa and Walanus in INQUA-Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 5, January 1991.)

PSIMPOLZ.EXE (Self-extracting package of PSIMPOLL programs by Keith D. Bennett. PSIMPOLL.EXE reads data files to produce PostScript for printing a pollen diagram. See INQUA-Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 7, January 1992 and Newsletter 8, July 1992.)

RAREFACT.EXE (Self-extracting package of RAREPOLL, RAREFORM, and RARECEP - J.M. Line and H.J.B. Birks.
This package contains the programs described and used in H.J.B. Birks & J.M. Line (1992) The use of rarefaction analysis for estimating palynological richness from Quaternary pollen-analytical data. The Holocene, 2, 1-10.)

README.TXT (The complete listing of the files on /pub/inqua. You are reading a very abridged summary.)

SLOTDEEP program of L.J. Maher. (This program is described on p. 21-26 of this Newsletter. Choose SLTDEPEG.EXE for EGA Graphics or SLTDEPVG.EXE for VGA Graphics.)

TRANZIP.EXE (Self-extracting package of TRAN AND ZONE Programs by Steve Juggins. See INQUA-Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 6, July 1991, p. 4-6.)

Reference.

Juggins, S., Kovach, W., and Maher, L. 1992. The data-handling internet. INQUA - Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 8:17-24.

THE SILICONE DATABASE

After mentioning the problems I was having with pollen mounted in silicone fluid (Newsletter #8), I received the following comments:

Keith Bennett (Univ. of Cambridge, England) reported: "...we have two sets of reference material mounted in [silicone]. The earliest must now be nearly 30 years old: produced by Alan Smith when he was in Belfast. I don't recall seeing any problems with grains on these slides. I think the sealant is beeswax. The other set is slides that we have made in the last 14 years. There are some problems on these, but we haven't tracked down what is causing it. They are sealed with paraffin wax. Suspicions centre on the sealant, air bubbles, temperature variations, and combinations of these."

Robert E. Nelson (Colby College, Waterville, Maine) reported: "As one of the relatively new members of the profession, my pollen reference slides only go back to about 1975 or so. I've noted significant swelling in my glycerine-jelly mounted materials (the oldest), though I switched to silicon oil (2000 cs.) about 1979.
Have usually used clear fingernail polish as a sealant, completely encircling the coverslip, though many slides are sealed with a circle of paraffin beneath the slip. Have checked my reference collection and have not seen any deterioration in at least a dozen of the older slides...." Valerie A. Hall (Queen's University, Belfast, Northern Ireland) wrote saying "About five years ago we reviewed...our type collection as many of the [silicone mounted] slides...contained crystals. In general these were not reducing the quality of the collection but we needed to know if the problem was getting worse with...time. The type collection was prepared about 25 years ago using both fresh and herbarium material. Most of the collections (over 1300 slides) is comprised of pollen types from the N-E European flora [*p.19 / p.20*] and includes spores...as well as pollen of Conifers and Angiosperms. The type collection [was] prepared using the standard procedures of Faegri but some material was acetolysed by the Erdtman method. No pollen was stained. After dehydration the samples were mounted in silicone fluid and sealed. Over 90% of the slides were sealed using beeswax but paraffin wax and clear nail polish were also used. Crystals are now present in almost every slide. Unfortunately we have no record of when this phenomenon was first observed. Their concentration varies but their size is usually about 4-5 micrometers. Where crystals obscured pollen grains we either made new type slides ourselves or bought from Cambridge. The preservation state of the pollen remains good with none of the deterioration observed by other workers." I also asked Ed Cushing to share his experience on this subject, and he sent this very enlightening article. SEALING COMPOUNDS FOR SLIDES OF POLLEN IN SILICONE OIL Edward J. Cushing Department of Ecology, Evolution & Behavior University of Minnesota 1987 Upper Buford Circle, St. Paul, MN 55108. e-mail: cushing@vx.cis.umn.edu. 
One of the advantages of silicone oil as a mounting medium for pollen analysis is its stability over time. Its volatility is very low, so slides may be left unsealed to permit greater control over turning grains for examination, measurement, and photomicrography. For reference slides that receive much handling, however, sealing the coverglass is desirable to prevent leakage of the oil. Unfortunately, many common compounds used for sealing slides react over time with pollen exines mounted in silicone oil. The experience in our laboratory with various compounds is summarized here, followed by our current recommendations. Fingernail polish was the first sealing compound we used. Pollen grains in slides sealed with nail polish begin to deteriorate noticeably after several months to several years. We first called it "nail-polish disease," but now I suggest "pollen pox," because the first symptoms are pits on the ektexine surface that resemble the pock marks of chicken pox and smallpox. These are especially noticeable on grains with fine sculpturing and thick exines, and they often appear first on the grains of Corylus that we add as a control to our reference preparations, following the recommendation of Faegri & Iversen (1964). With time, the exine softens and becomes plastic, as may be demonstrated by squashing a grain between coverglass and slide. Fine sculptural and structural details disappear. Ultimately the exine becomes completely amorphous, apertures disappear, and the grains round up into hollow or solid spheres. Early in the process the grains begin to adhere to the slide or coverglass and will no longer move when pressure is applied to the coverglass, which provides a useful test of quality. Most slides we sealed with nail polish deteriorated within three years, although a few are still serviceable after 30 years. 
We have since tested a variety of sealants, including various brands of fingernail polish, epoxy cements, silicone rubber cements, casein glues, varnishes, lacquers, waxes, and paints. Most have failed for various reasons, of which pollen pox is but one. An ideal sealing compound for pollen slides in silicone oil should (1) adhere firmly to glass, (2) be sufficiently fluid to flow under the coverglass and displace any air between the silicone oil and the edge of the coverglass, (3) be immiscible with silicone oil, (4) harden within a few hours with little shrinkage, (5) have no ingredients that will dissolve in silicone oil when the sealant is fluid and later crystallize out in the oil, (6) have no ingredients that will react with sporopollenin (as in pollen pox), (7) be stable over many years, (8) resist, after hardening, objective immersion fluids (immersion oil, anisol) and other solvents (water, xylene) used to clean slides, (9) be easy to mix and apply. No compound that we have tested meets all these criteria. One proprietary brand of epoxy cement, no longer available, was very satisfactory. Other epoxy formulations failed, but experimentation with this group of compounds should be fruitful. The cause of pollen pox remains uncertain, although circumstantial evidence points to dibutyl phthalate, a plasticizer commonly used in nail polishes, glues, and cements. Dibutyl phthalate does soften and dissolve exines experimentally. A mixture of dibutyl phthalate and silicone oil forms two phases, and pollen grains in the mixture occur preferentially in the dibutyl [*p.20 / p.21*] phthalate phase. In sealed slides, I presume the dibutyl phthalate diffuses from the sealant through the oil to the pollen. Other factors are involved, too. Sporopollenin varies in its resistance to pollen pox; in general, gymnosperm pollen and pteridophyte spores are less susceptible than angiosperm pollen. 
Slides containing many pollen grains, or large grains, may be little affected, presumably because the concentration of dibutyl phthalate (if that is the causative agent) is insufficient in relation to the quantity of sporopollenin. Grains that are hydrated (i.e., that have a high water content) are less susceptible than grains that are thoroughly dehydrated, perhaps because water molecules occupy the sporopollenin sites that the dibutyl phthalate would bind to. At present I recommend either of two imperfect sealants. Paraffin, applied under the coverglass, makes a neat and effective seal. However, paraffin wax when molten is soluble in silicone oil, and platy crystals of paraffin are likely to appear over time in the slide, sometimes obscuring the pollen grains. This can be minimized by heating the slide only just enough to melt the wax. We place small fragments of paraffin under the edge of the coverglass when it is placed on the slide and melt these over a very small gas flame, with care not to heat the silicone oil directly. Cool the slide quickly. Paraffin waxes with low melting points may be the best, but I have not tested this variable systematically. We are now using with success a commercial brand of latex enamel paint (Tru-Test latex gloss enamel). It has the proper viscosity, shrinks little on drying, and seals well. The paint is water-based, so that pollen in slides sealed with it is fully hydrated. However, the seal dissolves slowly in anisol and xylene and softens in water, so care in cleaning slides is necessary. Slides sealed with this paint four years ago remain in excellent condition, with no deterioration of pollen or extraneous crystals in the preparation. Unfortunately, the composition of the paint is proprietary and may vary from brand to brand and over time within a brand. The risk is that one may make thousands of slides with a particular sealant only to find, after several years, that they are deteriorating.
If others have experience with other kinds of sealants, I would be glad to hear about them.

SLOTDEEP.EXE: MANUAL CORRELATION USING THE DISSIMILARITY MATRIX

Louis J. Maher

At the end of a paleoecological investigation, the results have to be correlated with other sites to see how the new information ties in with the old. The methods by which correlation is done are extremely varied depending on which part of the geologic column is involved and what materials were studied. Those working in Holocene sediments may not have extinctions or first occurrences to help them, but pollen and diatoms do occur in prodigious numbers, and Carbon-14 is abundant throughout. The standard pollen zonation schemes of Europe and North America emphasize that the sequential changes in taxon composition in a single core are mirrored in other cores of the region. Before the middle of this century, zone boundaries in cores "dated" the sediment as surely as Carbon-14 does now. Pollen can still provide chronologies for material unsuited to carbon analysis. Anyone can recognize a sequence of pollen zones when they are pointed out on a diagram, but almost no two workers will agree exactly where one zone ends and another begins. This shows up in terms like "transition zone", "sub-zone", and "telescoped zones" as well as "veteran researcher" and "neophyte". This sort of problem can be explored by numerical analysis with the computer. Where can a sequence be divided to yield two adjacent parts that are most alike in themselves and most unalike between themselves? The zone boundaries are the datable entities, and this explains the efforts to define them. I was first introduced to the idea of using numerical analysis to correlate two sites by "slotting them together" from the work of John Birks (1979) and Alan Gordon (1973, 1980).
The ambiguous results I obtained with their FORTRAN program for sequence slotting, SLOTSEQ (Birks, 1979, Appendix 2), led me to develop SLOTSEE, a QuickBASIC program with graphics which shows the user the original diagrams, the slotted result, and a "map" of the dissimilarity matrix the computer algorithm uses to do the slotting. I needed to "see" the data to try to understand what the algorithm was doing, and why it sometimes produced strange results. It was through SLOTSEE that I first met (via e-mail!) Malcolm Clark, who told [*p.21 / p.22*] me how he was addressing the problem of "blocking" (slotting where long sequences from one core are inserted between long sequences of the other) and developing the H-matrix concept. As a result of that contact, I asked him to do two articles for the newsletter (Clark, 1992 & p. 5-10 - this issue) to discuss the problems and their solutions. I have come to realize there is a paradox involved when correlating by pollen zonation and correlating by slotting. Pollen zone boundaries are potential points of correlation; pollen zones often prevent effective slotting. Malcolm Clark (see p. 7) suggests the slotted route's best "combined path length" (CPL) through a dissimilarity matrix is analogous to a river running in a valley; the slotting path is sure when the valley is deep, and ambiguous when the valley widens to form a lake. In these terms, we can think of pollen zones as matrix lakes. The pollen zone, by definition, is a sediment sequence where the taxon abundances assume characteristic relative values that remain stable for a time. Almost any route through the zone (lake) would yield a very similar CPL. The actual CPL that is shortest might result more from minor fluctuations owing to random counting error than to contemporary vegetation change in the landscape.
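
The valley-and-lake picture can be made concrete with a small sketch. The Python below is not Gordon's SLOTSEQ algorithm and does not compute Clark's CPL; it is a simpler dynamic-programming stand-in (the function name cheapest_path and both example matrices are invented for illustration) that sums dissimilarities along a path from the core tops to the core bottoms, moving one sample down one core or the other at each step. It reports the cheapest total and how many different paths tie for it.

```python
def cheapest_path(d):
    """Return (cost, n_best) for monotone paths through matrix d.

    A path starts at d[0][0] (core tops), ends at d[-1][-1] (core
    bottoms), and moves one cell right or one cell down per step.
    cost is the smallest sum of visited cells; n_best counts how many
    distinct paths reach that sum (ties mean an ambiguous route).
    """
    rows, cols = len(d), len(d[0])
    cost = [[0.0] * cols for _ in range(rows)]
    ways = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                cost[i][j], ways[i][j] = d[0][0], 1
                continue
            options = []
            if i > 0:
                options.append((cost[i - 1][j], ways[i - 1][j]))
            if j > 0:
                options.append((cost[i][j - 1], ways[i][j - 1]))
            best = min(c for c, _ in options)
            cost[i][j] = best + d[i][j]
            # Count every predecessor that ties for the best cost.
            ways[i][j] = sum(w for c, w in options if abs(c - best) < 1e-12)
    return cost[-1][-1], ways[-1][-1]


# Invented matrices: a narrow "valley" of low values, and a flat "lake".
valley = [[0.1, 0.9, 0.9],
          [0.1, 0.1, 0.9],
          [0.9, 0.1, 0.1]]
lake = [[0.2] * 3 for _ in range(3)]

print(cheapest_path(valley))
print(cheapest_path(lake))
```

In the valley matrix only one route is cheapest; in the flat lake all six possible routes tie, which is the ambiguity described above.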
One of the problems with SLOTSEQ and SLOTSEE is that they rank the sediment sample sequence from 1 to n, but they discard the actual depth spacing of the samples in the sediment. It takes time to accumulate sediment, and--lacking independent information to the contrary--samples spaced farther apart should differ more in age than those situated close together. Surely that kind of information is too valuable to ignore. The silhouette diagrams drawn on the screen in SLOTSEE may look like pollen diagrams, but they are quite schematic; the samples are spaced evenly from top to bottom. That they look like pollen diagrams simply reflects the fact that most of us choose a sample interval and stick with it. The matrix map produced by SLOTSEE plots the samples from two sites in correct sequential order, but it also ignores their actual depths and spacing. SLOTSEE does not even read the sample depths from the sites' data files. I tried to remedy this situation by developing SLOTDEEP. This program is a superset of SLOTSEE. The user has the choice of the same three measures of dissimilarity (Manhattan Metric, Chord Distance, or 1-Spearman Rank). It contains the same Gordon algorithm (Birks, 1979), and it calculates the same results in the automatic mode. However, SLOTDEEP retains information about the sample stratigraphy and plots the pollen samples at their correct depth rather than merely in serial order--hence the title SLOTDEEP. The dissimilarity matrix also retains the samples' depth separation. One has the important option of using the dissimilarity matrix to correlate the two diagrams manually. The matrix map displays quantitatively the degree of dissimilarity--depending on the particular measure used--between each sample in one core and each sample in the other. The dissimilarity of the core tops is plotted at the "northwestern" corner of the matrix map, and the core bottoms plot at the "southeastern" corner.
The values of low dissimilarity tend to plot in a northwest to southeast trend when comparable cores are graphed (Malcolm Clark's river valley). This presentation of the pollen data differs markedly from the diagrams we normally use in picking zone boundaries; one is less likely to be influenced by the biases often brought to that task. I will give an example of how SLOTDEEP may be used to correlate two pollen sites from eastern Wisconsin. The sites are separated by 18 km; the pollen counts were done 15 years apart by different analysts. The Ernst Brothers Quarry Site (Maher, 1970) was sampled in 1965 from a section exposed in a sand pit. Stumps of Picea and Larix rooted in till and outwash gravel had been buried first under pond sediments and later by peat. The sequence extended from an arbitrary "floating zero datum" in the peat to a depth of 370 cm in the gravel. Pollen was recovered from 14 samples in the interval from 50 to 275 cm, extending from the peat to the soil of the lower stump layer. The quarry was excavated and closed by the early 1980's, but at least two carbon dates are available for wood from the lower stump layer: 12,410 ± 100 BP (WIS-347) and 12,500 ± 120 BP (ISGS-75). The Radtke Lake Site (Webb, 1987) is based on an 835-cm core of lake mud. Seven carbon dates were obtained from the core. The lowest interval from 826-835 cm yielded an age of 11,460 ± 580 BP (GX7893), which was considered to represent 11,290 BP (Webb subtracted 170 years from each date in an [*p.22 / p.23*] attempt to correct for hard-water error). The first pollen site loaded into SLOTDEEP is considered the subordinate site about which less is known; the better known principal site is loaded second. I will define the short Ernst Brothers Quarry sequence as the subordinate site; only its base is dated. Radtke Lake, with its long record, is the principal site. The pollen sum is composed of 17 anemophilous taxa.
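
For readers who want to see what a dissimilarity matrix between two sites involves, here is a minimal Python sketch. It is an illustration only: the newsletter does not give the formulas SLOTDEEP uses, so the chord distance below is assumed to be the form common in pollen work (Euclidean distance between square-root-transformed proportions), the function names are hypothetical, and the counts are invented three-taxon samples rather than the 17-taxon data from Ernst Brothers or Radtke Lake.

```python
import math


def proportions(counts):
    """Convert raw counts for one sample to proportions of the pollen sum."""
    total = sum(counts)
    return [c / total for c in counts]


def chord_distance(p, q):
    """Assumed chord distance: Euclidean distance between square-root
    proportions; 0 for identical samples, sqrt(2) for no shared taxa."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2
                         for a, b in zip(p, q)))


def manhattan_metric(p, q):
    """Sum of absolute differences between proportions."""
    return sum(abs(a - b) for a, b in zip(p, q))


def dissimilarity_matrix(principal, subordinate, measure=chord_distance):
    """Rows follow the principal site's samples (top to bottom);
    columns follow the subordinate site's samples."""
    rows = [proportions(s) for s in principal]
    cols = [proportions(s) for s in subordinate]
    return [[measure(r, c) for c in cols] for r in rows]


# Invented counts (three taxa per sample), listed from core top to bottom.
principal = [[60, 30, 10], [20, 50, 30], [5, 15, 80]]
subordinate = [[58, 32, 10], [6, 14, 80]]

matrix = dissimilarity_matrix(principal, subordinate)
for row in matrix:
    print(["%.2f" % d for d in row])
```

With these made-up samples the smallest values fall at the upper left and lower right of the matrix, the northwest-to-southeast trend expected when the two sequences are comparable.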
Once the *.DAT format data files are read and the chord distance is calculated, the following menu appears:

   1. SHOW Diagrams of Original Sites
   2. SHOW Correlation suggested by SLOTSEQ
   3. SHOW Matrix Map
   4. ** CORRELATE MATRIX MANUALLY **
   5. * SHOW MANUAL SLOTTING *
   6. SAVE SLOTSEQ results to PRINTER
   7. CHANGE Dissimilarity Coefficient
   8. CHANGE Screen Colors
   9. **CHANGE THE SITES**
   E. EXIT

   Press 1 - 9, or E

Pressing 1 displays the original diagrams (Fig. 1). The subordinate site is plotted above the principal site. The ticks on the depth scale are in meters. Nine minor taxa are shown combined in the right column to save space, but all 17 taxa contribute to the Chord Distance. Pressing 2 or 3 (and 6 - E) produces essentially the same results as SLOTSEE.

[Fig 1 and 2 on p. 23]

But if you press 4 to manually correlate the sites using the matrix map, you will see the [*p.23 / p.24*] screen shown in Fig. 2. This represents the "exploded" matrix map which shows, with correct stratigraphic spacing, all points with Chord Distance less than a stipulated value; here, 0.5. Radtke Lake uses the Y axis; the horizontal lines represent core depth in meters increasing from top to bottom. The subordinate Ernst Brothers site is shown on the X axis, and the vertical lines show its depth in meters, increasing from left to right. The cross-hair cursor in Fig. 2 is shown at a depth of 150 cm in Ernst and 400 cm in Radtke. The coordinates of the cursor can always be read at the Cursor Indicator at the bottom right of the screen. The cross-hair can be moved about the matrix with the arrow keys. A shifted arrow key increases the speed of the cursor. The "Home" key moves the cursor to the top samples at the upper left, and the "End" key moves it to the lowermost samples at the bottom right.
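
The selection behind the "exploded" matrix map can be sketched in a few lines of Python. This is an assumption-laden illustration, not SLOTDEEP's code: the function name points_below, the small matrix, and the depths are all invented. It keeps every pair of samples whose dissimilarity is under the stipulated limit and reports the pair at its true depths (subordinate depth as X, principal depth as Y) rather than by serial order.

```python
def points_below(matrix, principal_depths, subordinate_depths, limit=0.5):
    """List (x, y, d) for every matrix cell with dissimilarity d < limit.

    x is the subordinate site's depth, y the principal site's depth,
    so the points keep their real stratigraphic spacing.
    """
    points = []
    for i, row in enumerate(matrix):
        for j, d in enumerate(row):
            if d < limit:
                points.append((subordinate_depths[j], principal_depths[i], d))
    return points


# Invented example: three principal samples by two subordinate samples.
matrix = [[0.12, 0.95],
          [0.48, 0.61],
          [0.90, 0.20]]
principal_depths = [400, 416, 432]     # cm, hypothetical
subordinate_depths = [150, 180]        # cm, hypothetical

for x, y, d in points_below(matrix, principal_depths, subordinate_depths):
    print("subordinate %5.1f cm  principal %5.1f cm  d = %.2f" % (x, y, d))
```

Plotting the returned points, colored by d, would give the kind of display described for Fig. 2.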
The Chord Distances are shown on the screen in spectral colors ranging from white and red (very low dissimilarity; that is, highly similar) through yellow, green, blue, and purple which are less similar. One moves the cross-hair cursor to the colored points that are judged to correlate best and then presses the F1 key to "set" the point. A copy of the cross-hair is anchored at that position, and the "Point Indicator" at the lower left is incremented by one. When two or more points of correlation are established, a heavy yellow line can be fit to the selected points--either by linear segments (press the F5 key) or by a cubic spline (press the F6 key). Pressing the "S" key shows the route the SLOTSEQ algorithm has selected as best; this may be helpful in selecting the points for manual correlation. Fig. 3 shows the screen with the SLOTSEQ solution indicated by the thin line that steps from the upper left to the lower right. After 10 points of correlation were manually selected with the F1 key, the "Home" key was pressed to move the cross-hair out of the way to the top position (50 cm in Ernst and 0 cm in Radtke). Pressing the F6 key then used a cubic spline function to connect the correlation points with a heavy line. The line of correlation follows SLOTSEQ's solution rather closely except at the upper and lower parts of the trend. The F5 key will fit the points with linear segments. (You can cycle between the F5 and the F6 keys; the last one you press before choosing the F10 or "Q" key to Quit and return to the menu will be the one the program uses.) Pressing 5 (Show Manual Slotting) will display the two pollen diagrams slotted together (Fig. 4) which allows you to judge the success of the correlation. The user has the option of saving the results to disk as an ASCII text file in which the two diagrams' depth sequences are plotted side by side with the subordinate site's samples plotted in the depth units of the principal site.
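
The linear-segment fit implies a simple piecewise-linear conversion from subordinate depths to principal depths. The Python sketch below shows one way that conversion could work; it is not taken from SLOTDEEP. The function name convert_depth and the tie points are invented, and extending the first and last segments beyond the outermost tie points is my assumption about how depths outside the tied interval might be handled.

```python
from bisect import bisect_right


def convert_depth(depth, ties):
    """Convert a subordinate-site depth to the principal site's depth scale.

    ties is a list of (subordinate_depth, principal_depth) correlation
    points. Depths between two tie points are interpolated linearly;
    depths outside the tied interval extend the nearest end segment
    (an assumption, not documented SLOTDEEP behavior).
    """
    if len(ties) < 2:
        raise ValueError("need at least two tie points")
    ties = sorted(ties)
    xs = [s for s, _ in ties]
    ys = [p for _, p in ties]
    k = bisect_right(xs, depth) - 1          # segment containing depth
    k = max(0, min(k, len(xs) - 2))          # clamp to first/last segment
    slope = (ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k])
    return ys[k] + (depth - xs[k]) * slope


# Hypothetical tie points (cm): subordinate depth -> principal depth.
ties = [(50, 600), (150, 700), (250, 850)]

for d in (50, 100, 200, 275):
    print(d, "->", convert_depth(d, ties))
```

A spline fit would replace the straight segments with a smooth curve through the same tie points, so the converted depths would differ slightly between tie points and more noticeably beyond them.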
The subordinate site's original depths are used as labels as suggested in the following abridged file generated from the correlation shown in Figures 3 and 4:

Results of SLOTDEEP.EXE 12-13-1992
Subordinate Site is Ernst Bros, Ozaukee Co, WI
Principal Site is Radtke Lake, WI (Sara Webb)
DC was Chord Distance
Fit was Spline
Subordinate Site's Depths are those correlated to the Principal Site.
(Its actual depths are shown in brackets.)

   Radtke     Ernst
      0
     16
    ...
    592
    608
              610     [ 50 ]
    624
              627.7   [ 80 ]
    637
    640
              641     [ 110 ]
    643
    656
    672
              673     [ 130 ]
    688
              690     [ 150 ]
    704
              706     [ 180 ]
    720
    736
              737     [ 200 ]
              766.3   [ 220 ]
    768
    800
              802     [ 240 ]
    816
              826.3   [ 260 ]
    832
              832     [ 265 ]
              838.4   [ 270 ]
              841.3   [ 272 ]
              846.1   [ 275 ]

The user also has the option of making a work copy of the subordinate site's file wherein the samples' depths are converted to their equivalents in the principal site. Note that if the principal site's samples had been converted to their estimated ages by the use of DEP-AGE.EXE (Maher, 1992), this option would in effect convert the subordinate's depths into age as well.

[*p.24 / p.25*]
[Fig 3 and Fig 4 on p. 25]
[*p.25 / p.26*]

SLOTDEEP.EXE requires a color monitor and is supplied in versions for either the EGA or VGA graphic screen. Both versions (in self-extracting files with some example data) are available for anonymous ftp from the /pub/inqua directory of geology.wisc.edu. The VGA version is named SLOTDEPV to differentiate it from the EGA version; SLOTDEPV can be renamed SLOTDEEP when it is extracted.

References.

Birks, H.J.B. 1979. Numerical methods for the zonation and correlation of biostratigraphical data. 99-123 + Appendix 2, (15 p; the SLOTSEQ.FOR listing appears on 13-15 of Appendix 2). In Bjorn E. Berglund, Ed. Vol I. General Project Descriptions. Subproject B: Lake and Mire Environments. Project 158: Palaeo-hydrological Changes in the Temperate Zone in the Last 15,000 Years, International Geological Correlation Programme. Lund, Sweden. 143 pp + 2 Appendices.

Clark, Malcolm. 1992. Sequence comparisons and sequence-slotting. INQUA - Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 8:3-6.

Gordon, A.D. 1980. SLOTSEQ: a FORTRAN IV program for comparing two sequences of observations. Computers and Geosciences 6, 7-20. [The 1980 version differs somewhat from the version listed in Birks (1979) that is used in SLOTSEE.EXE and SLOTDEEP.EXE.]

Gordon, A.D. 1973. A sequence-comparison statistic and algorithm. Biometrika 60, 197-200.

Maher, Louis J., Jr. 1992. Depth-age conversion of pollen data. INQUA - Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 7:13-17.

Maher, Louis J., Jr. 1970. Two Creeks forest, Valders glaciation, and pollen grains, p. D-1 - D-8. In Black, R. F. et al., Pleistocene geology of Southern Wisconsin. Wisconsin Geological and Natural History Survey Information Circular No. 15, 175 p.

Webb, Sara L. 1987. Beech range extension and vegetation history: pollen stratigraphy of two Wisconsin lakes. Ecology, 68(6):1991-2005.

FIRST AID FOR TILIA AND PALYPLOT USERS

by Dr. Triage

The Newsletter's length is keeping my column short again this issue. Please send or e-mail questions to the newsletter coordinator saying they are for Dr. Triage.

TILIA and TILIA·graph

Dr. Triage: I have had odd things happen when I enter TILIA·graph from TILIA's main menu with my new IBM clone with American Megatrends BIOS. My *.TGF file has been scrambled, the GSS drivers have been disabled, and the system may lock up. On one memorable Sunday afternoon the hard disk crashed. What can I do? Help!

Dear Help: Perhaps you should not work on Sunday! But seriously, Eric Grimm informs me there appears to be a problem with how the Borland-compiled version of TILIA/TILIA·Graph interacts with some clones with the American Megatrends BIOS.
For the time being, he suggests that after you work with your TILIA files, you should exit TILIA and start TILIA·graph by typing 'tg' from the DOS prompt. Lou Maher mentioned to me that he too has a clone with the American Megatrends BIOS, and he had some trouble once after exiting TILIA·graph; his computer forgot it had a hard drive, and it had to be retold the drive type, etc. Lou did not lose any information, and the problem has not occurred since...even when going from TILIA to TILIA·graph and back. D.T.

Dr. Triage: I work with varved sediment, and I want to show pollen abundance with silhouette bars that differ in thickness the same way my varves do. Can TILIA·graph do that? A. Varvephile.

Dear A. Varvephile: Yes it can, but with some trouble on your part. You need to set up TILIA with TWO SAMPLE COLUMNS FOR EACH VARVE. You can do this in TILIA, but it may be easier to use your spreadsheet and then import it into TILIA (see my column on p. 18 of Newsletter #7, January 1992). [*p.26 / p.27*] The two columns exactly duplicate the pollen numbers in the varve, but differ in their depths. Assume that a thick varve extends from 10.0 to 11.0 cm in the core, and it is underlain by two thin ones: one extending from 11.0 to 11.1 cm and the other from 11.1 to 11.25 cm. Set up your data so the depth for the top varve is listed as 10.0 cm in the first column and 10.999 cm in the second. The next two (duplicate) columns should have depths of 11.0 and 11.099. The last two (duplicate) columns' depths should be 11.1 and 11.249. As you can see, the depth separation between the base of one varve and the top of the next should be specified to 3 decimal places. You will undoubtedly have more than three varves, and it will take you more time, but the principle remains the same for the others. After you load the file into TILIA·graph, use the following sequence of menus: [E] Graph style | [A] All graphs | [A] Silhouette | [A] Fill pattern | [B] Hollow.
When you make the diagram and view it, I think you will have what you want. D.T.

PALYPLOT

Dr. Triage: I have been trying to contact Craig A. Chumbley to order a copy of PALYPLOT. I was unable to reach him at the address listed in Newsletter #5, January 1991. What's Up?

Dear W. Up: I am sorry to report that Craig Chumbley has decided to leave the field of paleobiology. The recent series of NY State budget crises apparently convinced him there was no future there, and he is now into computer aspects of educational testing in the Midwest. Before he left he provided us with an updated version of PALYPLOT which works with Generic Cadd Level 3, Cadd 5 and Cadd 6. He asked us to make it available by anonymous ftp, and it is now in /pub/inqua in geology.wisc.edu (see also p. 18 of this Newsletter). Craig said that if anyone has problems with PALYPLOT, they should contact me by way of maher@geology.wisc.edu.

E-MAIL ADDRESSES

(I use the term in the general sense; it includes those on Internet, NSFNet, Janet, Bitnet, etc. A line ending with a trailing low dash [ _ ] indicates the address continues without a space on the line below.)

Jim Almendinger         jdinger@vz.cis.umn.edu
Brigitta Ammann         ammann@sgi.unibe.ch
Pat Anderson            pander@u.washington.edu
Kathy Anderson          kathya@brownvm.bitnet
John Andrews            andrews_jt@cubldr.colorado.edu
Dick Baker              dick-baker@uiowa.edu
Carlos A. Baied         gg_cab@selway.umt.edu
Philip Barker           p.a.barker@lut.ac.uk
Pat Bartlein            bartlein@oregon.uoregon.edu
                        bartlein@oregon.bitnet
Rick Battarbee          ucfamar@ucl.ac.uk
Pat Behling             pbehling@vms.macc.wisc.edu
Keith Bennett           kdb2@phx.cam.ac.uk
Björn E. Berglund       bjorn.berglund@geol.lu.se
John Birks              birks@cc.uib.no
R. Bonnefille           azerty@frmop11.bitnet
Richard Bradshaw        orskage@gemini.ldc.lu.se
John Brew               J.Brew@vax.rhbnc.ac.uk
Linda Brubaker          lbru@u.washington.edu
Ian D. Campbell         icampbell@nofc.forestry.ca
Gail Chmura             chmura@mgm.lan.mcgill.ca
Malcolm Clark           rmc@monu1.cc.monash.edu.au
Ed Cushing              cushing@vx.cis.umn.edu
Les Cwynar              cwynar@unb.bitnet
Owen Davis              palynolo@vms.ccit.arizona.edu
Walter Doerfler         guf12@rz.uni-kiel.dbp.de
Mary E. Edwards         ffmee@alaska.bitnet
Scott Elias             elias_s@cubldr.colorado.edu
Dan Engstrom            dre@umnacvx.bitnet
John Flenley            j.flenley@massey.ac.nz
Jesse Ford              fordj@ucs.orst.edu
David R. Foster         dfoster@lternet.washington.edu
                        dfoster@lternet.bitnet
Marie-José Gaillard     mjgl@gemini.ldc.lu.se
Konrad Gajewski         gajewski@acadvm1.uottawa.ca
Lisa Graumlich          graumlich@arizrvax.bitnet
David Green             david.green@anu.edu.au
Eric Grimm              grimm@denr1.igis.uiuc.edu
Elisabeth Grönlund      mg@joyl.joensuu.fi
Joel Guiot              lbhp@frmrs11.bitnet
Margret Hallsdottir     mh@rhi.hi.is
Sandy Harrison          nguva@pax.uu.se
Linda and Cal Heusser   heusser@acf1.nyu.edu
Sheila Hicks            hicks%oygeol@figbox.funet.fi
Richard A. Hodkinson    umfbco5@vaxa.cc.imperial.ac.uk
Geoff Hope              gxh411@coombs.anu.edu.au
Brian Huntley           brian.huntley@durham.ac.uk
Tristram C. Hussey      io10651@maine.maine.edu
George L. Jacobson Jr. and Heather Almquist-Jacobson
                        jacobson@maine.edu
Jan A. Janssens         jjanssen@umnacvx.bitnet
Devra I. Jarvis         devra@u.washington.edu
Steve Juggins           ucfasju@ucl.ac.uk
Peter Kershaw           geg625n@vaxc.cc.monash.edu.au
John C. Kingston        76330.1657@compuserve.com
Warren Kovach           warrenk@cix.compulink.co.uk
                        100016.2265@compuserve.com
John Kutzbach           jkutzbach@vms.macc.wisc.edu
Henry Lamb              hfl@aberystwyth.ac.uk
Gerhard Lang            u346@cbebda3t.earn.bitnet
Dan Livingstone         daltuc@tucc.bitnet
Andre Lotter            lotter@sgi.unibe.ch
Ruud Lutgerink          bottema@rugr86.rug.nl
Glen MacDonald          gmmacd@sscvax.cis.mcmaster.ca
Joyce Macpherson        jmacphers@kean.ucs.mun.ca
Darrel Maddy            d.maddy@mail.soton.ac.uk
Louis J. Maher, Jr.     maher@geology.wisc.edu
Vera Markgraf           markgraf@vaxf.colorado.edu
John V. Matthews        matthews@cc2smtp.emr.ca
John H. McAndrews       docjock@utcs.utoronto.ca
Matt McGlone            mcglonem@lan.lincoln.cri.nz
Peter Minchin           prm411@csc.anu.edu.au
Fraser Mitchell         fmitchll@vax1.tcd.ie
Dave Murray             fyherb@alaska.bitnet
Robert E. Nelson        renelson@colby.edu
Jonathan Overpeck       j.overpeck@omnet.nasa.gov
Stephen C. Porter       scporter@u.washington.edu
Colin Prentice          colin@pax.uu.se
P.J.H. Richard          richard@ere.umontreal.ca
Jim Ritchie             UK Telephone: (0)823 42434
                        New e-mail when available
John Smol               smolj@qucdn.queensu.ca
Tony Stevenson          tony.stevenson@newcastle.ac.uk
Eugene F. Stoermer      eugene.f.stoermer@_
                        ub.cc.umich.edu
Alayne Street-Perrott   geog2@oxford.vax1.ac.uk
Rachel R. Summers       s.r.summers@massey.ac.nz
Robert Thompson         rthompso@usgsresv.bitnet
Guus van der Geer       gvdgeer@postoffice.utas.edu.au
Adam Walanus            polslask@plwrtu11.bitnet
Ian R. Walker           iwalker@admin.okanagan.bc.ca
Tom Webb                ge710006@brownvm.bitnet
Robert S. Webb          rsw@mail.ngdc.noaa.gov
Mina Weinstein-Evron    mrect36@haifauvm.bitnet
Marge Winkler           mwinkler@vms.macc.wisc.edu
Sergei B. Yazvenko      bwarner@watdcs.uwaterloo.ca
Zicheng Yu              yuzi@gpu.utcs.utoronto.ca
Mingming Zhou           zhouming@acf9.nyu.edu

If you wish to be added to the directory, or to correct your present entry, please e-mail to: maher@geology.wisc.edu (If you do not receive my acknowledgment within a few days, it is best to use regular mail as a backup.)

The Coordinator is aware that if you have never used anonymous ftp on the Internet, much of what is discussed in this issue may seem like gibberish. But anonymous ftp is very useful and easy, once you have gone through the steps on your system. If you do not know how to start, then ask someone at your university or place of business. Those in charge of your computers or your local network or e-mail very likely will help you get started. Even if your lab is not directly on the Internet, you should be able to ask a friend to use anonymous ftp and get you the files or programs you want. L.M.
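For anyone who would rather script the anonymous ftp steps than type them, a minimal sketch in Python follows. It uses the standard ftplib module; the function names are made up for this example, and it simply assumes that geology.wisc.edu still accepts anonymous logins and still has a /pub/inqua directory, neither of which is guaranteed.

```python
from ftplib import FTP


def list_anonymous_dir(ftp, directory):
    """Log in anonymously, move to `directory`, and return its file names.

    `ftp` is an ftplib.FTP connection (or any object with the same
    login/cwd/nlst methods). The function name is hypothetical.
    """
    ftp.login()         # no arguments means user "anonymous"
    ftp.cwd(directory)  # e.g. "/pub/inqua"
    return ftp.nlst()   # plain list of names in that directory


def show_inqua_files():
    """Print the names in /pub/inqua; needs a working network connection.

    Host and directory are the ones given in this Newsletter; the
    server may no longer be reachable.
    """
    with FTP("geology.wisc.edu") as ftp:
        for name in list_anonymous_dir(ftp, "/pub/inqua"):
            print(name)
```

Calling show_inqua_files() performs the actual connection. To fetch a file once you know its name, ftplib's retrbinary method can be used on the same connection.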