INQUA Working Group on Data-Handling Methods

Newsletter 10: July 1993

CORESEG: A PROGRAM FOR THE DETECTION OF CHANGES (i.e. DEFINITION OF SEGMENTS) IN A DOWN-CORE PROPERTY

J. T. Andrews
INSTAAR and Department of Geological Sciences, Box 450
University of Colorado,
Boulder, CO 80309
E-mail: andrews_jt@cubldr.colorado.edu

Introduction. A common and fundamental problem in studies of down-core properties, such as the magnetic susceptibility record of Fig. 1, is deciding where the variable being considered shows "breaks". Such a change may be in the average value, in the variance about that value, or in both. Webster examined this problem and proposed a solution to it (Webster, 1973; Webster, 1980); the general problem is also discussed by Davis (1986).


Fig. 1. Volume magnetic susceptibility data (MS, × 10⁻⁵ SI) from marine piston core HU90-023-001, Frobisher Bay, NWT, Canada (Andrews and Stravers, in press). The next two columns show the strength of D2 with window lengths 10 and 20, respectively, and the far column shows the location of possible segment boundaries.
My program CORESEG was initially developed without knowledge of the earlier papers by Webster (1973, 1980), but it turns out to take a similar approach, with some differences (Andrews and Stravers, in press). It was originally written in standard BASIC for a TRS-80 computer; the present version runs on a Macintosh SE/30 and LC III (QuickBASIC).

Details of the program. The Appendix lists the questions posed as the program runs. A major difference between this program and that of Webster (1980) is that CORESEG borrows heavily from the Exploratory Data Analysis (EDA) routines published by Velleman and Hoaglin (1981). CORESEG therefore uses the median as the measure of central tendency and derives an estimate of the standard deviation from the spread between the upper (L3) and lower (L2) hinges of the sample (approximately the 75th and 25th percentiles). The standard deviation for a series is estimated from:

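$$s = \frac{L_3 - L_2}{1.349}$$

For normally distributed data the spread between the hinges (the H-spread) is about 1.349 standard deviations, so dividing by 1.349 converts it into a standard-deviation estimate.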
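The hinge-based estimate can be sketched in Python as follows. This is a minimal sketch, not CORESEG's QuickBASIC code: it assumes the Tukey depth rules given by Velleman and Hoaglin (median depth (n + 1)/2, hinge depth equal to the floor of the median depth plus one, halved), and the function names are illustrative.

```python
import math

def value_at_depth(sorted_x, depth):
    """Value at a 1-based depth; a depth ending in .5 averages the two neighbours."""
    lo = math.floor(depth) - 1
    hi = math.ceil(depth) - 1
    return (sorted_x[lo] + sorted_x[hi]) / 2

def letter_values(x):
    """Return (lower hinge, median, upper hinge) using Tukey's depth rules."""
    s = sorted(x)
    n = len(s)
    d_med = (n + 1) / 2
    d_hinge = (math.floor(d_med) + 1) / 2
    lower = value_at_depth(s, d_hinge)
    median = value_at_depth(s, d_med)
    upper = value_at_depth(s[::-1], d_hinge)  # same depth, counted from the top
    return lower, median, upper

def pseudo_sigma(x):
    """Estimate the standard deviation as (upper hinge - lower hinge) / 1.349."""
    lower, _, upper = letter_values(x)
    return (upper - lower) / 1.349
```

For the values 1 to 9, `letter_values` returns (3.0, 5.0, 7.0), so the estimated standard deviation is 4 / 1.349, about 2.97.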

The program computes the median (Md) and the estimated standard deviation (s) for a forward (f) and a backward (b) window, each of length l, and then derives a generalized distance measure, D2, computed as:

$$D^2 = \frac{(Md_f - Md_b)^2}{s_f^2 + s_b^2}$$
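A minimal Python sketch of the D2 calculation follows. It assumes that the backward window is the l samples immediately above a candidate boundary (x[i-l:i]) and the forward window the l samples immediately below it (x[i:i+l]); how CORESEG aligns its windows and treats a zero denominator is not stated here, so both choices, and the names `d2_series` and `pseudo_sigma`, are assumptions.

```python
import math
import statistics

def _value_at_depth(sorted_x, depth):
    # 1-based depth; a depth ending in .5 averages the two neighbouring values
    lo, hi = math.floor(depth) - 1, math.ceil(depth) - 1
    return (sorted_x[lo] + sorted_x[hi]) / 2

def pseudo_sigma(x):
    """(upper hinge - lower hinge) / 1.349, with Tukey hinges."""
    s = sorted(x)
    d_hinge = (math.floor((len(s) + 1) / 2) + 1) / 2
    return (_value_at_depth(s[::-1], d_hinge) - _value_at_depth(s, d_hinge)) / 1.349

def d2_series(x, l):
    """D2 between a backward window x[i-l:i] and a forward window x[i:i+l].

    Returns (i, D2) pairs; i marks a candidate boundary between
    samples i-1 and i (0-based).
    """
    if l <= 3:
        raise ValueError("window length must be > 3")
    result = []
    for i in range(l, len(x) - l + 1):
        back, fwd = x[i - l:i], x[i:i + l]
        num = (statistics.median(fwd) - statistics.median(back)) ** 2
        den = pseudo_sigma(fwd) ** 2 + pseudo_sigma(back) ** 2
        if den > 0:
            d2 = num / den
        else:  # both windows have zero H-spread (hypothetical handling)
            d2 = math.inf if num > 0 else 0.0
        result.append((i, d2))
    return result
```

For a series that steps from values near 0 to values near 10 at sample 10, `d2_series(x, 4)` peaks at i = 10.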

One decision made while the program runs is therefore the length of the forward and backward windows. Of course there is no reason to limit the analysis to a single length, l, and I have found it informative to run a variety of window lengths and then compare the results (Fig. 1). If a Mac is being used, the D2 values can be exported to the "clipboard" and then imported into a variety of graphical/statistical packages for plotting and further analysis.
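Running several window lengths and lining the results up for comparison can be sketched as below. This is an illustration only: tab-delimited text stands in for the Mac clipboard export, and the column names (such as `D2_l10`) and function names are hypothetical.

```python
import io
import math
import statistics

def _value_at_depth(sorted_x, depth):
    # 1-based depth; a depth ending in .5 averages the two neighbouring values
    lo, hi = math.floor(depth) - 1, math.ceil(depth) - 1
    return (sorted_x[lo] + sorted_x[hi]) / 2

def pseudo_sigma(x):
    # (upper hinge - lower hinge) / 1.349, with Tukey hinges
    s = sorted(x)
    d_hinge = (math.floor((len(s) + 1) / 2) + 1) / 2
    return (_value_at_depth(s[::-1], d_hinge) - _value_at_depth(s, d_hinge)) / 1.349

def d2_by_index(x, l):
    """Map boundary index i -> D2 for windows x[i-l:i] (backward) and x[i:i+l] (forward)."""
    out = {}
    for i in range(l, len(x) - l + 1):
        back, fwd = x[i - l:i], x[i:i + l]
        num = (statistics.median(fwd) - statistics.median(back)) ** 2
        den = pseudo_sigma(fwd) ** 2 + pseudo_sigma(back) ** 2
        out[i] = num / den if den > 0 else (math.inf if num > 0 else 0.0)
    return out

def d2_table(x, lengths):
    """Tab-delimited text: one row per boundary index, one D2 column per window length.

    Cells are left blank where a window of that length does not fit.
    """
    series = {l: d2_by_index(x, l) for l in lengths}
    buf = io.StringIO()
    buf.write("index\t" + "\t".join(f"D2_l{l}" for l in lengths) + "\n")
    for i in range(1, len(x)):
        cells = [f"{series[l][i]:.4g}" if i in series[l] else "" for l in lengths]
        buf.write(f"{i}\t" + "\t".join(cells) + "\n")
    return buf.getvalue()
```

Printing `d2_table(x, [10, 20])` gives text that can be pasted into a spreadsheet or plotting package; longer windows leave more blank cells near the top and bottom of the core.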

The D2 values depend partly on the window length, but the critical question is: which peaks in D2 represent significant changes in the variable and mark the start or end of a new segment? To help develop a "feel" for the answer, CORESEG next generates a random series with the same mean and standard deviation as the original series and computes a D2 series for it. This random D2 series can be plotted as a probability plot (Fig. 2), and a conservative (i.e. high) threshold chosen such that D2 from a random series would be expected to exceed it only once in 100 trials; peaks in the original D2 above this value are candidate segment boundaries. Note that the pseudo-random number generator available on most personal computers does not produce a new random series on each run unless a new seed is specified.
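The random-series comparison can be sketched in Python as follows. Several choices here are assumptions rather than CORESEG's documented behaviour: the random series is Gaussian and rescaled to exactly the mean and standard deviation of the data, a single seeded series is used, and the 1-in-100 threshold is taken as the 99th percentile of its D2 values (via `statistics.quantiles`) instead of being read off a probability plot by eye. The function names are hypothetical.

```python
import math
import random
import statistics

def _value_at_depth(sorted_x, depth):
    # 1-based depth; a depth ending in .5 averages the two neighbouring values
    lo, hi = math.floor(depth) - 1, math.ceil(depth) - 1
    return (sorted_x[lo] + sorted_x[hi]) / 2

def pseudo_sigma(x):
    # (upper hinge - lower hinge) / 1.349, with Tukey hinges
    s = sorted(x)
    d_hinge = (math.floor((len(s) + 1) / 2) + 1) / 2
    return (_value_at_depth(s[::-1], d_hinge) - _value_at_depth(s, d_hinge)) / 1.349

def d2_series(x, l):
    # (i, D2) pairs for backward window x[i-l:i] and forward window x[i:i+l]
    out = []
    for i in range(l, len(x) - l + 1):
        back, fwd = x[i - l:i], x[i:i + l]
        num = (statistics.median(fwd) - statistics.median(back)) ** 2
        den = pseudo_sigma(fwd) ** 2 + pseudo_sigma(back) ** 2
        out.append((i, num / den if den > 0 else (math.inf if num > 0 else 0.0)))
    return out

def matched_random_series(x, seed):
    """Gaussian series rescaled to exactly the mean and SD of x (normality is assumed)."""
    rng = random.Random(seed)  # explicit seed: the same seed always gives the same series
    r = [rng.gauss(0.0, 1.0) for _ in x]
    m, sd = statistics.mean(r), statistics.stdev(r)
    mx, sx = statistics.mean(x), statistics.stdev(x)
    return [mx + sx * (v - m) / sd for v in r]

def candidate_boundaries(x, l, seed):
    """Indices where D2 of x exceeds the 99th percentile of D2 for a matched random series."""
    rand_d2 = [d for _, d in d2_series(matched_random_series(x, seed), l)]
    threshold = statistics.quantiles(rand_d2, n=100, method="inclusive")[-1]
    return [i for i, d in d2_series(x, l) if d > threshold], threshold
```

Changing `seed` gives a different random series, which mirrors the point above about specifying a new seed; repeating the comparison with several seeds shows how stable the threshold is.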


Fig. 2. Probability plots of D2 values for window lengths 10 and 20 (see Fig. 1) and for a random time-series with the same mean and standard deviation (HU#001 RND D2). On the right are box-plots of the D2 parameters.
A log transform is available as an option, since magnetic susceptibility is frequently plotted on a log scale. In addition, the first difference operator can be applied (see Appendix) to investigate the importance of a trend in the data (Fig. 3). The various D2 series can be saved to disk, as can the random series and its D2 series.
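The two optional preprocessing steps can be sketched as follows. The base-10 logarithm, the function name `prepare_series`, and the order of operations (log first, then difference) are assumptions of this sketch; first differences can be negative, so they could not be logged afterwards.

```python
import math

def prepare_series(x, log_transform=False, first_difference=False):
    """Optional log10 transform, then optional first difference (order assumed)."""
    y = list(x)
    if log_transform:
        if any(v <= 0 for v in y):
            raise ValueError("log transform needs strictly positive values")
        y = [math.log10(v) for v in y]
    if first_difference:
        # y[i] - y[i-1]: the series loses one value, and a linear trend becomes a constant
        y = [b - a for a, b in zip(y, y[1:])]
    return y
```

For example, `prepare_series([1, 10, 100], log_transform=True, first_difference=True)` returns `[1.0, 1.0]`.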
Fig. 3. Magnetic susceptibility (same units as Fig. 1) for core HU90-023-022 from Breevort Basin (Andrews and Stravers, in press), and the D2 plot for these data after the first difference operator has been applied. The upper graph shows D2 computed on the first differences; the lower graph compares it (solid line) with D2 for a window of 10 (dashed line).
The original version routed much of its output to the printer; this version, modified for the Mac, has no printer output and writes files to disk instead. The numerical results, shown as three sets of "Results" (see Appendix), include the median, the hinges, maximum values, etc. In the Mac version these must be written down as the program proceeds. On an IBM PC the output can be sent to the printer with the LPRINT statement.

As with any analytical tool this program is not a panacea for all concerns involved with describing and segmenting a core. However, in my experience it does give a number of useful insights into down-core changes, and it focuses attention on specific levels within the cores.

References.

Andrews, J. T., and Stravers, J. A., in press, Magnetic susceptibility of late Quaternary marine sediments, Frobisher Bay, N.W.T.: An indicator of ice sheet/ocean interactions: Quaternary Science Reviews.

Davis, J. C., 1986, Statistics and data analysis in Geology: New York, John Wiley & Sons, 646 p.

Velleman, P. F., and Hoaglin, D. C., 1981, Applications, Basics, and Computing of Exploratory Data Analysis: Boston, Duxbury, 354 p.

Webster, R., 1973, Automatic soil-boundary location from transect data: Journal of the International Association for Mathematical Geology, v. 5, p. 27-37.

Webster, R., 1980, DIVIDE: A FORTRAN IV program for segmenting multivariate one-dimensional spatial series: Computers & Geosciences, v. 6, p. 61-68.

APPENDIX

Line No.  Question
         > Comments or Action
 475  Max. length of record?    
         > Number of items in input file
2440  Data input-AGAIN OR STOP
         > To start, type AGAIN
4540  File name?
         > File name (1 item/line in ASCII)
2565  Use the first diff. operator Y/N?
         > N=as is, Y=removes trend
2590  1. Results on input data
         > Shows median, hinges etc.
2640  Data need a transform Y/N?
         > N=as is, Y=log transform
3940  Proceed with analysis of core seg. Y/N? 
2730  Print out all results,Y=0,N=1
         > Type 1
2740  Treat standardized data Y/N?
         > Ignore, type N
3150  Length of window?
         > Length must be > 3
4920  Storing r2 values as file/r2
4922  File name for d2=?
         > Type File name
3640  2. Results on test statistic
         > Shows median, hinges, etc. of d2
3655  Continue Y/N?
         > Type Y; N takes you to 2440
4000  Results 3 random= 
         > Results shown
4001  File name random nos?
         > Type File name
4005  Continue?
         > Type Y
4920  Storing r2 values as file/r2
4922  File name for d2=?
         > Type File name for random d2

Copyright © 1993 J.T. Andrews
WWW pages by K.D. Bennett