INQUA - COMMISSION FOR THE STUDY OF THE HOLOCENE
Working Group on Data Handling Methods
Newsletter 2, June 1989

The first newsletter has been well received, so we plan to continue in the same informal way, now with an expanded mailing list. We invite you to submit a contribution of about the same length as those that follow, in any area that bears on the handling of data used in the Quaternary sciences.

In this issue you will find contributions on:

* a proposed new large pollen data bank, which raises the general question of suitable repositories for, and access to, such information (you are invited to comment on any aspect of this larger question, such as proprietary use of banked data, for a future issue);
* an interesting museum-based database;
* a new data storage and handling system for limnological information;
* diagnostic analysis programs for tree rings; and
* an update on the POLSTA package from Canberra.

Please direct any contributions to Jim Ritchie, University of Toronto, Scarborough College, 1265 Military Trail, Scarborough, Ontario M1C 1A4; BITNET Ritchie@vm.utcs.utoronto.ca; FAX (416) 284-3371.

Workshop Leading to Creation of a European Repository for Late-Quaternary Pollen Data
August 1989, Lund, Sweden
George L. Jacobson, Jr.

During the past few years the scientific benefits of computerized databases have become increasingly clear. Most palynologists are familiar with the developments that have come out of the COHMAP project in North America, where data have been collected at Brown University. Pollen data from Europe constitute an equally valuable resource for Quaternary palynologists, particularly in connection with the IGBP Global Change initiatives of the next few years. In anticipation of the need for such accumulated data, a small group of colleagues (George L. Jacobson, Maine, USA; Bjorn E. Berglund, Lund, Sweden; Brian Huntley, Durham, UK; Joel Guiot, Marseille, France; Eric C.
Grimm, Illinois State Museum, USA) has initiated a process designed to establish a European repository for Quaternary pollen data. This initiative grew out of independent plans by IGCP Project 158B and by European Community researchers to organize data from Europe. Coordinating these efforts is important because the EC project will have access mostly to data from western and southern Europe, whereas the IGCP group is concerned with northern and central Europe, along with the Eastern Bloc nations.

This planning effort must, of course, involve a large number of palynologists from throughout the continent. Accordingly, the organizers have arranged a workshop that will provide a forum for reaching collective agreement on the many issues surrounding such a database. The workshop is scheduled for Lund, Sweden, 24-27 August 1989, and will include participants from as many European nations as possible. Financial support for the workshop has been requested from the Swedish Natural Sciences Research Council, the Royal Swedish Academy of Sciences, the European Community, and the U.S. National Science Foundation. Although no responses have yet been received, it is hoped that a portion of the participants' travel costs and expenses will be covered.

The success of the database depends in large part on the goodwill and support of colleagues throughout Europe, and input from all interested colleagues is earnestly requested. Plans resulting from the workshop will be widely circulated among colleagues for further suggestions and support. Specialized "user friendly" computer software for handling pollen data will also be distributed to all active laboratories.

For more information contact: George L. Jacobson, Jr., Institute for Quaternary Studies, Boardman Hall, University of Maine, Orono, ME 04469, USA.

Computerized cataloguing: Paleobiology Division, National Museum of Natural Sciences, Ottawa
K.M. Shepherd

The computerization of a museum collection is never an easy task. The initial struggle with hardware, software, data formatting and the odious task of data entry is a strong incentive to stay with a traditional manual file system. Eventually, however, a collection reaches a size at which computer cataloguing is needed to perform even the most elementary collection management functions, such as basic inventories. At Paleobiology we reached that point about three years ago: even a basic inventory took weeks to perform, so the decision was made to computerize our catalogue records.

We were fortunate to have access to the Canadian Heritage Information Network (CHIN), a museum collections inventory programme administered by Communications Canada. We had access to a mainframe with 7 gigabytes of memory and a dedicated support staff who did much of the initial work to establish our database, including a formal training programme for staff. CHIN allowed us to computerize our records without the often complex process of selecting software, since it had already established a successful system using PARIS software. The only hardware required was a simple terminal and a 1200 baud modem; all communications, such as data lines, were handled by CHIN.

By October 1986 our communication line, hardware and individualized software requirements had been established. We could record more than 250 fields of information for each specimen, including taxonomic, morphologic, geologic and locality data and a myriad of other descriptive and collection management data (see the sample record below).

One element of the project that CHIN did not assist us with was the lengthy process of data entry. I began this task and have continued it with the help of volunteers and student interns.
In Ottawa we are fortunate to have access to student interns from the museology programme at Algonquin College, and their dedicated efforts are invaluable to the eventual completion of this project. To date we have catalogued over 50% of our vertebrate fossil collections, and we hope that the project will be completed within two years. Once the system is in place, it should prove to be a valuable collections and research tool, not just for our institution but for others as well. This is possible because of the establishment of the Natural Sciences database, which gives participating research institutions mutual access to each other's databases. Overall, we are pleased with our cataloguing process; it promises to be a valuable tool for us now and in the future.

Sample record:

DOCUMENT              1
PARIS NUMBER          1418
USER ID               NMQS1
DATE OF BIRTH         870107
DATE OF CHANGE        881228
NATIONALPARIS#        3001418
DESTINATION DB        SNDB
DEST.CONTROLFIELD     0
INSTITUTION           NMNS
DEPARTMENT            PALEOBIOLOGY
DISCIPLINE            QUAT.ZOOLOGY
CLASS                 Mammalia
ORDER                 Artiodactyla
FAMILY                Bovidae
GENUS                 Bison
SPECIES               priscus
SPECIMEN NATURE       Bone
SPECIMENPOSITION      right
SPECIMEN NAME         Hornsheath
PARTIALSPECIMEN       Fragment
COLLECTOR             Rampton,Fyles
DATE COLLECTD         19690000
ACQUISITIONDATE       19700606
MODEOFACQUISITION     Transfer
ACCESSIONNUMBER       00429
ACCESSION DATE        19700615
CATALOGUENUMBER       NMC17505
PREVIOUS NUMBERS      85 ROV
CATALOGUER            Shepherd,K.M.
AGE/STAGE             Adult
DATING TECHNIQUES     C14
ISOTOPIC DATING       1810+/-90yrBP
LAB NUMBER/CODE       Isotope 5407
ORIGIN-COUNTRY        Canada
ORIGIN/PROVINCE/TERR  NorthwestTerr
LOCALITY NAME         BaillieIsland
LOCALITY DESCRIPTION  From beach at east edge of Baillie Is.
LATITUDE              753500N
LONGITUDE             1280800W
MAP REFERENCE         107E/09
SOURCE                Geo.Survey of Canada

K.M. Shepherd, National Museum of Natural Sciences, Ottawa, Canada

The PIRLA DataBase Management System
Donald F. Charles, John P. Smol, Allen J. Uutala, P. Roger Sweets and Donald R. Whitehead

The PIRLA project (Paleoecological Investigation of Recent Lake Acidification) is an interdisciplinary paleoecological study of the effects of acidic deposition on aquatic ecosystems (Charles and Whitehead 1986a). It addressed three questions: Have lakes acidified? If so, when and to what degree? What were the probable causes? The project was large and complex, and it required a sophisticated database management system to store and maintain the large amount and variety of data it generated.

The study has involved stratigraphic analysis of approximately 35 lakes in four regions of the U.S. (the Adirondack Mountains of New York; northern New England; northern Minnesota, Wisconsin and Michigan; and northern Florida). The data were collected by over 30 investigators from eight academic institutions. Sediment characteristics analyzed include diatoms, chrysophytes, total metals, sequentially extracted metals, total sulfur, carbon, hydrogen and nitrogen, sulfur isotopes, polycyclic aromatic hydrocarbons, coal soot, oil soot, pollen, ²¹⁰Pb activities and calculated dates, and other characteristics. In addition, there are data on the location and characteristics of the lakes and their watersheds, water chemistry, diatom counts for nearly 200 calibration lakes, a master list of diatom and chrysophyte taxa, and ecological data on the taxa.

The existence, quality and flexibility of this database contributed significantly to the development of subsequent paleoecological projects, which were built on the foundation provided by the data, data structure, techniques and programs already in the PIRLA DataBase Management System. The Paleoecological Investigation of Lake Acidification in the Sierra Nevada had the same goals and used the same techniques as the original PIRLA project. Centered at Indiana University (Mark Whiting, Donald R.
Whitehead), the project created diatom-pH transfer functions for lakes in the Sierra Nevada and reconstructed pH and ANC levels for five lakes, including a long core spanning one lake's full postglacial development. Although the database was not originally designed for long-term reconstructions, it was easily modified to handle these new demands.

In the PIRLA II project, we are making regional estimates of lake acidification through analyses of sediment cores from 37 randomly selected lakes in the Adirondack Mountains of New York. Diatom- and chrysophyte-inferred pH is being calculated for the tops (recent; 0-1 cm) and bottoms (pre-1850; >30 cm) of the sediment cores. The number of Adirondack lakes that have changed in pH will be estimated using statistical extrapolation procedures developed by the U.S. EPA. Another component of PIRLA II is the determination, for a non-random set of lakes, of trends in pH and ANC since 1970, when atmospheric deposition began to decrease in the northeastern U.S.A. This is being done through close-interval (0.25 cm) analysis of surface sediments. Research on these projects is centered at Queen's University.

Another part of the PIRLA II investigations is the stratigraphic analysis of ten Florida lakes. Inferred pH, ANC and DOC will be calculated for these sediment cores, and the effect of lake-level change will be investigated. Eight cores will have only the tops and bottoms analyzed, and two will receive detailed stratigraphic analyses. These lakes are also the site of extensive hydrological research and the focus of attempts to model changes in pH, lake level, and other variables. Hindcasts of lake-water parameters from modeling and from paleoecology will be compared. The Florida program is centered at Indiana University (P. Roger Sweets).
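The diatom-pH transfer functions mentioned above relate the diatom assemblages of calibration lakes to their measured lake-water pH, and then apply that relationship to sediment samples. The newsletter does not say which calibration method PIRLA used. As a hedged illustration only, the sketch below uses simple weighted averaging (WA, without deshrinking), one common approach; the function names, taxa, counts and pH values are all hypothetical.

```python
# Sketch of a weighted-averaging (WA) diatom-pH transfer function.
# Assumption: WA without deshrinking is used for illustration only; the
# newsletter does not say which calibration method PIRLA used.
# All taxon names, counts and pH values are hypothetical.


def wa_optima(calibration_counts, calibration_ph):
    """Estimate each taxon's pH optimum as its abundance-weighted mean pH.

    calibration_counts: one dict per calibration lake, taxon -> abundance
    calibration_ph: measured lake-water pH, in the same order
    """
    weighted = {}
    totals = {}
    for lake_counts, lake_ph in zip(calibration_counts, calibration_ph):
        for taxon, abundance in lake_counts.items():
            weighted[taxon] = weighted.get(taxon, 0.0) + abundance * lake_ph
            totals[taxon] = totals.get(taxon, 0.0) + abundance
    return {t: weighted[t] / totals[t] for t in totals if totals[t] > 0}


def wa_infer(sample, optima):
    """Infer pH for one sediment sample as the abundance-weighted mean of
    its taxa's optima. Taxa absent from the calibration set are skipped;
    at least one taxon must have an optimum."""
    known = {t: a for t, a in sample.items() if t in optima}
    return sum(a * optima[t] for t, a in known.items()) / sum(known.values())


calibration_counts = [
    {"taxon_A": 80.0, "taxon_B": 20.0},
    {"taxon_A": 50.0, "taxon_B": 50.0},
    {"taxon_A": 10.0, "taxon_B": 90.0},
]
calibration_ph = [5.0, 6.0, 7.0]
optima = wa_optima(calibration_counts, calibration_ph)

# Core "top" (recent) and "bottom" (pre-1850) samples, as in the PIRLA II design.
top = {"taxon_A": 70.0, "taxon_B": 30.0}
bottom = {"taxon_A": 30.0, "taxon_B": 70.0}
print("top pH:", round(wa_infer(top, optima), 2))
print("bottom pH:", round(wa_infer(bottom, optima), 2))
```

In this made-up example the bottom sample infers a higher pH than the top, the pattern an acidified lake would show. With real calibration data, WA estimates are usually corrected ("deshrunk") by a regression against the measured pH before use.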
The PIRLA DataBase Management System (DBMS) has been developed using the Scientific Information Retrieval/DataBase Management System (SIR/DBMS) software package (SIR, Inc., 5215 Old Orchard Brook Road, Suite 800, Skokie, Illinois 60077, U.S.A.). SIR/DBMS can handle large and diverse amounts of data, and it runs on a wide variety of mainframes and microcomputers. The vendors offer courses on its use, and databases can easily be transferred to other computer systems.

The database is used primarily for storing and maintaining data, which can be accessed in a variety of ways. Simple analyses, including basic statistics, are done within SIR/DBMS. Desired sets of data are retrieved using programs written in PQL (Procedural Query Language), which is provided as part of SIR/DBMS. Such retrieval programs can be written to output data sets for direct use in SAS, SPSS or BMDP for more sophisticated statistical analysis.

The PIRLA DBMS is currently maintained at three locations: the Department of Biology, Queen's University (laboratory of Dr. John Smol; Dr. Allen Uutala, local database administrator (DBA)); the Department of Biology, Indiana University (laboratory of Dr. Donald Whitehead; P. Roger Sweets, local DBA); and the U.S. EPA Corvallis Environmental Research Laboratory (Dr. Donald Charles; Jeremy Smith, local DBA). All three sites use IBM-compatible computers with either an 80286 or an 80386 CPU, a math coprocessor, and hard-disk storage. Hard-disk requirements vary with the application; the current PIRLA DBMS occupies about 40 MB.

The installation at Queen's University is now the central database. Allen Uutala is responsible for entering new data and checking its consistency, making necessary modifications to the database structure, and adding new retrievals.
Updated versions of the database are distributed to the other two sites using a cartridge magnetic tape backup system (Excell Stream-60 Tape Backup System, Everex Systems, Inc., 48431 Milmont Drive, Fremont, CA 94538, U.S.A.). A database activity log is maintained using Memory Mate (Broderbund Software, Inc., 17 Paul Drive, San Rafael, CA 94903, U.S.A.), a free-form text database program.

Retrievals of large data sets can run for hours, so an 80386 CPU and a fast hard disk are highly recommended. At least one floppy disk drive should be high-capacity, i.e. for either 1.2 MB 5-1/4" or 1.44 MB 3-1/2" diskettes. A tape backup system is essential, not only for transporting the database to other sites but also as a general means of backing up the database should it become corrupted. Some means of recording database activity, such as the Memory Mate program mentioned above, is also highly recommended; such a log is useful in reconciling problems or discrepancies that may arise in the database.

The SIR/DBMS system is primarily relational, although data are stored hierarchically (e.g. region, lake, core, sediment depth interval, sediment characteristic) and subsets of data types can be networked (e.g. relating diatom counts to diatom taxon names). Data of a similar kind (e.g. total metal concentrations or diatom counts) are grouped in "record types". Information from these record types can be retrieved individually or in groups for further analysis or retrieval. Over 30 special retrieval programs have been written. These are stored within the database management system and can be used interactively to retrieve data for specified lakes, cores, intervals, etc. (e.g. diatom counts or inferred pH values for a particular lake core). Data can be entered as batch files, or record by record using an interactive module within SIR/DBMS.
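To make the "hierarchical plus networked" arrangement concrete, here is a minimal sketch in Python. It is not the SIR/DBMS schema and not PQL: the class names (`Lake`, `Core`, `Interval`), the taxon codes and the retrieval function `diatom_counts_for_core` are hypothetical. Diatom counts are stored under each depth interval but refer to a separate taxon list by code, which plays the role of the networked link described above.

```python
# Hypothetical sketch of hierarchical records (lake > core > depth interval)
# with diatom counts linked by code to a separate taxon list. Names are
# illustrative only; this is not the SIR/DBMS schema.
from dataclasses import dataclass, field


@dataclass
class Interval:
    top_cm: float
    bottom_cm: float
    # One "record type": diatom counts keyed by taxon code.
    diatom_counts: dict = field(default_factory=dict)


@dataclass
class Core:
    core_id: str
    intervals: list = field(default_factory=list)


@dataclass
class Lake:
    name: str
    region: str
    cores: list = field(default_factory=list)


# Master taxon list; counts point into it by code (the "network" link).
TAXA = {"D001": "Hypothetical taxon one", "D002": "Hypothetical taxon two"}


def diatom_counts_for_core(lakes, lake_name, core_id):
    """Return (interval, taxon name, count) rows for one lake core."""
    rows = []
    for lake in lakes:
        if lake.name != lake_name:
            continue
        for core in lake.cores:
            if core.core_id != core_id:
                continue
            for iv in core.intervals:
                for code, count in sorted(iv.diatom_counts.items()):
                    rows.append(((iv.top_cm, iv.bottom_cm), TAXA[code], count))
    return rows


# Usage with made-up data: a surface interval and a deeper interval.
example = [
    Lake("Example Lake", "Adirondack Mountains", [
        Core("C1", [
            Interval(0.0, 1.0, {"D001": 120, "D002": 35}),
            Interval(30.0, 31.0, {"D001": 40}),
        ]),
    ]),
]
for row in diatom_counts_for_core(example, "Example Lake", "C1"):
    print(row)
```

Keeping taxon names in one master list means a name change or synonymy is made in a single place, rather than in every interval that holds counts.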
The database can accommodate data for any length of core, from a surface sediment sample to a 15 m "long" core. The source of the data, including the names and locations of the scientists who obtained it, and the methods used are documented in the database. Method numbers are documented and cross-referenced in the PIRLA Methods book (Charles and Whitehead 1986b). The database was created in 1984 and has evolved continuously. Initial funding for the PIRLA project and development of the database came from the Electric Power Research Institute (EPRI). Later funding came from the National Science Foundation and, more recently (PIRLA II), from the U.S. Environmental Protection Agency's Aquatic Effects Research Program at the Corvallis Environmental Research Laboratory.

There are several major advantages to having all the data in one large database. First, it requires that all data be in a consistent format, in the same units, with the same number of decimal places, and with missing values specified. This consistency makes comparisons easy, both within and among data sets for one region, project, or data type. For biological data, all investigators must use the same taxonomic system. Providing adequate documentation and consistency is difficult and time-consuming, but in the long run it has made the database more valuable and easier to use. A final important advantage is that new PIRLA-type paleoecological projects need not budget for the large investment of time and money that went into the original setup of the database, which makes many new projects more feasible.

We plan to continue developing the database and adding data. We anticipate that the data we now have can be used for a wide variety of basic and applied ecological studies. We also anticipate adding data from future studies and from other completed studies, as the data are generated or as needed for particular projects.
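The consistency rules described above (common units, a fixed number of decimal places, explicit missing values) are the kind of check that can be applied as data are entered. A minimal sketch, assuming a hypothetical standard unit of ug/g, two decimal places and `None` as the missing-value marker; these are illustrative choices, not PIRLA's documented conventions.

```python
# Hypothetical data-entry check: convert to one standard unit, round to a
# fixed number of decimals, and record missing values explicitly.
# The unit table, DECIMALS and MISSING are assumptions for illustration.

MISSING = None  # explicit "not measured" marker, never confused with zero

# Conversion factors into the assumed standard unit, ug/g.
TO_UG_PER_G = {"ug/g": 1.0, "mg/g": 1000.0, "mg/kg": 1.0}
DECIMALS = 2


def standardize(value, unit):
    """Convert one measurement to ug/g, rounded to DECIMALS places.

    Blank strings and None become MISSING rather than 0, so an absent
    measurement cannot be mistaken for a measured zero.
    """
    if value is None or (isinstance(value, str) and value.strip() == ""):
        return MISSING
    if unit not in TO_UG_PER_G:
        raise ValueError(f"unknown unit: {unit!r}")
    return round(float(value) * TO_UG_PER_G[unit], DECIMALS)


records = [("12.3456", "ug/g"), ("0.0125", "mg/g"), ("", "ug/g"), (7, "mg/kg")]
print([standardize(v, u) for v, u in records])
```

Rejecting unknown units outright, rather than guessing, forces any new unit to be added to the table deliberately.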
Questions regarding use of PIRLA data or acquisition of data in the database should be directed to Donald Charles. Specific questions on the software and general use of the database should be directed to Allen Uutala. A PIRLA DBMS manual is available (contact Donald Charles); it describes the structure and content of the database and gives background on the SIR/DBMS program, the schema (list of variables and formats), and example retrievals and output.

The PIRLA DBMS (overall framework and retrieval programs) is reasonably well documented at this time. The system can accommodate a wide variety of paleoecological data covering a diverse array of studies. Because many retrieval programs have already been written, a new user might be able to adopt our system faster than writing programs of their own. The database is still being fine-tuned, however. Not all components are fully documented, and the SIR/DBMS software is sophisticated: while simple tasks are easy to accomplish, full use of the power of the database requires a commitment to learn its systems. A full- or half-time database administrator has been employed for the duration of the PIRLA and PIRLA-related projects. Questions on potential acquisition and use of the system should be directed to Allen Uutala.

Data in the PIRLA database are still being analyzed by PIRLA investigators. With the permission of the data generators, however, some of the information can be released for analysis by others.

The PIRLA DBMS is one of the largest database management systems for paleoecological data, currently holding about 150,000 records. The system can handle a wide variety of complex data types and is flexible, expandable, and transportable. It will interface with a variety of statistical analysis programs. The system is being developed further so that it can be used by investigators planning or involved in other paleoecological studies.

References:

Charles, D.F. and D.R. Whitehead, 1986a. The PIRLA project: Paleoecological Investigation of Recent Lake Acidification. Hydrobiologia 143: 13-20.

Charles, D.F. and D.R. Whitehead (eds.), 1986b. Paleoecological investigation of recent lake acidification: Methods and project description. EA-4906, Research Project 2174-10, Electric Power Research Institute, Palo Alto, CA.

Donald F. Charles, Corvallis Environmental Research Lab, 200 SW 35th Street, Corvallis, OR 97333, U.S.A.
John P. Smol and Allen J. Uutala, Department of Biology, Queen's University, Kingston, Ontario K7L 3N6, CANADA
P. Roger Sweets and Donald R. Whitehead, Department of Biology, Indiana University, Bloomington, IN 47405

Diagnostic Analyses of Tree-Ring Data
Hal Fritts

I have prepared a high-capacity disk with two programs and the corresponding data sets; they form the basis of my forthcoming text "Reconstructing Large-Scale Climatic Patterns from Tree-Ring Data: A Diagnostic Analysis". The mapping programs can be used with several types of printer. One program maps average values of up to 102 chronologies for time periods you can select. Differences can be calculated between two time periods, or between individually selected groups of years, to test hypotheses you may have. The other program does the same using the annual and seasonal temperature and precipitation reconstructions for 1602-1960. For example:

1) plot years of large volcanic eruptions and subtract them from years with no known eruption;
2) years of forest fires versus years with no known fire;
3) El Niño years, etc.

All of these have provided very instructive results. Read the DOC files for instructions on uncrunching (decompressing) the disks. You will need a hard drive with at least 6 MB of free space for all of the data on this disk.

The manuscript text describes the annual temperature, precipitation, and pressure reconstructions, which appear to be the most reliable results. Use the seasonal data on the disks with caution.
They have large error terms because of the effects of other seasons and variables on annual ring growth. I would not try to interpret values for single seasons or even single years, but if you average a number of years with similar characteristics, the error may be reduced enough to give meaningful results. This varying reliability is discussed in the text. Temperature is more reliable than precipitation, annual values are more reliable than seasonal values, and reconstructions near the tree-ring grid are more reliable than those farther from it. Don't trust the reconstructions for eastern North America or for the Gulf Coast. The annual temperature reconstructions for the central part of the U.S. have some reliability, but the precipitation estimates do not. Please keep these limitations in mind if you use the disk.

Harold C. Fritts, Laboratory of Tree-Ring Research, University of Arizona, Tucson, Arizona 85721, U.S.A.

POLSTA: Interactive Time Series Analysis

POLSTA is a computer programme designed to simplify the preparation, plotting, and analysis of time-based observations. Short for "Pollen Time Series Analysis", POLSTA is applicable to a wide range of data, including pollen counts, tree rings, meteorological readings, and economic records. It was designed by Dr. David G. Green of the Australian National University to meet three common needs of scientists today:

1. To cope with large data sets, especially those consisting of many separate series of observations. The standard POLSTA package can handle up to 200 series of 500 observations each (versions that accommodate larger data sets can be supplied).
2. To reduce the time needed to perform routine tasks with your data. For instance, using POLSTA it is possible to generate a plotted pollen diagram from raw counts in a matter of minutes.
3. To make it possible for non-statisticians to use powerful time series statistics and modelling techniques.
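Item 2's example, going from raw counts to a pollen diagram, usually starts by expressing each taxon as a percentage of a pollen sum. The sketch below is not POLSTA code. It assumes the pollen sum is simply the total of all listed taxa in each sample (real studies often restrict the sum, e.g. to terrestrial pollen), and the function name `to_percentages` and the counts are hypothetical.

```python
# Hypothetical sketch: raw pollen counts -> percentages of each sample's total.
# Assumption: the pollen sum is the total of all listed taxa.


def to_percentages(counts):
    """Convert raw counts to percentages of each sample's pollen sum.

    counts: dict mapping taxon -> list of counts, one per sample;
    every list must have the same length.
    """
    if not counts:
        return {}
    taxa = list(counts)
    n_samples = len(counts[taxa[0]])
    totals = [sum(counts[t][i] for t in taxa) for i in range(n_samples)]
    return {
        t: [100.0 * c / tot if tot > 0 else 0.0 for c, tot in zip(counts[t], totals)]
        for t in taxa
    }


raw = {"Pinus": [120, 80], "Betula": [60, 100], "Poaceae": [20, 20]}
pct = to_percentages(raw)

# A rough text listing, one row per taxon and one column per sample.
for taxon, values in pct.items():
    print(f"{taxon:<8}", "  ".join(f"{v:5.1f}" for v in values))
```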
POLSTA is an interactive program: it "talks" with you as it runs by asking questions or requesting simple formulae. If you get stuck, you can either ask POLSTA for help or refer to the user's manual. POLSTA warns you when you make a mistake and keeps backup copies of the data in case of trouble.

Major Subroutines

* Data Handling
Up to 200 taxa and 500 samples. Any input format is acceptable. Up to five ancillary series, not to be analyzed, can also be accepted. The data set can be output to a disk file in any format, eliminating the time needed to set up the data in later sessions. Data series are stored in a direct-access file; simple commands are used to define a workspace containing copies of up to 20 data series at a time. The program can create new data series.

* Algebra
Create or modify data series in response to simple formulae. Most standard arithmetic operations are possible, plus integration, moving or changing segments of data, and generating models such as polynomials, sine waves and random series.

* Chronologies
Sample ages, and the associated sediment deposition rates, can be estimated either by finding the power curve of best fit to known dates or by fitting straight-line segments between them. More complex chronologies can also be constructed using the algebra routine.

* Data Plotting
A variety of plots is available. Conventional diagrams use depth on one axis and the data series (one by one) on the other. Scattergrams may be plotted for pairs of series. Data may be plotted as points, lines, bars, or histograms. Rough plots of series can also be printed at the terminal or on a line printer.

* Detrending and Filtering
Performed algebraically, or by fitting the appropriate linear, power or logarithmic curve to portions of a series. Special commands are available for filtering series by moving average, autoregression and Nth differences.

* Elementary Statistics
Means, standard deviations, and extrema can be computed for any segment of the data.

* Time and Frequency Domain Analysis
Serial and cross-correlograms, power spectra, and coherency/phase spectra. Select the spectral window and check aliases, bandwidths, and confidence intervals of the cyclic frequencies detected.

* Multiple Regression
Linear combinations of values in up to 20 independent series. Non-linear and lagged relationships are found by generating the appropriate independent series algebraically. Observed and estimated values of the dependent series can be plotted and the residuals computed.

* Sequence Splitting
Search for significant changes in mean and variance, an alternative to zonation in the interpretation of stratigraphic records. POLSTA displays the locations of sequence splits and gives statistics for the subsections identified.

* Special Purpose Features
Some optional routines may be available on request. Please explain your needs in detail.

IMPLEMENTATION

Mainframes: DEC-10, VAX series (VAX 11/750, 11/780, 8700, etc.)
PCs: IBM PC, XT, AT and compatibles (maths co-processor desirable)
Plotting devices supported by the PC version (enquire regarding mainframes): PostScript printers (e.g. Apple LaserWriter), HP LaserJet, HP pen plotter, IBM Graphics, IBM Proprinter, Epson FX-80, FX-85, MX-80.

Information available from: Anutech Pty Ltd., Childers Street, Acton ACT 2601; GPO Box 4, Canberra ACT 2601. Telex AA 62760 Natuni.

[ Email addresses (not reproduced) extend onto p. 10. ]
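The two age-model options listed under "Chronologies" in the POLSTA summary (straight-line segments between dated levels, or a best-fit power curve) can be sketched as follows. This is not POLSTA's code: the power curve is assumed to have the form age = a * depth^b and is fitted here by least squares on log-transformed values, which may differ from POLSTA's own procedure. The function names, depths and dates are hypothetical.

```python
# Hypothetical sketch of two age-depth models: linear segments between
# dated levels, and a power curve age = a * depth**b fitted in log-log space.
import math


def linear_segments(depth, dated_depths, dated_ages):
    """Interpolate age at `depth` between the two bracketing dated levels.

    dated_depths must be increasing; no extrapolation beyond the dated range.
    """
    for i in range(len(dated_depths) - 1):
        d0, d1 = dated_depths[i], dated_depths[i + 1]
        a0, a1 = dated_ages[i], dated_ages[i + 1]
        if d0 <= depth <= d1:
            return a0 + (a1 - a0) * (depth - d0) / (d1 - d0)
    raise ValueError("depth outside dated range")


def segment_deposition_rates(dated_depths, dated_ages):
    """Deposition rate (depth units per year) for each segment between dates."""
    return [
        (d1 - d0) / (a1 - a0)
        for d0, d1, a0, a1 in zip(dated_depths, dated_depths[1:],
                                  dated_ages, dated_ages[1:])
    ]


def fit_power_curve(dated_depths, dated_ages):
    """Least-squares fit of log(age) = log(a) + b*log(depth); inputs must be > 0."""
    xs = [math.log(d) for d in dated_depths]
    ys = [math.log(a) for a in dated_ages]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b


depths = [50.0, 150.0, 300.0]    # cm, hypothetical dated levels
ages = [1000.0, 4000.0, 9000.0]  # years BP, hypothetical dates
print("linear age at 100 cm:", linear_segments(100.0, depths, ages))
print("deposition rates (cm/yr):", segment_deposition_rates(depths, ages))
a, b = fit_power_curve(depths, ages)
print("power-curve age at 100 cm:", a * 100.0 ** b)
```

The linear-segment model honours every date exactly but changes deposition rate abruptly at each dated level; the power curve gives a smooth chronology that need not pass through any date.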