INQUA-COMMISSION FOR THE STUDY OF THE HOLOCENE
Working Group on Data-Handling Methods
Newsletter 8, July, 1992

NOTE FROM THE COORDINATOR

The Newsletter is a long one as contributions came in from all over the world. Geoff Hope describes the South Pacific Pollen Atlas Project. I asked Malcolm Clark to do a couple of articles on sequence-slotting, and the first appears this time. David Goodman and Richard Becker discuss the spectacular PALCAT paleontological database which boasts image storage and retrieval. Eric Grimm introduces his handy and efficient BTA (Binary to Ascii) utility for encoding binary files so they will pass through regular email. Keith Bennett agreed to say more about PSimpoll, his program that produces a very effective pollen diagram via PostScript. Darrel Maddy and John Brew discuss the statistics manual they are editing for the Quaternary Research Association (UK), and ask for some help. Owen Davis reports on the results he has gotten from his questionnaire in Newsletter 7. (If you have not yet sent in your answers, please do it now so that the conclusions can be based on a larger cross-section of the readers.) John Birks provides another valuable Bookshelf. Steve Juggins, Warren Kovach, and I discuss the Internet and announce a new software archive where you can use Anonymous FTP to get many of the special purpose software programs described in the Newsletter. And Dr. Triage is back as well.

I ask the readers of the Newsletter to send me information on any of the data-handling techniques that you have used which could be helpful to others. Please check your regular and email addresses for accuracy. Send any corrections/suggestions to:

Louis J. Maher, Jr. Department of Geology & Geophysics University of Wisconsin 1215 W.
Dayton Street Madison, WI 53706 USA Phone: (608) 262-9595 FAX: (608) 262-0693 Email: maher@geology.wisc.edu

AUSTRALIAN AND SOUTHWEST PACIFIC POLLEN ATLAS PROJECT

Geoff Hope, Carlo Martinello, and Judy Owen
Department of Biogeography and Geomorphology
RSPacS, Australian National University
PO Box 4 Canberra 2601 Australia
Email: gxh411@coombs.anu.edu.au

The Department of Biogeography and Geomorphology, Australian National University, proposes to develop a print and electronic record of the pollen flora and associated microfossils likely to be recovered from river, swamp, lake and other depositional sites in Australia, Eastern Indonesia, New Guinea, the high western Pacific islands (Solomons, Vanuatu, New Caledonia, Fiji) and the sub-antarctic islands. The major purpose of this record is to make identification of pollen possible for regional users who wish to carry out studies into past vegetation and environmental histories on a variety of timescales. Such studies can contribute to understanding the stability of existing ecosystems, the impact of human activities and the effects of climate change. In the Australian region pollen analysis has produced remarkable results, but it has the potential to be used with much greater precision by a wider group of users if access to regional pollen identification can be improved. This work will make a major contribution to understanding the ecological effects of climate change and hence can contribute to strategies for managing the present problems of global warming.

The Atlas will also be a major resource for plant systematists interested in the relationships of the living floras of the region, as pollen is a taxonomic feature closely tied to the floral structures used to classify the flowering plants. Palynology is a major geological tool, and the atlas will help stratigraphic palynologists to refine their fossil types in relation to living floras.
Conversely, more accurate knowledge of palaeofloras will improve understanding of the evolution of Australian plant communities and environments. No pollen atlas exists for the Australian region, and in fact workers make do with atlases from Taiwan, Argentina and Britain. The atlas will also be of interest to other branches of pollen research such as honey, allergy and plant reproductive biology.

Background. Since 1965 the Department of Biogeography and Geomorphology in the Research School of Pacific Studies has been developing a comprehensive collection of the spores and pollen of Australian and Pacific plant species. The collection currently stands at about 16,000 slides covering about 7500 species. Although the total flora of the region probably exceeds 35,000 species, pollen is a conservative element, and for many large families only a sample of representative genera is necessary. Some families (for example the Orchidaceae, the family with the most species in the region) are virtually never represented in sedimentary sites, and hence do not have a high priority for comprehensive collection. Although still incomplete, the department's collection is by far the largest and most comprehensive for the region in the world. It has cost some millions of dollars in equipment, materials and staff time to collect, prepare and catalogue modern pollen. Further effort will be required to develop more complete holdings in groups with potential to contribute to understanding past ecology. Examples are the eucalypts and related genera, the daisies, and mangroves.

The pollen obtained from flowers from identified specimens is suspended in silicone oil in small sample bottles, and is also held as microscope slides. A card file record including a formal description and microphotographs is continually being enlarged, and such coded specimens are being entered in a database at present.
The collection and these records are only available to users in Canberra, and it is constantly visited by interstate researchers. Using the material is difficult and time consuming as slides deteriorate and often need to be re-made to allow a researcher to carefully compare unknown pollen grains with likely matches in the collection. Obtaining a slide and examining a few pollen grains under 400 or 1000x magnification usually takes several minutes. Photographs which are kept on reference cards are useful, but often not sufficient. Even more effort is needed to obtain scanning electron micrograph images, although these provide detail which is usually quite diagnostic, and they can greatly help the interpretation of normal light images. The Pollen Atlas Project plans to make the data available as descriptions and images that will allow identification of unknown grains from all regions in the Australian region. This will take the form of a computer database incorporating scanned micrographs, which will incorporate SEM images where these are available. From this database, which will be continually upgraded and extended, a major pollen atlas will be prepared using a high definition laser printer to handle the images. The advantage of the database is that it will allow the production of updated versions and increased detail on selected groups. The total region is very large and variable so the main atlas may not prove suitable in day to day identifications at specific sites. There may also be too much detail in the database to allow it all to be published in book form. Read/write compact disk technology appears to be the most suitable method for making the data available. There are advantages over a book, if images can be displayed at standard size on a split video screen which is also showing unknown grains via a connection to a microscope camera. 
In addition to the complete atlas, the project is intended to make sub-regional or purpose built atlases and databases available to users. These would normally cover natural floristic regions, for example western Tasmania, south eastern montane - alpine Australia or lowland New Guinea. Using locational data, often at generic or family level, we expect to be able to produce pollen guides consisting of 150 - 300 pollen types for any site in our region. These could be tailored for particular purposes, such as food plant identification or aquatic hydroseral analyses. Such guides are going to be interactive, in that obvious gaps in the existing collection will be apparent, and updated versions can be prepared. Currently the collection is strongest for Tasmania, coastal and subalpine Victoria - NSW, northeast Queensland, inland NSW and South Australia and montane New Guinea. Such regional atlases will be easier to use, and should be capable of being installed on common PC facilities. This is essential if a wide base of users in Australia and the Pacific is to be encouraged to make use of pollen analytical techniques.

Atlas Development. The collection is currently catalogued using a database on a PC, and 4000 of the preliminary complete records (without images) are held in a more complex computer database capable of storing a range of image formats and producing publication quality output. Because pollen characteristics are coded in the database, sorting routines can be used to locate groups of pollen with similar characteristics for comparison. We have tested several scanning techniques using photographs of pollen, but the file sizes produced are very large, and photographs are not available for more than part of the collection. We expect that a video capture of the selected grains under the microscope will provide the highest quality image with the least delay or risk of error, and this system is being developed by pTIZAN
The problem of image file storage and retrieval seems most economically solved by videodisk technology, and practical options are being investigated. In the interim the database is being expanded and is stored in a mainframe computer at ANU, but is accessed by personal computers. Interactive user access to this database could be arranged as an alternative to book or videodisk products. This might save on some equipment costs, but could cut out some routine users.

The project will be capable of continuing expansion once the methodology is in place. Future developments will include the incorporation of other fossil groups of interest, including diatoms, algae, dinoflagellates, microfauna and phytoliths. Larger objects commonly found in stratigraphic contexts could also be included, such as seeds less than 5 mm, standard SEM micrographs of wood and charcoal or insect carapaces. It would also be advantageous to incorporate the well-archived Tertiary fossil pollen from the region in the database. The project will require the cooperation of laboratories in the region to provide sample data, to make corrections, and to house non-pollen collections. Significant collections in Sydney, Melbourne and Bandung (Indonesia) exist.

Because preparation of the data is already well advanced, the atlas should start to appear in the next three years, if external support for the project becomes available. Although the project has received substantial development support from internal funds, the goals are not directly those of the Research School of Pacific Studies, and thus cannot claim a disproportionate extent of resources. Similar projects in Europe and North America have received grants from geological and taxonomic sources, as well as industry funds.

- - STOP PRESS - -

The former Coordinator of the Newsletter reports a new address and title. Word came at press time that Dr. J.C.
Ritchie, Professor Emeritus of Botany (University of Toronto), effective immediately has the following permanent address: Pebbledash Cottage, Corfe, TAUNTON, Somerset, England U.K. TA3 7AJ. Tel: (0)823 42434.

SEQUENCE COMPARISONS AND SEQUENCE-SLOTTING

Malcolm Clark
Dept. of Mathematics
Monash University
Clayton, Victoria Australia, 3168
Email: rmc@monu1.cc.monash.edu.au

1. Introduction. Many investigations involve the comparison, cross-correlation or amalgamation of two or more sequences of measurements. Examples of such sequences are geophysical well-logs, palaeomagnetic measurements from lake sediment cores, and pollen sequences. An essential feature is that observations within each sequence are ordered, say by increasing depth down a core, or by increasing age.

As an example, consider the comparison of pollen data obtained from two cores, say core A and core B. Each core is sub-divided into a number of small samples or horizons, numbered A1, A2,..,Am in core A and B1, B2,..,Bn in core B. The numbering is in the same direction, say from top to bottom, in both cases. At each of the (m + n) horizons, a count is made of the various types of pollen found. If the cores are from the same lake, then the pattern of variability in the pollen counts should be similar in both cores. If this is the case, this common pattern can best be estimated by combining the sequences into a single sequence.

In general, given two ordered sequences of observations, the objectives may be: (1) to measure the difference between the sequences, or (2) to identify matching parts of the sequence, or (3) to combine the two sequences into a single joint sequence. A general strategy for achieving these objectives is first to define a measure of dissimilarity or distance d between any two objects (i.e. horizons, in the case of pollen analysis) from either core. The objects A1, A2,.., Am and B1, B2,.., Bn may thus be represented as points in some space of k dimensions.
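To make the idea of a distance d between two horizons concrete, here is a minimal Python sketch. It assumes each horizon is a tuple of k numbers (for example pollen percentages for k taxa) and shows two common choices of d, Euclidean and "city-block" distance. The function names and the sample values are illustrative assumptions, not taken from any of the programs discussed in this article.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two horizons in k dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    """Sum of absolute differences, variable by variable."""
    return sum(abs(a - b) for a, b in zip(x, y))

# Hypothetical pollen percentages for k = 3 taxa at two horizons.
a1 = (50.0, 30.0, 20.0)
b1 = (47.0, 34.0, 19.0)

d_euclid = euclidean(a1, b1)   # square root of (9 + 16 + 1)
d_city = city_block(a1, b1)    # 3 + 4 + 1
```

Either function can serve as d( , ); the choice matters in practice, so it is worth comparing results under more than one measure.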
One intuitively appealing measure of the difference or distance between two sequences is the total distance or "combined path length" (CPL) through the combined sequence of A's and B's. For any such sequence, the CPL is obtained by adding the distances between adjacent objects (i.e. horizons) in the combined sequence. The problem then is to combine or to slot the two sequences into a single combined sequence, preserving the ordering within the A's and B's, in such a way that the CPL is minimised. The resulting minimum CPL, or some transformation of it, is a measure of the difference between the two sequences, while the corresponding optimum combined sequence gives the best amalgamation of the information in the original sequences. For example, one possible combined sequence for the case m = 4 and n = 3 would be A1 A2 B1 A3 B2 A4 B3, for which the

    CPL = d(A1, A2) + d(A2, B1) + d(B1, A3) + d(A3, B2) + d(B2, A4) + d(A4, B3),

where d( , ) denotes the distance between the specified horizons. Notice how the A's and B's are in their correct order.

An alternative approach, applicable when each object Ai or Bj can take only a few possible values, is to align the two sequences side-by-side, rather than slotting them together into a single sequence. If the numbers of A's and B's are unequal, there will necessarily be some gaps in either sequence. For example, one possible alignment with m = 4 and n = 3 would be

    A1  A2  .   A3  A4
    .   B1  B2  B3  .

where the dots indicate gaps. This approach arises naturally when comparing DNA sequences and the like, but the sequence-slotting technique seems more appropriate when the variables of interest are on a continuous scale.

2. Methodology. The optimum combined sequence (in the sense of minimum CPL) may be easily found by dynamic programming techniques. The basic algorithm for this and similar problems was discovered independently by various workers in the early 1970's, e.g.
Delcoigne and Hansen (1975), Sakoe and Chiba (1978), Needleman and Wunsch (1970). The sequence-slotting algorithm uses an array of 2mn numbers, which is built up iteratively using essentially two equations. (See Clark (1985), Gordon (1980) for details, and Sankoff and Kruskal (1983) for a broad overview). The numerous dynamic programming algorithms may be divided into two categories, depending on whether the sequences are to be aligned or slotted. There is an enormous literature on techniques for the alignment of sequences (such as DNA sequences), but very little on sequence-slotting as described above.

3. Additional Constraints. Sometimes the combined sequence must take account of additional constraints based on stratigraphic information. For example, each core may contain one or more "markers" which must match up in any combined sequence. There could be, say, a distinct band of sediment at position A5 in core A, but at position B11 in core B. Then in any combined sequence A5 and B11 must be immediately adjacent. The algorithms of Gordon (1980) and Clark (1985) allow for a variety of such additional stratigraphic constraints. For example, users may specify that certain parts of one sequence must overlap a particular subset of the other sequence, or that part of one sequence must precede part of the other, or that part of the final sequence must contain elements from just one sequence only.

On the other hand, sometimes the optimal combined sequence may contain long blocks of consecutive A's or B's. This is likely to happen in parts of the sequence where the response variables (e.g. pollen distribution) are nearly constant. In such cases, these long blocks do not necessarily represent a significant difference between the relevant parts of the two sequences. The CONSLOT and PCSLOT programs mentioned below enable the user to specify the maximum number of consecutive A's or B's (i.e. the maximum block-length) to be permitted.

4. Problems.
Experience has shown that the sequence-slotting procedure depends heavily on the choice of distance measure, and on any preliminary scaling of the measurements. It is advisable to try alternative distance measures, e.g. "city-block" metric versus Euclidean distance, and to scale the original data to ensure that all variables are in comparable units of measurement.

The solution to the sequence-slotting problem need not necessarily be unique. There may be many alternative combined sequences all with the same minimum CPL, especially when only one variable or characteristic is measured on each object Ai or Bj. The various algorithms guarantee to find only one of the multiple solutions.

In principle, the dynamic programming algorithms can be extended to solve the problem of the simultaneous slotting of k sequences into a single combined sequence. In practice, the amount of computer time and memory required increases so rapidly that only the above case of k = 2 is feasible. Thompson and Clark (1989) give an overview of the practical problems associated with stratigraphic correlation using sequence-slotting.

5. Assessment of results. All sequence-slotting algorithms will produce an answer, no matter what data they are applied to. But does the answer make sense? Questions to ask are: (1) Is the optimal slotting greatly superior to other possible ones? (2) Which parts of the combined sequence are more reliable? (3) Do some of the individual objects have a large influence on the outcome? (4) Did it make sense to combine the two sequences anyway? (Are they "telling the same story"?)

Questions (1) and (2) are answered to some extent by the so-called "H-matrix" devised by Gordon et al. (1988). This displays graphically both the optimal slotting(s) and some of the slottings which are worse than it. It also indicates which parts of the slotting are "tight", where a minor change in the combined sequence would produce a large change in the CPL.
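For readers who want to experiment with these questions, the unconstrained minimum-CPL slotting described under Methodology can be sketched in a few lines of Python. This is an illustrative reconstruction, not the code of SLOTSEQ, CONSLOT or PCSLOT: it assumes two non-empty sequences of numeric tuples, keeps two tables (one for "an A was placed last", one for "a B was placed last"), and returns only the minimum CPL, not the slotting itself. All names are hypothetical.

```python
import math

def euclid(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def min_cpl(A, B, dist=euclid):
    """Minimum combined path length over all order-preserving slottings."""
    m, n = len(A), len(B)
    INF = float("inf")
    # FA[i][j]: best CPL using A[:i] and B[:j] when A[i-1] is placed last.
    # FB[i][j]: the same, but with B[j-1] placed last.
    FA = [[INF] * (n + 1) for _ in range(m + 1)]
    FB = [[INF] * (n + 1) for _ in range(m + 1)]
    FA[1][0] = 0.0  # combined sequence starts with A1
    FB[0][1] = 0.0  # combined sequence starts with B1
    for i in range(m + 1):
        for j in range(n + 1):
            if i >= 1 and (i, j) != (1, 0):
                best = INF
                if i >= 2:  # previous object was A[i-2]
                    best = FA[i - 1][j] + dist(A[i - 2], A[i - 1])
                if j >= 1:  # previous object was B[j-1]
                    best = min(best, FB[i - 1][j] + dist(B[j - 1], A[i - 1]))
                FA[i][j] = best
            if j >= 1 and (i, j) != (0, 1):
                best = INF
                if j >= 2:  # previous object was B[j-2]
                    best = FB[i][j - 1] + dist(B[j - 2], B[j - 1])
                if i >= 1:  # previous object was A[i-1]
                    best = min(best, FA[i][j - 1] + dist(A[i - 1], B[j - 1]))
                FB[i][j] = best
    return min(FA[m][n], FB[m][n])

# One variable per horizon; the best slotting is A1 B1 A2 B2 A3.
core_a = [(0.0,), (2.0,), (4.0,)]
core_b = [(1.0,), (3.0,)]
best = min_cpl(core_a, core_b)  # 1 + 1 + 1 + 1 = 4.0
```

Re-running such a function with one horizon removed at a time gives a crude version of the leave-one-out check behind question (3), at the cost of (m + n) passes.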
Question (3) may be answered by the simple device of leaving one observation out at a time, seeing what happens, and repeating this all the way down either sequence. In principle, this sensitivity analysis requires (m + n) passes through the data, but Gordon et al. (1988) show how it can be done in a single pass, at the same time as computing the H-matrix.

There are two proposed statistical tests of the hypothesis implied in (4) that both sequences show the same basic pattern in the variables of interest but on possibly different time-scales. Both methods involve computer simulation. Gordon (1982) suggested simulating data, with interpolation, from one sequence, and slotting the simulated data against the other sequence. Repeated simulations give an indication of the mean and standard deviation of the CPL. Clark (1989) proposed a randomisation test in which each sequence is split at random into two sub-sequences, A1 and A2 from sequence A, and B1 and B2 from sequence B. The idea is that each random sub-sequence is slotted against every other sub-sequence, and the whole procedure is repeated say 40 times. If the optimum CPLs for the "between-sequence" comparisons (e.g. A1 versus B1) are consistently bigger than those for the "within-sequence" comparisons (e.g. A1 versus A2), then this suggests that the original sequences are not compatible. This test appears to be very sensitive, but is difficult to implement when there are additional stratigraphic constraints.

6. Computer programs. There are numerous computer programs dealing with some form of sequence comparison. I know of only a few which deal with the particular problem of sequence slotting as discussed here. These are based on Allan Gordon's SLOTSEQ program published in 1980. SLOTSEE, produced by Lou Maher, is a PC-based program which is easy-to-use, produces very effective graphical output, and is designed particularly for the analysis of pollen sequences.
I have written CONSLOT (now in Version 7.3) which allows for 12 different types of stratigraphic constraints as well as block-length constraints, and produces both the H-matrix and sensitivity analysis of Gordon et al. (1988). Written in Fortran to run on a mainframe computer, it can readily handle sequences containing up to 500 objects each. The output is largely non-graphical. I now have a PC-version, known as PCSLOT, which retains all the features, is easier to use, but at present is limited to about 100 objects per sequence.

References

Clark, R.M. 1985. A Fortran program for constrained sequence-slotting based on minimum combined path length. Computers & Geosciences 11, #5, 605-617.
Clark, R.M. 1989. A randomization test for the comparison of ordered sequences. Math. Geol. 21, #4, 429-442.
Delcoigne, A. and Hansen, P. 1975. Sequence comparison by dynamic programming. Biometrika 62, 661-664.
Gordon, A.D. 1980. SLOTSEQ: A FORTRAN IV program for comparing two sequences of observations. Computers & Geosciences 6, 7-20.
Gordon, A.D. 1982. An investigation of two sequence-comparison statistics. Austral. J. Statist. 24, 332-342.
Gordon, A.D., Thompson, R. and Clark, R.M. 1988. The use of constraints in sequence-slotting. Data Analysis and Informatics, V (ed E. Diday), North Holland. pp.353-364.
Needleman, S.B. and Wunsch, C.D. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48, 443-453.
Sakoe, H. and Chiba, S. 1978. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-26, 43-49.
Sankoff, D. and Kruskal, J.B. 1983. Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley.
Thompson, R. and Clark, R.M. 1989. Sequence slotting for stratigraphic correlation between cores: theory and practice. J. Paleolimnology, 2, 173-184.

THE PALCAT INTERACTIVE PALEONTOLOGICAL DATABASE AND IMAGE STORAGE AND RETRIEVAL SYSTEM

David K. Goodman (Arco Oil & Gas Company)
Richard C. Becker (Mobil Exploration & Production Company)

The paleontological disciplines require specialized, if not unique, computer storage and data exchange systems in order to manage extremely large data sets of intimately related textual information and photographic images. These data sets are used to document fossil taxa which have a wide and variable spectrum of morphological characteristics, and which have complex environmental, chronological, and geographic distribution patterns. The PALCAT (PALeontology CATalog) system is an interactive relational database and high-resolution image retrieval platform specifically designed to increase the efficiency with which scientists use paleontological information. The system provides users with the unique ability to "build" or "browse" electronic catalogs within a single integrated software environment. PALCAT incorporates a unique combination of analog and digital optical disk technology to manage very large data sets containing both text and images. It opens new horizons for the future of specimen-based research by providing a means to condense and access otherwise widely-disseminated information onto a single desktop system.

PALCAT was designed and developed as a co-operative project by a team of professional paleontologists (from Agip, Amoco, Arco, Chevron, Exxon, Mobil, Marathon, Shell and Unocal) and Electro Communication Systems (ECS) in Dallas, Texas, under the auspices of the American Museum of Natural History (Micropaleontology Press) and in cooperation with the American Association of Stratigraphic Palynologists. Technical problems were overcome by cooperation and compromise among paleontologists, hardware specialists, and software engineers.
A group of paleontological research scientists designed and specified the overall user interface, relational database table structure, and primary functional requirements of the system from a user's perspective (overall work session flow, use of images as data elements, as well as other required equipment including microscopes and data input devices). Hardware and software engineers integrated these design parameters into a desktop hardware system. As one example of a cooperative solution, high resolution images were mandated by the requirements of many users regarding acceptable visual quality to make valid identifications or perform morphological comparisons. Existing analog optical disc storage technology did not have the required resolution to satisfy these users, and consequently digital image storage capabilities were also incorporated into PALCAT. The resultant combination of analog optical disc and high-capacity digital image storage devices is "transparent" from a user's perspective.

The basic PALCAT hardware configuration contains the following:
(a) 386/486 computer and SVGA monitor for text data viewing,
(b) 12" analog optical disk recorder/player or player-only for image storage,
(c) 1 Gbyte removable cartridge erasable or WORM optical disk drive for storage of digital images as well as very large databases,
(d) hi-res multisync monitor for image viewing,
(e) hi-res image capture board,
(f) microscope equipped with B&W or color video camera, and
(g) optional digital flatbed scanner to capture existing photographic material.
Variations on this basic set-up can be configured to meet specific needs and budgetary constraints of individual users.
Several options to the hardware system include use of relatively inexpensive laservision players to view pre-recorded images, systems with digital record/analog view capability, and inclusion of a "live-video" window which displays a specimen being viewed under a microscope onto a monitor for comparison with recorded images from the database. Current development activity includes enhanced data display, search/retrieval optimization, expanded system hardware configurations, automatic compression/decompression of digital images, and possible implementation on UNIX and Macintosh operating systems. The system can also be incorporated into a local area network (LAN) which employs the client-server database architecture of Gupta's SQLBase.

The PALCAT system operates under Microsoft Windows 3.0 on PC DOS-compatible personal computers, and was developed using the SQLWindows/SQLBase relational database software from Gupta Technologies. The PALCAT system is designed to operate in conjunction with a wide range of associated paleontological software packages (DOS and Windows), including expert system shells, taxon identification keys (ANGIOKEY, DINOKEY), literature databases (PALYNODATA), biostratigraphic data entry utilities (BUGIN, RAGWARE), as well as optical scanning and image processing software.

PALCAT will include all search functionality of the SQLBase relational database software. The user simply specifies a set of appropriate criteria to initiate a search routine. The result of the search consists of the database records for one or more fossil taxa which meet the selection criteria, along with a series of recorded images for each taxon.
As the user browses the textual records for each successive taxon on one monitor, the initial image for that taxon is retrieved from either the analog or digital optical disc and is displayed on a second (video) monitor; the user can then browse through the collection, or "stack," of successive images for that taxon or update text fields in the record.

All types of original image source material can be incorporated into the electronic catalog; images can be captured directly from both transmitted light and scanning electron microscopes via video or digital cameras, or by optical scanning of 35mm transparencies, photographic prints, and published photomicrographs. Ultimately, hundreds of thousands of images can be stored and accessed on a network using multiple, connected storage devices. Analog image storage clearly provides an incremental advantage over digital storage regarding media (disc) requirements and practicality for the end user. In addition, the analog format offers a considerable cost savings by requiring only one disc (capacity 108,000 images, either full-color or grey-scale) in comparison to the multiple digital optical discs that would be needed for an image library identical in size. PALCAT offers a practical, fast and efficient image storage and retrieval system at a relatively economical cost. Information technology in the form of analog image storage capability is such a critical element of the PALCAT architecture that it would not have been possible to build such a system without the analog medium.

The Ellis and Messina Catalog of Foraminifera (published by Micropaleontology Press, a division of the American Museum of Natural History in New York) is the first major paleontological catalog to be adapted (by ECS) for use on the PALCAT platform. The printed catalog consists of approximately 76,000 pages of fossil images and text, and is currently used by numerous industrial and academic research institutions throughout the world.
The catalog images were scanned and recorded onto an analog optical disc. The catalog text was converted into ASCII files using OCR software; these files will be imported into the database after final editing at the American Museum. A dinoflagellate image library (developed by ImageWare, Richardson, Texas) containing approximately 25,000 images is currently in production, and additional regional- and/or group-specific databases are being planned by several third-party vendors.

PALCAT offers ease of updating the electronic catalog for both Micropaleontology Press and end users by eliminating the need to manually interleave new pages into existing volumes of the catalog each year. Furthermore, the program offers the end user the ability to update an existing catalog with proprietary images and data, or to extract or construct an entirely new one within individual laboratories, including full image capture and retrieval capabilities, with its standard software. Finally, companies and universities that would not otherwise be able to justify a printed catalog can now implement a PALCAT-related image database as an easy-to-use reference work for educational and training purposes. PALCAT therefore widens the base of end users for the Ellis & Messina Catalog of Foraminifera; it provides an ease of production, distribution and updating not previously possible for both the publisher and user; and it significantly broadens the scope of database activities available to end users for a variety of research applications.

In the past, paleontologists have exchanged photographic information using traditional media - photographs, 35mm slides, or technical publications in scholarly journals - all of which are based on "hardcopy," or paper form. PALCAT offers the entirely new option of electronic dissemination of very large image data sets for scientific purposes, and thereby truly represents the future of data exchange for the science.
However, the fundamental concept of the PALCAT system is not limited to paleontology. Numerous scientific, educational, and commercial applications exist which require a large image database. Potential future applications may include, among others, medical databases (e.g., histology, surgery, dermatology), botanical/zoological collections and identification guides, museum/research collections of cultural artifacts and fine arts, automotive or machine catalogs, and a host of others. PALCAT offers any organization or individual conducting paleontological research the opportunity to create and use image-based databases and to bridge fundamental problems related to the standardization of image file formats. There are many digital file formats currently available for the digital storage of image data. This variety of digital formats creates difficulty regarding both standardization of one or more formats for use by an organization, and conversion of one digital file format to another for data exchange. PALCAT offers a global solution to the situation in two ways: (1) PALCAT will operate on any PC-based personal computer system anywhere in the world, and (2) the stored video images represent a universal standard because PALCAT is a "closed" system utilizing NTSC video images that can be recorded, exchanged and played by any paleontologist in the world using the system. Finally, although the system was originally designed in cooperation with and intended for use by a rather limited group of research scientists, PALCAT interfaces can be easily modified for other integrated image databases, and thus has the potential to provide a universal standard hardware and software platform for image databases in a wide variety of research and educational applications. In summary, the PALCAT system offers significant potential as the first integrated and widely-supported platform utilizing a graphical user interface to electronically archive paleontological data. 
It creates more open access to data, resulting in standardized taxonomy and more efficient identification procedures, substantially reduced learning curves for persons unfamiliar with particular fossil groups, and more effective retention of the cumulative knowledge of experienced paleontologists.

Interested individuals should contact the following organizations for additional information concerning various aspects of the PALCAT system and products:

Dr. John van Couvering
American Museum of Natural History
79th Street at Central Park West
New York, NY 10024
(PALCAT program; Ellis and Messina Catalog of Foraminifera)

Mr. Terry Muncey, CEO
Electro Communication Systems, Inc.
2043 Empire Central
Dallas, TX 75235
(PALCAT hardware configurations and image processing services)

Barbara A. Goodman, President
ImageWare
2415 Fairway Drive
Richardson, TX 75080
(Dinoflagellate Image Library; PALCAT image processing services)

[*p.8 / p.9*]

BTA: BINARY TO ASCII CONVERSION PROGRAM

Eric C. Grimm
Illinois State Museum
Research and Collections Center
1920 South 10 1/2 Street
Springfield, IL 62703 USA
Email: grimm@denr1.igis.uiuc.edu

Introduction. BTA converts binary files to ASCII and the converted ASCII files back to binary. The program is useful for converting binary files to ASCII format for transfer by electronic mail. Both receiver and sender must have a copy of the program. The ASCII files, called BTA ASCII, consist entirely of ASCII characters, and can therefore be transferred by electronic mail utilities that understand ASCII. The program is written for IBM PC compatible computers by Eric C. Grimm and is available for free distribution and copying.

How the program works. The program first compresses the binary file and then writes out each byte in the compressed file as a two-character hexadecimal number. Compression will often reduce the original file in size by 50% or more.
The final hexadecimal file will be approximately twice as large as the compressed file, and often about the size of the original file. The hexadecimal or BTA ASCII file consists entirely of ASCII characters. When the program converts BTA ASCII back to the original binary format, it first converts the two-character hexadecimal numbers to 8-bit bytes and then explodes the compressed file back to the original format.

[ Figures 1 and 2 on p. 9 not reproduced. ] [*p.9 / p.10*] [ Figures 3 through 5 on p. 10 not reproduced. ]

When converting the binary file to BTA ASCII, the program asks for a maximum file size. If the file is larger than this size, the program will write multiple files. The default maximum file size is 64 kb. The ASCII file will have the same root name as the original file, but the file extension will be a sequential number. For example, if the file PROG.EXE is converted to BTA ASCII, the file name would be PROG.001. If PROG.001 exceeds the maximum file size, file PROG.002 would be written, and so on. The first file in the sequence contains the original file name.

Usage: BTA [FILENAME] [TOASC[II]|TOBIN[ARY]]. The filename and TOASC or TOBIN arguments are optional. See Fig. 1. If you are converting from binary to BTA ASCII, type the complete filename, including the extension. If converting from BTA ASCII to binary, you do not need to type the extension (.001 is assumed). If more than one file exists (*.002, *.003,...), the sequential [*p.10 / p.11*] files will be read in their correct order. If you do not enter a file on the command line, the program will ask for the file name:

File to convert? (Press Enter for list)

If you press the Enter key with no filename or enter a wildcard (e.g. *.exe), the program will list the files in the current directory. See Fig. 2 and 3. If you are converting from binary to BTA ASCII, you will be asked the maximum size of the output file in kb. (Ed: Some systems limit the size of email letters.)
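The compress-then-hexadecimal scheme and the sequential file names described above can be sketched in a few lines of Python. This is only an illustration of the idea, not BTA itself: the article does not say which compression method BTA uses, so zlib stands in for it here, and the function names and the max_chars parameter are hypothetical. The sketch also leaves out the headers, delimiters, and check sum of the real BTA ASCII files.

```python
import zlib

def to_bta_hex(data):
    """Compress the bytes, then write each compressed byte as two hex characters."""
    compressed = zlib.compress(data)  # stand-in: BTA's own compressor is not named
    return compressed.hex().upper()   # exactly two characters per compressed byte

def split_names(filename, hex_text, max_chars):
    """Cut the hex text into pieces of at most max_chars characters, naming each
    piece with the root of the original file name plus a sequential extension."""
    root = filename.rsplit(".", 1)[0]
    pieces = [hex_text[i:i + max_chars] for i in range(0, len(hex_text), max_chars)]
    return {"%s.%03d" % (root, n): piece for n, piece in enumerate(pieces, start=1)}

def from_bta_hex(parts):
    """Rejoin the pieces in extension order, turn hex pairs into bytes, decompress."""
    hex_text = "".join(parts[name] for name in sorted(parts))
    return zlib.decompress(bytes.fromhex(hex_text))
```

Called as split_names("PROG.EXE", to_bta_hex(data), 64 * 1024), the sketch would name its pieces PROG.001, PROG.002, and so on, in the manner of the example in the text. Because every compressed byte becomes two characters, the hexadecimal text is always twice the size of the compressed data, which is why a file that compresses by about half ends up near its original size.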
The program reads the BTA ASCII file until it finds the starting point designated by a pair of curly braces "{{". Thus, introductory headers in the files do not need to be deleted before conversion (assuming a pair of curly braces does not occur in the header).

Error checking. When BTA writes a BTA ASCII file, it sums the bytes in the compressed file and writes this check sum at the end of the file. The numerical value of each byte is summed, not the number of bytes. When BTA reads a BTA ASCII file, it first checks that each ASCII character is a valid hexadecimal digit (0-9, A-F); it then converts the 2-character hexadecimal numbers to 8-bit bytes, which it adds for comparison with the check sum, thereby detecting virtually any error.

BTA ASCII file format. As shown in Fig. 4, the start of the first BTA ASCII file in a sequence is denoted by two curly braces "{{", followed by the name of the original file in quotes, followed by the hexadecimal codes. The end of the file is indicated by two closing curly braces "}}". If another file follows in the sequence (Fig. 5), the end of the file is indicated by two closing parentheses "))". The beginning of sequential files is indicated by two opening parentheses "((". Carriage return and line feed characters are inserted every 78 characters. Thus, the file consists of multiple lines 78 characters long. The check sum is written in brackets "[ ]" after the closing braces or parentheses of each file.

How to get the program. Send a formatted diskette (5.25" or 3.5") and self-addressed diskette mailer to: Dr. Eric C. Grimm, Illinois State Museum, Research and Collections Center, 1920 South 10 1/2 Street, Springfield, IL 62703, USA.

PSIMPOLL - A QuickBASIC PROGRAM THAT GENERATES PostScript PAGE DESCRIPTION FILES OF POLLEN DIAGRAMS

K. D. Bennett
Sub-department of Quaternary Research
Department of Plant Sciences
University of Cambridge
Downing Street
Cambridge CB2 3EA
United Kingdom
Email: kdb2@uk.ac.cam.phx

Most pollen analysts today have access to computer facilities for carrying out, at least, basic calculations on their raw data and plotting pollen diagrams. Traditional requirements for the presentation of pollen data in graphical form make it difficult to use commercial graphics packages for the purpose, and most pollen analysts use software written specially for this use. For example, in the 1970s John Birks and Brian Huntley developed POLLDATA for calculation and graphical presentation of pollen data on the Cambridge IBM mainframe. This program was successfully transferred to other mainframes, and now has a PC version. More recently, Eric Grimm's TILIA and TILIAGRAPH, written for PCs, have come into general use in many pollen labs around the world. I have seen and used several such programs (all my Ph.D. thesis work was done with POLLDATA), but over the last few years all my pollen-data crunching and plotting has been done with software that I have written myself: currently I am using a QuickBASIC program called PSIMPOLL.

The aims of this note are to outline the way in which PSIMPOLL works, to present the reasons for continuing to use my own software rather than more generally available software, and (most importantly) to act as a reminder that it is possible to write and maintain working programs that suit individuals. Such programs may lack sophistication, but they also lack constraints that are inevitably imposed when using a package that someone else devised and planned.

Commercial packages cannot be used easily to plot pollen diagrams from raw data, but spreadsheets are well-suited to making the calculations.
There is little to be gained from writing software for this aspect of pollen data-handling, and I do all my basic calculations on a spreadsheet, exporting the results in the [*p.11 / p.12*] form of an ASCII file. This may then be read by a plotting program directly, or after modification by a suitable text editor.

PSIMPOLL provides the first of two steps in plotting results: it reads in calculated pollen data, annotated to indicate data type (percentages, concentrations, etc.), and writes as output a PostScript page description file, containing the information needed for a PostScript interpreter to produce the pollen diagram. The second step is the passing of that file to the interpreter and producing the diagram (see Bennett, 1992).

PSIMPOLL reads data from several files: up to four supplied by the user, and up to three that include blocks of text that will form part of the eventual output file. All these files (input and output) are ASCII: readable and modifiable by text editors. The user input files consist of the main data file (essential), and up to three optional files with data on pollen zones, radiocarbon dates, and sediment stratigraphy (using the Troels-Smith notation). The names of the optional files are related to the name of the main data file: PSIMPOLL looks for them, uses them if they are present, and carries on without if they are not.

The format of output can be altered by effects introduced within the main data file, and by selecting options from a menu when PSIMPOLL is run. Within the main data file, a two-character code identifies the data as pollen percentages, concentrations (by volume or weight), accumulation rates, non-pollen data (e.g. magnetic susceptibility data, loss-on-ignition, sediment chemistry, or anything else), or macrofossils. Data may be presented by depth, age, or sample numbers (e.g. for a surface sample dataset). Particular levels, pollen types, or individual data values can be omitted by suitable editing of the dataset.
It is also possible to plot different pollen types at different scales, mark certain "types" as charcoal, rarefaction data, rate-of-change data, or to enable PSIMPOLL to recognize a sequence of data for a summary diagram.

PSIMPOLL will either run a dataset immediately using default options, or the user can change the defaults through a menu. These currently include scale factors for plot width and height, style (curves outline or shaded solid), font and font size for text, angle of taxa names, labels and subtitles, selecting taxa for plotting interactively, and joining up curves across missing values (or leaving a gap). The PostScript output is written to a file by default, but can be directed straight to a printer with a PostScript interpreter. Output includes sediment stratigraphy, pollen zones, and radiocarbon dates if the necessary data files were present when PSIMPOLL was run. Example output appeared with my note on PostScript in Newsletter 7.

PSIMPOLL is thus a straightforward plotting program. It works reasonably well, and has most features that a pollen analyst will want. It exists because I have found that it suits me to have available a program that I can tinker with: adding, modifying, and deleting features as necessary. I do not have the time to ensure that the program is bug-free, user-friendly, or documented to the standard that would be required for a marketable product, or even one given away free. I fix problems as they occur, or when a colleague complains loudly enough. It copes with the kind of data generated in our group, but I have made no effort to include features that might be found useful by others. Eventually, accumulated changes will turn the program into enough of a mess that I shall, without compunction and with no feeling of responsibility to users outside our group, retire it, and replace it with something better, as I have done with three predecessors of PSIMPOLL.
As well as PSIMPOLL, I have written zonation and pollen rarefaction programs which run from datasets organized in the same way. If anyone out there wants to try PSIMPOLL, even after reading all this, I will be happy to send them a copy, free, with example data. But I do not guarantee any subsequent service!

The newsletter coordinator told me he believes it is useful for the readers to hear about the kinds of software that other individuals and groups are using. After all, that is a purpose of the newsletter; another's solution may help solve a problem of your own. But if you find that the software that you have bought, begged, stolen, or borrowed is limiting, then write your own. Avoid the temptation to bring your own programs up to commercial standards, and you should find that you can quickly produce something that fills your own needs. I have never regretted it.

Bennett, K.D. (1992) Use of PostScript to increase portability of pollen diagrams. INQUA - Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 7: 6-7.

[*p.12 / p.13*]

STATISTICAL MODELLING OF QUATERNARY SCIENCE DATA: A PRACTICAL MANUAL

D. Maddy and J.S. Brew (editors)
Department of Geography
Southampton University
Southampton SO9 5NH
United Kingdom
Email: D.Maddy@MAIL.SOTON.AC.UK J.Brew@VAX.RHBNC.AC.UK

We are in the process of putting together a technical guide with the above title for the Quaternary Research Association (UK) with a provisional schedule as shown below. As you will see, a number of techniques we would like to include have yet to be assigned to an author. We would be delighted if any members of the working group feel they could help us by offering to write a section on any of these remaining techniques. We would also be happy to hear from anyone who feels that additional techniques should be included (although you should be prepared to write that chapter!).
The book is not intended to give a complete and thorough discussion of each technique but rather to guide the reader toward the appropriate technical literature, the uses of the technique, and the availability of computer routines to undertake the analysis. Although technical descriptions will be necessary, the reader should be able to skirt around the equations and yet still 'understand' the underlying rationale.

We anticipate that the deadline for initial drafts will be end May 1993. All papers will be refereed and then returned for alterations. The final deadline will be end July 1993. We will then prepare camera ready copy before the end of August 1993.

Provisional Schedule:

INTRODUCTION: J.S. Brew (University of London) and D. Maddy (Southampton University)
    Properties of Quaternary data sets
    What questions are of interest
    Outline of a statistical model
    What techniques are available
    Methods Decision Tree

PART ONE : MULTIVARIATE DATA ANALYSIS
    Ordination techniques: W.L. Kovach (University of Wales)
    Classification : (Cluster analysis, TWINSPAN etc.)
    Discriminant analysis:
    Application: Factor Analysis : J. Walden (Oxford University)

PART TWO: TIME SERIES AND SPATIAL ANALYSIS
    Spatially and temporally constrained classification and ordination :
    Time Series Methods : (ARIMA etc.)
    Spectral Analysis :
    Spatial Analysis : (Trend surface analysis, spatial autocorrelation, kriging etc.)
    Application: Tree ring sequence matching : M. Bridge

PART THREE: GENERALISED LINEAR MODELLING: J.S. Brew and D. Maddy
    General Principles
    Regression - error and link function specification
    Logistic regression for proportions
    Log-linear models for counts
    Regression diagnostics and model selection
    Application: Poisson clustering

PART FOUR: ENVIRONMENTAL PROXY-DATA ANALYSIS : H.J.B. Birks
    Regression and calibration - weighted averaging, maximum likelihood, partial least squares
    Error estimation (jackknifing, bootstrapping)

PART FIVE: COMPUTER SOFTWARE

We hope to hear from some of you out there SOON!

[*p.13 / p.14*]

PRELIMINARY RESULTS OF DATA FORMAT SURVEY

Owen K. Davis
Department of Geosciences
University of Arizona
Tucson, AZ 85721
Email: palynolo@ccit.arizona.edu

We have received 15 responses to the data questionnaire printed in Newsletter 7 (January 1992). That is just 20% of the number of email addresses listed in the back of the Newsletter, so you should consider this a preliminary report. IF YOU HAVE NOT RETURNED THE QUESTIONNAIRE, PLEASE DO SO NOW.

These preliminary results include some surprises for me, and some lessons regarding questionnaires. Under the "lessons" category, I have learned to expect the unexpected: in how many different ways can one question be interpreted? The first question was intended to determine if anyone was NOT storing their data on hard disks, floppy disks, or tape. However, EVERYONE answered the question, and the results may indicate that 2/3 of the responders enter their counts directly into the computer, whereas 1/3 mark the counts on a piece of paper, and then enter the final counts on the computer (see Table). The 2/3 who count on the computer (pun intended) store a printout of the data as a hard copy. The implication is that a Newsletter article on microfossil counting programs would be welcome -- who uses them and how many different programs are there (hint, hint, editor). For example, John Kingston mentions that one of his students is using a BASIC program written by Paul B. Hamilton (Hydrobiologia, 1990, 194:23-30).

The modal responder uses an upper-end (80386) IBM-Compatible computer with a math co-processor, and has more than one computer. Surprisingly (to me) only one responder has a Macintosh. Those who use mainframes indicated they primarily use them for email.
Another surprise is that 1/2 the responders cannot read 5 1/4 Double-Density diskettes! I had thought 360 K diskettes were the universal medium of the computer industry, but more use 3 1/2 High Density diskettes than any other kind (see Table).

Another interesting result - but no surprise - is that most of the responders (3/4) store data in ASCII format rather than in binary. Both hard and floppy disks have become so cheap that the convenience of being able to easily view and edit the file outweighs the size savings of the binary format. Some of us probably use data-compression programs like PKARC for archiving and backup.

The diversity of ASCII formats also is surprising. The formats I consider "standards" are rarely used; and there are more "attribute" (levels ordered by type) formats than I knew of. Overall, most responders are storing data in an ASCII or binary format written by commercially-available software such as LOTUS or PARADOX. Otherwise, the "condensed" format (code, count) leads by a slim margin over "matrix" and "attribute." Based on responder comments, this is due to its general applicability to numerical analysis software, rather than its size savings.

Still under the heading of the "condensed" format, both John Birks and Steve Juggins pointed out that I had confused CAMBRIDGE with CORNELL in both the article (Newsletter 7, p.9) and questionnaire (the file formats, not the schools). I have excerpted some of HJBB's comments at the end of this article.

The preference for commercial software is even more evident when it comes to manipulation, plotting, and numerical analyses. For manipulation and plotting, Eric Grimm's TILIA clearly is the favorite. Most of the responders have it and use it. Several responders commented that they had adopted the European - North American format of PARADOX (Newsletter 7, p.1).
Although TILIAGRAPH is the favorite, the diversity of plotting programs is remarkable; eleven of the responders use programs not mentioned by the others.

Remember that this is the Newsletter of the DATA-HANDLING COMMITTEE? Based on the breadth of techniques in the numerical methods portion of the questionnaire (see Table), the responders might better be categorized as DATA CRUNCHERS. Alternatively, these "power users" may be more likely to return questionnaires. Nearly every responder uses several different techniques. Although some programs are widely used - for example, 40% use CONISS and CANOCO - half of the programs listed are used by only one responder.

[*p.14 / p.15*]

My primary goal in writing the questionnaire was to determine the most common format in use by members of the Data-Handling Committee. As I suspected, the answer is that "it is human nature to do things differently." Rather than suggest a common format, I suggest that members be prepared to translate among the various formats, and that software writers make the data-entry portions of the programs as flexible as possible. One possibility for standardization that I foresee is the emerging European - North American database format in PARADOX. I suggest to the members of these committees (Newsletter 7, p.1) that it is to their advantage to make that format available to the palynological community as soon as possible.

-- COMMENTS BY H.J.B. BIRKS --

The Cornell condensed format first came into existence with Mark Hill's TWINSPAN and DECORANA programs in 1979. The Cornell Ecology Programs, through Hugh Gauch, used this format widely (e.g. DECORANA, TWINSPAN, GRADBETA, COMPCLUS, DATAEDIT) and they distributed a program called CONDENSE to help prepare condensed files. Since then Cajo ter Braak in Wageningen has developed CANOCO that also uses Cornell condensed format, as do several other recent multivariate programs from The Netherlands (e.g. FLEXCLUS, DISCRIM, MILTRANS).
Because we use these programs so much with the same data sets as in our own programs, John Line and I use Cornell condensed format for all our recent programs like WACALIB, ANALOG, RATEPOL, RSURF, etc. Onno van Tongeren in The Netherlands has developed his CEDIT program to edit, manipulate, append, merge, convert, create, transform, summarize, etc. condensed files so one can easily change them. There is thus no Cambridge condensed format but the Cornell condensed format. I hope Cornell will not be upset by being called Cambridge!

TABLE I. (15 responses as of June 11, 1992)

HARD COPY:
    Paper count sheets (10)
    Paper print-outs (5)

COMPUTER TYPES:
    IBM-Compatible (undif - 2): 8088 (3), 80286 (7), 80287 (4), 80386 (9), 80387 (7), 80486 (6)
    Macintosh (1)
    Mainframe (primarily for email): VAX (6), SUN (2), Other (3)

MEDIA TYPES FOR EXCHANGE:
    Floppy disk: 8 inch (2), 5 1/4 DD (7), 5 1/4 HD (8), 3 1/2 DD (10), 3 1/2 HD (12)
    Magnetic tape: (1)

FILE FORMAT FOR DATA STORAGE:
    ASCII: (29)
        Attribute: (7)  BIRKS (2), Other (5)
        Matrix: (10)    ASCII (4), LOTUS (3), TILIA (2), Other (1)
        Condensed: (12) CORNELL (7), MINNESOTA (3), Other (2)
    Binary: (11)  TILIA (4), PARADOX (2), Other (5)

SOFTWARE:
    Data manipulation: (31)  dBASE (2), EXCEL (2), LOTUS (3), PARADOX (4), QUATTRO (5), TILIA (8), Other (7)
    Plotting: (27)  GRAPHER/SURFER (3), POLPROF (2), SYGRAPH (2), TILIAGRAPH (9), Other (11)
    Numerical methods: (34)  CANOCO (6), CONISS (6), MVSP (2), NTSYS (2), SYSTAT (2), WACALIB (4), ZONATION (2), ZONE (3), Other (17)

[*p.15 / p.16*]

NEW BOOKSHELF 5

H.J.B. Birks
Email: birks@cc.uib.no

The following recently published books may be of interest to readers of this Newsletter.

L.S. Aiken & S.G. West 1991 Multiple regression: testing and interpreting interactions. Sage, Newbury Park. 212 pp.

D.L. Bruton & D.A.T. Harper 1990 Microcomputers in palaeontology. Contributions from the Palaeontological Museum, University of Oslo 370, 105 pp. (Available from Paleontologisk Museum, Sars gate 1, N-0562 OSLO 5, Norway)

J.S. Cramer 1991 The LOGIT model: an introduction for economists. Edward Arnold, London. 110 pp.

E. Feoli & L. Orloci (Eds.) 1991 Computer assisted vegetation analysis. Kluwer, Dordrecht. 498 pp.

B.S. Everitt & G. Dunn 1991 Applied multivariate data analysis. Edward Arnold, London. 304 pp. (Paperback).

P. Firth & S.G. Fisher (Eds.) 1992 Global climate change and freshwater ecosystems. Springer-Verlag, New York. 321 pp.

B. Frenzel (Ed.) 1991 Evaluation of climate proxy data in relation to the European Holocene. Gustav Fischer Verlag, Stuttgart. 309 pp. (Paperback).

D.R. Harris & K.D. Thomas (Eds.) 1991 Modelling ecological change. Institute of Archaeology, University College London. 102 pp. (Paperback).

J.E. Jackson 1991 A User's Guide to Principal Components. J. Wiley & Sons, New York. 569 pp.

M. Jambu 1991 Exploratory and multivariate data analysis. Academic Press, Boston. 474 pp.

J.N.R. Jeffers (Ed.) 1991 Microcomputers in environmental biology. Parthenon Publishing, Carnforth. 344 pp.

K.-H. Jöckel, G. Rothe & W. Sendler (Eds.) 1992 Bootstrapping and related techniques. Springer-Verlag, Berlin. 245 pp. (Paperback).

P.M. Mather 1991 Computer applications in geography. J. Wiley & Sons, Chichester. 257 pp. (Paperback).

G. McLachlan 1992 Discriminant analysis and statistical pattern recognition. J. Wiley & Sons, Chichester. 528 pp.

M. Monmonier 1991 How to lie with maps. University of Chicago Press. 176 pp. (Paperback).

P.L. Nimis & T.J. Crovello (Eds.) 1991 Quantitative approaches to phytogeography. Kluwer, Dordrecht. 280 pp.

R.J. Pankhurst 1991 Practical taxonomic computing. Cambridge University Press, Cambridge. 202 pp.

P.G. Risser (Ed.) 1991 Long-term ecological research - an international perspective. John Wiley & Sons, Chichester. 294 pp.

D.J. Saville & G.R. Wood 1991 Statistical methods: the geometric approach. Springer-Verlag, New York. 560 pp.

L.C.K. Shane & E.J. Cushing (Eds.) 1991 Quaternary Landscapes. University of Minnesota Press, Minneapolis. 229 pp.
Readers may be interested in the comparative review of 8 statistical software packages for PCs by Aaron M. Ellison (1992, Bulletin of the Ecological Society [*p.16 / p.17*] of America 73: 74-87). SYSTAT, Stata, CRUNCH, CSS, Minitab, SPSS/PC, S-PLUS, and Gauss are extensively reviewed by Ellison. SYSTAT and S-PLUS both emerged particularly well from the review.

THE ULTIMATE PALYNOLOGICAL DATABASE

Louis Maher

I received an email message recently from Matt McGlone, an old friend with the New Zealand DSIR. His group had just gotten on Internet, and he was looking for a commercial source of the "water-clear silicone fluids" so much used as a mounting medium for pollen grains. In passing along an address, I mentioned that my silicone-mounted reference slides were deteriorating, and some grains were now nothing but amorphous pink spheroidal "oil drops." Seven years ago colleagues kindly contributed demonstration slides on short notice so that I could teach my fall pollen class. Some of these specimens are already losing their wall structure, and I have gone back to using glycerin jelly. Matt expressed concern, observing that reference slides were the ultimate database in palynology and suggesting the Working Group on Data-Handling Methods would be a logical group to query about the subject.

A number of years ago Ed Cushing (Univ. Minnesota) had traced the problem to the slide sealant. But the problem slides I have had come from different labs, and the coverslips had different kinds of seal. Are the palynologist readers having trouble with pollen preservation in silicone? What have you noted, and what have you done about it? Let me know by email or letter, and I will summarize the opinions in the next Newsletter.

THE DATA-HANDLING INTERNET

Steve Juggins, London, UK
Warren Kovach, Aberystwyth, UK
Louis Maher, Madison, WI USA

Part 1. Louis Maher

Several years ago the department's computer person told me about File Transfer Protocol (FTP) on the Internet.
He put me in front of a terminal and proceeded to show me how to log on to strange computer systems scattered around the whole country. These places had unintelligible names, and they let one log on by using the name anonymous. Although they requested a password, you could type in anything (the proper response is your email address), and you would be let in. I recall looking at huge stores of public domain (often government purchased) software for all kinds of computers, and there were treasure-troves of free data that could be copied to our computer in a few seconds. I could not help imagining we were disembodied spirits moving about in the musty cellars of old Victorian buildings where one locked trunk after another could be thrown open to view.

I came away from the experience with three distinct impressions: 1) I was dead tired, 2) I recognized that people who could do this easily had a marked advantage over those who could not, and 3) I had an inkling of how much disk space and time I would need even to begin sampling the treasure. There was so much stuff that simply browsing through titles could kill a career. I felt old. I still do!

After Newsletter 7 was mailed, Steve Juggins suggested that we do a column on "Anonymous FTP," discuss its benefits, and even look about for a "basement room" where we could establish a program boutique that might serve the needs of the Newsletter's readers. It seemed like a good idea, and I asked Steve Juggins and Warren Kovach to contribute to a joint article. I planned to meld our material into a primer that would summarize the technique. And there is the rub. Just as Owen Davis (p. 14, this issue) found that we all use different computer programs and file-storage techniques, almost none of us uses exactly the same communication programs or the same routes to the electronic networks, and we all vary in our experience and our needs.
Therefore I decided to keep the articles separate and suggest that you scan them all to pick up things useful to you.

The Internet generally links mainframe computer installations which are maintained by universities, businesses, and governments, and these organizations invest a lot of time and money maintaining the services. We individuals tie into the network by linking our terminals and microcomputers to the local host computer. That may be by a very fast communications [*p.17 / p.18*] link (Ethernet), or by a telephone modem from a home computer.

[Note: When referring to computer path names and commands, the unbroken strings often become too long to fit in the column structure of the newsletter. When a string reaches the right margin and ends with a "trailing dash_ ", it means the unbroken string continues on the line below.]

Ben Abernathy, our computer person, has allowed me to set up a directory in our department computer geology.wisc.edu that anyone with access to Internet can reach by Anonymous FTP. In the directory /pub/inqua, I have put self-extracting zipped (compressed) files of my computer programs. These include SLOTSEE, POLFILE, PLOTSITE, DEP-AGE, and others, as well as example data files. The sets are designed for different IBM graphics screens and are named accordingly: POL-EGA.EXE, POL-VGA.EXE, POL-HERC.EXE, etc. From this directory you can get the pollen counter program discussed on p. 24 in this issue (POLCNTPK.EXE), and you can also get a zipped copy of PAL, the Polish Database described by Ralska-Jasiewiczowa and Walanus (1991). Warren Kovach (1990) put his MVSP statistical package there, as did Keith Bennett his PSimpoll program, which he describes on p. 11 of this issue. Eric Grimm announces his Binary-to-Ascii utility on p. 9 of this issue, and I have taken the liberty of putting a copy of BTA.EXE there too.
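For readers who would rather script the transfer than type it, the anonymous login and the "get" of a file from the archive directory can be sketched with Python's standard ftplib module. This is a sketch under stated assumptions, not a tested recipe for this particular archive: the function names and the example email address are hypothetical, and the host, directory, and file name in the comment are simply the ones given in the text above.

```python
from ftplib import FTP

def fetch_binary(ftp, directory, filename, local_path):
    """Change to `directory` on the remote machine and get one file in binary mode."""
    ftp.cwd(directory)
    with open(local_path, "wb") as out:
        # RETR is the protocol command that lies behind the interactive "get"
        ftp.retrbinary("RETR " + filename, out.write)

def fetch_from_archive(host, email, directory, filename, local_path):
    """Log on as 'anonymous', giving an email address as the password, and fetch a file."""
    with FTP(host) as ftp:
        ftp.login(user="anonymous", passwd=email)
        fetch_binary(ftp, directory, filename, local_path)

# Hypothetical call, using the names from the article:
# fetch_from_archive("geology.wisc.edu", "you@example.edu",
#                    "/pub/inqua", "BTA.EXE", "BTA.EXE")
```

Binary mode matters here because the archive holds .EXE files; a text-mode transfer may alter line-ending bytes and so damage a program file.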
I hope that some of you will practice Anonymous FTP in order to get copies of these programs and that you will put some of your own there as well. We can make this an archive site that specializes in free programs useful to the Holocene Data-Handling Working Group. All those bold-face puts and gets are reminders that Anonymous FTP uses these commands to move data. Your perspective of the world is from the computer where you originate the FTP command; you get their data from the foreign computer, and you put your data there. Steve Juggins will discuss networks, introduce some of the terminology, and describe how he uses anonymous FTP to contact geology.wisc.edu from England. [ Figures 1 and 2 on p. 18 not reproduced. ] [*p.18 / p.19*] Then Warren Kovach will provide additional details about the network from the European perspective. I will comment on Telnet from a North American perspective, and then I will mention a very useful 100-page book of hints about the Internet and suggest several ways you can get a free copy.

Part 2. Steve Juggins

A network is a configuration of computers that exchange information, such as a local area network (LAN) or a wide area network (WAN). Computers in a network may come from a variety of manufacturers, and have major differences in their hardware and software. To enable different types of computers to communicate, a set of formal rules for interaction, or protocols, is needed. The most widely used of these is the Transmission Control Protocol/Internet Protocol Suite, commonly known as TCP/IP. TCP/IP was developed by the Defense Advanced Research Projects Agency (DARPA) for its own wide area network ARPANET. The term "Internet" is commonly used to refer to both the protocol suite and the larger DARPA network, which connects many individual TCP/IP campus, state, regional, and national networks into one single logical network. There is no one individual network known as The Internet.
Rather it is a network of networks in which communication takes place at blazing speeds. Like a living thing, it grows, interconnects and evolves. Parts die off as well; the old ARPANET no longer exists as a singular entity. The Internet Protocol Suite specifies a set of services including, among other things, email (Simple Mail Transfer Protocol, SMTP), remote terminal emulation (Telnet), and file transfer (File Transfer Protocol, FTP). Here we concentrate on file transfer using the FTP command. The procedure is often run on computers using unix, which is case-sensitive and employs lower case for most commands; we will too. The ftp command implements the File Transfer Protocol, and permits the copying of files to and from remote machines running different operating systems. To use ftp, you first open a connection with the remote machine by giving the command:

    ftp address

where address is the Internet (IP) address of the remote machine. The Internet address is a series of four 8-bit fields separated by periods, and should be unique for every machine on Internet. For example, the University College London Department of Geography's VAX has the address 128.40.32.128. Instead of using the numeric IP address, machines may also be referred to by their host-name. For example, the VAX computer referred to above uses vax.geog.ucl.ac.uk for its host-name. On the larger systems special commands can cause a name-server to look up the Internet address for a given host-name. But this will not work for my PC which is connected directly to the Internet, and an entry for the host name will have to exist in the hosts file used by the networking software on the PC. If you know the host-name of a machine you want to connect to, but not the numeric IP address, and wish to connect to it from your PC, then you may use the host command found on some unix machines which will return the IP address for the specified host-name, which can then be used from the PC. [Ed.
Some systems use the command whois host-name or gethost host-name for this purpose; you may have to consult your resident guru to see what you should use.] So to connect to our departmental VAX to transfer files I would use the command ftp 128.40.32.128 (or ftp vax - because I have the IP address for the host vax in my local hosts file). Once connected, ftp will ask for your username and password on the remote machine. The commands you will need are a mixture of unix and DOS; some of the common terms are shown in Figure 1. Once logged on, you may change to the appropriate directory using cd. Typing help gives a list of ftp commands like those shown in Figure 2. The command ls will display a directory listing; ls -l (ls with the long-form switch) gives file sizes, dates, etc. Ftp transfers files in two modes, binary or ascii. Type binary or ascii to switch between them. (The default generally is ascii; when you specify one or the other, it will remain in effect for the session unless you change it again. You must use binary to transfer an exact image of a compiled program; if in doubt, specify binary.) The command get copies a file from the remote to your local machine. [*p.19 / p.20*] For example, get datafile copies the file datafile from the remote to your local machine, giving it the same name on the local machine. get datafile newfile copies the same file, renaming it to newfile on your local machine. The command put datafile copies a file from the local to the remote machine, and is used in the same way. The commands mget and mput copy multiple files and allow wildcard symbols. To use ftp in this way you need to be able to logon the remote machine. If it is someone else's they would have to tell you their userid (logon name) and password. But a much more secure and commonly used way of sharing files is by means of anonymous ftp.
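For readers who would rather script a transfer than type it, the sequence described above (open a connection, give a userid and password, cd, binary, get, quit) can be sketched in Python with its standard ftplib module. This is only an illustration and not something the newsletter's authors used; the function names fetch_binary and fetch_session are invented here, and the host, userid, and file names in the usage note are placeholders.

```python
from ftplib import FTP


def fetch_binary(ftp, directory, remote_name, local_name=None):
    """Copy one remote file to the local machine, like cd + binary + get."""
    local_name = local_name or remote_name    # "get datafile" keeps the name
    ftp.cwd(directory)                         # cd directory
    with open(local_name, "wb") as out:
        # retrbinary moves an exact image of the file (binary mode)
        ftp.retrbinary("RETR " + remote_name, out.write)
    return local_name


def fetch_session(host, userid, password, directory, remote_name):
    """Open a connection, log on, fetch one file, and quit."""
    ftp = FTP(host)                            # ftp address
    ftp.login(userid, password)                # userid and password prompts
    try:
        return fetch_binary(ftp, directory, remote_name)
    finally:
        ftp.quit()                             # quit ends the session
```

A call such as fetch_session("vax.geog.ucl.ac.uk", "myname", "mypassword", "data", "datafile") would correspond to the get datafile example, assuming such an account and directory existed.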
With anonymous ftp, the administrator for a particular remote machine will have configured the system to accept a generic word as the logon name for ftp with any string as a valid password. Usually the system will ask users to supply anonymous or ident as the userid, and their email address as the password. An enormous amount of public domain software is available on Internet and distributed in this way. Lou's pollen programs are also available via anonymous ftp. This is how I got them:

1. Made a connection to Lou's computer: ftp geology.wisc.edu (or ftp 128.104.139.14 if from my PC).
2. Gave anonymous as the userid, and my email address as password.
3. The appropriate files are in the directory /pub/inqua, so change to this directory with the command cd pub/inqua.
4. Because the files are executable, switch the transfer mode to binary using the command binary.
5. Transfer the files using get: get bta.exe, get pol-ega.exe, etc.
6. The command quit ends the ftp session.

A large amount of public domain software (PC, Mac, UNIX etc) can be found on Simtel (wsmr-simtel20.army.mil - 192.88.110.20).

Part 3. Warren Kovach

FTP From the UK--Using FTP on computers directly linked to the Internet is relatively simple. However, when this is not the case, there are alternative ways to do anonymous file transfers. These range from using intermediate gateway machines to having the files sent via normal email. To give an example, I will explain the two alternative ways I use to access FTP (working from a PC and a mainframe running unix), and then discuss a couple others.

JANET--The British Joint Academic Network is a separate network from the Internet. Some UK universities also have Internet connections, but most do not. To allow all UK users access to the global networks, several gateways between Janet and other networks (such as Internet and Bitnet) have been set up. The University of London provides a "guestftp" service for Janet users.
Because this computer is also connected to the Internet, the normal unix ftp commands can be used to perform anonymous FTP.

GuestFTP--Using the Rainbow terminal and file transfer program (designed for use with Janet by Edinburgh University) on my PC linked to an Ethernet, I simply type call uk.ac.nsf.sun and logon as guestftp with the password guestftp. From here on, I can use FTP as if I were directly connected to the Internet. I type ftp to start a session, open xxx.xxx.xxx to connect to some distant machine, then logon as anonymous with my username as the password. I can then change directories and transfer files with get and put. When I transfer a file from the distant computer, it is placed on the guestftp computer. I must then transfer these files to my own computer. The guestftp computer provides a command that transfers the files to my account on our mainframe here in Aberystwyth. From there, I must transfer it from the mainframe to my PC. With the Rainbow program (which only works with an Ethernet connection) I also can give a command on my PC that will retrieve the files directly from the guestftp machine without having to do an intermediate transfer to the Aber mainframe.

FT-RELAY--This whole process can be cumbersome. The guestftp service is also very popular and during working hours it is difficult to get a connection. An alternative system, called FT-RELAY, has been set up that allows easier access. Using this, a single (albeit lengthy) command can be typed on the local mainframe that will submit an FTP request. The desired file will then automatically be transferred to my mainframe filestore. [*p.20 / p.21*] The FT-RELAY is another gateway computer between the Janet and Internet systems. Requests are submitted to it using the host-to-host-copy command of unix (hhcp). They are placed in a queue, and when a connection can be made to the distant computer the file is transferred.
It can sometimes take many tries and several hours before the file is moved, but most often it is sent straight away. Let us say I want to get Eric Grimm's program bta.exe from the geology.wisc.edu computer. I logon the mainframe here in Aberystwyth and type the command:

    hhcp -L -b uk.ac.ft-relay:"geology.wisc.edu::pub/inqua/bta.exe" bta.exe

This will send a message to the ft-relay machine instructing it to connect to geology.wisc.edu, get the file bta.exe from the named directory, and place it in a file named bta.exe on our local mainframe. I will be asked for a remote username (anonymous) and password (my e-mail address). Since this is a binary transfer (specified by the -b), I also will be asked the binary word size, which is 8. The -L causes the computer to maintain a log of host-to-host file transfers. This mouthful of a command can be simplified. For instance, I can set up an alias for the full name of the ft-relay machine by typing

    hhalias uk.ac.ft-relay ftb

This alias will be stored for future use, and I can then replace the full name in the hhcp command with ftb. I also can use the hhstore command to record permanently the remote username and password, so that I will not be asked for them each time I use the hhcp command. If you regularly transfer files from certain unix computers, you can set up a shell script (similar to a batch file on a MS-DOS PC) that contains the repetitive parts of the above command. If I set up one called getwisc for the geology.wisc.edu machine to retrieve any file from the inqua directory, I could simply type

    getwisc bta.exe

If a site had multiple directories of interest, you also could set up the script so you can specify the subdirectory. The FT-RELAY does not allow you to browse on the remote computer as you can with the normal ftp command.
However, you can request a directory from the remote computer with the command:

    hhcp -L ftb:"geology.wisc.edu::(D)pub/inqua" wisc.dir

The "(D)" specifies that a list of files in the following directory should be sent and placed in the local file called wisc.dir.

Mail Servers--If your mainframe computer does not have a network connection that allows for ftp file transfers, you can still access the wealth of files and data on these anonymous FTP archives. Mail servers have been set up that get files from FTP archives, convert them into ASCII format using a program like uuencode (see Lou Maher's article in issue 6, July 1991, of this newsletter), and then send them to you through normal email channels. You must then use the uudecode program to reconstruct the original file. People on the Bitnet (Because It's Time Network) can use the mailserver at Princeton. Send a message with the single word help to BITFTP@PUCC to get more information on using it. Non-Bitnet sites can use an alternative service set up by Digital computers at ftpmail@decwrl.dec.com. Help on using this service also can be obtained by sending a message of help.

Some Useful FTP Sites--Although there are hundreds of FTP sites, some have much larger archives than others. If you are looking for public domain and shareware software, SIMTEL20 (wsmr-simtel20.army.mil) is one of the best known and largest. It is also quite busy and difficult to logon. The wuarchive.wustl.edu site "mirrors" Simtel20 (that is, it has all the same files and directories) as well as numerous other ftp sites. It also has an extensive collection of mathematical software. This should probably be your first stop if you are in North America. One of the largest archives of Microsoft Windows software is at cica.cica.indiana.edu. [*p.21 / p.22*] On the European side, the Simtel20 files are mirrored on src.doc.ic.ac.uk. Two sites in Finland (garbo.uwasa.fi and nic.funet.fi) also have large and useful archives of PC software.
Although I have not tried it, sol.deakin.edu.au in Australia supposedly mirrors both Simtel20 and Garbo. There are also sites that carry more specialized software and data. Besides the new geology.wisc.edu /pub/inqua archive, the COGS (Computer Oriented Geological Society) software archive is available on the csn.org site. A large collection of data and programs related to taxonomy is on the Taxacom FTP site (huh.harvard.edu). The molecular biologist in your life will be interested in the Indiana University Biology archive (ftp.bio.indiana.edu). A large archive of statistics software (mainly as Fortran source) is on lib.stat.cmu.edu (note that the username for logon should be statlib, not anonymous).

Archie--There are thousands of files available out there through anonymous FTP, but finding the one you want can be difficult. McGill University has developed a system that allows you to search a database of FTP archives from all over the world, looking for files with specific names. This system, named Archie, is now installed on several computers around the world. It can be accessed by logging on directly through telnet, using the logon name archie. Typing help after logging on will tell you how to use the system. It also can be used through e-mail. Send a message with the single word help to the username archie at your nearest archie server. If you are in North America, the main archie machine is archie.mcgill.ca. In Britain, it is archie.doc.ic.ac.uk, while on the European Continent it is archie.funet.fi. Antipodean FTPers can use archie.au in Australia.

Part 4. Louis Maher

Telnet is the main Internet protocol for creating a connection with a remote machine. FTP uses Telnet, and like FTP, the actual command for making a Telnet connection varies with the system.
I have a Telnet directory on my PC which contains a batch of programs (including one called telbin and another called ftpbin) for connecting my PC to the department's networked unix minicomputers by Ethernet. I invested in the Ethernet board for my PC because it was cheaper than buying a bigger hard drive. I now keep bulk storage items on a digital tape unit that looks like a subdirectory of ice, one of the networked Sun computers that comprise geology.wisc.edu. From the PC I can type telnet geology.wisc.edu, logon by name and password, and then read, write, and do all the usual things. From the DOS prompt, I can ftp geology.wisc.edu and logon either as username and password, or as anonymous and email name. Depending on which option I take, I get to different places in the computer. Using my name I get to my own directory; using anonymous I reach the public storage area. But either way, I can only copy files without reading or editing them; it is after all, a File Transfer Protocol. I like my present setup because I can move big files quickly. I can zip up my whole PARADOX directory structure and ftp it to storage on tape in less than a minute. If I logon the department unix computer from home using PROCOMM and a modem, I can type ftp some.remote.place and logon as anonymous. I can transfer a copy of the distant file to the department's computer in a fraction of a second; if it is ascii text, I can by modem (slowly) page through it with an editor. But if I want to send a copy to my home computer by modem, it may take an hour. When obtaining large files from distant points, it is often best to move the files from the source to a temporary file on your host machine. You can then visit the host, that is, "go to work," and take the file to your computer on a floppy disk rather than tying up your modem and phone line for several hours on the last and slowest link.
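To see why the modem is the slowest link, a rough estimate of transfer time helps. The sketch below is only a back-of-envelope calculation: the modem speeds and the one-million-byte file size are assumed for illustration, and the figure of 10 bits per byte assumes an asynchronous link with one start and one stop bit and ignores any protocol overhead or compression.

```python
def transfer_seconds(size_bytes, bits_per_second, bits_per_byte=10):
    """Estimate transfer time over a serial link.

    bits_per_byte=10 assumes 8 data bits plus start and stop bits;
    protocol overhead and retransmissions are ignored.
    """
    return size_bytes * bits_per_byte / bits_per_second


if __name__ == "__main__":
    size = 1_000_000                      # hypothetical file size in bytes
    for bps in (1200, 2400, 9600):        # assumed modem speeds
        minutes = transfer_seconds(size, bps) / 60
        print(f"{bps:>5} bit/s: {minutes:6.1f} minutes")
```

Under these assumptions a million-byte file needs a little over an hour at 2400 bit/s, which is in line with the hour mentioned above.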
Your setup will undoubtedly differ and you will have to talk to someone in your local computer system about the options open to you. Our department's computer guru has been giving a book to our new graduate students which you may find rewarding, both in the reading and in the getting. He obtained the book as a PostScript file via the Internet. It is called Zen and the Art of the Internet; A Beginner's Guide to the Internet, by B. P. Kehoe (1992). It is a good read, and its hundred pages of text can give you much more information about the Internet than we can do in a newsletter. The work is copyrighted, but it can be reproduced freely so long as it is kept whole. It contains a lot of information about networking, ftp, telnet, email, and it has appendices, a glossary, addresses of archives, and much more. [*p.22 / p.23*] Brendan P. Kehoe's address is brendan@cs.widener.edu. I sent a note asking how an international audience might get a copy of Zen. I will reproduce his answer because it can help nearly any reader. Although he mentions a new hard-cover book will be sold in bookstores soon, I suggest you make the effort to get a free copy of the first edition while at the same time practicing on the Internet.

Date: Mon, 1 Jun 1992 10:27:15 -0400
From: Brendan Kehoe
Subject: Re: Zen...

Here are the instructions I usually send out. If you are on BITNET, you will want to send mail to bitftp@pucc with the line help in the body of your message, to get instructions on FTPing it through mail. If you are not on BITNET, but are limited to email, write to ftpmail@decwrl.dec.com with a similar body (i.e. the word help), to receive instructions. In case you are interested, in the next few weeks the second edition of Zen will be coming out as a Prentice-Hall book. You may want to consider looking for it in your local bookstores then, since it will be in a nicely bound format and contains approx. 30 pages of new information, as well as hundreds of updates and revisions.
College bookstores will hopefully be stocking it, since it will be used in introductory Internet classes, and as a tutorial. If you would like to be notified upon its availability as a book, please contact me. If you do not have access to FTP or do not feel comfortable with bitftp and ftpmail, write to info-server@nnsc.nsf.net with the body of your message containing

    request: nsfnet
    topic: zen-1.0.PS
    topic: zen.readme

and it will return the selected files. For FTP, here is what you need to do: first, type

    ftp ftp.cs.widener.edu

and when it gives you the 'Name:' prompt, type

    anonymous

If the name ftp.cs.widener.edu failed, try 147.31.254.132 instead. Then, when you see the prompt Password: just give it your email address; it does not really matter what you type here...using your address is just the tradition. Anyway, you will get in and be left sitting at the prompt. Type

    cd pub/zen

and do dir; you will see the files listed there. If you want the PostScript version, type

    get zen-1.0.PS

If you are on a system that only allows one period in a filename, use get zen-1-0.PS for example. If you want the .DVI (a TeX dvi file) file, type

    binary
    get zen-1.0.dvi

(binary so it does not monkey with anything in the file). The .tar.Z file has the TeX source to the book, as well as the two other files (PS and dvi); to get that, type

    binary
    get zen-1.0.tar.Z

instead. Once the file is finished transferring, type quit to get out. Good luck!

Brendan
---
Brendan Kehoe, Sun Network Manager
brendan@cs.widener.edu
Widener University, Chester, PA
---

I got the PostScript version from Kehoe using:

    ftp ftp.cs.widener.edu
    anonymous
    maher@geology.wisc.edu
    [An opening note said if display screen chokes put '-' in front of password. As my screen was choking, I quit and relogged with -maher@geology.wisc.edu and then ls and cd, etc. worked correctly.]
    cd pub/zen
    get zen-1.0.PS
    transmitted in 107 seconds
    quit

[*p.23 / p.24*] I sent the 499,365-byte file to our PostScript printer and the Times Roman typeset-quality reproduction was finished in less than ten minutes. Later, I tried his non-ftp route by emailing info-server@nnsc.nsf.net with the message containing

    request: nsfnet
    topic: zen-1.0.PS
    topic: zen.readme

Within half an hour I had 13 incoming email letters numbered: 1 of 13, 4 of 13, 2 of 13, etc. I kept the lot by saving the first as "zen" and then appending (in correct numerical order) the other 12 files to it. Afterward, "zen" was read into the unix editor vi, and the 13 email headers were deleted. (The 12-line headers all start with "From nnsc.."; you can find them by searching for that string.) The final file, of course, was just the same as that delivered by ftp. Based on an old network established by the military, the present Internet comes as close as any example I know to forging swords into plowshares. It is growing at a phenomenal rate. Use it!

References

Kehoe, Brendan P. 1992. Zen and the art of the Internet; A beginner's guide to the Internet, 1st ed. January 1992, 96 p. + i-iv.

Kovach, Warren L. 1990. MVSP: A multivariate statistical package. INQUA - Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 4:1-3.

Ralska-Jasiewiczowa, Magdalena, and Walanus, Adam. 1991. Polish palynological database (POLPAL) in course of building. INQUA - Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 5:1-2.

TURN YOUR EXPENSIVE OLD PC INTO A DUMB POLLEN COUNTER
Louis Maher

Back in the late 1970's I recorded pollen counts on a paper-roll adding machine. It was one of the old fashioned models with 72 keys: 8 columns of keys 1 to 9. It devoted the two right columns to cents so the largest number you could enter was 999999.99.
When I saw a pine grain I would ring up a 1000.01, a spruce would get 1000.02, and a sedge 1000.13. You did not have to enter the zero columns, so it was not much work. As you might guess a spruce bladder would be recorded as 500.02. You could get a count total just by touching the subtotal key. The sum of all the taxon identity numbers was gibberish, of course, but by putting an imaginary decimal point three places left of the machine's, you would get the total of all the 1's and 0.5's. This was a low-tech way of keeping a history of the pollen types so that later I could do a statistical study to see whether the different sizes or types were randomly distributed on the slide. After accumulating several miles of paper tape, it will suddenly occur to you that somebody is going to have to go back through it all to get the total for each taxon. At the time my research computer was a "Commodore PET," with a 16K memory (upgraded from its original 8K), and I proudly worked up a program to simulate a bank of 100 counters, any one of which could be incremented by touching a two-key code. I gave the program to anyone who would take it; I do not know whether it was used. Keith Bennett (1990a, 1990b) devised a clever aid to pollen-counting by programming a Psion "Pocket Organiser" to record taxa by a combination of one or two letters. The Psion is battery operated, relatively inexpensive, and its data can be up-loaded to a PC. I use one and enjoy it. The only problem with it is that once a count is started, you cannot do other programming with it without saving the data first. And then it is not easy to put the partial count back in the machine to continue the count. But if you think of the Psion as a dedicated counter, this is no problem; it simply shuts down and "sleeps" when you stop touching the keys. [*p.24 / p.25*] I never bothered to use my IBM PC as a pollen counter; it is in constant use for other things. 
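The chore of going back through the tape can be pictured with a short sketch. It assumes the encoding described above: the two "cents" digits identify the taxon (01 pine, 02 spruce, 13 sedge), and the dollar part is 1000 for a whole grain or 500 for a bladder. The function names and the sample tape are invented for illustration; this is not the PET program.

```python
from collections import defaultdict
from decimal import Decimal


def decode_entry(text):
    """Split one tape entry such as '1000.01' into (taxon code, grain count)."""
    amount = Decimal(text)
    whole = int(amount)                      # 1000 or 500: the count part
    taxon = int((amount - whole) * 100)      # the two 'cents' digits
    return taxon, Decimal(whole) / 1000      # 1000 -> 1 grain, 500 -> 0.5


def tally(entries):
    """Total the grains recorded for each taxon code on the tape."""
    totals = defaultdict(Decimal)
    for text in entries:
        taxon, count = decode_entry(text)
        totals[taxon] += count
    return dict(totals)


if __name__ == "__main__":
    tape = ["1000.01", "1000.02", "500.02", "1000.13", "1000.01"]
    for taxon, count in sorted(tally(tape).items()):
        print(f"taxon {taxon:02d}: {count}")
```

Decimal arithmetic is used so that entries such as 1000.13 split cleanly into their two parts without floating-point rounding.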
But then I noted Pierre Zippi (1992) had a program for turning a Macintosh into a microfossil counter, and I got to thinking that a lot of labs probably have old outmoded DOS PCs around that are being used mostly as furniture. So I reworked my old PET program by borrowing some of Keith Bennett's Psion ideas and added some save and retrieve features. Owen Davis comments in this issue (p. 14) that we ought to deal with counters for microfossils, so this is a first shot. If others let me know what they have developed, I will include them in the next issue.

POLCOUNT sets up 100 counters for 100 different pollen taxa. Think of these counters as labeled 00, 01, 02, ... 99; each is assigned to a pollen taxon. Each time POLCOUNT is run, a *.TAX file is loaded that associates the name of a real object (a pollen taxon, for example) with each of the counters. You can load TAXLIST1.TAX, a dummy taxon list that comes with POLCOUNT, in order to see how the program works. To record a grain of taxon 05 in counter 05, simply type the two numbers 0 and 5 in sequence. You will hear a click, and the actual name of taxon 05 will appear on the screen as confirmation. You do not need to touch the key; the program knows that any two strokes on number keys should increment the named counter by one. You will note that keys other than numbers usually produce an irritating beep. (The key will always end the program if you wish to do so.) The number keys serve to increment the counters. One of the 100 counters you control is reserved for a special purpose: 00 is used for marker grains that you might add to the sample for determining pollen concentration. There is also a counter 100, but you cannot access it directly because 99 is the largest two-digit number you can enter. Counter 100 sums the counts in counters 01 through 99 (it does not include counter 00); you can think of it as keeping a rough record of the pollen sum. POLCOUNT loads its Taxon List each time the program is run.
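The counting scheme just described, in which two digit keystrokes select a counter, counter 00 holds the marker grains, and counter 100 keeps the running pollen sum, can be sketched as follows. This is an illustrative model only and not the QuickBASIC source of POLCOUNT; the class name CounterBank and its methods are invented here.

```python
class CounterBank:
    """101 counters: 00 for markers, 01-99 for taxa, 100 for the pollen sum."""

    def __init__(self, taxon_names):
        self.names = list(taxon_names)     # names for counters 00..99
        self.counts = [0] * 101
        self.pending = None                # first digit, awaiting the second

    def press(self, key):
        """Handle one keystroke; return the taxon name when a code completes."""
        if len(key) != 1 or key not in "0123456789":
            raise ValueError("only the number keys increment counters")
        if self.pending is None:
            self.pending = key             # remember the first digit
            return None
        code = int(self.pending + key)     # two digits -> counter 00..99
        self.pending = None
        self.counts[code] += 1
        if code != 0:                      # marker grains stay out of the sum
            self.counts[100] += 1
        return self.names[code]


if __name__ == "__main__":
    names = ["Marker Grain"] + [f"Taxon {i:02d}" for i in range(1, 100)]
    bank = CounterBank(names)
    for key in "050500":                   # two grains of taxon 05, one marker
        name = bank.press(key)
        if name:
            print(name)
    print("markers:", bank.counts[0], "pollen sum:", bank.counts[100])
```

Keeping the sum in its own slot, rather than recomputing it, mirrors the description of counter 100 as a running record that the analyst cannot increment directly.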
It offers TAXLIST1.TAX as the default name, and you can choose that by merely touching the key. The initial TAXLIST1.TAX file is a dummy sample that allows you to see how the program works. TAXLIST0.TAX is an 'empty' list consisting only of the names: Marker Grain, Taxon 01, Taxon 02, etc. When you decide the taxon order you wish to use, load TAXLIST0.TAX into your word processor and replace the dummy names with the actual taxon names you are going to use. The names can be short or long; they can contain spaces. When you wish to look at the taxon list while running POLCOUNT (see F7, below), only the first 8 letters will show. SAVE YOUR LIST IN ASCII TEXT FORMAT under a different name, but ending with the extension .tax. If you use TAXLIST1.TAX, it will load as the default. Different analysts can use different lists simply by giving them different names (but always with the same 3-letter extension: '.TAX') which can be entered manually from the *.TAX files shown in the directory listing.

The function keys F1 - F10 allow you to correct errors, seek information, edit the counters, and load and save data files. The purpose of the function keys is always displayed at the bottom of the screen while counts are recorded:

F1 Help gives a quick description of the program.

F2 Ignore the first key pressed if the mistake is noted before pressing the second key.

F3 Subtract 1 from the last counter changed; i.e. correct a mistake. If the counter ranged between 01 and 99, then the pollen sum counter would be debited as well.

F4 Subtract 0.5 from the last counter changed. This can be used to record half grains. If Picea were taxon 01, and a single Picea bladder were noted, the key sequence 0, 1, F4 would, in effect, add a half count to counter 01 (and to the pollen sum counter 100).

F5 Fix/Correct a counter sum. This allows you to make substantial corrections. If you have been recording a special unknown grain type and then find out what it is, you can press F5.
The program will ask which counter you wish to change, and it will tell you what that counter now holds. You can then add (+ value) a number to one counter and subtract (- value) it from another, etc.

F6 Displays the contents of counters 00 (markers) and 100 (pollen sum).

[*p.25 / p.26*]

F7 Displays for counters 01 - 99, their two-digit identification numbers, the first 8 letters of their taxon name, and the number of grains recorded. The number of marker grains and the pollen sum are indicated in the screen's title. This allows you to look up the 2-digit Number Code for a particular Taxon. F7 is especially useful for looking up the code numbers of taxa so rarely encountered that they are hard to remember. When you first start using POLCOUNT, it is best to use a key on a sheet of paper. You can make one by putting your printer on line; then run POLCOUNT, touch F7 to display the taxon list, and then press the computer's key.

F8 You can change your Taxon List file as you work although you should use this feature only on rare occasions. You should think carefully about the taxon list before you start a project and use a word processor to edit TaxList0.TAX that comes with POLCOUNT. If you save your file with the default name TaxList1.TAX, it can be loaded very easily when you start POLCOUNT. For many projects, you will not need all 100 counters, and the higher numbered ones can be left as dummy 'Taxon nn'. When you begin to find taxa that you had neglected to put on the taxon list, press F7 to find an unused counter; touch any key to return to the count screen; then touch F8 to assign it a taxon name. When you are finished, the changes will be immediately saved to your *.TAX file on disk.

F9 When you start POLCOUNT, you can recover a disk file from a previous session and go on counting an unfinished slide. When you press F9 you will be shown the files in the directory. Choose the one to load, and continue the counts.

F10 SAVE CURRENT FILE.
This is the normal way to end a POLCOUNT session: Save the file and quit. When the file is saved you are given the option of continuing the count. Using F10 at intervals is a way of making sure your data are mostly saved in case of a power failure. The first time, you will be asked the name of your site and the sample number (usually the depth in cm). For reasons that will be obvious later, make the file names as short as possible. Level 302.5 from 'Blue Lake' could be saved as: BL302.5

POLCOUNT files for the individual samples can be combined into "Wisconsin" format files. POLFILE allows these to be imported into spreadsheets, or PALYPLOT, or TILIA and TILIA·GRAPH. Two POLCOUNT sample files are shown below. POLCOUNT makes an ASCII text file, fashioning a title line from the site's name and the day's date. The second line is always 102, the number of categories in the file: counters 00 through 100, plus the sample's identifying number, usually its depth in centimeters. You will note that all data lines (except the last) consist of ten values, each separated by a space. The sample's depth in centimeters is the first value on the file's third line; the other numbers in the sequence are the sums in counters 00 through 100. The sum in Counter 100 will equal the total of all the previous categories EXCEPT THE FIRST TWO, which are Depth and the number of Markers.

Blue Lake, WI = 01-02-1992
102
.5 250 1.5 0 7 12 1 1 16 5
0 33.5 13 172 5 3 5 10 5 1
1 1 1 2 0 0 2 0 0 0
0 0 0 0 0 24 11 5 153 4
13 0 1 1 4 2 0 1 2 3
0 0 0 0 0 7 5 2 21 38
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 595

Blue Lake, WI = 01-11-1992
102
535 315 30.5 1 117 0 5 23 123 8
44.5 153.5 44 59 10 7 11 4 0 0
0 0 0 1 0 1 0 0 0 0
0 0 0 0 0 6 13 5 3 2
2 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 9 5 1 40
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 729.5

The following is the partial list of taxa in TAXLIST1.TAX.
The file has a title line (here, BlueLake Taxa), followed by the number 101, which is the number of categories to follow.

BlueLake Taxa
101
Marker
Picea
Larix
Fraxinus
...
Taxon 98
Taxon 99
Sum(01-99)

A "Wisconsin" format file has a site title for the first line. The second line records the number of taxon categories in the file, and the third line records the number of samples in the file. The sample counts then follow in sequence from top to bottom in the core, each starting with an identifier such as the sample's depth in centimeters. The 102 taxon categories are found at the end of the file, each on a separate line.

Use a word processor to make a "Wisconsin" format file. Load each POLCOUNT sample in sequence into a single large file and append the TAXLIST1.TAX file at the end. Use the word processor to delete the first two lines of each sample (the title and the number 102). Put a main title at the top, followed by the number 102 on the second line and the total number of separate samples on the third line. From the appended TAXLIST1.TAX list, delete the first line (the title) and the second line (101). Insert the word Depth(cm) on a separate line just after the counts and just before "Marker."

The following is an example of a "Wisconsin" format file made from the example material.

Blue Lake, Wisconsin
102
2
.5 250 1.5 0 7 12 1 1 16 5
0 33.5 13 172 5 3 5 10 5 1
1 1 1 2 0 0 2 0 0 0
0 0 0 0 0 24 11 5 153 4
13 0 1 1 4 2 0 1 2 3
0 0 0 0 0 7 5 2 21 38
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 595
535 315 30.5 1 117 0 5 23 123 8
44.5 153.5 44 59 10 7 11 4 0 0
0 0 0 1 0 1 0 0 0 0
0 0 0 0 0 6 13 5 3 2
2 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 9 5 1 40
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 729.5
Depth(cm)
Marker
Picea
Larix
Fraxinus
...
Taxon 98
Taxon 99
Sum(01-99)

When the "Wisconsin" format file has been created with your word processor, save it as an ASCII text file with the extension .RAW, such as BLUELAKE.RAW. This file will undoubtedly contain some taxa that never received a count in any of the samples. You can remove the taxa with zero counts from the file by using the program REMZEROS. Run REMZEROS and load "BLUELAKE.RAW." You can save the file without the unused taxa either under the same name or under another of your choice. It is good practice not to use the same name; keep the original file intact in case you wish to count additional levels later. The new levels can be put in their correct positions with a word processor and REMZEROS run again to remove any taxa that still have zero counts.

If you would like free copies of POLCOUNT and REMZEROS, I would be pleased to make them available both as compiled programs and as QuickBASIC source code. The source code will work with the QBASIC that comes with DOS 5.0, and will allow you to modify the programs to fit your own needs and output file structures. These programs make no use of graphics and should run on any IBM compatible PC. The pollen counter package, which contains the programs, sample files, and instructions for transferring the final counts into TILIA, is available in a self-unzipping format as POLCNTPK.EXE. It is also available by Anonymous FTP in directory /pub/inqua of geology.wisc.edu (see the article on the Internet on p. 17 of this issue).

Bennett, K. D. 1990a. Pollen counting on a pocket computer. INQUA - Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 3:5.

Bennett, K. D. 1990b. Pollen counting on a pocket computer. New Phytologist 114:275-280.

Zippi, P. A. 1992. Scientific software for Apple Macintosh. INQUA - Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 7:10-13.

FIRST AID FOR TILIA AND PALYPLOT USERS
by Dr. Triage

I have a lot of questions to answer, but the Newsletter Coordinator is cutting my column drastically because of this issue's length. Perhaps they can be fitted in next time.

TILIA and TILIA·GRAPH

Dr. Triage: We are an eager group of novice Tiliaphiles, with a problem. Every time we make a graph, the x-axis (showing % abundance) appears both at the top and the bottom of each profile. We sure like it at the bottom, but how do we get rid of it from the top??? We saw the TILIA graph in the Jan. 92 newsletter, and were jealous. We also admired the lithologic stratigraphy feature, and want to know how to acquire it, since our version doesn't have it. Please help us! Trembling.

Dear Trembling: Try the following sequence of menus: [G] Axes | [A] X-axes | [A] Modify all graphs | [A] Tic Marks | [C] Length; type '0' to set the TOP tick marks to 0. This makes the top column margin a solid line, which looks good extending under the taxon names. I like the tick marks at the top because you can lay a ruler along them to estimate quantities better.

Incidentally, there is a way to edit a diagram made with TILIA if you have access to AutoCAD or Generic CADD 6.0. TILIA·GRAPH can send its output to a DXF file (Drawing Interchange File) in the current directory with the default name CGI.DXF. This *.DXF file can be interpreted by the CADD programs, edited, scaled, and printed or saved as CADD files. (To use this option, you must go to TILIA's subdirectory "Drivers" and load the CGI.CFG file into a text editor. Insert the line PRINTER=DXFTEXT.SYS and save it as an ASCII file of the same name.)

Regarding the lithologic column, you probably have the same version as I do. When you select [K] Lithology, you get a box that says: "Note: The lithology option is incomplete. It now plots only a blank column. A module for interactively designing a lithology column is planned. GKS and Troel-Smith lithology symbols will be available."
You can turn the lithology column ON, and it puts in a blank column, which lets you draw in the sediment by hand or paste on a sediment pattern that you make with a "draw" program like FREELANCE, etc.

The diagram you liked in the January issue was not a TILIA graph, but one Keith Bennett did in PostScript with his PSimpoll. I think it does a neat job, and I prevailed on the Coordinator to ask Keith to discuss it (see p. 11, this issue). D.T.

PALYPLOT

Dr. Triage: I am cracking up! I just got an up-grade to Generic Cadd 6.0 from Autodesk, which has a lot of fine new features and fonts. But when I try to load PALYPLOT's *.PPD files as a "batch file," CADD6 starts, but then just stops, and I have to reboot my computer. What am I doing wrong? The Tasmanian Devil.

Dear Devil: The problem you encountered is caused by certain "improvements" made in CADD6. CADD5 set text size with a command starting TZ,T, whereas CADD6 has changed the command to TS,Z, to conform better to its other text commands. You can solve this by loading PALYPLOT's *.PPD file into an ASCII text editor and using Search (TZ,T,) and Replace (TS,Z,) on all the problem lines. But a much quicker solution is to load in the following BASIC program. It works with QuickBasic or the QBasic that comes with DOS 5.0. It renames the original as a backup file (*.BAK) and produces a new *.PPD file that will work in CADD6.

I believe this will solve your problems using CADD6 with PALYPLOT files. But if you find others, you should be able to modify PPD2CAD6 to fix those too. Just let me know what you did for the record. D.T.

    'ppd2cad6.bas   8 April 1992
    CLS
    LOCATE 2, 5
    PRINT "PPD2CAD6 converts PALYPLOT *.PPD file for CADD6"
    SHELL "dir *.PPD"                         'list the candidate files
    PRINT : PRINT
    INPUT "Load which file? "; File1$         'type the full name, including .PPD
    Place = INSTR(File1$, ".")                'position of the dot in the name
    ReadFile$ = LEFT$(File1$, Place) + "BAK"  'same name with extension BAK
    Task$ = "REN " + File1$ + " " + ReadFile$
    SHELL Task$                               'rename the file *.BAK
    OPEN ReadFile$ FOR INPUT AS #1
    OPEN File1$ FOR OUTPUT AS #2
    DO WHILE NOT EOF(1)
       LINE INPUT #1, A$
       IF LEFT$(A$, 5) = "TZ,T," THEN
          A$ = "TS,Z," + MID$(A$, 6)          'swap in the CADD6 text-size command
          PRINT A$                            'show each changed line on screen
       END IF
       PRINT #2, A$
    LOOP
    CLOSE #1: CLOSE #2
    END

EMAIL ADDRESSES

(I use the term in the general sense; it includes those on Internet, NSFNet, Janet, Bitnet, etc. A line ending with a trailing low dash "_" indicates the address continues without a space on the line below.)

[The e-mail addresses on p. 29-30 are not reproduced here.]