There are many ways of exchanging data. Tables of numbers on paper are easily handed or mailed to a colleague and kept for a lifetime in a file folder. For many purposes, paper records are still the easiest and most permanent means of storing and sharing data. FAX has recently become popular because it moves information at near-light speed and lets the recipient supply the paper. But this Newsletter's readers are especially aware that there are not many "degrees of freedom" in a table of numbers on paper.
A practical data file should record information about a topic, a site, or sites, so that the information is secure but at the same time readily available for use. To be readily usable, the information should be in the form of a digital file using standard, well-proven formats and media. At present such files are normally stored on magnetic disks and archived on magnetic or optical disks or on tape. (Paper and microfilm are still useful for archival purposes because technologies change. Have you ever tried to find a machine to play back a spool of magnetic WIRE RECORDINGS from the 1950's?)
Today, microcomputers and floppy magnetic disks give Quaternary scientists incredibly effective ways of sharing information. The developing European and North American pollen databases (see Newsletter 4) come to mind as examples involving the handling and storage of large masses of data.
It is not my purpose here to deal with the management of large databases. Rather, I offer some observations on how an individual can share pollen data files with others. The files should have a format that is accessible internationally and shareable by the largest number of potential users. On the basis of the number of installed units and reasonable cost, the IBM PC (and its many clones) would seem the standard to adopt for data exchanged by magnetic disk. Loyalists of other types of computer (Apple, etc.) can likely translate between the IBM protocol and their own.
There are many word-processor and database programs to choose from, and we often develop strong feelings about which is best. There is no reason to demand conformity in how individuals handle data in their own facilities. However, some file formats transfer successfully more generally than others. Although there are specialized programs for converting one proprietary file format to another, practical considerations suggest that data in standard ASCII text format are the easiest to handle on an international basis. All major data-manipulating programs have an option for importing or exporting ASCII text. Data in ASCII text format can also be viewed easily, so the receiver gets immediate assurance that the file survived the trip. ASCII text can generally be converted to another individually preferred style with a simple conversion program written with the BASIC interpreter supplied with most PC's.
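As an illustration of how small such a conversion program can be, here is a minimal sketch. It is written in Python rather than BASIC, and it assumes a target style (comma-separated values) and a sample layout (a taxon name followed by counts) chosen purely for illustration; neither is the author's own format.

```python
import csv
import io

def whitespace_to_csv(text: str) -> str:
    """Convert whitespace-separated ASCII text to comma-separated text.

    Each non-blank input line becomes one output row; blank lines are skipped.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        fields = line.split()  # split on any run of spaces or tabs
        if fields:
            writer.writerow(fields)
    return out.getvalue()

# Hypothetical sample: a taxon name followed by counts in three samples
sample = "Pinus 12 0 3\nQuercus 40 22 0\n"
print(whitespace_to_csv(sample), end="")
# prints:
# Pinus,12,0,3
# Quercus,40,22,0
```

Because the input is plain text, the same few lines of logic could be rewritten in almost any language the receiver prefers.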
One may hear that ASCII text is "old technology," and that better techniques store data more efficiently by avoiding waste, such as recording zeros for unrecorded taxa, or spending a whole byte on the digit "1" when the same byte could encode any of 256 different integers. Efficient use of space undoubtedly matters when the volume of data is truly huge and when storage media are new and especially expensive. But it becomes trivial when one considers that all the pollen data in a good-sized country can be kept on a few floppy disks.
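The size difference in question can be made concrete with a short sketch. It assumes a hypothetical row of ten pollen counts, each small enough (0 to 255) to fit in one unsigned byte, and compares space-separated ASCII text with a packed binary encoding.

```python
import struct

counts = [0, 1, 12, 0, 255, 7, 0, 0, 140, 3]  # hypothetical pollen counts, all 0..255

# ASCII text: one byte per digit, plus one separating space between values
ascii_bytes = " ".join(str(c) for c in counts).encode("ascii")

# Binary: exactly one unsigned byte per value (only valid for 0..255)
binary_bytes = struct.pack(f"{len(counts)}B", *counts)

print(len(ascii_bytes), len(binary_bytes))  # prints: 24 10
```

The binary form is smaller, but it cannot be read on screen, cannot hold counts above 255 without a wider encoding, and must be decoded by a program that knows its layout. The ASCII form costs a couple of extra bytes per value and remains readable anywhere.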
I once did a study of how the size of a file depends on the storage format; I give a summary here because I found it interesting. I used an array of pollen data from my Devils Lake Site. It consisted of 80 taxon categories over 134 stratigraphic samples, or 10,720 data items in all, though about a third were zeros. To these we must add the two array dimensions, an alphanumeric title, and 80 alphanumeric strings for the taxon names. I stored these data in seven different formats to determine the file size required by each; the results are shown in Fig. 2.

The third file structure is what I call a "Concentrated Wisconsin" format. Here each array row is broken with a carriage return after every ten numbers, and the numbers are separated by a single space; this layout is easier to interpret on screen, perhaps because of our ten fingers. The "Regular Wisconsin" format has two spaces between numbers; it is easier to read, but a bit bulkier. Stored in Eric Grimm's TILIA file structure, the data occupy 48,100 bytes, partly because Grimm builds in space for additional information about the pollen taxa. Storing them in Borland's PARADOX database produces three files totaling over 100,000 bytes. Saved as a Borland QUATTRO PRO spreadsheet, they occupy 126,500 bytes. File sizes thus differ markedly: powerful programs pay a size penalty for the overhead it takes to provide that power.
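The difference between the two Wisconsin layouts can be sketched as follows. This is an approximation, assuming only what the text states (ten numbers per line, separated by one or two spaces); the exact column widths and header lines of the real .RAW and .DAT files are not reproduced, and the sample row is hypothetical.

```python
def fold(values, per_line=10, sep=" ", newline="\n"):
    """Write values `per_line` to a line, joined by `sep`, ending each line with `newline`."""
    lines = [
        sep.join(str(v) for v in values[i:i + per_line])
        for i in range(0, len(values), per_line)
    ]
    return newline.join(lines) + newline

row = list(range(25))                  # hypothetical row of 25 counts
concentrated = fold(row, sep=" ")      # one space between numbers
regular = fold(row, sep="  ")          # two spaces between numbers

print(concentrated, end="")
# prints:
# 0 1 2 3 4 5 6 7 8 9
# 10 11 12 13 14 15 16 17 18 19
# 20 21 22 23 24
print(len(concentrated), len(regular))  # prints: 65 87
```

Each extra separating space adds one byte per number. A DOS text file ends each line with a carriage return plus line feed (`newline="\r\n"`), which adds one more byte per line.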
File size is probably less important than it was only a few years ago; disk storage is relatively cheap. In addition, there are some excellent, inexpensive compression utilities (e.g., PKZIP: PKWare, Inc., 7545 North Port Washington Road, Glendale, WI 53217 USA. Version 1.1 is a shareware program available from many electronic bulletin boards.) that can drastically shrink stored files, as shown in the lower part of Fig. 2.
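The idea of lossless compression can be demonstrated with Python's standard `zlib` module. Note that `zlib` implements the DEFLATE method, which is not the compression method of the PKZIP version cited above; the sketch only illustrates the principle, and the input text is hypothetical.

```python
import zlib

# Hypothetical folded ASCII data: small counts, many zeros, much repetition
text = ("0 0 3 12 0 1 0 0 47 2\n" * 500).encode("ascii")

packed = zlib.compress(text, 9)   # 9 = highest compression level
print(len(text), len(packed))     # original size is 11000 bytes

# Lossless: decompression returns exactly the original bytes
assert zlib.decompress(packed) == text
```

This artificial input repeats one line 500 times, so it shrinks far more than real pollen data would; repetitive, zero-rich ASCII tables nonetheless tend to compress well.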
For the immediate future, data files can be exchanged in the mail or in person using standard 5¼- and 3½-inch floppy magnetic disks. For international distribution, the low-density disks (360 Kb 5¼-inch and 720 Kb 3½-inch) are compatible with the widest range of equipment. The higher-density disks (1.2 Mb 5¼-inch and 1.4 Mb 3½-inch) transport data more efficiently and can be used when sender and receiver agree on them. Computers equipped with inexpensive modems can exchange limited data sets over the commercial telephone network. When the files are large, disks sent by mail are much more cost-effective. Quaternary workers would be well advised to make a concerted effort to use the various governmental e-mail networks (Bitnet, NSFnet, UseNet, etc.) for the international exchange of both data and ideas. The system is essentially free for many academic and governmental users. It is incredibly fast, and the file arrives in digital form that can be recorded on disk when the e-mail is read. Thus the receiver can store the data directly on his/her own equipment, in a form that can be manipulated at will. Contrast this with a page of figures transmitted by FAX; those data are very difficult to process further.
The "folded" structure (ten items per row followed by a carriage return) of my "Wisconsin" .RAW and .DAT files was devised partly for use with e-mail. Such files always fit on a normal-width screen, and a standard text editor can then strip away the address and other extraneous comments from the e-mail, leaving a usable data file. While ASCII files wider than 80 characters will usually travel by e-mail without trouble, extra-long lines can always become fragmented in transit, and the recipient of a "trashed" file may spend hours trying to find out why it does not work.
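The two precautions described above can be sketched in a few lines: checking before sending that no line exceeds 80 characters, and, on receipt, dropping the mail headers. The sketch assumes standard Internet (RFC 822-style) mail, in which the headers end at the first blank line; any signature or comments after the data are not removed and would still need a text editor. The function names are hypothetical.

```python
MAX_WIDTH = 80  # normal screen width

def lines_too_wide(text: str, width: int = MAX_WIDTH) -> list[int]:
    """Return the 1-based numbers of lines longer than `width` characters."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if len(line) > width]

def strip_mail_headers(message: str) -> str:
    """Drop RFC 822-style headers: everything up to and including the first blank line."""
    message = message.replace("\r\n", "\n")  # normalize DOS line endings
    _head, sep, body = message.partition("\n\n")
    return body if sep else message  # no blank line found: return the text unchanged

# Hypothetical received message
msg = "From: someone@example.edu\nSubject: Devils Lake data\n\n0 1 2\n3 4 5\n"
print(strip_mail_headers(msg), end="")
# prints:
# 0 1 2
# 3 4 5
```

Running `lines_too_wide` on a file before mailing it flags exactly the lines at risk of being broken in transit.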
An individual can enjoy many of the computational advantages of a large research center by using readily available commercial programs (word processors and spreadsheets) and tying them into specialized programs through simple translation utilities. I have just completed v. 1.15 of my POLFILE program. Like the earlier versions, it can read and write my .RAW file format, which Grimm's TILIA program recognizes as a "Wisconsin" File. TILIA's binary files would require special handling for e-mail; the .RAW file simply travels as text. As Craig Chumbley mentions in his discussion of PALYPLOT in this newsletter, a TILIA file converted to a .RAW file format can be changed to a .DAT file, which is used by PALYPLOT. POLFILE v. 1.15 can convert a .DAT file to one that can be read directly by Warren L. Kovach's Multivariate Statistics Package MVSP Plus (see Newsletter #4). POLFILE can also now change its .DAT file structure so that it too can be read into TILIA as a Wisconsin File.
The new POLFILE can also convert either a .RAW file or a .DAT file so that it can be imported directly into LOTUS 1-2-3 or QUATTRO PRO. Some of the proprietary statistics packages should accept output from these ubiquitous spreadsheets. (QUATTRO PRO files can be read directly by the Borland PARADOX database program, which I understand will be used in both the European and the North American Pollen Databases mentioned in Newsletter #4.) POLFILE v. 1.15 is free for the asking. It is a useful bridge between some very powerful programs, and its ASCII format allows you to exchange your data by disk or by e-mail.