We have received 15 responses to the data questionnaire printed in Newsletter 7 (January 1992). That is just 20% of the number of email addresses listed in the back of the Newsletter, so you should consider this a preliminary report. IF YOU HAVE NOT RETURNED THE QUESTIONNAIRE, PLEASE DO SO NOW.
These preliminary results include some surprises for me, and some lessons regarding questionnaires. Under the "lessons" category, I have learned to expect the unexpected: how many different ways can one question be interpreted? The first question was intended to determine if anyone was NOT storing their data on hard disks, floppy disks, or tape. However, EVERYONE answered the question, and the results may indicate that 2/3 of the responders enter their counts directly into the computer, whereas 1/3 mark the counts on a piece of paper and then enter the final counts on the computer (see Table). The 2/3 who count on the computer (pun intended) store a printout of the data as a hard copy.
The implication is that a Newsletter article on microfossil counting programs would be welcome -- who uses them, and how many different programs are there (hint, hint, editor)? For example, John Kingston mentions that one of his students is using a BASIC program written by Paul B. Hamilton (Hydrobiologia, 1990, 194:23-30).
The modal responder uses an upper-end (80386) IBM-Compatible computer with a math co-processor, and has more than one computer. Surprisingly (to me) only one responder has a Macintosh. Those who use mainframes indicated they primarily use them for email. Another surprise is that 1/2 the responders cannot read 5 1/4 Double-Density diskettes! I had thought 360 K diskettes were the universal medium of the computer industry, but more use 3 1/2 High Density diskettes than any other kind (see Table).
Another interesting result - but no surprise - is that most of the responders (3/4) store data in ASCII format rather than in binary. Both hard and floppy disks have become so cheap that the convenience of being able to easily view and edit the file outweighs the size savings of the binary format. Some of us probably use data-compression programs like PKARC for archiving and backup.
The diversity of ASCII formats also is surprising. The formats I consider "standards" are rarely used, and there are more "attribute" (levels ordered by type) formats than I knew of. Overall, most responders are storing data in an ASCII or binary format written by commercially-available software such as LOTUS or PARADOX. Otherwise, the "condensed" format (code, count) leads by a slim margin over "matrix" and "attribute." Based on responder comments, this is due to its general applicability to numerical analysis software, rather than its size savings.
Still under the heading of the "condensed" format, both John Birks and Steve Juggins pointed out that I had confused CAMBRIDGE with CORNELL in both the article (Newsletter 7, p.9) and questionnaire (the file formats, not the schools). I have excerpted some of HJBB's comments at the end of this article.
The preference for commercial software is even more evident when it comes to manipulation, plotting, and numerical analyses. For manipulation and plotting, Eric Grimm's TILIA clearly is the favorite. Most of the responders have it and use it. Several responders commented that they had adopted the European - North American format of PARADOX (Newsletter 7, p.1). Although TILIA·GRAPH is the favorite, the diversity of plotting programs is remarkable; eleven of the responders use programs not mentioned by the others.
Remember that this is the Newsletter of the DATA-HANDLING COMMITTEE? Based on the breadth of techniques in the numerical methods portion of the questionnaire (see Table), the responders might better be categorized as DATA CRUNCHERS. Alternatively, these "power users" may be more likely to return questionnaires. Nearly every responder uses several different techniques. Although some programs are widely used - for example, 40% use CONISS and CANOCO - half of the programs listed are used by only one responder.
My primary goal in writing the questionnaire was to determine the most common format in use by members of the Data-Handling Committee. As I suspected, the answer is that "it is human nature to do things differently." Rather than suggest a common format, I suggest that members be prepared to translate among the various formats, and that software writers make the data-entry portions of the programs as flexible as possible.
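As a rough illustration of what "translating among formats" can involve, the sketch below converts counts between a "matrix" layout (one row per level, one column per taxon) and a "condensed" layout of (code, count) pairs that omits zero counts. It is only a minimal in-memory sketch: the function names are hypothetical, taxon codes are assumed to be 1-based column positions, and it does not attempt to reproduce the on-disk layout of the Cornell condensed format or of any program named in this article.

```python
from typing import List, Tuple

Row = List[int]                    # counts for one level, one entry per taxon
Pairs = List[Tuple[int, int]]      # (taxon code, count) pairs for one level


def matrix_to_condensed(rows: List[Row]) -> List[Pairs]:
    """Keep only non-zero counts, tagging each with a 1-based taxon code.

    Assumption: the taxon code is simply the column position (1, 2, 3, ...).
    """
    return [
        [(col + 1, count) for col, count in enumerate(row) if count != 0]
        for row in rows
    ]


def condensed_to_matrix(levels: List[Pairs], n_taxa: int) -> List[Row]:
    """Rebuild full rows, filling taxa that are absent from a level with 0."""
    rows = []
    for pairs in levels:
        row = [0] * n_taxa
        for code, count in pairs:
            if not 1 <= code <= n_taxa:
                raise ValueError(f"taxon code {code} outside 1..{n_taxa}")
            row[code - 1] = count
        rows.append(row)
    return rows


# Hypothetical counts: two levels, four taxa.
example_matrix = [[12, 0, 3, 0], [0, 0, 7, 1]]
example_condensed = matrix_to_condensed(example_matrix)
round_trip = condensed_to_matrix(example_condensed, n_taxa=4)
```

The round trip returns the original rows, which is the property a translator between these two layouts needs. A real converter would also have to carry taxon names, level labels, and whatever header lines the target program expects.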
One possibility for standardization that I foresee is the emerging European - North American database format in PARADOX. I suggest to the members of these committees (Newsletter 7, p.1) that it is to their advantage to make that format available to the palynological community as soon as possible.
The Cornell condensed format first came into existence with Mark Hill's TWINSPAN and DECORANA programs in 1979. The Cornell Ecology Programs, through Hugh Gauch, used this format widely (e.g. DECORANA, TWINSPAN, GRADBETA, COMPCLUS, DATAEDIT) and they distributed a program called CONDENSE to help prepare condensed files. Since then Cajo ter Braak in Wageningen has developed CANOCO, which also uses Cornell condensed format, as do several other recent multivariate programs from The Netherlands (e.g. FLEXCLUS, DISCRIM, MILTRANS). Because we use these programs so much with the same data sets as in our own programs, John Line and I use Cornell condensed format for all our recent programs like WACALIB, ANALOG, RATEPOL, RSURF, etc. Onno van Tongeren in The Netherlands has developed his CEDIT program to edit, manipulate, append, merge, convert, create, transform, summarize, etc. condensed files so one can easily change them.
There is thus no Cambridge condensed format but the Cornell condensed format. I hope Cornell will not be upset by being called Cambridge!