Computers are forever challenging us with new ways of doing science. Now that computers are a familiar sight in the laboratory, the next challenge is to adapt to thousands of computers all joined together. Even by conservative estimates, thousands of institutions and perhaps millions of researchers are now served by Internet (Krol, 1992), a vast communications web that links together computers all around the world.
The services and information available on Internet are astounding. Access to world-wide electronic mail and to electronic newsgroups covering hundreds of topics is just the beginning. Being connected to Internet means having the resources of literally thousands of computers at your fingertips. Recognizing the advantages of free information exchange, many computer sites now allow guest logins by users over the network. What is more, they make available various data, software and services that can be freely copied or used. The following examples can only hint at the incredible range of information already available:
* On-line access to telephone directories, bibliographies and library catalogs in many parts of the world.
* Free software - many sites maintain libraries of public domain software. The Free Software Foundation at MIT develops and distributes high-quality, free software under its GNU Project.
* Molecular biology databases, software and bibliographies - the Australian National Genomic Information Service (ANGIS) at the University of Sydney maintains up-to-date copies of the major databases.
* Satellite and weather data - the University of New Mexico alone makes available 90 gigabytes worth!
* Geographic data - electronic atlases, census data and summaries such as the CIA World Databank and Factbook (maps, facts and figures about every country in the world).
* Electronic texts - Project Gutenberg, a public domain project, produces electronic versions of English language texts, ranging from Roget's Thesaurus and the Complete Works of Shakespeare to the CIA World Factbook and US Census.
For several years now the basic means of accessing files across the network has been FTP ("File Transfer Protocol"). Network archives use the "anonymous ftp" protocol. For example
====================================================
ftp life.anu.edu.au (logging in to the site LIFE at the Australian National University)
Connected to life.anu.edu.au.
220 life FTP server (SunOS 4.1) ready.
Name (life.anu.edu.au:david): anonymous
331 Guest login ok, send ident as password.
Password: (give your electronic mail address)
230 Guest login ok, access restrictions apply.
ftp> ls (gives you a directory listing)
ftp> cd /pub/biomathematics (changes directory to /pub/biomathematics)
ftp> bin (changes mode to binary)
ftp> get polsta.zip (retrieves the file polsta.zip)
===================================================
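The same session can also be scripted. What follows is only a minimal sketch in Python, assuming the standard ftplib module; the function names fetch_anonymous and main and the placeholder address guest@example.org are illustrative and not part of any archive's software. The host, directory and file name are simply those quoted in the session above and may no longer be reachable.

```python
from ftplib import FTP


def fetch_anonymous(ftp, directory, filename, email="guest@example.org"):
    """Repeat the session above on an FTP connection and return the file's bytes.

    `email` is a placeholder; by convention you give your own address.
    """
    ftp.login(user="anonymous", passwd=email)  # "Name: anonymous" / "Password:"
    ftp.cwd(directory)                         # "cd /pub/biomathematics"
    chunks = []
    # retrbinary transfers in binary mode, so no separate "bin" step is needed.
    ftp.retrbinary("RETR " + filename, chunks.append)
    return b"".join(chunks)


def main():
    # Assumption: host and path are those quoted in the text (1992).
    with FTP("life.anu.edu.au") as ftp:
        data = fetch_anonymous(ftp, "/pub/biomathematics", "polsta.zip")
    with open("polsta.zip", "wb") as out:
        out.write(data)
```

Calling main() would attempt the real transfer. Keeping the connection as a parameter of fetch_anonymous means the login, "cd" and "get" steps can be tried against a stand-in object without touching the network.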
The number of network archives has grown rapidly, so that finding
information, or even knowing what is available, among the thousands of
sites is extremely time-consuming. ARCHIE resolves this problem by
providing a database of the contents of all known sites. These databases
are provided at several major sites, such as archie.au (Australia),
archie.funet.fi (Finland) and archie.mcgill.ca (Canada). They can be queried
either by logging in directly via Telnet (using the name "archie") or by
electronic mail (e.g. to archie@archie.au) with the message consisting of
keywords (e.g. "help"). For example
====================================================
telnet archie.au (connecting to the local archie server)
Trying 139.130.4.6 ...
Connected to archie.au.
Escape character is '^]'.
SunOS UNIX (plaza.aarnet.EDU.AU)
login: archie (log in name is "archie", no password)
YOU ARE RUNNING ON ARCHIE.AU (sometimes known as plaza.aarnet.EDU.AU)
If you have any problems with archie,
send mail to ccw@archie.au
This machine is a brand spanking new SparcStation 2 purchased by AARNet
funds to further serve the AARNet community. The machine lives directly on
the AARNet backbone so should provide excellent connectivity to all points
of AARNet.
archie> help (asking for help)
Help gives you information about various topics, including all the commands
that are available and how to use them. ... etc.
archie> quit (finishing a session)
====================================================
User-friendly interfaces, such as XARCHIE (Fig. 1), now make it possible to
locate and retrieve files at the touch of a button.
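The electronic mail route to ARCHIE mentioned above can be sketched in the same spirit. This is a hedged illustration only, assuming Python's standard email package; the function name build_archie_query is hypothetical, and placing one keyword per line of the message body is an assumption based on the description above, not a documented requirement of any server.

```python
from email.message import EmailMessage


def build_archie_query(sender, commands, server="archie@archie.au"):
    """Compose (but do not send) a mail query whose body is a list of keywords."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = server
    # Assumption: one keyword (e.g. "help") per line of the message body.
    msg.set_content("\n".join(commands) + "\n")
    return msg
```

For instance, build_archie_query("someone@example.org", ["help"]) yields a message addressed to archie@archie.au whose body is the single keyword "help". Sending it is left to whatever mail system is available locally.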


Network publications (e.g. electronic journals) need not be limited to the text and figures of traditional paper publications. Other material can include bibliographies, databases, and software. For instance, the fastest and simplest way to distribute software is to make it freely available on Internet.
At present the main drawback to electronic publication on Internet is lack of formal recognition. However, librarians, publishers and site managers are now working on such issues as registering electronic publications and establishing repositories for electronic publications.
Coordinating research. Perhaps the most profound effect of Internet on science has been to usher in an era of cooperative science on a scale never seen before. In some areas of research, notably molecular biology, distribution of information over Internet has grown explosively. The most visible result is the appearance of international, public-domain databases such as Genbank and EMBL. As these databases become ever more enormous, working scientists are coming to rely on them as sources of reference. In molecular biology it is already standard practice to compare newly derived sequences against existing ones in the major databanks. Many journals (e.g. Nature) now demand that results be submitted to one of these network databases as a precondition for publication.
Contributing to network databases makes both economic and scientific sense. We cannot afford the luxury of carrying out research in piecemeal fashion. Given limited resources, it is essential to make maximum use of every piece of available data. Data that is used only once is like a disposable soft drink bottle - good things come out of it, but thereafter it is junk. The archives of the world's institutions are full of this refuse from uncoordinated research. Ideally, the results of every piece of research should not only answer an immediate question, but also contribute data to a larger scientific jigsaw. Many topical issues, such as biodiversity, are crying out for cooperative databases to support both research and decision-making.
Cooperative databases convey many advantages. Previously difficult studies become easy. Completely new kinds of study become possible, and there is a significant serendipity effect that emerges as data are combined in new ways. For instance, comparative studies of molecular databases have already yielded new insights about gene families and the mechanisms of evolution. There is every reason to expect that databases in other fields of biology will prove equally fruitful.
Potential network projects in Quaternary science. There are several possible kinds of public domain databases that could be set up on the network to serve Quaternary studies. They include
- compilations of useful software;
- electronic databases of scientists working in the field;
- pollen identification keys (including images);
- annotated bibliographies of relevant publications;
- abstracts of recent publications;
- compilations of data (e.g. complete pollen site records).
Each of the above kinds of database would contribute materially to existing Quaternary research projects.
Public domain databases usually conform to IAFA (Internet Anonymous FTP Archive) standards (Deutsch, 1992). They are normally characterized by the following features:
COORDINATION - There is a controlling agency or organization that manages the database, receives and processes new entries, and communicates relevant news to its users.
PARTICIPATION - Anyone may contribute data to the database. Major databases announce new entries via special newsgroups or mailing lists.
ACCESS - Anyone may access, copy or use the database at any time. Normally access is via a computing network using a standard protocol.
STANDARDS - Contributors must use standard fields and attributes in submissions (e.g. Croft, 1989). This standard must be well-defined and should be publicized as widely as possible (see below). Usually it is expressed as a submission form (electronic, printed, or both) that is filled in by contributors.
FORMAT - Textual data (including bibliographies, mailing lists etc.) are normally submitted and stored as ASCII files in tagged field format (see Appendix). The database may be compressed, using standard utilities, to simplify network transfer. Images should be in one of the common formats in use, such as GIF (Graphics Interchange Format).
QUALITY CONTROL - Users need some guarantee that data provided in a database are both valid and accurate (Green, 1991, 1992). Quality control checks can be applied by database contributors, coordinators, or users - preferably all three.
ACKNOWLEDGEMENT - Every entry should include an acknowledgement of its contributor. This is essential to the notion that contributions are a form of publication.
AGREEMENTS - There should be an explicit list of terms and conditions that contributors and users must agree to. Notably, users agree to acknowledge the project and to waive liability for any use they make of the data. Contributors agree to place their data in the public domain.
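A coordinator could automate the simplest of these checks, namely that a submission fills in the standard fields and names its contributor. The following Python sketch is illustrative only: the tag names are taken from the registration form in the Appendix, but which of them are compulsory is an assumption made here, and the names REQUIRED_FIELDS and missing_fields are hypothetical.

```python
# Assumption: these tags (from the Appendix form) are treated as compulsory.
REQUIRED_FIELDS = ("SOURCE", "CONTACT", "EMAIL", "TITLE", "DATE", "VALIDATION")


def missing_fields(entry, required=REQUIRED_FIELDS):
    """Return the required tags that are absent or blank in `entry`.

    `entry` is a dict mapping tag names to the text supplied by a contributor.
    """
    return [tag for tag in required if not str(entry.get(tag, "")).strip()]
```

An empty result means the entry passes this first check; anything else can be sent back to the contributor as a list of fields still to be completed. Checks on the data themselves would need to be specific to each database.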
LIFE at the Australian National University. The Australian National University Bioinformatics Facility provides a wide range of biological information and software through its Internet anonymous FTP archive:
site       life.anu.edu.au
login      anonymous
password   (your email address)
directory  /pub and its subdirectories

Current topics include biodiversity, bioinformation, complex systems, landscape ecology, molecular biology, and neurophysiology. For instance, we are developing prototype network information systems and protocols for the International Organization for Plant Information (IOPI), which aims to document the distributions of the world's plants.
Freely available to all pollen analysts, for instance, is the program POLSTA (/pub/biomathematics/polsta.zip), described in previous issues of this newsletter, which is an interactive PC package that provides tools for analysis and modelling of pollen time series.
References.
Deutsch, P. (1992). Publishing Information on the Internet with Anonymous FTP. IAFA DOC II.
Green, D.G. (in prep.). Databasing diversity - a distributed, public-domain approach. In preparation.
Krol, E. (1992). The Whole Internet. O'Reilly and Associates.
APPENDIX
% ---------- < START : Cut here > ----------
#####
% Part 1 CONTACT REGISTRATION
%
% Please complete this registration form about
% the source of this dataset.
% This information is needed for the following
% reasons:
% - identifying who contributed the dataset
% - identifying who produced and/or
% maintains this dataset
% - telling users whom to contact regarding
% this dataset
% - linking together information from the
% same source
%
SOURCE Name of person or organization who
produced the dataset
CONTACT Name of person or organization to
contact about the dataset
EMAIL Electronic mail address for queries
about the dataset
ADDRESS Postal address for correspondence
about this dataset
PHONE International telephone number
FAX International fax number
% Part 2 DATASET REGISTRATION
%
% Please complete this registration form about
% the dataset. This information is crucial
% for the following reasons:
% - Demonstrating the validity of the data
% - Defining the methodology & data lineage
% for future users
% - Identifying this study for all records
TITLE Give a short descriptive name for the
database.
DATE When was the dataset last revised?
PURPOSE Why were the data collected? and how?
SOURCES For compilations, indicate the original
datasets
COMPILER Who is responsible for compiling/
maintaining the data?
STANDARD What standard format (if any) does
the dataset conform to?
(e.g. Genbank entry)
PROGRAMS Name any special software used to
read/manipulate the dataset.
REFERENCES Give details of relevant
publications (e.g. methods, uses)
AUTHOR Name(s) of the author(s)
TITLE Name of the book or article
PUBLICATION Details of book, journal,
publisher, volume & pages.
VALIDATION What checks were applied to
ensure that data are correct?
ASSOCIATIONS Name any other related data
sets. (e.g. t001.dat etc)
COMMENTS Mention any important issues
not covered above.
% Part 3 Methodology - repeat as many times as
% necessary
TAXON Start description of methods
CODE Taxon code to use in the data records
FAMILY
GENUS
SPECIES
% Part 4 DATA RECORDS - repeat as many times
% as necessary
RECORD Start of new record
DATE
TAXA List of taxon codes
SITES Landscape units for this record
#####
% ---------- < STOP : Cut here > ----------
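A completed form in this tagged field format is easy to read by program. The sketch below, in Python, is one possible reading of the layout and not a definitive parser. It assumes that a field starts with an upper-case tag in the first column, that continuation lines are indented, and that lines beginning with "%", "#####" or a row of dashes carry no data. The name parse_tagged is hypothetical.

```python
import re

# A field line: an upper-case tag in column one, optionally followed by text.
TAG_LINE = re.compile(r"^([A-Z]+)(?:\s+(.*))?$")


def parse_tagged(text):
    """Return the fields of a tagged-field form as a list of (tag, value) pairs.

    A list is used rather than a dict because tags such as TITLE and DATE
    occur more than once in the form.
    """
    fields = []
    for raw in text.splitlines():
        stripped = raw.strip()
        # Skip blank lines, % comments, ##### delimiters and "cut here" rules.
        if not stripped or stripped.startswith(("%", "#####", "-----")):
            continue
        match = TAG_LINE.match(raw.rstrip())
        if match:
            fields.append([match.group(1), (match.group(2) or "").strip()])
        elif fields:
            # Assumption: any other line continues the previous field.
            fields[-1][1] = (fields[-1][1] + " " + stripped).strip()
    return [tuple(field) for field in fields]
```

Because order and repeats are preserved, the pairs can be regrouped afterwards, for example by starting a new record each time a RECORD tag is met.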