INQUA Working Group on Data-Handling Methods

Newsletter 8: July 1992

THE DATA-HANDLING INTERNET

Steve Juggins, London, UK
Warren Kovach, Aberystwyth, UK
Louis Maher, Madison, WI USA

Part 1. Louis Maher

Several years ago the department's computer person told me about File Transfer Protocol (FTP) on the Internet. He put me in front of a terminal and proceeded to show me how to log on to strange computer systems scattered around the whole country. These places had unintelligible names, and they let one log on by using the name anonymous. Although they requested a password, you could type in anything (the proper response is your email address), and you would be let in. I recall looking at huge stores of public domain (often government-purchased) software for all kinds of computers, and there were treasure troves of free data that could be copied to our computer in a few seconds. I could not help imagining we were disembodied spirits moving about in the musty cellars of old Victorian buildings where one locked trunk after another could be thrown open to view.

I came away from the experience with three distinct impressions: 1) I was dead tired, 2) I recognized that people who could do this easily had a marked advantage over those who could not, and 3) I had an inkling of how much disk space and time I would need even to begin sampling the treasure. There was so much stuff that simply browsing through titles could kill a career. I felt old. I still do!

After Newsletter 7 was mailed, Steve Juggins suggested that we do a column on "Anonymous FTP," discuss its benefits, and even look about for a "basement room" where we could establish a program boutique that might serve the needs of the Newsletter's readers. It seemed like a good idea, and I asked Steve Juggins and Warren Kovach to contribute to a joint article. I planned to meld our material into a primer that would summarize the technique. And there is the rub. Just as Owen Davis (p. 14, this issue) found that we all use different computer programs and file-storage techniques, almost none of us uses exactly the same communication programs or the same routes to the electronic networks, and we all vary in our experience and our needs. Therefore I decided to keep the articles separate and suggest that you scan them all to pick up things useful to you.

The Internet generally links mainframe computer installations which are maintained by universities, businesses, and governments, and these organizations invest a lot of time and money maintaining the services. We individuals tie into the network by linking our terminals and microcomputers to the local host computer. That may be by a very fast communications link (Ethernet), or by a telephone modem from a home computer.

[Note: When referring to computer path names and commands, the unbroken strings often become too long to fit in the column structure of the newsletter. When a string reaches the right margin and ends with a "trailing dash", it means the unbroken string continues on the line below.] [Note: All such dashes have been removed for the WWW version - KDB]

Ben Abernathy, our computer person, has allowed me to set up a directory in our department computer geology.wisc.edu that anyone with access to the Internet can reach by Anonymous FTP. In the directory /pub/inqua, I have put self-extracting zipped (compressed) files of my computer programs. These include SLOTSEE, POLFILE, PLOTSITE, DEP-AGE, and others, as well as example data files. The sets are designed for different IBM graphics screens and are named accordingly: POL-EGA.EXE, POL-VGA.EXE, POL-HERC.EXE, etc. From this directory you can get the pollen counter program discussed on p. 24 in this issue (POLCNTPK.EXE), and you can also get a zipped copy of PAL, the Polish Database described by Ralska-Jasiewiczowa and Walanus (1991). Warren Kovach (1990) put his MVSP statistical package there, as did Keith Bennett his PSimpoll program, which he describes on p. 11 of this issue. Eric Grimm announces his Binary-to-Ascii utility on p. 9 of this issue, and I have taken the liberty of putting a copy of BTA.EXE there too. I hope that some of you will practice Anonymous FTP in order to get copies of these programs and that you will put some of your own there as well. We can make this an archive site that specializes in free programs useful to the Holocene Data-Handling Working Group. All those puts and gets are reminders that Anonymous FTP uses these commands to move data. Your perspective of the world is from the computer where you originate the FTP command; you get their data from the foreign computer, and you put your data there.

Steve Juggins will discuss networks, introduce some of the terminology, and describe how he uses anonymous FTP to contact geology.wisc.edu from England. Then Warren Kovach will provide additional details about the network from the European perspective. I will comment on Telnet from a North American perspective, and then I will mention a 100-page book of useful hints about the Internet and suggest several ways you can get a free copy.

Part 2. Steve Juggins

A network is a configuration of computers that exchange information, such as a local area network (LAN) or a wide area network (WAN). Computers in a network may come from a variety of manufacturers, and have major differences in their hardware and software. To enable different types of computers to communicate, a set of formal rules for interaction, or protocols, is needed. The most widely used of these is the Transmission Control Protocol/Internet Protocol Suite, commonly known as TCP/IP. TCP/IP was developed by the Defense Advanced Research Projects Agency (DARPA) for its own wide area network ARPANET. The term "Internet" is commonly used to refer to both the protocol suite and the larger DARPA network, which connects many individual TCP/IP campus, state, regional, and national networks into one single logical network. There is no one individual network known as The Internet. Rather it is a network of networks in which communication takes place at blazing speeds. Like a living thing, it grows, interconnects and evolves. Parts die off as well; the old ARPANET no longer exists as a singular entity.

The Internet Protocol Suite specifies, among other things, the Simple Mail Transfer Protocol (SMTP) for email, remote terminal emulation (Telnet), and the File Transfer Protocol (FTP). Here we concentrate on file transfer using the ftp command. The procedure is often run on computers using unix, which is case-sensitive and employs lower case for most commands; we will too. The ftp command implements the File Transfer Protocol, and permits the copying of files to and from remote machines running different operating systems.

To use ftp, you first open a connection with the remote machine by giving the command:

ftp address

where address is the Internet (IP) address of the remote machine. The Internet address is a series of four 8-bit fields separated by periods, and should be unique for every machine on the Internet. For example, the University College London Department of Geography's VAX has the address 128.40.32.128. Instead of using the numeric IP address, machines may also be referred to by their host-name. For example, the VAX computer referred to above uses vax.geog.ucl.ac.uk for its host-name. On the larger systems special commands can cause a name-server to look up the Internet address for a given host-name. But this will not work for my PC, which is connected directly to the Internet; an entry for the host-name has to exist in the hosts file used by the networking software on the PC. If you know the host-name of a machine you want to connect to, but not its numeric IP address, and wish to connect to it from your PC, you may use the host command found on some unix machines; it returns the IP address for the specified host-name, which can then be used from the PC. [Ed. Some systems use the command whois host-name or gethost host-name for this purpose; you may have to consult your resident guru to see what you should use.]
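To make "four 8-bit fields" concrete, here is a minimal sketch in Python (our choice for illustration; the article itself uses no programming language). The hypothetical function dotted_to_int checks that an address has four fields that each fit in 8 bits, and packs them into the single 32-bit number that the dotted form stands for.

```python
def dotted_to_int(address: str) -> int:
    """Pack a dotted address such as '128.40.32.128' into one 32-bit integer."""
    fields = address.split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four fields, got {len(fields)}: {address!r}")
    value = 0
    for field in fields:
        octet = int(field)  # raises ValueError for a non-numeric field
        if not 0 <= octet <= 255:
            raise ValueError(f"field {octet} does not fit in 8 bits")
        value = (value << 8) | octet  # shift earlier fields left, append this one
    return value


print(dotted_to_int("128.40.32.128"))  # the UCL Geography VAX; prints 2150113408
```

Python's standard ipaddress module, via int(ipaddress.IPv4Address(...)), gives the same number with fuller checking; the loop is written out only to show the arithmetic.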

So to connect to our departmental VAX to transfer files I would use the command

ftp 128.40.32.128

(or ftp vax, because I have the IP address for the host vax in my local hosts file).
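The hosts-file fallback can be sketched the same way. This assumes the common hosts-table layout (an address followed by a host-name and optional aliases, with # starting a comment). The table below is illustrative and uses only addresses quoted in this article, and parse_hosts and resolve are hypothetical names.

```python
# Illustrative hosts table; both addresses are the ones quoted in this article.
HOSTS_TEXT = """\
# address        host-name            aliases
128.40.32.128    vax.geog.ucl.ac.uk   vax
128.104.139.14   geology.wisc.edu
"""


def parse_hosts(text: str) -> dict[str, str]:
    """Map every host-name and alias in a hosts-format table to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table[name.lower()] = address  # host-names are not case-sensitive
    return table


def resolve(name: str, table: dict[str, str]) -> str:
    """Look up a name the way a PC without a name-server must: in the local table."""
    try:
        return table[name.lower()]
    except KeyError:
        raise LookupError(f"{name!r} is not in the hosts file; add an entry for it") from None


hosts = parse_hosts(HOSTS_TEXT)
print(resolve("vax", hosts))  # prints 128.40.32.128
```

Where a name-server is reachable, Python's socket.gethostbyname(name) asks it instead of consulting a local table.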

Once connected, ftp will ask for your username and password on the remote machine. The commands you will need are a mixture of unix and DOS; some of the common terms are shown in Figure 1. Once logged on, you may change to the appropriate directory using cd. Typing help gives a list of ftp commands like those shown in Figure 2. The command ls displays a directory listing; ls -l (ls with the long-form switch) gives file sizes, dates, etc. Ftp transfers files in two modes, binary or ascii; type binary or ascii to switch between them. (The default is generally ascii; once you specify one or the other, it remains in effect for the session unless you change it again. You must use binary to transfer an exact image of a compiled program; if in doubt, specify binary.) The command get copies a file from the remote to your local machine. For example, get datafile copies the file datafile from the remote to your local machine, giving it the same name on the local machine; get datafile newfile copies the same file, renaming it to newfile on your local machine. The command put datafile copies a file from the local to the remote machine, and is used in the same way. The commands mget and mput copy multiple files and allow wildcard symbols.
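Why binary matters can be shown with a small simulation. In ascii mode, ftp rewrites line endings for the receiving system. This sketch assumes a transfer from a unix host (lines end in LF) to a DOS PC (lines end in CR LF), so every LF byte acquires a CR in front of it. That is harmless for text but corrupts a program, whose bytes may include the LF value by chance. The program bytes here are made up.

```python
def ascii_transfer_unix_to_dos(data: bytes) -> bytes:
    """Simulate ascii mode from a unix host to a DOS PC.

    The sender turns each unix line end (LF) into the network's CR LF,
    and the DOS receiver keeps CR LF, so every LF byte gains a CR in front.
    """
    return data.replace(b"\n", b"\r\n")


def binary_transfer(data: bytes) -> bytes:
    """Binary (image) mode: the bytes arrive exactly as sent."""
    return bytes(data)


text = b"site depth count\n1 10 250\n"
program = bytes([0x4D, 0x5A, 0x0A, 0x00, 0xFF])  # made-up bytes standing in for an .EXE

print(ascii_transfer_unix_to_dos(text))                 # text, now with DOS line ends
print(ascii_transfer_unix_to_dos(program) == program)   # False: a CR byte was inserted
print(binary_transfer(program) == program)              # True: exact copy
```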

To use ftp in this way you need to be able to log on to the remote machine. If it is someone else's, they would have to tell you their userid (logon name) and password. A much more secure and commonly used way of sharing files is anonymous ftp. With anonymous ftp, the administrator of a particular remote machine configures the system to accept a generic word as the logon name for ftp, with any string as a valid password. Usually the system will ask users to supply anonymous or ident as the userid, and their email address as the password. An enormous amount of public domain software is available on the Internet and distributed in this way. Lou's pollen programs are also available via anonymous ftp. This is how I got them:

  1. Made a connection to Lou's computer: ftp geology.wisc.edu
    (or ftp 128.104.139.14 if from my PC)
  2. Gave anonymous as the userid, and my email address as password.
  3. The appropriate files are in the directory /pub/inqua, so change to this directory with the command cd pub/inqua.
  4. Because the files are executable, switch the transfer mode to binary using the command binary.
  5. Transfer the files using get: get bta.exe, get pol-ega.exe, etc.
  6. The command quit ends the ftp session.
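The six steps can also be scripted. Below is a sketch using Python's standard ftplib module (a modern stand-in, not anything used in 1992): session_commands lists the underlying FTP protocol commands that the interactive steps send, and fetch_anonymous performs the same session. Both function names are hypothetical, the e-mail address is a placeholder, and the 1992 hosts named in this article may no longer accept connections.

```python
from ftplib import FTP


def session_commands(directory: str, filenames: list[str], email: str) -> list[str]:
    """Protocol commands (RFC 959) that the interactive steps above boil down to."""
    commands = ["USER anonymous", f"PASS {email}", f"CWD {directory}", "TYPE I"]
    commands += [f"RETR {name}" for name in filenames]
    commands.append("QUIT")
    return commands


def fetch_anonymous(host: str, directory: str, filenames: list[str], email: str) -> None:
    """Log on anonymously, change directory, and copy each file in binary mode."""
    with FTP(host) as ftp:                             # step 1: open the connection
        ftp.login(user="anonymous", passwd=email)      # step 2: anonymous logon
        ftp.cwd(directory)                             # step 3: change directory
        for name in filenames:                         # steps 4-5: binary get
            with open(name, "wb") as out:
                ftp.retrbinary(f"RETR {name}", out.write)
    # step 6: leaving the "with" block sends QUIT


for line in session_commands("pub/inqua", ["bta.exe", "pol-ega.exe"], "user@example.edu"):
    print(line)
```

ftplib's retrbinary switches the connection to binary (TYPE I) by itself, so the sketch needs no separate binary call.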
A large amount of public domain software (PC, Mac, UNIX, etc.) can be found on Simtel (wsmr-simtel20.army.mil, 192.88.110.20).

Part 3. Warren Kovach

FTP From the UK--Using FTP on computers directly linked to the Internet is relatively simple. However, when this is not the case, there are alternative ways to do anonymous file transfers. These range from using intermediate gateway machines to having the files sent via normal email. To give an example, I will explain the two alternative ways I use to access FTP (working from a PC and a mainframe running unix), and then discuss a couple of others.

JANET--The British Joint Academic Network is a separate network from the Internet. Some UK universities also have Internet connections, but most do not. To allow all UK users access to the global networks, several gateways between Janet and other networks (such as Internet and Bitnet) have been set up. The University of London provides a "guestftp" service for Janet users. Because this computer is also connected to the Internet, the normal unix ftp commands can be used to perform anonymous FTP.

GuestFTP--Using the Rainbow terminal and file transfer program (designed for use with Janet by Edinburgh University) on my PC linked to an Ethernet, I simply type call uk.ac.nsf.sun and log on as guestftp with the password guestftp. From here on, I can use FTP as if I were directly connected to the Internet. I type ftp to start a session and open xxx.xxx.xxx to connect to some distant machine, then log on as anonymous with my username as the password. I can then change directories and transfer files with get and put.

When I transfer a file from the distant computer, it is placed on the guestftp computer. I must then transfer these files to my own computer. The guestftp computer provides a command that transfers the files to my account on our mainframe here in Aberystwyth. From there, I must transfer it from the mainframe to my PC. With the Rainbow program (which only works with an Ethernet connection) I also can give a command on my PC that will retrieve the files directly from the guestftp machine without having to do an intermediate transfer to the Aber mainframe.

FT-RELAY--This whole process can be cumbersome. The guestftp service is also very popular, and during working hours it is difficult to get a connection. An alternative system, called FT-RELAY, has been set up that allows easier access. Using this, a single (albeit lengthy) command can be typed on the local mainframe that will submit an FTP request. The desired file will then automatically be transferred to my mainframe filestore.

The FT-RELAY is another gateway computer between the Janet and Internet systems. Requests are submitted to it using the host-to-host-copy command of unix (hhcp). They are placed in a queue, and when a connection can be made to the distant computer the file is transferred. It can sometimes take many tries and several hours before the file is moved, but most often it is sent straight away.
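The queue-and-retry behaviour can be modelled in a few lines. This is a hypothetical sketch of the idea, not of FT-RELAY's actual software: each request is tried in turn, a failed one goes back to the end of the queue, and it is abandoned after a chosen number of attempts. A real relay would also wait between attempts; the connection function and the busy host name are made up.

```python
from collections import deque
from typing import Callable


def run_relay_queue(requests: list[str],
                    try_transfer: Callable[[str], bool],
                    max_attempts: int = 5) -> tuple[list[str], list[str]]:
    """Serve queued requests, putting failed ones back at the end of the queue.

    try_transfer(request) stands in for connecting to the distant machine and
    returns True when the file was moved. Returns (completed, abandoned).
    """
    queue = deque((request, 0) for request in requests)
    completed, abandoned = [], []
    while queue:
        request, attempts = queue.popleft()
        if try_transfer(request):
            completed.append(request)
        elif attempts + 1 < max_attempts:
            queue.append((request, attempts + 1))  # retry later, behind the others
        else:
            abandoned.append(request)
    return completed, abandoned


# Demonstration with a made-up host that refuses the first two attempts.
refusals = {"busy.example.edu": 2}


def fake_connect(request: str) -> bool:
    host = request.split("::", 1)[0]
    if refusals.get(host, 0) > 0:
        refusals[host] -= 1
        return False  # connection refused; the request goes back in the queue
    return True


done, gave_up = run_relay_queue(
    ["geology.wisc.edu::bta.exe", "busy.example.edu::data.txt"], fake_connect)
print(done, gave_up)
```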

Let us say I want to get Eric Grimm's program bta.exe from the geology.wisc.edu computer. I log on to the mainframe here in Aberystwyth and type the command:

hhcp -L -b uk.ac.ft-relay:"geology.wisc.edu::pubbta.exe" bta.exe

This will send a message to the ft-relay machine instructing it to connect to geology.wisc.edu, get the file bta.exe from the named directory, and place it in a file named bta.exe on our local mainframe. I will be asked for a remote username (anonymous) and password (my e-mail address). Since this is a binary transfer (specified by the -b), I also will be asked the binary word size, which is 8. The -L causes the computer to maintain a log of host-to-host file transfers.

This mouthful of a command can be simplified. For instance, I can set up an alias for the full name of the ft-relay machine by typing

hhalias uk.ac.ft-relay ftb

This alias will be stored for future use, and I can then replace the full name in the hhcp command with ftb. I also can use the hhstore command to record permanently the remote username and password, so that I will not be asked for them each time I use the hhcp command.

If you regularly transfer files from certain unix computers, you can set up a shell script (similar to a batch file on an MS-DOS PC) that contains the repetitive parts of the above command. If I set up one called getwisc for the geology.wisc.edu machine to retrieve any file from the inqua directory, I could simply type

getwisc bta.exe

If a site had multiple directories of interest, you also could set up the script so you can specify the subdirectory.
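A script like getwisc only has to fill in the variable part of the command. The sketch below builds the argument lists in Python rather than as a unix shell script. It assumes the ftb alias described above, and assumes the remote path is written as the directory followed by /filename after the ::, by analogy with the (D)pub/inqua directory request. Both function names are hypothetical.

```python
import shlex


def getwisc_args(filename: str, directory: str = "pub/inqua", relay: str = "ftb") -> list[str]:
    """Hypothetical equivalent of getwisc: a binary hhcp request for one file.

    Assumes the remote path is written as directory/filename after the '::'.
    """
    remote = f"{relay}:geology.wisc.edu::{directory}/{filename}"
    return ["hhcp", "-L", "-b", remote, filename]


def listwisc_args(directory: str = "pub/inqua", relay: str = "ftb",
                  local: str = "wisc.dir") -> list[str]:
    """Directory request: '(D)' before the path asks for a listing, not a file."""
    return ["hhcp", "-L", f"{relay}:geology.wisc.edu::(D){directory}", local]


print(shlex.join(getwisc_args("bta.exe")))
print(shlex.join(listwisc_args()))
```

shlex.join shows how each list would be typed at a shell prompt. It quotes the (D) argument because the shell treats parentheses specially, so that argument must be quoted when typed by hand. On a system that has hhcp, the same list could be passed to subprocess.run.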

The FT-RELAY does not allow you to browse on the remote computer as you can with the normal ftp command. However, you can request a directory listing from the remote computer with the command:

hhcp -L ftb:"geology.wisc.edu::(D)pub/inqua" wisc.dir

The "(D)" specifies that a list of files in the following directory should be sent and placed in the local file called wisc.dir.

Mail Servers--If your mainframe computer does not have a network connection that allows for ftp file transfers, you can still access the wealth of files and data on these anonymous FTP archives. Mail servers have been set up that get files from FTP archives, convert them into ASCII format using a program like uuencode (see Lou Maher's article in issue 6, July 1991, of this newsletter), and then send them to you through normal email channels. You must then use the uudecode program to reconstruct the original file.
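The encode-and-decode round trip can be sketched with Python's binascii module, swapped in here for the unix uuencode and uudecode programs. The layout follows the usual uuencode convention: a begin line with a file mode and name, data lines of up to 45 bytes each, a zero-length line, and end. A given mail server's output may differ in details.

```python
import binascii


def uuencode(data: bytes, name: str, mode: str = "644") -> str:
    """Encode bytes as uuencoded text that can travel through ordinary email."""
    lines = [f"begin {mode} {name}\n"]
    for start in range(0, len(data), 45):  # each uuencoded line carries 45 bytes
        chunk = binascii.b2a_uu(data[start:start + 45], backtick=True)
        lines.append(chunk.decode("ascii"))
    lines.append("`\nend\n")  # a zero-length line, then the trailer
    return "".join(lines)


def uudecode(text: str) -> bytes:
    """Rebuild the original bytes, ignoring anything outside begin ... end."""
    out = bytearray()
    inside = False
    for line in text.splitlines():
        if line.startswith("begin "):
            inside = True
        elif line == "end":
            break
        elif inside and line:
            out += binascii.a2b_uu(line)
    return bytes(out)


program = bytes(range(256)) * 2  # arbitrary binary test data
encoded = uuencode(program, "bta.exe")
print(encoded.splitlines()[0])      # begin 644 bta.exe
print(uudecode(encoded) == program)  # True
```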

People on Bitnet (Because It's Time Network) can use the mail server at Princeton. Send a message with the single word help to BITFTP@PUCC to get more information on using it. Non-Bitnet sites can use an alternative service set up by Digital Equipment Corporation at ftpmail@decwrl.dec.com. Help on using this service also can be obtained by sending a message containing the word help.
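Composing the one-word request is simple; this Python sketch builds it with the standard email package. The sender address is a placeholder, help_request is a hypothetical name, and no Subject is set because the article only specifies the body.

```python
from email.message import EmailMessage


def help_request(server: str, sender: str) -> EmailMessage:
    """Build the one-word 'help' message that a mail server answers."""
    msg = EmailMessage()
    msg["To"] = server
    msg["From"] = sender
    msg.set_content("help")  # the body is the single word help
    return msg


print(help_request("ftpmail@decwrl.dec.com", "user@example.edu"))
```

Sending it requires a mail relay; with one available, smtplib.SMTP(relay_host).send_message(msg) would deliver it, where relay_host is whatever your site uses.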

Some Useful FTP Sites--Although there are hundreds of FTP sites, some have much larger archives than others. If you are looking for public domain and shareware software, SIMTEL20 (wsmr-simtel20.army.mil) is one of the best known and largest. It is also quite busy and difficult to log on to. The wuarchive.wustl.edu site "mirrors" Simtel20 (that is, it has all the same files and directories) as well as numerous other ftp sites. It also has an extensive collection of mathematical software. This should probably be your first stop if you are in North America. One of the largest archives of Microsoft Windows software is at cica.cica.indiana.edu.

On the European side, the Simtel20 files are mirrored on src.doc.ic.ac.uk. Two sites in Finland (garbo.uwasa.fi and nic.funet.fi) also have large and useful archives of PC software. Although I have not tried it, sol.deakin.edu.au in Australia supposedly mirrors both Simtel20 and Garbo.

There are also sites that carry more specialized software and data. Besides the new geology.wisc.edu /pub/inqua archive, the COGS (Computer Oriented Geological Society) software archive is available on the csn.org site. A large collection of data and programs related to taxonomy is on the Taxacom FTP site (huh.harvard.edu). The molecular biologist in your life will be interested in the Indiana University Biology archive (ftp.bio.indiana.edu). A large archive of statistics software (mainly as Fortran source) is on lib.stat.cmu.edu (note that the username for logon should be statlib, not anonymous).

Archie--There are thousands of files available out there through anonymous FTP, but finding the one you want can be difficult. McGill University has developed a system that allows you to search a database of FTP archives from all over the world, looking for files with specific names. This system, named Archie, is now installed on several computers around the world. It can be accessed by logging on directly through telnet, using the logon name archie. Typing help after logging on will tell you how to use the system. It also can be used through e-mail. Send a message with the single word help to the username archie at your nearest archie server. If you are in North America, the main archie machine is archie.mcgill.ca. In Britain, it is archie.doc.ic.ac.uk, while on the European Continent it is archie.funet.fi. Antipodean FTPers can use archie.au in Australia.
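At heart, Archie is an index from file names to the sites that hold them. The sketch below is a toy version, not Archie's own software: it searches a small dictionary with shell-style wildcards (the real service offered its own matching options). The index lists only files that this newsletter says are on each site, and archie_search is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Illustrative index built only from files this newsletter places on each site.
ARCHIVE_INDEX = {
    "geology.wisc.edu": ["/pub/inqua/bta.exe", "/pub/inqua/pol-ega.exe",
                         "/pub/inqua/pol-vga.exe", "/pub/inqua/pol-herc.exe",
                         "/pub/inqua/polcntpk.exe"],
    "ftp.cs.widener.edu": ["/pub/zen/zen-1.0.PS", "/pub/zen/zen-1.0.dvi",
                           "/pub/zen/zen-1.0.tar.Z"],
}


def archie_search(pattern: str, index: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (site, path) pairs whose file name matches a wildcard pattern."""
    hits = []
    for site, paths in sorted(index.items()):
        for path in paths:
            name = path.rsplit("/", 1)[-1]  # match on the file name only
            if fnmatchcase(name.lower(), pattern.lower()):  # ignore case
                hits.append((site, path))
    return hits


for site, path in archie_search("pol-*.exe", ARCHIVE_INDEX):
    print(site, path)
```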

Part 4. Louis Maher

Telnet is the main Internet protocol for creating a connection with a remote machine. FTP uses Telnet, and like FTP, the actual command for making a Telnet connection varies with the system. I have a Telnet directory on my PC which contains a batch of programs (including one called telbin and another called ftpbin) for connecting my PC to the department's networked unix minicomputers by Ethernet. I invested in the Ethernet board for my PC because it was cheaper than buying a bigger hard drive. I now keep bulk storage items on a digital tape unit that looks like a subdirectory of ice, one of the networked Sun computers that comprise geology.wisc.edu.

From the PC I can type telnet geology.wisc.edu, log on with my name and password, and then read, write, and do all the usual things. From the DOS prompt, I can type ftp geology.wisc.edu and log on either with my username and password, or as anonymous with my email address. Depending on which option I take, I get to different places in the computer: using my name I get to my own directory; using anonymous I reach the public storage area. But either way, I can only copy files, not read or edit them; it is, after all, a File Transfer Protocol. I like my present setup because I can move big files quickly. I can zip up my whole PARADOX directory structure and ftp it to storage on tape in less than a minute.

If I log on to the department unix computer from home using PROCOMM and a modem, I can type ftp some.remote.place and log on as anonymous. I can transfer a copy of the distant file to the department's computer in a fraction of a second; if it is ascii text, I can (slowly) page through it with an editor over the modem. But if I want to send a copy to my home computer by modem, it may take an hour. When obtaining large files from distant points, it is often best to move the files from the source to a temporary file on your host machine. You can then visit the host, that is, "go to work," and take the file to your computer on a floppy disk rather than tying up your modem and phone line for several hours on the last and slowest link.
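The hour quoted above follows from simple arithmetic. This Python sketch assumes a hypothetical 500,000-byte file and the common 8-N-1 serial framing (a start bit, 8 data bits and a stop bit, so 10 bits on the line per byte), and ignores protocol overhead, errors and compression; the modem speeds are examples.

```python
def transfer_seconds(n_bytes: int, bits_per_second: int, bits_per_byte: int = 10) -> float:
    """Idealised time to push n_bytes through a serial modem link.

    bits_per_byte=10 assumes 8-N-1 framing; overhead and retries are ignored.
    """
    return n_bytes * bits_per_byte / bits_per_second


size = 500_000  # a hypothetical half-megabyte file
for speed in (1200, 2400, 9600):
    print(f"{speed:>5} bit/s: {transfer_seconds(size, speed) / 60:5.1f} minutes")
```

At 1200 bit/s the file needs about 4,167 seconds, roughly 69 minutes, which matches the hour quoted above; at 2400 bit/s it still takes about 35 minutes.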

Your setup will undoubtedly differ and you will have to talk to someone in your local computer system about the options open to you.

Our department's computer guru has been giving a book to our new graduate students which you may find rewarding, both in the reading and in the getting. He obtained the book as a PostScript file via the Internet. It is called Zen and the Art of the Internet; A Beginner's Guide to the Internet, by B. P. Kehoe (1992). It is a good read, and its hundred pages of text can give you much more information about the Internet than we can do in a newsletter. The work is copyrighted, but it can be reproduced freely so long as it is kept whole. It contains a lot of information about networking, ftp, telnet, email, and it has appendices, a glossary, addresses of archives, and much more.

Brendan P. Kehoe's address is brendan@cs.widener.edu. I sent a note asking how an international audience might get a copy of Zen. I will reproduce his answer because it can help nearly any reader. Although he mentions a new hard-cover book will be sold in bookstores soon, I suggest you make the effort to get a free copy of the first edition while at the same time practicing on the Internet.

Date: Mon, 1 Jun 1992 10:27:15 -0400
From: Brendan Kehoe <brendan@cs.widener.edu>
Subject: Re: Zen...

Here are the instructions I usually send out.

If you are on BITNET, you will want to send mail to bitftp@pucc with the line
  help
in the body of your message, to get instructions on FTPing it through mail.

If you are not on BITNET, but are limited to email, write to ftpmail@decwrl.dec.com with a similar body (i.e. the word help), to receive instructions.

In case you are interested, in the next few weeks the second edition of Zen will be coming out as a Prentice-Hall book. You may want to consider looking for it in your local bookstores then, since it will be in a nicely bound format and contains approx. 30 pages of new information, as well as hundreds of updates and revisions. College bookstores will hopefully be stocking it, since it will be used in introductory Internet classes, and as a tutorial. If you would like to be notified upon its availability as a book, please contact me.

If you do not have access to FTP or do not feel comfortable with bitftp and ftpmail, write to info-server@nnsc.nsf.net with the body of your message containing

  request: nsfnet
  topic: zen-1.0.PS
  topic: zen.readme

and it will return the selected files.

For FTP, here is what you need to do: first, type
  ftp ftp.cs.widener.edu
and when it gives you the 'Name:' prompt, type
  anonymous

If the name ftp.cs.widener.edu failed, try 147.31.254.132 instead. Then, when you see the prompt Password: just give it your email address; it does not really matter what you type here...using your address is just the tradition. Anyway, you will get in and be left sitting at the prompt. Type
  cd pub/zen
and do dir; you will see the files listed there. If you want the PostScript version, type
  get zen-1.0.PS

If you are on a system that only allows one period in a filename, use
  get zen-1-0.PS
for example. If you want the .DVI (a TeX dvi file) file, type
  binary
  get zen-1.0.dvi
(binary so it does not monkey with anything in the file). The .tar.Z file has the TeX source to the book, as well as the two other files (PS and dvi); to get that, type
  binary
  get zen-1.0.tar.Z
instead. Once the file is finished transferring, type
  quit
to get out. Good luck!

Brendan
---
Brendan Kehoe, Sun Network Manager
brendan@cs.widener.edu
Widener University, Chester, PA
---
I got the PostScript version from Kehoe using:

ftp ftp.cs.widener.edu
anonymous
maher@geology.wisc.edu

[An opening note said that if the display screen chokes, put '-' in front of the password. As my screen was choking, I quit and logged on again with -maher@geology.wisc.edu, and then ls, cd, etc. worked correctly.]

cd pub/zen
get zen-1.0.PS
  transmitted in 107 seconds
quit

I sent the 499,365-byte file to our PostScript printer and the Times Roman typeset-quality reproduction was finished in less than ten minutes.

Later, I tried his non-ftp route by emailing info-server@nnsc.nsf.net with the message containing

  request: nsfnet
  topic: zen-1.0.PS
  topic: zen.readme

Within half an hour I had 13 incoming email letters numbered: 1 of 13, 4 of 13, 2 of 13, etc. I kept the lot by saving the first as "zen" and then appending (in correct numerical order) the other 12 files to it. Afterward, "zen" was read into the unix editor vi, and the 13 email headers were deleted. (The 12-line headers all start with "From nnsc.."; you can find them by searching for that string.) The final file, of course, was just the same as that delivered by ftp.
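The cut-and-paste job in vi can also be automated. This sketch assumes each message's header ends at the first blank line and that the header carries the part number in the form "n of N" (the article does not say exactly where the number appears). It orders the parts by number, drops the headers, and refuses to proceed if a part is missing. The function name reassemble is hypothetical.

```python
import re


def reassemble(messages: list[str]) -> str:
    """Join multi-part mail-server output ('n of N' parts) into one file."""
    parts = {}
    total = None
    for message in messages:
        header, _, body = message.partition("\n\n")  # strip the header block
        match = re.search(r"(\d+) of (\d+)", header)
        if match is None:
            raise ValueError("no 'n of N' part number in header")
        number, count = int(match.group(1)), int(match.group(2))
        if total is not None and count != total:
            raise ValueError("parts disagree about the total")
        total = count
        parts[number] = body
    if total is None:
        raise ValueError("no messages to reassemble")
    missing = sorted(set(range(1, total + 1)) - set(parts))
    if missing:
        raise ValueError(f"missing parts: {missing}")
    return "".join(parts[n] for n in range(1, total + 1))


# Parts arrive out of order, as in the article.
inbox = ["From nnsc\nSubject: zen 2 of 3\n\nBBB",
         "From nnsc\nSubject: zen 1 of 3\n\nAAA",
         "From nnsc\nSubject: zen 3 of 3\n\nCCC"]
print(reassemble(inbox))  # AAABBBCCC
```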

Based on an old network established by the military, the present Internet comes as close as any example I know of forging swords into plowshares. It is growing at a phenomenal rate. Use it!

References

Kehoe, Brendan P. 1992. Zen and the art of the Internet; A beginner's guide to the Internet, 1st ed. January 1992, 96 p. + i-iv.

Kovach, Warren L. 1990. MVSP: A multivariate statistical package. INQUA - Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 4:1-3.

Ralska-Jasiewiczowa, Magdalena, and Walanus, Adam. 1991. Polish palynological database (POLPAL) in course of building. INQUA - Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 5:1-2.


Copyright © 1992 Steve Juggins, Warren Kovach and Louis Maher
WWW pages by K.D. Bennett