So You Want To FTP/Download, Huh?
Subtitled: ASCII vs. Binary Transfers
by Joe Burns Ph.D.
FTP stands for File Transfer Protocol. It's the standard method for moving
a file from your storage space to your server so others can look at it.
This is one of the basics of publishing HTML, and most people know how to
do it, but are confused about why files are sometimes corrupted in
transfer. I put this together to try to explain some of the statements I
make.
How FTP Works
FTP is actually very basic. There are about a million different FTP
programs you can get off the Internet as shareware or purchase. Heck, even
Netscape and Explorer will let you take files from a server, which is
called "downloading." Placing a file from your storage space onto the
server is called "uploading."
My guess is that you have your own FTP program already, but if not, you can
download one. They come in freeware and shareware. If you want free, try
FTP Explorer (http://www.ftpx.com/) or the free (limited) version of WSFTP
(http://www.ipswitch.com/cgi/download_eval.pl?product=WL-1000).
If you don't mind spending a couple of bucks after a 30-day trial, click on
one of these. I've tried them all and can vouch that they are all good
programs. Try either WSFTP full version
(http://www.ipswitch.com/Products/WS_FTP/index.html), CuteFTP
(http://www.cuteftp.com/products/cuteftp/) or my personal favorite, FTP2000
(http://www.sharewarejunkies.com/8ef3/ftp2000.htm).
Below, in Table form, is the general interface for a basic FTP program.
Yours is something like this. By the way, "interface" is the thing between
you and the computer, mainly the graphics. In fact, an interface is
anything that acts as a go-between for two items. You and your computer are
as good as any two.
             ASCII   BINARY
Files     <--COPY-->      Files
On        <--VIEW-->      On
Your      <--DELETE-->    Your
Hard      <--RENAME-->    Server
Drive
Let Me Explain It...
-- The bold "ASCII" and "BINARY" at the top are buttons that change the
transfer type.
-- The center column of four buttons allows you to click either side of
the command to transfer the file from or to the server. See the arrows?
ASCII vs. Binary
This is the main reason for this tutorial. I get letters all the time
asking why images, or Applets, or JavaScripts, don't work. My answer is
usually that the person corrupted it in the FTP. That usually just confuses
matters further. Here's a more in-depth explanation:
ASCII -- Sometimes called "TEXT" or "TEXT DOS".
ASCII stands for American Standard Code for Information Interchange. It is
text, plain and simple. But it is text that is standardized so all
computers everywhere understand it. Look at your keyboard. See all those
things, those letters and characters? ASCII assigns a code to each of them,
128 codes in all. (Upper and lower case count as two, and a few dozen of
the codes are reserved for invisible control characters like tab and line
feed.)
Now it gets loopy --
Computers deal with numbers. Period. Yes, you see little letters, but the
computer doesn't. It sees numbers -- ones and zeros to be exact. Each one
or zero is called a "bit." That's short for "binary digit." Each ASCII
character is a combination of seven ones and zeros. (Some computers now use
an "Extended ASCII" that uses eight.) An extra digit is often added as a
check to see if the other seven are correct. It's called a "parity digit"
or "check bit." It is set so that the number of ones in the group always
comes out even (or always odd), and if the count is off when the group
arrives, something went wrong. Here's what some ASCII code looks like,
written as eight digits with a leading zero:
Symbol   ASCII Code      Symbol   ASCII Code
A        01000001        a        01100001
!        00100001        $        00100100
Z        01011010        z        01111010
...etc, etc, etc up to 128
Notice that there are only two numbers involved, one and zero. This is
what's known as "binary", two items. THEN WHAT'S THE DIFFERENCE BETWEEN THE
TWO?!?! -- you ask. I told you this gets loopy. ASCII code is code for text
alone. Those 128 groupings of ones and zeros represent text, period.
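If you'd like to check those codes yourself, here is a small Python sketch.
It is only an illustration: the function names are made up for this
example, and the parity function assumes "even" parity, which is just one
of the schemes in use.

```python
def ascii_code(symbol):
    """Return the ASCII code of one character as eight binary digits."""
    number = ord(symbol)
    if number > 127:
        raise ValueError("not one of the 128 ASCII characters")
    return format(number, "08b")


def with_even_parity(symbol):
    """Return the seven ASCII bits with an even-parity check bit in front.

    Assumption for this sketch: the check bit is chosen so the total
    number of ones in the eight digits is even.
    """
    seven_bits = ascii_code(symbol)[1:]      # drop the leading zero
    check_bit = seven_bits.count("1") % 2    # 1 only if the count is odd
    return str(check_bit) + seven_bits


# Print the codes for the symbols in the table above.
for symbol in "Aa!$Zz":
    print(symbol, ascii_code(symbol))
```

The loop prints one line per symbol, so you can compare the results against
the table.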
In terms of FTP: If you are FTPing something that only has text, like an
HTML document, use the ASCII mode of your application. More on why in a
moment.
Binary -- Sometimes called "Raw Data" or "All Files"
Binary is best explained in comparison to ASCII. A binary file is also made
of ones and zeros, in groups of eight, but those groups are not treated as
characters.
Let's say you are FTPing an applet. Open it in a text editor and, yes, it
looks like text, but with one major exception...all characters are not
equal. The same goes for a GIF, or a JPEG, or any image. Remember, the
computer sees numbers only, and the characters are just the editor's best
representation of those numbers. The computer doesn't need you to be able
to read the file for it to work. The text is just there so you can get a
representation of what is inside.
Where a binary transfer differs from ASCII is in how it treats the data. An
ASCII transfer is allowed to adjust the file along the way, most commonly
by rewriting the line-break characters to suit the receiving computer. A
binary transfer copies the file exactly, one and zero for one and zero. An
Applet needs to arrive exactly the same size, with everything in exactly
the same order, as when it left. If it does not - it's corrupted and won't
work.
An Example:
When you create an HTML Document - you may have noticed that adding a ton
of spaces between words did not translate into a ton of spaces in the
browser window. In addition, where you hit "enter" to jump to the next line
didn't mean diddly when you posted it. The line broke where it wanted
(unless you put in a <BR> command).
The reason is that an HTML document is plain text, and the browser pays no
attention to how that text is laid out. Where you hit return doesn't
matter. Your margin settings weren't saved - only the text. This is why you
have to put in flags to make the text do what you want. When you transfer
the file over as ASCII, the line breaks may get rewritten to suit the
server, but that does no harm, because only the text is required. Its form
is immaterial. You could write your HTML Document as one really, really
long line. The computer doesn't care. It lays out the text based on the
flags, not on the form it was sent in. How pretty you make your HTML
document doesn't matter.
Now, imagine you just finished writing and compiling an Applet. Yes, it
looks like text in an editor, but it is more than just a bunch of words.
Every character stands for a number in a strict format. Some of those
numbers are commands for the computer, and some are text that will appear
on the screen. Still others just happen to match the codes for a jump to
the next line. That format must be retained exactly. If you send the Applet
as ASCII, the transfer treats it as plain text and feels free to rewrite
anything that looks like a line break. The numbers have changed and the
commands have lost their meaning. It is corrupted. It will not work.
If you send (or download) an image as "TEXT" - same thing. The ASCII
transfer treats the image data as text and rewrites whatever looks like a
line break. It is no longer an image but rather a string of damaged data -
it is corrupted and it won't work.
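To see what that damage looks like, here is a rough Python sketch. It
assumes the ASCII transfer rewrites every line break as a carriage return
plus a line feed, which is one common behavior but by no means the only
one. The function names and the "image" bytes are made up for
illustration, and no real FTP connection is involved.

```python
def simulate_ascii_transfer(data):
    """Rewrite every line break as CR+LF.

    Assumption: this mimics an ASCII-mode transfer to a machine that
    ends its lines with CR+LF. Real programs and servers vary.
    """
    normalized = data.replace(b"\r\n", b"\n")
    return normalized.replace(b"\n", b"\r\n")


def simulate_binary_transfer(data):
    """Copy the bytes exactly as they are."""
    return bytes(data)


html = b"<HTML>\n<BODY>Hello</BODY>\n</HTML>\n"
# Made-up bytes standing in for an image. Two of them happen to match
# the line feed code (0x0a). This is not a real GIF.
image = b"GIF89a\x0a\x00\x0d\x0a\xff"

sent_html = simulate_ascii_transfer(html)
sent_image = simulate_ascii_transfer(image)

# The HTML still holds the same words, so a browser would not care.
print(sent_html.split() == html.split())
# The "image" has changed size, so it no longer matches the original.
print(len(image), len(sent_image))
# A binary transfer leaves it untouched.
print(simulate_binary_transfer(image) == image)
```

The HTML comes through with different line breaks but the same text. The
fake image comes through a different size, and that is all it takes to
break it.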
Rule of Thumb
This goes for both FTP and downloading!
-- Use ASCII only for transferring HTML Documents.
-- Everything else - goes Binary (or raw data or all files depending on
your program)
-- Additional rule - if your HTML Document contains a JavaScript that you
know is correct, but doesn't work when you post it, send the HTML
Document as Binary. The text might need some minor adjusting, but will
probably be fine. The Script is the concern here.
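If you script your uploads instead of clicking buttons, the first two rules
might look something like the Python sketch below. It uses Python's
standard ftplib module. The list of HTML extensions and the helper names
are my own assumptions for illustration, and the host name and login in the
usage comment are placeholders.

```python
import os
from ftplib import FTP

# Assumption: a file counts as an HTML Document if it ends in one of these.
HTML_EXTENSIONS = {".html", ".htm"}


def transfer_mode(filename):
    """Apply the rule of thumb: ASCII for HTML, binary for everything else."""
    extension = os.path.splitext(filename)[1].lower()
    return "ascii" if extension in HTML_EXTENSIONS else "binary"


def upload(ftp, local_path):
    """Upload one file using the mode the rule of thumb picks."""
    remote_name = os.path.basename(local_path)
    # ftplib wants the file opened in binary mode for both kinds of transfer.
    with open(local_path, "rb") as handle:
        if transfer_mode(local_path) == "ascii":
            ftp.storlines("STOR " + remote_name, handle)    # ASCII transfer
        else:
            ftp.storbinary("STOR " + remote_name, handle)   # binary transfer


# Usage (placeholder host and login, not a real server):
# ftp = FTP("ftp.example.com")
# ftp.login("username", "password")
# upload(ftp, "index.html")   # goes as ASCII
# upload(ftp, "logo.gif")     # goes as binary
# ftp.quit()
```

For the additional rule, you would skip the check and send the HTML
Document with storbinary directly.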
Why Not Send and Save Everything as Binary?
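You can. It's just that sending an HTML document as Binary tends to mess it
up a bit. More than just text is being sent - form is now involved. The
line breaks arrive exactly as your computer wrote them, which may not be
what the server expects, and that may alter what you want. Make a point of
using both modes: ASCII for HTML and binary for everything else.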
Finally!
The things I said above go for both FTP and downloading!!! Text is text.
Applets, images, CGIs and Scripts are quite different. If you download or
transfer something and it fails to work - the smart money says you
corrupted it in one or more of your transfers.
Thanks for reading...now go FTP something.
About the Author:
Joe Burns can be reached at http://www.joeburnsphd.com/