Revised: March 24, 1995
Release 02

---------------------------------------------
TEACHING AND PUBLISHING IN THE WORLD WIDE WEB

by Harry M. Kriz
University Libraries
Virginia Polytechnic Institute & State University
Blacksburg, VA 24061-0434
hmkriz@vt.edu
http://learning.lib.vt.edu/authors/hmkriz.html

Teach people to surf the Internet and they can tour the world.
Teach people to serve on the Internet and they can touch the world.
--H. M. Kriz (1994)

--------
ABSTRACT

The World Wide Web (WWW) is emerging as an elegant and usable method for distributing information via computer networks. WWW uses interlinked hypertext documents to find and display multimedia information, including text, color graphics, video, and audio. The Web can be used to distribute information within a company or university, to a small group of students in a particular class, or to the entire world. The means for doing this are now available to anyone with a desktop computer connected to a network. Software is freely available on the Internet, as are WWW documentation and tutorials on creating hypertext multimedia documents. This paper is a brief introduction to the knowledge needed by professionals in any field who would like to extend their reach by distributing information through computer networks.

-------------------
PUBLICATION HISTORY

The most recent plain text (ASCII) version of this paper is always available by anonymous FTP from nebula.lib.vt.edu in directory /pub/www under the name websrv**.asc. For example, using the syntax of the Uniform Resource Locator (URL), this version is available as ftp://nebula.lib.vt.edu/pub/www/websrv02.asc.

The hypertext version of this paper is maintained on a more regular basis.
It is available through the World Wide Web at:
http://learning.lib.vt.edu/webserv/webserv.html

The first version of this paper was written at the editor's request and produced in Adobe's Portable Document Format (PDF) for inclusion on the "25th-Anniversary CACHE CD-ROM," CACHE Corporation, November 13, 1994, Professor Peter Rony, Editor. ASCII and hypertext versions were released on October 14, 1994.

For this ASCII release, all of the text was reviewed and revised as necessary. Two sections have been added. The section on "Who Will Access Your Server?" presents some data about the success of my own desktop server, which is now being accessed more than 20,000 times per month by readers of my articles. The section on "What Role for Publishers?" reviews some ideas I find especially intriguing about academic publishing in the networked world.

This ASCII file was prepared using the 2nd beta release of Internet Assistant for Word for Windows. The twelve separate HTML files of the hypertext version were captured via the Web and pasted to a single file. Redefining Word's styles handled most of the reformatting necessary to produce a readable ASCII file. As similar tools develop, it will become increasingly easy to reformat documents for the differing requirements of a variety of distribution media.

--------
CONTENTS

This paper is divided into the following parts:

    The World Wide Web
    Reasons to Operate a WWW Server
    Who Will Access Your Server? (New in this release)
    What Role for Publishers? (New in this release)
    Overview of Server Operation
    Obtaining Server Software
    Constructing WWW Documents
    HTML Documentation
    Some Limitations of WWW Documents
    Other Sources of Information
    Disclaimers and Warranties

------------------
THE WORLD WIDE WEB

The World Wide Web (WWW) is a system that enables users to find and retrieve information by navigating through linked hypertext documents.
In a hypertext document, selecting a highlighted word, phrase, or image causes a new document or image or sound to be retrieved and displayed. WWW documents lead the user to skip from one document to another, retrieving information from servers scattered around the world.

Viewing a WWW document with a graphical client such as Cello, MacWeb, Mosaic, Netscape, or WinWeb, is much like reading a magazine. Textual information is displayed with typographic fonts. Color graphics can be supplemented by sound that can be played by clicking an icon embedded in the document. Web documents can be interactive in that they can respond to a user who clicks a button on the document to submit information to the Web server or to send e-mail to a predetermined address.

The World Wide Web is an application of client/server computing. Client software, known as a Web browser, sends a request for information from the user's computer to a server running on another computer. The server processes the request and returns a file to the browser. The returned file will normally be a hypertext document, which is often referred to as a WWW page. Other kinds of files also may be returned to the user. If the returned file is a hypertext document, the browser software interprets the returned file and formats it for the user.

During interpretation of a hypertext document, the browser may be instructed by links in the document to retrieve additional files. These additional files may be graphic images that are decoded by the browser and displayed inline with the text. Such links cause additional requests to be sent to the server to obtain the additional files. Thus, a single document request may result in numerous contacts between the browser and the server.

Some selections in hypertext documents may return files that are not themselves hypertext documents. Thus, it is possible that the file returned to the browser will be a Gopher menu from a Gopher server.
In this case, the Web browser formats the Gopher menu for the user. The browser then substitutes for a Gopher client. Similarly, a file returned to the browser may be a directory from an FTP server. In this case the browser substitutes for an FTP client.

The file returned to the browser also could be an ASCII file or binary file from an FTP server. An ASCII file will usually be displayed by the browser as plain text with fixed line lengths. A binary file either will be saved to the user's local hard disk, or it will be passed to another application on the user's computer for further processing. Videos and sounds, and some images, typically are passed to "helper" applications or "viewers" for decoding and display.

Windows client and server software is based on the Winsock Applications Programming Interface (API). For a discussion of Windows client software and the Winsock standard, see my paper "Windows and TCP/IP for Internet Access" (Hypertext version at http://learning.lib.vt.edu/wintcpip/wintcpip.html or ASCII version at ftp://nebula.lib.vt.edu/pub/winsock/wtcpip**.asc. An earlier print version was published in the American Institute of Chemical Engineers "CAST Communications," Vol. 17, No. 2, pp. 6-14 (Summer 1994)).

-------------------------------
REASONS TO OPERATE A WWW SERVER

In the old days (early 1993), people could impress their neighbors by mentioning that they had an e-mail address on the Internet. By late 1994, e-mail addresses were passe. Now one needs a personal home page on the Web in order to be considered current. Your personal page should include a color photograph of yourself engaged in some hobby, preferably unrelated to computing or your profession. If a sound recording or video is linked to the page, so much the better.

Thus, one reason for operating a WWW server is to have fun while maintaining your reputation as one who is conversant with the technology of networked information.

As the fun proceeds, you will discover that there are good professional reasons for placing information on your own Web server. These reasons arise out of the original goals of those who created the Web.

The World Wide Web began at CERN (Conseil Europeen pour la Recherche Nucleaire, or European Council for Nuclear Research, Geneva, Switzerland) in 1991 as a means to distribute information within the High Energy Physics community. People from many countries working in small teams needed a method to share their own information and to access information provided by others. Tim Berners-Lee, inventor of WWW, describes the Web as a "collaborative knowledge-sharing tool for a community." (Internet World, October 1994, p. 78).

Teachers, researchers, publishers, politicians, and shopkeepers are among the vast array of people who need to share knowledge and information among various communities. Thus, just about anyone might have a reason to operate a server. Indeed, anyone who spends time browsing the Web soon comes to the conclusion that just about everyone on the Internet now has a personal home page.

Teachers especially have reasons to investigate using the Web if for no other reason than to gain hypertext experience. A recent article in the "Chronicle of Higher Education" (Vol. XLI, No. 5, pp. A25-A27, A30, September 28, 1994) shows that some believe that hypertext could revolutionize teaching. Teachers can use WWW to publish syllabi and distribute class handouts. Handouts on the Web can be updated instantaneously and are always available to students in their dorm rooms, apartments, or homes.

The Web is not limited to supporting teaching functions. Researchers can share information among widely scattered colleagues. Companies can use WWW to distribute internal information and announcements within their organizations, which may have workers scattered around the world. Merchants can use the Web to advertise their wares.
For example, people anywhere in the world can use the Web to order flowers from Wade's Flower Shop in the small town of Blacksburg, VA (http://oscar.biznet.com.blacksburg.va.us/~wades/flower.html).

----------------------------
WHO WILL ACCESS YOUR SERVER?

Who will access your server will depend, of course, on the content you serve. Anyone wishing to teach or publish always has a target audience in mind, and serving documents on the Web is no different from publishing on paper in that respect.

At first, my target audience was local users within my workplace, for whom I created the "Always Learning" page (http://learning.lib.vt.edu/) as an entry point for internal training and teaching outreach. However, I had a wider audience in mind when I created a hypertext version of my paper "Windows and TCP/IP for Internet Access" (http://learning.lib.vt.edu/wintcpip/wintcpip.html). As an ASCII document first issued in November 1993, the paper had already reached a world-wide audience via news groups and anonymous FTP. I had received e-mail from individuals on all seven continents attesting to that fact. Now I would have the chance to see how the Web would facilitate distribution of that paper.

The log file created by the server software can be read by a statistics program to produce a statistical summary of the files served from my desktop. The statistical summary can even be made available to users of the server (http://learning.lib.vt.edu/webstat.html | about 10-15KB).

I wrote the hypertext version of this article about Web serving in October 1994, at which time I posted it on my desktop Web server. I then announced its availability through a few Internet news groups and a couple of listservs. Almost within minutes of posting my announcements, the paper was being accessed. During the next six weeks, the article was accessed 2,300 times. Word about the article appears to have spread through the Web, as it continues to attract more than 600 readers each month.
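
The kind of summary a statistics program produces from a server log can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used to produce the figures in this section: it assumes the log follows the Common Log Format used by many HTTP servers, the function name summarize is hypothetical, and the sample entries in the usage note are invented.

```python
import re
from collections import Counter

# Assumed Common Log Format:
#   host ident user [date] "METHOD path PROTOCOL" status bytes
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)$'
)


def summarize(lines):
    """Count successfully served files per path and per requesting host."""
    files = Counter()
    hosts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        host, _date, _method, path, status, _size = match.groups()
        if status == "200":  # count only requests the server fulfilled
            files[path] += 1
            hosts[host] += 1
    return files, hosts
```

Given a few invented entries, such as two hosts each requesting /wintcpip/wintcpip.html and one of them also fetching an inline image, sum(files.values()) gives the number of files served while len(hosts) gives the number of distinct requesting hosts. The first figure is normally the larger one, because a single reader retrieves several files.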

In February 1995, my Web server distributed more than 16,000 files to readers in 50 countries. Virtually all the traffic (about 98%) was generated by readers of one or both of my two articles. The article on Windows and TCP/IP is the most popular title. This is expected since it covers introductory material useful to anyone using Windows to access the Internet. The abstract page of that article was accessed 4,350 times in February. (Note that the total number of files distributed is much greater than the sum of readers of the articles. This is because each article consists of several files. It is also obvious, and expected, that many readers look at only a few of the files that constitute a complete article, and some never read past the Abstract.)

As word of my papers has spread, activity on my server has taken on a life of its own. In March, the number of connections to my server should exceed 20,000. Between 500 and 1,000 files per day are being served to requesters.

Simply put, I have been stunned by the enormous power of the Web to reach a world-wide audience. I continue to be amazed that my papers are reaching thousands of readers in several dozen countries. Yet I am using hardware similar to that found in the bedrooms of many school children. The software is available free or for a nominal fee. I have bypassed all traditional publishing channels to reach my audience.

-------------------------
WHAT ROLE FOR PUBLISHERS?

That I can reach an international audience from my desktop, without the aid of publishers or distribution channels, can be seen as a testimony to the democratizing nature of personal computers and networks. In the world of information dissemination, power is flowing to individuals. In the past, in order to reach a wide audience, an author had to be published by a large publishing house with the resources to distribute the publication.
As my own publisher, I have been able to reach a world-wide audience without any assistance from traditional publishing channels. This ability of the Web to empower an individual is far more fascinating to me than all the commercial Web applications being discussed in the news media.

A question arises as to the function of publishers and refereed publications in the new open environment where anyone can reach a world-wide audience from a personal desktop computer. In the teleconference "Networked Information and the Scholar" (October 28, 1994), Dr. James J. O'Donnell (http://ccat.sas.upenn.edu/jod/jod.html) made some interesting comments about the changing role of traditional publishing, especially academic publishing. Professor O'Donnell is coordinator of the Center for Computer Analysis of Texts in the Department of Classical Studies at the University of Pennsylvania.

Professor O'Donnell argued that peer review of manuscripts by publishers has the purpose of rationing access to rare print resources. Recall that those reviewing the papers prior to publication are in fact in competition with the authors being reviewed for rare print space and for grant money. In the networked world, there is no necessity to ration access to the audience. Peer review becomes censorship in this environment.

Professor O'Donnell used the evocative and provocative phrase that in the networked environment "what was indispensable will become indefensible." Control of access to the publication medium, once an indispensable part of the publication process, will become indefensible when anyone can distribute publications through the network.

----------------------------
OVERVIEW OF SERVER OPERATION

Until recently, server operation was the domain of UNIX wizards. Certainly industrial strength servers for institutions and business will continue to depend on UNIX, and increasingly on Windows NT.
However, it is now possible for any knowledgeable end-user of MS Windows or the Mac platform to operate a server and accomplish useful work. The operator of a Web server can now concentrate on the information content of documents rather than on the technology of server operation. Thus, serving documents on the Web has become just another end-user computing tool to be used by professionals in many fields.

Transfer of messages on the World Wide Web is based on the HyperText Transfer Protocol (HTTP). A server that serves hypertext documents is referred to as an HTTP server or daemon. Server software frequently has the designation HTTPD somewhere in its name.

Installation of HTTP server software is straightforward. The program and support files are copied to an appropriate directory (Windows) or folder (Mac) hierarchy on the hard disk. A few text entries may have to be made in an initialization file to customize defaults. Fortunately for those with little patience and a strong need for instant gratification, servers for both Windows and the Mac are supplied with sample hypertext documents that can be used for initial testing.

Ultimately, of course, custom hypertext documents will be created for distribution to anyone accessing the server. Once these documents are created, the Webmaster can advertise the availability of the server to its intended audience. With an Internet connection the potential audience is the entire world.

Documents available to the HTTP server are stored in hierarchical subdirectories or in nested folders within a default directory or folder. When the server is accessed by a browser, the server can serve either a default document or a document that is specifically requested. Other files can be delivered depending on the hypertext links in the first file served. For the beginning Webmaster, the most difficult task may be assigning suitable directory or folder names that will facilitate future expansion of the file hierarchy.
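
The way a server chooses between a default document and a specifically requested one can be illustrated with a short Python sketch. It is not the code of Windows httpd or MacHTTP. The function name resolve_document and the default document name index.htm are assumptions made for illustration; a real server takes these settings from its own initialization file.

```python
import os

# Assumed default document name; real servers make this configurable.
DEFAULT_DOCUMENT = "index.htm"


def resolve_document(doc_root, url_path, default_document=DEFAULT_DOCUMENT):
    """Map the path part of a URL to a file under doc_root, or return None."""
    root = os.path.realpath(doc_root)
    # Drop the leading "/" so the URL path is joined beneath the document root.
    candidate = os.path.realpath(os.path.join(root, url_path.lstrip("/")))
    # Refuse any path that escapes the document root, such as "/../secret".
    if candidate != root and not candidate.startswith(root + os.sep):
        return None
    # A request for a directory is answered with its default document.
    if os.path.isdir(candidate):
        candidate = os.path.join(candidate, default_document)
    # Only existing files can be served.
    return candidate if os.path.isfile(candidate) else None
```

With a document directory that contains index.htm and webserv/webserv.html, a request for "/" resolves to the default document and a request for "/webserv/webserv.html" resolves to that file. Because the URL path mirrors the directory hierarchy, renaming a directory later changes every URL beneath it, which is why the choice of directory names deserves care at the outset.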

-------------------------
OBTAINING SERVER SOFTWARE

*******
Windows

Windows httpd is an HTTP server for Microsoft Windows. The package was written by Robert B. Denny. Details about the package, and an FTP link for downloading, are available on the Windows httpd home page (http://www.alisa.com/win-httpd/). Alternatively, the file whttpd14.zip (ftp://ftp.alisa.com/pub/win-httpd/whttpd14.zip) (Version 1.4a released January 27, 1995 | ZIP file dated March 16, 1995 | 664,684 bytes) can be obtained directly by anonymous FTP. Note that the FTP server that distributes the file can be accessed ONLY between the hours of 6:00 PM and 6:00 AM Pacific Time (UTC-8).

Windows httpd may be used without charge for personal and educational use. For commercial use there is a one-time fee of $99.00. Windows httpd is based on the Winsock standard that supports most TCP/IP clients for Windows.

At first glance, Windows httpd is an intimidating package. The ZIP file unzips into numerous subdirectories containing over 400 files totaling more than 1.4 MB. However, the server executable itself is only 243,216 bytes. The remaining files contain extensive documentation, images, sounds, and support software for uses more sophisticated than simple file serving. Initial setup is straightforward.

***
Mac

MacHTTP is an HTTP server for the Mac. This package was written by Chuck Shotton. It can be obtained from the MacHTTP home page (http://www.biap.com/). Alternatively the file machttp.sit.hqx (ftp://oac.hsc.uth.tmc.edu/public/mac/MacHTTP/machttp.sit.hqx | Version 2.0 | December 19, 1994 | 1,265,463 bytes) can be obtained by anonymous FTP.

MacHTTP may be used without charge by private individuals running a public use, not-for-profit server not defined as government, educational, or commercial use. Fees vary up to $1,000 for a site license in a public-use, for-profit environment. Details are included in the documentation.

--------------------------
CONSTRUCTING WWW DOCUMENTS

WWW servers use HTTP to serve hypertext documents that are requested by a browser. Hypertext documents are plain text files in which the content is augmented by descriptive tags using the Hypertext Markup Language (HTML). HTML tags describe the logical structure of the document. In addition, tags can provide linking information that enables the browser either to jump to other portions of the current document or to request other documents from the same or other HTTP servers. Links can also be used by the browser to connect to other types of servers such as Gopher servers or FTP servers.

HTML is an application of the Standard Generalized Markup Language (SGML). SGML is an ISO standard for describing the logical structure of a document. Specifically, HTML is defined by a Document Type Definition (DTD) within the SGML standard.

For the most part, HTML markup does not control how a document will be formatted for the ultimate reader of the document. Formatting is applied to the document by the Web browser at the time the document is displayed. It is the browser that determines the final appearance of the various logical parts of the document. How a particular browser chooses to format a particular logical element of a document may or may not be under the control of the user of the browser.

The most common logical element in a document is the paragraph. Other logical elements include headings of various levels to separate the document into sections, types of lists such as numbered and unnumbered lists, and items in those lists. In addition to such commonly understood text elements, hypertext documents contain anchor elements that use the syntax of Uniform Resource Locators (URL) to define links to other documents and files.

Providing a tutorial on HTML is far beyond the scope of this paper. However, it is worth emphasizing here that marking up a document with HTML tags is exactly the task that used to be called word processing.
Many of us who predate desktop computers learned to word process on IBM mainframes using IBM's GML markup in conjunction with the Script text formatter. There was even a well-reviewed PC text formatter named ReadiWriter that could read GML files and format text on the early Epson dot matrix printers.

Marking up a document can be tedious. Specialized editors that facilitate HTML markup are becoming available as stand-alone applications or as add-ons to standard word processing packages. The beta version of Internet Assistant for Word for Windows (http://www.microsoft.com/pages/deskapps/word/ia/) is an example of this newest category of WWW tools.

Web browsers usually can display the plain text source document for the currently displayed Web page. Some will even load the source document into the user's preferred text editor. The source can also be saved to disk. Thus, it is easy to view examples of HTML markup and learn how to accomplish some desired effect.

For the sake of completeness, an example of HTML markup is illustrated below. This example consists of a paragraph of text containing an ordered (numbered) list nested within an unordered (unnumbered) list.

________________________________________________________________
The raw HTML text might appear as follows:
This is the beginning of the paragraph.