What's on the CD-ROM

The CD that comes with this book should be readable on Mac, Solaris, Linux, and Windows 95/98/Me/NT/2000 systems. Put the CD in the drive and mount it using whatever method you normally use to load a CD on your platform: probably filemanager on Solaris, and simply inserting the disc if you're using a Mac, Linux, or Windows. There's no fancy installer. You can browse the directories as you would a hard drive.

All CD-ROM files are read-only. Therefore, if you open a file from the CD-ROM and make any changes to it, you'll need to save it to your hard drive. If you copy a file from the CD-ROM to your hard drive on Windows, the file retains its read-only attribute. To change this attribute after copying a file, right-click the file name or icon and select Properties from the shortcut menu. In the Properties dialog box, click the General tab and remove the checkmark from the Read-only checkbox.

The CD is divided into seven main directories:

* Browsers
* Parsers
* Specifications
* Examples
* Source Code
* Utilities
* PDFs

Browsers

This directory contains a number of Web browsers that support XML to a greater or lesser extent, including:

* Microsoft Internet Explorer 5.5 for Windows
* Microsoft Internet Explorer 5.0 for MacOS
* Mozilla 0.8 (various platforms)
* Amaya 4.3 (Linux, Windows 95/98/NT)

Parsers

This directory contains a variety of open source XML parsers, including:

* The Xerces Java XML parser
* The Xerces-C XML parser for C++
* The Xerces-Perl XML parser for Perl
* The expat parser for C++

Most of the examples in this book that use a specific parser use Xerces Java, in particular the sax.SAXCount program. To install it, just copy the xerces.jar and xercesSamples.jar archives to your jre\lib\ext directory. You'll need the Java Runtime Environment (JRE) 1.2 or later, which you can download from http://java.sun.com/.
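If you'd rather script the copy than drag the archives around by hand, the following is a minimal Python sketch of that install step. It is only an illustration, not something that ships on the CD: the function name install_jars is made up, and the location of the jar files on the CD (the cd_parser_dir argument) is an assumption you must fill in yourself. It copies each archive into every ext directory you list and then clears the read-only attribute that files copied from a CD-ROM keep on Windows.

```python
import os
import shutil
import stat
from pathlib import Path

# Archive names taken from the text above.
JARS = ("xerces.jar", "xercesSamples.jar")


def install_jars(cd_parser_dir, ext_dirs, jar_names=JARS):
    """Copy each jar into every existing ext directory.

    cd_parser_dir is wherever the jars live on your CD (an assumption;
    check the disc). Returns the list of files that were written.
    """
    installed = []
    for ext_dir in map(Path, ext_dirs):
        if not ext_dir.is_dir():
            continue  # this ext directory does not exist on this machine
        for name in jar_names:
            source = Path(cd_parser_dir) / name
            target = ext_dir / name
            # shutil.copy copies the contents and the permission bits,
            # so the copy starts out read-only just like the CD file.
            shutil.copy(source, target)
            # Add the owner-write bit, which clears the read-only
            # attribute on Windows and makes the file writable elsewhere.
            os.chmod(target, target.stat().st_mode | stat.S_IWRITE)
            installed.append(target)
    return installed


# Hypothetical usage; adjust the drive letter and directories to your system:
# install_jars(r"D:\Parsers", [r"C:\jdk1.3\jre\lib\ext"])
```

Because the function takes a list of ext directories, one call can place full copies in more than one location. Directories that don't exist are skipped silently, so check the returned list to see what was actually installed.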
If you've installed the Java Development Kit (JDK) instead of the JRE on Windows, you may have two ext directories, one somewhere like C:\jdk1.3\jre\lib\ext and the other somewhere like C:\Program Files\Javasoft\jre\1.3\lib\ext. You need to copy the jar archives into both ext directories. Putting one copy in one directory and an alias into the other directory does not work. You must place complete, actual copies into each ext directory.

Specifications

This directory contains the XML specifications from the World Wide Web Consortium (W3C), including:

* XML 1.0, second edition
* Namespaces in XML
* CSS Level 1
* CSS Level 2
* XSLT 1.0
* XPath 1.0
* HTML 4.0
* XHTML 1.0
* MathML 2.0
* The Resource Description Framework
* SMIL

These are all included in HTML format, and most are available in XML as well. Some are also provided in additional formats such as PDF or plain text. Many technologies discussed in this book are not yet finalized (for example, XLinks). You can find the current draft specifications for these on the W3C Web site at http://www.w3.org/TR/.

Examples

This directory contains several examples of large XML files and large collections of XML documents. Some (but not all) of these are based on smaller examples printed in the book. For instance, you'll find complete statistics for the 1998 Major League Baseball season, including all players and teams. Examples include:

* The 1998 Major League Baseball season
* The complete works of Shakespeare (courtesy of Jon Bosak)
* The Old Testament (courtesy of Jon Bosak)
* The New Testament (courtesy of Jon Bosak)
* The Koran (courtesy of Jon Bosak)
* The Book of Mormon (courtesy of Jon Bosak)
* The periodic table of the elements

Source Code

All complete numbered code listings from this book are on the CD-ROM in a directory called source. They are organized by chapter. Very simple HTML indexes are provided for the examples in each chapter.
However, because most of the examples are raw XML files and because most don't have style sheets, some Web browsers won't display them very well. Internet Explorer 5.x probably does the best job with most of these files. Otherwise, you're probably better off just opening the directories in Windows Explorer, the Finder, or the equivalent on your platform of choice, and reading the files with a text editor.

Most of the files are named according to the listing number in the book (for example, 6-1.xml, 27-1.cdf). However, in the few cases in which a specific name is used in the book, such as family.dtd or family.xml, that name is also used on the CD. The files on the CD appear exactly as they do in the book's listings.

Utilities

The utilities directory contains assorted programs that will be useful for processing XML documents of one type or another. These include:

* Dave Raggett's HTML Tidy, compiled for a variety of platforms. Tidy can clean up most HTML files so that they become well-formed XML. Tidy can correct many common problems and warn you about the ones you need to fix yourself. The latest version can be found at http://www.w3.org/People/Raggett/tidy.
* The Xalan-J XSLT Processor from the XML Apache Project
* Michael Kay's SAXON XSLT Processor
* James Clark's XT XSLT Processor
* The Batik SVG Viewer from the XML Apache Project
* FOP, a print formatter driven by XSL formatting objects from the XML Apache Project

PDFs

The pdf directory contains Acrobat PDF files for this entire book. To read them, you'll need the free Acrobat Reader software included on the CD-ROM. Feel free to put them on your local hard disk for easy access. I don't really care if you loan the CD-ROM to some cash-strapped undergrad who finds it cheaper to tie up a school printer for a few hours printing all 1200+ pages rather than spend $49.99 for a printed copy. (If you're using your own printer, toner, and paper, it's much cheaper to buy the book.)
However, I would very much appreciate it if you do not place these files on Web, FTP, Gnutella, Publius, or any other servers. This includes intranet servers, password-protected sites, and other things that aren't meant for the public at large. Most local sites and intranets are far more exposed to the broader Internet than most people think. Today's search engines are very good at locating content that is supposed to be hidden. Putting mirror copies of these files around the Web makes it extremely difficult to keep all the files up to date and to make sure that search engines find the right copies.