HTTX
HTML to TEXT converter
Created by Gabriele Favrin
(E-Mail: favrin@tin.it - FidoNet: 2:333/726.8)
Version 2.0b - December 1999
Index:
1. Utilization terms
2. Property of HTTX and distribution terms
3. Introduction
4. Hardware requirements and installation
5. How to use
5.1 Command line parameters
5.2 External configuration
6. Error Messages and AmigaDOS Return Codes
7. FAQ (advice, interfacing with other programs and more)
8. Technical information
8.1 What is supported, what is not (yet) and implementation
8.2 Notes about ANSI conversion
8.3 Notes about conversion of <PRE>, <XMP>, <LISTING> and <SCRIPT>
contents
9. How to contact the Author
10. Greetings
11. Program history
12. Future versions
-----------------------------------------------------------------------------
*** CHILDWARE ***
This software is "CHILDWARE". The author explicitly asks whoever uses this
program to make a donation to a charitable organization that helps children
in some way.
If you don't know of any, ask at your local post office how to make a
donation to UNICEF.
The amount of the donation is up to you, but please do it!
-----------------------------------------------------------------------------
1. Utilization terms
--------------------
Before running this program on your computer, please read the following
paragraph carefully, and continue only if you agree with the terms written
below.
THE HTTX AUTHOR IS IN NO WAY RESPONSIBLE FOR MORAL AND/OR MATERIAL DAMAGES
THAT HIS PROGRAM MAY CAUSE TO PEOPLE OR THINGS. THE PROGRAMMER GAVE HIS BEST
TO LIMIT THE PROBLEMS THAT HTTX MAY CAUSE, BUT HE IS NOT ABLE TO GUARANTEE
ITS EFFICIENCY IN ALL THE SITUATIONS. USING HTTX, YOU, THE USER, ARE
RESPONSIBLE FOR ALL MORAL, MATERIAL, CIVIL AND PENAL THINGS.
WARNING:
Many HTML documents are under copyright and are not freely distributable,
even if converted to plain text format. The author declines all
responsibility for the use of the files generated with HTTX.
All of the programs mentioned in this document are the property of their
respective owners.
2. Property of HTTX and distribution terms
------------------------------------------
The executable program, the source code and the ideas that are its basis are
the EXCLUSIVE PROPERTY of Gabriele Favrin. All rights reserved.
HTTX is freeware, NOT public domain. It may be distributed only if the
executable files and the documentation remain unchanged. Distribution of the
files in archive formats other than LhA is permitted, but compressing the
individual files using PowerPacker or similar tools is not.
Commercial use of this package is exclusively granted to AmiTrix who may
spread it with complete or partial versions of the AWeb WWW Browser.
The staff of Aminet, Fred Fish, Meeting Pearls, Amy Resource and Amiga
magazines which have a cover CD are authorized to include HTTX in their
public domain software collections.
3. Introduction
---------------
HTTX (HTml > TXt) is a program to convert files from HTML format, used for
viewing files on the World Wide Web, to pure ASCII. There are similar
products, but since none completely satisfied my needs, I started to write
one myself.
I don't claim this is the best or the fastest one, but it certainly has some
features not found in similar Amiga programs so far.
4. Hardware requirements and installation
-----------------------------------------
Required system:
Amiga, 512K, Kickstart 2.04 (37.175) or above.
Required memory:
The size of the file to convert, plus about 15K for buffers and other data.
Install:
Copy HTTX to your C: directory or a directory in your current path.
HTTX is compatible with AmigaOS 3.5 and U.A.E.
5. How to use HTTX
------------------
HTTX can only be used from the Shell. AWeb users, please note that a special
ARexx plugin to fully control HTTX from the browser is included in this
distribution. Please also read section 7.
Command syntax:
HTTX InputFile [OutputFile] [options]
The parameters in square brackets are optional. You are only required to
specify a valid HTML file ("InputFile").
If no OutputFile is specified, it defaults to 'InputFile'.txt (e.g.
"test.html" will be saved as "test.txt"). If only a path is specified for
OutputFile, the file will be saved to that path under its default name.
Examples:
-> HTTX data:txt/html/abox.html
The file "abox.html" will be converted and saved
as "data:txt/html/abox.txt"
-> HTTX data:txt/html/abox.html ram:abox.txt
The file "abox.html" will be converted and saved
as "ram:abox.txt"
-> HTTX data:txt/html/abox.html data:txt/
The file "abox.html" will be converted and saved
as "data:txt/abox.txt"
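The naming rules shown in these examples can be sketched in Python. This is only an illustration of the documented behaviour, not HTTX's own code: the function names are hypothetical, and treating an OutputFile that ends in "/" or ":" as a directory is an assumption.

```python
def split_amiga_path(path):
    """Split an AmigaDOS path into (directory part, file name)."""
    cut = max(path.rfind("/"), path.rfind(":")) + 1
    return path[:cut], path[cut:]


def default_output(input_file, output=None):
    """Return the output file name implied by the examples above."""
    in_dir, in_name = split_amiga_path(input_file)
    # Replace the last extension (if any) with ".txt".
    stem = in_name.rsplit(".", 1)[0] if "." in in_name else in_name
    txt_name = stem + ".txt"
    if output is None:
        # No OutputFile: save next to the input file.
        return in_dir + txt_name
    if output.endswith(("/", ":")):
        # Assumption: a trailing "/" or ":" marks a destination path.
        return output + txt_name
    return output


print(default_output("data:txt/html/abox.html"))
print(default_output("data:txt/html/abox.html", "ram:abox.txt"))
print(default_output("data:txt/html/abox.html", "data:txt/"))
```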
5.1 Command line parameters
---------------------------
HTTX offers many options to control the conversion process.
LEN
Maximum line length of the output file.
Default: 77 - Minimum: 15 - Maximum: 255
INDENT or IN
Number of spaces used to indent (shift to the right) the <UL>, <OL>
and <DL> lists. The specified value must allow at least two levels of
indentation within the line length specified with the LEN option.
Default: 3 - Minimum: 1 - Maximum: (LEN value - 10) / 3
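As a worked example of the limits above, the following Python sketch checks a LEN/INDENT pair. The function names are hypothetical, and rounding the maximum down to a whole number is an assumption; the text only gives the formula (LEN value - 10) / 3.

```python
def max_indent(line_len):
    """Largest INDENT allowed for a given LEN, per (LEN - 10) / 3."""
    # Assumption: the result is rounded down to an integer.
    return (line_len - 10) // 3


def check_layout(line_len=77, indent=3):
    """Validate LEN and INDENT against the documented ranges."""
    if not 15 <= line_len <= 255:
        raise ValueError("LEN must be between 15 and 255")
    if not 1 <= indent <= max_indent(line_len):
        raise ValueError(
            f"INDENT must be between 1 and {max_indent(line_len)}")
    return line_len, indent


print(max_indent(77))   # limit for the default line length
print(check_layout())   # the defaults pass the check
```

With the default LEN of 77 the formula gives (77 - 10) / 3 = 22.33, so under the rounding assumption the largest usable INDENT would be 22.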
ANSIMODE
ANSI conversion of HTML styles and LINKS (HREF and NAME) and optimization
of alignment functions.
Not to be used if the converted text will go on message areas, like
Fidonet or Usenet newsgroups.
Please read section 8.2 (ANSI conversion) for important
information about ANSI sequences and general compatibility issues.
The option supports three modes:
ANSIMODE=0
No ANSI codes are used. This is the same as omitting the option from both
the command line and the configuration.
ANSIMODE=1
Standard ANSI codes are used. This is the only ANSI conversion type
available while printing.
ANSIMODE=2
VT100 cursor control sequences are used to reduce size of the file by
compressing spaces. This could cause incompatibility with some programs
(see section 8.2).
Default: ANSIMODE=0 (styles are not converted).
ANSICOL
Color used to render the links during ANSI conversion.
Accepted value range is from 0 to 9. The colour shown may depend on the
terminal program used.
Default: 3 (file save) or 4 (printing).
7BIT
Conversion of HTML entities (accented letters, symbols and so on) to ASCII
codes lower than 128. This is required for text forwarded on nets like
FidoNet, where the character codes allowed can only range between
32 and 127.
IMPORTANT: remember that the ANSIMODE option adds Escape codes (ASCII 27),
which are forbidden on FidoNet and strongly discouraged for non-personal use
(broadcast) of converted text.
Default: OFF (8 bit chars are not converted).
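To give an idea of what a 7 bit conversion involves, here is a rough Python sketch. It does not reproduce HTTX's own conversion table, which is not listed in this document: the FALLBACKS table, the "?" placeholder and the use of Unicode decomposition to drop accents are all assumptions made for illustration.

```python
import html
import unicodedata

# Hypothetical replacements for symbols that have no base letter.
FALLBACKS = {"\u00a9": "(c)", "\u00ae": "(R)"}


def to_7bit(text):
    """Decode HTML entities, then reduce every character to 7 bit ASCII."""
    out = []
    for ch in html.unescape(text):
        if ord(ch) < 128:
            out.append(ch)
        elif ch in FALLBACKS:
            out.append(FALLBACKS[ch])
        else:
            # Split accented letters into base letter + accent, keep the base.
            base = unicodedata.normalize("NFKD", ch)
            base = base.encode("ascii", "ignore").decode("ascii")
            out.append(base or "?")
    return "".join(out)


print(to_7bit("caf&eacute; &copy; 1999"))
```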
HRMODE or HR
HTML documents often contain the <HR> TAG, which defines a separating
line between paragraphs. HTTX allows these lines to be handled in
three ways:
HRMODE=0
No lines drawn.
This was the NOHR option in HTTX versions before 1.5.
HRMODE=1
Lines are drawn using the minus "-" character.
HRMODE=2
Lines are drawn using underlined spaces (in ANSI). This mode
generates a nicer line, but it introduces ANSI codes, so it must be
avoided if the text will go on FidoNet or Usenet newsgroups.
Default: HRMODE=1 (lines are inserted using the minus "-" character).
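The three HRMODE values can be pictured with the short Python sketch below. It is an illustration only: the function name is hypothetical, drawing the line across the full LEN width is an assumption, and so is the use of "\e[0m" to switch the underline off again.

```python
ESC = "\x1b"


def render_hr(mode, width=77):
    """Return the text produced for one <HR> under the given HRMODE."""
    if mode == 0:
        return ""                      # no line drawn
    if mode == 1:
        return "-" * width             # plain minus characters
    if mode == 2:
        # Underlined spaces; the reset sequence is an assumption.
        return f"{ESC}[4m{' ' * width}{ESC}[0m"
    raise ValueError("HRMODE value must be 0, 1 or 2")


print(render_hr(1, 20))
```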
NOALIGN or NA
HTTX supports (right or center) alignment of text and separators (<HR>).
Examples:
                                centered text
                                                           right-aligned text
If the NOALIGN option is ON, both of the lines above will start at the left
margin, which saves characters.
Default: OFF (alignment is converted).
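The effect of NOALIGN can be sketched in Python as follows. This is a simplified picture, not HTTX's implementation: the function name is hypothetical and padding with plain spaces corresponds to a conversion without ANSI cursor sequences.

```python
def align(text, how="left", line_len=77, noalign=False):
    """Pad text with leading spaces to center or right-align it."""
    if noalign or how == "left":
        return text                    # NOALIGN: everything starts at the left
    if how == "center":
        return " " * ((line_len - len(text)) // 2) + text
    if how == "right":
        return text.rjust(line_len)
    raise ValueError("how must be 'left', 'center' or 'right'")


print(align("centered text", "center"))
print(align("right-aligned text", "right"))
print(align("right-aligned text", "right", noalign=True))
```

The padding is where the saving mentioned above comes from: a right-aligned 18 character line on a 77 column page carries 59 extra spaces.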
TDEOL
Insertion of an EOL between cells of an HTML table. Using this option
can improve the layout of some tables.
Default: OFF (between each cell a space is inserted)
SETNOTE or SN
Use the document title (<TITLE> TAG) or its URL as the output file
comment. This option is ignored when the PRINT or STDIO
(display on console) options are used.
The option supports the following values:
SETNOTE=0
No comment is added.
SETNOTE=1
The title of the document, if any, is added as the comment of the output
file. Only the first 64 characters are saved, as the HTML standard defines.
SETNOTE=2
URL of the document, if available, is added as comment of the output
file.
Default: OFF (file comment is not set).
SITE
Insertion of the specified source URL in the output file.
It may be useful to know which document the file was created from.
Example:
HTTX ram:children.html SITE=http://www.unicef.org
will start the file with "URL : http://www.unicef.org"
Note: SITE has priority over GETNOTE, so specifying a site this way will
override that option if it is active.
Default: OFF (without this option the URL will not be added).
GETNOTE or GN
Usage of the input file comment as the URL. This option is an alternative to
SITE and is useful with files created using AWeb or other browsers and
programs which save the URL in the comment.
The AmigaDOS comment length limit is 80 chars.
Default: OFF
NOHEADER or NOHEAD
Don't insert HTTX version information, title (<TITLE>) and URL in the
converted file. This option automatically turns off the SITE and GETNOTE
options if present.
Default: OFF (HTTX version, optional title and URL may be added).
HREF or LINK
Adds the addresses of links (<A> TAG).
Very useful if the document contains links that you want to keep.
Default: OFF (links aren't added).
IMG
Insertion of the ALT-text of images (<IMG> TAG) in the output file.
Useful if the document contains images with descriptions.
Default: OFF (ALT-text isn't added).
SCRIPT or SCP
Insertion of the content of <SCRIPT> TAG, eg. JavaScript, in the output
file.
Note: this option adds the script itself, not its result!
Please read the section 8.3 for important information about conversion of
this type of text.
Default: OFF (<SCRIPT> content is skipped).
BADHTML or BAD
Partial support for documents created outside of the HTML standards. Use
this option only if parts of the converted page are missing. Using this
option with correct HTML documents may cause unpredictable results in the
converted document.
Default: OFF (HTTX uses standard DTD rules).
FORCE
Forces conversion of input file without checking if it is an HTML
document.
USE IT AT YOUR OWN RISK: conversion of normal text or binary files may
cause unpredictable results.
Normally, HTTX considers a file valid HTML when one or more of the
following conditions is valid:
- file extension is .html or .htm
- the starting TAG is <HTML>
- the starting TAG is <!DOCTYPE ...>
This option must be specified if the three above conditions are false,
even if the file IS an HTML document.
Default: OFF (automatic check of file).
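The check described above can be written as a small Python sketch. It is an approximation under stated assumptions: the function name is hypothetical, and ignoring letter case and leading whitespace before the first TAG are guesses about details the text does not specify.

```python
def looks_like_html(filename, data, force=False):
    """Apply the three documented conditions (or skip them with FORCE)."""
    if force:
        return True                    # FORCE: no check at all
    if filename.lower().endswith((".html", ".htm")):
        return True
    # Assumption: leading blanks are skipped and case is ignored.
    start = data.lstrip()[:9].upper()
    return start.startswith("<HTML") or start.startswith("<!DOCTYPE")


print(looks_like_html("abox.html", ""))
print(looks_like_html("page.txt", "<!DOCTYPE HTML PUBLIC>"))
print(looks_like_html("notes.txt", "plain text"))
print(looks_like_html("notes.txt", "plain text", force=True))
```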
STDIO
Display the converted file on screen instead of saving it on disk. This
option automatically enables the QUIET option.
Default: OFF (converted file is saved to disk).
PRINT or PRT
Print the document instead of displaying or saving it.
The printer.device will convert standard ANSI codes and end-of-lines to
the ones used by the Printer set in your Preferences.
This option should be used if you want to print the converted document.
If ANSIMODE is enabled (value different than 0) it will be set to 1 for
use of standard ANSI codes. Also, if not specified, value of option
ANSICOL will be set to 4 (blue color, as from Commodore specifications
regarding the printer.device).
Older versions of HTTX used a solution like "HTTX abox.html prt:",
which should now be avoided.
This option automatically enables QUIET, and turns off the SETNOTE and STDIO
options.
Default: OFF (document is displayed on screen or saved to a file).
APPEND
Normally HTTX will overwrite an existing file.
If APPEND is ON, the converted text will be added to the end of the
specified file.
Default: OFF (overwrite output file if it already exists).
NOCFG
HTTX loads a default configuration from ENV:httx.prefs (if another is not
specified with the CFG option). If this option is ON, HTTX uses the
default values for the options or the parameters specified in the command.
For more information on external configuration, see section 5.2.
Default: OFF (HTTX searches for default or specified configuration).
CFG
With this option it is possible to specify the name of the configuration
file to be used by HTTX. This file MUST be located in the ENV: directory.
This option turns NOCFG OFF.
For more information on external configuration, see section 5.2.
Default: OFF (HTTX loads the httx.prefs configuration file).
INCLUDE
Using this option it is possible to include text at the start
of each output file, before the converted HTML data.
The included text file is NOT ALTERED IN ANY WAY, no 8 bit character
conversion, wordwrap, ANSI codes and so on. HTTX will not warn if 8 bit
characters are included.
Remember this, especially if the converted text will go on message areas,
like FidoNet or Usenet newsgroups.
Default: OFF (no text file is included in the output file).
QUIET
Don't display any HTTX messages. This option is useful when HTTX is used
within a script.
WARNING: if active this option also hides error messages, but the AmigaDOS
error codes are always returned.
Default: OFF (HTTX output is displayed).
If not specified, HTTX uses the default settings.
When conversion is finished, if QUIET option is OFF, HTTX will show:
- the size of input and output files.
- the presence of 8 bit chars and their conversion, if active.
(relative only to HTML content, not the optional included file).
- TABs or ASCII chars less than 32 not converted because they are included
in pre-formatted text.
- non-standard HTML comments, which may make parts of document invisible.
If the converted file appears incomplete, try using BADHTML option.
5.2 External configuration
--------------------------
HTTX supports an external configuration: a text file that holds the most
used options, so they do not need to be typed every time you use the
program.
By default (unless the NOCFG option is set, or the CFG option names a
different file) HTTX looks for the file "ENV:httx.prefs".
It's possible to create multiple configurations, maybe one to use for file
conversion and another one to use for printing, creating different
configuration files and enabling the CFG option with the name of the file (do
not specify the path, it is always 'ENV:').
Example:
-> HTTX abox.html
Converts "abox.html" using default configuration (ENV:httx.prefs).
-> HTTX abox.html PRINT CFG=httxprt.prefs
Converts "abox.html" using the configuration file ENV:httxprt.prefs
Allowed parameters
------------------
The external configuration file supports a subset of available command line
options. Each option MUST be specified in its extended form (for example
HRMODE, not HR, INDENT instead of IN, and so on).
The file must contain only the options and their parameters. It's allowed to
put each option on a separate line for better readability.
Available options (for description see section 5.1):
LEN - the maximum length for output lines.
INDENT - the indentation size.
ANSIMODE - type of ANSI sequences used during conversion.
ANSICOL - colour used for links during ANSI conversion.
7BIT - conversion of 8 bit HTML entities to 7 bit chars.
HRMODE - line drawing mode.
NOALIGN - ignore center and right alignment.
TDEOL - insertion of an EOL between table cells.
SETNOTE - use HTML document title or URL as the output file comment.
GETNOTE - use the source file comment as URL of destination file.
NOHEADER - skip header (HTTX version, URL and title of original
document) in the converted file.
HREF - insertion of links (<A>) in the destination file.
IMG - insertion of the ALT-Text of images in the destination file.
SCRIPT - insertion of the content of <SCRIPT> element in the
destination file.
BADHTML - partial support for badly written HTML.
Parameters specified on the command line are applied after the parameters
specified in the configuration file. This can override (or toggle a second
time) one or more options.
Examples:
If a configuration file has the following line:
IMG GETNOTE LEN=70
and on command line you write:
-> HTTX abox.html IMG
IMG is first turned on because it is present in the configuration file, then
turned off again because it is also present on the command line.
-> HTTX abox.html LEN=74
LEN is both present in configuration file and command line, but this one
overrides the previous value. LEN is now set to 74.
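The two examples above can be modelled with the Python sketch below. It is a simplified model, not HTTX's argument parser: the function names are hypothetical, and dividing the options into on/off switches and numeric keywords is an assumption based on their descriptions in section 5.1.

```python
# Assumed split of the configurable options into two kinds.
SWITCHES = {"7BIT", "NOALIGN", "TDEOL", "GETNOTE", "NOHEADER",
            "HREF", "IMG", "SCRIPT", "BADHTML"}
KEYWORDS = {"LEN", "INDENT", "ANSIMODE", "ANSICOL", "HRMODE", "SETNOTE"}


def apply_args(settings, args):
    """Apply options in order: switches toggle, keywords overwrite."""
    for arg in args:
        name, sep, value = arg.partition("=")
        name = name.upper()
        if sep and name in KEYWORDS:
            settings[name] = int(value)
        elif not sep and name in SWITCHES:
            settings[name] = not settings.get(name, False)
        else:
            raise ValueError(f"unsupported option: {arg}")
    return settings


def effective_settings(config_text, command_line):
    """Configuration file first, then the command line on top of it."""
    settings = {"LEN": 77, "INDENT": 3, "ANSIMODE": 0, "HRMODE": 1}
    apply_args(settings, config_text.split())
    apply_args(settings, command_line)
    return settings


print(effective_settings("IMG GETNOTE LEN=70", ["IMG"]))
print(effective_settings("IMG GETNOTE LEN=70", ["LEN=74"]))
```

In this model a switch that appears in both places ends up OFF, so an option already enabled in the configuration file should not be repeated on the command line.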
How to create an external configuration
---------------------------------------
External configuration files are in effect system variables and are located
in ENVARC: directory (on disk) and ENV: (generally on RAM). So, the contents
in ENV: are valid only for the current session, while the contents in ENVARC:
are also valid after a reset.
To permanently save a configuration file, copy it to both ENV: and ENVARC:
Use your favorite text editor (Ed, Cygnus Editor, GoldEd, and so on) to
create your prefs file, httx.prefs is the default filename. Save it in ENV:,
also save the file to ENVARC: so it will not be lost when you reboot.
Temporary changes can be made by editing just the ENV: file.
HTTX configuration may be fully managed using the plugin for the AWeb WWW
browser.
6. Error Messages and AmigaDOS Return Codes
-------------------------------------------
When execution terminates, HTTX returns the appropriate AmigaDOS Return Code
(RC), usable within scripts to determine if the conversion was successful.
See your AmigaDOS handbook for a complete list of error codes.
In case of error, if QUIET is off, the appropriate AmigaDOS message will be
displayed.
Following is a list of the most common errors. If the system is localized,
messages are displayed in the appropriate language. See your AmigaDOS manual
for further information.
Argument line invalid or too long
Arguments entered in a wrong way.
*** Break
The user has pressed Control-C keys, interrupting the conversion, and the
output file has been removed.
Not enough memory available
There is no memory available to allocate the buffers used by HTTX.
This can happen if the HTML file to convert or the text file to include
is too big or the memory is too fragmented.
Try rebooting your Amiga.
Object not found
Specified file doesn't exist or it's not accessible.
Check file name and path.
Object is not of the required type
The input file does not seem to be an HTML document.
Try using the FORCE option.
HTTX can display other errors (in English only) due to wrong use of commands
or options:
The line length must be between 15 and 255 characters (current is NN)
The line size specified with LEN parameter is a number less than 15 or
more than 255.
Indentation size must be at least 1 (current is NN)
The indentation size must be at least one character.
With line length XX, indentation size YY, max indent level is ZZ.
You must allow at least 3 indentations.
The maximum indentation level, with specified line size and indent value,
is less than 3.
HRMODE value must be 0, 1 or 2
Value set for HRMODE is not valid. It must be 0, 1 or 2.
Finally, there are a few warnings which may be displayed. The conversion will
take place but there may be situations altering the final result:
Error in env config. HTTX will use its defaults.
External configuration file has errors. HTTX will use the default settings
and the parameters passed on the command line.
Check external configuration file (see section 5.2).
ENV config 'NAME' not found.
The configuration file specified with CFG option was not found.
HTTX will use the default settings and the parameters passed on command
line.
Remember that the configuration file must be located in ENV:
Remember also to copy it again to ENVARC: when you edit it.
Found non-ASCII chars in preformatted text!
In a preformatted text section HTTX has found some 8 bit characters.
Do not ignore this warning if the converted text will be posted in
Fidonet conferences or Usenet newsgroups.
This file contains non standard HTML comment(s)!
The file may not be completely converted, since non standard HTML comments
were found. If this is the case, try using the BADHTML option.
Include file could not be added.
File specified with INCLUDE option can't be added because it doesn't
exist or it's not accessible.
Check file name and path.
7. FAQ (advice, interfacing with other programs and more)
---------------------------------------------------------
Q. ANSI styles (bold, italic, underline, blue) stop after first line.
A. See section 8.2.
Q. Converted text isn't centered, but in the original document it is.
A. This can happen if the text in a table row (<TR>) or cell (<TD>) is
defined as centered. To maintain compatibility with some programs used with
HTTX, this version does not yet support alignment defined in those
elements. This will be added in future versions that will have more table
support. Otherwise it may be an ANSI compatibility problem
(check section 8.2).
Q. Sometimes alignment doesn't work, wordwrap and lists are not correctly
formatted or HTML TAGS are shown.
A. This happens with text inside the <PRE> element. HTTX copies this text as
is, without formatting, because it usually contains material that the
document's author wishes to show exactly as written.
In <LISTING> and <XMP> the TAGS are left as they are, as the specifications
for those elements define. Although its use is deprecated in HTML 4.0,
<XMP> is still widely used for examples in many documents, e.g. in the
Netscape JavaScript specifications.
Q. Some pages are not correctly converted...
A. There could be many reasons: layout based on tables (not fully supported,
see section 8.1), errors in the HTML source (HTTX is quite tolerant, but
there are limits) or errors in the HTTX engine. If you think the page is
correct, send me an E-mail with the URL.
(E-Mail: favrin@tin.it, FidoNet: 2:333/726.8)
Q. Can I use HTTX from other programs?
A. In this archive an ARexx plugin is provided to use HTTX with the AWeb
browser.
Regarding external programs, HTTX can be easily used from Directory Opus
by creating a button configured as follows (Directory Opus 4.12):
New Entry/AmigaDOS:
C:HTTX {f} {d}
(replace C: with path for HTTX)
With this configuration, a file selected from "source" directory will be
converted to text and saved to the "destination" directory.
By activating "Do all files" flag it's possible to convert more than one
file, by selecting them and clicking the HTTX button.
Q. Is there a GUI for HTTX?
A. Alfonso Ranieri (alfier@iol.it) has written an ARexx script that offers a
StormWizard interface to configure HTTX.
The AWeb plugin contains a Reaction interface for configuration. A
separate version to be used from Workbench will be available soon.
Q. I'm using a program that refers to HTTX for printing html (eg. MoreHTML).
If I install versions newer than HTTX 1.1b the printing doesn't work any
more.
A. Various programs that use HTTX were written to work with version HTTX
1.1b. They use a command like "HTTX <filename> prt:" to print the
converted text. This doesn't work any more, especially if the ANSIMODE
option is used. HTTX versions 1.5 and newer may use some ANSI codes that
are not compatible with this printing method.
The solution is to ask the authors of these programs to change the template
they use to call HTTX, if it isn't directly configurable by the user.
Q. How can I improve the performance of HTTX?
A. To speed up the conversion, try using a filesystem with blocks of 1024
bytes, like RAM disk. Note that if memory is almost full or fragmented,
saving to RAM disk may slow down the conversion process.
Q. Can I make HTTX resident?
A. Starting from version 2.0 HTTX "should" be resident-able. Anyway it hasn't
been possible to fully test its behavior in this state, so, to avoid
problems to users, this option is not officially supported.
Whoever wants to try can type the following command in a shell:
Resident C:HTTX FORCE
(replace C: with path for HTTX)
8. Technical information
------------------------
This section discusses some aspects of HTML and their implementation in
HTTX. Although reading this is not required to learn how to use HTTX, there
is important information about conversion that you should read if you plan to
distribute your converted texts.
8.1 What is supported, what is not (yet) and implementation
-----------------------------------------------------------
Supported HTML:
- Entities described in RFC 1866, © and ® (NHTML), numeric entities
(both decimal and hex), Win'95 numeric entities and special characters.
- Separators (<CENTER>, <DIV>, <BR>, <P>, <HR>) and font height change
(from <H1> to <H6>).
- Alignment (center and right) of text (headers and paragraphs) and
separators.
- Physical and logical styles.
- Numbered lists (<OL> with possible START attribute) of numeric,
alphabetical and mixed type, unnumbered (<UL>) and definition (<DL>) lists
to a maximum of 255 levels.
- Document title (<TITLE>).
- Links (<A>), user maps and inline images (<IMG> with optional
ALT-text).
- Pre-formatted text (<PRE>, <XMP> and <LISTING>).
- Scripts (content of the <SCRIPT> TAG).
- Non standard use of "<" and ">" in a preformatted text (this may be
changed).
What is (not yet fully) supported:
- Tables (<TABLE>). Currently each table cell is treated as a separate
document with its alignment, list indentation level, styles and so on.
- <APPLET>, <STYLE> and <SELECT>: content of these elements is skipped.
Implementation of the standard:
- Unknown TAGS are ignored.
- Double spaces, trailing and leading blanks for each line are removed.
- Unprintable ASCII chars (lower than 32) are converted to spaces.
- PC end-of-lines (CR+LF) and MAC (CR) are converted to Amiga format (LF).
- For better readability of text in tables, <TD> is converted to a space and
<TR> TAGS are converted to EOL. Between two <TR> (a table row) a
maximum of one separator (<HR>) is shown.
- Consecutive EOLs are reduced to one EOL (except for <BR>).
These rules are followed except in <SCRIPT>, <PRE>, <LISTING> and <XMP>
elements. See section 8.3 for more information about this.
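Some of the text clean-up rules listed above (end-of-line conversion, unprintable characters, blanks and consecutive EOLs) can be sketched in Python. This is a simplified illustration that works on plain text and ignores TAGS; the function name is hypothetical and the order in which the rules are applied is an assumption.

```python
import re


def normalize(text):
    """Apply the whitespace and end-of-line rules to a block of text."""
    # PC (CR+LF) and MAC (CR) end-of-lines become Amiga LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Unprintable ASCII chars (lower than 32, except LF) become spaces.
    text = re.sub(r"[\x00-\x09\x0b-\x1f]", " ", text)
    # Collapse repeated spaces, drop leading and trailing blanks per line.
    lines = [re.sub(r" {2,}", " ", line).strip(" ")
             for line in text.split("\n")]
    # Consecutive EOLs are reduced to one EOL.
    return re.sub(r"\n{2,}", "\n", "\n".join(lines))


print(repr(normalize("a  b \r\n\r\n\tc\rd")))
```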
8.2 Notes about ANSI conversion
-------------------------------
If the ANSIMODE option is enabled (values 1 or 2) HTTX uses ANSI escape
sequences for converting HTML styles (such as bold, italic, underlined and so
on), links (rendered as underlined blue), centering and indentation of text.
The ANSI codes used are taken from standard ANSI specifications and should be
supported by any program (should be...).
These are the sequences (ESC is replaced by "\e"):
Bold          \e[1m
Italic        \e[3m
Underlined    \e[4m
Color (Blue)  \e[33m (\e[34m with the PRINT option)
Multiple attributes are combined using ";", i.e. to set bold and italic
HTTX uses "\e[1;3m".
If the ANSIMODE option is set to 2, for list indentation and text alignment
HTTX uses the cursor position sequence "\e[nnC" where nn is the number of
characters to move right. This sequence is not used when printing.
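The sequences listed above can be assembled with a few lines of Python. This is only a sketch: the function names are hypothetical, and deriving the colour code as 30 plus the ANSICOL value is an inference from the documented defaults (3 gives "\e[33m", 4 gives "\e[34m"), not a confirmed rule.

```python
ESC = "\x1b"
STYLE_CODES = {"bold": 1, "italic": 3, "underline": 4}


def sgr(*styles, colour=None):
    """Build one ANSI style sequence, joining attributes with ';'."""
    codes = [str(STYLE_CODES[s]) for s in styles]
    if colour is not None:
        # Assumption: the colour code is 30 + ANSICOL value.
        codes.append(str(30 + colour))
    return f"{ESC}[{';'.join(codes)}m"


def cursor_right(n):
    """VT100 cursor sequence used with ANSIMODE=2 instead of n spaces."""
    return f"{ESC}[{n}C"


print(repr(sgr("bold", "italic")))
print(repr(sgr("underline", colour=3)))
print(repr(cursor_right(32)))
```

The size reduction of ANSIMODE=2 is visible here: moving 32 columns to the right takes a 5 character sequence instead of 32 spaces.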
Compatibility problems:
- ANSI standard says that the end of line doesn't cause a style to stop, so
if a style is terminated after first line the problem is in the program
used to display text. The standard shell and MultiView work correctly.
- Some text viewers (like MultiView) don't support VT100 cursor positioning
sequences that are used when ANSIMODE=2 (optimized ANSI) is enabled. If a
page contains lists or aligned text that appears badly converted, try to
convert it using ANSIMODE=1 option or use a different text viewer.
8.3 Notes about conversion of <PRE>, <XMP>, <LISTING> and <SCRIPT> contents
---------------------------------------------------------------------------
The rules specified for implementation of the standard (section 8.1), the
wordwrap of text and the 7 bit conversion of 8 bit characters do not fully
apply to some elements. The HTML specifications require them to be treated
differently.
- <PRE> element (preformatted text)
In this mode wordwrap is not done. Parsing of HTML TAGS (except lists and
alignment) and entities is done. Numeric entities lower than 32 aren't
converted to avoid problems with uuencoded files contained in HTML pages.
- <XMP>, <LISTING> and <SCRIPT> elements.
The contents of these elements are left unchanged. No wordwrap, entities
conversion, or TAGS parsing is done at all. If the 7BIT option is used,
ASCII characters lower than 32 are converted to spaces. ASCII characters
higher than 128 are left as they are and Win'95 entities are not remapped.
Remember this, if the converted text will go on message areas, like Fidonet
or Usenet newsgroups!
9. How to contact the Author
----------------------------
Besides any communication, problem, bug report, advice or anything else, a
comment about HTTX will be appreciated, as will news of donations made to
organizations that take care of children (see CHILDWARE).
E-Mail : favrin@tin.it
FidoNet: 2:333/726.8
Please write to me in Italian or English, thank you.
HTTX support page:
http://www.freeweb.org/varie/poing/soft/httx/index.html
10. Greetings
-------------
Beta testing:
William Parker bill@amitrix.com
Giuseppe Pasanisi amicus@net-service.net
Neil Bothwick aweb@wirenet.co.uk
Marco L. Buschini shido@mclink.it
Claudio Mazzuco kirk@maya.dei.unipd.it
Giuseppe Ammendolia ryuga@freenet.hut.fi
English documentation and spell check:
Fabio Belli zak@anturio.com
William Parker bill@amitrix.com
Beta testing of AWeb Plugin:
William Parker bill@amitrix.com
Neil Bothwick aweb@wirenet.co.uk
Dale Currie dalec@zorro.amitrix.com
Special thanks:
To Yvon Rozijn for having wanted HTTX inside AWeb II and for all the help he
gives me!
To Bill Parker for all the help, his suggestions and his friendship.
To Wouter van Oortmerssen for the splendid AmigaE and to Tomasz Wiszkowski
for its evolution, CreativE.
To Enrico Altavilla for his programming hints.
To the Italian music group Pooh (http://www.pooh.it) for the emotions they
have given me for years with their music. Maybe HTTX was born thanks to them
too...
Finally special thanks to those who wrote me about HTTX and to those who use
it!
11. Program history
-------------------
V2.0b (December 1999)
-Fixed a bug that could cause insertion of incomplete ANSI codes
in standard ANSI conversion.
+Made the source compatible with the new CreativE compiler.
V2.0 (September 1999)
+Reorganized in a modular way the HTML parser, now faster and more
expandable.
+Style HTML tags are now recognized only when ANSI conversion is enabled,
thus making non ANSI conversion a bit faster.
+Enlarged the file writing buffer. This makes HTTX faster on fast devices
(such as RAM or SCSI HDs) but slower on slow ones (like Zip drives).
+Added full support for alphabetical and mixed (numeric/alphabetical)
lists.
+Added options ANSIMODE (choice of the used ANSI mode), ANSICOL (colour of
the links), TDEOL (insertion of an EOL between the cells of a table) and
SETNOTE (type of the file note added to the saved file).
+Reorganized the display of current settings.
+Content of the <SELECT> element is now skipped.
+If ANSI conversion is enabled, content of the <TH> element is now rendered
in bold.
+Added an easter egg ;-)
-Fixed (hopefully) all the wordwrap bugs and their side effects, such as
duplication of part of the text, bad line length sizing, and so on...
-If the SCRIPT option was enabled, a wrong output file length could be
shown.
-After an ANSI reset, with underlined text (<U> HTML element), text could
be shown in underlined blue.
-Improved ANSI compatibility while printing: now starting spaces are always
preceded by an ANSI reset sequence.
-The settings shown when no argument was given were wrong.
-Error 'ENV config 'NAME' not found.' could be shown even with
options QUIET or STDIO.
-Each cell in a table could contain only one <HR>.
-FILENOTE option disabled insertion of URL and TITLE of the document even
when NOHEADER option wasn't used.
-When printing, links were shown in yellow.
V1.7a (June 1998)
+Added alias PRT for PRINT.
-Fixed a dangerous bug that could cause HTTX to crash when a (too) long
line was aligned.
-Fixed an error in wordwrap that caused layout problems after some lines.
V1.7 (January 1998)
+Rewrote various important parts of the source code, such as wordwrap and
ANSI management, in a clearer way.
+Made the HTML parser more SGML compliant.
+Improved table support by managing each table cell as a separate document.
+Added support for hex numeric entities (&#xnn) up to 255 and Win'95
numeric entities and special characters in the range 128-159.
+Added support for <SELECT> and <XMP>, and improved support for <LISTING>.
+Now text inside a <BLOCKQUOTE> (or <BQ>) element is indented.
+In <PRE> mode numeric entities (&#nn) lower than 32 are left as they are.
+Added options SCRIPT and INCLUDE.
+Now when option NOHEADER is used the HTTX version string is not added.
+Improved support for bad HTML coding.
+And many other things!
-Fixed various small cosmetic bugs and bugs related to bad HTML coding in
documents.
-Fixed many small problems with the old wordwrap (the most relevant was
that if a line didn't contain spaces and ended with an entity, wordwrap
didn't work correctly).
-Numeric entities lower than ASCII 32 weren't converted to spaces.
-In case of bad arguments, false error messages were displayed.
-Last character of file could be skipped in some cases.
-Various other bugs wiped out!
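As an illustration of the entity handling mentioned in this history
(decimal and hex numeric entities up to 255, codes lower than 32 turned into
spaces, entities accepted even without the closing ";"), here is a minimal
Python sketch. It is NOT taken from the HTTX source: the function name
decode_entities, the tiny NAMED table and the choice to leave unknown or
out-of-range entities untouched are assumptions made for the example. <PRE>
mode and the Win'95 range 128-159 are deliberately ignored.

```python
import re

# Hypothetical, deliberately tiny table of named entities (assumption).
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"',
         "egrave": "\u00e8", "eacute": "\u00e9"}

# Hex, decimal or named entity; the closing ";" is optional.
ENTITY_RE = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z]+);?")


def decode_entities(text):
    """Replace the entities this sketch knows about, leave the rest as-is."""
    def repl(match):
        body = match.group(1)
        if body.startswith("#"):
            if body[1] in "xX":
                code = int(body[2:], 16)   # hex form: &#xnn
            else:
                code = int(body[1:])       # decimal form: &#nn
            if code > 255:
                return match.group(0)      # out of range: keep the text
            return " " if code < 32 else chr(code)
        # Named entity: unknown names are kept unchanged.
        return NAMED.get(body, match.group(0))

    return ENTITY_RE.sub(repl, text)
```

For example, decode_entities("HTTX &egrave bello") gives "HTTX è bello" even
though the entity is not closed, and decode_entities("a&#9;b") gives "a b".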
V1.5 (May 1997)
+Greatly sped up the program, thanks to a complete rewrite of the HTML
parser and optimization of many functions.
+Added options HRMODE, NOALIGN, PRINT, APPEND, NOCFG, CFG.
+Added alignment support (center and right) of text and separators.
+Added support for various TAGS and HTML attributes (like START in numbered
lists).
+Optimized the ANSI output.
+Added support for ANSI separators.
+Added support for external configuration.
+Improved entity support: entities are now identified regardless of the
closing character.
+Clearer HTTX internal error codes. Custom DOS error messages replaced
with AmigaDOS PrintFault().
+Added support for "HTTX source_filename destination_path".
+Improved support of separators in tables.
+HTTX now follows the HTML DTD more faithfully for many TAGS.
-Fixed some bugs in wordwrap.
-Fixed all reported bugs in version 1.1b.
+... Many many other improvements!!!
V1.1b (January 1997)
-Fixed management of some entities.
V1.1a (January 1997)
-Removed a stupid and rare bug in TAGS management.
-Fixed support for MAC EOLs.
V1.1 (November 1996)
+Improved speed.
+Added STDIO, QUIET, BADHTML, NOHEADER, GETNOTE options.
+Added AmigaDOS return codes support.
+Added <ADDRESS> and <LISTING> TAG support.
+Modified <BR> and <LI> management, as required by the HTML 3.2 standard.
+Rewrote entity management: entities are now converted even if they are
not closed with ";".
+Extended 7 bit conversion, now faster, more complete (almost all
characters are converted) and more accurate (no accented letters are left
in words; for example, "HTTX è bello" becomes "HTTX e` bello" while
"Belphégor" becomes "Belphegor").
+By popular demand, the 7BIT option is now OFF by default.
+Improved support for multiline comments and badly written HTML.
+Added support for TAGS or ALT-Text that contain LF or "<" and ">".
-Fixed many small cosmetic bugs in conversion (double spaces in some cases,
bad characters when exiting from <PRE> and <SCRIPT>, and others).
-Fixed an error that could leave a lock on the file to convert.
-Fixed an error in FILENOTE option.
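The two 7 bit conversion examples given for this version ("HTTX è bello"
becoming "HTTX e` bello", "Belphégor" becoming "Belphegor") can be
reproduced with the small Python sketch below. It is only a guess at the
rule, not the HTTX algorithm: the function name to_7bit, the rule "keep the
accent as a trailing mark only when the accented letter is not followed by
another letter" and the "?" fallback are all assumptions inferred from those
two examples.

```python
import unicodedata

# Combining grave/acute accents mapped to ASCII marks (assumed mapping).
ACCENT_MARKS = {"\u0300": "`", "\u0301": "'"}


def to_7bit(text):
    """Convert text to 7 bit ASCII, following the assumed rule above."""
    out = []
    for i, ch in enumerate(text):
        if ord(ch) < 128:
            out.append(ch)                 # already 7 bit
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        base = decomposed[0]
        if ord(base) >= 128:
            out.append("?")                # no ASCII base letter (assumption)
            continue
        out.append(base)
        following = text[i + 1] if i + 1 < len(text) else " "
        if not following.isalpha():
            # Accented letter ends the word: keep the accent as a mark.
            out.append(ACCENT_MARKS.get(decomposed[1:2], ""))
    return "".join(out)
```

With this rule an accent inside a word is simply dropped, while an accented
letter at the end of a word (or standing alone) keeps a visible mark.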
V1.0 (July 1996)
First public release.
12. Future versions
-------------------
HTTX is a program in continuous growth: because I use it daily, I notice
shortcomings and possible improvements. This version will be the basis for
a new, even better version that will be out... soon.
-----------------------------------------------------------------------------