README.8bit (1990-12-28)
SUMMARY
I have modified GNU Emacs version 18.55 to handle many 8-bit
character sets, including the ISO 8859 character sets. For each
character, it is possible to customize the byte(s) sent to the
terminal to display that character. X11R4 is also supported, to
an extent. Case determination, case changing, and sorting can
all be customized. Input facilities are primitive.
DISCOURAGEMENT
Emacs version 19 will support 8-bit character sets. That
support is based on my modifications, but there will probably be
some differences between the 8-bit character set support in this
modified version 18.55 and the support in version 19.
Therefore, if you can wait for version 19 I urge you to do so.
Richard Stallman says he does not know when version 19 will be
available.
This is alpha-test software. It has known bugs. I'm posting it
to alt.sources to emphasize that, and to avoid having it
archived. If you don't know your way around GNU Emacs, please
don't try to install it. I don't have time to provide support.
(But please send bug reports anyway.)
Input support is primitive. X windows support is for X11 only,
and is incomplete.
CHARACTER SETS SUPPORTED
So I haven't scared you off yet. OK, you were warned. My
modifications allow GNU Emacs to handle any character set
provided that each character is represented by exactly one 8-bit
byte, and the codes for space, newline, and horizontal tab are
the same as in ASCII. Now for some definitions.
DEFINITIONS
A glyf is something that takes up exactly one position on the
display of a terminal, terminal emulator, or window system. For
example, 'a' is a glyf, as is a yellow, blinking, underlined '7'
on a red background. It may be necessary to transmit many bytes
to a terminal to display one glyf. A rope is a sequence of
glyfs. (The name is an analogy to string, which is a sequence
of characters.) For example, the glyf '^' followed by the glyf
'C' forms a rope of length 2. Glyfs are represented as unsigned
16-bit integers. Ropes are represented as vectors of glyfs.
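To make the definitions concrete, here is a small Python model.
It is not the C code in the patches: it only assumes what is
stated above (a glyf is an unsigned 16-bit integer, a rope is a
vector of glyfs). The names make_rope and rope_len are invented
for this illustration.

```python
# Illustrative model only; the real implementation is C inside Emacs.
# make_rope and rope_len are hypothetical names.

GLYF_MAX = 0xFFFF  # glyfs are unsigned 16-bit integers


def make_rope(text):
    """Build a rope (list of glyfs) from a string, one glyf per character."""
    rope = [ord(ch) for ch in text]
    for glyf in rope:
        if not 0 <= glyf <= GLYF_MAX:
            raise ValueError("glyf out of 16-bit range: %d" % glyf)
    return rope


def rope_len(rope):
    """Number of display positions the rope occupies."""
    return len(rope)


control_c = make_rope("^C")  # the glyf '^' followed by the glyf 'C'
print(rope_len(control_c))
```

The rope for "^C" has length 2, matching the example in the
text.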
CHAR TABLES
There's a new lisp object: char tables. A char table specifies,
for each 8-bit character, the rope to use to display that
character. Char tables are associated with windows, not
buffers, so one buffer can be displayed in several different
windows with several different char tables.
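A rough Python sketch of the idea follows. It models a char
table as a 256-entry list of ropes and shows one buffer
displayed through two different tables. The function names and
the default contents (printable ASCII as itself, everything else
in octal) are assumptions made for the example, not the actual
lisp interface.

```python
# Sketch only: a char table as a 256-entry list of ropes, one per byte.
# rope, default_char_table and display are hypothetical names.

def rope(text):
    """A rope is a list of glyfs; here each glyf is a character code."""
    return [ord(ch) for ch in text]


def default_char_table():
    """Assumed defaults: printable ASCII shows as itself, the rest in octal."""
    table = []
    for code in range(256):
        if 0x20 <= code < 0x7F:
            table.append([code])
        else:
            table.append(rope("\\%03o" % code))
    return table


def display(table, data):
    """Concatenate the ropes for each byte of DATA."""
    out = []
    for byte in data:
        out.extend(table[byte])
    return out


# Two windows on the same buffer, each with its own char table.
buffer_bytes = bytes([0x41, 0xC0])     # 'A', then A with grave accent
ascii_window = default_char_table()
ascii_window[0xC0] = rope("{`A}")      # ASCII terminal: show as a rope
octal_window = default_char_table()    # leave the octal default

shown_ascii = "".join(chr(g) for g in display(ascii_window, buffer_bytes))
shown_octal = "".join(chr(g) for g in display(octal_window, buffer_bytes))
print(shown_ascii)
print(shown_octal)
```

The buffer contents are the same in both cases; only the table
attached to the window decides what reaches the screen.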
CASE TABLES
Case tables, another new lisp object, specify for each 8-bit
character its case: upper, lower, or none.
SORT TABLES
Sort tables, another new lisp object, specify for each 8-bit
character its sorting position. Sort tables are also used for
searching. Special sort tables can be set up, for example, to
ignore diacritical marks when searching.
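As an illustration of searching through a sort table, the Python
sketch below gives a few accented ISO 8859/1 letters the same
position as their base letter, so that a search for "cafe" also
finds "caf" followed by e with acute accent. The function names
and the particular letters folded are assumptions for the
example; they are not taken from the sort tables in the patches.

```python
# Sketch only: a sort table as a 256-entry list of sorting positions.
# identity_sort_table, fold_diacritics and search are hypothetical names.

def identity_sort_table():
    return list(range(256))


def fold_diacritics(table):
    """Give some accented ISO 8859/1 letters the position of the base letter."""
    folded = list(table)
    for code in (0xE0, 0xE1, 0xE2, 0xE3, 0xE4, 0xE5):   # a with accents
        folded[code] = ord("a")
    for code in (0xE8, 0xE9, 0xEA, 0xEB):               # e with accents
        folded[code] = ord("e")
    for code in (0xF2, 0xF3, 0xF4, 0xF5, 0xF6):         # o with accents
        folded[code] = ord("o")
    return folded


def search(table, haystack, needle):
    """Return the first index where NEEDLE matches HAYSTACK under TABLE, or -1."""
    key = [table[b] for b in needle]
    mapped = [table[b] for b in haystack]
    for start in range(len(mapped) - len(key) + 1):
        if mapped[start:start + len(key)] == key:
            return start
    return -1


folding = fold_diacritics(identity_sort_table())
text = "en caf\xe9".encode("latin-1")       # ends in e with acute accent
found_folded = search(folding, text, b"cafe")
found_exact = search(identity_sort_table(), text, b"cafe")
print(found_folded, found_exact)
```

A table like this suits searching only. For Swedish alphabetical
order the accented letters need their own positions.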
TRANS TABLES
Finally, trans tables are lisp objects that map each 8-bit
character into some other character. They are used for case
conversion, and can also be used for character set conversion.
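A trans table can be pictured as a 256-byte lookup table. The
Python sketch below builds an upper-casing table for ISO 8859/1
and applies it with bytes.translate. The name make_upcase_table
is invented here, and the table is built from the ISO 8859/1
code chart rather than copied from the patches.

```python
# Sketch only: a trans table as a 256-byte mapping, applied with
# bytes.translate.  make_upcase_table is a hypothetical name.

def make_upcase_table():
    table = bytearray(range(256))          # start from the identity mapping
    for code in range(ord("a"), ord("z") + 1):
        table[code] = code - 0x20
    for code in range(0xE0, 0xFF):         # ISO 8859/1 lower-case letters
        if code != 0xF7:                   # 0xF7 is the division sign
            table[code] = code - 0x20
    return bytes(table)


upcase = make_upcase_table()
word = b"sm\xf6rg\xe5s"                    # o with umlaut, a with ring
upper = word.translate(upcase)
print(upper)
```

The same mechanism covers character set conversion: fill the
table with the target code for each source code instead.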
ISO 8859/1 SUPPORT
I include support for displaying ISO 8859/1 characters. On
ASCII terminals they display as various ropes, e.g. A with grave
accent displays as {`A}. If your terminal can display some of
the characters correctly, e.g. by using shift-out and shift-in,
then you can write a lisp/term file to do that. I include as an
example lisp/term/fa4440a.el for the Facit 4440 Twist terminal
with a Swedish PROM. If your terminal (emulator) provides full
ISO 8859/1, you can just send 8-bit characters to it directly.
See the code in lisp/term/x-win.el starting with "(if (fboundp
'get-glyf)" for an example.
SWEDISH SUPPORT
I include support for Swedish as an example of language
support. This includes a swedish mode analogous to text mode,
and sort tables for Swedish alphabetical order.
INPUT
Input is kludgy. The file lisp/iso8859-1-insert.el defines
little functions to insert each non-ASCII ISO 8859/1 character.
These are put into the global keymap under C-x 8, which is
supposed to be mnemonic for 8859. So e.g. "C-x 8 ` A" runs
insert-A-grave. This is OK for infrequently used characters,
but for those you use often I suggest you use programmable keys
on your terminal, if possible. For example, Swedish uses o with
umlaut a lot, so I have one of the programmable keys on my
terminal set up to transmit "C-q 3 6 6". Using C-q also means
this works with e.g. incremental search, not just for
inserting.
Here's what I do on my Facit 4440 Twist:
1) Press Setup
2) Press 5 to enter Setup B mode
3) Press F4 C-q 3 4 5 C-Return
   Press F5 C-q 3 4 4 C-Return
   Press F6 C-q 3 6 6 C-Return
   Press F7 C-q 3 5 1 C-Return
   Press F8 C-q 3 7 4 C-Return
   Press Shift-F4 C-q 3 0 5 C-Return
   Press Shift-F5 C-q 3 0 4 C-Return
   Press Shift-F6 C-q 3 2 6 C-Return
   Press Shift-F7 C-q 3 1 1 C-Return
   Press Shift-F8 C-q 3 3 4 C-Return
4) Press S to save everything in nonvolatile memory.
This puts a with ring on function key 4, a with umlaut on F5, o
with umlaut on F6, e with acute accent on F7, and u with umlaut
on F8. The shifted keys give the corresponding upper-case
letters.
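The three digits after C-q are the octal code of the character
in ISO 8859/1. The short Python check below decodes the codes
used in the key setup above and prints the standard name of each
character; the dictionary and the function name are just for
this illustration.

```python
import unicodedata

# Each programmed key sends C-q followed by three octal digits.
keys = {
    "F4": "345", "F5": "344", "F6": "366", "F7": "351", "F8": "374",
    "Shift-F4": "305", "Shift-F5": "304", "Shift-F6": "326",
    "Shift-F7": "311", "Shift-F8": "334",
}


def decode(octal_digits):
    """Return the ISO 8859/1 character for a three-digit octal code."""
    return bytes([int(octal_digits, 8)]).decode("latin-1")


for key, digits in keys.items():
    code = int(digits, 8)
    name = unicodedata.name(decode(digits))
    print("%-9s %s  0x%02X  %s" % (key, digits, code, name))
```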
X WINDOWS SUPPORT
Only X11 is supported, not X10. I've only tried this on X11R4.
Eventually, the idea is for each glyf, which is really just an
unsigned 16-bit integer, to be treated as two bytes. The low
order byte selects one face code in a font, for example 'g'.
The high order byte selects a graphic context (GC). But for
now, there's only one GC.
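The intended split of a glyf into two bytes can be sketched in
Python as below. The function names are invented for the
example, and since there is only one GC for now, the high byte
is always 0 in practice.

```python
# Sketch only: split_glyf and join_glyf are hypothetical names.

def split_glyf(glyf):
    """Return (gc_index, font_code) for a 16-bit glyf."""
    return (glyf >> 8) & 0xFF, glyf & 0xFF


def join_glyf(gc_index, font_code):
    """Pack a GC index (high byte) and a font code (low byte) into a glyf."""
    return ((gc_index & 0xFF) << 8) | (font_code & 0xFF)


plain_g = join_glyf(0, ord("g"))     # the only GC available for now
print(split_glyf(plain_g))
```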
For input of frequently-used characters I just hacked
stringFuncVal in src/x11term.c. You may wish to do the same.
Many of the X11R4 fonts advertised as ISO 8859/1 don't really
contain all the characters; 7x14 does, so that's what I use for
now. Here's another font to try:
>From: jw@sics.se (Johan Widen)
>Newsgroups: comp.windows.x
>Subject: eightbit version of the 'fixed' font available
>Message-ID: <1990Mar9.164011.1775@sics.se>
>Date: 9 Mar 90 16:40:11 GMT
>Distribution: comp
>Organization: Swedish Institute of Computer Science, Kista
>
>An eightbit version of the X11R4 'fixed' font (also known as 6x13) is available
>for anonymous ftp from
> sics.se (192.16.123.90)
>in the compressed tar file
> archive/fixed.bdf.Z
>
>The glyphs below 128 are unchanged. The ISO-8859-1 characters from 160 to 255
>have been added.
>
>I'm interested in any improvements/fixes that you make to this font.
>
>--
>Johan Widen
>SICS, PO Box 1263, S-164 28 KISTA, SWEDEN Internet: jw@sics.se
>Tel: +46 8 752 15 32 Ttx: 812 61 54 SICS S Fax: +46 8 751 72 30
OTHER APPLICATIONS
These modifications have uses other than supporting 8-bit
character sets. The file lisp/emphasis.el uses the high bit to
indicate emphasis, e.g. underlining, of 7-bit ASCII. A hook in
lisp/man.el then displays italicized text in manual entries with
emphasis if possible.
The file lisp/rot13.el contains a disgusting hack that displays
a buffer in another window, but with a rot13 char table. I
really use this when reading rec.humor.funny with Gnews.
If you don't like unprintable characters to be displayed in
octal, you can change to hex or whatever.
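Both tricks come down to changing entries in a char table
without touching the buffer. The Python sketch below models a
rot13 table and a hex rope for one unprintable byte. All names
are invented for the example, and the identity table is a
simplification of whatever the real defaults are.

```python
# Sketch only: display tables that change how bytes look, not the buffer.
# identity_char_table, rot13_char_table, hex_rope and show are
# hypothetical names.

def identity_char_table():
    """One single-glyf rope per byte (a simplification of the real defaults)."""
    return [[code] for code in range(256)]


def rot13_char_table():
    """Char table that displays each ASCII letter rotated by 13 places."""
    table = identity_char_table()
    for base in (ord("A"), ord("a")):
        for offset in range(26):
            table[base + offset] = [base + (offset + 13) % 26]
    return table


def hex_rope(code):
    """Rope showing CODE as backslash, x, and two hex digits."""
    return [ord(ch) for ch in "\\x%02X" % code]


def show(table, data):
    """Render DATA through TABLE as a string of glyfs."""
    return "".join(chr(glyf) for byte in data for glyf in table[byte])


buffer_bytes = b"Uryyb\x9b"
table = rot13_char_table()
table[0x9B] = hex_rope(0x9B)
shown = show(table, buffer_bytes)
print(shown)
print(buffer_bytes)      # the buffer itself is unchanged
```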
RELATED SOFTWARE
My cz system lets you print ISO 8859/1 text on PostScript
printers. It interfaces to GNU Emacs. To get it, get these
articles from your nearest comp.sources.misc archive:
cz          comp.sources.misc volume 8  issues 65-75, 77-78  ( 1 Oct 1989)
                                        issue  97            (28 Oct 1989)
libhoward   comp.sources.misc volume 8  issues 80-87         ( 1 Oct 1989)
                                        issue  96            (28 Oct 1989)
BUGS
It should be possible to format texinfo files into info files by
doing this (e.g. for cl.texinfo):
    % cd man; emacs -batch -funcall batch-texinfo-format cl.texinfo
    texinfo formatting /usr/local/free/gnu-emacs/18.55i/man/cl.texinfo...
    Formatting Info file...
    Making tags table for Info file...
    >> Error: (void-variable This)
    >> point at
    >> Info file: cl, -*-Text-*-
    >> produced by texinfo-format-buffer
    >> from file: cl.texinfo
    >> Copyright (C
But that gives the error shown. However this works:
    % emacs -batch -load info -funcall batch-texinfo-format cl.texinfo
To the first person who supplies me with a fix for this bug, I
offer a color portrait of the Swedish Royal Family, with a
genuine Swedish postage stamp on the other side.
INSTALLATION
Start with a copy of GNU Emacs 18.55 as distributed. Parts 1
through 4 are shar archives; unshar them.
Two of the lisp files have high-order bits set. They are
encoded with Brad Templeton's abe system, which was posted to
comp.sources.misc on 4 June 1989 as volume 7, issues 1 and 2,
archive name abe. To extract them, you must have the dabe
command. Do:
    % cd lisp
    % dabe el.abe
    % cd ..
Parts 5 through 12 are context diffs. Parts 11 and 12 are
together the diffs to man/emacs.tex; they must be concatenated.
Apply the diffs with patch.
Now install Emacs as usual. When byte-recompiling the elisp
code, it may be necessary to load case-table.el, char-table.el,
sort-table.el, and trans-table.el first. Be sure to
byte-compile all the new .el files you intend to use. Here's
the complete list:
case-table.el
char-table-vt100.el
char-table.el
emphasis.el
iso8859-1-ascii.el
iso8859-1-insert.el
iso8859-1-swedish.el
iso8859-1.el
rot13.el
sort-table.el
swedish.el
trans-table.el
term/id100.el
term/fa4440a.el
term/fa4440b.el
You'll probably want to load some character set and language
support from lisp/site-init.el. For example, ours starts like
this:
    (load "iso8859-1")
    (garbage-collect)
    (load "iso8859-1-insert")
    (garbage-collect)
    (load "swedish")
    (garbage-collect)
CHANGES
Here's a brief summary of what I changed in each file. In src:
abbrev.c:       expand-abbrev: Use casetab.h macros. Use HYPHEN.
alloc.c:        GC case, char, sort, and trans tabs.
buffer.c:       reset_buffer_local_variables: Initialize case_table_v,
                etc. Drop selective_display_ellipses.
buffer.h:       Add case_table_p, etc. & buffer_char_table. Drop
                ctl_arrow.
casefiddle.c:   casify_object & casify_region: Use casetab.h macros.
config.h-dist:  Add 30000 to PURESIZE.
cmds.c:         Use chartab.h macros.
data.c:         Add arg_out_of_range.
dired.c:        Use standard_downcase_table_p instead of
                downcase_table.
dispextern.h:   Change char to glyf_t.
dispnew.c:      Use chartab.h macros. Change char to glyf_t. Check
                for X windows in chartab.c now.
editfns.c:      Use casetab.h & chartab.h macros.
emacs.c:        Call init_case_table_once, init_char_table_once,
                syms_of_case_table, and syms_of_char_table.
fileio.c:       #include casetab.h
fns.c:          Add string-lessp*.
indent.c:       Use chartab.h macros. Use char table to compute
                lengths instead of hard code. Drop
                selective_display_ellipses.
keyboard.c:     Use ROPE_LEN to check if direct insertion OK.
lisp.h:         Move case macros to casetab.h. Add Lisp_Chartab and
                related definitions.
minibuf.c:      Use casetab.h macros.
process.c:      Use transtab.h macros.
print.c:        Print out char tables.
regex.c:        Drop translate.
regex.h:        Use sort table when compiling pattern.
scroll.c:       lisp.h must be included before dispextern.h.
search.c:       Remove downcase_table & compute_trt_inverse.
                syms_of_search: Remove initialization of
                downcase_table. Use NEWLINE.
term.c:         char -> glyf.
termchar.h:     Replace vector DCICcost by function.
termhooks.h:    {insert,write,delete}_chars_hook ->
                {insert,write,delete}_glyfs_hook
window.c:       Add window-char-table & set-window-char-table. Save
                char tables for saved windows.
window.h:       Add window_char_table.
xdisp.c:        Use chartab.h macros. char -> glyf. Drop
                selective_display_ellipses.
x11term.c:      char -> glyf.
ymakefile:      Add new files and include dependencies.
In lisp:
keypad.el:      Add backtab code. Comments.
man.el:         Add manual-entry-hook. Default to
                default-manual-entry-hook, which removes underlining
                and overstriking.
mlconvert.el:   Changing control-code display is different.
rmail.el:       Run rmail-get-new-mail-hook after getting new mail.
sendmail.el:    Run mail-send-hook just before sending mail.
sort.el:        string< -> string-lessp*
text-mode.el:   (provide 'text-mode)
term/x-win.el:  Direct-map high-order ISO 8859 bits.
In etc:
NEWS
makedoc.com
In man:
emacs.tex
EMAIL
Here's how I read and send email in ISO 8859/1 while still
living in a 7-bit (ISO 646) world. I run Chip Salzenberg's
deliver program. My .deliver file looks like this:
    cat $HEADER $BODY | 78seus | deliver -n "$1"
    echo DROP
(OK, I'm lying. My real .deliver file also saves a copy of
incoming messages. Also, it has absolute path names to 78seus
and deliver, because they're not in /usr/bin. But you get the
idea.)
The 78seus filter is part of my cz system (see above). It
converts mixed English and Swedish to ISO 8859/1. Cz also has
one for Danish, plus a paper on how to make your own.
I then read mail with GNU Emacs rmail mode, as usual.
When sending mail I write it in ISO 8859/1 in Emacs sendmail
mode. Just before sending it, sendmail runs mail-send-hook,
which is set in lisp/swedish.el to call the function
8859-to-swascii-buffer. This function maps the ISO 8859/1 to
ISO 646.
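For readers unfamiliar with the 7-bit convention, the Python
sketch below shows one way such a mapping can work. It assumes
the usual Swedish national variant of ISO 646, in which the
brackets, braces, backslash, and vertical bar stand for the six
Swedish letters. It is not a transcription of
8859-to-swascii-buffer, whose exact table is in lisp/swedish.el;
the name to_swascii is invented here.

```python
# Sketch only, assuming the common Swedish ISO 646 variant.
# to_swascii and SWEDISH_646 are hypothetical names.

SWEDISH_646 = {
    0xC5: "]",    # A with ring
    0xC4: "[",    # A with umlaut
    0xD6: "\\",   # O with umlaut
    0xE5: "}",    # a with ring
    0xE4: "{",    # a with umlaut
    0xF6: "|",    # o with umlaut
}


def to_swascii(data):
    """Map ISO 8859/1 bytes to 7-bit Swedish ISO 646 bytes."""
    table = bytearray(range(256))
    for code, ch in SWEDISH_646.items():
        table[code] = ord(ch)
    out = data.translate(bytes(table))
    if any(byte > 0x7F for byte in out):
        raise ValueError("character has no 7-bit equivalent in this table")
    return out


seven_bit = to_swascii(b"k\xf6tt och r\xe4kor")
print(seven_bit)
```

The mapping loses information, since a literal brace or bracket
in the message becomes indistinguishable from a Swedish letter
on the receiving side.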
Deliver was posted to comp.sources.unix on 16 October 1989 as
volume 20, issues 23 through 26, archive name deliver2.0. These
are the patches I know about:
1 comp.sources.unix volume 20 issue 27 (16 Oct 1989)
2 comp.sources.bugs,comp.mail.misc 15 Dec 1989
3 comp.sources.bugs,comp.mail.misc 15 Dec 1989
4 comp.sources.bugs,comp.mail.misc 15 Dec 1989
5 comp.sources.bugs,comp.mail.misc 19 Dec 1989
6 comp.sources.bugs,comp.mail.misc 19 Feb 1990
7 comp.sources.bugs,comp.mail.misc 7 Mar 1990
8 comp.sources.bugs,comp.mail.misc 7 Mar 1990
9 comp.sources.bugs,comp.mail.misc 7 Mar 1990
--
Howard Gayle
TN/ETX/TT/HL
Ericsson Telecom AB
S-126 25 Stockholm
Sweden
howard@ericsson.se
uunet!ericsson.se!howard
Phone: +46 8 719 5565
FAX : +46 8 719 8439