translator.library - Version 42.1
Copyright (c) 1995 Francesco Devitt
April 10, 1995
29a Kinghorne St
Strathmore             Telephone: +64 4 388 3215
Wellington             Internet:  ffranc@comp.vuw.ac.nz
New Zealand
1 Status
Version 42.1 of the translator library is not in the public domain. Source
is not available. The library and accent files are freely distributable
provided no profit is made from them. Accent files may have additional
or separate restrictions placed on them by their authors. Note that the
American and English files have been derived from work by the Naval
Research Laboratory, USA - these may not be used commercially.
2 Introduction
With versions of the Amiga OS before 2.1 Commodore supplied
text-to-speech software on the Workbench discs. This library replaces the
Commodore-supplied translator library. The original translates text to
phonemes for use with the narrator device. It is used by programs that
produce speech output such as `Say' and `Term'. Unfortunately for
non-American users the original library translates all text as if it were
American English. It does not handle other languages or dialects well.
Version 42.1 of the translator library is a drop-in replacement and
works with all software that currently uses the Commodore speech system.
With this version of the translator library the user can specify which
language the translator should use. The library translates text faster
than the original and, when provided with an accent file extracted from
the original library, to the same quality. It is not difficult to write
translation files for most languages - with the exception of languages
like English and French which have more exceptions than most. I hope that
users of this system will be motivated to create accent description files
for the language(s) they speak.
The accent may be selected with the use of the Translator preferences
program, or alternatively by the use of accent hints within the text to
be translated. See the section on the `scope' directive.
The translator is implemented as a shared library, so what is shared?
o A semaphore to control access to shared data
o A list of loaded accent files
o The loaded accent files themselves
o The library code
o ENV:Sys/Translator.prefs specifying the default accent
The following items are not shared. Each time the library is opened
new copies are made:
o The currently selected accent
o A stack of selected-accent scopes
o Other open library pointers that are needed
o The library base
3 Requirements
o For the translator library to be of any use the narrator device is
also necessary. This was supplied with the Amiga operating system
up to version 2.04. All versions of the device still appear to
work with 3.0 systems. Unfortunately the narrator device pronounces
everything in an American accent and does not support some phonemes
needed to pronounce some languages (including English). But at least
using the new translator library the pronunciation will be closer to
what it should sound like.
o This library works with any Amiga system with at least version
1.3 of the OS. There are three versions supplied, one that runs on
any system, one for systems with OS versions 2.04 or better, and
one for such systems with a 68020 or better processor.
o The locale library is used if available but is not required.
o The installation script requires Commodore's `Installer' program,
supplied with KS 2.1 and above, and available from Aminet.
o Note that the `Say' command provided with the Amiga OS 2.04
appears to have a bug in the GUI when used with KingCON. This
causes strange window update behaviour. I mention this because with
the release of this library it may be the first time you use the Say
command.
4 Installation
Click on the Install icon and follow directions. This will install the library
and associated files on to your Workbench disc named `SYS:'. This may
be a hard disc.
If you do not have the Installer program you are on your own. Basically
this is what you need to do:
Copy "XXX_translator.library" to Libs:translator.library
Copy #?.accent to LOCALE:Accents/
The version of library to install depends on your system. The `Libs/'
directory contains 3 library files. The library with a `v33' prefix
is designed for systems with any CPU and an OS version before 2.04
although it will work on any Amiga system. Prefix `v37' is for
systems with an OS version 2.04 or better and any CPU. `020' is
for systems with a 68020 or better and Amiga OS 2.04 or better.
For some reason the 020 version is larger than the v37 version.
It *is* very *slightly* faster though.
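The choice between the three library files can be sketched as a small
decision function. This is only an illustration in Python: the function
name `pick_library' is hypothetical, and the file names assume that the
`XXX' in "XXX_translator.library" is replaced by the prefix named above.

```python
def pick_library(os_version, cpu):
    """Suggest which supplied library file to copy to Libs:.

    os_version: OS version as a number, e.g. 1.3, 2.04 or 3.0
    cpu:        processor model number, e.g. 68000 or 68030
    File names are an assumption based on the "XXX_" prefix convention.
    """
    if os_version >= 2.04:
        if cpu >= 68020:
            return "020_translator.library"   # OS 2.04+ and 68020+
        return "v37_translator.library"       # OS 2.04+, any CPU
    return "v33_translator.library"           # works on any system


print(pick_library(3.0, 68030))
```

The `v33' file is the fallback because the text states it will work on
any Amiga system.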
The `Translator' preferences program and the `translate' and `flushlib'
utilities do not work with versions of the Amiga OS before 2.04.
5 Translator preferences program
The Translator prefs program does not work with pre-2.04 OS versions.
It is run by double clicking on its icon. In addition to allowing the user
to select the accent to be used, it will allow the user to have a test string
translated by the library and spoken by the narrator device. The translator
preferences program was made using GadToolsBox and is not localised.
This program also controls how a new accent may be selected in the
text to be spoken. By default braces are used to begin new scopes, and
a backslash and a space are used to delimit the names of accents. For example:
\english
Hello there my name is {\maori Hone Ropata}
and I am \maori{Maori.}
In this example the text is pronounced as English except for the
Maori words. A change of accent remains in effect until the scope defined
by the enclosing braces is closed. If a scope opens immediately after the
accent name, the change is treated as if it were made inside that scope
and ends when the scope ends, as in the `\maori{Maori.}' case above.
A change of accent outside any scope remains in effect until the
application using the library quits. Note that these changes are per
open library.
Note that the after-accent-name character may be specified as null
(i.e. "") which means any non-alphanumeric character terminates the
accent name. This is the default. To deactivate the change-accent feature
set all characters to the empty string.
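The scope rules can be illustrated with a small Python sketch that
splits marked-up text into runs, each tagged with the accent in force.
This is not the library's code: the name `split_by_accent' and the
starting accent "american" are hypothetical, and the sketch assumes the
default delimiters (braces, backslash, and an optional space after the
accent name).

```python
def split_by_accent(text, default="american"):
    """Split marked-up text into (accent, text) runs."""
    runs = []        # finished (accent, text) pairs
    stack = []       # accent saved at each opening brace
    accent = default
    buf = []

    def flush():
        chunk = "".join(buf).strip()
        buf.clear()
        if chunk:
            runs.append((accent, chunk))

    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            j = i + 1
            while j < len(text) and text[j].isalnum():
                j += 1
            name = text[i + 1:j]
            if not name:                # lone backslash: keep it as text
                buf.append(ch)
                i += 1
                continue
            flush()
            if j < len(text) and text[j] == "{":
                stack.append(accent)    # scope opens right after the name
                j += 1
            elif j < len(text) and text[j] == " ":
                j += 1                  # swallow the delimiting space
            accent = name
            i = j
        elif ch == "{":
            flush()
            stack.append(accent)        # remember accent for scope exit
            i += 1
        elif ch == "}":
            flush()
            if stack:
                accent = stack.pop()    # restore accent of outer scope
            i += 1
        else:
            buf.append(ch)
            i += 1
    flush()
    return runs


sample = "\\english\nHello there my name is {\\maori Hone Ropata}\nand I am \\maori{Maori.}"
for run in split_by_accent(sample):
    print(run)
```

Run on the example above, the sketch yields four runs: two English runs
and the two Maori phrases, with English restored after each closing brace.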
6 Utilities
NAME
accent
USAGE
accent [ SET | LOAD ] <language-file>
Systems with OS 2.04 or better should NOT use this command
as it does not allow the scope/accent codes to be specified. Use the
translator preferences instead. This command loads and/or sets the
current language. Loading a language brings it into the library's
internal cache. This is also useful when debugging a language file; the
language in the cache that has the same name will be replaced.
Setting a language indicates which language should be used and
modifies `ENV:Sys/Translator.prefs'. For pre-2.04 systems this utility and
setting `ENV:Sys/Translator.prefs' are the only ways of selecting the
accent.
NAME
extract
USAGE
extract [ <pre-V42-translator.library-file> ]
This command reads in the specified pre-V42 translator library (or
libs:translator.library) and extracts from it a language file that produces
the same output as the original translator library. The result is output
to standard output and may be redirected to "LOCALE:Accents/american.accent".
This command also works with libraries produced by `TLpatch'.
NAME
flushlib
USAGE
flushlib <library-name> [ REMOVE ]
Flushes the specified library from memory if currently un-opened.
If a process currently has the library open, it will be flushed when all
processes have closed it. The REMOVE keyword specifies that the library
should not be flushed, only removed from the Exec library list. This
should only be used if a library has crashed. This command does not
work with pre-2.04 systems.
NAME
translate
USAGE
translate [ LIB <library-name> ] text ...
Translates the text into phoneme form using the `translator.library'
(of whatever version). An alternative library may be specified. Note that
the alternative must be renamed: both its filename and the name stored
within the file must be changed, the latter using AZap or a similar
utility. This is because the system does not allow two libraries with the
same internal name to be loaded at the same time. This command does not
work with pre-2.04 systems.
7 Accent file format
The files are to be kept in `LOCALE:Accents/' and have a `.accent'
extension. They are named `XXX.accent' where the `XXX' is replaced with
`american', `italiano', `cymraeg', `english', `deutsch', `latinus', `jive',
etc. That is, it should be the written name of the language in the
language. At a later date they may be accompanied by `.voice' files which
will contain information for a new implementation of the narrator device
that someone (else) might write. Each line of the file may be one of four
things:
1. Blank lines are ignored
2. Comment lines beginning with `#' are ignored
3. Directives begin with `%'
4. All other lines are pronunciation rules
Spaces are ignored except where they separate arguments to commands.
Spaces are only significant if enclosed in double quotes (") or
escaped with a backslash (\). Characters that would otherwise have
special meaning may be used anywhere if they are preceded with a
backslash.
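The four line types can be told apart by the first non-space character.
A minimal Python sketch follows; the function name `classify_line' is
hypothetical, and the sketch assumes leading spaces may be ignored when
deciding the line type.

```python
def classify_line(line):
    """Return 'blank', 'comment', 'directive' or 'rule' for one line."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    if stripped.startswith("#"):
        return "comment"
    if stripped.startswith("%"):
        return "directive"
    return "rule"          # everything else is a pronunciation rule


for text in ["", "# Welsh accent", "%stress -2", "[computer] = {\\english computer}"]:
    print(repr(text), "->", classify_line(text))
```

A rule that needs to begin with a literal `#' or `%' would start with a
backslash escape, so it still falls through to the last case.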
7.1 Directives
Directives in an accent file are introduced by a percent character (%)
followed by the name of the directive and its arguments.
DIRECTIVE
stress
SYNTAX
%stress <N>
Syllabic stress can be automatically added to words that do not already
contain stress marks (digits). This command controls whether stress is
added and which syllable should be stressed. Stressed syllables have the
stress value of 4 added unless this is changed by the `emphasis'
directive.
For example, the English accent file contains a `%stress 1' command
which indicates that the 1st syllable is to have stress added.
Unfortunately this is only occasionally the right thing to do in the English
language! Other languages are more regular. Italian and Welsh nearly
always put the stress on the penultimate syllable.
Negative values for N indicate a count of syllables backward from the
end of a word. For example, -2 indicates that the penultimate syllable
should have stress added. Here are some examples:
%stress 1 # Stress the first syllable as in English
%stress -2 # Stress the penultimate syllable as in Welsh
%stress 0 # Do not automatically add stress
Words already containing stress-indicating digits, and words containing
a back-quote (`), will not have stress added. The symbol `#' has
special meaning in that it separates groups of phonemes into "words"
as far as the syllable count is concerned, but it does not appear in the
resulting phoneme string.
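How the value of N selects a syllable can be sketched in Python. Both
function names are hypothetical, and the behaviour for words with fewer
syllables than N asks for (no stress is added) is an assumption, not
something this manual specifies.

```python
def needs_auto_stress(phonemes):
    """True if the word has no stress digits and no back-quote."""
    return not any(ch.isdigit() or ch == "`" for ch in phonemes)


def stressed_syllable(n, syllable_count):
    """Return the 0-based index of the syllable to stress, or None.

    n > 0 counts from the start of the word, n < 0 from the end,
    and n == 0 switches automatic stress off.
    """
    if n > 0:
        index = n - 1
    elif n < 0:
        index = syllable_count + n
    else:
        return None
    # Assumption: words too short for the requested position get no stress.
    return index if 0 <= index < syllable_count else None


print(stressed_syllable(1, 3))    # first syllable of a 3-syllable word
print(stressed_syllable(-2, 3))   # penultimate syllable
```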
DIRECTIVE
emphasis
SYNTAX
%emphasis <N>
Specifies the level that the `stress' directive applies to words.
DIRECTIVE
class
SYNTAX
%class <name> <member> [ <member> ... ]
A class is a named list of character strings which may be tested against
in a pattern. The members may contain more than one character. For
example, vowels and suffixes may be declared as:
%class vowel a e i o u y
%class suffix e ely er ent
DIRECTIVE
complain
SYNTAX
%complain <level>
Specifies what should be done if the right-hand side of a rule is an
invalid phoneme string: 1 = do not check, 2 = give a warning for each
erroneous phoneme, and 3 = stop processing the language and consider
it an error. This command exists so that if a new narrator device is
made with extra phonemes, this library will still work.
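The three levels can be sketched as a simple checker in Python. The
phoneme set is taken from the list in section 7.3; the function name
`check_phonemes', the greedy two-character tokenising, and the use of an
exception for level 3 are all assumptions made for illustration.

```python
PHONEMES = {
    "IY", "IH", "EH", "AE", "AA", "AH", "AO", "UH", "ER", "OH", "AX", "IX",
    "EY", "AY", "OY", "AW", "OW", "UW",
    "R", "RX", "W", "M", "NX", "S", "F", "Z", "V", "CH", "/H", "B", "D",
    "K", "L", "LX", "Y", "N", "SH", "TH", "ZH", "DH", "WH", "J", "/C",
    "P", "T", "G", "DX", "Q", "QX",
    "UL", "IL", "UM", "IM", "UN", "IN",
}
SYMBOLS = set("123456789.?-,()`# ")


def check_phonemes(text, level=2):
    """Return the list of unrecognised characters in a phoneme string.

    level 1: do not check; level 2: collect every bad character
    (the caller would warn); level 3: raise ValueError on the first one.
    """
    if level == 1:
        return []
    bad = []
    i = 0
    while i < len(text):
        if text[i:i + 2] in PHONEMES:      # try two-character phonemes first
            i += 2
        elif text[i] in PHONEMES or text[i] in SYMBOLS:
            i += 1
        else:
            if level == 3:
                raise ValueError("invalid phoneme at position %d" % i)
            bad.append(text[i])
            i += 1
    return bad


print(check_phonemes("THER4TIY"))
```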
DIRECTIVE
separator
SYNTAX
%separator <string>
This command lists the *characters* used to separate words. The
default consists of space, end of line, tab, full stop, comma, question
mark, exclamation mark, colon, left/right parentheses and semicolon.
These may be redefined. For example:
%separator " \n\t,.()-"
Note that `\n' represents a new line character and `\t' a tab. Note
also that the whole string is in double quotes because it contains a space.
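The effect of the separator set on word splitting can be sketched in
Python. The function name `split_words' is hypothetical; the default
string below is built from the list of default separators given above.

```python
# space, end of line, tab, full stop, comma, ?, !, colon, ( ) and semicolon
DEFAULT_SEPARATORS = " \n\t.,?!:();"


def split_words(text, separators=DEFAULT_SEPARATORS):
    """Split text into words at any of the separator characters."""
    words, current = [], []
    for ch in text:
        if ch in separators:
            if current:                     # close the word in progress
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


print(split_words("well-known fact"))
print(split_words("well-known fact", " \n\t,.()-"))
```

With the redefined set from the `%separator' example, the hyphen becomes
a separator and `well-known' is split into two words.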
7.2 Context Rules
Rules are of the form:
<left-context> [ <match-string> ] <right-context> = <phonemes> OR { <text> }
When processing a character of a word, each rule whose match string
matches the text at that position has its left and right context patterns
compared with the text on either side. If they match, either the phonemes
on the right-hand side are output or the text enclosed in braces is
itself translated.
The match string must not contain pattern codes, only text.
THE ORDER OF CONTEXT RULES IN THE FILE IS SIGNIFICANT.
The rules are searched in the order they appear in the file. This
means that a general rule should appear after a more specific rule. It
also means that the language file may contain rules which can never be
used. These are called bugs!
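Why the order matters can be shown with a first-match lookup sketched in
Python. This is not the library's code: rules are represented as
hypothetical (left, match, right, phonemes) tuples, and the contexts here
are plain text with no pattern codes.

```python
def first_match(rules, word, pos):
    """Return the phonemes of the first rule that applies at word[pos]."""
    for left, match, right, phonemes in rules:
        end = pos + len(match)
        if (word.startswith(match, pos)
                and word[:pos].endswith(left)
                and word.startswith(right, end)):
            return phonemes            # search stops at the first hit
    return None


# Specific rule first, general rule second.
good_order = [("", "ch", "", "CH"), ("", "c", "", "K")]
# General rule first: the `ch' rule can never be reached.
bad_order = [("", "c", "", "K"), ("", "ch", "", "CH")]

print(first_match(good_order, "check", 0))
print(first_match(bad_order, "check", 0))
```

In the second list the general `c' rule shadows the `ch' rule, which is
exactly the kind of unreachable rule described above.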
The left and right contexts are strings which may contain pattern
codes. These include:
(<class>)     Must match one member of class
(<class>+)    Must match one or more members of class
(<class>*)    Must match zero or more members of class
(<class>~)    Must not be a member of class
$             Must be a number of word separator
Other characters must match exactly. Case is ignored; so `a' and `A' are
the same, as are `ë' and `Ë'. For example a right context consisting of
`(vowel+)(suffix)$' will match one or more vowels followed by a suffix
and then an end of word separator. If no rule is found for a particular
accented character then the character will be de-accented and the system
will try again.
A backslash may be used to include characters such as `$' and `('
that would otherwise have special meaning. Note that the left context
is searched from the matched string backwards. This means that the
pattern `a(vowel+)' can never match as a left context. The `(vowel+)'
will have consumed all the a's possible.
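The pattern codes can be illustrated with a small right-context matcher
in Python. It is a sketch under several assumptions: the name
`match_right_context' is hypothetical, repeated classes consume as much
as they can without backing off (as described for the left context),
`(<class>~)' is taken to consume one character that does not start a
class member, and `$' is taken to accept a separator or the end of text.

```python
import re

# One token per pattern code, escaped character, `$' or literal character.
TOKEN = re.compile(r"\((\w+)([+*~]?)\)|\\(.)|(\$)|(.)", re.S)


def match_right_context(pattern, text, classes, separators=" \n\t.,?!:();"):
    """Return True if `pattern' matches at the start of `text'."""
    text = text.lower()
    pos = 0

    def take_member(name, at):
        # Longest member of the class found at position `at', or None.
        for member in sorted(classes[name], key=len, reverse=True):
            if member and text.startswith(member, at):
                return member
        return None

    for name, op, escaped, dollar, literal in TOKEN.findall(pattern):
        if name:
            if op == "~":
                if pos >= len(text) or take_member(name, pos):
                    return False
                pos += 1
                continue
            count = 0
            while True:
                member = take_member(name, pos)
                if not member:
                    break
                pos += len(member)
                count += 1
                if op == "":            # plain (<class>) takes one member
                    break
            if count == 0 and op != "*":
                return False
        elif dollar:
            if pos < len(text) and text[pos] not in separators:
                return False
        else:
            ch = (escaped or literal).lower()
            if not text.startswith(ch, pos):
                return False
            pos += 1
    return True


classes = {"vowel": ["a", "e", "i", "o", "u", "y"]}
print(match_right_context("(vowel+)t$", "eat ", classes))
```

Left contexts would need the same tests applied backwards from the match
string, which this sketch does not attempt.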
The right-hand side of a rule may be recursive. If it starts with `{'
and ends with `}' the contents are not treated as phonemes. Instead
they are translated as text. Spaces within the braces are significant,
for example:
[1] = { one }
[2] = { two }
There may be language changing directives within the braces,
for example in the Italiano accent file the following might be
found:
[computer] = {\english computer}
7.3 Phonemes
The problem with me reproducing the list of what the phonemes sound
like, is that the examples are in English. eg: IY as in `beet'.
To tell what they really sound like type:
`echo > SPEAK:opt/a1 <phonemes>'
The phonemes used by Translator and Narrator systems are documented
in the Devices RKM. The list is reproduced here. Take special
note of the QX, Q and DX phonemes which are especially useful for
non-English languages.
There are some phonemes which are not supported by the device,
including the non-diphthongal U and aIR sounds. It also has problems
pronouncing the T in thirty properly. Both THER4TIY and THER4DIY
are pronounced as the latter. Doubling the T as in THER4TTIY sounds
fine.
Vowels
IY          bEEt, EAt
IH          bIt, In
EH          bEt, End
AE          bAt, Ad
AA          bArgain, tArget
AH          tUg, bUg, bUt, Up
AO          shORE, wAR
UH          bOOk, sOOt
ER          bIRd, EArly
OH          bOrder (sounds like the letter 'O' when used by itself)
AX          About (never stressed)
IX          solId (never stressed)
Diphthongs
EY          bAY, AId
AY          bIde, I
OY          bOY, OIl
AW          bOUnd, OWl
OW          bOAt, OWn
UW          brEW, bOOlean, pOO, crEW (except that it is a diphthong)
Consonants
R           Red
RX          Red (This is not mentioned in RKRM:Devs)
W           Wag
M           Men
NX          siNG
S           Soon
F           Fed
Z           haS, Zoo
V           Very
CH          CHeck
/H          Hole
B           But
D           Dog
K           Keg, Copy
L           Long
LX          Long (This is not mentioned in RKRM:Devs)
Y           Yellow
N           No
SH          SHy
TH          THin
ZH          pleaSure
DH          THen
WH          WHen
J           JuDGE
/C          supposedly loCH, or (German) baCH, but really like CHurCH
P           Put
T           Toy (except before IY when it is pronounced D)
G           Guest
Others
DX          piTY (tongue flap)
Q           kitt(Q)en (glottal stop)
QX          (Silent vowel - can lengthen the previous vowel)
Contractions
UL          AXL
IL          IXL
UM          AXM
IM          IXM
UN          AXN
IN          IXN
Symbols
Digits 1-9  Syllabic stress
.           Sentence final character
?           Question sentence final character
-           Phrase delimiter
,           Clause delimiter
()          Put parentheses about noun phrases
Translator
`           Do not add stress marks to this word
#           Word boundary for the purposes of adding stress marks