*****************************************************************
Voice Recognition for the Amiga using an audio digitizer.
Voice.library (Ver 6.4) by Richard Horne - December 1992
*****************************************************************
FUNCTION OFFSET DEFINITIONS
_LVOLearn EQU -30
_LVORecognize EQU -36
_LVOAddVoiceTask EQU -42
_LVORemVoiceTask EQU -48
_LVOGainUp EQU -54
_LVOGainDown EQU -60
_LVORecDataAddress EQU -66
_LVORecMapAddress EQU -72
_LVOWordScore EQU -78
_LVOPickSampler EQU -84
_LVOSetVoicePri EQU -90
_LVOPickTimer EQU -96
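The offsets above follow the usual Amiga library jump-table pattern: each vector occupies 6 bytes and the first user function sits at -30. A minimal C sketch of that arithmetic follows. The names voice_lvo, LVO_FIRST_USER and LVO_VECTOR_SIZE are illustrative only and are not part of voice.library.

```c
#include <assert.h>

/* Each library vector is 6 bytes wide. The four slots before -30
   (-6, -12, -18, -24) are the standard Open, Close, Expunge and
   reserved entries, so the first user function is at -30. */
enum { LVO_FIRST_USER = -30, LVO_VECTOR_SIZE = 6 };

/* index 0 = Learn, 1 = Recognize, ... 11 = PickTimer,
   in the order of the offset table above. */
static int voice_lvo(int index)
{
    assert(index >= 0 && index <= 11);
    return LVO_FIRST_USER - LVO_VECTOR_SIZE * index;
}
```

For example, voice_lvo(2) gives the AddVoiceTask offset and voice_lvo(11) gives the PickTimer offset.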
************************* FUNCTION DEFINITIONS ******************
>>>>> All variables are long words unless otherwise noted. <<<<<<
NOTE: Voice.library is opened with a call to the exec OpenLibrary
function. OpenLibrary can fail for one of three reasons:
1. The voice.library file is not available in the libs: directory or
cannot be found.
2. The parallel port is busy.
3. Voice.library is currently opened and being used by another
application program.
*****************************************************************
NAME:
Learn -- Learn a spoken phrase.
OFFSET:
-30
SYNOPSIS:
MapAddress = Learn (MapBuffer, Text, Screen, SequenceNum, X, Y)
d0                  a0         a1    a2      d0           d1 d2
FUNCTION:
The "Learn" function stores a frequency map of a spoken word or
phrase. Each frequency map is made up of 72 long words of data
plus a 16 byte header for the associated ASCII text (304 bytes
total). "Learn" requires the user to reserve a MapBuffer in
memory equal to the size of vocabulary desired (number of words)
times 304 bytes. MapBuffer address is passed to "Learn" in a0.
Address of a null terminated text string representing the word or
phrase to be learned is passed to "Learn" in a1.
The "Learn" function will open its own window on the screen
specified in a2 (use NULL for WBENCHSCREEN), at a position X, Y
specified in d1 and d2. The user will then be prompted to speak
the specified word or phrase to obtain three good digital
samples. Internally, these three samples are analyzed for
frequency content and transformed into a frequency map (304
bytes) which is stored in the MapBuffer according to the Sequence
Number specified in d0. "Learn" returns the memory address
within MapBuffer at which this particular frequency map is
stored. If "Learn" is intentionally cancelled using the close
gadget of the Learn Window, then a zero will be returned.
"Learn" is called separately for each word or phrase in the
vocabulary. After every word has been learned, MapBuffer will be
filled with a sequence of frequency maps (each 304 bytes). Then
the "Recognize" or "AddVoiceTask" functions can be called which
will listen to the audio digitizer, compute a frequency map
of incoming words, compare them to the words in MapBuffer, and
indicate by Sequence Number which word or phrase is the best
match. The maximum number of words or phrases in the vocabulary
is 64.
Note that you must select an audio sampler (PerfectSound3,
SoundMaster, or Generic) using the "PickSampler" function before
using the "Learn" function.
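The MapBuffer sizing described above can be sketched in C as follows. This is only an illustration of the arithmetic. The constant and function names are made up for this example, and the rule that the map for Sequence Number n starts n * 304 bytes into MapBuffer is an assumption: the text only says maps are stored "according to the Sequence Number".

```c
#include <assert.h>
#include <stddef.h>

enum {
    MAP_HEADER_BYTES = 16,                               /* ASCII text header  */
    MAP_LONGWORDS    = 72,                               /* frequency data     */
    MAP_BYTES        = MAP_HEADER_BYTES + MAP_LONGWORDS * 4, /* 304 bytes      */
    MAX_VOCABULARY   = 64                                /* documented maximum */
};

/* Bytes to reserve for a vocabulary of `words` words or phrases. */
static size_t map_buffer_bytes(unsigned words)
{
    assert(words >= 1 && words <= MAX_VOCABULARY);
    return (size_t)words * MAP_BYTES;
}

/* ASSUMPTION: maps are packed back to back, so the map for a given
   Sequence Number begins at sequence_num * 304 bytes. */
static size_t map_offset(unsigned sequence_num)
{
    assert(sequence_num < MAX_VOCABULARY);
    return (size_t)sequence_num * MAP_BYTES;
}
```

A full 64 word vocabulary therefore needs map_buffer_bytes(64) = 19456 bytes.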
*****************************************************************
NAME:
Recognize -- Recognize a spoken word or phrase.
OFFSET:
-36
SYNOPSIS:
SequenceNum = Recognize (MapBuffer, SizeVocabulary, Resolution)
d0                       a0         d0              d1
FUNCTION:
"Recognize" assumes that the user has learned a sequence of words
or phrases using the "Learn" function. MapBuffer contains a
sequence of frequency maps produced by "Learn" corresponding to
each word or phrase in the vocabulary. MapBuffer address is
passed to "Recognize" in a0. The number of words or phrases in the
vocabulary is passed to "Recognize" in d0.
"Recognize" listens for an incoming word, computes its frequency
map, and compares this map to the sequence of maps contained in
MapBuffer. The Sequence Number of the word or phrase in
MapBuffer which is most similar to that of the incoming word is
returned in d0. Note that the number "0" represents the first
word, "1" the second, and so on.
"Recognize" will operate at either high resolution (d1 = 0) or
low resolution (d1 = 1). High resolution computes a frequency
analysis of the incoming word or phrase at twice the number of
points in time as low resolution. High resolution is somewhat
better at word recognition, but takes almost twice the processing
time.
"Recognize" will return the following error codes if it cannot
find a match.
d0 = -1 if there is no match between the incoming frequency map
and any of the maps in MapBuffer.
d0 = -2 if the incoming word causes unacceptable digital
clipping. Volume should be reduced by moving your
microphone or by using the "GainDown" function.
d0 = -3 if incoming word is too low in volume. Volume should be
increased by moving your microphone or by using the "GainUp"
function.
d0 = -4 if the incoming sample is confused by extraneous noise.
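A caller will usually want to separate a valid Sequence Number from the four error codes listed above. The following C sketch shows one way to do that. The enum and the function classify_result are hypothetical helpers, not part of voice.library, and any value not listed above is treated as unexpected.

```c
#include <assert.h>

enum voice_status {
    VOICE_MATCH,      /* d0 >= 0: Sequence Number of the best match */
    VOICE_NO_MATCH,   /* d0 = -1 */
    VOICE_CLIPPING,   /* d0 = -2: reduce volume or call GainDown   */
    VOICE_TOO_QUIET,  /* d0 = -3: raise volume or call GainUp      */
    VOICE_NOISE,      /* d0 = -4: extraneous noise                 */
    VOICE_UNEXPECTED  /* any value this document does not describe */
};

/* Classify the value returned in d0 by Recognize for a vocabulary
   of `vocabulary` words (1 to 64). */
static enum voice_status classify_result(long d0, long vocabulary)
{
    assert(vocabulary >= 1 && vocabulary <= 64);
    if (d0 >= 0)
        return d0 < vocabulary ? VOICE_MATCH : VOICE_UNEXPECTED;
    switch (d0) {
    case -1: return VOICE_NO_MATCH;
    case -2: return VOICE_CLIPPING;
    case -3: return VOICE_TOO_QUIET;
    case -4: return VOICE_NOISE;
    default: return VOICE_UNEXPECTED;
    }
}
```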
*****************************************************************
NAME:
AddVoiceTask -- Initiate a separate task to recognize a spoken
word or phrase.
OFFSET:
-42
SYNOPSIS:
AddVoiceTask (MapBuffer, MsgPort, SizeVocabulary, Resolution)
              a0         a1       d0              d1
FUNCTION:
"AddVoiceTask" is similar in function to "Recognize" except that
here, a separate task is started under the Amiga multitasking
operating system which listens for incoming words or phrases and
returns messages to the user's Message Port indicating the
Sequence Number of the frequency map in MapBuffer which best
matches the frequency map of the incoming word. MapBuffer
address and Message Port address are passed to "AddVoiceTask"
in a0 and a1. The number of words or phrases in the vocabulary is
passed to "AddVoiceTask" in d0.
"AddVoiceTask" will operate at either high resolution (d1 = 0) or
low resolution (d1 = 1). High resolution computes a frequency
analysis of the incoming word or phrase at twice the number of
points in time as low resolution. High resolution is somewhat
better at word recognition, but takes almost twice the processing
time.
The messages sent to MessagePort are designed to mimic shortened
IDCMP messages with an im_Class = $0. Thus you can receive and
process these messages at either an Intuition window IDCMP
message port or at a custom message port of your own.
Messages sent by this task are as follows.
im_Code = Sequence number of frequency map in MapBuffer that
best matches the frequency map of the incoming
word or phrase.
im_Code = -1 if there is no match between the incoming
frequency map and any of the maps in MapBuffer.
im_Code = -2 if the incoming word causes unacceptable
digital clipping. Volume should be reduced by
moving your microphone or by using the "GainDown"
function.
im_Code = -3 if incoming word is too low in volume. Volume
should be increased by moving your microphone or
by using the "GainUp" function.
im_Code = -4 if the incoming sample is confused by
extraneous noise.
Upon calling "AddVoiceTask", the selected audio digitizer becomes
immediately active, listening for an incoming word. After
receipt of a word or phrase, a message as described above is sent
to Message Port. The VoiceTask then goes into a WAIT mode and
remains inactive until it receives a reply to the message it has
sent to Message Port. Upon receipt of a reply, VoiceTask again
becomes active and listens for an incoming word. The priority
of this task will be 127 for fastest possible voice recognition.
You may change this priority to a lower value with the "SetVoicePri"
function.
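Because the messages mimic IDCMP messages, the negative im_Code values need care when read from C. The sketch below assumes that im_Code is 16 bits wide and unsigned, as the Code field of an Intuition IntuiMessage is. Under that assumption -1 arrives as $FFFF and must be sign-extended before comparison. The function names are illustrative and not part of voice.library.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: im_Code is an unsigned 16 bit field, as in an
   Intuition IntuiMessage. Values $8000-$FFFF are read as negative. */
static int voice_code_value(uint16_t im_code)
{
    return im_code >= 0x8000u ? (int)im_code - 0x10000 : (int)im_code;
}

/* Returns the Sequence Number carried by the message, or -1 when
   the message reports one of the error codes -1 to -4 (or any
   value outside the vocabulary). */
static int voice_word_index(uint16_t im_code, int vocabulary)
{
    int value = voice_code_value(im_code);
    assert(vocabulary >= 1 && vocabulary <= 64);
    return (value >= 0 && value < vocabulary) ? value : -1;
}
```

Whatever the code, the message must still be replied to before VoiceTask will listen again.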
*****************************************************************
NAME:
RemVoiceTask -- Remove task initiated by AddVoiceTask
OFFSET:
-48
SYNOPSIS:
RemVoiceTask ()
FUNCTION:
Deallocates memory and removes VoiceTask from the Amiga system.
Note that the Message Port specified for the "AddVoiceTask" function
must still exist at the time you call "RemVoiceTask". Also you
must reply to all outstanding messages from VoiceTask BEFORE calling
this function.
*****************************************************************
NAME:
GainUp -- Increase gain of PerfectSound 3 audio digitizer.
OFFSET:
-54
SYNOPSIS:
GainUp()
FUNCTION:
Increases gain of the PerfectSound audio digitizer by one step.
Note that when gain reaches maximum, "GainUp" will wrap around
and return gain to its lowest value. Do not call this function
if you are using the SoundMaster audio digitizer.
*****************************************************************
NAME:
GainDown -- Decrease gain of PerfectSound 3 audio digitizer.
OFFSET:
-60
SYNOPSIS:
GainDown()
FUNCTION:
Decreases gain of the PerfectSound audio digitizer by one step.
Note that when gain reaches minimum, "GainDown" will wrap around
and return gain to its highest value. Do not call this function
if you are using the SoundMaster audio digitizer.
*****************************************************************
NAME:
RecDataAddress -- Return memory address of digital sample of
incoming word or phrase.
OFFSET:
-66
SYNOPSIS:
Address = RecDataAddress()
d0
FUNCTION:
When an incoming word or phrase is digitized, 3/4 second of
digital data is stored in an internal buffer. This 8-bit
digitized data is sampled at a rate of 6400 Hz. Thus the buffer
for storing this data is 4800 bytes in size. This function
returns the address of this buffer for possible additional
experimental uses.
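The buffer size follows directly from the sample rate and capture time: 6400 samples per second for 3/4 second, at one byte per sample. A small C sketch of that arithmetic is given below. The names are illustrative, and the helper that maps a time to a byte index assumes samples are stored in time order from the start of the buffer.

```c
#include <assert.h>
#include <stddef.h>

enum {
    SAMPLE_RATE_HZ      = 6400,
    CAPTURE_MS          = 750,  /* 3/4 second */
    SAMPLE_BUFFER_BYTES = SAMPLE_RATE_HZ * CAPTURE_MS / 1000  /* 4800 */
};

/* ASSUMPTION: samples are stored in time order, one byte each.
   Returns the index of the sample taken `ms` milliseconds in. */
static size_t sample_index_at_ms(unsigned ms)
{
    assert(ms < CAPTURE_MS);
    return (size_t)ms * SAMPLE_RATE_HZ / 1000;
}
```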
*****************************************************************
NAME:
RecMapAddress -- Return memory address of frequency map of
incoming word or phrase.
OFFSET:
-72
SYNOPSIS:
Address = RecMapAddress()
d0
FUNCTION:
A frequency map of each incoming word or phrase is computed for
comparison with maps learned and stored in MapBuffer. Each map
consists of a frequency analysis of 3/4 second of audio data at
72 points in time. For each of these 72 time points, the data is
examined for frequency content at 32 points between 0 Hz and 3200
Hz. A frequency map is made up of 72 32-bit words corresponding
to the 72 time points analyzed. For each of these 32-bit words,
bit 0 is set if the signal contains frequency components from
0-100 Hz. Bit 1 is set if the signal contains frequency
components from 100-200 Hz. Bit 2 is set if the signal contains
frequency components from 200-300 Hz etc. This function returns
the address of this frequency map for possible additional
experimental uses. Note that this internal frequency map does
not have the 16 byte ASCII header as do the frequency maps
stored in MapBuffer.
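The bit layout described above can be decoded with a few small helpers. The C sketch below is illustrative only and none of these names come from voice.library. It reads each long word byte by byte in big-endian order, which is how the 68000 stores it, so the same code also works on a copy of the map examined on a little-endian machine.

```c
#include <assert.h>
#include <stdint.h>

enum { FREQ_MAP_SLOTS = 72, FREQ_BANDS = 32, FREQ_BAND_WIDTH_HZ = 100 };

/* Assemble one 32 bit map word from 4 bytes in big-endian order. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Lower edge in Hz of the band represented by `bit`
   (bit 0 = 0-100 Hz, bit 1 = 100-200 Hz, and so on). */
static unsigned band_low_hz(unsigned bit)
{
    assert(bit < FREQ_BANDS);
    return bit * FREQ_BAND_WIDTH_HZ;
}

/* Nonzero if the time slot contains energy in the band for `bit`. */
static int band_present(uint32_t slot, unsigned bit)
{
    assert(bit < FREQ_BANDS);
    return (int)((slot >> bit) & 1u);
}

/* Number of bands flagged in one time slot. */
static unsigned bands_set(uint32_t slot)
{
    unsigned n = 0;
    while (slot != 0) {
        n += (unsigned)(slot & 1u);
        slot >>= 1;
    }
    return n;
}
```

A slot word of $00000005, for instance, flags the 0-100 Hz and 200-300 Hz bands.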
*****************************************************************
NAME:
WordScore -- Return recognition score of a recognized word.
OFFSET:
-78
SYNOPSIS:
Value = WordScore()
d0
FUNCTION:
The "Recognize" function computes a numerical score representing the
"goodness" of a match between the frequency map of an incoming word
and each frequency map stored in MapBuffer. The recognized word
is determined by highest score. This function returns the score value
for the recognized word. Internally, a score of #2000 must be achieved
in order for a match to be declared. If you wish to have a higher match
score threshold to reduce false matches, you may call "WordScore" after
each word is recognized and set your own higher score threshold before
accepting a match. Increasing the match score threshold will reduce
false matches, but will also decrease recognition performance.
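Applying a stricter threshold of your own might look like the C sketch below. The function accept_match is a hypothetical helper. Its sequence_num argument stands for the value returned by "Recognize" and its score argument for the value returned by "WordScore". The constant assumes that the "#2000" quoted above is a decimal value, which this document does not state explicitly.

```c
#include <assert.h>

/* ASSUMPTION: the internal threshold "#2000" is decimal 2000. */
enum { VOICE_INTERNAL_THRESHOLD = 2000 };

/* Accept a recognized word only if its score also meets the
   caller's own, stricter threshold. Returns 1 to accept, 0 to
   reject. Error codes (negative sequence numbers) are rejected. */
static int accept_match(long sequence_num, long score, long my_threshold)
{
    /* A threshold below the internal one would have no effect. */
    assert(my_threshold >= VOICE_INTERNAL_THRESHOLD);
    if (sequence_num < 0)
        return 0;
    return score >= my_threshold;
}
```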
*****************************************************************
NAME:
PickSampler -- Specify which model audio sampler to use (either
PerfectSound3, SoundMaster, or Generic).
OFFSET:
-84
SYNOPSIS:
PickSampler (SamplerID)
             d0
FUNCTION:
Select the audio sampler to be used with this function. SamplerID = 0
for PerfectSound3. SamplerID = 1 for SoundMaster. SamplerID = 2 for
Generic Sampler. You only need to PickSampler once. However, you should
PickSampler before you Learn, Recognize, or AddVoiceTask.
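Named constants make the SamplerID values easier to read in calling code. The C sketch below is illustrative: the enum names are not defined by voice.library. The helper reflects the notes under "GainUp" and "GainDown", which describe gain control for the PerfectSound digitizer and warn against calling them with SoundMaster. The Generic sampler is not addressed there, so the helper treats it conservatively.

```c
#include <assert.h>

/* SamplerID values passed to PickSampler in d0. */
enum voice_sampler {
    SAMPLER_PERFECTSOUND3 = 0,
    SAMPLER_SOUNDMASTER   = 1,
    SAMPLER_GENERIC       = 2
};

/* Returns 1 only for the sampler whose gain control this document
   describes. ASSUMPTION: Generic is treated like SoundMaster
   because the document does not say that gain calls are safe. */
static int gain_calls_documented(int sampler_id)
{
    assert(sampler_id >= SAMPLER_PERFECTSOUND3 &&
           sampler_id <= SAMPLER_GENERIC);
    return sampler_id == SAMPLER_PERFECTSOUND3;
}
```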
*****************************************************************
NAME:
SetVoicePri -- Set the multitasking priority of a voice recognition
task that has been started by the "AddVoiceTask"
function.
OFFSET:
-90
SYNOPSIS:
Old Priority = SetVoicePri (New Priority)
d0                          d0
FUNCTION:
When "AddVoiceTask" is called, a voice recognition task of priority 127
is started for the fastest possible voice recognition. You may modify
this priority by setting New Priority to any value between -128 and 127
and calling "SetVoicePri" which changes task priority to the new value
and returns the value of the old task priority. "AddVoiceTask" must
be called before "SetVoicePri."
*****************************************************************
NAME:
PickTimer -- Select either Timer A or Timer B of the CIA B for use
in timing digital audio samples.
OFFSET:
-96
SYNOPSIS:
PickTimer(TimerID)
          d0
FUNCTION:
Voice.library uses CIA B Timer B by default for setting the time interval
between digital audio samples. You may find situations where other
applications require Timer B, causing a conflict. Use this function to
choose either Timer B or Timer A as required. TimerID = 0 for selection
of Timer B. TimerID = 1 for selection of Timer A. You only need to
PickTimer once. However, you should PickTimer before you Learn, Recognize,
or AddVoiceTask.