Text file | 1989-08-28 | 14KB | 336 lines
disassemble.library
General
'disassemble.library' is a shareable AmigaDOS library which is a
disassembler for the MC68000 family of processors. It disassembles code
for the MC68000, MC68010, MC68020 and MC68030 processors, for the
MC68851 memory management unit and for the MC68881 and MC68882 floating
point coprocessors. It is capable of symbolic disassembly, will
generate labels at referenced locations, and is highly controllable
through a set of style flags.
The library's single entry point, Disassemble, will attempt to
disassemble one instruction per call. It communicates with its caller
through a passed information vector, which includes pointers to
routines to call to process text output, access symbolic information,
record label locations, etc.
There are two main reasons why I separated this functionality into a
shareable library. One is that I wanted to share the code (which is
fairly bulky) between a file disassembler/dumper and a debugger. The
second is that I plan to write an entire set of such shared libraries,
and this one has given me experience in how to go about it, and some of
the consequences of doing it.
To use this library you must copy the file 'disassemble.library' to
your LIBS: directory. This is where AmigaDOS looks when it needs to
load the library in response to an OpenLibrary system call.
To use the library with your own programs, you will need a set of
interface stubs or definitions, depending on the language and compiler
you use. The needed information is in the accompanying 'fd' file
(disassemble_lib.fd) and in this document. I have included a defining
include file and an interface library for Draco users.
I have tested the library, as used by my disassembler/dumper, Dis,
fairly extensively. There are bound to be some bugs left, however.
Please let me know at one of the following electronic mail addresses if
you find any:
Chris Gray
usenet: {uunet,alberta}!myrias!ami-cg!cg
CIS: 74007,1165
Sending me physical mail works, but I am VERY slow at answering (up to
6 months on one occasion!). Trying to telephone me can be expensive -
you are more likely to get my modem.
Interfacing to the Library
All communication to and from the library is done through an
information structure, the address of which is passed in register A0.
The structure is declared (in Draco) as follows:
type DisassemblerState_t = struct {
    proc(/* ulong address(d0) */)uint ds_readWord;
    proc(/* char ch(d0) */)void ds_putChar;
    proc(/* ulong addr(d0) */)*char ds_findLabel;
    proc(/* ulong addr(d0), refAt(d1); *ulong pTrueAddr(a0) */)*char
        ds_findAbsSymbol;
    proc(/* long offset(d0); ulong refAt(d1) */)*char ds_findRelCode;
    proc(/* long offset(d0); ulong refAt(d1); *long pTrueOffset(a0) */)*char
        ds_findRelData;
    proc(/* ulong addr(d0) */)void ds_labelAt;
    proc(/* ulong addr(d0) */)void ds_branchTo;
    proc(/* ulong addr(d0) */)bool ds_isLabel;
    ulong ds_address;
    ulong ds_relativeBase;
    *char ds_errorMessage;
    uint ds_operandColumn;
    uint ds_column;
    uint ds_extraWord;
    bool ds_putPosition;
    bool ds_absoluteAddress;
    bool ds_putErrors;
    bool ds_capExtended;
    bool ds_putAddress;
    bool ds_putRelForm;
    bool ds_extended;
    bool ds_extendedNow;
    bool ds_illegal;
    bool ds_hadExtraWord;
};
The first few fields are the addresses of functions which the library
can call to perform various needed operations. All such addresses are
32 bit values. Fields of type 'ulong' are 32 bit unsigned integers.
Fields of type 'uint' are 16 bit unsigned integers. Fields of type
'bool' are 8 bit 1/0 true/false values. In more detail:
ds_readWord - this function is passed a 32 bit address or offset in
register D0. It should return the 16 bit contents of that location
in register D0. The addresses passed are all based on the value
given in field 'ds_address', thus they can be real addresses or
offsets into a buffer or hunk, depending on what the caller does.
This routine MUST be supplied. The library does not try to
reference any memory directly - all references will be through this
function.
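As a portable illustration of the 'ds_readWord' contract (a model in Python, not Amiga code and not part of the library), the sketch below shows a caller that keeps the code in a buffer and treats the value passed in D0 as an offset into it. The names 'make_read_word' and 'hunk' are hypothetical. The MC68000 family stores words big-endian, so the high byte comes first.

```python
def make_read_word(hunk: bytes):
    """Return a read_word(offset) function over 'hunk' (hypothetical helper)."""
    def read_word(offset: int) -> int:
        # Combine two bytes big-endian, as the 68000 family stores them.
        return (hunk[offset] << 8) | hunk[offset + 1]
    return read_word

# 0x4E71 is NOP and 0x4E75 is RTS on the MC68000.
read_word = make_read_word(bytes([0x4E, 0x71, 0x4E, 0x75]))
print(hex(read_word(0)), hex(read_word(2)))
```

Because the library never touches memory itself, the same callback shape would work for a file buffer, a loaded hunk or live memory in a debugger.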
ds_putChar - this function is passed a character in the low 8 bits of
register D0. That character is part of the disassembled
instruction. All output from the library will go through this
function. If this function is not present (the field is nil, i.e. a
32 bit zero), then no output is done. This mode of operation runs
slightly faster, and can be used to simply check for valid
instructions or for a pre-scan to find label references.
ds_findLabel - this function is passed a 32 bit address in register D0,
and should return nil or the address of a symbol which is a
symbolic label for that address. If no symbolic information is
being used, this routine can be omitted. Any pointer returned must
be valid until this call of Disassemble returns, but not beyond.
ds_findAbsSymbol - this function is used to find symbolic names for
addresses that are referenced as 32 bit absolute addresses. The
address in question is passed in register D0. The address or offset
within the code being disassembled (based on ds_address) at which
the reference occurs is passed in register D1. This information can
be used with relocation information supplied in AmigaDOS object
files. Register A0 contains the address of a 32 bit value which
should be filled in with the true address of the symbolic value.
The pointer returned in D0 should be nil if no appropriate symbol
was found or the address of a null-terminated string. As an
example, suppose that label 'Fred' represents offset 0x208 in the
code being disassembled, and a call to 'ds_findAbsSymbol' is made
by the library with the following parameters:
D0 - 0x20d
D1 - 0x32
A0 - ????
It would be appropriate to return the string 'Fred', and to store
the value 0x208 into the region pointed to by A0. The library would
then show a reference like 'Fred+0x5'. As usual, this routine can
be omitted if no symbolic information is available.
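The 'Fred' example can be modeled as a closest-symbol lookup. The Python sketch below is only an illustration of what a caller's 'ds_findAbsSymbol' might do. The symbol table, the function names and the "closest symbol at or below the address" policy are assumptions, and the plain-hex fallback for unmatched addresses is invented for the sketch, not taken from the library.

```python
import bisect

# Hypothetical symbol table: (address, name) pairs sorted by address.
SYMBOLS = [(0x100, "Start"), (0x208, "Fred"), (0x300, "Data")]
ADDRESSES = [addr for addr, _ in SYMBOLS]

def find_abs_symbol(addr, ref_at):
    """Return (name, true_addr) for the closest symbol at or below addr, or None.

    ref_at mirrors the D1 parameter but is unused here; a real caller could
    use it to consult relocation information.
    """
    i = bisect.bisect_right(ADDRESSES, addr) - 1
    if i < 0:
        return None
    true_addr, name = SYMBOLS[i]
    return name, true_addr

def show_reference(addr, ref_at):
    """Format a reference in the 'Fred+0x5' style described in the text."""
    found = find_abs_symbol(addr, ref_at)
    if found is None:
        return hex(addr)          # assumed fallback, not the library's format
    name, true_addr = found
    delta = addr - true_addr
    return name if delta == 0 else f"{name}+{hex(delta)}"

print(show_reference(0x20D, 0x32))   # prints Fred+0x5
```

A real caller would probably limit how far below the address it is willing to search, so that unrelated references are not shown as large offsets from a distant symbol.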
ds_findRelCode - this function is used for references that are
PC-relative, so what it should return are labels within the code.
Unlike 'ds_findAbsSymbol', it is given no way to report the closest
label - most code doesn't branch to just past a label.
ds_findRelData - this function is used for references that are relative
to register A4. This allows symbolic disassembly of small-model
data references generated by the Lattice and Aztec C compilers.
ds_labelAt - this function is called when a PC relative data
reference is found in the code. A user program would supply a
function here if it wanted to keep track of where labels should be.
A bitmap is a good way of doing this. Keeping track of labels this
way is generally only of use if a two-pass disassembly is going to
be used.
ds_branchTo - this function is similar to 'ds_labelAt' except that it
is called only for branch and jump targets. In other words, the
address given must be a code address, since it is a branch target.
ds_isLabel - this function, if present, is called to determine if there
was a reference to the given address. This is used to know whether
or not to generate a label in front of the instruction being
disassembled.
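The bitmap suggested under 'ds_labelAt' can serve all three label callbacks. The following Python sketch is one possible shape, with one bit per byte offset so that references to odd data addresses are not lost. The class name and the choice to ignore addresses outside the code are assumptions.

```python
class LabelBitmap:
    """One bit per byte offset in the code being disassembled (hypothetical)."""

    def __init__(self, code_size: int):
        self.size = code_size
        self.bits = bytearray((code_size + 7) // 8)

    def label_at(self, addr: int) -> None:
        # Could be installed as ds_labelAt and ds_branchTo.
        if 0 <= addr < self.size:
            self.bits[addr >> 3] |= 1 << (addr & 7)

    def is_label(self, addr: int) -> bool:
        # Could be installed as ds_isLabel on the second pass.
        if not 0 <= addr < self.size:
            return False
        return bool(self.bits[addr >> 3] & (1 << (addr & 7)))

labels = LabelBitmap(64)
labels.label_at(0x12)
print(labels.is_label(0x12), labels.is_label(0x10))
```

A 64 byte hunk needs only 8 bytes of bitmap, which is why this approach suits a pre-scan pass.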
ds_address - this is the address or offset that disassembly is
occurring at. It is the value which will be given to 'ds_readWord'
to get the first word of the instruction. The field is properly
updated as disassembly occurs, so it need only be set before the
first call to Disassemble. If multi-pass disassembly is being used
(e.g. to produce labels), it should be reset before each pass.
ds_relativeBase - this is the current base address for disassembly.
Labels will be relative to this base. E.g. if an instruction 8
bytes past this address needed a label, the label would be either
'L008' or 'L00000008', depending on label size. It will be updated
by the library to the current value of 'ds_address' if
'ds_findLabel' yields a label for the current value of
'ds_address'. Even though the field is maintained, it is not always
used. See the description of 'ds_absoluteAddress'.
ds_errorMessage - this field is occasionally filled in by the library
with a specific error message concerning the disassembly. It is
cleared at the start of each call, so if the field is non-null when
Disassemble returns, it points to an error message. The message is
not dynamically allocated, so it should not be freed or modified by
the caller.
ds_operandColumn - this 16 bit field should be filled in with the
column at which the caller wants instruction operands to start.
Spacing with blanks will be used to pad out to the desired column.
If the instruction field, etc. already extends past the target
column, no spacing will be used. A reasonable value for this field
is 20 if initial addresses are not enabled, or 31 if they are.
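The padding rule for 'ds_operandColumn' can be stated compactly. This Python sketch models it under the assumption that columns are counted from 0 (the text does not say); the function name is hypothetical.

```python
def pad_to_operand_column(text: str, operand_column: int) -> str:
    """Pad 'text' with blanks so the operands would start at operand_column.

    If the text already reaches or passes the column, nothing is added,
    matching the rule described above.
    """
    if len(text) >= operand_column:
        return text
    return text + " " * (operand_column - len(text))

line = pad_to_operand_column("move.l", 20) + "d0,d1"
print(line)
```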
ds_column - this 16 bit field is used internally to count columns.
ds_extraWord - this 16 bit field is used internally to remember a
second word of an invalid instruction, so that it can be dumped in
hexadecimal.
ds_putPosition - this 8 bit flag field controls whether or not the
library will display hexadecimal addresses at the beginning of the
output lines. As with the other flag fields, a value of 0 is
treated as 'false', and any other value as 'true'. The addresses
will either be 32 bit absolute ones (the value of 'ds_address') or
will be 16 bit relative ones ('ds_address' - 'ds_relativeBase')
depending on whether or not 'ds_absoluteAddress' is set.
ds_absoluteAddress - this flag field controls the form of labels and
of the position display. If it is set, they are 32 bit values taken
directly from 'ds_address'. If not set, they are 16 bit relative
values computed as (address - 'ds_relativeBase'). For most
purposes, the relative form is tidier.
ds_putErrors - this flag controls whether or not the library will
output error messages that are returned in 'ds_errorMessage'.
Tighter formatting control can be obtained if this option is not
used.
ds_capExtended - this flag controls whether or not the library will
capitalize instructions and modes that are not available on the
MC68000. This is useful to make the non-68000 instructions stand
out.
ds_putAddress - this flag controls whether or not the hex address is
displayed along with a symbolic or label form. It is useful if the
symbolic or label forms are confused for some reason, and would be
of value to a debugger, where all addresses are real.
ds_putRelForm - this flag controls whether or not the relative form of
PC-relative and A4-relative addressing is displayed along with any
symbolic or label form. This is useful for those who wish to see
the actual encoded form of the instructions, or if the symbolic or
label forms are confused.
ds_extended - this flag is initially cleared by the library and is set
whenever a non-68000 instruction or mode is seen. Thus, after each
call to Disassemble, this flag can be checked to see if a non-68000
form was seen.
ds_extendedNow - this flag is used internally to know whether or not
output should be capitalized. Note that symbolic names are never
capitalized.
ds_illegal - this flag, initially cleared, is set whenever any illegal
instruction or mode is encountered. There will not always be an
accompanying error message. Note also that I have not gone to the
trouble of checking each addressing mode for each instruction, thus
there are instruction forms which will not cause 'ds_illegal' to be
set but which the actual processor will not execute. Also, the
specific 'illegal' instruction, opcode 0x4afc, will not cause this
flag to be set.
ds_hadExtraWord - this flag is used internally to indicate that an
illegal instruction encountered had a second or extended opcode
word that should also be printed in hex.
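For readers writing interface definitions for another language, the field sizes given above imply the offsets computed by the Python sketch below. It assumes the fields are laid out in declaration order with no padding, which the text does not state explicitly, although under that assumption every field also falls on its natural alignment.

```python
# Sizes in bytes, from the text: addresses and ulong are 32 bit,
# uint is 16 bit, bool is 8 bit.
SIZES = {"proc": 4, "ptr": 4, "ulong": 4, "uint": 2, "bool": 1}

FIELDS = [
    ("ds_readWord", "proc"), ("ds_putChar", "proc"),
    ("ds_findLabel", "proc"), ("ds_findAbsSymbol", "proc"),
    ("ds_findRelCode", "proc"), ("ds_findRelData", "proc"),
    ("ds_labelAt", "proc"), ("ds_branchTo", "proc"),
    ("ds_isLabel", "proc"),
    ("ds_address", "ulong"), ("ds_relativeBase", "ulong"),
    ("ds_errorMessage", "ptr"),
    ("ds_operandColumn", "uint"), ("ds_column", "uint"),
    ("ds_extraWord", "uint"),
    ("ds_putPosition", "bool"), ("ds_absoluteAddress", "bool"),
    ("ds_putErrors", "bool"), ("ds_capExtended", "bool"),
    ("ds_putAddress", "bool"), ("ds_putRelForm", "bool"),
    ("ds_extended", "bool"), ("ds_extendedNow", "bool"),
    ("ds_illegal", "bool"), ("ds_hadExtraWord", "bool"),
]

offsets = {}
total_size = 0
for name, kind in FIELDS:
    offsets[name] = total_size       # assumed: no padding between fields
    total_size += SIZES[kind]

for name, _ in FIELDS:
    print(f"{offsets[name]:3d}  {name}")
print("total", total_size)
```

Under these assumptions the structure occupies 64 bytes, with 'ds_address' at offset 36 and the flag bytes starting at offset 54. These figures should be checked against the supplied include file before being relied on.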
As an example, here is a simple one-pass disassembly of a small hunk of
code:
#drinc:disassemble.g

uint
    R_D0 = 0,
    R_FP = 6,
    OP_MOVEB = 0x1000,
    OP_MOVEL = 0x2000,
    M_DDIR = 0,
    M_DISP = 5;

proc readWord(/* ulong address */)uint:
    ulong address;

    code(
        OP_MOVEL | R_FP << 9 | M_DISP << 6 | M_DDIR << 3 | R_D0,
        address
    );
    pretend(address, *uint)*
corp;

proc putChar(/* char ch */)void:
    char ch;

    code(
        OP_MOVEB | R_FP << 9 | M_DISP << 6 | M_DDIR << 3 | R_D0,
        ch
    );
    if ch = '\n' then
        writeln();
    else
        write(ch);
    fi;
corp;

proc main()void:
    extern tail()void;
    DisassemblerState_t ds;

    if OpenDisassembleLibrary(0) ~= nil then
        ds.ds_readWord := readWord;
        ds.ds_putChar := putChar;
        ds.ds_findLabel := nil;
        ds.ds_findAbsSymbol := nil;
        ds.ds_findRelCode := nil;
        ds.ds_findRelData := nil;
        ds.ds_labelAt := nil;
        ds.ds_branchTo := nil;
        ds.ds_isLabel := nil;
        ds.ds_address := pretend(readWord, ulong);
        ds.ds_relativeBase := 0;
        ds.ds_operandColumn := 31;
        ds.ds_putPosition := true;
        ds.ds_absoluteAddress := true;
        ds.ds_putErrors := true;
        ds.ds_capExtended := true;
        ds.ds_putAddress := false;
        ds.ds_putRelForm := false;
        while ds.ds_address < pretend(tail, ulong) do
            ignore Disassemble(&ds);
        od;
        CloseDisassembleLibrary();
    else
        writeln("Can't open Disassemble.library");
    fi;
corp;

proc tail()void:
corp;
Note the use of the 'code' construct to retrieve parameters passed in
registers. Slightly different tricks would be needed to do this in
other languages/compilers. For an example of using the library for full
symbolic disassembly with label generation, see the source to the 'Dis'
file disassembler/dumper, which is included in this archive.
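The two-pass pattern mentioned under 'ds_address' and the label callbacks can be modeled without the library. In the Python sketch below, 'toy_disassemble' is a hypothetical stand-in for Disassemble that understands only three real MC68000 opcodes (NOP, RTS and BRA.S with an 8 bit displacement). The state dictionary, the label format and the output layout are all assumptions made for the illustration.

```python
CODE = bytes([0x60, 0x02,    # bra.s to offset 4
              0x4E, 0x71,    # nop
              0x4E, 0x75])   # rts

def read_word(offset):
    return (CODE[offset] << 8) | CODE[offset + 1]

def toy_disassemble(state):
    """Stand-in for Disassemble: decodes one instruction per call."""
    start = state["address"]
    word = state["read_word"](start)
    state["address"] = start + 2          # the library also advances ds_address
    if word == 0x4E71:
        text = "nop"
    elif word == 0x4E75:
        text = "rts"
    elif (word & 0xFF00) == 0x6000 and (word & 0xFF) not in (0x00, 0xFF):
        disp = word & 0xFF
        if disp >= 0x80:
            disp -= 0x100                 # sign-extend the 8 bit displacement
        target = start + 2 + disp
        if state["branch_to"]:
            state["branch_to"](target)
        text = f"bra.s L{target:04x}"
    else:
        text = f"dc.w 0x{word:04x}"
    if state["put_char"]:                 # a nil putChar means no output
        if state["is_label"] and state["is_label"](start):
            text = f"L{start:04x}: " + text
        for ch in text + "\n":
            state["put_char"](ch)

targets = set()
output = []
state = {"read_word": read_word, "put_char": None,
         "branch_to": targets.add, "is_label": targets.__contains__,
         "address": 0}

# Pass 1: no output, just collect branch targets.
while state["address"] < len(CODE):
    toy_disassemble(state)

# Pass 2: reset the address and enable output.
state["address"] = 0
state["put_char"] = output.append
while state["address"] < len(CODE):
    toy_disassemble(state)

print("".join(output), end="")
```

The first pass runs with no output callback, as the 'ds_putChar' description suggests for a pre-scan, and the second pass emits a label in front of the RTS because the first pass recorded it as a branch target.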