1989-01-24
DISTOMWC: By Howard C. Johnson
30 Roosevelt Ave.
Morganville, NJ 07751
PURPOSE
Disassemble DRI formatted files to Mark Williams C (MWC)
assembler directives. This produces output similar to as68toas
output. Ideally, output of distomwc may be assembled to a file
that is identical to the one being disassembled.
HEREDITY
DISTOMWC was created by disassembling DIS2ND by Scott
Swintec and then modifying it. To be used by MWC, the output of
dis2nd must be further processed by as68toas. Four basic
problems are addressed:
1. The operand order is reversed for 'eor' instructions.
2. Addresses in the data section do not become labels,
producing many undefined symbols.
3. As68toas does not recognize either short branches (bxx.s)
or short addresses (wwww:s). It also cannot handle intermixed
hex and ascii strings.
4. Data embedded in the text section is disassembled as
meaningless, and often illegal, instructions.
DISTOMWC OPERATION
Symbol Table:
Input to DISTOMWC is a DRI compatible object file. Several
assemblers/loaders may produce load modules that are not DRI
compatible. MWC is one of these. The principal differences lie
in handling symbols. DRI symbols have two properties:
a. The length of the symbol table is in the file header.
b. Symbol names are a maximum of 8 characters long.
MWC, for example, has 16 character names and the length of the
symbol table is 0 in the file header. The program MWTODRI (GEnie
# 4098) will convert an MWC file to DRI symbol format. It
includes source, so that other files can be handled with
reasonable effort. However, as very few files worth disassembling have
symbol tables attached, it is not very important to worry about
symbol handling.
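
As a rough illustration of property (a), the sketch below reads the symbol table length out of a file header. The field layout (a 28-byte big-endian header beginning with the magic word 0x601A) and the 14-byte symbol entry size are assumptions taken from the commonly documented GEMDOS/DRI format, not from DISTOMWC itself; symbol_table_info is a hypothetical name.

```python
import struct

# Assumed 28-byte big-endian header: magic, text size, data size,
# bss size, symbol table size, reserved, flags, no-relocation flag.
HEADER = struct.Struct(">HIIIIIIH")
SYMBOL_ENTRY_SIZE = 14  # assumed: 8-byte name, 2-byte type, 4-byte value


def symbol_table_info(header_bytes):
    """Return (symbol_table_bytes, symbol_count) from a raw header."""
    fields = HEADER.unpack(header_bytes[:HEADER.size])
    magic, symbol_size = fields[0], fields[4]
    if magic != 0x601A:
        raise ValueError("not a DRI-style object file")
    return symbol_size, symbol_size // SYMBOL_ENTRY_SIZE
```

A file whose header reports a length of 0, as described for MWC output, would yield (0, 0) here even if symbols follow the data section.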
External variable names are preceded with a '_' in DRI. MWC
uses a trailing '_' for the same purpose. All symbol names are
converted in DISTOMWC.
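
The name conversion can be pictured with a small sketch. This is only an assumption about the rule (move a leading '_' to the end and leave other names alone); dri_to_mwc_symbol is a hypothetical helper, not a routine from DISTOMWC.

```python
def dri_to_mwc_symbol(name):
    """Move a leading '_' (DRI external) to the end (MWC external)."""
    if name.startswith("_") and len(name) > 1:
        return name[1:] + "_"
    return name  # names without the DRI prefix are left unchanged
```

For example, dri_to_mwc_symbol("_main") gives "main_".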
Backward and Forward References:
All text references are resolved by processing the text
section twice. Thus a branch backward will produce a label for
the target of the branch. However, problems occur when longs in
the data section reference otherwise unreferenced addresses.
This occurs in 'C' from two constructs:
a. switch (c) { case: .... }
b. char *list[] = {"a", "b", ...}
The switch constructs produce text references that must be known.
Normally these target addresses are not the object of a branch,
but are preceded by one. Text processing will add labels to the
addresses following bra, jmp, and rts instructions. This picks
up most switch constructs.
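
The labeling rule just described can be sketched as a first pass over decoded instructions. The tuple layout (address, size, mnemonic, target) and the name collect_labels are hypothetical and chosen for illustration; the actual internal representation used by DISTOMWC is not documented here.

```python
FLOW_BREAKS = {"bra", "jmp", "rts"}  # control never falls through these


def collect_labels(instructions):
    """instructions: iterable of (address, size, mnemonic, target_or_None)."""
    labels = set()
    for address, size, mnemonic, target in instructions:
        if target is not None:
            labels.add(target)          # explicit branch or jump target
        if mnemonic in FLOW_BREAKS:
            labels.add(address + size)  # address following bra/jmp/rts
    return sorted(labels)
```

A second pass would then emit a label at each collected address, which is why a backward branch can still find its target labeled.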
Data lists that are pointers to strings usually precede the
strings. All forward references in the data section are resolved.
A .long in the data section will show up as undefined if it
references an address preceding it that isn't preceded by a
branch.
Data Embedded In Text:
Some programs have variables and initialized data in the
text section rather than the data section. When disassembled,
this produces both unrealistic and invalid instructions.
A human is best at detecting this problem. The resolution
is to introduce a new file, name.emb, to allow the operator to
mark special areas that need exception handling. Name.emb uses the
same 'name' as the file being disassembled. This file may be located
in either the current directory, or in the same directory as the
file being disassembled.
.EMB File:
The name.emb file allows three kinds of addresses to be
indicated:
1. S lines, such as 's a010' permit labels to be defined. A
location that needs a label can be handled with this. All S
lines must be located in the first part of the file before any
other lines.
2. W lines such as 'w b124 b135' will cause the text
addresses within the range, inclusively, to be output as words or
longs. The first address must be even, the second must be odd.
3. A lines such as 'a b136 c001' will cause the text
addresses within the range, inclusively, to be output as ASCII
pairs. The first address must be even, the second must be odd.
W and A lines can be intermixed as necessary after any S lines.
The addresses must increase monotonically.
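
The rules above can be checked mechanically before running the disassembler. The following is a minimal sketch of such a checker, not DISTOMWC's own parser. It assumes hex addresses separated by whitespace, ignores blank lines, and applies the increasing-address rule to W and A ranges only; parse_emb is a hypothetical name.

```python
def parse_emb(text):
    """Parse name.emb text into (labels, ranges), validating the rules."""
    labels, ranges = [], []
    last = -1  # highest address covered so far by a W or A line
    for number, line in enumerate(text.splitlines(), 1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        kind = fields[0].lower()
        addrs = [int(field, 16) for field in fields[1:]]
        if kind == "s" and len(addrs) == 1:
            if ranges:
                raise ValueError(f"line {number}: S line after W/A line")
            labels.append(addrs[0])
        elif kind in ("w", "a") and len(addrs) == 2:
            start, end = addrs
            if start % 2 or not end % 2:
                raise ValueError(f"line {number}: need even start, odd end")
            if start <= last or end < start:
                raise ValueError(f"line {number}: addresses must increase")
            ranges.append((kind, start, end))
            last = end
        else:
            raise ValueError(f"line {number}: unrecognized line")
    return labels, ranges
```

Feeding it the three example lines from the list (s a010, w b124 b135, a b136 c001) yields one label and two ranges.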
Longs in Text:
1. A .long is recognized in the text section because the
opcode word is a relocatable value. Whenever this is
encountered, a .long will be produced, and the symbol it
references will be entered into the symbol table.
2. As many instructions are multiple words, .longs may not
appear until the area in which they are located is first marked
in the name.emb file as W or A. However, they will become longs,
overriding the name.emb file.
Special Notes:
1. When a disassembled file is reassembled it will (by
default) contain a symbol table of all the generated labels. The
presence of these symbols may prevent the program from running.
I believe that this is caused by symbol table residue producing
nonzero values in the bss (.bssi) section. This occurs because
MWC fails to indicate the symbol table size in the file header.
Stripping the symbols or using the mwtodri program will correct
this.
2. MWC will not reassemble addresses in the form
0x416:s
which is produced by the disassembler. It is necessary to equate
the values to a name, and use the name in the program. I.e.,
my_name = 0x416
move my_name:s, d0
3. Short ascii lines in the data section may actually be hex
binary values. The binary values are output as a comment to the
ascii lines. These are suppressed if the HEX output option is
not used.
4. Embedded ASCII (or constants) can and will produce
nonsense instructions that can affect other valid instructions.
For example, ASCII produces many b.. instructions whose apparent
destination addresses are added to the symbol table. Besides
producing strange internal symbols, absolute address fields can
become marked as relocatable. This in turn can cause really
strange output. Further, some of these corrupted instructions
may break the disassembler, causing it to fail with no useful
output. When this occurs, try marking the whole text space as
ASCII in the name.emb file to identify the ASCII areas first.
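
The manual fix in note 2 (equating each 0xNNN:s value to a name) could be automated with a small post-processing script. The sketch below is one possible approach, assuming the disassembler output is available as text; the abs_ naming scheme and the function name_short_addresses are hypothetical.

```python
import re

# Matches short absolute operands such as 0x416:s
SHORT_ADDR = re.compile(r"\b0x([0-9a-fA-F]+):s\b")


def name_short_addresses(source):
    """Replace 0xNNN:s operands with equated names; return new source."""
    values = {}  # generated name -> hex literal, in first-seen order

    def rename(match):
        name = "abs_" + match.group(1).lower()
        values.setdefault(name, "0x" + match.group(1))
        return name + ":s"

    body = SHORT_ADDR.sub(rename, source)
    equates = "".join(f"{name} = {value}\n" for name, value in values.items())
    return equates + body
```

Each distinct value receives a single equate at the top of the output, followed by the rewritten source lines.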
A little more now that MWC 3.0 is out.
1. The symbol table length is now in the header but it still is not
in DRI format.
2. Don't throw away the 2.0 version of the MWC assembler. They
broke it in 3.0, and it cannot handle addr:s format at all.