ADis (Advanced Disassembler) User's Guide
Version 1.1
$Date: 93/05/16 22:41:50 $
written by Martin Apel
email: apel@physik.uni-kl.de
CONTENTS
1. Introduction
2. Command line options
3. Hints and Problems
4. Reassembly
5. Theory of operation
6. Future plans
7. Miscellaneous
*************************************************************************
IMPORTANT NOTICE: This program is copyrighted by Martin Apel, but can
be freely distributed, providing that the following rules are respected.
- No change is made to the program nor to the accompanying documentation
- The package is always distributed in its complete form consisting of
the 4 files: "ADis", "ADis.doc", "README" and "libs/ixemul.library".
- Every form of distribution is allowed and encouraged, but no fee can
be charged for this program except for, possibly, the cost of magnetic
media and/or disk duplication and shipping.
- Inclusion in PD software libraries such as Fish Disks is allowed,
provided the fees charged for these disks are comparable with those
charged by Fred Fish.
- The program cannot be distributed in any commercial product without the
written consent of the author.
By copying, distributing and/or using the program you indicate your
acceptance of the above rules.
*************************************************************************
1. INTRODUCTION
ADis is a 68000+ disassembler which can automatically recognize data
and strings put into the code segment. It also generates only those
labels that are really referenced. The generated file will often be
reassemblable. In V1.1 ADis is capable of recognizing all 68020 and
68881 instructions even with the 68020's extended addressing modes.
ADis will also try to resolve addressing relative to a4, which many C
compilers use in a small memory model.
If you have ever worked with a conventional disassembler, you may have
wondered about all those nasty ORI.B #0,D0 or BVS $4711 instructions.
The first example is valid, but not very useful. ADis will recognize
this and generate a DC.W 0 instead. The second example is even
illegal, as it branches to an odd address.
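The two checks just described can be illustrated with a minimal Python
sketch. It is not taken from ADis; the function names are hypothetical
and chosen only for this example. It relies on two facts about the
68000: instructions must start on an even address, and ORI.B #0,D0
encodes as the two words $0000 $0000.

```python
# Illustrative sketch only; these names are hypothetical, not ADis internals.

def branch_target_is_legal(target):
    """68000 instructions must start on an even (word-aligned) address."""
    return target % 2 == 0


def looks_like_zero_data(words):
    """ORI.B #0,D0 encodes as the words $0000 $0000, which is far more
    likely to be zeroed data than a deliberately written instruction."""
    return list(words[:2]) == [0x0000, 0x0000]
```

With these definitions, a branch to $4711 is rejected because the
address is odd, while a branch to $4710 would be accepted.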
*************************************************************************
2. COMMAND LINE OPTIONS
I will list the possible options for a quick look here. They will be
explained in detail later. The following options are available:
-b num set buffersize for file buffers in KB
default: 10 KB
-c2 enable 68020 instruction disassembly
-c8 enable 68881 instruction disassembly
-ce enable extended 68020 addressing modes
(only in conjunction with -c2)
-fs put hunks in a single file
-fm put each hunk in a separate file
-i print addresses where illegal
instructions were found
-lc hex_address disassemble as code
-ld hex_address disassemble as data
-ml disassemble using large memory model
-ms[base_offset] attempt to address code and data
relative to A4
-o outfilename filename of output file
default: <filename>.dec
-q quick disassembly, no labels
no data recognition in code segments
-v verbose
Option -b num:
You can set the size of the output buffer to increase the speed of
writing. Default is 10 KB.
Option -c2:
This option will enable 68020 opcodes and address register indirect
with scaled index addressing modes. If not set, all 68020
instructions will be recognized as illegal, and printed as data.
Option -c8:
This option enables 68881 opcodes. If not set, all FPU opcodes will be
disassembled as data.
Option -ce:
This option enables the extended 68020 addressing modes. They are
disabled by default, because I have never seen a program generated by
a compiler which really used one of them. To perform well upon data
recognition, the disassembler relies heavily upon detecting illegal
instructions. By leaving this option disabled, ADis will detect more
illegals.
Option -fs:
ADis can either put all hunks into a single output file or generate a
separate file for each hunk. The second option comes in handy when
disassembled files grow so large that they aren't easily processed any
further if everything is put together. The default is to generate a
single file if no more than one hunk of each type exists in the
program (CODE, DATA, BSS). This default can be overridden by the -fs
or -fm option. -fs forces ADis to generate a single file even if
there are multiple hunks of the same type. -fm forces ADis to write a
separate file for each hunk even if there are only a few of them.
Option -fm:
See option -fs.
Option -i:
This option will cause ADis to print its current address every time it
steps onto an illegal instruction. This will make it easier for you
to set additional labels to aid ADis.
Option -lc hex_address:
ADis keeps a table of all references. If it doesn't recognize an
instruction as code, it will print it as data. To help ADis with the
task of recognizing code, you can set a reference at that address
telling it that this address is meant as code.
Option -ld hex_address:
Generates a label in ADis' internal tables, which will force it to
write a label out for that address and interpret the bit combination
at that address as being data. However, you cannot force something to
be disassembled as data if it is valid code. The label is simply a
hint telling ADis where to start a new attempt at disassembly.
Option -ml:
By default, ADis assumes that the program to be disassembled has been
compiled using a small memory model with A4 as its base register.
This default is often very useful, but if the program was compiled
with the large memory model, or is hand-coded assembler or the like,
the -ml option disables it.
Option -ms:
By default, ADis assumes that the program to be disassembled has been
compiled using a small memory model with A4 as its base register. It
tries to guess the contents of A4 to generate correct labels. When
you see many parts of the program being disassembled as data, this
value is probably wrong. Then you have to look for a "LEA xxx,A4"
instruction or the like and give the value "xxx" as parameter to the
-ms option.
Option -o filename:
This option is easy to explain: It sets the name of the output file.
Default is <input file>.dec.
Option -q:
Quick disassembly. When this option is set, the analysis will be
skipped and everything inside the code segment will be disassembled as
code if possible. Illegal instructions will be printed as "DC.W
hex_val".
Option -v:
Lets ADis display verbose information. When working on long files,
ADis may produce no output for a long time and not access any disks,
so you might think it isn't doing anything. With -v it displays the
current address it is working on.
In addition to these command line options, the analysis pass can be
terminated by pressing CTRL-D. This will cause the part of the
program not yet analyzed to be disassembled as data.
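To make the A4-relative addressing behind the -ms option more
concrete, here is a minimal Python sketch of how a d16(A4) operand
maps to an address once the contents of A4 are known. It is an
illustration under the assumption of a plain 16-bit signed
displacement; the function names are hypothetical and do not describe
ADis' internal code.

```python
# Illustrative sketch only; names are hypothetical, not ADis internals.

def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed displacement."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def resolve_a4_relative(a4_base, displacement_word):
    """Address referenced by d16(A4) when A4 holds a4_base."""
    return a4_base + sign_extend_16(displacement_word)
```

If the guessed base is wrong, every address computed this way is
shifted by the same amount, which is why labels end up in the wrong
places and many parts of the program are disassembled as data.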
*************************************************************************
3. HINTS AND PROBLEMS
Because ADis doesn't really understand the code it is disassembling,
it is possible that it makes a few mistakes, which in general aren't
very important, but they can be very annoying. To circumvent this,
there are a few options which will modify ADis' behaviour in such
cases. There are four problematic areas, in which ADis may generate
strange disassemblies:
- Code generated by switch statements (PC relative)
- Code referenced only through jump tables (Relocated references)
- Different optimizations used by the original assembler the code was
produced with and the assembler you are using for reassembly.
- Unterminated code threads
3.1 Code generated by C switch statements (PC relative)
ADis can recognize jump tables as such in most cases, but it cannot
compute the locations it references, because each compiler generates
different code for jump tables. This might lead to ordinary
statements disassembled as data, because they are never referenced
directly by any code. You can change that by looking at the output
and placing a code label at the first address after the jump table
with the -lc option. To have these switch statements reassembled
properly you have to modify the output by hand.
3.2 Code referenced only through jump tables (Relocated references)
When the disassembled program uses function pointers and jump
tables, ADis will only generate data references for such locations.
To force these to be disassembled as code you can place a code label
at a certain address with the -lc option.
3.3 Different optimizations used by the original assembler the code was
produced with and the assembler you are using for reassembly
Most assemblers do not allow the encoding of an instruction to be
specified in much detail. A "MOVE $4,A6" instruction (a very common
instruction on the Amiga) might be assembled with either an absolute
word or an absolute long reference to address 4. This changes the
length of the generated code and moves all subsequent labels. If this
happens often, a branch might specify a destination outside of its
range, i.e. a branch to an instruction just below 32K away in the
original program might lead to an instruction more than 32K away in
the reassembled one. Compilers do not have this problem, because most
of them generate JMP or JSR instructions which will be optimized to
BRA or BSR by the assembler if possible.
Matters are complicated further if the code length changes inside a C
switch statement. Then the locations referenced by the offset table
will no longer fit. The only solution to this is to find the
offending instruction which changes the code length and to specify it
in more detail (dependent on the assembler you use).
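The arithmetic behind the out-of-range branch can be sketched in a few
lines of Python. This is a simplified model, not something ADis
computes: it assumes a word-sized branch with a signed 16-bit
displacement and assumes that each operand widened from absolute word
to absolute long adds exactly one extension word (2 bytes) between the
branch and its target. The names are hypothetical.

```python
# Illustrative sketch only; a simplified model with hypothetical names.

# Signed 16-bit displacement range of a word-sized 68000 branch.
BRANCH_MIN, BRANCH_MAX = -32768, 32767

# Absolute long needs one more extension word than absolute word.
GROWTH_PER_WIDENED_OPERAND = 2  # bytes


def branch_still_fits(original_displacement, widened_between):
    """Check whether a branch stays in range after reassembly.

    widened_between is the number of operands between the branch and
    its target that were assembled as absolute long instead of word.
    """
    growth = widened_between * GROWTH_PER_WIDENED_OPERAND
    if original_displacement >= 0:
        new_displacement = original_displacement + growth
    else:
        new_displacement = original_displacement - growth
    return BRANCH_MIN <= new_displacement <= BRANCH_MAX
```

In this model a forward branch with a displacement of 32760 survives
three widened operands but not four.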
3.4 Unterminated code threads
ADis assumes all code threads to be terminated by a RTS, JMP, BRA...
instruction, i.e. an instruction that unconditionally changes the
control flow and does not return. Some compilers such as GNU C
generate special code for the exit statement: They generate a JSR to
the exit function, but since this will never return, there is no code
following the JSR. This will cause ADis to run into the following
locations, which might be data. Such data often represents illegal
code and therefore the whole thread will be marked as illegal. The
same occurs for code sequences such as
BEQ label1
BNE label2
Neither of these instructions is an "end instruction", so the thread
is assumed to continue after them. In a future version the second
case might be recognized as terminating a thread. So far there is no
option to fix the first case; in a future version I might add an
option for generating artificial "end instructions".
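The notion of an "end instruction" can be illustrated with a small
Python sketch. It only covers the three mnemonics named above (the
real list is longer, as the "..." indicates), and the function names
are hypothetical rather than taken from ADis.

```python
# Illustrative sketch only; names are hypothetical, not ADis internals.

# Only the mnemonics named in the text; the real list is not exhaustive.
END_INSTRUCTIONS = {"RTS", "JMP", "BRA"}


def ends_thread(mnemonic):
    """True if the instruction unconditionally leaves the thread."""
    return mnemonic.upper() in END_INSTRUCTIONS


def thread_is_terminated(mnemonics):
    """A thread counts as terminated only if its last instruction is an
    end instruction."""
    return bool(mnemonics) and ends_thread(mnemonics[-1])
```

Under this definition a thread ending in a JSR to the exit function,
or in the BEQ/BNE pair shown above, is not terminated, so the
disassembler keeps reading into whatever follows.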
*************************************************************************
4. REASSEMBLY
Reassembly is somewhat complicated, because I haven't found an
assembler which fulfills all requirements. Some of the available PD
assemblers such as "DAS" from DICE and "NCode" load the whole file
into memory, which is impossible for the large files that ADis may
generate. Others (e.g. A68k) will not accept more than one segment of
each type in one file. Still others (e.g. ASM68k) don't support 68020
and 68881 opcodes. When I have found a usable PD assembler I will
adapt ADis to directly generate code for that assembler. For the time
being, the disassembled code has to be modified to be understood by
different assemblers. However, the only parts that should need to be
changed are the NEAR and FAR directives and maybe cross-references.
*************************************************************************
5. THEORY OF OPERATION
ADis makes three passes through the program. The first one only reads
the relocation and symbol hunks and enters them into a symbol table.
The second pass analyzes the code and stores the type of each location
(code or data) in a temporary file. The third pass writes out the
code and data using the temporary file generated in the second pass.
Analysis
The analysis is sort of a backtracking algorithm. It assumes that the
first location in the first code segment is executable and traces all
jumps and branches from there on recursively. When stepping onto an
illegal instruction it takes back all decisions that led there and
flags the offending location as data. There are several heuristics
which are used when every code thread has been followed and there are
still undetermined parts of the file. For example, it will assume
that the instruction after an RTS or JMP is code and try to follow it
until it has either followed all threads again or stepped onto an
illegal instruction. During all this trial and error it takes the
labels generated so far as an orientation where to try disassembling. By
inserting labels "by hand" you can modify the behaviour of the
analysis pass, but you can also mislead ADis, for instance by placing
a code label at the second word of a legal instruction. This will
cause ADis to take back all its decisions, because an instruction is
not allowed to cross a label.
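The following Python sketch shows the general shape of such an
analysis on a toy program. It is a simplified model and not ADis'
actual algorithm or data structures: every "instruction" occupies one
address, the instruction kinds ("op", "bcc", "jmp", "rts", "bad") and
all names are invented for the example, and labels are not modelled.

```python
# Illustrative sketch only; a toy model with hypothetical names.
# program maps address -> (kind, target):
#   "op"  falls through        "bcc" branches and falls through
#   "jmp" jumps, no fallthrough "rts" ends the thread
#   "bad" illegal instruction

def trace(program, start, code):
    """Follow one thread; return the addresses it proves to be code, or
    None if it steps onto an illegal instruction."""
    trial = set()
    pending = [start]
    while pending:
        addr = pending.pop()
        if addr in code or addr in trial:
            continue
        if addr not in program or program[addr][0] == "bad":
            return None  # take back every decision made in this trial
        kind, target = program[addr]
        trial.add(addr)
        if kind in ("bcc", "jmp"):
            pending.append(target)
        if kind in ("op", "bcc"):
            pending.append(addr + 1)
    return trial


def analyze(program, entry=0):
    """Return (code, data) address sets for the toy program."""
    code = set()
    candidates = [entry]
    while candidates:
        start = candidates.pop()
        if start in code:
            continue
        trial = trace(program, start, code)
        if trial is None:
            continue
        code |= trial
        # Heuristic: whatever follows an RTS or JMP may be code too.
        for addr in trial:
            if program[addr][0] in ("rts", "jmp"):
                candidates.append(addr + 1)
    return code, set(program) - code


SAMPLE = {
    0: ("op", None), 1: ("bcc", 4), 2: ("op", None), 3: ("rts", None),
    4: ("jmp", 2), 5: ("op", None), 6: ("rts", None),
    7: ("op", None), 8: ("bad", None),
}
```

For SAMPLE, addresses 0 to 4 are reached from the entry point, 5 and 6
are found by the heuristic after the JMP, and the trial starting at 7
is taken back because it runs into the illegal instruction at 8.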
*************************************************************************
6. FUTURE PLANS
There are several things I still want to add to ADis including the
following:
- Support for 68030/040 and MMU
- An option for disassembling libraries or devices
- An option to generate artificial "end instructions" (see 3.4)
- Generate labels for switches (see 3.1)
*************************************************************************
7. MISCELLANEOUS
Any hints, bug reports, praise, ... may be addressed to
apel@physik.uni-kl.de
When reporting bugs, please include the program that was disassembled
incorrectly or otherwise revealed a bug in ADis (if possible). Also
include the version number of ADis that produced the error. Requests
for future enhancements are also welcome.
-----------
Martin Apel