so. An exception is the saving and restoring of registers at
entrance to and exit from a subroutine; here, if the subroutine is
long, you should probably PUSH everything which the caller may need
saved, whether you will use the register or not, and POP it in
reverse order at the end.
Be aware that CALL and INT push return address information on the
stack and RET and IRET pop it off. It is a good idea to become
familiar with the structure of the stack.
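The save and restore convention can be sketched as follows. This is only an illustration: the routine name DOWORK and the particular set of registers saved are hypothetical choices, not anything DOS or the assembler requires.

```asm
DOWORK  PROC NEAR
        PUSH AX            ;Save everything the caller may need
        PUSH BX
        PUSH CX
        PUSH DX
        ;... body of the subroutine, free to change AX-DX ...
        POP  DX            ;Restore in reverse order of the PUSHes
        POP  CX
        POP  BX
        POP  AX
        RET                ;Pops the return address that CALL pushed
DOWORK  ENDP
```

Because the stack is last-in first-out, a POP sequence that is not the exact mirror of the PUSH sequence would return with register contents swapped.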
c. In practice, to invoke system services you will use the INT
instruction. It is quite possible to use this instruction effec-
tively in a cookbook fashion without knowing precisely how it
works.
d. The transfer of control instructions (CALL, RET, JMP) deserve care-
ful study to avoid confusion. You will learn that these can be
classified as follows:
1) all three have the capability of being either NEAR (CS register
unchanged) or FAR (CS register changed)
2) JMPs and CALLs can be DIRECT (target is assembled into instruc-
tion) or INDIRECT (target fetched from memory or register)
3) if NEAR and DIRECT, a JMP can be SHORT (less than 128 bytes
away) or LONG
In general, the third issue is not worth worrying about. On a for-
ward jump which is clearly VERY short, you can tell the assembler
it is short and save one byte of code:
JMP SHORT CLOSEBY
On a backward jump, the assembler can figure it out for you. On a
forward jump of dubious length, let the assembler default to a LONG
form; at worst you waste one byte.
Also leave the assembler to worry about how the target address is
to be represented, in absolute form or relative form.
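The classification above might look like this in source form. All the labels (CLOSEBY, FARAWAY, OTHERPROC) are hypothetical, and OTHERPROC is assumed to be declared FAR in another segment; treat this as a sketch of the forms rather than a complete program.

```asm
        JMP  SHORT CLOSEBY      ;NEAR, DIRECT, SHORT: one byte displacement
        JMP  FARAWAY            ;NEAR, DIRECT: assembler chooses the form
        JMP  BX                 ;NEAR, INDIRECT: target offset is in BX
        JMP  WORD PTR [BX]      ;NEAR, INDIRECT: target offset is in memory
        JMP  DWORD PTR [BX]     ;FAR, INDIRECT: offset and new CS in memory
        CALL FAR PTR OTHERPROC  ;FAR, DIRECT: CS and offset in instruction
```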
e. The conditional jump set is rather confusing when studied apart
from the assembler, but you do need to get a feeling for it. The
interactions of the sign, carry, and overflow flags can get your
mind stuttering pretty fast if you worry about it too much. What
it boils down to, though, is
        JZ         means what it says
        JNZ        means what it says
        JG reater  this means "if the SIGNED difference is positive"
        JA bove    this means "if the UNSIGNED difference is positive"
        JL ess     this means "if the SIGNED difference is negative"
        JB elow    this means "if the UNSIGNED difference is negative"
        JC arry    assembles the same as JB; it's an aesthetic choice
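A small sketch of the signed versus unsigned distinction, using the same comparison for both jumps. The labels SBIGGER and UBIGGER and the values compared are hypothetical, chosen only because 0FFH reads as 255 unsigned and as -1 signed.

```asm
        MOV  AL,0FFH       ;255 if read as UNSIGNED, -1 if read as SIGNED
        CMP  AL,1          ;Sets the flags from AL minus 1
        JG   SBIGGER       ;Not taken: signed, -1 is not greater than 1
        JA   UBIGGER       ;Taken: unsigned, 255 is above 1
        ;... reached only if neither jump is taken ...
SBIGGER:
        ;... signed "greater" case ...
UBIGGER:
        ;... unsigned "above" case ...
```

The CMP is identical in both cases; only the choice of jump decides whether the operands are treated as signed or unsigned.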
IBM PC Assembly Language Tutorial 10
You should understand that all conditional jumps are inherently
DIRECT, NEAR, and "short"; the "short" part means that they can't
go more than 128 bytes in either direction. Again, this is some-
thing you could easily imagine to be more of a problem than it is.
I follow this simple approach:
1) When taking an abnormal exit from a block of code, I always use
an unconditional jump. Who knows how far you are going to end
up jumping by the time the program is finished. For example, I
wouldn't code this:
        TEST AL,IDIBIT     ;Is the idiot bit on?
        JNZ  OYVEY         ;Yes. Go to general cleanup
Rather, I would probably code this:
        TEST AL,IDIBIT     ;Is the idiot bit on?
        JZ   NOIDIOCY      ;No. I am saved.
        JMP  OYVEY         ;Yes. What can we say...
NOIDIOCY:
The latter, of course, is a jump around a jump. Some would say
it is evil, but I submit it is hard to avoid in this language.
2) Otherwise, within a block of code, I use conditional jumps
freely. If the block eventually grows so long that the assembler
starts complaining that my conditional jumps are too long, I
a) consider reorganizing the block but
b) also consider changing some conditional jumps to their
opposite and use the "jump around a jump" approach as shown
above.
Enough about specific instructions!
6. Finally, in order to use the assembler effectively, you need to know
the default rules for which segment registers are used to complete
addresses in which situations.
a. CS is used to complete an address which is the target of a NEAR
DIRECT jump. On a NEAR INDIRECT jump, DS is used to fetch the
address from memory but then CS is used to complete the address
thus fetched. On FAR jumps, of course, CS is itself altered. The
instruction counter is always implicitly pointing in the code seg-
ment.
b. SS is used to complete an address if BP is used in its formation.
Otherwise, DS is always used to complete a data address.
c. On the string instructions, the target is always formed from ES and
DI. The source is normally formed from DS and SI. If there is a
segment prefix, it overrides the source not the target.
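These default rules can be illustrated with a few instructions. This is a sketch only; it assumes the registers involved have already been loaded with sensible values.

```asm
        MOV  AX,[BX]        ;DS:[BX]   -- DS is the default for data
        MOV  AX,[BP+4]      ;SS:[BP+4] -- BP in the address selects SS
        MOV  AX,ES:[BX]     ;ES:[BX]   -- explicit segment override prefix
        CLD                 ;Make string operations count upward
        MOVSB               ;Copies the byte at DS:[SI] to ES:[DI]
        JMP  WORD PTR [BX]  ;Offset fetched from DS:[BX], completed with CS
```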
Learning about DOS
__________________
I think the best way to learn about DOS internals is to read the technical
appendices in the manual. These are not as complete as we might wish, but
they really aren't bad; I certainly have learned a lot from them. What you
don't learn from them you might eventually learn via judicious disassembly
of parts of DOS, but that shouldn't really be necessary.
From reading the technical appendices, you learn that interrupts 20H
through 27H are used to communicate with DOS. Mostly, you will use inter-
rupt 21H, the DOS function manager.
The function manager implements a great many services. You request the
individual services by means of a function code in the AH register. For
example, by putting a nine in the AH register and issuing interrupt 21H you
tell DOS to print a message on the console screen.
Usually, but by no means always, the DX register is used to pass data for
the service being requested. For example, on the print message service
just mentioned, you would put the 16 bit address of the message in the DX
register. The DS register is also implicitly part of this argument, in
keeping with the universal segmentation rules.
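As a minimal sketch, here is what the print message service might look like in a complete COM-type program. The segment name CODE, the labels START and MSG, and the message text are hypothetical. Two details not spelled out above are used here: function 9 expects the message to end with a dollar sign, and interrupt 20H is one way for a COM program to return to DOS.

```asm
CODE    SEGMENT
        ASSUME CS:CODE,DS:CODE
        ORG  100H            ;COM programs begin at offset 100H
START:  MOV  DX,OFFSET MSG   ;DS:DX is the address of the message
        MOV  AH,9            ;Function 9: print a message
        INT  21H             ;Call the DOS function manager
        INT  20H             ;Return to DOS
MSG     DB   'Hello from DOS',13,10,'$'  ;'$' marks the end of the message
CODE    ENDS
        END  START
```

DS needs no explicit setup here because a COM program gets control with all four segment registers already equal.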
In understanding DOS functions, it is useful to understand some history and
also some of the philosophy of MS-DOS with regard to portability. General-
ly, you will find, once you read the technical information on DOS and also
the IBM technical reference, you will know more than one way to do almost
anything. Which is best? For example, to do asynch adapter I/O, you can
use the DOS calls (pretty incomplete), you can use BIOS, or you can go
directly to the hardware. The same thing is true for most of the other
primitive I/O (keyboard or screen) although DOS is more likely to give you
added value in these areas. When it comes to file I/O, DOS itself offers
more than one interface. For example, there are four calls which read data
from a file.
The way to decide rationally among these alternatives is by understanding
the tradeoffs of functionality versus portability. Three kinds of porta-
bility need to be considered: machine portability, operating system porta-
bility (for example, the ability to assemble and run code under CP/M 86)
and DOS version portability (the ability for a program to run under older
versions of DOS).
Most of the functions originally offered in DOS 1.0 were direct descendants
of CP/M functions; there is even a compatibility interface so that programs
which have been translated instruction for instruction from 8080 assembler
to 8086 assembler might have a reasonable chance of running if they use
only the core CP/M function set. Among the most generally useful in this
original compatibility set are
09 -- print a full message on the screen
0A -- get a console input line with full DOS editing
0F -- open a file
10 -- close a file (really needed only when writing)
11 -- find first file matching a pattern
12 -- find next file matching a pattern
13 -- erase a file
16 -- create a file
17 -- rename a file
1A -- set disk transfer address
The next set provides no function above what you can get with BIOS calls or
more specialized DOS calls. However, they are preferab...
Phase 1: Learn the architecture and instruction set
The Morse book might seem like a lot of book to buy for just two really
important chapters; other books devote a lot more space to the instruction
set and give you a big beautiful reference page on each instruction. And,
some of the other things in the Morse book, although interesting, really
aren't very vital and are covered too sketchily to be of any real help.
The reason I like the Morse book is that you can just read it; it has a
very conversational style, it is very lucid, it tells you what you really
need to know, and a little bit more which is by way of background; because
nothing really gets belabored too much, you can gracefully forget the things
you don't use. And, I very much recommend READING Morse rather than study-
ing it. Get the big picture at this point.
Now, you want to concentrate on those things which are worth fixing in mem-
ory. After you read Morse, you should relate what you have learned to this
outline.
1. You want to fix in your mind the idea of the four segment registers
CODE, DATA, STACK, and EXTRA. This part is pretty easy to grasp. The
8086 and the 8088 use 20 bit addresses for memory, meaning that they
can address up to 1 megabyte of memory. But, the registers and the
address fields in all the instructions are no more than 16 bits long.
So, how to address all of that memory? Their solution is to put
together two 16 bit quantities like this:
   calculation      SSSS0   ---- value in the relevant segment register SHL 4
   depicted in       AAAA   ---- apparent address from register or instruction
   hexadecimal    --------
                    RRRRR   ---- real address placed on address bus
In other words, any time memory is accessed, your program will supply a
sixteen bit address. Another sixteen bit address is acquired from a
segment register, left shifted four bits (one nibble) and added to it
to form the real address. You can control the values in the segment
registers and thus access any part of memory you want. But the segment
registers are specialized: one for code, one for most data accesses,
one for the stack (which we'll mention again) and one "extra" one for
additional data accesses.
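To make the arithmetic concrete, the following sketch does in software what the machine does in hardware on every memory access. The segment value 1234H and the offset 0010H are arbitrary illustrative numbers. Since a 20 bit result does not fit in one register, the answer is built in the register pair DX:AX.

```asm
        MOV  AX,1234H      ;Segment value
        MOV  CL,4
        ROL  AX,CL         ;AX = 2341H: top nibble rotated into the low bits
        MOV  DX,AX
        AND  DX,000FH      ;DX = 0001H: high 4 bits of segment times 16
        AND  AX,0FFF0H     ;AX = 2340H: low 16 bits of segment times 16
        ADD  AX,0010H      ;Add the 16 bit offset
        ADC  DX,0          ;Carry, if any, goes into the high part
        ;DX:AX = 0001:2350H, that is, real address 12350H
```

ROL is used rather than SHL because a plain 16 bit shift would throw away the top nibble of the segment value.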
Most people, when they first learn about this addressing scheme become
obsessed with converting everything to real 20 bit addresses. After a
while, though, you get used to thinking in segment/offset form. You
tend to get your segment registers set up at the beginning of the pro-
gram, change them as little as possible, and think just in terms of
symbolic locations in your program, as with any assembly language.
EXAMPLE:
        MOV  AX,DATASEG
        MOV  DS,AX          ;Set value of Data segment
        ASSUME DS:DATASEG   ;Tell assembler DS is usable
        .......
        MOV  AX,PLACE       ;Access storage symbolically by 16 bit address
In the above example, the assembler knows that no special issues are
involved because the machine generally uses the DS register to complete
a normal data reference.
If you had used ES instead of DS in the above example, the assembler
would have known what to do, also. In front of the MOV instruction
which accessed the location PLACE, it would have placed the ES segment
prefix. This would tell the machine that ES should be used, instead of
DS, to complete the address.
Some conventions make it especially easy to forget about segment regis-
ters. For example, any program of the COM type gets control with all
four segment registers containing the same value. This program exe-
cutes in a simplified 64K address space. You can go outside this
address space if you want but you don't have to.
2. You will want to learn what other registers are available and learn
their personalities:
AX and DX are general purpose registers. They become special only
when accessing machine and system interfaces.
CX is a general purpose register which is slightly specialized for
counting.
BX is a general purpose register which is slightly specialized for
forming base-displacement addresses.
AX-DX can be divided in half, forming AH, AL, BH, BL, CH, CL, DH,
DL.
SI and DI are strictly 16 bit. They can be used to form indexed
addresses (like BX) and they are also used to point to strings.
SP is hardly ever manipulated. It is there to provide a stack.
BP is a manipulable cousin to SP. Use it to access data which has
been pushed onto the stack.
Most sixteen bit operations are legal (even if unusual) when per-
formed in SI, DI, SP, or BP.
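A short sketch showing several of these personalities at once. The names TABLE and COUNT are hypothetical: TABLE is assumed to be an array of words addressed through DS, and COUNT a nonzero constant defined with EQU.

```asm
;Add up COUNT words starting at TABLE, leaving the sum in AX
        MOV  BX,OFFSET TABLE ;BX as the base of the address
        XOR  SI,SI           ;SI as the index
        XOR  AX,AX           ;AX as a general purpose accumulator
        MOV  CX,COUNT        ;CX as the loop counter
NEXT:   ADD  AX,[BX+SI]      ;Base plus index addressing
        ADD  SI,2            ;Step to the next word
        LOOP NEXT            ;Decrement CX and jump if it is not zero

;Inside a NEAR procedure whose caller pushed one word argument:
        PUSH BP              ;Save the caller's BP
        MOV  BP,SP           ;BP now addresses the stack
        MOV  DX,[BP+4]       ;[BP]=old BP, [BP+2]=return address,
                             ;[BP+4]=the argument
        MOV  AL,DH           ;Halves of AX-DX work as 8 bit registers
        POP  BP              ;Restore the caller's BP
```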
3. You will want to learn the classifications of operations available
WITHOUT getting hung up in the details of how 8086 opcodes are con-
structed.
8086 opcodes are complex. Fortunately, the assembler opcodes used to
assemble them are simple. When you read a book like Morse, you will
learn some things which are worth knowing but NOT worth dwelling on.
a. 8086 and 8088 instructions can be broken up into subfields and bits
with names like R/M, MOD, S and W. These parts of the instruction
modify the basic operation in such ways as: whether it is 8 bit or
16 bit; if 16 bit, whether all 16 bits of the data are given;
whether the instruction is register to register, register to
memory, or memory to register; for operands which are registers,
which register; and for operands which are memory, what base and
index registers should be used in finding the data.
b. Also, some instructions are actually represented by several differ-
ent machine opcodes depending on whether they deal with immediate
data or not, or on other issues, and there are some expedited forms
which assume that one of the arguments is the most commonly used
operand, like AX in the case of arithmetic.
There is no point in memorizing any of this detail; just distill the
bottom line, which is, what kinds of operand combinations EXIST in the
instruction set and what kinds don't. If you ask the assembler to ADD
two things and the two things are things for which there is a legal ADD
instruction somewhere in the instruction set, the assembler will find
the right instruction and fill in all the modifier fields for you.
I guess if you memorized all the opcode construction rules you might
have a crack at being able to disassemble hex dumps by eye, like you
may have learned to do somewhat with 370 assembler. I submit to you
that this feat, if ever mastered by anyone, would be in the same class
as playing the "Minute Waltz" in a minute; a curiosity only.
Here is the basic matrix you should remember:
        Two operands:          One operand:
         R <-- M                R
         M <-- R                M
         R <-- R                S  *
         R|M <-- I
         R|M <-- S  *
         S <-- R|M  *

        * -- data moving instructions (MOV, PUSH, POP) only
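One instance of each entry in the matrix, as a sketch. The particular registers and instructions are arbitrary picks; the commented-out lines at the end are combinations that do not appear in the matrix, which an assembler would be expected to reject.

```asm
        MOV  AX,[BX]           ;R <-- M
        MOV  [BX],AX           ;M <-- R
        ADD  AX,CX             ;R <-- R
        ADD  CX,5              ;R <-- I
        ADD  WORD PTR [BX],5   ;M <-- I (operand size must be stated)
        MOV  [BX],ES           ;M <-- S (data moving only)
        MOV  DS,AX             ;S <-- R (data moving only)
        INC  CX                ;One operand: R
        NEG  WORD PTR [BX]     ;One operand: M
        PUSH DS                ;One operand: S (data moving only)
;Not in the matrix:
;       ADD  [BX],[SI]         ;M <-- M
;       MOV  DS,1234H          ;S <-- I
;       MOV  ES,DS             ;S <-- S
```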