[Chapter Fourteen][Previous]
[Next] [Art of
Assembly][Randall Hyde]
Art of Assembly: Chapter Fourteen
- 14.4.3 - The FPU Instruction Set
- 14.4.4 - FPU Data Movement Instructions
- 14.4.4.1 - The FLD Instruction
- 14.4.4.2 - The FST and FSTP Instructions
- 14.4.4.3 - The FXCH Instruction
- 14.4.5 - Conversions
- 14.4.5.1 - The FILD Instruction
- 14.4.5.2 - The FIST and FISTP Instructions
- 14.4.5.3 - The FBLD and FBSTP Instructions
- 14.4.6 - Arithmetic Instructions
- 14.4.6.1 - The FADD and FADDP Instructions
- 14.4.6.2 - The FSUB, FSUBP, FSUBR, and FSUBRP Instructions
- 14.4.6.3 - The FMUL and FMULP Instructions
- 14.4.6.4 - The FDIV, FDIVP, FDIVR, and FDIVRP Instructions
- 14.4.6.5 - The FSQRT Instruction
- 14.4.6.6 - The FSCALE Instruction
- 14.4.6.7 - The FPREM and FPREM1 Instructions
- 14.4.6.8 - The FRNDINT Instruction
- 14.4.6.9 - The FXTRACT Instruction
- 14.4.6.10 - The FABS Instruction
- 14.4.6.11 - The FCHS Instruction
14.4.3 The FPU Instruction Set
The 80387 (and later) FPU adds over 80 new instructions to the 80x86
instruction set. We can classify these instructions as data movement instructions,
conversions, arithmetic instructions, comparisons, constant instructions,
transcendental instructions, and miscellaneous instructions. The following
sections describe each of the instructions in these categories.
14.4.4 FPU Data Movement Instructions
The data movement instructions transfer data between the internal FPU
registers and memory. The instructions in this category are fld, fst,
fstp, and fxch. The fld instruction always pushes its operand onto the
floating point stack. The fstp instruction always pops the top of stack
after storing the top of stack (tos) into its destination operand. The
remaining instructions do not affect the number of items on the stack.
14.4.4.1 The FLD Instruction
The fld instruction loads a 32 bit, 64 bit, or 80 bit floating point value
onto the stack. This instruction converts 32 and 64 bit operands to an 80
bit extended precision value before pushing the value onto the floating
point stack.
The fld instruction first decrements the tos pointer (bits 11-13 of the
status register) and then stores the 80 bit value in the physical register
specified by the new tos pointer. If the source operand of the fld
instruction is a floating point data register, st(i), then the actual
register the 80x87 uses for the load operation is the register number
before decrementing the tos pointer. Therefore, fld st or fld st(0)
duplicates the value on the top of the stack.
The fld instruction sets the stack fault bit if stack overflow occurs. It
sets the denormalized exception bit if you load an 80 bit denormalized
value. It sets the invalid operation bit if you attempt to load an empty
floating point register onto the top of stack (or perform some other
invalid operation).
Examples:
fld st(1)
fld mem_32
fld MyRealVar
fld mem_64[bx]
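The following sketch traces the register stack through a few loads to show how fld st(i) uses the register number as it was before the push. ValA and ValB are hypothetical variables introduced only for this illustration, and the sketch assumes the stack is empty on entry:

```asm
ValA    real8   1.0             ;Hypothetical data segment variables.
ValB    real8   2.0

        fld     ValA            ;st(0)=ValA
        fld     ValB            ;st(0)=ValB, st(1)=ValA
        fld     st(1)           ;st(0)=ValA, st(1)=ValB, st(2)=ValA
        fld     st(0)           ;Duplicate tos: st(0)=st(1)=ValA,
                                ; st(2)=ValB, st(3)=ValA
```

Each fld adds one item, so four of the eight FPU registers are in use when this sequence finishes.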
14.4.4.2 The FST and FSTP Instructions
The fst and fstp instructions copy the value on the top of the floating
point register stack to another floating point register or to a memory
variable. The fst instruction accepts a 32 or 64 bit memory operand; fstp
accepts a 32, 64, or 80 bit memory operand. When copying data to a 32 or
64 bit memory variable, the 80 bit extended precision value on the top of
stack is rounded to the smaller format as specified by the rounding
control bits in the FPU control register.
The fstp instruction pops the value off the top of stack when moving it to
the destination location. It does this by incrementing the top of stack
pointer in the status register after accessing the data in st(0). If the
destination operand is a floating point register, the FPU stores the value
at the specified register number before popping the data off the top of
the stack.
Executing an fstp st(0) instruction effectively pops the data off the top
of stack with no data transfer. Examples:
fst mem_32
fstp mem_64
fstp mem_64[ebx*8]
fstp mem_80
fst st(2)
fstp st(1)
The last example above effectively pops st(1) while leaving st(0) on the
top of the stack.
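As a rough illustration of the difference between storing and storing with a pop, the sketch below follows two values through fst and fstp. ValA, ValB, and Copy32 are hypothetical names, and the stack is assumed to be empty on entry:

```asm
ValA    real8   1.0             ;Hypothetical data segment variables.
ValB    real8   2.0
Copy32  real4   ?

        fld     ValA            ;st(0)=ValA
        fld     ValB            ;st(0)=ValB, st(1)=ValA
        fst     Copy32          ;Copy32 := ValB rounded to 32 bits,
                                ; stack is unchanged.
        fstp    st(1)           ;st(1) := ValB, then pop: st(0)=ValB,
                                ; ValA is gone.
        fstp    st(0)           ;Pop ValB, stack is empty again.
```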
The fst and fstp instructions will set the stack exception bit if a stack
underflow occurs (attempting to store a value from an empty register
stack). They will set the precision bit if there is a loss of precision
during the store operation (this will occur, for example, when storing an
80 bit extended precision value into a 32 or 64 bit memory variable and
there are some bits lost during conversion). They will set the underflow
exception bit when storing an 80 bit value into a 32 or 64 bit memory
variable if the value is too small to fit into the destination operand.
Likewise, these instructions will set the overflow exception bit if the
value on the top of stack is too big to fit into a 32 or 64 bit memory
variable. The fst and fstp instructions set the denormalized flag when you
try to store a denormalized value into an 80 bit register or variable[7].
They set the invalid operation flag if an invalid operation (such as
storing into an empty register) occurs. Finally, these instructions set
the C1 condition bit if rounding occurs during the store operation (this
only occurs when storing into a 32 or 64 bit memory variable and you have
to round the mantissa to fit into the destination).
14.4.4.3 The FXCH Instruction
The fxch instruction exchanges the value on the top of stack with one of
the other FPU registers. This instruction takes two forms: one with a
single FPU register as an operand, the second without any operands. The
first form exchanges the top of stack with the specified register. The
second form of fxch swaps the top of stack with st(1).
Many FPU instructions, e.g., fsqrt, operate only on the top of the
register stack. If you want to perform such an operation on a value that
is not on the top of stack, you can use the fxch instruction to swap that
register with tos, perform the desired operation, and then use fxch again
to swap the tos with the original register. The following example takes
the square root of st(2):
fxch st(2)
fsqrt
fxch st(2)
The fxch instruction sets the stack exception bit if the stack is empty.
It sets the invalid operation bit if you specify an empty register as the
operand. This instruction always clears the C1 condition code bit.
14.4.5 Conversions
The 80x87 chip performs all arithmetic operations on 80 bit real
quantities. In a sense, the fld and fst/fstp instructions are conversion
instructions as well as data movement instructions because they
automatically convert between the internal 80 bit real format and the 32
and 64 bit memory formats. Nonetheless, we'll simply classify them as data
movement operations, rather than conversions, because they are moving real
values to and from memory. The 80x87 FPU provides five instructions which
convert to or from integer or binary coded decimal (BCD) format when
moving data. These instructions are fild, fist, fistp, fbld, and fbstp.
14.4.5.1 The FILD Instruction
The fild (integer load) instruction converts a 16, 32, or 64 bit two's
complement integer to the 80 bit extended precision format and pushes the
result onto the stack. This instruction always expects a single operand.
This operand must be the address of a word, double word, or quad word
integer variable. Although the instruction format for fild uses the
familiar mod/rm fields, the operand must be a memory variable, even for 16
and 32 bit integers. You cannot specify one of the 80386's 16 or 32 bit
general purpose registers. If you want to push an 80x86 general purpose
register onto the FPU stack, you must first store it into a memory
variable and then use fild to push the value of that memory variable.
The fild instruction sets the stack exception bit and C1 (accordingly) if
stack overflow occurs while pushing the converted value.
Examples:
fild mem_16
fild mem_32[ecx*4]
fild mem_64[ebx+ecx*8]
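Because fild only accepts a memory operand, pushing the contents of a general purpose register takes two steps. The following minimal sketch assumes an 80386 or later processor and uses TempInt, a hypothetical scratch variable that is not part of this chapter's examples:

```asm
TempInt dword   ?               ;Hypothetical scratch variable.

        mov     TempInt, eax    ;Save the signed 32 bit integer in memory.
        fild    TempInt         ;Convert it and push it: st(0) = eax.
```

The value in eax is treated as a two's complement integer, so an unsigned value with bit 31 set would be loaded as a negative number.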
14.4.5.2 The FIST and FISTP Instructions
The fist and fistp instructions convert the 80 bit extended precision
value on the top of stack to an integer and store the result away into the
memory variable specified by the single operand. The fist instruction
stores a 16 or 32 bit integer; fistp stores a 16, 32, or 64 bit integer.
These instructions convert the value on tos to an integer according to the
rounding setting in the FPU control register (bits 10 and 11). As for the
fild instruction, the fist and fistp instructions will not let you specify
one of the 80x86's general purpose 16 or 32 bit registers as the
destination operand.
The fist instruction converts the value on the top of stack to an integer
and then stores the result; it does not otherwise affect the floating
point register stack. The fistp instruction pops the value off the
floating point register stack after storing the converted value.
These instructions set the stack exception bit if the floating point
register stack is empty (this will also clear C1). They set the precision
(imprecise operation) and C1 bits if rounding occurs (that is, if there is
any fractional component to the value in st(0)). These instructions set
the underflow exception bit if the result is too small (i.e., less than
one but greater than zero or less than zero but greater than -1).
Examples:
fist mem_16[bx]
fist mem_32
fistp mem_64
Don't forget that these instructions use the rounding control settings to
determine how they will convert the floating point data to an integer
during the store operation. By default, the rounding control is usually
set to "round to nearest" mode; yet most programmers expect fist/fistp to
truncate the fractional portion during conversion. If you want fist/fistp
to truncate floating point values when converting them to an integer, you
will need to set the rounding control bits appropriately in the floating
point control register.
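One way to get truncation is to set both rounding control bits (bits 10 and 11) around the store and then restore the original control word. The sketch below assumes an 80387 or later FPU and a value already in st(0); OldCW, TruncCW, and IntVal are hypothetical variables introduced for this illustration:

```asm
OldCW   word    ?               ;Hypothetical data segment variables.
TruncCW word    ?
IntVal  dword   ?

        fstcw   OldCW           ;Save the current control word.
        mov     ax, OldCW
        or      ax, 0C00h       ;Bits 10 and 11 = 11b: truncate.
        mov     TruncCW, ax
        fldcw   TruncCW         ;Switch to truncation.
        fistp   IntVal          ;Store st(0) as a truncated integer, pop.
        fldcw   OldCW           ;Restore the original rounding mode.
```

Restoring the saved control word keeps the change from leaking into later floating point code that expects the default rounding mode.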
14.4.5.3 The FBLD and FBSTP Instructions
The fbld and fbstp instructions load and store 80 bit BCD values. The fbld
instruction converts a BCD value to its 80 bit extended precision
equivalent and pushes the result onto the stack. The fbstp instruction
pops the extended precision real value on tos, converts it to an 80 bit
BCD value (rounding according to the bits in the floating point control
register), and stores the converted result at the address specified by the
destination memory operand. Note that there is no fbst instruction which
stores the value on tos without popping it.
The fbld instruction sets the stack exception bit and C1 if stack overflow
occurs. It sets the invalid operation bit if you attempt to load an
invalid BCD value. The fbstp instruction sets the stack exception bit and
clears C1 if stack underflow occurs (the stack is empty). It sets the
underflow flag under the same conditions as fist and fistp. Examples:
; Assuming fewer than eight items on the stack, the following
; code sequence is equivalent to an fbst instruction:
fld st(0) ;Duplicate value on TOS.
fbstp mem_80
; The following example easily converts an 80 bit BCD value to
; a 64 bit integer:
fbld bcd_80 ;Get BCD value to convert.
fistp mem_64 ;Store as a 64 bit integer and pop (fist has no 64 bit form).
14.4.6 Arithmetic Instructions
The arithmetic instructions make up a small, but important, subset of
the 80x87's instruction set. These instructions fall into two general categories
- those which operate on real values and those which operate on a real and
an integer value.
14.4.6.1 The FADD and FADDP Instructions
These two instructions take the following forms:
fadd
faddp
fadd st(i), st(0)
fadd st(0), st(i)
faddp st(i), st(0)
fadd mem
The first two forms are equivalent. They pop the two values on the top of
stack, add them, and push their sum back onto the stack.
The next two forms of the fadd instruction, those with two FPU register
operands, behave like the 80x86's add instruction. They add the value in
the second register operand to the value in the first register operand.
Note that one of the register operands must be st(0)[8].
The faddp instruction with two operands adds st(0) (which must always be
the second operand) to the destination (first) operand and then pops
st(0). The destination operand must be one of the other FPU registers.
The last form above, fadd with a memory operand, adds a 32 or 64 bit
floating point variable to the value in st(0). This instruction will
convert the 32 or 64 bit operands to an 80 bit extended precision value
before performing the addition. Note that this instruction does not allow
an 80 bit memory operand.
These instructions can raise the stack, precision, underflow, overflow,
denormalized, and illegal operation exceptions, as appropriate. If a stack
fault exception occurs, C1 denotes stack overflow or underflow.
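The forms above can be combined in a single computation. The following sketch adds three values using the register and memory forms; ValA, ValB, ValC, and Sum are hypothetical variables, and the stack is assumed to have at least two free registers:

```asm
ValA    real8   1.0             ;Hypothetical data segment variables.
ValB    real8   2.0
ValC    real4   3.0
Sum     real8   ?

        fld     ValA            ;st(0)=ValA
        fld     ValB            ;st(0)=ValB, st(1)=ValA
        faddp   st(1), st(0)    ;st(1) := ValA+ValB, pop: st(0)=ValA+ValB
        fadd    ValC            ;st(0) := st(0)+ValC (32 bit operand).
        fstp    Sum             ;Store the sum and pop.
```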
14.4.6.2 The FSUB, FSUBP, FSUBR, and FSUBRP Instructions
These four instructions take the following forms:
fsub
fsubp
fsubr
fsubrp
fsub st(i), st(0)
fsub st(0), st(i)
fsubp st(i), st(0)
fsub mem
fsubr st(i), st(0)
fsubr st(0), st(i)
fsubrp st(i), st(0)
fsubr mem
With no operands, the fsub and fsubp instructions operate identically.
They pop st(0) and st(1) from the register stack, compute st(1)-st(0), and
then push the difference back onto the stack. The fsubr and fsubrp
instructions (reverse subtraction) operate in an almost identical fashion
except they compute st(0)-st(1) and push that difference.
With two register operands (destination, source) the fsub instruction
computes destination := destination - source. One of the two registers
must be st(0). With two registers as operands, the fsubp instruction also
computes destination := destination - source and then it pops st(0) off
the stack after computing the difference. For the fsubp instruction, the
source operand must be st(0).
With two register operands, the fsubr and fsubrp instructions work in a
similar fashion to fsub and fsubp, except they compute destination :=
source - destination.
The fsub mem and fsubr mem instructions accept a 32 or 64 bit memory
operand. They convert the memory operand to an 80 bit extended precision
value and subtract this from st(0) (fsub) or subtract st(0) from this
value (fsubr) and store the result back into st(0).
These instructions can raise the stack, precision, underflow, overflow,
denormalized, and illegal operation exceptions, as appropriate. If a stack
fault exception occurs, C1 denotes stack overflow or underflow.
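Operand order is easy to get wrong with subtraction, so the sketch below spells out the operands explicitly and notes which difference each form produces. ValA, ValB, Diff1, Diff2, and Diff3 are hypothetical variables introduced for this illustration:

```asm
ValA    real8   5.0             ;Hypothetical data segment variables.
ValB    real8   3.0
Diff1   real8   ?
Diff2   real8   ?
Diff3   real8   ?

        fld     ValA            ;st(0)=ValA
        fld     ValB            ;st(0)=ValB, st(1)=ValA
        fsubp   st(1), st(0)    ;st(1) := ValA-ValB, pop.
        fstp    Diff1           ;Diff1 = ValA-ValB

        fld     ValA
        fld     ValB
        fsubrp  st(1), st(0)    ;st(1) := ValB-ValA, pop.
        fstp    Diff2           ;Diff2 = ValB-ValA

        fld     ValA
        fsubr   ValB            ;st(0) := ValB-ValA (memory operand).
        fstp    Diff3           ;Diff3 = ValB-ValA
```

Loading the minuend first and the subtrahend second, then using fsubp st(1), st(0), gives the difference in the order the values were loaded.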
14.4.6.3 The FMUL and FMULP Instructions
The fmul and fmulp instructions multiply two floating point values. These
instructions allow the following forms:
fmul
fmulp
fmul st(0), st(i)
fmul st(i), st(0)
fmul mem
fmulp st(i), st(0)
With no operands, fmul and fmulp both do the same thing - they pop st(0)
and st(1), multiply these values, and push their product back onto the
stack. The fmul instructions with two register operands compute
destination := destination * source. One of the registers (source or
destination) must be st(0).
The fmulp st(i), st(0) instruction computes st(i) := st(i) * st(0) and
then pops st(0). This instruction uses the value for i before popping
st(0). The fmul mem instruction requires a 32 or 64 bit memory operand. It
converts the specified memory variable to an 80 bit extended precision
value and then multiplies st(0) by this value.
These instructions can raise the stack, precision, underflow, overflow,
denormalized, and illegal operation exceptions, as appropriate. If
rounding occurs during the computation, these instructions set the C1
condition code bit. If a stack fault exception occurs, C1 denotes stack
overflow or underflow.
14.4.6.4 The FDIV, FDIVP, FDIVR, and FDIVRP Instructions
These four instructions allow the following forms:
fdiv
fdivp
fdivr
fdivrp
fdiv st(0), st(i)
fdiv st(i), st(0)
fdivp st(i), st(0)
fdivr st(0), st(i)
fdivr st(i), st(0)
fdivrp st(i), st(0)
fdiv mem
fdivr mem
With zero operands, the fdiv and fdivp instructions pop st(0) and st(1),
compute st(1)/st(0), and push the result back onto the stack. The fdivr
and fdivrp instructions also pop st(0) and st(1) but compute st(0)/st(1)
before pushing the quotient onto the stack.
With two register operands, these instructions compute the following quotients:
fdiv st(0), st(i) ;st(0) := st(0)/st(i)
fdiv st(i), st(0) ;st(i) := st(i)/st(0)
fdivp st(i), st(0) ;st(i) := st(i)/st(0)
fdivr st(0), st(i) ;st(0) := st(i)/st(0)
fdivrp st(i), st(0) ;st(i) := st(0)/st(i)
The fdivp and fdivrp instructions also pop st(0) after performing the
division operation. The value for i in these two instructions is computed
before popping st(0).
These instructions can raise the stack, precision, underflow, overflow,
denormalized, zero divide, and illegal operation exceptions, as
appropriate. If rounding occurs during the computation, these instructions
set the C1 condition code bit. If a stack fault exception occurs, C1
denotes stack overflow or underflow.
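The memory forms follow the same pattern as subtraction: fdiv mem divides st(0) by the memory operand, while fdivr mem divides the memory operand by st(0). A minimal sketch, using the hypothetical variables ValA, ValB, Quot, and RevQuot:

```asm
ValA    real8   6.0             ;Hypothetical data segment variables.
ValB    real8   3.0
Quot    real8   ?
RevQuot real8   ?

        fld     ValA            ;st(0)=ValA
        fdiv    ValB            ;st(0) := ValA/ValB
        fstp    Quot            ;Quot = ValA/ValB

        fld     ValA            ;st(0)=ValA
        fdivr   ValB            ;st(0) := ValB/ValA
        fstp    RevQuot         ;RevQuot = ValB/ValA
```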
14.4.6.5 The FSQRT Instruction
The fsqrt instruction does not allow any operands. It computes the square
root of the value on tos and replaces st(0) with this result. The value on
tos must be zero or positive, otherwise fsqrt will generate an invalid
operation exception.
This instruction can raise the stack, precision, denormalized, and invalid
operation exceptions, as appropriate. If rounding occurs during the
computation, fsqrt sets the C1 condition code bit. If a stack fault
exception occurs, C1 denotes stack overflow or underflow.
Example:
; Compute Z := sqrt(x**2 + y**2);
fld x ;Load X.
fld st(0) ;Duplicate X on TOS.
fmul ;Compute X**2.
fld y ;Load Y.
fld st(0) ;Duplicate Y on TOS.
fmul ;Compute Y**2.
fadd ;Compute X**2 + Y**2.
fsqrt ;Compute sqrt(x**2 + y**2).
fst Z ;Store away result in Z.
14.4.6.6 The FSCALE Instruction
The fscale instruction multiplies st(0) by 2**st(1) and leaves the result
in st(0). It does not pop either operand, so st(1) still holds the scale
factor after the instruction executes. If the value in st(1) is not an
integer, fscale truncates it towards zero before performing the operation.
This instruction raises the stack exception if there are not two items
currently on the stack (this will also clear C1 since stack underflow
occurs). It raises the precision exception if there is a loss of precision
due to this operation (this occurs when st(1) contains a large, negative,
value). Likewise, this instruction sets the underflow or overflow
exception bits if you multiply st(0) by a very large positive or negative
power of two. If the result of the multiplication is very small, fscale
could set the denormalized bit. Also, this instruction could set the
invalid operation bit if you attempt to fscale illegal values. Fscale sets
C1 if rounding occurs in an otherwise correct computation. Example:
fild Sixteen ;Push sixteen onto the stack.
fld x ;Compute x * (2**16).
fscale
.
.
.
Sixteen word 16
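Intel's documentation describes fscale as replacing st(0) in place without removing the scale factor from st(1), so code that is finished with the scale factor usually discards it. The sketch below extends the example above under that assumption; Result is a hypothetical variable, while x and Sixteen are the names the example already uses:

```asm
Result  real8   ?               ;Hypothetical data segment variable.

        fild    Sixteen         ;st(0)=16
        fld     x               ;st(0)=x, st(1)=16
        fscale                  ;st(0)=x*(2**16), st(1)=16
        fstp    st(1)           ;Overwrite the 16 with the result, pop.
        fstp    Result          ;Store x*(2**16) and pop.
```

The fstp st(1) instruction removes the scale factor without a trip through memory, leaving the stack at its starting depth once Result is stored.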
14.4.6.7 The FPREM and FPREM1 Instructions
The fprem and fprem1 instructions compute a partial remainder. Intel
designed the fprem instruction before the IEEE finalized their floating
point standard. In the final draft of the IEEE floating point standard,
the definition of the partial remainder operation was a little different
from Intel's original design. Unfortunately, Intel needed to maintain
compatibility with the existing software that used the fprem instruction,
so they designed a new instruction, fprem1, to handle the IEEE partial
remainder operation. You should always use fprem1 in new software you
write; therefore, we will only discuss fprem1 here, although you use fprem
in an identical fashion.
Fprem1 computes the partial remainder of st(0)/st(1). If the difference
between the exponents of st(0) and st(1) is less than 64, fprem1 can
compute the exact remainder in one operation. Otherwise you will have to
execute fprem1 two or more times to get the correct remainder value. The
C2 condition code bit determines when the computation is complete. Note
that fprem1 does not pop the two operands off the stack; it leaves the
partial remainder in st(0) and the original divisor in st(1) in case you
need to compute another partial remainder to complete the result.
The fprem1 instruction sets the stack exception flag if there aren't two
values on the top of stack. It sets the underflow and denormal exception
bits if the result is too small. It sets the invalid operation bit if the
values on tos are inappropriate for this operation. It sets the C2
condition code bit if the partial remainder operation is not complete.
Finally, it loads C1, C3, and C0 with bits zero, one, and two of the
quotient, respectively.
Example:
; Compute Z := X mod Y
fld y
fld x
PartialLp: fprem1
fstsw ax ;Get condition bits in AX.
test ah, 100b ;See if C2 is set.
jnz PartialLp ;Repeat if not done yet.
fstp Z ;Store remainder away.
fstp st(0) ;Pop old y value.
14.4.6.8 The FRNDINT Instruction
The frndint instruction rounds the value on tos to an integer using the
rounding mode specified in the control register.
This instruction sets the stack exception flag if there is no value on the
tos (it will also clear C1 in this case). It sets the precision and denormal
exception bits if there was a loss of precision. It sets the invalid operation
flag if the value on the tos is not a valid number.
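As a small illustration, frndint can be combined with a subtraction to split a value into its rounded integer part and the remaining difference. This sketch assumes the default round to nearest mode; Frac is a hypothetical variable and x is the variable used in the earlier examples:

```asm
Frac    real8   ?               ;Hypothetical data segment variable.

        fld     x               ;st(0)=x
        fld     st(0)           ;st(0)=x, st(1)=x
        frndint                 ;st(0)=x rounded to an integer, st(1)=x
        fsubp   st(1), st(0)    ;st(1) := x - rounded x, pop.
        fstp    Frac            ;Frac = x - rounded x
```

With round to nearest the difference can be negative (for example, when x is 2.75 the rounded value is 3.0); under truncation it has the same sign as x.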
14.4.6.9 The FXTRACT Instruction
The fxtract instruction is the complement to the fscale instruction. It
pops the value off the top of the stack and pushes a value which is the
integer equivalent of the exponent (in 80 bit real form), and then pushes
the mantissa with an exponent of zero (3fffh in biased form).
This instruction raises the stack exception if there is a stack underflow
when popping the original value or a stack overflow when pushing the two
results (C1 determines whether stack overflow or underflow occurs). If the
original top of stack was zero, fxtract sets the zero division exception
flag. The denormalized flag is set if the result warrants it; and the
invalid operation flag is set if there are illegal input values when you
execute fxtract.
Example:
; The following example extracts the binary exponent of X and
; stores this into the 16 bit integer variable Xponent.
fld x
fxtract
fstp st(0)
fistp Xponent
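Since fxtract leaves the mantissa in st(0) and the exponent in st(1), which is the operand layout fscale expects, the two instructions can be paired to take a value apart and put it back together. The sketch below assumes x is a normal, nonzero value and uses y as a hypothetical destination variable:

```asm
y       real8   ?               ;Hypothetical data segment variable.

        fld     x               ;st(0)=x
        fxtract                 ;st(0)=mantissa, st(1)=exponent
        fscale                  ;st(0)=mantissa*(2**exponent) = x
        fstp    st(1)           ;Discard the exponent, keep the result.
        fstp    y               ;y = x
```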
14.4.6.10 The FABS Instruction
Fabs computes the absolute value of st(0) by clearing the sign bit of
st(0). It sets the stack exception and invalid operation bits if the stack
is empty.
Example:
; Compute X := sqrt(abs(x));
fld x
fabs
fsqrt
fstp x
14.4.6.11 The FCHS Instruction
Fchs changes the sign of st(0)'s value by inverting its sign bit. It sets
the stack exception and invalid operation bits if the stack is empty.
Example:
; Compute X := -X if X is positive, X := X if X is negative.
fld x
fabs
fchs
fstp x
[7] Storing a denormalized value into a 32 or 64 bit memory variable will
always set the underflow exception bit.
[8] Because you will use st(0) quite a bit when programming the 80x87,
MASM allows you to use the abbreviation st for st(0). However, this text
will explicitly state st(0) so there will be no confusion.
Art of Assembly: Chapter Fourteen - 28 SEP 1996