home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
World of A1200
/
World_Of_A1200.iso
/
programs
/
benchmark
/
aibb_6.5
/
aibb_users_manual
< prev
next >
Wrap
Text File
|
1995-02-27
|
123KB
|
2,129 lines
A.I.B.B.
Amiga Intuition Based Benchmarks
A system performance evaluation utility for the Amiga
Program Release Version 6.5
Copyright 1991-1993 LaMonte Koop
All Rights Reserved
This software is provided as is. No warranty as to the performance or
validity of data obtained within is stated or implied. Bug reports and
suggestions for improvement are welcomed, and every effort will be made
to evaluate such reports.
AIBB is freely distributable provided no fee other than a moderate
fee for disk copying charges is made for its acquirement. It may be
distributed across any electronic network, provided no fee is charged
specifically for it's download. A broad-based download fee is acceptable
provided it is charged universally for all such file downloads. All
associated files included with the distribution archive of AIBB are to
remain intact and unaltered. BBS listing notices and the like may be
included in the archive provided no alterations are made to the actual
distribution files themselves.
This program, and all accompanying files are not public domain. They
are copyright material and may not be used for commercial purposes without
permission from the author. In most circumstances such permission will
be granted, but the author must be contacted before any distribution with
a commercial product.
AIBB is not shareware, as no donation or usage fee is required.
However, any donations are always appreciated, and can only encourage
further development of the program. This is an ongoing project, and will
continue to be so as long as interest in it is shown.
INTRODUCTION
AIBB is a utility primarily designed to assist in the evaluation of
system performance on a basic level. It consists of a series of
performance tests, the results of which are evaluated against other systems
and the displayed for comparison purposes. It should be noted that care
must be taken when making a definitive evaluation of the performance of
any system, as much more is involved in making a thorough determination
than the data which can be provided by AIBB alone.
System performance evaluation, commonly referred to as "benchmarking",
is the rather dubious science of trying to determine which system or
system architecture is "fastest". Unfortunately, all to often it is not
completely clear what is meant by which system is "fast".
Computer systems in general usually consist of a number of devices
interconnected to form a whole. These individual devices can be on one
circuit board, such as the case with certain coprocessor devices, etc...
or even as seperate entities completely, physically connected in some
external fashion, such as with expansion boards. All of these devices will
have certain advantages and disadvantages with respect to performance
levels. Combined together, it is generally the overall use of the system
in general which determines how much of an effect is seen in these factors
when observing overall system performance. Before delving into these
factors further, it is necessary to first clarify a few of the key
components which are main players in the performance game.
I. The system CPU.
The CPU ( Central Processing Unit ) of a computer is often the focus
of most performance discussions. This unit is generally responsible for the
non-specific portion of any computing task. It's duties involve general
program instruction execution, and in many cases it is the device
responsible for 'mastering' the system and coordinating the system effort
as a whole. Note that this is a generalization. Systems do exists which
are distributed; their CPU is not as readily defined, or consist of multiple
processing units each coordinated as a whole. However, in the context of
this discussion a single primary device will be assumed.
Since the CPU of any system does often receive a great deal of the
overall responsibility for program execution and task organization, it is
thus a very key part in the overall performance of the system as a whole.
However, often times it is considered solely as the factor which determines
the "speed" a computer can perform a particular operation. This assumption
is not always valid, and must be thought out carefully. Many other factors
may affect the efficiency of the CPU itself in performing it's operations,
which is why the system as a whole must be evaluated towards a particular
job which it is to be given. But before this relationship becomes clear,
the other components which are factors must first be recognized.
II. Coprocessor Devices
A coprocessor is any system processing unit which works in conjunction
with the primary processor (CPU) in the actions of the system. Such devices
are often subsystem-specific, and are responsible for a particular set of
computing tasks. For example, a system may include a FPU, or Floating
Point Unit to take on the task of floating point computations. These
processors are generally fine-tuned to that specific task, and thus are
more efficient at it than the main processor would be if it were to do the
same job.
Thus, the primary use of coprocessors is to alleviate some of the total
system computing load from the CPU. These devices may be directly coupled
to the CPU, thus being closely tied to the performance of the master
processor, or may be of a loosly coupled variety. This latter type of
coprocessing unit is tied to the CPU only when it requires data and
information from the main processor, and in some situations may be capable
of accessing and modifying system memory without going through the
CPU at all. Although this concept is not unique to coprocessors alone,
it is relevant, and thus will be explained here. Such memory accessing
capabilities denote a Direct Memory Access device (DMA). These devices
do not necessarily rely on the CPU to transfer data to them, and thus are
often 'decoupled' from the CPU in such a way as to have a different
performance ratio from the CPU itself. Even non-DMA devices are often
afforded a level of concurrent, or simultaneous operation with the main
CPU, so as to provide a more efficient method of task completion. However,
DMA devices are more closely tied with another set of subsystems to be
considered when dealing with system performance.
III. Bus interfaces.
This is often a confusing topic. The term 'bus' is used a great deal,
but all to often it is not clear what is meant by it. As stated before,
a computer system consists of a number of devices integrated together to
form the whole. A bus is, simply put, a communications pathway between
devices. Over these pathways control, address, and data signals are
transferred to devices which are required to perform a portion of any
particular task. Most systems contain more than one bus in which this
communication takes place. Usually, a primary bus or combination of
specific primary buses is responsible for the majority of data transfer and
communications between all devices in general, with lesser buses used as
specific pathways between certain devices. Buses are often 'sized', or
given in terms of bit-bandwidth. Basically, this is a determination of the
maximum size of a single data transfer across the pathway between devices.
For example, an 8-bit bus can transfer an 8-bit quantity of data across
it at once, while a 32-bit bus can transfer 32 bits at a single time
( Where a bit is defined as an electrical signal value representing a binary
number, either 0 or 1 [ Logical FALSE or TRUE, which orientation depending
upon the design of the system ] for each bit ). Although there are other
sizing factors which come into play, this is a general idea, and suitable
for the discussion at hand.
As any system relies on the coordinated efforts of all its components,
the efficiency and effectiveness of communication between each device is
of importance when considering the overall performace of the computer. A
bus which is not up to par with the capabilities of the devices it
interconnects will hinder the system while one which is capable of handling
the individual components will allow for a more efficient setup. More of
this relationship will be given later after the other component members are
introduced.
IV. Input and Output ( I/O ) Devices.
This is a lose subset of devices collectively describing such units as
storage media devices ( disk/tape drives, etc... ), external communications
devices ( serial and parallel communications to external units ), and
specific control input units, such as keyboards and other data input means.
While the latter of these devices is generally not considered to be of much
influence in system performance, the former members, such as storage devices,
can have a great impact on performance levels.
Storage devices are in general the slowest of data transfer devices
on any system. For this reason they are often considered to be a
'bottleneck' in system performance evaluation. However, many advances have
been made in the design of such units, including the use of DMA access from
storage device control units to the system main memory, which helps by
alleviating the CPU's responsibility in data transfer from these devices.
Generally, I/O devices are more important to systems requiring a great
deal of access to large quantities of data, or ones involved in data
transfer as their primary mechanism of use.
V. System Memory.
This subsystem has been mentioned in passing previously, but until
this section not given full attention. System memory resources also play
a big part in overally system performance evaluation. Memory can affect
a system's performance in many ways. Depending on the speed of other
devices, utilizing memory subsystems which are slower (requiring the
addition of 'wait states - periods of time in which the data requesting
device waits for the data to be available - to properly interface to the
system) can cause any data accesses to occur at a slower rate than the rest
of the system could otherwise handle them. Many memory subsystems do
indeed utilize wait states, as other devices are too fast for such memory and
the memory access speeds required for zero-wait-state access would make for
prohibitively expensive systems. Although a completely zero-wait state
system is often not feasible, methods are available to system designers to
try and reduce the overall memory latency periods. One widely used method
is the use of cache memory.
VI. Cache Memory.
Cache memory is a memory storage medium which is usually designed for
the fastest possible access to frequently used resources, usually
microprocessor instructions and/or data. This area is generally small
compared to the size of an entire system memory complement, and thus can be
implemented at a cost lower than that of employing very fast components for
all memory. The general operation of most memory caches is to store the
most recently accessed instructions or data within the cache, then make a
check for them there upon the next memory access call. In this sense, if
the instruction or data is in the cache, it can be accessed almost
immediately, rather than having the processor fetch the required data from
the system's main memory resources. A cache 'hit' is the term used to
indicate the processor did indeed find the data within the cache, and did
not have to fetch from main memory, whereas a 'miss' denotes when the
processor was forced to get the needed data or instructions from the main
system memory. When a miss occurs, the cache will usually be updated with
this new data in the case it is called for again, thus keeping the data
in the cache fresh.
The main theory behind such caches is that many programs spend a
great deal of time within the confines of a definable event loop.
Therefore, depending on the size constraints, part or all of such a loop
can be held within the cache, decreasing execution time. Caches can be
found both external to the microprocessor or, increasingly, within the
microprocessor itself. They may be seperated such that they only
instructions or data are held individually, or may be set up such that both
types of memory accesses are kept within one cache. There are tradeoffs to
both types of design, but in general the cache in any form is a
useful mechanism for increasing system performance. One must be cautioned
however, as the cache can also lead to a misrepresentation of system
performance comparisons. Benchmarking tools are often small segments of
programs, and as such may be easily completely cached on systems equipped
with such. Thus, a benchmark result may not accurately depict the true
system performance with a real-world application which would not be
entirely housed within a such a cache.
VII. A word on clocks and clockspeed ratings.
No mention has been made of clockspeed ratings of various devices
so far because they are often misleading terms and can be taken in the
wrong context in many cases. Therefore this subject is placed in a
seperate section of discussion.
"Clockspeed" ratings of devices are in actuality frequency
measurements. Almost all digital devices operating in a computer system
today require some sort of timing input to coordinate their internal and
external responses. Generally, this is provided by a clock signal fed to
that device, and in some cases the device itself may be responsible for
the generation of additional clock outputs to other devices.
Clock frequency ratings for system components are usually today given
in terms of MegaHertz ( MHz ). This is a cyclic frequency rating indicating
a the number of cycles per second an oscilating periodic signal undergoes.
As an example, a rating of one MegaHertz indicates a frequency of one
million cycles per second.
As indicated earlier, almost all digital system components require
some form of clock input. To see where this is important, take the case
of the CPU. Generally, instruction execution timing is stated in terms
of the number of clocks a given instruction takes to complete. A faster
clock means that although an instruction takes the same number of clocks
to finish, more clock input edges occur in a given time frame, and thus
afford a faster response. In this sense, faster clock rates generally
indicate faster devices. The system bus, and other devices are also
managed in terms of clock inputs signals. These may or may not be the
same input as given to the CPU, or the CPU itself may control them itself.
Thus, differences in clock ratings between subsystems can be a source of
bottlenecking, if one faster clocked subsystem is forced to wait to
synchronize with a slower subsystem in order to transfer data and control
signals.
Let it not be thought that clock input frequency is the sole governing
force in determining component speed, however. In many cases, other
effects cause similarly clocked devices doing the same task to finish
in differing amounts of time. One way this can happen is if one device has
been enhanced in such a way as that it's internal operations are more
efficient, thus requiring fewer clocks to complete. Therefore, this factor
must be weighed as well as clockspeed in even single device evaluations.
Device designers are constantly using both increased clock rates, as well
as increased internal efficiency to advance the performance of system
components.
It should be noted here that the term "bus cycle" is often confused
with the concept of of clockspeed, because of the term cycle. A bus cycle
is related to the clock cycle rate, but not usually identical. Bus cycles
are the time required for the CPU or other device to access data and
complete an external bus operation on it. For example, the MC68000 CPU runs
a 4 clock memory access cycle in general ( asynchronous memory transfers ),
requiring 4 CPU clocks to access a given memory operand. This is assuming
a no-wait state operation. Wait states are additional clock periods added
to this cycle time in order for the data to be validly returned from the
accessed device, and are placed in the bus cycle period when a device is
incapable of responding to the data transfer request within the normal 4
clock period. This is only given as a particular example; other CPUs and
architectures have differing bus cycle timing layouts (i.e, the MC68020,
MC68030, etc... run 3 clock asynchronous bus cycles normally at zero wait
states).
VIII. Putting it all together
Many factors are involved in the evaluation of a system's performance.
But just as a computer is the sum of its parts, these factors cannot be
considered alone. They must be put together and seen in entirety in order
to get a whole picture. Moreover, the intent of the system in use is
important in weighting these factors towards which are more influencing
for any particular task.
As an example, consider a system primarily intended for data processing
tasks. One might expect that it should have a relatively fast CPU in order
to work through the data at a reasonable pace. However, if the system's
memory resources are such that they require the addition of many wait states
into their accesses, then some of the effect of having a fast CPU is offset. Even further, what type of data is being processed?
Then again, if the data is of a floating-point varieny, then a very fast
CPU might not necessarily be as effective as a moderately fast Floating
Point coprocessor added to the system. Another important factor might be
the amount of data which needs to be continously accessed from storage
devices. In the case where a great deal is being pulled from such devices,
and they are slow in providing the data to the system, then no blazingly
fast component elsewhere is going to be able to make that system setup mark
high in it's environment as the data is only able to get to the 'fast'
devices as fast as the 'slow' storage devices can provide it.
It is obvious that care must be taken in evaluating any system's
performance in order to properly take into account all factors involved.
This includes determination of the usage of the system, and how individual
components may affect this speed.
THE COMMODORE AMIGA
The Commodore Amiga is a particularly interesting system as a whole to
evaluate, as it houses a fairly complex architecture for its relative price
range. It includes aspects of multiprocessing within it's design, as well
as a multitude of different system layouts to consider. However, only
subsystems relevant to the type of testing performed by AIBB will be
considered here, these being the 'core' elements of the system, discounting
I/O devices and external communications units. Of primary interest in this
discussion is the system CPU, coprocessing devices, and memory subsystems of
the Amiga.
I. System Layout
A. Primary system processors.
The Motorola M68000 series of microprocessors is utilized as the
main CPU in all Amigas in production today. Various models of Amigas
exist which utilize all of the primary variants of this microprocessor
family, with third-party add-on accelerator units providing an upgrade
path for many systems originally borne with earlier 68000 series CPUs.
An overview of the various M68000 microprocessors and their main uses in
Amigas is as follows:
MC68000/MC68HC000
The MC68000 was the CPU the Amiga was born with, utilized in
the Amiga 1000 first, and subsequently in the A500 and A2000
stock system models. This CPU is characterized by a 24-bit
address bus, giving it a 16 megabyte addressing capability, and
a 16-bit data bus. This microprocessor is classified as being
a 16/32 bit device. Its external data pathways are 16 bits
in size, while internally it supports a 32-bit model by
containing full 32-bit register implementations.
In all stock Amiga models utilizing this CPU, the device is
clocked at the rate of the system bus, approximately 7.15 MHz for
NTSC based systems, and about 7.09 MHz for PAL systems. Certain
add-on accelerators do exist which are built around this CPU,
replacing the stock motherboard component with an add-on board
which runs the CPU at 14.28 MHz, or in some designs, 16.0 MHz.
Recently, the MC68HC000 variant of the 68000 has been
introduced into the Amiga market on an accelerator board. The
68HC000 is a standard 68000, but manufactured in CMOS technology.
This design of the part allows it to run at higher clock rates,
and with less power consumption than the standard 68000. Aside
from this, the 68HC000 is identical to the 68000 stock device.
MC68010
This CPU has not seen wide use in Amiga systems, although
it can be found occasionally. The MC68010 is pin-compatible
with the MC68000, allowing for simple drop-in replacement in any
system utilizing the latter. Most systems do not see a
tremendous performance boost while utilizing the 68010 as it's
improvements over the 68000 are not a tremendous leap.
The MC68010 includes various internal microcode enhancements
over the MC68000, allowing for faster instruction execution in
some circumstances, as well as the addition of a specialized
programmer-transparent 'loop mode' which enhances CPU performance
in tight program loops by allowing said loops to be latched into
the CPU instruction prefetch queue where external bus cycles are
not necessary for the loop code proper. As indicated earlier
though, this CPU has not seen a great deal of use in Amiga
systems, and is mostly found in circumstances where owners of
68000-based Amigas have chosen to replace their stock CPUs with
this device directly.
MC68020
A major upgrade to the line, the MC68020 includes a great
many advances over the previous members of this microprocessor
family. The MC68020 is the first fully 32-bit capable
microprocessor of the M68000 series, incorporating full 32-bit
address and data buses, as well as a 256 byte instruction cache,
in order to keep program code sections used often within a
fast-access medium. The MC68020 is a major step above the
MC68000 or MC68010, with an architecture more capable of handling
larger demands upon its resources.
The 68020 is utilized in earlier acclerated Amiga systems,
including as the main processing engine of the first A2500 series
of machines which housed the CBM A2620 accelerator unit. Many
acclerators using this CPU were produced by third-party
manufacturers, including low-cost units found in some A500 units,
as well as in the A2000 line. In most designs, this CPU is
clocked at approximately 14.28 - 16.0 MHz, with a few of the
lower-cost accelerators running the CPU at the ~7.15 MHz (NTSC) /
~7.09 MHz (PAL) system clock of the Amiga.
MC68030
Improvements were made to the MC68020, including the addition
of a 256-byte data cache to complement the existing instruction
cache, and the inclusion of an on-board memory management unit
( MMU ) in order to produce the MC68030. Additional improvements
exist internally to this CPU over the MC68020 to give it a stand
against its generation of competing microprocessors. The 68030
can be viewed as an incremental improvement to the 68020, adding
additional features but not being a tremendous architectural
change from its predecessor.
The MC68030 is found as the accelerated CPU of the later
A2500 series of Amigas, as well as being the main processor of
the Amiga 3000 line. This microprocessor has also been widely
implemented in accelerator units for all models of Amigas and is
used at a wide variety of clock frequencies ranging from 16.0 MHz
to 50.0 MHz.
MC68040
Currently found in a variety of accelerators, and as the
main processor for the A4000/040, the 68040 is a generation
leap over the previous MC68030 model and incorporates a great
many advances over all previous models in this series of
microprocessors. Both instruction and data caches found in the
MC68030 are present, but their size has been increased to 4K
bytes each. In addition, the data cache of this processor now
supports a 'CopyBack' mode of operation, providing for faster
data access times by allowing memory writes to be deferred to the
cache until an update of memory contents is absolutely required.
On-chip MMUs exist for both data and instruction streams within
the CPU, and the internal pipelines have been further optimized
for increased performance. A subset Floating Point Unit (FPU) is
also included on-chip for floating-point calculations.
The 68040 is at present found in only 25 and 33 MHz rated
varieties at this writing, though this will likely change in the
future. Unfortunately, it does seem to be a developing trend in
the Amiga community to somewhat overclock the 68040, an action
neither sanctioned nor recommended by Motorola.
There are several variants of these primary microprocessor models in
production. The newest such variants are the Motorola "EC" series of
M680x0 parts, and "LC" series of MC68040 parts. The "EC" ( Embedded
Controller ) series are characterised by changes from the standard part
ranging from simple packaging to the removal of certain internal features.
This latter option is what has been taken with the MC68020, MC68EC030, and
MC68EC040 parts. The MC68020 is given by a 24 bit address range, as opposed
to the normal 32 bit address range of the standard 68020 part. Aside from
this difference, it is identical to the 68020. The MC68EC030 is
characterized by the lack of an on-chip MMU. It functions identically to
the standard MC68030 with this exception. The MC68EC040 and MC68LC040
are similar to each other except that the on-board MMUs of the normal 68040
are preserved, in the LC part, with just the FPU not functional on the unit,
while the EC part removes both the FPU and MMU units from the chip.
At this point it is of interest to bring up a point of common interest
with accelerated Amiga systems; that of asynchronous vs. synchronous
accelerator designs.
Synchronous designs were the first accelerators to appear for the
Amiga. These are generally found in the MC68020 based accelerator units,
and also in many of the low-cost MC68000-based accelerators. A synchronous
design is one in which the devices present on the accelerator are clocked
at a rate which is absolutely synchronized to the main system clock signals.
For the A500 and A2000, this means the clock rate of such accelerators
must be an even multiple of the ~7.15MHz (NTSC) / ~7.09 MHZ (PAL) system
clock rate. Because of the difficulties involved in maintaining
synchronicity at high clock rates, generally these accelerator units are
restricted to about 14 MHz, or double the system clock rate.
Asynchronous designs, on the other hand, have no such restrictions.
These units are somewhat more difficult to design, but in general the
accelerator components may be operated at nearly any clock input, provided
they are themselves capable of performing at the given frequency. This
operation mode is what all MC68030-based accelerator designs for the A500
and A2000 utilize, thus giving the wide range of clock rates found in these
accelerators.
It must be noted however that an ambiguity exists in the terms
synchronous and asyncronous. The 680x0 microprocessor series is characterized
by normally running asyncronous bus cycles. This simply means the processor
initiates a read/write action, and it is up to the external device to terminate
( acknowledge ) the cycle, thus completing it. This behavior is NOT related
to accelerator design as might be confused by the use of the same terms. In
accelerator design terms, asyncronous and synchronous are designating how the
accelerator state machine relates to the main system clock, and NOT how
individual bus cycles are run by the CPU in general.
Many accelerated Amigas also utilize an FPU for floating-point math
intensive operations. The main FPUs in use by the various Amigas available,
and the add-on accelerators in use on the Amiga, are manufactured by
Motorola as well, either as seperate coprocessor devices, or as in the
case of the MC68040 are embedded within the main CPU itself. An overview
of the various FPUs in use is given below:
MC68881
This is a seperate floating point coprocessor device
which provides fast hardware-supported floating-point operations
to any system software which supports it's use. This unit does
provide a certain level of concurrancy, giving it the abililty to
perform certain instructions at the same time the main CPU is
performing other operations. Support for this coprocessor is
provided either by a built-in hardware microcode interface, found
on the MC68020 and MC68030, or by software trap interfacing for
the MC68000 and MC68010. The latter method is used in but a few
early Amiga accelerator boards, while the preferred interface,
that to the MC68020 or MC68030, is supported by virtually all
accelerators utilizing those CPUs.
The MC68881 may be run asynchronous to the CPU clock input,
meaning it need not run at the same clockspeed as the CPU itself.
Thus, a faster FPU may be used to give somewhat of a boost to
floating-point operations. The MC68881s in use in Amigas today
are found mostly running at clock frequencies ranging from
12-20 MHz.
MC68882
The successor to the MC68881, this unit incorporates the
same interface and operations as the former device, but with
certain internal enhancements. The microcode for many operations
has been optimized for faster response, and support for further
multiple floating point instruction concurrency was added. In
general this FPU will perform at about 1.5 times the speed of the
MC68881 at the same clock input frequency. The MC68882 is
primarily operated at clock rates of 12-50 MHz, depending on the
accelerator or system utilizing it.
MC68040
The MC68040 CPU incorporates an FPU within the processor
itself. This FPU unit is a basic subset FPU of the MC68882,
eliminating mainly the transcendental (sin, cos, etc...), and
complex functions found in microcode on the former. Nevertheless,
the optimized nature of the existing FPU instructions provided
allow for emulation of the missing functions in such a way as to
give faster execution than the MC68882 for almost all operations.
B. The custom chips.
In addition to the main processing units, the Amiga also incorporates
a number of custom designed devices, known collectively as the Amiga's
custom chips. Their primary purposes are varied, but they are generally
in charge of such things as DMA access and arbitration to various memory
areas, and graphics/sound generation and effects. These custom chips are:
Agnus/Alice
Probably the most talked about custom chip, Agnus is found
in a number of flavors, ranging from the original device, to the
'super' version found in the A3000. Aside from minor internal
changes, the main differences between these different versions is
the amount of memory they can directly access. Agnus is
responsible for for control of 25 system DMA channels, generation
of all system clocks in the A500 and A2000, and provides control
and addressing for CHIP RAM, which is the memory accessable by
these custom chips. The size of this memory region is determined
by the Agnus in use, and is either 512 KBytes, 1 Megabyte, or
2 Megabytes in range. As the custom chips are utilized primarily
for graphics and sound coprocessing tasks, all such data must be
located in this CHIP RAM area.
Agnus also contains within it what is referred to as a
Blitter. This internal device is a fast memory copy unit designed
to move areas of memory as efficiently as possible, and has the
capability to also perform specific logic manipulations to the
data in the process.
Finally, Agnus also contains Copper. Copper is the system's
Display Synchronized Coprocessor. This device assists with screen
refreshes and display building, and is a major factor in the
Amiga's graphics engine.
Alice is the successor to Agnus, and part of the AGA graphics
chip found in the latest Amiga models. Containing the same 16 bit
data bus interface to CHIP RAM, Alice is nonetheless capable of
directing 32-bit fetches to RAM, as well as take advantage of
double CAS page mode cycles, providing for a larger bandwidth to
memory, and increased performance.
Denise/Lisa
The Denise custom chip is primarily responsible for color
generation and display resolution modes. This chip also contains
the eight hardware display sprite controllers used in the system.
Lisa, part of the AGA custom chip set, is the replacement
for the aging Denise. This new chip is implemented in full CMOS
technology, and incorporates the ability to handle up to 24-bit
RGB video, as well as do double 32-bit fetch cycles to memory
which increase its data bandwidth rate to 64 bits per cycle, or
four times that of the earlier Denise chip.
Paula
Paula is a more or less diverse device. It controls sound
generation, contains the system floppy disk control circuitry,
and houses the I/O control circuitry for the disks as well as
external control ports. Paula also contains an interrupt control
system for various system operations.
The custom chips of the Amiga and the coprocessors associated with
them are designed in such a way as to alleviate the main CPU of many
intensive tasks, such as graphics operations and sound generation. They
support a concurrent level of operation, allowing the main CPU to continue
with non-specific computing tasks while the custom chips handle their
respective operations. The devices are capable of DMAing directly into
the CHIP RAM area, freeing the CPU completely from task responsibility
in those respects.
Bus layout.
The seperation of operations and the definition of the CHIP RAM memory
area is further accentuated by the fact that the Amiga utilizes two buses
along these lines. The CHIP RAM bus is a seperate entity from the main bus
utilized by the CPU and other devices, but is accessable by the CPU as
well. The seperation can even be greater given the fact that the CHIP RAM
bus can be decoupled from the CPU bus completely under certain
circumstances.
The CHIP RAM bus is primarily utilized by the custom chips, with the
CPU being given access to it on an interleaved cycle basis ( every other
bus cycle can be a CPU access cycle ). The custom chips have priority in
this domain, and this is where the idea of bus contention arises. If a
great deal of bus activity is in progress by the custom chips, they may
'lock out' the CPU, forcing it to wait if it needs data or information
from this bus' memory space. This is where the touted 'FAST RAM' comes in.
FAST RAM is memory not on the CHIP RAM bus, but rather on the main
system bus or expansion bus. This memory is not accessable by the custom
chips, and thus no contention for it's access occurs between them and
the CPU. Due to the seperate nature of the buses, it is possible for the
CPU to be processing instructions and data utilizing FAST RAM while the
custom chips are concurrently operating in the CHIP RAM area. This
parallel operational status allows the Amiga to perform a great variety
of graphics operations in such a way as to done on a bus which is not
operated at a great speed.
The CHIP RAM bus on all Amigas is operated at a clock frequency of
approximately 7.15 MHz. On the A500 and A2000, this is the main system
clock frequency. For those machines, the CHIP RAM bus is accessed via
a 16-bit wide bus port, while on the later A3000/A4000 systems the bus port
for external accesses is a full 32-bit interface, affording larger data
transfer sizes at the same clock rate.
Because of bus contention, a system containing only CHIP RAM may very
well have slower operations than one which contains FAST RAM as well. The
FAST RAM equipped machine will be capable of having the CPU operate
concurrently on information on that bus, while the custom chips operate on
their tasks. The CHIP RAM only system is going to have circumstances where
the CPU will be forced to wait to access data, as the custom chips may be
utilizing the CHIP RAM bus heavily.
FAST RAM in the A500 and A2000 series of machines can be located on
many devices, from standard expansion card extenders which exist on the
system expansion bus and operate at the system clock frequency, to other
methods of RAM addition which have been devised that do not directly use
the common Amiga expansion routes. FAST RAM located along the standard
expansion backplane on these systems operates at the system bus clock
rate ( 7.15 MHz ), and is accessed accordingly. On A3000 machines, FAST
RAM is generally located on the system motherboard, and is accessed
according to the system clock rate of those machines, which on stock models
may be 16 or 25 MHz.
It should be noted that some systems utilizing only 512K of CHIP RAM
have in their memory lists a region of RAM which is called FAST, but in
fact is on the same bus as CHIP RAM. This is generally the memory found
on the A2000 motherboard for 512K CHIP RAM machines, or on the A501
expansion card for A500s. This memory will suffer from the same bus
contention that CHIP RAM is exposed to, and thus it is generally advisable
to be sure that program code is not put here unless it has to be ( e.g, if
true FAST RAM exists, it should be prioritized ). The utility program
"FastMemFirst" supplied by CBM is meant to do just that.
FAST RAM located within the domain of an accelerator is not limited to
the system bus clock rate. It may be operated at such, but in general can
be accessed at a clock rate much different, usually at the accelerator's
CPU clock. Systems utilizing accelerators benefit from this setup, as
an accelerator does not change the system clock rate, and therefore in
order for an accelerator's CPU to use system resources, it has to
synchronize with the system clock, and may even have to contend with a
narrower bus interface. Such is often the case on the A500/A600 or A2000
when utilizing MC68020 or MC68030 based accelerators, which are best suited
for 32-bit bus ports. Since those processors take a performance hit when
accessing narrower bus ports, as well as a hit from the possibly slower
clock rate of the system bus, accelerators often are equipped with their
own RAM resources which is designed to operate at the CPU clock frequency
and utilizes a more efficient bus port size ( 32-bit ). The case with the
A3000/A4000 is slightly different.
The A3000 and A4000 utilize a 32-bit bus for their memory resources
already, therefore this is not a problem with accelerators for those
machines. However, the bus on the A3000/A4000 is clocked at 16 or 25 MHz
( depending on the model ), and if a faster CPU is used in an accelerator
it may be profitable for the unit to contain it's own RAM resources in order
to lower access delays to a minimum. The A3000/A4000 does include
provisions for an accelerator to supply it's own clock signal to the
motherboard, but as of this writing, this has not been employed by any
devices.
II. Summary and overview.
It can be seen from all this that there is a great deal to be visualized
when trying to make a comparison of system performance levels. A great
many factors come into play when trying to determine just what system is
best and quickest for the task at hand. Various factors can determine how
efficient an accelerator is on a particular system, or how efficient a
system is in general. Interface efficiency, accelerator or general
system design, and intended use all play a part in determining which setup
is the 'winner' in the speed race. Indeed, there may not be a winner,
except in a particular task category, and this must always be remembered.
No benchmark or performance test can possibly hope to test all of these
categories, and the others which also play roles. Thus, it is necessary
to utilize data obtained from any set of benchmarks as only a portion of
the picture to be analyzed, and not as a rock-solid performance indication.
System design has improved to the point where many benchmarks can be fooled
into giving higher performance measures than would be found in any typical
application. As benchmarks are typically small pieces of code, they must
be evaluated as such. They can indeed give clues as to the performance
level of a system, but certainly not a definitive answer.
OVERVIEW OF AIBB
Amiga Intuition Based Benchmarks ( AIBB ) is a program primarily
designed to test various aspects of system performance at the CPU and
accompanying device level. It does not test such things as I/O efficiency
and storage media data retrieval and placement efficiency ( storage I/O ).
The tests contained within AIBB by no means give a complete picture of any
system's performance level, but does provide some basic information and
comparison data for a variety of systems.
AIBB is divided into a number of sections. Several are simply
informative in nature and are designed to give a better picture of the
system conditions during the actual testing phases. Other portions of the
program allow for a certain measure of system control, giving the ability
to somewhat modify the parameters under which tests are performed. It is
important to try to pay attention to the parameters and information given
by AIBB, as they may in turn give important clues as to the nature of the
test results reported.
AIBB is set up to allow a user to perform a number of tests on the host
system, and compare those results against a series of other systems.
Comparison data is given in both graphical and numerical form. AIBB also
allows the entire series of tests to be performed, and the results and
system state stored as a "load module" which may later be loaded and used
as one of the comparison systems against which a possibly different host
will be checked against. Tests may be manipulated by code type and system
situation in order to allow a better picture of the system performance
criteria being looked at.
I. System Requirements.
AIBB may be run on any Amiga system utilizing AmigaOS 1.3 or greater,
but it should be noted that the tests performed are designed primarily for
accelerated systems or fast systems in general. Therefore, tests may be
exceedingly long on Amigas utilizing slower CPU units, and the general
speed of the program may seem a bit slow on such platforms.
Users of MC68040 based systems must be utilizing AmigaOS 2.0 or
greater in order to run AIBB. Modified versions of AmigaOS 1.3 do exist
which are patched to somewhat deal with the problems of that OS version
and the 68040, but as per CBM's official stance, this is not a supported
method of utilizing the 68040 as a system processor. For this reason,
AIBB will abort if it detects a 68040 and the system OS version is less
than 2.0.
AmigaOS 1.3 users with accelerators must be sure to be using the latest
SetPatch routines for those OS versions. ( SetPatch v1.34 ) SetPatch
corrects a problem with FPU code with those OS versions, and is necessary
for proper operation of AIBB. AmigaOS 2.0x also is shipped with a SetPatch
routine which should be executed in the Startup-Sequence to assure any
future OS bug fixes and corrections will be applied.
When AIBB first starts up, it performs a series of system tests to
determine the type of system it is being operated on, ascertaining such
things as CPU type, FPU type, MMU type, etc. Unfortunately, some low-cost
accelerator units may experience a problem here...most notably in the
MMU type tests.
The MMU on systems which house the unit as a seperate device ( such
as 68020 + 688851 systems ) is treated by the CPU as an external
coprocessor...much like the FPU on such systems is. The MMU or FPU in
such a setup responds to an instruction when the instruction coprocessor
ID field matches the hardware set ID of the device. This allows more than
one coprocessor in a system ( such as both an MMU and FPU ). The ID
decoding mechanism is handled in hardware...and this is where the problem
arises with some accelerators. Such accelerators do not fully decode the
coprocessor ID, and thus the FPU may respond as an MMU, etc. Most of the
time this causes no problems to the system, but it does for AIBB which is
looking for these devices. Unfortunately, AIBB will most likely not work
on systems afflicted with this until the hardware bug is corrected by the
manufacturer. It should be noted that most systems/accelerators do NOT
have this problem, but a few may show up from time to time.
This program does not absolutely have any absolute requirements other
than those previously mentioned in order to be operated, but it does have
some suggested configurations. In order to utilize the program's file
functions, AIBB must be able to find one of the following shared libraries
in the libs: directory on your system disk:
1. asl.library ( AmigaOS 2.0 systems only )
2. kd_freq.library ( library version 3.0 or greater )
3. req.library ( library version 2.0 or greater )
4. reqtools.library
AIBB will search for these libraries in this order, and utilize the first
one found. Primarily, the library need is for file requester utilizing
functions within AIBB. AIBB will still operate without finding one of
these libraries, but it will block access to the file-requesting functions
it normally provides.
This will be the last version of AIBB to include support for AmigaOS
versions below 2.0. At this time, more effort is being placed into
compatibility with later AmigaOS generations, and this will be the mode
of support emphasized.
Getting Started.
AIBB may be started from either the CLI/Shell or WorkBench. If the
latter method is used, it is imperative that the icon used ( if not the
supplied one ) have it's STACK value set to 20000. AIBB invocations from
the CLI/Shell have no special requirements or stack settings as AIBB will
perform the necessary set-up in this environment. It is recommended that
careful attention be paid to the existing system memory resources before
starting AIBB. AIBB is quite large, and if you wish it and it's test code
to be loaded into a certain memory medium ( generally a fast medium if
possible ), then enough contiguous memory must exist in that memory region.
AIBB will give information as to where exactly it's code is located, but
if you are interested in loading AIBB in a certain region, this must be
taken into account BEFORE starting the program.
Several options are available from the command line when invoking AIBB
from the CLI/Shell, or equivalently through the icon TOOLTYPES array when
starting from the WorkBench. These options are listed below:
CLI/Shell Options: These options must be preceded by a dash ('-'), with
no spaces between the dash and the option. The
argument following the option is listed below as <arg>
and should be formatted as such: -<option><arg>, such
as -m0.
-c<arg>: Sets the CPU type AIBB will use for the host system.
Available arguments are:
0 : 68000 CPU
1 : 68010 CPU
2 : 68020 CPU
3 : 68EC020 CPU
4 : 68030 CPU
5 : 68EC030 CPU
7 : 68040 CPU
8 : 68EC040 CPU
9 : 68LC040 CPU
Any other value will be ignored.
-f<arg>: Sets the FPU type AIBB will use for the host system.
Available arguments are:
0 : NO FPU
1 : 68881 FPU
2 : 68882 FPU
3 : 68040 FPU (Internal)
Any other value will be ignored.
-m<arg> Sets the MMU type AIBB will use for the host system.
Available arguments are:
0 : NO MMU
1 : 68851 MMU
4 : 68030 MMU (Internal)
7 : 68040 MMU (Internal)
Other values will be ignored.
-cs<arg> Sets the CPU clockspeed aibb will show/use for the
host system. The argument field should be a valid
clockspeed rating, such as 25.0 for a 25MHz rating.
-fs<arg> Sets the FPU clockspeed aibb will show/use for the
host system. The argument field should be a valid
clockspeed rating, such as 25.0 for a 25MHz rating.
-b This option accepts no arguments. Supplying it on
the command line turns off the 'Click' sound AIBB
makes when a gadget is pressed.
WorkBench options: These options mimic the ones given above for the
CLI/Shell, with the exception that they are contained
within AIBB's icon TOOLTYPES field. The options
available are:
CPU=<arg>:
Sets the CPU type AIBB will use for the host system.
The CPU type may be specified as:
68000
68010
68020
68EC020
68030
68040
68EC030
68EC040
For example, to specifiy a 68EC030 CPU, the option
to give would be CPU=68EC030.
FPU=<arg>:
Sets the FPU type AIBB will use for the host system.
The FPU type may be specified as:
NONE
68881
68882
68040
For example, to specifiy no FPU, the option to give
would be FPU=NONE.
MMU=<arg>:
Sets the MMU type AIBB will use for the host system.
The MMU type may be specified as:
NONE
68851
68030
68040
For example, to specifiy no MMU, the option to give
would be MMU=NONE.
CPUSPEED=<arg>:
Sets the CPU clockspeed aibb will show/use for the
host system. The argument field should be a valid
clockspeed rating, such as 25.0 for a 25MHz rating.
For example: CPUSPEED=16.0 would set a CPU speed of
16.0MHz which AIBB will then use internally.
FPUSPEED=<arg>:
Sets the CPU clockspeed aibb will show/use for the
host system. The argument field should be a valid
clockspeed rating, such as 25.0 for a 25MHz rating.
For example: CPUSPEED=16.0 would set a CPU speed of
16.0MHz which AIBB will then use internally.
NOBUTTONBEEP:
Using this tooltype option turns off the click sound
AIBB uses when a gadget is depressed.
IMPORTANT:
The CPU/FPU/MMU options given above are for special circumstances only!
Normally, AIBB will determine all of the above independently, and tampering
with these values will be detrimental. However, these options can come in
very handy under certain circumstances.
Some accelerator models on the market suffer from a hardware bug: They
do not properly decode the coprocessor ID in hardware for systems with
such devices. The end result is attempted accesses to an MMU may end up
with the FPU on the system erroneously responding instead. Now, since AIBB
relies on an 'exception' occuring when no MMU exists in its efforts to ID
the system MMU, this becomes a problem if the FPU responds instead. The
result of this is that AIBB may fail to function properly on such systems,
and this is where the above options come in.
When the options above are specified, AIBB will take them at face value.
No further testing of the system is attempted. Therefore, by specifying
various values, the problem above can be circumvented as AIBB will not
perform the internal checks which may cause errors. If you suspect your
system is one with such a hardware bug, try manually setting the system
CPU, FPU, and MMU types to see if this cures the problem. You should not
have to set the device clockspeed ratings manually, as AIBB will still
be able to perform this.
------------------------
ONCE AGAIN, do not take the CPU/FPU/MMU command options lightly! If
false values are given, it may very well result in program errors within
AIBB, or possibly a system failure. Under most circumstances, you will NOT
need to use these options AT ALL, and can allow AIBB itself to determine the
system configuration.
------------------------
Under some circumstances, AIBB may request that the processor type
be supplied manually by the user. This is primarily in situation where
AIBB can't positively determine whether a 68EC030 or 68030 exists, or in
the case of 68LC040/68EC040 determination. If AIBB requests this
information, please supply the correct processor type, as failing to do
so can result in serious problems on occasion. This is especially true
in the case of the 68EC030 vs. the standard 68030. AIBB may not be able
to determine the exact processor in this case if for some reason the
MMU enabled bit is set in the processor's Translation Control ( TC )
register. Both the 68EC030 and 68030 have valid TC registers, even if
with the EC part the MMU is non-functional. Since AIBB attempts to
parse MMU tables if the MMU is active (for locating system structures),
fooling AIBB into thinking that an EC part is a standard 68030 in the
case of a seemingly active MMU can result in AIBB attempting to parse
a non-existant MMU table. This can be very problematic, and in extreme
cases result in a system failure.
------------------------
Once AIBB loads, a few moments may be needed by the program while
it evaluates the system it is being operated on, the exact time depending
on the relative speed of the host system in question. A screen displaying
a message of that sort will be given while this is in progress. Following
this evaluation, you will be presented with AIBB's main program screen.
III. Operation/Features of AIBB
A. Main Screen Description
AIBB's primary screen consists of several informational areas designed
to provide information about test operations and basic system information.
These areas are divided up as follows:
Performance Graph
The performance graph is a bar graph display of the comparisons
made after each test is performed. Ratings are given in reference
to the base machine for comparisons, with the highest performing
system having it's bar displayed in RED, while all others are
in YELLOW. Note that although numerically two machines may have
the same results out to 2 decimal places, AIBB may still show one
in red. This is due to rounding, and the fact that the one
highlighted machine does in fact have a higher rating if a few
more decimal places were shown numerically. However, such small
quantities should not be taken literally, as far too many variables
exist to use such values in accurate comparisons.
Test Result/Information
This area provides several pieces of data. First, it gives the
name of the test last whose information is being displayed
currently. The numerical result of the test performed is given
here, as well as the memory node reference number where the test
code, and possibly any test data is located. To reference these
node numbers, please see the section on the "System Information
Display".
Base Machine Indication
Below the Test Result/Information area is a small reference
which lists the current comparison system being utilized as the
base for all comparisons performed.
Comparison Information
This section provides several key pieces of information about
test performance. It gives the numerical ratings of all systems
utilizing the base machine as a reference. These values are the
same as those used to generate the performance graph.
The system headers here which label the machine in each
row are in fact gadgets that when pressed will move AIBB to its
System Information Display, showing data on the system selected.
In addition, this area houses the test code type gadgets/
indicators. Selection of code options for the host system causes
AIBB to perform any tests utilizing those options. Selections
under the comparison systems result in AIBB using the figures for
that code type ( previously obtained when the comparison data was
generated ) when making comparisons. Note that not all options
will be available, depending on system capabilities.
The gadgets allow for seperate selection of CPU and floating
point code models. Floating point code selections will only have
effect on tests which use such operations, while the CPU code
model will be in effect across all tests. Thus, when performing
a non-floating-point test, the current floating point code model
selection is ignored.
The gadgets are cyclic in nature; repeated selection will move
them through all available code models. The currently available
CPU code types are:
Standard 68000 Code
Having this item selected sets the code type to that which
is compatible with all MC680x0 series microprocessors. Note
that this means no advantage is taken of the capabilities
or code optimizations available on later-generation
microprocessors of this series, but it is a good base
selection as it can be utilized on all existing Amiga systems.
68020+ Code
This item selects code compatible with later generation
MC680x0 series processors. It will not be compatible under
most circumstances with earlier ( MC68000 or MC68010 ) based
systems, but will take advantage of some of the more advanced
capabilities of these later processors in the series.
The currently available floating point code options are given
below. As indicated earlier, they will affect only tests which
utilize floating-point math in nature.
Standard Math Code
Using this option sets the code type to use software
emulation of floating point routines. This is compatible with
all Amiga systems in use, as it is not hardware specific.
In-Line Coprocessor Code
This option sets the test code type to that which uses
faster in line FPU instructions for floating point operations.
As not all systems will have a coprocessor available, this
option is not universally available on all systems.
68040 Enhanced Math Code
For use with 68040-based systems, this option allows the
use of FPU code which is more optimized for 68040 processors.
Such processors do not have hardware-assisted transcendental
functions and this option will set up for in-line emulation of
such, alleviating the need for trap-based libraries such as
68040.library or similar vendor supplied code.
Basic Information
Located just below the performance graph, this area provides
key pieces of information about the current state of the host
system. The system CPU type, FPU type, and MMU type in use are
displayed, as well as the current operational status of the MMU.
Also displayed are the approximate CPU and FPU clock speed ratings,
as calculated when AIBB first evaluated the host system on startup.
This area also contains the system cache status indicators/
gadgets. These show the current state of any CPU caches which may
exist, and also allow their condition to be changed by selecting
the cache parameter desired. Clicking on a particular parameter
toggles it through both its "ON" and "OFF" states.
A lot of confusion tends to exist about the CPU cache modes,
and the MC680x0 cache BURST mode ( supported on the MC68030 and
MC68040 ) is often not understood. BURST mode operations are a
special form of cache filling ( updating the contents of the cache ) where an
entire "line" of cache data may be filled sequentially and faster
than the single-entry mode of cache filling. A cache "line" in
this case is a series of 4 longwords ( 32 bits each ) arranged
simplistically as:
entry: 1 2 3 4
line 1 ---- ---- ---- ----
line 2 ---- ---- ---- ----
...
where each entry is one longword. The MC68020 and MC68030 utilize
cache sizes of 16 lines, giving 256 bytes of cache storage. The
MC68040 increases this to give a total of 4K of cache space for
each of the data and instruction caches.
BURST mode is essentially a compromise in performance.
Average-case CPU performance is enhanced at the cost of worst-case
performance. The latter effect is true because during BURST mode
operations the CPU bus controller is committed to a memory fetch
sequence for a longer period of time than with single-entry mode.
The mode enhances average and best case performance by allowing the
CPU to sequentially fetch 3 additional longwords from memory faster
than normally done by the usual asynchronous single-fetch bus cycle.
Once it has fetched the first longword, the next 3 are clocked into
the cache line utilizing only 2 clocks per fetch, thus filling one
cache 'line' in 9 clocks ( assuming a zero-wait state initial
fetch ) rather than 15 clocks. The theory behind this is that the
data/operands sequentially surrounding the initial fetch will most
likely be needed soon in any case, and placing them in the cache
leads to their eventual faster access.
BURST mode operations are not universally applicable to all
systems however. Generally, the memory controller on the system
( or particular memory board ) must be capable of supporting BURST
mode operations, or the BURST request by the CPU will not be
fulfilled. In systems not capable of these modes, activating them
will not be detrimental, but will go unnoticed in performance terms.
The CPU will request BURST fills when it deems appropriate, but the
memory controller will not acknowledge the request and thus simply
force the CPU to do single-entry fetches as in standard operation.
Test Activation Gadgets
These are located in the lower right-hand corner of the screen
and serve several purposes. Normally, they are utilized to start a
test, but this is dependent upon the mode of operation AIBB is
currently in. See the section on "Review Mode" for further
information of this nature.
Activation of a gadget in standard mode starts a test with the
current code parameters and general settings, as detailed in the
appropriate sections later. Tests are divided into two groups:
"Standard" and Floating-Point. Standard test types, denoted with
WHITE lettering, are more general to the system, and represent code
more often found in operational situations. Floating-Point tests,
given YELLOW lettering, utilize a great deal of floating-point math
to test the system's performance across that domain. See the test
descriptions for more detailed information on the tests available
within AIBB.
B. Main Screen Menus.
AIBB's primary screen has attached to it a number of menu items, which
give even more options and control over program operation. Those operations
are described below, in the order of the menus as they appear on the screen.
Menu 1: General
About AIBB
This option presents a requester giving credits and
information about this version of AIBB.
Load Module Prefs
AIBB allows the use of alternate systems than those
contained internally in order to make comparisons against
the host system. This menu item will bring up a requester-like
arrangement which will allow the paths to load modules to be
used in place of the internal defaults to be specified. To
replace an internal module at startup for comparisons, simply
enter the full path name to the alternate load module in the
respective entry in this requester. Leaving an entry blank
informs AIBB to use it's internal default for that system.
Note that this configuration will take effect when AIBB is
next started, and the the next menu item, "Save Configuration"
as detailed below, must be selected to save the choices made
here.
Color Settings
The colors AIBB uses for its main screen displays are
user selectable, and may be changed if personal taste desires.
This menu option will bring up a color requester which will
allow AIBB's palette to be modified to suit. This may be
particularly useful for users of monochrome monitors which can
only display levels of grey, rather than color. Under such
circumstances some of AIBB's normal colors may map to grey
shades so similar as to be indistinguishable on the screen.
Use of this option can correct such a situation.
Use of the "Save Configuration" menu item will save the
color palette chosen with this option to file, and AIBB will
use that palette in subsequent invocations.
Save Configuration
This saves the current state of AIBB's menu item selections,
as well as the current order of the comparison machines as they
are placed. For more information on these regards, see the
section on loading new comparison modules from the default
systems within AIBB. AIBB currently saves this data to a file
called "aibb.prefs", which may be located in an assigned
directory called AIBB:, or your system S: directory. This
file will be searched for, in that order, when AIBB is first
invoked, and the values contained within will set AIBB's
startup options. If AIBB cannot locate a preferences
configuration file, it will notify you and use internal
default values.
QUIT
This item forces termination of AIBB.
Menu 2: System
AIBB Task Priority
A submenu-endowed item, this selection allows for the
changing of AIBB's task priority. This is primarily for
running tests while still allowing multitasking to occur,
while examining the effects of different task priority levels.
For information on disabling multitasking during test
operations, see the "Disable Multitasking" entry under the
Test Options menu descriptions.
Menu 3: Test Options
Disable Multitasking
When this item is selected, it indicates AIBB should
perform all tests in such a way as to disable all system
multitasking during the run of any test. This allows a figure
to be generated which indicates the system performance FOR
THAT TEST more accurately, as there is no task context
switching during the test runs. Note that all comparison
system figures are generated with this option enabled, so this
should be selected in order to compare the systems on an even
par. When this item is utilized, the previously mentioned
ability to set AIBB's task priority will have no impact on
test performance, as no task switching will occur, and thus
the task priority level becomes meaningless.
It should be noted that when using this option, it is a
good idea NOT to be running much in the background. The
Amiga's operating system is a near-real-time setup, requiring
in many cases fast response to system conditions. Use of this
option can affect certain other operations adversely, most
notably that of serial communications and the like.
Screen Overlay
Using this option results in AIBB putting a one bitplane
( two color ) low-resolution screen over it's main screen
during every test. AIBB's normal screen is a high-resolution
4 bitplane ( 16 color ) screen, and on CHIP RAM only systems,
and for some tests even on FAST RAM equipped systems this may
result in a great deal of bus contention on the CHIP RAM
bus. Subsequently, performance levels may be adversely
affected for the test. The use of this option attempts to
alleviate some of this problem by utilizing a screen overlay
which minimizes bus contention on the CHIP RAM bus by limiting
the required DMA activity by the custom chips to display it
while it is the topmost screen. Again, all comparison data
for the other systems is obtained with this option enabled,
so in order to keep comparisons on par this option should be
enabled, which it is by default values.
Note that for graphics-related tests this option will not
be activated as it would be detrimental to what those tests are
indeed trying to analyze. It is advised that if this option
is enabled while multitasking is permitted that screens not
be shuffled while a test is in progress. The uppermost screen
is the cause of the CHIP RAM bus display DMA effects, and to
shuffle to another screen during a test could nullify the
advantage of using this option.
Set Gfx Test Display Mode
AIBB allows all graphics tests to be run on any system
supported display mode, and this option allows the user to
select the display resolution and depth (number of colors)
to use when running such tests. Selection of this menu item
brings up a screen mode requester via the asl.library
requester functions. As versions of asl.library which support
the screen mode requester are required for this to function,
the host system must be running AmigaOS 2.1 or greater.
Once a particular screen mode is selected, any graphics
tests run will be done in that mode. This is particularly
useful for comparing the effects of differing resolutions and
display depths on graphics performance levels. One must be
careful to take note of the modes used for the other systems
as well, else improper conclusions as to how well a system
does in these tests could be drawn. For this reason, AIBB
will post a warning if the screen mode of either the host
system, or a comparison system does not match the modes in use
on the other machines. If simple, fair and straightforward
checks are desired, all systems should be compared using
the same screen mode.
View Comparison System Gfx Modes
As AIBB does allow differing screen modes to be used for
graphics tests, through this function it also allows browsing
though the various modes in use on the host/comparison systems.
Selection of this item brings up an interactive requester
which allows movement through various systems, and comparison
of the various display parameters in each.
Set Comparison Base
This item contains the names of the comparison systems in
a submenu area. Selecting one of these submenu items sets
the current comparison base system to that machine. The
comparison base is the system utilized as the 'base' value for
test results when computing performance ratings. All
percentages shown are given as percentages of the base system,
with a 1.0 value for a system indicating a performance
equal to the base system.
Menu 4: Special
Enter/Exit Review Mode
Entering Review Mode gives a method for reviewing
previously performed tests and their comparisons. When this
mode is active, selecting a test gadget, or setting a
comparison option ( code type, etc ), will result in the
display of the results last obtained for that test. If no
test results for the host system are available, the
information for the comparison systems currently in use will
be shown, and the host system will data will be marked with a
"N/A" indicating the information is not available. The
ability to display the comparison system data without running
the actual test on the host system is provided to allow a
quick view of the performance of said comparison machines
before running the test(s) on the host.
Code type options may be manipulated here, and if a test
result is available for those settings, it will be displayed.
For example, if you were to have the Matrix test as the
current test you are viewing, and you want to see the results
of the test under 68020+ code, selecting that item under the
"This Machine" code type selection will show the Matrix test
results utilizing this code type ( if they were previously
performed, making the data available ).
Start/Stop Log File
AIBB has the ability to keep a "log file" of test
activities. This option allows you to start this logging
operation, or stop it once in progress. The log files contain
basic information, in text form, about each test as it is
performed, as well as essential system information.
Starting a log file involves selecting a file name to
which AIBB will save this data. If the file is an existing
one, AIBB will check for the words "AIBBLogFile" at the start
of the file. If this is not found, you will be warned and
given the option of aborting the use of this file as a log
file. Heed this...AIBB WILL write into any file if told it
is acceptable, including executable load files. This checking
is done in order to prevent accidental file damage or
destruction.
All Tests | Make Module
This is a rather important option. As indicated earlier,
AIBB has the ability to create a "load module" of comparison
results in order to utilize them later in other runs as a
comparison system. This selection allows the generation of
just such a load module. Selecting this menu item will result
in a requester being displayed which warns that this option
may take considerable time, and that multitasking will not be
functional during it's operation. At this point, the
operation may be cancelled if it is not desired at that time.
When performing all the tests, the options "Disable
Multitasking", and "Screen Overlay" previously mentioned are
automatically enabled in order to give consistancy to all
such generated modules which may be utilized in AIBB. Using
this option, all tests are performed in all possible code
combinations available on the host system configuration, in
order that later comparisons will have as much data to go by
as possible.
Upon completion of all the tests, a requester will be
displayed informing you if the tests completed successfully,
and asking if you wish to create such a load module at that
time. If you choose to do so, a file requester will appear
asking for the name of the file to save the module under.
Following this, a smaller requester will appear asking for
the name to use with the module under the graph display for
it. This defaults to the first 8 characters of the filename,
but may be changed as desired. Note that only names of up
to 8 characters are supported at this time.
If "Cancel" is selected in reference to the module
creation requester, AIBB will go back to it's normal
operations, and other tests may be performed. In this manner,
it is possible to use this option simply to perform all
possible test combinations for later review. If you wish to
review the tests done before making a module, this is
possible by not saving the module at the time, and entering
"Review Mode" upon finishing. If no further tests are
performed ( which would invalidate the consistancy of the
module's data ), then selecting "All Tests | Make Module"
again after reviewing the data will result in a requester
informing you that the data for a module is still valid and
will ask you if you wish to create one now.
It should be noted that comparison options and settings
are not in effect during the performance of the tests with
this option. AIBB will merely do all tests with all code
types possible, and keep the results ( if desired ).
Comparison options are only effective ( and necessary ) when
viewing the information present, and are not important when
generating a load module.
Once all module options are completed, AIBB will present
an analysis of the overall system performance with respect to
the various comparison modules currently in use. This analysis
consists of averages in Integer, Graphics, and Floating-Point
performance when put against each comparison machine in turn.
This average gives somewhat of an "all around" look at the
host system's performance levels.
Show Aggregate Results
Once a load module has been performed on the host system,
this item becomes available for selection. When activated, a
requester displaying combined totals for the host system in
terms of Grapics, Integer, and Floating Point performance will
be shown. These totals are given as figures against the
currently loaded comparison systems. Additional tests may
be run after the original load module creation to see any
effects may take place in different configurations (cache,
etc...). Rerunning tests under the same situations as the
module run uses will most likely not affect these figures
significantly.
C. System Information Display
The System Information Display is a seperate display which is brought
up when the Main Display gadgets for individual systems are selected.
This display gives various information about the state of the system
selected, and is also the location from which other load modules to enter
as comparison systems may be selected.
The display here is broken into several sections, giving modular
information areas pertaining to various system data. If the host system
is the system being viewed, the data represents the current state of the
host system. If a comparison system's information is being viewed, then
the data is representative of the system state when that machine's module
was created for further comparisons.
The upper portion of the display consists primarily of CPU/FPU/MMU
data and state information which is fairly self-explanatory. Other
information given in this section includes the display type in use, Agnus
and Denise custom chip revisions of the system, and several items of
particular interest:
System Stack Memory Location
The system stack ( or "Supervisor Stack" ) is the memory region
reserved for use by the processor while operating in what is known
in M680x0 terms as "Supervisor Mode". Supervisor mode is the CPU
mode of operation most often associated with operating system
use, and various system maintenance operations. Supervisor mode
is characterized primarily by the fact that it allows unhindered
access to certain CPU operations which are of primary interest only
to system-level operating system functions. User Mode is the
operational status in which almost all applications function, and
said CPU operations are considered "off limits" in this mode. This
is to protect the integrity of the system from runaway programs and
the like, and to more easily facilitate multiprocessor/multiuser
system environments. It is a characteristic of the M68000
microprocessor series and serves to allow a seperation between
operating system priviledges and user program priviledges.
The system stack is where much CPU state information is stored
during operating system activities, and thus it is important to
recognize it's location in memory. Depending on the memory type
where this stack is located, it may affect certain operation speeds,
and it's location is thus given here to allow this to be taken into
account when evaluating system performance. It should be noted
that although this is an important item of interest, it is
generally not going to have much effect on the greater majority of
AIBB's operational modes and testing.
AIBB Process Stack Memory Location
This item is probably of more interest than the System Stack
location. AIBB's process stack is a memory region which is
assigned to AIBB ( and any user program ) when it is invoked.
Certain program variables and data are stored on the stack during
operations, and thus it's location can affect performance levels.
This should be taken into account carefully, as some of the testing
AIBB does utilizes this stack for data, and thus results will be
affected if it is located in a slower memory medium than optimal
for the system configuration.
Operating System Version
This field identifies the operating system version in use on
the system in question. Certain versions may have different
features, and may affect certain of the test performance levels.
Operating System Location
On certain MMU equipped accelerated systems, or on such system
with special hardware setups, the operating system ROM image may
be relocated to a faster memory medium. ROM access times are
generally slower than that of RAM resources, and in the case of an
A500 or A2000 with an accelerator which is more at home with a
32-bit data bus than those systems' normal 16-bit 7.15 MHz bus,
it is extremely advantageous to move the operating system kernel
code to such a faster accessed memory region. Often times, this
relocation is done by using a system's MMU ( Memory Management
Unit ), which allows for address translation of memory "pages".
Translation occurs by mapping a certain memory region such that
accesses to it are rediverted to an alternate location in this
kind of setup. Programs such as Dave Haynie's SetCPU and the
CPU program which comes with AmigaOS 2.0 and above allow this type
of operation. AIBB is capable of determining the actual memory
location of the ROM code image by checking through the MMU
translation tables, and will report where the code resides.
Some accelerators allow for translation of the ROM image
without utilizing an MMU. Such units utilize a custom hardware
arrangement, and at this time AIBB cannot accurately determine the
memory location of the ROM image for these systems. In these cases, it
is recommended that such translations be noted for further
reference if comparisons are to be made against other systems
utilizing a module or log file results so that no confusion about
the system setup occurs.
The lower portion of the System Information Display contains provisions
for examining system memory node, expansion board, or pertinent sytem
library information. Three gadgets to the right of this area provide the
means to select the desired display. The list of nodes or boards can be
moved through using the 'Next' and 'Previous' gadgets located below
the selection gadgets, while the library information is static. The
information given for memory nodes is:
Memory Node Index
This is an index value corresponding to which node is currently
being viewed, and how many total nodes exist. This value
directly relates to the main screen's "Code Loc" and "Data Loc"
test information values and can be used to determine where AIBB's
test code and data is located.
Memory Node Name
This is simply the name of the given memory node.
Memory Node Address Range
The address range for the current memory node is displayed here
in a hexadecimal form. Both the starting address, and ending
address are given.
Memory Node Total Size
The total usable memory within the given node is displayed here.
Memory Node Priority
Memory on the Amiga is prioritized for allocation. This means
that memory of a higher priority is given precendence over other
memory regions when an allocation request is attempted. For
example, a memory region of priority 5 will be scanned first for
a suitable memory chunk for a given allocation request before
attempting other regions. If there is not enough memory in this
region, the next priority region is tried, and so on. The main
item of note otherwise is that this is true for GENERAL memory
requests. Memory requests which specifically ask for CHIP memory
will have the allocation attempted there, regardless of priority.
Memory Node Bus Port Width
This is the bus width of a given memory region. A 16 bit bus
corresponds to a data path width of 16 bits, 32 meaning a 32 bit
data path width, etc. For 68020+ systems, memory port widths of
32 bits will have the advantage over 16 bit ports for efficiency
reasons, as the 68020 and above have 32 bit data paths, whereas
the 68000/68010 have 16 bit data paths.
Memory Node Type
Whether the given node is FAST memory or CHIP memory is
displayed here.
Custom Chip Bandwidth
This will only be seen when examining a CHIP memory node, and
only under AmigaOS 3.0 or greater, and indicates the bandwidth
specified for CHIP memory on the system. Note that at present
the only differences here will be seen between AGA chipset
equipped systems and non-AGA equipped machines.
CPU/Memory Access Latency Index
This figure represents the latency between a memory cycle, and
when another cycle can be performed. Lower ratings indicate better
response times for a particular memory node, with the unattainable
goal of 0.0 indicating that no latency occured at all. Basically,
this gives information as to the relative efficiency of various
memory nodes (eg, one with a rating of 5.0 would be more efficient,
and hence faster than one with a rating of 7.0.). Note that this
can only be used as a valid comparison across different systems if
other factors such as processor type, clockspeed, and bus width are
also taken into account. This figure is most useful in comparing
two different memory regions on similar systems, such as two memory
boards on a 68030 based system against each other for relative
efficiency. Note that this figure will only be given for
FAST RAM memory regions.
When Expansion Board information is selected, information about the
system AutoConfig® boards will be shown. The given fields will be as
follows:
Board Index
The index value for this board, and the total number of
expansion boards for which information is available is shown here.
Board Address
This is the configuration address of the given board. For
memory boards, this will generally reflect the starting address of
the memory region it occupies.
Board Size
The total byte size reqirements for the board will be displayed
here. This shows the amount of memory this board will take up
when configured onto the system. Note that with memory boards,
this will generally reflect the size of the memory available on
the expansion board.
Board Manufacturer ID
Commodore-Amiga assigns all valid AutoConfig® board
manufacturers a unique ID code. This field contains the ID of the
manufacturer of the given board being shown.
Board Product ID
Manufacturers have the option of assigning a product ID to
their boards. This shows the product ID given to a particular
expansion board.
Board Type
At this time, this field will simply designate whether the
given board is a memory board, or some other type of peripheral
expansion.
Board Attributes
This field basically gives information as to whether the
expansion device is configured as a valid Zorro-II or Zorro-III
setup.
Ident
AIBB contains a number of expansion board identifications
internally, and will attempt to match the board found with one of
these in the lists. If no match is found, the statement "No
Information Available" will be given to indicate this. If you see
this message, and wish the board in question to be listed in
AIBB's lookup tables, please let me know by way of providing me
with the expansion board Product ID, Manufacturer ID, and
the identity of the device.
When Library node information is selected, information about pertinent
system libraries will be shown. Note that not all system libraries
currently in use are displayed. Only selected ones which are of interest
when determining performance factors are recorded. Currently these
are:
exec.library
graphics.library
intuition.library
layers.library
expansion.library
The information given is of the following form:
Library Name
This is simply the name of the library in question.
Library Version
This field gives the version and revision of the system
library being displayed. This may be important when looking at
performance statistics of tests which make use of system kernel
calls (such as graphics tests).
Library Base:
This is the base address of the library, and indicates where
in memory it is located. Again, this may be of interest when
examining the performance of tests which make use of system
kernel calls.
The System Information Display also includes a number of menu options
which are explained below:
Select Other:
A submenu attached to this item allows you to switch to viewing
another system's attributes from within this display.
Load New:
This is the option to utilize if you wish to load a comparison
module in place of the ones alread in use. The loaded module
will replace the currently displayed system's location in the
comparison systems. This option is not available when viewing
the host system's data. Subitems attached to this menu item
allow you to select the type of module to load. These are:
From File:
This should be selected if you wish to load a previously
saved module in file form. A requester will be displayed
asking for the file name to load. AIBB will attempt to load
the module, and if all data consistancy checks are valid, it
will place this data in the location of the previously
displayed system.
Under this option is a list of the internal default modules
AIBB contains. This allows the rearranging of the order of the
default systems as they appear on the graph in the Main Display,
and also allows a default system's values to be re-loaded if one
is superseded by a file-based module at an earlier time. Note that
the order of the system default modules is one of the items saved
in the AIBB.prefs file, so you may choose any ordering of the
internal startup default systems which suits you best.
Return to Main:
Returns you to the Main Display portion of AIBB.
AIBB's Default Comparison Systems
AIBB's internal default comparison systems were selected to give a
broad overview of a number of system configurations and hardware types.
These systems are:
A600-NF
An Amiga 600 system with no FAST RAM ( NF ) complement. This
is an all CHIP RAM based machine, and is provided here to give a
comparison towards systems utilizing only CHIP RAM. This is a
stock machine, with accelerator devices or other additional
enhancements. AmigaOS 2.x was the operating system used and was
located in ROM.
A1200-NF
Commodore's low-end AGA machine, the Amiga 1200, was used to
gather the data for this system. No FAST RAM was used in this
machine, and AmigaOS 3.0 ( V39.106 ) in ROM was the operating
system present
A3000-25
The comparison data here was obtained from a 25 MHz CPU rated
system, which utilizes the MC68030 CPU and MC68882 FPU as it's
processing engines, and equipped with static-column (BURST mode
capable) FAST RAM. AmigaOS 2.x was the operating system in use,
and was located in ROM on the system A3000's motherboard.
A4000-25
An Amiga 4000 utilizing a 25 MHz 68040 CPU (stock configuration)
was utilized to obtain comparison data. AmigaOS 3.0 was utilized
as the system OS ( V39.106 ) and was located and run out of ROM on
the motherboard.
It should be kept in mind that all parameters for each system should
be noted when making comparisons by checking the statistics located
on AIBB's System Information Display. Small items such as the system
stack location, cache settings, OS version and image location, etc...,
could play a part in any apparent discrepency. Making note of these is
important to fully understand the figures being provided.
One important aspect of performance regarding the Amiga which has
come more seriously to light is the question of display parameter
effects on test results. With the advent AGA, and the new
display modes it contains, a great deal more care must be taken when
making system comparisons because of the system bus bandwidth limiting
effects some modes may have. Please do make sure to note the display
mode used on the default A4000 contained here (AIBB will show if Mode
Promotion was in effect (DBLModes)) when comparing systems. Also, when
making modules or test result notes, it is wise to carefully monitor
what types of screens are currently in use and displayed when AIBB is
performing tests.
In all the systems above, all tests performed were done with AIBB's
test code and data located in the fastest memory medium located on each
system.
No third-party accelerated machines were included in the lineup as
this would leave an unfair advantage/disadvantage to any particular
manufacturer. Comparisons of that sort can still be carried out
utilizing AIBB's load-module capability to bring in data from such
systems for direct comparisons.
OVERVIEW OF INCLUDED TESTS
The tests AIBB incorporates are described below. The type of test,
and it's basic operations are given in the descriptions, as well as the
amount of memory each test may need to allocate external to AIBB itself.
The "standard" tests are as follows:
A. WritePixel
The WritePixel benchmark will open a screen/window combination
and fill it completely with a given color pattern. The work is
done one pixel at a time, utilizing the operating system routines
SetAPen() ( sets the current RastPort primary pen color ) and
WritePixel() ( which sets a pixel to the current primary pen
color).
The test is basically a benchmark of the time needed to call
these routines, and for them to execute. For the most part, this
it will be primarily useful for evaluating the effective ROM
image access time for systems which differ from the conventional
ROM access method found on the Amiga 500/600 and 2000, namely
accessing the ROM over those systems' normal 16 bit bus. As these
routines also result in many accesses to the CHIP RAM bus, it can
also give a hint as to the efficiency of a system's CHIP RAM bus
interface.
WritePixel reports its results in pixels per second drawn.
Please note that this is NOT the maximum pixel rate of any
particular system, as there are more efficient methods of doing
this kind of work. This is the effective pixel rate of the system
when the methods and routines used by this test are employed.
Memory Usage: No direct memory resources external to AIBB are
allocated. CHIP memory is utilized for the screen
and window.
B. Dhrystone
This test should be fairly familiar to most people, as it has
been utilized on many different system for benchmarking purposes.
It is a test which attempts to put conditions upon the system
which more closely simulates a possible applications program
section. It returns, not run-time in seconds, but rather a rating
of Dhrystones per second, where in this case, the larger number
indicates better performance.
Memory Usage: No memory resources external to AIBB are allocated.
C. Matrix
A matrix manipulation benchmark utilizing 3 50x50 integer
matrices. The test simply performs a series of matrix operations
(addition/subtraction, multiplication, transposition, etc) upon
these matrices. The test is set up in such a way that a great
amount of time is spent moving data, as well as performing
arithmetic operations upon it. Therefore, this could be thought
of as also testing memory manipulation efficiency. The test
is an indicator of how well a processor/memory combination handles
memory accesses to data and operations on such, as the test does
not allow the processor to simply perform the data operations
solely within it's registers.
Memory Usage: 30,000 ( 29.3K ) bytes external to AIBB are allocated.
D. MemTest
This test is memory-bound, as its name implies. In essence,
it is a memory block movement test, timing the efficiency of memory
accesses and transfers using longword (32 bit) sizes. It should be
noted that the Data Loc portion of the test result information
will supply the node location of the RAM being tested. Systems
with FAST RAM will show higher results, as the test will execute
quicker, and as can be expected, 32-bit ported FAST RAM will
perform better than its 16-bit ported counterpart. Note that this
test will use FAST RAM as a memory medium if available, and
will report its results in megabytes transferred per second.
Memory Usage: 32,768 ( 32K ) bytes external to AIBB are used.
E. Sieve
Another test which should be familiar to most, the Sieve of
Erathosthenes. It uses a fairly simple algorithm to determine
prime numbers within a range of numbers. This test simply times
your system when implementing this algorithm, which is decribed
fully in many textbooks, or one can simply look at BYTE Magazine's
benchmarks, which use a similar Sieve test.
Memory Usage: No memory resources external to AIBB are allocated.
F. Sort
A series of 30,000 16-bit integers is sorted from a pseudo-
random setup, and the procedure is timed. "Pseudo-random" meaning
that the number arrangement is not created in a random fashion, but
rather in a mixed fashion so that on each invocation of the test
the numbers will be created in the SAME mixed fashion. This is
because the sorting algorithm is sensitive to the mixing, and if
each time the test was run a different group of values was used,
no two tests results could be compared well. The mixing method I
used was to insure that the algorithm would be forced to do the
most work for each test.
Memory Usage: 60,000 ( 58.6K ) bytes external to AIBB are allocated.
G. IMath
Integer Math. This test performs a wide variety of integer
math functions. Included among these operations are the standard
functions, such as addition, subtraction, multiplication, division,
and a few additional bitwise functions, such as ANDing, ORing, and
XORing.
Memory Usage: No memory resources external to AIBB are allocated.
H. TGTest
Text/Graphics test. This test is another one which is
dependent upon the efficiency of the system graphics routines'
execution speed, as well as the efficiency of the CHIP RAM bus
interface on the system.
Memory Usage: No direct memory resources external to AIBB are
allocated. CHIP RAM is used indirectly for the
screen/window creation.
I. EmuTest
This test is basically a small CPU emulator core running an
instruction set simulation (basically a small program). The Amiga
seems to have gained a bit of a precedence in CPU emulation, and
this test was developed for the purpose of showing various systems'
ability to perform such emulation efficiently and speedily. The
simulated CPU is a standard 68000, though the results from this can
be taken as indicative of other CPU emulators as the basic principle
is the same. All instructions and internal operations are
completely software emulated. The results for this test are given
in Simulated MegaHertz, basically a rating showing how fast the
emulation is towards an equivalent hardware-based CPU.
Memory Usage: No memory resources external to AIBB are allocated.
J. InstTest
This test is not affected by the code settings given for any
system. It performs a series of the most common CPU instructions in
a 6K loop, and times their execution. It then does a percentage
average of the instruction makeup, and gives a result in
Instructions per Second. THIS IS NOT A STANDARD "MIPS" TEST! Most
tests using the "MIPS" scale are very simplistic and for the most
part are not very useful whatsoever. A standard "MIPS" scale test
will most likely give you numbers much larger than AIBB will. AIBB
attempts to make an even spread of 680x0 instruction execution, thus
showing a somewhat more even look at things. This test is basically
to determine the raw speed of code execution on any given system.
Memory Usage: No memory resources external to AIBB are allocated.
K. EllipseTest
This is a test of an applied graphics operation. The test
draws a series of filled, anti-aliased ellipses and times the
operation. Anti-aliasing is the technique of "blending" line
curves so as to soften their sharper edges.
Memory Usage: No direct memory resources external to AIBB are
allocated. CHIP memory is utilized for the screen
and window.
L. LineTest
A test of line-drawing primitives. LineTest opens a screen/
window combination and draws a series of lines throughout them.
The lines are drawn in horizontal, vertical, and diagonal fashion,
with emphasis being on the former two. This test reports its
results in terms of lines drawn per second.
Memory Usage: No direct memory resources external to AIBB are
allocated. CHIP memory is utilized for the screen
and window.
The floating-point specific tests implemented by AIBB are given
below. Note that these tests are also dependent on any standard code
type selections which may be made, as well as the type of floating-
point code utilized.
Tests are marked as to their usage of transcendental functions
( sin(), cos(), log(), etc... ) for record keeping and comparisons by
68040 users, who should see the appropriate notes in this documentation
concerning the built-in 68040 FPU and transcendental functions. The
rating scale used below for such usage corresponds to this table:
Level Meaning
-----------------------------------------------------------------------
NONE | No transcendental functions are used
LIGHT | 5-20% of calculations are transcendental in nature.
MODERATE | 21-50% of calculations are transcendental in nature.
HEAVY | Greater than 50% of calculations are transcendental.
-----------------------------------------------------------------------
M. FMath
Floating Point Math. Similar to the IMath test, with the
exeception that Floating Point values and operations are utilized.
With this test, no bitwise operations are performed. Single
precision floating point operations/values are used here.
Transcendental Usage: NONE.
Memory Usage: No memory resources external to AIBB are allocated.
N. Savage
This is another of the "probably familiar" tests. It is a
standard implementation of the Savage test, which makes nested
calls to transcendental functions to create a single value.
Double precision floating point operations/values are used.
Transcendental Usage: HEAVY; this test is almost exclusively
transcendental in nature.
Memory Usage: No memory resources external to AIBB are allocated.
O. FMatrix
The FMatrix test is similar in concept to the Integer Matrix
test outlined above. Again, a great deal of data movement is
performed, in addition to the operations involved, which are
floating point operations in this case. With the matrix
operations, the results under Floating Point coprocessor equipped
systems can be interesting to note, as the system is not able to
keep the data within fast-access FPU registers, and thus must make
frequent bus accesses for the data it needs. Double-precision
floating point math is used for this test.
Transcendental Usage: NONE.
Memory Usage: 38,400 ( 37.5K ) bytes external to AIBB are allocated.
P. Flops
A common rating of floating-point operations, the term
'Flops' denotes Floating point operations per second. This test
takes a composite of operations and reports its results in terms
of scalar MFlops, where 1 MFlop is one million floating point
operations per second.
Transcendental Usage: NONE.
Memory Usage: No memory resources external to AIBB are allocated.
Q. TranTest
This is a test which is solely transcendental in nature. A
series of transcendental functions are performed in a large loop,
and timed for speed of operation. This test will tend to show
the relative efficiency of a system in performing more complex
mathematical functions.
Transcendental Usage: HEAVY ( Completely transcendental ).
Memory Usage: No memory resources external to AIBB are allocated.
R. BeachBall:
The BeachBall test was originally written by Bruce Holloway of
Weitek, and published in the March 1988 issue of Byte Magazine.
It is essentially a very math-intensive operation which draws a
beachball on the screen, complete with shading. The test opens a
640x400 interlaced 16-color screen, and proceeds to render the
picture. This test is closer to a true "application" test, in that
it actually does something visible, and produces an output. The
system will end up being tested in both the floating point arena,
and in CHIP RAM access performance, which is done through standard
operating system graphics handling calls ( thus will be affected by
the speed of such, which in turn can be affected by ROM image
re-mapping, etc ).
Transcendental Usage: LIGHT.
Memory Usage: No direct memory resources external to AIBB are
allocated. CHIP RAM is used indirectly for the
screen creation.
S. FTrace:
Another applications-type test. FTrace implements a subset of
the calculating functions which are used to perform ray-tracing
operations. Ray-tracing is a particularly floating-point intensive
art, and this test gives some indication of a system's performance
in this type of operation. No visible result is produced, so in
that matter it is not an 'ideal' test, but it can be used to give
some indications in this arena.
Transcendental Usage: LIGHT; Calculations are performed in such
a way that transcendental usage is minimized.
Memory Usage: No memory resources external to AIBB are allocated.
T. CplxTest:
This test implements a series of complex-number operations and
times their execution. Complex number applications are important
in many of the sciences, and are particularly prevalent in such
areas as electrical engineering ( circuit analysis ) and vector
analysis to some degree ( not specifically "complex numbers" in
that case, but the operations are similar ). This test utilizes
a lot of quick, small memory moves, as well as performing a
variety of floating-point operations.
Transcendental Usage: LIGHT TO MODERATE.
Memory Usage: No memory resources external to AIBB are allocated.
NOTES AND SUMMARY
It has been indicated before, but it should again be emphasized that
no benchmark or even suite of benchmarks can hope to give a complete picture
of system performance alone. A full picture of the system resources, as
well as an understanding of just what the system in question is being used
for is necessary to make any type of evalution. AIBB is merely one small
tool which may be used to try to gather a sampling of data when making
a performance determination.
When performing tests, it is very important to keep track of just
where test code and data is being placed in the system by using the
information provided by AIBB, and by using other methods if need be. For
example, if you have a 512K CHIP RAM machine, and some SLOW-FAST RAM
( sometimes mistakenly thought of as true FAST RAM ), this could affect
test results in ways not expected. Keeping careful track of these
variables can help in determining just what is occuring in the system
during performance analysis.
Of some interest in terms of FPU performance is the MC68040's
built-in FPU unit. This FPU is a subset of Motorola's previous MC68881
and MC68882 coprocessors, and does not include all functions on-chip
which were supported by the previous FPUs. Most notably, the transcendental
function such as sine and cosine, etc... are not hardware supported.
Rather, the simpler functions such as floating-point multiplication,
addition, division, etc.. have been greatly optimized and enhanced. The
MC68040 FPU relies on software emulation of the complex functions, and
most accelerator vendors, as well as CBM itself, supply a function library
to emulate these routines in the form of software 'traps'. Since the
complex functions utilize the simpler functions to derive their actions,
in theory all functions should still execute faster than on previous
coprocessors. However, this may not be the case.
Trap functions such as those supplied in the aforementioned libraries
are routines executed when the coprocessor indicates an unsupported
function routine is being called. This is a form of 'exception' routine,
requireing CPU/FPU internal context saving, and other related actions.
This is because the CPU/FPU treats the function call as an error, and
calls the error routine appropriate to it. In this case, it will be
the math support library, which will execute the proper function and return
the value needed. Unfortunately, all this activity results in a
performance hit, resulting in timings which are longer than that of the
previous coprocessors which emulated these functions in their hardware.
All this might imply that the 68040 is crippled in this respect. However,
this is not the case. Applications written to take advantage of of
68040's FPU will function much faster, as they will emulate the required
complex functions in forms not requiring the trap functions. The trap
functions are there for programs which are using FPU code set up for the
MC68881 or MC68882, which are at this time the more common FPU units.
AIBB includes an option, specified earlier, for more efficient 68040
FPU code. This code emulates the transcendental functions an other
functions unsupported by the 68040 within AIBB itself. This will alleviate
the overhead involved with trap-based emulation methods if selected.
CREDITS AND ACKNOWLEDGEMENTS
As with all large projects, nothing is accomplished entirely by one
person. I have many people to thank for their assistance in the
development of AIBB. A few of the more influential people who have
contributed greatly to this effort are:
Kimberly Polglase
For putting up with me throughout this ordeal :)
Redmond Simonsen
One heck of a nice guy and thought provoking fellow. His help
with interface ideas was very much appreciated, and are still
instrumental in any upcoming future versions of AIBB.
Dr. J. Scott Thayer
Sysop of AmigaFriends BBS, and a dedicated beta tester
extraordinaire. His comments and testing data were key to much
of what was done with this program over the course of it's
development.
Mathew Rouch
A good friend of mine, and a computer science student at
present. His help in several algorithmic coding problems allowed
me to solve some difficulties which would have taken a great deal
longer to overcome than they did.
Unfortunately, I cannot list everyone who has been of assistance with
this project, but to all of them, listed and unlisted, I wish to express
my deepest thanks and appreciation.
Comments and suggestions about this program are always welcomed, as I
hope to be able to continue its development. Please feel free to make
any suggestion you see fit, but do try to be constructive in any
criticism so that I may improve AIBB. Bug reports are certainly wanted,
and I will do my best to locate and correct such problems.
I can be reached electronically many ways, but the following are probably
the easiest methods for those with internet access:
lkoop@tigger.stcloud.msus.edu ( GP Acct )
f00012@kanga.stcloud.msus.edu ( Engineering Acct )
( Pick your paths :) )
I can also be found on BIX as "lkoop", and can be reached there easily
as well. For those wishing to correspond by mail, comments may be sent to:
LaMonte Koop
1001 Summit Ave. North #125
Sauk Rapids, MN 56379
As for me, well, I'm an Electrical/Computer Engineering student
( currently just a wee bit from done ) with an added major in Physics,
and an emphasis in systems architecture design. AIBB was originally
started as a bit of a hobby, and as time went on became a long-standing
project. This particular version is almost a year in the making, and I
do intend to continue enhancing the package as long as interest remains in
it. Enjoy the program; I hope you find it useful, and that it serves
whatever purpose you may need of it.