Technical Tips
- Observations on threading
- COM
Mike Robinson
As soon as Win95 and the like came out, programmers said, "Oh boy! Threads!"
Well, sometimes they are really, really good. But sometimes multithreaded apps
actually seem to run slower, and have more overhead, than their equivalent
Win 3.1 implementation.
TIP: MULTIPLE THREADS DON'T MULTIPLY THE CPU.
If you've got more than one CPU in there, terrific. Otherwise, having
multiple "concurrent" threads won't make things faster because by definition
only one of them is executing at a time. Furthermore, the more threads you
have, the more overhead is being spent switching between them. If you have
more than one or two "always busy" threads in your design, multithreading
won't make the program faster. The opposite will be true!
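A back-of-the-envelope way to see this is to put numbers on it. The sketch
below is only a toy model, not a measurement: the function name, the 20 ms
time slice and the 0.05 ms switch cost are all assumptions chosen for
illustration.

```python
def estimated_wall_time(work_ms, n_threads, n_cpus=1,
                        quantum_ms=20.0, switch_cost_ms=0.05):
    """Rough wall-clock estimate for n always-busy threads (toy model)."""
    # Threads beyond the number of CPUs cannot run in parallel.
    busy_cpus = min(n_threads, n_cpus)
    compute_ms = work_ms / busy_cpus
    if n_threads <= n_cpus:
        # Each thread has a CPU to itself, so no time-slicing is needed.
        return compute_ms
    # Oversubscribed: assume every time slice ends in a context switch.
    switches = compute_ms / quantum_ms
    return compute_ms + switches * switch_cost_ms


one_thread = estimated_wall_time(1000, n_threads=1)
four_threads = estimated_wall_time(1000, n_threads=4)
two_on_two = estimated_wall_time(1000, n_threads=2, n_cpus=2)
```

In this model the four-thread run on one CPU is never faster than the
one-thread run; only the second CPU in the last call shortens the job.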
TIP: USE THREADS TO OVERLAP COMPUTING WITH I/O.
In a sense that is what threads are -for.- A thread that does computing can
ignore the need of the system to respond rapidly to I/O requests. It can
ignore everything but itself, relying upon the operating system to pre-empt it
to handle input/output. Threading allows a program to remain responsive to
users and devices while maximizing use of the CPU as well. Threading also
allows the computer to respond to many devices expediently, AS LONG AS the
amount of computing-time required to service the device is minimal.
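Here is a minimal sketch of that overlap, in Python rather than the Win32 API.
The "device" is faked with a sleep, and the names slow_read, compute and
overlapped are hypothetical, made up for this example.

```python
import threading
import time


def slow_read(results):
    """Stand-in for a device or disk read: the thread just waits."""
    time.sleep(0.2)                # blocked on "I/O"; the CPU is free meanwhile
    results["data"] = b"payload"


def compute():
    """Stand-in for CPU work done while the read is outstanding."""
    return sum(i * i for i in range(100_000))


def overlapped():
    results = {}
    reader = threading.Thread(target=slow_read, args=(results,))
    reader.start()                 # start the I/O first...
    results["total"] = compute()   # ...and compute while it is pending
    reader.join()                  # wait for the I/O to finish
    return results
```

The computing thread never polls the device. It simply does its work and
collects the I/O result when it is ready.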
TIP: ALSO USE THREADS TO PRIORITIZE THINGS.
Threads can be given relative priorities so that less-urgent tasks can be
completed in the background. This is a way of fair-sharing the CPU resource.
Just remember that you have only one CPU, and some of those tasks are gonna
"take it in the shorts." If your program's overall function relies upon the
product of one of those low-priority tasks, then the program itself will
appear erratic.
TIP: WATCH FOR CONTENTION.
Threads work great when they are independent. They start to bog down
(becoming no better than, and maybe worse than, a message queue) when they have
to synchronize around some shared resource. Locking and critical-section
overhead is expensive.
TIP: DON'T FORGET THE USER INTERFACE IS A CONTENDED RESOURCE TOO!
One of the most hotly-contested resources in the system is the user interface.
After all, Windows itself is its own set of high-priority tasks that are
competing with yours. And if several threads are trying to update the same
window at the same time, visual chaos results.
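One common way around this, sketched here under assumptions, is to let exactly
one thread own the window and have everyone else post update requests to it.
The list called screen is only a stand-in for a real window, and run_workers
is a hypothetical name.

```python
import queue
import threading


def run_workers(n_workers=3, updates_each=4):
    updates = queue.Queue()
    screen = []                    # stand-in for the window's contents

    def worker(worker_id):
        for step in range(updates_each):
            # Request an update; workers never draw directly.
            updates.put((worker_id, step))

    def ui_thread():
        while True:
            item = updates.get()
            if item is None:       # sentinel: no more updates are coming
                break
            screen.append(item)    # only this thread touches the "window"

    ui = threading.Thread(target=ui_thread)
    ui.start()
    workers = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    updates.put(None)              # all workers are done; stop the UI thread
    ui.join()
    return screen
```

Updates from different workers may interleave, but each one is applied whole
and each worker's updates arrive in the order it sent them.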
TIP: MANAGE THE WORKLOAD... THERE ARE TOOLS TO DO THAT.
Too many threads, and Windows starts flopping like a dying fish. It's called
"thrashing" and it's not pretty. You can't simply create "one thread per user
request" and hope that Windows sorts it all out. It won't. You have to
manage the incoming workload so that it gets processed efficiently,
production-line style, and so that you can monitor and control it like a
machine.
It turns out, by the way, that this is a process that gets done so
much (MVS and DOS/VS were doing it before you were born, maybe...) that you
can buy "transaction monitors," and other pieces of software that can snap
into your applications, to queue and regulate the flow of work through a
multi-user or server-style application.
Remember also that "a process that can't really be completed in a concurrent
fashion ... doesn't need separate threads, it needs a message queue."
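A real transaction monitor does far more than this, but the core idea of
regulating the flow can be sketched with a fixed-size pool: requests queue up
and a small, known number of threads work through them. The names
handle_request and process_all are hypothetical, and the "work" is a
placeholder.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_request(n):
    """Placeholder for whatever one unit of work really is."""
    return n * 2


def process_all(requests, max_workers=4):
    seen_threads = set()
    lock = threading.Lock()

    def tracked(n):
        # Record which pool thread ran this request, for monitoring.
        with lock:
            seen_threads.add(threading.get_ident())
        return handle_request(n)

    # The pool never runs more than max_workers threads, however many
    # requests arrive; the rest wait their turn in the pool's queue.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(tracked, requests))
    return results, len(seen_threads)
```

A hundred requests go in, and no more than four threads ever exist to serve
them.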
TIP: REMEMBER THAT MEMORY IS VIRTUAL.
The virtual-memory mechanism survives by guessing what pieces of 'memory' need
to be in RAM and what parts can be swapped out to disk. Threads complicate
the situation if they are working on different areas of the virtual-memory
space at the same time. They increase the "working-set size" of the program
significantly. Remember always that memory is not really "free." Think of
access to memory as being synonymous with access to a disk file, and build the
application accordingly.
TIP: SERIALIZE EVERYTHING EXPLICITLY.
Each time any one of your threads accesses a shared resource, be it memory or
a file or what-have-you, you must serialize that activity in some way.
Critical sections, semaphores, and good ol' message-queues are good ways to do
that. The pitfall is that you can test your application and not run into
these timing bugs, but the first time your users or customers run it, they
WILL! X-(
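As a small illustration, here is a shared counter guarded by a lock, which
plays the role of a critical section. The Counter class and hammer function
are hypothetical names for this sketch.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Read-modify-write is not atomic, so only one thread at a
        # time is allowed inside this block.
        with self._lock:
            self.value += 1


def hammer(n_threads=8, n_increments=10_000):
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Without the lock, a run like this may still produce the right total on your
machine most of the time, which is exactly the pitfall described above.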
TIP: DELEGATE RESPONSIBILITIES, NOT JUST TIME.
So you must serialize everything that is shared, but serializing interferes
with concurrency which is, of course, the reason for having the multiple
threads in the first place. This is why it is often a good idea to delegate
specific responsibilities to the threads, and to use "loose coupling" (such as
message queueing) between them. It's better to have well-defined units of
work flowing in queues between a relatively small number of tasks with
discrete, independent responsibilities... than to have a large number of
supposedly-autonomous tasks locked up in a slugfest.
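A minimal sketch of that arrangement: two threads, each with one job, coupled
only by queues. The stage names parser and totaler, and the DONE marker, are
made up for this example.

```python
import queue
import threading

DONE = object()                    # marker meaning "no more work"


def parser(raw_q, parsed_q):
    """Stage 1: its only responsibility is turning text into numbers."""
    while True:
        item = raw_q.get()
        if item is DONE:
            parsed_q.put(DONE)     # pass the shutdown signal downstream
            break
        parsed_q.put(int(item))


def totaler(parsed_q, out):
    """Stage 2: its only responsibility is adding numbers up."""
    total = 0
    while True:
        n = parsed_q.get()
        if n is DONE:
            break
        total += n
    out.append(total)


def run_pipeline(lines):
    raw_q, parsed_q, out = queue.Queue(), queue.Queue(), []
    stages = [
        threading.Thread(target=parser, args=(raw_q, parsed_q)),
        threading.Thread(target=totaler, args=(parsed_q, out)),
    ]
    for s in stages:
        s.start()
    for line in lines:
        raw_q.put(line)
    raw_q.put(DONE)
    for s in stages:
        s.join()
    return out[0]
```

Neither stage shares any data with the other apart from the queue between
them, so the queues do all of the serializing.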
"Joel Shepherd" <joelshep@ix.netcom.com.SPAM>
COM is an acronym for "Component Object Model". It's a standard that
describes how class (or object) interfaces should work -- including issues
such as memory management and multithreading -- and how applications can
make use of components which follow the COM standard. The standard is
language-independent and (at least as I understand it) hardware-independent
as well. The standard has been pushed primarily by Microsoft and IBM, but
there is no technical reason why, for instance, a Sun Sparc20 running
Solaris couldn't support COM as well.
If an object is COM-compliant, it's guaranteed to have several basic traits:
- It has a "globally unique identifier" -- a.k.a. GUID or CLSID -- which
is a 128-bit integer which is essentially guaranteed to be unique across all
computers, on this planet at least. In order to use a COM-compliant object,
all a client has to know is its GUID.
- Once it has the GUID of an object, a client can call a standard COM API
call (CoCreateInstance) to create an instance of the object.
- The object implements at least one interface, called IUnknown. IUnknown
has three methods: AddRef, Release and QueryInterface. AddRef and Release
are called by clients to control the object's "reference count": simply a
count of the number of times the object is being used by various clients.
Once the reference count falls to zero, the object can assume it's safe to
remove itself from memory.
The client can call QueryInterface to determine whether the object supports
a specific interface (an interface is a group of properties and methods). If
it does, QueryInterface returns a pointer to a table of pointers to the
properties and methods implemented by the interface. The client can then
begin calling those methods, using pointers it obtains from the table.
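The traits above can be modeled in a few lines. To be clear, this is a toy
model in Python and not real COM: the registry, the GUID value, the IGreeter
interface and the create_instance function are all invented for illustration.
Only the method names AddRef, Release and QueryInterface come from the
standard.

```python
import uuid

# A made-up class ID; real CLSIDs are registered with the system.
CLSID_GREETER = uuid.UUID("12345678-1234-5678-1234-567812345678")


class ComObject:
    """Toy stand-in for a COM object."""

    def __init__(self, interfaces):
        self._interfaces = dict(interfaces)   # name -> table of methods
        self.ref_count = 0
        self.destroyed = False

    def AddRef(self):
        self.ref_count += 1
        return self.ref_count

    def Release(self):
        self.ref_count -= 1
        if self.ref_count == 0:
            self.destroyed = True             # safe to remove itself now
        return self.ref_count

    def QueryInterface(self, name):
        table = self._interfaces.get(name)
        if table is None:
            return None                       # real COM reports E_NOINTERFACE
        self.AddRef()                         # each pointer handed out is counted
        return table


def make_greeter():
    return ComObject({"IGreeter": {"Hello": lambda who: "Hello, " + who}})


REGISTRY = {CLSID_GREETER: make_greeter}      # stand-in for the system registry


def create_instance(clsid):
    """Plays the role of CoCreateInstance: GUID in, live object out."""
    obj = REGISTRY[clsid]()
    obj.AddRef()                              # the caller holds one reference
    return obj
```

A client would call create_instance(CLSID_GREETER), ask for "IGreeter" with
QueryInterface, call Hello through the returned table, and then call Release
once for each reference it holds.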
OLE is an "evolving standard" (which you might read as "moving target") for
implementing services (components, applications, etc.) using COM-compliant
objects.
OLE automation is a particular OLE service which makes it possible for a
client to control a component. Actually, it's a little more specific than
that. A component which can be controlled through OLE automation has an
interface called IDispatch. IDispatch includes methods for determining what
methods an interface supports, the names, data types and so on of each
method's parameters, and for retrieving a special "dispatch ID" for each
method in the interface. If the client knows only the name of the method, it
can determine everything it needs to know to call the method through
IDispatch. It's time-consuming, but it works. Better yet, the client can
determine in advance (e.g., when it is compiled) the dispID and parameter
info for each method it needs to call, and call the method directly through
IDispatch's Invoke method at run time.
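The two calling styles can be sketched with a toy model. This is not the real
IDispatch interface, whose methods take many more parameters; the Dispatchable
class, its Add and Upper methods, and call_by_name are assumptions made for
this example. The lookups counter is there only to show how often the name
lookup happens.

```python
class Dispatchable:
    """Toy model of an object that can be driven through dispatch IDs."""

    def __init__(self):
        self._methods = [
            ("Add", lambda a, b: a + b),
            ("Upper", lambda s: s.upper()),
        ]
        # Map each method name to a small integer dispatch ID.
        self._ids = {name: dispid
                     for dispid, (name, _) in enumerate(self._methods, start=1)}
        self.lookups = 0

    def GetIDsOfNames(self, name):
        self.lookups += 1              # the slow part: a lookup by name
        return self._ids[name]

    def Invoke(self, dispid, *args):
        _, func = self._methods[dispid - 1]
        return func(*args)


def call_by_name(obj, name, *args):
    """Look the name up on every call, then invoke."""
    return obj.Invoke(obj.GetIDsOfNames(name), *args)


obj = Dispatchable()
by_name = call_by_name(obj, "Add", 2, 3)         # one lookup per call

add_id = obj.GetIDsOfNames("Add")                # resolve the ID once...
by_id = [obj.Invoke(add_id, i, 1) for i in range(3)]   # ...then reuse it
```

Calling by name costs a lookup every time. Resolving the dispatch ID in
advance pays for the lookup once, however many calls follow.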
When you compile a VB program, it uses objects' IDispatch interface (and/or
type libraries) to determine the dispID and parameter info for all your
object method calls, and compiles that information right into your program.
That's called "early-binding" and it results in the fastest possible calls
you can make using OLE automation. You can also use "late-binding": in that
case, VB has to use IDispatch at run-time to determine the dispID, etc., for
a method given its name. That's what happens whenever you use an object
variable declared "As Object". It's slow and somewhat unsafe (since VB can't
do any compile-time checking of your parameters), but it's sometimes used
for its flexibility.