Observations on threading

Mike Robinson

As soon as Win95 and the like came out, programmers said, "Oh boy! Threads!"

Well, sometimes they are really, really good. But sometimes multithreaded apps actually seem to run slower, and carry more overhead, than their equivalent Win31 implementations.

TIP: MULTIPLE THREADS DON'T MULTIPLY THE CPU.

If you've got more than one CPU in there, terrific. Otherwise, having multiple "concurrent" threads won't make things faster because by definition only one of them is executing at a time. Furthermore, the more threads you have, the more overhead is being spent switching between them. If you have more than one or two "always busy" threads in your design, multithreading won't make the program faster. The opposite will be true!

TIP: USE THREADS TO OVERLAP COMPUTING WITH I/O.

In a sense that is what threads are -for.- A thread that does computing can ignore the need of the system to respond rapidly to I/O requests. It can ignore everything but itself, relying upon the operating system to pre-empt it to handle input/output. Threading allows a program to remain responsive to users and devices while maximizing use of the CPU as well. Threading also allows the computer to respond to many devices expediently, AS LONG AS the amount of computing-time required to service the device is minimal.
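Here is a minimal sketch of that overlap, in Python. The names slow_read and compute are made up for illustration, and time.sleep merely stands in for a blocking device or disk read. One thread waits on the "I/O" while the main thread keeps the CPU busy.

```python
import threading
import time


def slow_read(results):
    # Hypothetical stand-in for a blocking device or disk read.
    time.sleep(0.05)
    results["data"] = "payload"


def compute(n=100_000):
    # CPU work that proceeds while the read is still pending.
    return sum(i * i for i in range(n))


def overlap():
    results = {}
    reader = threading.Thread(target=slow_read, args=(results,))
    reader.start()        # the I/O begins...
    total = compute()     # ...while this thread keeps computing
    reader.join()         # wait for the I/O to finish before using its result
    return total, results["data"]


if __name__ == "__main__":
    print(overlap())
```

The computing thread never polls the device; it simply joins the reader when it actually needs the data.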

TIP: ALSO USE THREADS TO PRIORITIZE THINGS.

Threads can be given relative priorities so that less-urgent tasks can be completed in the background. This is a way of fair-sharing the CPU resource. Just remember that you have only one CPU, and some of those tasks are gonna "take it in the shorts." If your program's overall function relies upon the product of one of those low-priority tasks, then the program itself will appear erratic.

TIP: WATCH FOR CONTENTION.

Threads work great when they are independent. They start to bog down (becoming no better than, and maybe worse than, a message queue) when they have to synchronize around some shared resource. Locking and critical-section overhead is expensive.

TIP: DON'T FORGET THE USER INTERFACE IS A CONTENDED RESOURCE TOO!

One of the most hotly-contested resources in the system is the user interface. After all, Windows itself is its own set of high-priority tasks that are competing with yours. And if several threads are trying to update the same window at the same time, visual chaos results.

TIP: MANAGE THE WORKLOAD... THERE ARE TOOLS TO DO THAT.

Too many threads, and Windows starts flopping like a dying fish. It's called "thrashing" and it's not pretty. You can't simply create "one thread per user request" and hope that Windows sorts it all out. It won't. You have to manage the incoming workload so that it gets processed efficiently, production-line style, and so that you can monitor and control it like a machine.

It turns out, by the way, that this is a process that gets done so much (MVS and DOS/VS were doing it before you were born, maybe...) that you can buy "transaction monitors," and other pieces of software that can snap into your applications, to queue and regulate the flow of work through a multi-user or server-style application.

Remember also that "a process that can't really be completed in a concurrent fashion ... doesn't need separate threads, it needs a message queue."
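As a rough sketch of managing the workload rather than spawning one thread per request, the Python below feeds requests through a bounded queue to a small, fixed set of worker threads. The function name run_pool and its parameters are assumptions made for this example; it is nowhere near a real transaction monitor.

```python
import queue
import threading


def run_pool(requests, handler, workers=2, backlog=8):
    """Process requests with a fixed number of threads and a bounded queue."""
    work = queue.Queue(maxsize=backlog)   # put() blocks once the backlog is full
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:              # sentinel: no more work is coming
                break
            value = handler(item)
            with results_lock:            # the results list is shared
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for request in requests:
        work.put(request)                 # throttles the producer when workers lag
    for _ in threads:
        work.put(None)                    # one sentinel per worker
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(sorted(run_pool(range(10), lambda n: n * 2)))
```

The thread count stays constant no matter how many requests arrive, and the bounded queue pushes back on whoever is producing the work.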

TIP: REMEMBER THAT MEMORY IS VIRTUAL.

The virtual-memory mechanism survives by guessing what pieces of 'memory' need to be in RAM and what parts can be swapped out to disk. Threads complicate the situation if they are working on different areas of the virtual-memory space at the same time. They increase the "working-set size" of the program significantly. Remember always that memory is not really "free." Think of access to memory as being synonymous with access to a disk file, and build the application accordingly.

TIP: SERIALIZE EVERYTHING EXPLICITLY.

Each time any one of your threads accesses a shared resource, be it memory or a file or what-have-you, you must serialize that activity in some way. Critical sections, semaphores, and good ol' message-queues are good ways to do that. The pitfall is that you can test your application and not run into these timing bugs, but the first time your users or customers run it, they WILL!

X-(
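A small sketch of explicit serialization, in Python: every access to the shared counter goes through a lock. The Counter class and the hammer helper are invented for this example. Python's threading.Lock plays the role that a critical section would play in a Windows program.

```python
import threading


class Counter:
    """A shared counter whose updates are serialized by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:          # only one thread at a time gets past here
            self.value += 1


def hammer(counter, threads=4, per_thread=10_000):
    # Several threads update the same counter concurrently.
    def work():
        for _ in range(per_thread):
            counter.increment()

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(hammer(Counter()))
```

Without the lock, the read-modify-write in increment could interleave between threads, and whether you ever see the wrong total depends on timing. That is exactly why such bugs tend to slip through testing.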

TIP: DELEGATE RESPONSIBILITIES, NOT JUST TIME.

So you must serialize everything that is shared, but serializing interferes with concurrency, which is, of course, the reason for having the multiple threads in the first place. This is why it is often a good idea to delegate specific responsibilities to the threads, and to use "loose coupling" (such as message queueing) between them. It's better to have well-defined units of work flowing in queues between a relatively small number of tasks with discrete, independent responsibilities... than to have a large number of supposedly-autonomous tasks locked up in a slugfest.
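One way to picture that kind of loose coupling is a tiny pipeline, sketched here in Python. Each thread owns one responsibility (parsing, then squaring, both chosen arbitrarily for this example) and the only thing the threads share is the queues between them. The names stage, run_pipeline and STOP are hypothetical.

```python
import queue
import threading

STOP = object()   # sentinel passed down the pipeline to shut it down


def stage(inbox, outbox, fn):
    # One thread, one responsibility: take a unit of work, transform it, pass it on.
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)      # tell the next stage to stop as well
            break
        outbox.put(fn(item))


def run_pipeline(lines):
    raw, parsed, done = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=stage, args=(raw, parsed, int)),               # parser
        threading.Thread(target=stage, args=(parsed, done, lambda n: n * n)),  # calculator
    ]
    for t in threads:
        t.start()
    for line in lines:
        raw.put(line)
    raw.put(STOP)

    out = []
    while True:
        item = done.get()
        if item is STOP:
            break
        out.append(item)
    for t in threads:
        t.join()
    return out


if __name__ == "__main__":
    print(run_pipeline(["1", "2", "3"]))
```

No stage touches another stage's data directly, so the queues are the only points of synchronization.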

COM

"Joel Shepherd" <joelshep@ix.netcom.com.SPAM>

COM is an acronym for "Component Object Model". It's a standard that describes how class (or object) interfaces should work -- including issues such as memory management and multithreading -- and how applications can make use of components which follow the COM standard. The standard is language-independent and (at least as I understand it) hardware-independent as well. The standard has been pushed primarily by Microsoft and IBM, but there is no technical reason why, for instance, a Sun Sparc20 running Solaris couldn't support COM as well.

If an object is COM-compliant, it's guaranteed to have several basic traits:

It has a "globally unique identifier" -- a.k.a. GUID or CLSID -- a 128-bit integer that is essentially guaranteed to be unique across all computers, on this planet at least. In order to use a COM-compliant object, all a client has to know is its GUID.
Once it has the GUID of an object, a client can call a standard COM API call (CoCreateInstance) to create an instance of the object.
The object implements at least one interface, called IUnknown. IUnknown has three methods: AddRef, Release and QueryInterface. AddRef and Release are called by clients to control the object's "reference count": simply a count of the number of times the object is being used by various clients. Once the reference count falls to zero, the object can assume it's safe to remove itself from memory.

The client can call QueryInterface to determine whether the object supports a specific interface (an interface is a group of properties and methods). If it does, QueryInterface returns a pointer to a table of pointers to the properties and methods implemented by the interface. The client can then begin calling those methods, using the pointers it obtains from the table.
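To make the reference-counting and QueryInterface ideas concrete, here is a toy model in Python. It is not real COM: there is no CoCreateInstance and no pointer table, and the interface identifiers are freshly generated stand-ins rather than registered GUIDs. ToyObject and Greet are invented for the example. It only mimics the bookkeeping described above.

```python
import uuid

# Hypothetical 128-bit interface identifiers, generated here for illustration only.
IID_IUnknown = uuid.uuid4()
IID_IGreeter = uuid.uuid4()


class ToyObject:
    """Toy stand-in for a COM object: a reference count plus QueryInterface."""

    def __init__(self):
        self.ref_count = 0
        self.alive = True
        # Interfaces this object claims to support.
        self._interfaces = {IID_IUnknown: self, IID_IGreeter: self}

    def AddRef(self):
        self.ref_count += 1
        return self.ref_count

    def Release(self):
        self.ref_count -= 1
        if self.ref_count == 0:
            self.alive = False    # a real object would remove itself from memory here
        return self.ref_count

    def QueryInterface(self, iid):
        iface = self._interfaces.get(iid)
        if iface is not None:
            self.AddRef()         # a successful query hands out another reference
        return iface              # None means "interface not supported"

    def Greet(self, name):
        return "Hello, " + name


if __name__ == "__main__":
    obj = ToyObject()
    obj.AddRef()                               # the creator holds the first reference
    greeter = obj.QueryInterface(IID_IGreeter)
    print(greeter.Greet("world"))
    greeter.Release()
    obj.Release()
    print(obj.alive)
```

Every successful QueryInterface is balanced by a Release, which is the discipline real clients have to follow.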

OLE is an "evolving standard" (which you might read as "moving target") for implementing services (components, applications, etc.) using COM-compliant objects.

OLE automation is a particular OLE service which makes it possible for a client to control a component. Actually, it's a little more specific than that. A component which can be controlled through OLE automation has an interface called IDispatch. IDispatch includes methods for determining what methods an interface supports, the names, data types and so on of each method's parameters, and for retrieving a special "dispatch ID" for each method in the interface. If the client knows only the name of the method, it can determine everything it needs to know to call the method through IDispatch. It's time consuming, but it works. Better yet, the client can determine in advance (e.g., when it is compiled) the dispID and parameter info for each method it needs to call, and call the method directly through IDispatch's Invoke method at run time.

When you compile a VB program, it uses objects' IDispatch interface (and/or type libraries) to determine the dispID and parameter info for all your object method calls, and compiles that information right into your program. That's called "early-binding" and it results in the fastest possible calls you can make using OLE automation. You can also use "late-binding": in that case, VB has to use IDispatch at run-time to determine the dispID, etc., for a method given its name. That's what happens whenever you use an object variable declared "As Object". It's slow and somewhat unsafe (since VB can't do any compile-time checking of your parameters), but it's sometimes used for its flexibility.
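The difference between the two binding styles can be sketched with a toy dispatcher in Python. This is an illustration only. The real IDispatch methods GetIDsOfNames and Invoke take more parameters than shown here, and ToyDispatch, call_late, call_early_add and the dispID values are all made up for the example.

```python
class ToyDispatch:
    """Toy stand-in for an automation object; signatures are heavily simplified."""

    def __init__(self):
        self._names = {"Add": 1, "Upper": 2}          # method name -> dispID
        self._methods = {
            1: lambda a, b: a + b,                    # dispID -> implementation
            2: lambda s: s.upper(),
        }
        self.lookups = 0                              # counts name resolutions

    def GetIDsOfNames(self, name):
        self.lookups += 1
        return self._names[name]

    def Invoke(self, disp_id, *args):
        return self._methods[disp_id](*args)


def call_late(obj, name, *args):
    # Late binding: resolve the name to a dispID at run time, on every call.
    return obj.Invoke(obj.GetIDsOfNames(name), *args)


# Early binding: the dispID was worked out ahead of time (say, from a type library).
DISPID_ADD = 1


def call_early_add(obj, a, b):
    return obj.Invoke(DISPID_ADD, a, b)


if __name__ == "__main__":
    obj = ToyDispatch()
    print(call_late(obj, "Add", 2, 3), obj.lookups)
    print(call_early_add(obj, 2, 3), obj.lookups)
```

The late-bound call pays for a name lookup each time, while the early-bound call goes straight to Invoke with an ID fixed in advance.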
