The Sun4d architecture provides both a primary and a secondary cache. The primary data and instruction caches reside on the SuperSPARC CPU, while the secondary cache is located on the CPU board. Main memory is accessed over the system XDBUS. Associated with the CPU and cache are the Bus Watcher and Cache Controller, which ensure cache consistency among all of the CPUs in the system. Although support for caching greatly complicates the hardware design, efficient use of the cache by the operating system greatly improves overall system performance.
A simple block diagram of the CPU/Memory Subsystem for the Sun4d architecture shows the "distance" between the CPU and the DRAM making up the main memory. Maximizing the fraction of accesses that can be satisfied from the primary (within the CPU) or secondary cache not only speeds operation but also reduces the amount of traffic on the XDBUS. Reducing the number or scope of shared data objects lessens the number of cache invalidations that occur on the bus.
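The effect of cache hit rates on access time and bus traffic can be illustrated with the standard two-level cache model. The sketch below is a generic textbook formulation, not a measurement of Sun4d hardware; the function names and all parameters (hit times, miss rates, memory penalty) are hypothetical inputs chosen by the caller.

```c
#include <assert.h>

/*
 * Average cycles per memory access for a two-level cache hierarchy.
 * All arguments are caller-supplied assumptions, not Sun4d figures:
 *   l1_hit       cycles for a primary-cache hit
 *   l1_miss_rate fraction of accesses that miss the primary cache
 *   l2_hit       additional cycles for a secondary-cache hit
 *   l2_miss_rate fraction of primary misses that also miss the secondary cache
 *   mem_penalty  additional cycles to reach main memory over the bus
 */
double avg_access_cycles(double l1_hit, double l1_miss_rate,
                         double l2_hit, double l2_miss_rate,
                         double mem_penalty)
{
    assert(l1_miss_rate >= 0.0 && l1_miss_rate <= 1.0);
    assert(l2_miss_rate >= 0.0 && l2_miss_rate <= 1.0);
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_penalty);
}

/* Fraction of all accesses that miss both caches and must go out on the bus. */
double bus_access_fraction(double l1_miss_rate, double l2_miss_rate)
{
    return l1_miss_rate * l2_miss_rate;
}
```

In this model, lowering either miss rate reduces both the average access time and the share of accesses that generate bus traffic, which is the relationship the paragraph above describes.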
A number of techniques can be used to enhance the cache utilization of the system. For example, as a thread runs on a particular CPU, the main memory areas that it accesses are copied into that CPU's cache. At some point, this thread will go to sleep. When the scheduler awakens the thread, it can dispatch it on the same CPU, thereby reusing any "state" that remains in that CPU's cache from the previous execution. This technique is known as processor affinity and is becoming more common in multiprocessor systems. Processor affinity cannot be taken too far, however, since the workload of the system must remain balanced across the CPUs.
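The trade-off between affinity and load balancing can be sketched as a CPU-selection routine. This is a minimal illustration under stated assumptions, not the Solaris dispatcher: the `thread_info` structure, the `choose_cpu` function, and the `max_imbalance` threshold are all hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>

#define NO_CPU (-1)

/* Hypothetical per-thread scheduling record; not a Solaris structure. */
struct thread_info {
    int last_cpu;   /* CPU the thread last ran on, or NO_CPU */
};

/*
 * Pick a CPU for a thread that is being made runnable.
 * run_queue_len[i] is the number of runnable threads queued on CPU i.
 * max_imbalance is an assumed tuning knob: how much longer the last
 * CPU's queue may be than the shortest queue before affinity is
 * abandoned in favor of load balancing.
 */
int choose_cpu(const struct thread_info *t, const int *run_queue_len,
               int ncpus, int max_imbalance)
{
    assert(t != NULL && run_queue_len != NULL && ncpus > 0);

    /* Find the least-loaded CPU (lowest index wins ties). */
    int least = 0;
    for (int i = 1; i < ncpus; i++) {
        if (run_queue_len[i] < run_queue_len[least])
            least = i;
    }

    /* Prefer the last CPU, whose cache may still hold the thread's data. */
    if (t->last_cpu >= 0 && t->last_cpu < ncpus &&
        run_queue_len[t->last_cpu] - run_queue_len[least] <= max_imbalance)
        return t->last_cpu;

    /* Otherwise give up the warm cache to keep the load balanced. */
    return least;
}
```

With `max_imbalance` set to 0 the routine behaves like a pure load balancer whenever another CPU is less loaded, while a very large value pins each thread to its previous CPU regardless of load. A real scheduler would likely also weigh how long the thread has been asleep, since a cache that has since been refilled by other threads offers little benefit.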
The existence of shared data objects within the kernel is a significant impediment to achieving high cache utilization. Every time a thread writes to a shared kernel data object, any copies of that object which reside in other CPUs' caches must be invalidated to ensure coherency. As the number of CPUs in the system increases, this problem becomes much more significant. Extensive performance testing has identified a number of objects where this is a problem. Algorithmic changes have been made to Solaris to reduce the frequency of these invalidations and thereby improve cache utilization.
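One common way to reduce invalidations on a frequently written shared object is to split it into per-CPU pieces. The sketch below shows this for a statistics counter. It is offered only as an illustration of the general idea and is not necessarily one of the changes made to Solaris; `NCPUS`, `CACHE_LINE`, and the function names are assumptions introduced here, and the 64-byte line size is a placeholder rather than a Sun4d figure.

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS 4            /* assumed CPU count for illustration */
#define CACHE_LINE 64      /* assumed line size in bytes, not a Sun4d figure */

/*
 * One counter per CPU, padded to a full line so that no two CPUs'
 * counters ever fall in the same cache line.
 */
struct percpu_counter {
    unsigned long count;
    char pad[CACHE_LINE - sizeof(unsigned long)];
};

static struct percpu_counter counters[NCPUS];

/*
 * Intended to be called only by code running on CPU `cpu`, so each
 * write touches a line that no other CPU writes and no other cache's
 * copy needs to be invalidated on every increment.
 */
void counter_inc(int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    counters[cpu].count++;
}

/*
 * Readers sum all slots. The total may be slightly stale while other
 * CPUs are incrementing, which is usually acceptable for statistics.
 */
unsigned long counter_total(void)
{
    unsigned long sum = 0;
    for (int i = 0; i < NCPUS; i++)
        sum += counters[i].count;
    return sum;
}
```

The design trades a small amount of memory and a slower read path for a write path that generates no coherency traffic between CPUs, which matters most when writes are far more frequent than reads.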