Present-day CPUs can usually keep more than 31 instructions in flight in the pipeline because some of them take parallel paths. That means more than one instruction is executed per clock cycle, and that makes it a superscalar CPU. So the instruction rate ends up being a multiple of the clock rate: a core running at 3 GHz and retiring, say, three instructions per cycle executes roughly 9 billion instructions per second. Combine the speed of 3 or 4 GHz with the capacity of the pipeline and, there you have it, billions of instructions running around. (This should not be confused with actual "parallel processing," which is the combining of two or more CPUs to execute a program.)
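To make that arithmetic concrete, here is a minimal sketch; the 3 GHz clock and the average of three instructions per cycle are illustrative assumptions, not figures from any particular chip:

```cpp
#include <cstdio>

int main() {
    // Hypothetical figures for illustration only.
    const double clock_hz = 3.0e9;             // a 3 GHz clock
    const double instructions_per_cycle = 3.0; // assumed average IPC

    // Throughput is simply clock rate times instructions per cycle.
    const double per_second = clock_hz * instructions_per_cycle;
    std::printf("Roughly %.0f billion instructions per second\n",
                per_second / 1e9);
    return 0;
}
```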
Multithreading is all about getting more out of the pipeline. How do we improve pipeline utilization? Simple, just add another pipeline. Doing things in parallel speeds them up pretty radically. What's really happening is that instructions from different threads get interleaved in the pipelines. While one pipeline is stalled waiting for a memory fetch or something else that holds up execution, the other pipeline can take advantage of that time, so much of the CPU's latency is effectively hidden. There is a little problem though. How does the CPU sort out the bits of the program that can be run in parallel? Modern CPUs have become so complex that they can keep track of several threads of a program and put the results back together at a merge point.
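The same interleaving idea is visible at the software level, even though operating-system threads are a coarser mechanism than the hardware multithreading described above. Here is a minimal C++ sketch, with task names and timings invented purely for illustration: while one thread sits in a (simulated) stall, a second thread keeps the CPU doing useful work instead of idling.

```cpp
#include <chrono>
#include <cstdio>
#include <thread>

// Stand-in for work that stalls (think: waiting on slow memory or I/O).
void slow_task() {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    std::printf("slow task done\n");
}

// Stand-in for compute-bound work that can proceed in the meantime.
void busy_task() {
    volatile long sum = 0;
    for (long i = 0; i < 100000000; ++i) sum += i;
    std::printf("busy task done\n");
}

int main() {
    // Run both at once: the busy task makes progress while the
    // slow task is stalled, so the wait time is hidden.
    std::thread t1(slow_task);
    std::thread t2(busy_task);
    t1.join();
    t2.join();
    return 0;
}
```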
Now, imagine you have two or more CPUs working on multiple threads. Is that a super boost for data processing or what? Well, yes, but unless your software is written to spread its work across multiple threads, you can't really appreciate it. In fact, you don't even have to imagine this. You have this technology in AMD's Athlon X2 CPUs or Intel's Core 2 chips right now. Putting more than one interconnected CPU core on a single silicon chip is the present trend, and the future will see more and more CPU cores in a single chip.
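To show what that software support means in practice, here is a minimal C++ sketch; the workload, an arbitrary summation, is invented for illustration. Unless a program divides its work among threads like this, the extra cores mostly sit idle.

```cpp
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    // Ask how many hardware threads the machine offers (cores, possibly
    // doubled by a technology like Hyper-Threading).
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 2;  // the call is allowed to return 0 when unknown

    // Split a simple summation of 0..total-1 across all the threads.
    const long long total = 400000000;
    std::vector<long long> partial(n, 0);
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < n; ++t) {
        workers.emplace_back([&, t] {
            // Each thread sums its own slice and writes its own slot.
            long long lo = total / n * t;
            long long hi = (t == n - 1) ? total : total / n * (t + 1);
            for (long long i = lo; i < hi; ++i) partial[t] += i;
        });
    }
    for (auto& w : workers) w.join();  // wait for every worker

    long long sum = 0;
    for (long long p : partial) sum += p;
    std::printf("%u threads, sum = %lld\n", n, sum);
    return 0;
}
```

With one thread per core, each core works on its own slice of the data at the same time, which is exactly the kind of "specialized software" a dual-core chip like the Athlon X2 needs to show its advantage.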