A Pipeline Story

The pipeline is one of those fundamental CPU features that enable several other very important speed-up schemes. Keep in mind that Intel’s x86 line didn’t get a pipelined architecture until the 486 in 1989, so earlier CPUs had more “primitive” ways of dealing with processing bottlenecks.

CPUs have to get instructions and data out of memory (read cycles) and put data back (write cycles). First, the CPU fetches an instruction. That instruction might then need one or more chunks of data, so a single instruction can take two or three read cycles to bring the instruction and its data into the microprocessor.

This is where the address bus comes in. The microprocessor places the address on the address bus and then reads the instruction. If the instruction calls for data, one or more further read cycles take place. All the while, the microprocessor has to sit quietly and wait for the instruction and data to show up.
Once the microprocessor has all the pieces of the instruction and data, it goes to work. Some instructions take several steps, and now it’s the memory’s turn to stop and wait for the CPU. A pretty laggy way to process data, I would say.

The introduction of systemic processing partially solved this awkward situation. The idea is to have the memory bus fetch the next instruction while the CPU is still executing the current one. In this case, the only time the CPU had to wait was for the first instruction and data, and the only time the memory bus was idle was after sending off the last instruction of the program.
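
To put some rough numbers on this, here is a minimal Python sketch comparing the two approaches. It is a toy model, not a description of any real CPU: the function names and the `(fetch cycles, execute cycles)` tuples are made up for illustration, and it assumes the bus can only work one instruction ahead of the CPU.

```python
def serial_cycles(instructions):
    """Fetch, then execute, one after the other: nothing overlaps."""
    return sum(fetch + execute for fetch, execute in instructions)


def overlapped_cycles(instructions):
    """Fetch instruction i+1 while instruction i is executing."""
    if not instructions:
        return 0
    total = instructions[0][0]  # the CPU has to wait for the very first fetch
    for i, (_, execute) in enumerate(instructions):
        # Fetch time of the following instruction, or 0 after the last one.
        next_fetch = instructions[i + 1][0] if i + 1 < len(instructions) else 0
        # Bus and CPU work in parallel, so the slower of the two sets the pace.
        total += max(execute, next_fetch)
    return total


# Hypothetical program: four instructions, each 2 cycles to fetch, 2 to execute.
program = [(2, 2), (2, 2), (2, 2), (2, 2)]
print(serial_cycles(program))      # 16
print(overlapped_cycles(program))  # 10
```

With evenly sized instructions the overlap hides almost all of the fetch time. Swap in instructions with very different fetch and execute times and the `max()` line shows where one side ends up waiting for the other.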

As CPUs evolved, their instruction sets grew rather haphazardly. Some instructions were much longer than others, and a long instruction could take multiple memory read cycles to fetch. On top of that, the size and number of data items varied from one instruction to the next. It was time for our nice systemic process to lose it and run amok.
The engineers at Intel came up with an ingenious idea: prefetching. It is part of the 286 CPU, although a small prefetch queue was already present in the earlier 8086. With prefetching, a little buffer memory sits between the memory bus and the CPU. This buffer memory is known as the prefetch queue. The memory bus delivers instructions to the prefetch queue, and if the CPU gets bogged down with a complex instruction, the next instructions just stack up in the queue. When the CPU runs into a string of simple instructions, it draws the queue down again. Either way, the memory bus and the CPU can each run close to full speed without being held up by the other, at least until the queue fills up or runs empty.
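
The effect of the queue depth can be sketched with a small cycle-by-cycle simulation in Python. Everything here is an assumption made for illustration: the `run` function, a fixed fetch time per instruction, and a queue that holds whole instructions rather than bytes. Real prefetch queues are byte-oriented and considerably more subtle.

```python
from collections import deque


def run(exec_times, fetch_time=2, queue_size=4):
    """Return the cycles needed to run a program through a prefetch queue.

    exec_times holds the execute time of each instruction (at least 1 cycle).
    """
    queue = deque()          # fetched instructions waiting for the CPU
    next_to_fetch = 0        # index of the next instruction the bus will fetch
    fetch_left = fetch_time  # cycles left on the fetch in progress
    busy_left = 0            # cycles left on the instruction being executed
    done = 0
    cycle = 0
    while done < len(exec_times):
        cycle += 1
        # CPU side: pick up a new instruction if idle, then work for one cycle.
        if busy_left == 0 and queue:
            busy_left = queue.popleft()
        if busy_left > 0:
            busy_left -= 1
            if busy_left == 0:
                done += 1
        # Bus side: keep fetching as long as there is room in the queue.
        if next_to_fetch < len(exec_times) and len(queue) < queue_size:
            fetch_left -= 1
            if fetch_left == 0:
                queue.append(exec_times[next_to_fetch])
                next_to_fetch += 1
                fetch_left = fetch_time
    return cycle


# One slow instruction followed by two quick ones (hypothetical timings).
program = [4, 1, 1]
print(run(program, queue_size=1))  # 9
print(run(program, queue_size=4))  # 8
```

With a one-entry buffer the bus stalls while the slow instruction executes, so the CPU later has to wait for a fetch. With four entries the bus keeps working during the slow instruction and the quick ones are already waiting when the CPU gets to them.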

Intel felt that this architecture still couldn’t unleash the full power of a CPU, and finally introduced pipelining with the 486 CPU. Here’s how it works: take the systemic process and break it up into very small steps, so that each step only has to do a very simple task. Several instructions can then be in flight at once, each in a different step, like items on a production line. With more steps in the pipeline, each task becomes simpler, and even complex instructions can be broken down so that, once the pipeline is full, they flow through at nearly the same rate as simple ones.
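
A back-of-the-envelope Python sketch shows why this pays off. It assumes an ideal pipeline in which every stage takes exactly one cycle and nothing ever stalls, which real CPUs only approximate. The function names and the two-letter stage labels are illustrative, not taken from Intel documentation.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """Fill the pipeline once, then finish one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def diagram(n_instructions, stages):
    """One text row per instruction, shifted right by one stage per row."""
    return ["   " * i + " ".join(stages) for i in range(n_instructions)]


stages = ["PF", "D1", "D2", "EX", "WB"]  # illustrative five-stage pipeline
print(unpipelined_cycles(3, len(stages)))  # 15
print(pipelined_cycles(3, len(stages)))    # 7
for row in diagram(3, stages):
    print(row)
```

Read the diagram column by column: each column is one clock cycle, and in the middle columns every row has an instruction in a different stage. The longer the program, the closer the pipelined version gets to one finished instruction per cycle.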

The pipeline also relies on an interesting technique that lets the CPU predict which instructions will come next: branch prediction. The prediction is triggered when a conditional branch, such as the jump at the end of a loop, makes its way into the pipeline. In that case, the address of the next instruction depends on the outcome of instructions that are still executing, so the CPU has to guess in order to keep the pipeline fed.

If the pipeline predicts that the program flow will stay in a loop, it can fetch the next instruction in that loop. If the branch prediction turns out to be wrong, the pipeline has to be flushed, and part of the production line is held up until the right instruction works its way down the pipeline. These predictions are right most of the time, however, so branch-heavy code generally runs faster with prediction than it would if the pipeline stalled at every branch.
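
To get a feel for how often such a guess goes wrong on a loop, here is a small Python sketch of a 2-bit saturating counter. This is a common textbook predictor and I am not claiming it is what any particular Intel CPU uses. The function name, the starting counter value and the 3-cycle flush penalty are all assumptions for illustration.

```python
def count_mispredictions(outcomes, counter=2):
    """Count wrong guesses of a 2-bit saturating counter predictor.

    Counter values 0-1 predict "not taken", values 2-3 predict "taken".
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Nudge the counter toward what actually happened, staying within 0..3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# A loop that runs 10 times: the branch back to the top is taken 9 times,
# then falls through once when the loop exits.
loop_branch = [True] * 9 + [False]
misses = count_mispredictions(loop_branch)
print(misses)  # 1

FLUSH_PENALTY = 3  # hypothetical cycles lost per wrong guess
print(misses * FLUSH_PENALTY)  # 3
```

Only the loop exit is mispredicted, so the pipeline is flushed once in ten branches. Stalling at every branch instead would cost a delay on all ten.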
