Tech Tip 60 - Microprocessor History (Part 3, Surfing the Pipeline)
Article by Roy Davis
Maybe I’m showing my long history in California, but when I hear the word pipeline I think of a long wave curling over its front to form a pipe. The ultimate hotdogging trick is to surf inside the pipeline. Well, microprocessors grew up in California too. Both Intel and AMD are located in Silicon Valley, and in their products “doin’ the pipeline” is gnarly too.
We are working our way through the history of microprocessors so we can understand what the latest new features are, what gives a performance boost, and what is just marketing hype. The pipeline is a fundamental feature of microprocessors and is the enabler for several other very important speed-up schemes. Let’s see how pipelines work.
1. The CPU to Memory Interface
The first thing to dig into is how your computer gets instructions and data out of the memory and puts data back. First, the CPU fetches an instruction. That instruction might require a chunk of data, or even two. That means a single instruction might take two or three read cycles to get the instruction and data into the microprocessor.
As mentioned, the microprocessor outputs the address on the address bus, and then reads the instruction. If the instruction calls for data, one or more read cycles take place. All the while, the microprocessor is sitting there waiting for the instruction and data to show up.
After the microprocessor gets all the pieces of the instruction and data, it goes to work. Some instructions may take a few steps, so the memory ends up waiting while the CPU works: lots of stop and go and “hurry up and wait.” Seems like a good way to slow things down, right?
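To put rough numbers on that stop-and-go pattern, here is a small Python sketch. It assumes, purely for illustration, that every memory read takes 3 clock cycles and that the CPU and the bus never overlap their work; the instruction mix is made up. Because nothing overlaps, the CPU-busy and bus-busy counts always add up to the total, so each side spends much of its time idle.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    data_reads: int    # extra memory reads for operands (0, 1, or 2)
    exec_cycles: int   # cycles the CPU works once everything has arrived

def run_unoverlapped(program, mem_cycles_per_read=3):
    """Fetch, then execute, one instruction at a time with no overlap.

    mem_cycles_per_read=3 is an assumed value, not a real chip's timing.
    Returns (total_cycles, cpu_busy_cycles, bus_busy_cycles).
    """
    total = cpu_busy = bus_busy = 0
    for ins in program:
        reads = 1 + ins.data_reads              # the instruction itself plus its data
        bus_time = reads * mem_cycles_per_read  # the CPU sits idle during these cycles
        total += bus_time + ins.exec_cycles     # then the bus sits idle while the CPU works
        bus_busy += bus_time
        cpu_busy += ins.exec_cycles
    return total, cpu_busy, bus_busy

program = [Instr(0, 1), Instr(1, 2), Instr(2, 4), Instr(0, 1)]
total, cpu, bus = run_unoverlapped(program)
print(f"{total} cycles: CPU busy {cpu}, bus busy {bus}")
```

For these four made-up instructions the model reports 29 cycles, with the CPU doing useful work in only 8 of them.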
2. Complicated Wiring
The main memory in your computer is made up of RAM (Random Access Memory) chips. The microprocessor outputs the address of the data on an address bus. This is a series of wires on the circuit board, with one wire for each bit in the address. Even low-end microprocessors have 32 or more address lines, so you can see that buses are complex affairs. Then there is the data bus, with about the same number of wires. That’s 64 copper traces on the circuit board (the wires) between the CPU and the memory, plus a handful of control signals to complete the picture. A 64-bit microprocessor would have twice this number, well over a hundred bus lines. It takes time to get all these bus lines moving, and this is the biggest bottleneck to speeding up a computer. Everything has to work around the relatively slow speed of the instruction and data buses.
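A quick back-of-the-envelope check of those wire counts, plus how much memory an address bus can reach (each extra address line doubles it). The control-line count of 8 is an assumed round number for “a handful,” not a figure from any real chip, and the memory calculation assumes byte-addressable memory.

```python
def bus_lines(address_bits, data_bits, control_lines=8):
    """Rough count of parallel wires between CPU and memory.

    control_lines=8 is an assumed placeholder for "a handful" of control signals.
    """
    return address_bits + data_bits + control_lines

def addressable_bytes(address_bits):
    # Each address line doubles the number of distinct addresses.
    return 2 ** address_bits

print(bus_lines(32, 32))   # wires for a 32-bit address bus and 32-bit data bus
print(bus_lines(64, 64))   # wires if both buses double in width
print(addressable_bytes(32) // 2**30, "GiB reachable with 32 address lines")
```

This prints 72, then 136, then 4 GiB, which is why doubling the bus width pushes the count well past a hundred lines.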
3. Systemic Process
Though early microprocessors operated just like I outlined above, it was back in the early days of electronic computers that someone figured out a way to put everybody to work 100 percent of the time. In 1944, the British used the Colossus Mark II to break German codes. It introduced an innovation known as a systolic array, which I like to think of as a systemic process: just like the systemic process between your mouth and the other end that is still busy processing your breakfast when you are eating dinner. The data went in one end of the Colossus, and before it came out, more data was put in. The only time the CPU had to wait was for the first instruction and data, and the only time the memory bus was idle was after sending off the last instruction of the program.
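The payoff of keeping everything busy can be counted. The sketch below assumes equal-length stages of one time step each and lists when each item occupies each stage. Item i enters stage s at step i + s, so n items pass through k stages in k + n - 1 steps instead of n times k.

```python
def overlapped_schedule(n_items, n_stages):
    """(item, stage, step) for an evenly paced line where a new item enters every step."""
    # Item i starts one step behind item i-1, and every stage takes exactly one step.
    return [(i, s, i + s) for i in range(n_items) for s in range(n_stages)]

def overlapped_steps(n_items, n_stages):
    schedule = overlapped_schedule(n_items, n_stages)
    return max(step for _, _, step in schedule) + 1 if schedule else 0

def sequential_steps(n_items, n_stages):
    # Without overlap, each item finishes every stage before the next one starts.
    return n_items * n_stages

for n in (1, 10, 1000):
    print(n, sequential_steps(n, 3), overlapped_steps(n, 3))
```

With three stages, one item takes 3 steps either way, but 1000 items take 1002 steps overlapped versus 3000 one at a time.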
4. Indigestion
The systemic process worked very well for early mainframe computers because they had very simple instructions that were, well, regular. The instructions were all the same size, and so was the data, so each stage in the systemic process took the same amount of time and the whole thing worked like a well-oiled assembly line.
Microprocessors started out as very simple devices without all this systemic process stuff, but then their instruction sets grew up very haphazardly. Some instructions were much longer than others, and a long instruction could take multiple memory read cycles to fetch. Then, the size and number of data operands varied. That made the evenly paced systemic process break down.
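Here is a sketch of why uneven instructions hurt. It assumes each stage handles one item at a time, items stay in order, and there is unlimited waiting room between stages (a simplification); the stage times are invented. An item leaves a stage once it has arrived there and the stage has finished the item ahead of it.

```python
def finish_time(stage_times):
    """stage_times[i][s] = time item i spends in stage s.

    Assumes one item per stage at a time, in-order flow, and unlimited
    buffering between stages. Returns when the last item leaves.
    """
    n_stages = len(stage_times[0])
    prev = [0] * n_stages                # when each stage finished the previous item
    for times in stage_times:
        cur = []
        for s, t in enumerate(times):
            ready = cur[s - 1] if s else 0   # this item has left the earlier stage
            free = prev[s]                   # the stage is done with the item ahead
            cur.append(max(ready, free) + t)
        prev = cur
    return prev[-1]

even = [[1, 1, 1]] * 6                       # regular, same-size instructions
uneven = [[1, 1, 1], [3, 1, 1], [1, 1, 1], [1, 4, 1], [1, 1, 1], [2, 1, 1]]
print(finish_time(even), finish_time(uneven))
```

The six even items finish at step 8 (3 + 6 - 1), while the uneven mix finishes at step 13: every slow item makes the ones behind it wait.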
5. Prefetch to Get Ahead of the Game
In many ways, the Intel 286 microprocessor was a break from the origins of a minimal CPU on a chip and led the way toward the architectures of mainframe computers. Introduced in 1982, the 286 built on the instruction prefetching that the 8086 and 8088 had already brought to the PC: a little buffer memory, the prefetch queue, sits between the memory bus and the CPU. The memory bus would deliver instructions to the prefetch queue, and if the CPU got bogged down with a complex instruction, the next instructions would just stack up in the queue. When the CPU ran into a string of simple instructions, it would draw the queue down. As long as the queue was neither full nor empty, both the memory bus and the CPU could run at full speed without being held up by the other.
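The sketch below is a cycle-by-cycle toy model of a prefetch queue, compared with the no-overlap approach from the first section. The fetch time of 2 cycles per instruction, the 6-entry queue, and the instruction timings are all assumptions for illustration; real prefetch queues hold bytes of instruction code rather than whole instructions.

```python
from collections import deque

def run_without_prefetch(exec_cycles, fetch_cycles=2):
    # Fetch, then execute, never both at once.
    return sum(fetch_cycles + e for e in exec_cycles)

def run_with_prefetch(exec_cycles, fetch_cycles=2, queue_size=6):
    """Bus unit fills a bounded queue while the CPU drains it; returns total cycles."""
    assert queue_size >= 1 and all(e >= 1 for e in exec_cycles)
    queue = deque()
    n = len(exec_cycles)
    fetched = done = cycle = 0
    fetch_left = fetch_cycles   # cycles until the current fetch completes
    cpu_left = 0                # cycles left on the instruction being executed
    while done < n:
        cycle += 1
        # CPU: pick up the next queued instruction if idle, then work one cycle.
        if cpu_left == 0 and queue:
            cpu_left = queue.popleft()
        if cpu_left:
            cpu_left -= 1
            if cpu_left == 0:
                done += 1
        # Bus unit: keep fetching ahead while there is room in the queue.
        if fetched < n and len(queue) < queue_size:
            fetch_left -= 1
            if fetch_left == 0:
                queue.append(exec_cycles[fetched])
                fetched += 1
                fetch_left = fetch_cycles
    return cycle

program = [1, 1, 6, 1, 1, 1, 1, 1]   # one slow instruction among simple ones
print(run_without_prefetch(program), run_with_prefetch(program))
```

This prints 29 and 17. While the slow instruction runs, the bus keeps filling the queue; afterward the CPU works through the stacked-up simple instructions one per cycle.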
6. Pipelining Breaks the Logjam

It wasn’t until 1989, when Intel brought out the 486, that PCs had a better way to deal with the memory bus bottleneck. The 486 had a pipeline. Can’t you just hear the surf guitars? Sorry, back to the microprocessors.
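The concept was simple: take the systemic process and break it up into very small pieces, so each step has to do only a very simple task and can finish it in a single clock cycle. With more, simpler steps in the pipeline, even complex instructions can be broken down into pieces, and once the pipeline is full, an instruction can come out the end on nearly every clock cycle, even though each one spends several cycles working its way through.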
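A classic way to picture a pipeline is a chart with one row per instruction and one column per clock cycle. The sketch below uses generic textbook stage names (IF fetch, ID decode, EX execute, MEM memory access, WB write-back) as a stand-in; they are not meant to be the 486’s exact stages.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]   # generic textbook stage names

def pipeline_chart(n_instructions, stages=STAGES):
    """Return one text row per instruction showing its stage in each cycle."""
    total_cycles = len(stages) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        cells = []
        for cycle in range(total_cycles):
            s = cycle - i              # instruction i enters the first stage at cycle i
            cells.append(stages[s] if 0 <= s < len(stages) else ".")
        rows.append(f"i{i}: " + " ".join(f"{c:>3}" for c in cells))
    return rows

for row in pipeline_chart(4):
    print(row)
```

Four instructions finish in 8 cycles instead of 20, and once the pipeline is full, one instruction reaches WB every cycle.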
7. Branch Prediction
As long as the program runs along in a linear fashion, incrementing the instruction address, the pipeline is hunky-dory. But what happens when it runs into a branch instruction? The next instruction address depends on the outcome of executing the branch. How will the pipeline know which instruction to fetch next?
Most program branches are part of a loop, a section of code that repeats itself until a condition is met and then the program continues outside the loop. If a programmer takes the time to put in a loop, chances are it’s going to continue in the loop for several iterations. So, if the pipeline predicts that the program flow will continue in the loop, it can fetch the next instruction in the loop.
If the branch prediction turns out to be wrong, the pipeline has to be flushed, and part of the production line is held up until the right instruction works its way down the pipeline. Unlike most predictions in the real world, these are right the majority of the time, so the net result is a performance increase.
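Here is a small sketch of the loop heuristic described above, counting wrong guesses on a loop’s closing branch. For comparison it also includes a 2-bit saturating counter, a common hardware predictor that the article does not discuss. The loop size, number of runs, and 3-cycle flush penalty are assumed values.

```python
def loop_branch_outcomes(iterations, runs):
    """Closing branch of a loop: taken (True) to repeat, not taken (False) to exit."""
    return ([True] * (iterations - 1) + [False]) * runs

def misses_always_taken(outcomes):
    # Always guess "stay in the loop"; wrong only when the loop exits.
    return sum(1 for taken in outcomes if not taken)

def misses_two_bit(outcomes):
    # 2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken.
    state, misses = 2, 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses

FLUSH_PENALTY = 3   # assumed cycles lost per wrong guess
outcomes = loop_branch_outcomes(iterations=10, runs=5)
for name, predictor in [("always taken", misses_always_taken),
                        ("2-bit counter", misses_two_bit)]:
    misses = predictor(outcomes)
    print(f"{name}: {misses} misses, {1 - misses / len(outcomes):.0%} correct, "
          f"{misses * FLUSH_PENALTY} cycles lost to flushes")
```

On this loop both guess wrong only at each exit (90% correct). The counter earns its keep on branches that usually go the other way, where always guessing “taken” would be wrong most of the time.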
8. Super Scalar
If you have a pipeline with ten steps, then you can have ten instructions in the pipeline at once. If each step takes one clock cycle to complete, then at best one instruction finishes every clock cycle; that’s a scalar CPU. If you double the clock speed, you get twice as many instructions executed.
Some microprocessors have an amazing number of steps in their pipeline. Some Pentium 4s have 31 steps. By breaking each instruction down into such small steps, it becomes possible to operate on some pieces of the instruction in parallel. In fact, they have gotten so good at it that some of the pieces can actually be processed out of order. As the parallelism goes up, so does the speed.
This CPU usually has more than 31 instructions in the pipeline because some of them are taking parallel paths. That means more than one instruction per clock cycle can be executed. That’s superscalar: at the same clock speed, it gets through more instructions than a scalar CPU can. (This is not to be confused with actual “parallel processing,” which is the combining of two or more CPUs to execute a program.)
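To see how issuing more than one instruction per clock raises throughput without a faster clock, here is a simplified in-order issue model (a real Pentium 4 schedules out of order and is far more complex). It assumes every result can be used one cycle after its instruction issues. The two programs are made up: one has two independent chains of work, the other is a single chain where each step needs the previous result.

```python
def issue_cycles(deps, width):
    """Cycles to issue a program in order, up to `width` instructions per cycle.

    deps[i] is the set of earlier instruction indices whose results instruction i needs.
    """
    issue_at = []
    cycle, used = 0, 0
    for needs in deps:
        # Wait until every input was produced in an earlier cycle.
        earliest = max((issue_at[j] + 1 for j in needs), default=0)
        if earliest > cycle:
            cycle, used = earliest, 0
        if used == width:            # this cycle's issue slots are full
            cycle, used = cycle + 1, 0
        issue_at.append(cycle)
        used += 1
    return cycle + 1 if deps else 0

# Two independent chains interleaved: a0 b0 a1 b1 a2 b2 a3 b3.
two_chains = [set(), set()] + [{i - 2} for i in range(2, 8)]
# One chain: every instruction needs the one right before it.
one_chain = [set()] + [{i - 1} for i in range(1, 8)]

for name, prog in [("two chains", two_chains), ("one chain", one_chain)]:
    for width in (1, 2):
        cycles = issue_cycles(prog, width)
        print(f"{name}, width {width}: {cycles} cycles, IPC {len(prog) / cycles:.2f}")
```

Going from one to two issue slots halves the time for the two-chain program (IPC 2.00) but does nothing for the single chain, because superscalar hardware can only run instructions in parallel when they don’t depend on each other.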
9. Multithreading
Remember in Tech Tip 55, when we talked about multitasking, where the computer appears to be running more than one program at a time? What was really happening is that the operating system was time slicing: letting each program run for a short period and switching back and forth so quickly that they appear to be running simultaneously. That’s how the different applications share the CPU.
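Time slicing is often done round-robin: each program gets a short turn, called a time slice or quantum, then goes to the back of the line. Here is a minimal sketch with made-up program names and time units; real operating system schedulers are far more elaborate.

```python
from collections import deque

def round_robin(work, quantum):
    """Time-slice programs on one CPU.

    work maps a program name to the time units it needs. Each turn, a program
    runs for at most `quantum` units, then goes to the back of the line.
    Returns the sequence of (program, units_run) slices.
    """
    assert quantum >= 1
    ready = deque(work.items())
    timeline = []
    while ready:
        name, left = ready.popleft()
        run = min(quantum, left)
        timeline.append((name, run))
        if left > run:                       # not finished: back of the line
            ready.append((name, left - run))
    return timeline

print(round_robin({"editor": 3, "music": 5}, quantum=2))
```

The two hypothetical programs alternate in short bursts, editor, music, editor, music, until each has had all the time it needs.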

Now take the concept of multitasking (dividing time between multiple programs) and turn it inside out: what if we work on different parts of the SAME program at the same time? It’s like completing a single 8-year medical school degree in one year by having 8 different people work on the same degree at once.
What if we put in another pipeline? Doing things in parallel speeds things up, right? What’s really happening is the instructions are getting interleaved in the pipeline. The advantage is that while one pipeline is waiting for a memory fetch or something that holds up execution, the other pipeline can take advantage of the time.
The problem is sorting out the bits of the program that can be run in parallel. It can be done, and as microprocessors get more complex, they can keep track of various threads of the program and put the results back together at a merge point. While complicated, this is done all the time in Pentium-class microprocessors.
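The stall-hiding idea can be sketched with a toy model: one instruction issues per cycle from whichever thread is ready, and an instruction marked as waiting on memory blocks its thread for a few cycles. The 4-cycle stall and the instruction patterns are assumptions for illustration.

```python
def run_threads(threads, stall_cycles=4):
    """Issue one instruction per cycle, taking turns among threads that are ready.

    threads: one list per thread; True marks an instruction that waits on memory,
    which blocks its thread for `stall_cycles` cycles after it issues.
    Returns cycles until every thread has finished, including its last stall.
    """
    pos = [0] * len(threads)        # next instruction index in each thread
    ready_at = [0] * len(threads)   # first cycle each thread may issue again
    cycle, turn = 0, 0
    while any(p < len(t) for p, t in zip(pos, threads)):
        for k in range(len(threads)):
            tid = (turn + k) % len(threads)
            if pos[tid] < len(threads[tid]) and ready_at[tid] <= cycle:
                if threads[tid][pos[tid]]:
                    ready_at[tid] = cycle + 1 + stall_cycles
                pos[tid] += 1
                turn = tid + 1      # give the next thread first pick next cycle
                break
        cycle += 1                  # a cycle passes even if nothing could issue
    return max([cycle] + ready_at)

a = [False, True, False, False, True, False]
b = list(a)
print(run_threads([a]) + run_threads([b]), run_threads([a, b]))
```

Run back to back, the two threads take 28 cycles; interleaved, they finish in 18, because each thread issues while the other is waiting on memory.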
10. Hyperthreading
Finally, we are getting to the ultimate in CPU speedup: simultaneous multithreading, or, as Intel calls it, Hyper-Threading Technology (HT). Don’t confuse it with AMD’s HyperTransport technology, which is all over AMD’s promotional literature; despite the similar name, HyperTransport is a high-speed link between chips, not a threading technique. So, how does HT work?
If you have a single thread running, much of the CPU execution hardware is sitting idle because it’s prepared for all sorts of parallel processing of those big instructions. While executing a simple instruction, all the spare stuff is wasted. Running traditional multithreading doesn’t help this situation because the two threads are interleaved. At any step in the pipeline, only one thread is really executing.
By allowing two threads to intermix at each step, they can take advantage of slack hardware and get more done by using the facilities of the CPU more efficiently. The trick is to keep track of which thread is which through the pipeline even though the two are mixed.
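A toy model of the difference, assuming a hypothetical core with 4 issue slots per cycle. Each thread is a list of instruction groups; instructions within a group are independent, but a group can’t start until the previous one has fully issued. With interleaved multithreading only one thread issues in a given cycle; with simultaneous multithreading, leftover slots go to the other thread. The group sizes are invented.

```python
def run_core(threads, width=4, smt=True):
    """Return (cycles, slots_used) for threads given as lists of group sizes."""
    groups = [list(t) for t in threads]   # copy so the caller's lists survive
    cycles = used = first = 0
    while any(groups):
        free = width
        for k in range(len(groups)):
            tid = (first + k) % len(groups)   # rotate which thread picks first
            if not groups[tid] or free == 0:
                continue
            n = min(groups[tid][0], free)
            groups[tid][0] -= n
            free -= n
            used += n
            if groups[tid][0] == 0:
                groups[tid].pop(0)            # next group may start next cycle
            if not smt:
                break                         # interleaved: one thread per cycle
        first = (first + 1) % len(groups)
        cycles += 1
    return cycles, used

a, b = [1, 2, 1, 1], [2, 1, 1, 2]
for smt in (False, True):
    cycles, used = run_core([a, b], smt=smt)
    print(f"smt={smt}: {cycles} cycles, {used}/{cycles * 4} slots used")
```

Interleaved, the pair takes 8 cycles and fills 11 of 32 slots, no better than running them one after the other; mixed at each step, they finish in 4 cycles and fill 11 of 16.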
Final Words
So, there you have it. Your program is sliced and diced and even stirred up as it makes its way through your microprocessor. Techniques like breaking the code up into threads that can be run separately and then put back together work well to speed things up. The challenge is for the logic of the microprocessor to keep track of which is which at all times, and even to sort out instructions that get out of order as they wind through your CPU.
The latest microprocessors from Intel and AMD have amazingly complex systems to work on your program code and process it faster than ever before. Whether you are doing the latest digital photo or video editing or playing the newest realistic 3-D games, a new microprocessor can boost performance beyond what you are used to. You can be the Big Kahuna on your beach with a single-chip microcomputer on your desk.