# INF2270 – Spring 2010

**Philipp Häfliger** 

#### Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)



UNIVERSITETET I OSLO

#### Content

- From Scalar to Superscalar
- Lecture Summary and Brief Repetition
  - Binary numbers
  - Boolean Algebra
  - Combinational Logic Circuits
    - Encoder/Decoder
    - Multiplexer/Demultiplexer
    - Adders
  - Sequential Logic Circuits
    - Counters
    - Shift Registers


#### **Scalar Processors**

The CPU designs that we have discussed so far were all scalar processors, in so far as they do not execute operations in parallel and produce only a single result data item at a time.


#### **Vector processors**

High-performance computing led to vector processors, most prominently the Cray-1 in 1976, which had 8 vector registers of 64 words of 64-bit length. Vector processors perform 'single instruction, multiple data' (SIMD) computations, i.e. they execute the same operation on a vector instead of a scalar. Some machines used parallel ALUs, but the Cray-1 used a dedicated pipelining architecture that would fetch a single instruction and then execute it efficiently, e.g. 64 times, saving 63 fetches.

# Multi processor

Vector computers lost popularity with the introduction of multi-processor computers such as Intel's Paragon series of *massively* parallel supercomputers: it was cheaper to combine multiple (standard) CPUs than to design powerful vector processors, even considering a bigger communication overhead. For example, in some architectures with a single shared memory/system bus, the instructions and the data need to be fetched and written in sequence for each processor, making the von Neumann bottleneck more severe. Other designs, however, had local memory and/or parallel memory access and many clever introduced




#### **Clusters/Grids**

But even cheaper and obtainable for the common user are Ethernet clusters of individual computers, or even computer grids connected over the internet. Both of these, obviously, suffer from massive communication overhead, and especially the latter are best used for so-called 'embarrassingly parallel problems', i.e. computation problems that require no or minimal communication between the computation nodes.


#### Multi Core

Designing more complicated integrated circuits has become cheaper with progressing miniaturization, such that several processing units can now be accommodated on a single chip, which has become standard with AMD and Intel processors. These multi-core processors have many of the advantages of multi-processor machines, but with much faster communication between the cores, thus reducing communication overhead. (Although it has to be said that they are most commonly used to run individual independent processes, and for the common user they do not compute parallel problems.)


### **Superscalar Processor Principle**

*Superscalar* processors were introduced even before multi-core, and all modern designs belong to this class. Like vector processors with parallel ALUs, they are actually capable of executing instructions in parallel, but in contrast to vector computers, these are *different* instructions. Instead of replicating the basic functional units (e.g. the ALU) n times in hardware, superscalar processors exploit the fact that there already *are* multiple functional units. For example, many processors sport both an ALU and an FPU. Thus, they should be able to execute an integer and a floating-point operation simultaneously. Data access operations require neither the ALU nor the FPU (or have a dedicated ALU for address operations) and can thus also be executed at the same time.

#### Superscalar Processor

For this to work, several instructions have to be fetched in parallel and then *dispatched*, either in parallel, if possible, or in sequence, if necessary. Some additional stages are needed in the pipelining structure, and the pipeline is divided for different types of instructions. Superscalar processors can ideally achieve an average number of clock cycles per instruction (CPI) smaller than 1, and a speedup higher than the number of pipelining stages k (which is saying the same thing in two different ways).

Compiler level support can group instructions to optimize the potential for parallel execution.
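
To see why a CPI below 1 and a speedup above k are the same statement, here is a short worked relation. It rests on a simplifying assumption that is not stated on the slide: the unpipelined reference machine needs k clock cycles per instruction at the same clock period. The symbol $S$ for the speedup is introduced here only for illustration.

$$S = \frac{\text{CPI}_{\text{unpipelined}}}{\text{CPI}} = \frac{k}{\text{CPI}}, \qquad \text{CPI} < 1 \iff S > k$$

For instance, under this assumption, $k = 5$ and an average CPI of 0.5 would give $S = 10$.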


#### **Intel Core 2**

As an example: the Intel Core 2 microarchitecture has 14 pipeline stages and can execute up to 4-6 instructions in parallel.


# Some Elements in Superscalar Architectures (1/2)

Micro-instruction reorder buffer (ROB): Stores all instructions that await execution and dispatches them for *out-of-order execution* when appropriate. Note that, thus, the order of execution may be quite different from the order of your assembler code. Extra steps have to be taken to avoid and/or handle hazards caused by this reordering.

Retirement stage: The pipelining stage that takes care of finished instructions and makes the result appear consistent with the execution sequence that was intended by the programmer.


# Some Elements in Superscalar Architectures (2/2)

Reservation station registers: A single instruction reserves a set of these registers for all the data needed for its execution on its functional unit. Each functional unit has several slots in the reservation station. Once all the data becomes available and the functional unit is free, the instruction is executed.


### **Lecture Content on Hardware**

A rough categorization of the content:

- Digital Logic (Boolean algebra, combinational and sequential logic ...)
- Architecture (Von Neumann, cache, virtual memory, I/O ...)
- Performance Optimization (pipelining, caching and virtual memory strategies ...)


#### **Binary Numbers**

unsigned int: '10010' corresponds to

$1 \times 2^4 + 0 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 16 + 2 = 18$

int, two's complement: for n-bit integers

(unsigned int) $[2^{(n-1)}, 2^n - 1]$ = (int) $[-2^{(n-1)}, -1]$

(unsigned int) $[0, 2^{(n-1)} - 1]$ = (int) $[0, 2^{(n-1)} - 1]$
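
The two interval mappings can be tried out with a minimal Python sketch. The function names `to_signed` and `to_unsigned` are hypothetical and chosen only for this illustration; the example reuses the bit pattern '10010' from above as a 5-bit value.

```python
def to_signed(pattern: int, n: int) -> int:
    """Interpret an n-bit unsigned pattern as a two's complement integer."""
    if not 0 <= pattern < 2 ** n:
        raise ValueError("pattern does not fit in n bits")
    # Upper half [2^(n-1), 2^n - 1] maps to [-2^(n-1), -1]; lower half is unchanged.
    return pattern - 2 ** n if pattern >= 2 ** (n - 1) else pattern


def to_unsigned(value: int, n: int) -> int:
    """Return the n-bit pattern that encodes a two's complement integer."""
    if not -(2 ** (n - 1)) <= value < 2 ** (n - 1):
        raise ValueError("value out of range for n bits")
    return value % (2 ** n)


print(to_signed(0b10010, 5))   # '10010' read as a 5-bit signed integer
print(to_unsigned(-14, 5))     # and back to the unsigned pattern
```

Read as an unsigned 5-bit number the pattern is 18; read as a 5-bit two's complement number it lies in the upper half of the range and therefore represents 18 - 32 = -14.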


#### **Boolean Function**

- A (Boolean) function assigns exactly one output (or one output vector) to every input vector.
- Boolean expressions are composed of the three basic Boolean algebraic operators, AND, OR, and NOT
- Boolean functions can be defined by
  - Boolean expressions
  - Truth tables
  - Logic gates schematics
- Functions are identical/equivalent if they produce the same output for every input. Note: different expressions/schematics can describe the *same* function. There is, however, only one complete truth table for any one function.
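
Because the complete truth table of a function is unique, equivalence of two expressions can be checked by brute force. The following Python sketch does this; the helper names `truth_table` and `equivalent` and the three example expressions are assumptions made for illustration.

```python
from itertools import product


def truth_table(f, n_inputs):
    """Return the outputs of f for all 2^n input vectors, in counting order."""
    return [bool(f(*bits)) for bits in product([False, True], repeat=n_inputs)]


def equivalent(f, g, n_inputs):
    """Two Boolean functions are equivalent iff their complete truth tables match."""
    return truth_table(f, n_inputs) == truth_table(g, n_inputs)


# Two different expressions for the same function (distributive law).
lhs = lambda a, b, c: a or (b and c)
rhs = lambda a, b, c: (a or b) and (a or c)
# A different function, for contrast.
other = lambda a, b, c: (a or b) and c

print(equivalent(lhs, rhs, 3))
print(equivalent(lhs, other, 3))
```

Exhaustive comparison needs 2^n evaluations, so it is only practical for a small number of inputs.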


#### **Boolean function Example**




 $F = x \vee y \wedge \bar{z}$ 
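
As a small sketch, the truth table of this example function can be enumerated in Python. It assumes the usual precedence in which AND binds tighter than OR, i.e. $F = x \vee (y \wedge \bar{z})$; the function name `F` simply mirrors the formula.

```python
from itertools import product


def F(x, y, z):
    # AND binds tighter than OR, so this is x OR (y AND (NOT z)).
    return x | (y & (1 - z))


print(" x y z | F")
for x, y, z in product((0, 1), repeat=3):
    print(f" {x} {y} {z} | {F(x, y, z)}")
```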






# **Rules governing equivalency**

| Rule                                               | Name           |
|----------------------------------------------------|----------------|
| $a \lor b \land c = a \lor (b \land c)$            |                |
| $a \lor \bar{a} = 1$                               |                |
| $a \lor a = a$                                     |                |
| $a \lor 0 = a$                                     |                |
| $a \lor 1 = 1$                                     |                |
| $a \lor b = b \lor a$                              | (commutative)  |
| $(a \lor b) \lor c = a \lor (b \lor c)$            | (associative)  |
| $a \lor (b \land c) = (a \lor b) \land (a \lor c)$ | (distributive) |
| $\overline{a \land b} = \bar{a} \lor \bar{b}$      | (deMorgan)     |

# Simplification

Since there are infinitely many equivalent Boolean expressions for the same function, it is often desirable to find a simple expression for a given function. In the lecture we looked at two methods:

1. Intuitive application of the algebraic rules
2. Karnaugh maps

#### **Example Karnaugh map**




### Definition

Combinational logic circuits are circuits implementing Boolean functions.

#### Simple 3-bit Encoder Truth Table

| $I_7$ | $I_6$ | $I_5$ | $I_4$ | $I_3$ | $I_2$ | $I_1$ | $I_0$ | $O_2$ | $O_1$ | $O_0$ |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 0     | 0     | 0     | 0     | 0              | 0              | 0     | 1     | 0                     | 0     | 0                     |
| 0     | 0     | 0     | 0     | 0              | 0              | 1     | 0     | 0                     | 0     | 1                     |
| 0     | 0     | 0     | 0     | 0              | 1              | 0     | 0     | 0                     | 1     | 0                     |
| 0     | 0     | 0     | 0     | 1              | 0              | 0     | 0     | 0                     | 1     | 1                     |
| 0     | 0     | 0     | 1     | 0              | 0              | 0     | 0     | 1                     | 0     | 0                     |
| 0     | 0     | 1     | 0     | 0              | 0              | 0     | 0     | 1                     | 0     | 1                     |
| 0     | 1     | 0     | 0     | 0              | 0              | 0     | 0     | 1                     | 1     | 0                     |
| 1     | 0     | 0     | 0     | 0              | 0              | 0     | 0     | 1                     | 1     | 1                     |

### **3-bit Encoder Implementation Variant**



### **3-bit Priority Encoder Truth Table**

| $I_7$ | $I_6$ | $I_5$ | $I_4$ | $I_3$ | $I_2$ | $I_1$ | $I_0$ | $O_2$ | $O_1$ | $O_0$ |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 0     | 0     | 0     | 0     | 0     | 0     | 0     | 1     | 0     | 0     | 0     |
| 0     | 0     | 0     | 0     | 0     | 0     | 1     | X     | 0     | 0     | 1     |
| 0     | 0     | 0     | 0     | 0     | 1     | X     | X     | 0     | 1     | 0     |
| 0     | 0     | 0     | 0     | 1     | X     | X     | X     | 0     | 1     | 1     |
| 0     | 0     | 0     | 1     | X     | X     | X     | X     | 1     | 0     | 0     |
| 0     | 0     | 1     | X     | X     | X     | X     | X     | 1     | 0     | 1     |
| 0     | 1     | X     | X     | X     | X     | X     | X     | 1     | 1     | 0     |
| 1     | X     | X     | X     | X     | X     | X     | X     | 1     | 1     | 1     |
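
The behaviour described by the priority encoder table can be sketched in Python as follows. The function name `priority_encoder` and the input ordering `[I0, ..., I7]` are choices made for this illustration, and raising an error for the all-zero input is an assumption, since the table does not cover that case.

```python
def priority_encoder(inputs):
    """Return (O2, O1, O0) for inputs given as [I0, I1, ..., I7].

    The highest-numbered active input wins; lower inputs are don't-cares (X).
    """
    if len(inputs) != 8:
        raise ValueError("expected 8 input bits")
    for index in range(7, -1, -1):
        if inputs[index]:
            return ((index >> 2) & 1, (index >> 1) & 1, index & 1)
    # The all-zero input is not covered by the truth table; raising is an assumption.
    raise ValueError("no input active")


print(priority_encoder([1, 1, 0, 1, 0, 0, 0, 0]))  # I3 is the highest active input
```

For inputs with exactly one active bit, this function also reproduces the simple encoder table.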

### **3-bit Decoder Truth Table**

| $I_2$ | $I_1$ | $I_0$ | $O_7$ | $O_6$ | $O_5$ | $O_4$ | $O_3$ | $O_2$ | $O_1$ | $O_0$ |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 0     | 0     | 0     | 0     | 0     | 0     | 0     | 0     | 0     | 0     | 1     |
| 0     | 0     | 1     | 0     | 0     | 0     | 0     | 0     | 0     | 1     | 0     |
| 0     | 1     | 0     | 0     | 0     | 0     | 0     | 0     | 1     | 0     | 0     |
| 0     | 1     | 1     | 0     | 0     | 0     | 0     | 1     | 0     | 0     | 0     |
| 1     | 0     | 0     | 0     | 0     | 0     | 1     | 0     | 0     | 0     | 0     |
| 1     | 0     | 1     | 0     | 0     | 1     | 0     | 0     | 0     | 0     | 0     |
| 1     | 1     | 0     | 0     | 1     | 0     | 0     | 0     | 0     | 0     | 0     |
| 1     | 1     | 1     | 1     | 0     | 0     | 0     | 0     | 0     | 0     | 0     |
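
One common way to realize this table, sketched here in Python, is to compute every output as the AND of the three inputs or their complements (one minterm per output). This is an illustrative assumption and not necessarily the implementation variant drawn on the slides; the function name `decoder` is hypothetical.

```python
def decoder(i2, i1, i0):
    """Return the one-hot outputs [O0, O1, ..., O7], one minterm per output."""
    n2, n1, n0 = 1 - i2, 1 - i1, 1 - i0  # inverted inputs
    return [
        n2 & n1 & n0,  # O0
        n2 & n1 & i0,  # O1
        n2 & i1 & n0,  # O2
        n2 & i1 & i0,  # O3
        i2 & n1 & n0,  # O4
        i2 & n1 & i0,  # O5
        i2 & i1 & n0,  # O6
        i2 & i1 & i0,  # O7
    ]


print(decoder(1, 0, 1))  # input 101 selects O5
```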


### **3-bit Decoder Implementation Variant**




### **3-bit Multiplexer Truth Table**

| $S_2$ | $S_1$ | $S_0$ | $O$   |
|-------|-------|-------|-------|
| 0     | 0     | 0     | $I_0$ |
| 0     | 0     | 1     | $I_1$ |
| 0     | 1     | 0     | $I_2$ |
| 0     | 1     | 1     | $I_3$ |
| 1     | 0     | 0     | $I_4$ |
| 1     | 0     | 1     | $I_5$ |
| 1     | 1     | 0     | $I_6$ |
| 1     | 1     | 1     | $I_7$ |


#### **3-bit Multiplexer Implementation Variant**




# **3-bit Demultiplexer Truth Table**

| $S_2$ | $S_1$ | $S_0$ | $O_7$ | $O_6$ | $O_5$ | $O_4$ | $O_3$ | $O_2$ | $O_1$ | $O_0$ |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 0     | 0     | 0     | 0     | 0     | 0     | 0     | 0     | 0     | 0     | $I$   |
| 0     | 0     | 1     | 0     | 0     | 0     | 0     | 0     | 0     | $I$   | 0     |
| 0     | 1     | 0     | 0     | 0     | 0     | 0     | 0     | $I$   | 0     | 0     |
| 0     | 1     | 1     | 0     | 0     | 0     | 0     | $I$   | 0     | 0     | 0     |
| 1     | 0     | 0     | 0     | 0     | 0     | $I$   | 0     | 0     | 0     | 0     |
| 1     | 0     | 1     | 0     | 0     | $I$   | 0     | 0     | 0     | 0     | 0     |
| 1     | 1     | 0     | 0     | $I$   | 0     | 0     | 0     | 0     | 0     | 0     |
| 1     | 1     | 1     | $I$   | 0     | 0     | 0     | 0     | 0     | 0     | 0     |
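
The multiplexer and demultiplexer tables describe inverse routing operations, which the following Python sketch models behaviourally. The function names and the example data are hypothetical; inputs and outputs are ordered from index 0 to 7.

```python
def multiplexer(inputs, s2, s1, s0):
    """Route the selected input of [I0, ..., I7] to the single output O."""
    return inputs[(s2 << 2) | (s1 << 1) | s0]


def demultiplexer(i, s2, s1, s0):
    """Route the single input I to the selected output of [O0, ..., O7]; all others are 0."""
    selected = (s2 << 2) | (s1 << 1) | s0
    return [i if k == selected else 0 for k in range(8)]


data = [0, 1, 1, 0, 1, 0, 0, 1]        # arbitrary example values for I0..I7
print(multiplexer(data, 1, 0, 0))      # select lines 100 pick I4
print(demultiplexer(1, 0, 1, 1))       # select lines 011 drive O3
```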







### **3-bit Demultiplexer Implementation Variant**



### Half Adder

Truth table for a 1-bit half adder:

| a | b | S | C |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |




#### Full Adder (1/2)

Half adders cannot be cascaded into a binary adder of arbitrary bit-length, since they have no carry input. An extension of the circuit is needed.

Full adder truth table:
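
As a minimal sketch of such an extension, a full adder can be composed of two half adders and an OR gate. This is one standard construction and not necessarily the variant drawn on the slides. The Python below, with hypothetical function names, prints the resulting truth table and cascades full adders into a ripple-carry adder.

```python
def half_adder(a, b):
    """Return (S, C) as in the half adder truth table: S = a XOR b, C = a AND b."""
    return a ^ b, a & b


def full_adder(a, b, c_in):
    """Full adder built from two half adders and an OR gate; returns (S, C_out)."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, c_in)
    return s, c1 | c2


def ripple_carry_add(x_bits, y_bits):
    """Add two equal-length bit lists (least significant bit first)."""
    carry, result = 0, []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    return result + [carry]


print(" a b c_in | S C_out")
for a in (0, 1):
    for b in (0, 1):
        for c_in in (0, 1):
            s, c_out = full_adder(a, b, c_in)
            print(f" {a} {b}   {c_in}  | {s}   {c_out}")

print(ripple_carry_add([1, 1, 0], [1, 0, 1]))  # 3 + 5, least significant bit first
```

The carry output of each stage feeds the carry input of the next, which is exactly what the half adder lacks.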




### Full Adder (2/2)

Schematics:




#### Definition

Sequential logic circuits are logic circuits implementing finite state machines, i.e. circuits composed of combinational logic and internal memory elements. A typical categorization of sequential logic circuits is into Moore and Mealy machines.




### Synchronous and Asynchronous FSM

- Synchronous FSMs include an implicit positive transition of a global *clock* signal as transition condition for all state changes. Synchronous FSMs realized as sequential logic circuits use synchronous flip-flops as memory elements, e.g. D-flip-flops. They are generally simpler to implement and easier to verify and test. The clock frequency needs to be slow enough to allow the slowest combinational transition condition to be computed.
- Asynchronous FSMs change state at once if the explicit transition condition is met. They can be very fast but are much harder to design and verify.



### **Example: Synchronous Moore Machine**

Characteristic table:

| $car_{EW}$ | $car_{NS}$ | $go_{NS}$ | $go_{NS}^{next}$ |
|------------|------------|-----------|------------------|
| 0          | 0          | 0         | 0                |
| 1          | 0          | 0         | 0                |
| 0          | 1          | 0         | 1                |
| 1          | 1          | 0         | 1                |
| 0          | 0          | 1         | 1                |
| 1          | 0          | 1         | 0                |
| 0          | 1          | 1         | 1                |
| 1          | 1          | 1         | 0                |

Schematics/circuit diagram:



Careful: Always also consider the conditions for a state to be *maintained*, which sometimes is not explicitly stated in the graph!
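
The characteristic table of this example can be simulated with a short Python sketch. The Boolean expression in `next_state` is derived here from the table and is an assumption, not taken from the slides; the function names and the input sequence are likewise hypothetical. The last table row is read as inputs (1, 1, 1) with next state 0.

```python
# Characteristic table: (car_EW, car_NS, go_NS) -> next go_NS
NEXT_GO_NS = {
    (0, 0, 0): 0, (1, 0, 0): 0, (0, 1, 0): 1, (1, 1, 0): 1,
    (0, 0, 1): 1, (1, 0, 1): 0, (0, 1, 1): 1, (1, 1, 1): 0,
}


def next_state(car_ew, car_ns, go_ns):
    """Combinational next-state logic, derived here from the table (an assumption)."""
    return ((1 - go_ns) & car_ns) | (go_ns & (1 - car_ew))


def run(inputs, go_ns=0):
    """Clock the machine once per (car_EW, car_NS) pair and record the state."""
    trace = [go_ns]
    for car_ew, car_ns in inputs:
        go_ns = next_state(car_ew, car_ns, go_ns)  # state changes only on the clock edge
        trace.append(go_ns)
    return trace


# The derived expression reproduces every row of the table.
assert all(next_state(*row) == out for row, out in NEXT_GO_NS.items())

print(run([(0, 1), (0, 0), (1, 0), (1, 1)]))
```

Rows in which the next state equals the present state, such as (0, 0, 0) and (0, 0, 1), are the conditions under which the state is maintained.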


### **3-bit Counter State Transition Graph**


### **3-bit Counter Characteristic Table**

| present |       |       | in | next  |       |       |
|---------|-------|-------|----|-------|-------|-------|
| $S_2$   | $S_1$ | $S_0$ | NA | $S_2$ | $S_1$ | $S_0$ |
| 0       | 0     | 0     |    | 0     | 0     | 1     |
| 0       | 0     | 1     |    | 0     | 1     | 0     |
| 0       | 1     | 0     |    | 0     | 1     | 1     |
| 0       | 1     | 1     |    | 1     | 0     | 0     |
| 1       | 0     | 0     |    | 1     | 0     | 1     |
| 1       | 0     | 1     |    | 1     | 1     | 0     |
| 1       | 1     | 0     |    | 1     | 1     | 1     |
| 1       | 1     | 1     |    | 0     | 0     | 0     |


### **Counter Element Characteristic Equation**

$$S_{n_{next}} = S_n \oplus \left(\bigwedge_{k=0}^{n-1} S_k\right)$$

In words: bit $n$ flips (toggles) if all lower-order bits $S_0, \dots, S_{n-1}$ are 1. For $n = 0$ the empty conjunction is 1, so $S_0$ toggles on every clock cycle.
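
The characteristic equation can be checked against the 3-bit counter table with a small Python sketch. The function name `next_state` and the bit ordering `[S0, S1, S2]` are choices made for this illustration.

```python
def next_state(state):
    """Apply S_n(next) = S_n XOR (S_0 AND ... AND S_{n-1}) to [S0, S1, ...]."""
    nxt = []
    all_lower_set = 1  # empty AND for bit 0, so S0 always toggles
    for bit in state:
        nxt.append(bit ^ all_lower_set)
        all_lower_set &= bit
    return nxt


state = [0, 0, 0]  # [S0, S1, S2]
sequence = []
for _ in range(8):
    sequence.append(state[0] + 2 * state[1] + 4 * state[2])
    state = next_state(state)

print(sequence)
```

After eight clock cycles the state wraps around to 000, matching the last row of the characteristic table.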


### 3 bit Synchronous Counter




### **3 bit Ripple Counter**




### Shift Register State Transition Table

| control |    |    | next      |       |           |
|---------|----|----|-----------|-------|-----------|
| LD      | SE | LS | $O_2$     | $O_1$ | $O_0$     |
| 1       | X  | X  | $I_2$     | $I_1$ | $I_0$     |
| 0       | 0  | X  | $O_2$     | $O_1$ | $O_0$     |
| 0       | 1  | 0  | $RS_{in}$ | $O_2$ | $O_1$     |
| 0       | 1  | 1  | $O_1$     | $O_0$ | $LS_{in}$ |
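
The state transition table translates directly into a behavioural Python sketch of one clock step. The function name `shift_register_step` and its keyword arguments are hypothetical; the control signals LD, SE and LS and the serial inputs RSin and LSin follow the table.

```python
def shift_register_step(state, ld, se, ls, parallel_in=(0, 0, 0), rs_in=0, ls_in=0):
    """Return the next (O2, O1, O0) for one clock cycle.

    LD loads the parallel input, SE enables shifting, LS selects left (1) or right (0).
    """
    o2, o1, o0 = state
    if ld:
        return tuple(parallel_in)      # load I2, I1, I0
    if not se:
        return (o2, o1, o0)            # hold the current value
    if ls:
        return (o1, o0, ls_in)         # shift left, LSin enters at O0
    return (rs_in, o2, o1)             # shift right, RSin enters at O2


state = shift_register_step((0, 0, 0), ld=1, se=0, ls=0, parallel_in=(1, 0, 1))
state = shift_register_step(state, ld=0, se=1, ls=1, ls_in=0)   # shift left
print(state)
state = shift_register_step(state, ld=0, se=1, ls=0, rs_in=1)   # shift right
print(state)
```

As in the table, LD takes priority over SE, and LS is only relevant when shifting is enabled.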


#### **Shift Register Schematics**


