

## Final - 19-20

- ① (a). (20-21) 1 (b)
- (b). note part (Moore's Law) and that 8080's clock is
- (c). (20-21) 1 (c)
- ② (a) (20-21) 2 (a) 16 bit address bus?
- (b). note 8080

$2^{16} = 65,536 \text{ bytes} = 64 \text{ KB}$

- (c). Asynchronous read/write.

- ③ (a). (20-21) 3 (a)
- (b). (20-21) 3 (b)
- ④ (a). (20-21) 4 (a)
- (b). (20-21) 4 (b)
- (c). (20-21) 4 (c)
- ⑤ (a). Explain the performance of a processor

- (a). (20-21) 1 (a)
- (b). (20-21) 5 (b)
- (c). (20-21) 5 (c)

- ⑥ (a). (20-21) 6 (a)
- (b). (20-21) 6 (b)
- (c). (20-21) 6 (c)
- ⑦ (a). By showing the major components of a CPU, briefly describe how a CPU works.



Major components of a CPU:

### • Arithmetic Logic Unit (ALU):

Performs all arithmetic and logical operations.

### • Control Unit (CU):

Controls and co-ordinates all operations. It fetches instructions, decodes them, and tells other components what to do.

### • Registers:

Small, very fast memory inside the CPU. They temporarily store data, instructions and addresses.

### • Buses:

Paths that transfer data and signals.  
(Data, Address, Control).

## How a CPU Works:

A CPU works in a cycle called the Fetch-Decode-Execute cycle:

### i) Fetch:

- The program counter (PC) gives the address of the next instruction.
- The instruction is fetched from memory and stored in the instruction register (IR).

### ii) Decode:

The control unit (CU) interprets the fetched instruction and decides what actions are needed.

### iii) Execute:

- The ALU performs the required operation,
- or the CU signals other components to move data, access memory, or jump to a new instruction.

### iv) Store/Write-back:

The result is stored in a register or sent to memory.
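The cycle can be sketched as a toy simulator. This is only an illustration: the instruction set (`LOAD`, `ADD`, `STORE`, `JMP`, `HALT`), the single accumulator, and the dictionary used as memory are assumptions made for the example, not a real CPU.

```python
# Toy CPU illustrating the fetch-decode-execute cycle.
# The instruction set and the accumulator design are hypothetical.

def run(program, data):
    """Run `program` (a list of (opcode, operand) tuples) until HALT."""
    pc = 0      # program counter: address of the next instruction
    acc = 0     # accumulator register used by the ALU
    trace = []  # opcodes in the order they were executed

    while True:
        # Fetch: read the instruction at PC into the instruction register.
        ir = program[pc]
        pc += 1

        # Decode: the control unit splits the instruction into its fields.
        opcode, operand = ir
        trace.append(opcode)

        # Execute / write-back: the ALU computes or the CU moves data.
        if opcode == "LOAD":
            acc = data[operand]
        elif opcode == "ADD":
            acc = acc + data[operand]
        elif opcode == "STORE":
            data[operand] = acc
        elif opcode == "JMP":
            pc = operand
        elif opcode == "HALT":
            return data, trace
        else:
            raise ValueError(f"unknown opcode: {opcode}")


program = [("LOAD", "b"), ("ADD", "c"), ("STORE", "a"), ("HALT", None)]
memory, executed = run(program, {"a": 0, "b": 2, "c": 3})
print(memory["a"])  # 5
```

Note that the PC is incremented during fetch, so a `JMP` simply overwrites it in the execute step.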

⑥ Distinguish between RISC and CISC processors.

RISC

- small and simple instruction set.
- fixed-length instructions.
- usually 1 cycle per instruction.
- more general-purpose registers.
- simple and limited addressing modes.
- larger code size.
- simple hardware, easier pipelining.

CISC

- large and complex instruction set.
- variable-length instructions.
- multiple cycles per instruction.
- fewer registers.
- complex and many addressing modes.
- smaller code size.
- complex hardware, harder pipelining.

Ex: RISC: ARM, RISC-V, MIPS. CISC: x86, Intel 80386.

⑦ How is the stack organized in a CPU? With a proper block diagram, describe the following types of stack:

- Register and
- Memory stack

A stack is a region used to store temporary information such as return addresses, parameters, local variables and saved registers. It follows LIFO → Last In First Out.



## Stack pointer (SP).

- A special register in the CPU
- Always points to the top of the stack
- changes whenever data is pushed or popped

## Memory Stack:

- Stack is stored in main memory (RAM)
- Controlled by the stack pointer (SP) and sometimes a stack base (SB)
- Larger in size compared to a register stack
- Slower than a register stack because memory access is slower



## How it works:

push: SP is decremented, then the data is written to the memory location SP points to.

pop: the data is read from the location SP points to, then SP is incremented.

- used in function calls, interrupt handling, and context switching.
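A minimal sketch of the push and pop rules above, assuming a stack that grows towards lower addresses in a small simulated RAM. The class name `MemoryStack` and the RAM size are made up for the example.

```python
class MemoryStack:
    """Descending stack in a small simulated RAM (size is an assumption)."""

    def __init__(self, size=8):
        self.ram = [0] * size
        self.sp = size  # empty stack: SP sits just past the stack base

    def push(self, value):
        if self.sp == 0:
            raise OverflowError("stack overflow")
        self.sp -= 1                # SP is decremented first...
        self.ram[self.sp] = value   # ...then the data is written at SP

    def pop(self):
        if self.sp == len(self.ram):
            raise IndexError("stack underflow")
        value = self.ram[self.sp]   # the data is read from the location at SP...
        self.sp += 1                # ...then SP is incremented
        return value


stack = MemoryStack()
stack.push(10)
stack.push(20)
print(stack.pop(), stack.pop())  # 20 10 (LIFO order)
```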

## Register Stack:

- The Stack is implemented using a set of CPU registers.
- Very fast because registers are inside the CPU.
- limited in size.
- When the stack becomes full, a stack overflow occurs.



push: SP moves to the next register and stores the data.

pop: retrieves the data and SP moves back to the previous register.  
No need to access memory → very fast.
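The register stack differs mainly in its fixed, small capacity. A rough sketch, where the register count and the class name `RegisterStack` are assumptions chosen for illustration:

```python
class RegisterStack:
    """Stack built from a fixed set of registers (count is an assumption)."""

    def __init__(self, num_registers=4):
        self.regs = [0] * num_registers
        self.sp = -1  # -1 means the stack is empty

    def push(self, value):
        if self.sp == len(self.regs) - 1:
            raise OverflowError("register stack is full")
        self.sp += 1                  # SP moves to the next register
        self.regs[self.sp] = value    # and the data is stored there

    def pop(self):
        if self.sp == -1:
            raise IndexError("register stack is empty")
        value = self.regs[self.sp]    # the data is retrieved
        self.sp -= 1                  # and SP moves to the previous register
        return value


regs = RegisterStack(num_registers=2)
regs.push(7)
regs.push(9)
print(regs.pop())  # 9
```

A third `push` on a full two-register stack raises `OverflowError`, which models the stack overflow mentioned above.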

⑧ (a) Describe how an assembler works with its two phases.

An assembler is a program that converts assembly language into machine code.

Assemblers generally work in two phases:



### 1st pass (pass-1):

#### Purpose:

to scan the program and create symbol information.

#### Operations:

- scans the entire source program once
- builds the symbol table
- calculates memory addresses
- identifies directives
- does not generate machine code



### 2nd pass:

#### Purpose:

to generate the final machine code using the information collected in pass-1.

#### Operations:

- reads the intermediate file created in pass-1
- replaces symbolic operands with actual addresses from the symbol table.
- generates machine opcodes
- handles literals, constants, and directives
- produces final object code
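The two passes can be sketched for a tiny, made-up assembly language. Everything specific here is an assumption for illustration: the opcode numbers, the `label:` syntax, `;` comments, and the rule that every instruction occupies one memory word.

```python
# Hypothetical opcode table; every instruction occupies one memory word.
OPCODES = {"LOAD": 1, "ADD": 2, "STORE": 3, "JMP": 4, "HALT": 5}


def pass_one(lines):
    """Build the symbol table and an intermediate list; emit no code."""
    symbols, intermediate, address = {}, [], 0
    for line in lines:
        line = line.split(";")[0].strip()   # drop comments and blanks
        if not line:
            continue
        if line.endswith(":"):              # label definition
            symbols[line[:-1]] = address
            continue
        intermediate.append((address, line.split()))
        address += 1
    return symbols, intermediate


def pass_two(symbols, intermediate):
    """Replace symbolic operands with addresses and emit object code."""
    code = []
    for _address, fields in intermediate:
        mnemonic = fields[0]
        operand = fields[1] if len(fields) > 1 else None
        if operand is None:
            value = 0
        elif operand in symbols:
            value = symbols[operand]        # symbol resolved via the table
        else:
            value = int(operand)            # plain numeric operand
        code.append((OPCODES[mnemonic], value))
    return code


source = [
    "start:",
    "LOAD 10",
    "ADD 11",
    "JMP end ; forward reference",
    "STORE 12",
    "end:",
    "HALT",
]
symbols, intermediate = pass_one(source)
print(symbols)  # {'start': 0, 'end': 4}
print(pass_two(symbols, intermediate))
```

The forward reference `JMP end` shows why two passes help: the address of `end` is not yet known when the jump is first read, but it is in the symbol table by the time pass two runs.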

8(b) Distinguish between 0-address, 2-address and 3-address instructions with examples.

3-address: 3 operands  
Two sources  
One destination

| Destination | Source1 | Source2 |
|-------------|---------|---------|

$$a = b + c$$

ADD a, b, c  $\rightarrow a = b + c$

- fast
- clear
- uses more bits

2-address: two operands.

One operand is both source and destination.

| operand1 | operand2 |
|----------|----------|

MOV a, b ; a = b

Source and destination may be the same:

ADD a, c ; a = a + c

0-address: used in stack-based architectures.

All operands are taken from the top of the stack.

PUSH b

PUSH c

ADD ; computes b + c

POP a

(Stack-based operations)
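To see how the 0-address sequence computes `a = b + c`, here is a small stack-machine sketch. The interpreter `run_stack_program` and the dictionary standing in for memory are hypothetical helpers written for this example.

```python
def run_stack_program(program, memory):
    """Execute 0-address code: ADD takes both operands from the stack top."""
    stack = []
    for instruction in program:
        op, *args = instruction.split()
        if op == "PUSH":
            stack.append(memory[args[0]])   # copy a memory value onto the stack
        elif op == "POP":
            memory[args[0]] = stack.pop()   # store the stack top into memory
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)      # result replaces the two operands
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return memory


memory = run_stack_program(
    ["PUSH b", "PUSH c", "ADD", "POP a"],
    {"a": 0, "b": 2, "c": 3},
)
print(memory["a"])  # 5
```

`ADD` itself names no operands, which is what makes it a 0-address instruction; only `PUSH` and `POP` carry an address.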

(b) Write short notes on: i) op-code ii) machine language iii) vector processor

## i) Op-code:

An op-code is the part of an instruction that specifies the operation the CPU must perform. It tells the processor what to do (e.g. ADD, SUB). The op-code is essential for defining the action of the instruction.

## ii) Machine language:

The lowest-level programming language, consisting only of binary code (0s and 1s). It is directly understood by the CPU without any translation.

## iii) Vector processor:

A vector processor is a type of CPU designed to perform operations on entire arrays of data with a single instruction. It uses vector registers and executes vector instructions.

## Final - 18-19

① (a) 20-21 (2(b))

(b) Moore's law

(c) Computer Architecture & Organization

### Architecture

- Architecture describes what the computer does.
- deals with functional behaviour.
- deals with high-level design issues.
- indicates the hardware.
- fixed first.

### Organization

- Organization describes how it does it.
- deals with structural relationships.
- deals with low-level design issues.
- indicates the hardware's performance.
- decided after the architecture.

20-21 5(b)

### (b). Mismatch between processor & Main Memory

Reasons for the mismatch:

- CPU speed increases every generation
- Memory speed increases very slowly
- Memory bottleneck: the CPU waits for memory!

Solutions to the processor-memory mismatch:

① Cache memory

② Increase memory bandwidth.

③ Pipelined memory / SDRAM (next request accepted before 1st complete)

④ Multiple-level caches (reduce average access time)

⑤ Pre-fetching (fetch before the request)

⑥ Write buffer & Victim cache (Hold data temporarily when CPU writes)

⑦ Virtual memory + page replacement optimization

⑧ Block transfer (DMA).

### (c). Fetch cycle vs instruction cycle

Instruction cycle = fetch + execution

fetch: the CPU fetches the next instruction from memory.

execute: the CPU decodes the fetched instruction and performs the required operation.

3(a): 20-21 2(c)

(b). traditional vs high performance bus  
(c). Asynchronous bus operations timing.



④ (a) Discuss about the memory hierarchy and typical cache organization.

Memory hierarchy: In computer system design, the memory hierarchy is a way of organizing memory so that the access time is minimized.



As one goes down the hierarchy, the following occur:

- Decreasing cost per bit
- Increasing capacity
- Increasing access time
- Decreasing frequency of access of the memory by the processor.

The smaller, more expensive, faster memories are supplemented by larger, cheaper, slower memories.

### Cache organization:



In this cache organization, the cache connects to the processor via data, control and address lines. The data and address lines also attach to data and address buffers, which attach to a system bus from which main memory is reached. When a cache hit occurs, the data and address buffers are disabled and communication is only between cache and processor. When a cache miss occurs, the desired address is loaded onto the system bus and the data are returned through the data buffer to both the cache and the processor.

## Elements of cache design:

- Cache address (logical, physical)
- Cache size
- Mapping function (direct, associative, set associative)
- Replacement algorithm (LRU, FIFO, LFU)
- Write policy (write through, write back)
- Line size
- Number of caches

⑥ mapping direct, associative, set associative

⑦ SDRAM

⑧ (a). Write notes on associative mapping function related to cache memory.

Associative mapping overcomes the disadvantage of direct mapping by permitting each main memory block to be loaded into any line of the cache.

In this mapping, the memory address is interpreted as a tag and a word field. The tag uniquely identifies a block of memory. To check for a hit, every line's tag is examined for a match, so cache searching gets expensive.



address length = $(S+w)$ bits

number of addressable units = $2^{S+w}$ words or bytes

block size = line size = $2^w$ words or bytes

number of blocks in main memory = $\frac{2^{S+w}}{2^w} = 2^S$

number of lines in cache = undetermined

size of tag = $S$ bits

## Address structures

Tag: 22 bits, Word: 2 bits

A 22-bit tag is stored with each 32-bit block of data.

- compare the tag field with the tag entries in the cache to check for a hit.
- the least significant 2 bits of the address identify which byte is required from the 32-bit (4-byte) data block.
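A short sketch of how an address is split for associative mapping, assuming a 24-bit address (22-bit tag plus 2-bit word field, as in the structure above). The function names and the sample address `0x16339C` are chosen for illustration only.

```python
WORD_BITS = 2   # w: selects the unit within the block
TAG_BITS = 22   # S: identifies the main-memory block


def split_address(address):
    """Split a 24-bit address into (tag, word) for associative mapping."""
    word = address & ((1 << WORD_BITS) - 1)
    tag = (address >> WORD_BITS) & ((1 << TAG_BITS) - 1)
    return tag, word


def lookup(cache_tags, address):
    """Compare the tag against every line; return the line index or None."""
    tag, _word = split_address(address)
    for index, line_tag in enumerate(cache_tags):
        if line_tag == tag:
            return index    # hit
    return None             # miss


tag, word = split_address(0x16339C)
print(hex(tag), word)  # 0x58ce7 0
```

The loop in `lookup` is sequential only because this is software; in hardware all tags are compared in parallel, which is where the cost of associative searching comes from.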

## ⑤ Hardware addition and subtraction



## ⑥ (a) Write back

Write-back is a cache write policy in which data is written to the cache only and main memory is updated later only when the block is replaced.

How write back works:

- ① When the processor writes data
  - only the cache block is updated.
  - A dirty bit is set to indicate the block has been modified.
- ② The modified block is written to main memory only when it is removed from the cache.
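The two steps can be sketched with a deliberately tiny model: a cache holding a single line in front of a dictionary acting as main memory. The class `WriteBackCache` and the one-line simplification are assumptions made for the example.

```python
class WriteBackCache:
    """One-line cache with a dirty bit (single line is a simplification)."""

    def __init__(self, memory):
        self.memory = memory      # simulated main memory: address -> value
        self.address = None       # address currently held in the line
        self.value = None
        self.dirty = False
        self.memory_writes = 0    # counts writes that reach main memory

    def _evict(self):
        # The modified block goes to main memory only on replacement.
        if self.address is not None and self.dirty:
            self.memory[self.address] = self.value
            self.memory_writes += 1
        self.dirty = False

    def write(self, address, value):
        if self.address != address:
            self._evict()
            self.address = address
        self.value = value        # only the cache is updated
        self.dirty = True         # mark the block as modified

    def read(self, address):
        if self.address != address:
            self._evict()
            self.address, self.value = address, self.memory[address]
        return self.value


ram = {0: 0, 1: 0}
cache = WriteBackCache(ram)
for v in (1, 2, 3):
    cache.write(0, v)               # three writes, all absorbed by the cache
print(ram[0], cache.memory_writes)  # 0 0 (main memory is stale)
cache.read(1)                       # replacing the line forces the write-back
print(ram[0], cache.memory_writes)  # 3 1
```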

### adv:

- lower memory traffic  $\rightarrow$  faster performance.
- multiple writes to the same location cause only one memory write at the time of block replacement.
- good for multiprocessor systems with write buffers.

### Disadv:

- requires maintaining a dirty bit
- main memory does not always hold the latest information; it stays out of date until cache replacement
- complicates cache coherence

### ⑥ (b) RAID 4

Parity definition:

The parity block on $x_4$ is the XOR of the data blocks.

$$P = x_0 \oplus x_1 \oplus x_2 \oplus x_3$$

updating parity on a small write to  $x_i$

$$P_{\text{new}} = P_{\text{old}} \oplus x_{i,\text{old}} \oplus x_{i,\text{new}}$$

① Read  $x_{i,\text{old}}$  (old data)

② Read  $P_{\text{old}}$  (old parity from  $x_4$ )

③ compute  $P_{\text{new}} = P_{\text{old}} \oplus x_{i,\text{old}} \oplus x_{i,\text{new}}$

④ Write  $x_{i,\text{new}}$  to disk  $x_i$

⑤ Write  $P_{\text{new}}$  to parity disk  $x_4$

This is the usual small-write procedure and causes the RAID-4 write penalty = 2 reads + 2 writes = 4 I/O operations for each small write.

Reconstructing data if a disk fails

If one disk fails, you can recover its block by XORing the remaining data blocks with the parity. For example, if $x_2$ fails:

$$x_2 = P \oplus x_0 \oplus x_1 \oplus x_3$$

More generally, for any failed data disk $x_i$:

$$x_i = P \oplus \bigoplus_{j \neq i} x_j$$


Summary: parity is the XOR of the data blocks; update parity using $P_{\text{new}} = P_{\text{old}} \oplus x_{i,\text{old}} \oplus x_{i,\text{new}}$; reconstruct missing data by XORing the parity with the other data blocks.
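The three parity rules can be checked with a few lines of code. This is a sketch in which each disk block is modelled as a small integer; the function names and the example values are assumptions.

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """Parity block = XOR of all data blocks."""
    return reduce(xor, blocks, 0)


def small_write(data, p_old, i, new_value):
    """Update block i and return the new parity (2 reads + 2 writes)."""
    x_old = data[i]                       # read old data
    p_new = p_old ^ x_old ^ new_value     # P_new = P_old xor x_old xor x_new
    data[i] = new_value                   # write new data
    return p_new                          # caller writes the new parity


def reconstruct(data, p, failed):
    """Recover the block of a failed disk from parity and the survivors."""
    survivors = [x for j, x in enumerate(data) if j != failed]
    return p ^ parity(survivors)


disks = [0b1010, 0b0110, 0b1111, 0b0001]     # x0..x3 (example values)
p = parity(disks)
p = small_write(disks, p, 2, 0b0000)
print(p == parity(disks))                    # True
print(reconstruct(disks, p, 1) == disks[1])  # True
```

The small write never touches the other data disks, which is why it costs four I/O operations regardless of how many disks are in the array.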

7(a). Distinguish between RAID 4 and RAID 5

| Comparison                 | RAID 4                                                | RAID 5                                      |
|----------------------------|-------------------------------------------------------|---------------------------------------------|
| Storage pattern and parity | Block-level striping<br>single dedicated parity disk  | Block-level striping<br>distributed parity  |
| Minimum no. of disks       | 3                                                     | 3                                           |
| Cost of setup              | Affordable                                            | Affordable                                  |
| Modern relevance           | Obsolete                                              | Not obsolete but rarely used                |
| Fault tolerance            | Decent                                                | Better                                      |
| Best application           | High data transfer tasks                              | High data transfer tasks                    |

(b). What is the purpose of using Addressing mode techniques in computer?

Addressing modes specify how the CPU locates and accesses an operand. They make instructions flexible, efficient and powerful.

Instruction format: opcode | operand

Main purposes:

- ① reduce instruction size
- ② increase programming flexibility
- ③ support complex data structures
- ④ enable efficient memory use
- ⑤ improve program readability
- ⑥ optimize execution speed

⑦ Classify computer instructions; explain logical and bit manipulation instructions.

① Data transfer instruction (MOV, LOAD, STORE, push, pop).

② Arithmetic instruction (ADD, SUB, MUL).

③ logical and bit manipulation instruction (AND, OR, XOR) (SHL, SHR, ROL, ROR).

④ Control transfer / Branch instr. (JMP, CALL).

⑤ input / output instr. (IN, OUT).

⑥ processor control instruction (WAIT, INT).

### ⑧ Interrupt-Driven I/O Technique:

Interrupt-driven I/O is a technique in which the CPU does not continuously wait for a device. Instead, the CPU continues executing other instructions, and the I/O device sends an interrupt signal when it is ready for data transfer.

#### How it works:

- CPU issues an I/O command to a device.
- CPU continues executing other instructions instead of waiting.
- When the device is ready it sends an interrupt signal.
- CPU stops its current task and jumps to the interrupt service routine (ISR).
- ISR transfers the required data between memory and the device.
- After completing the service, the CPU returns to the interrupted program.
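The sequence above can be traced with a very small simulation. It is a sketch under simplifying assumptions: the device becomes ready at a fixed step, the interrupt is checked after each instruction, and `run_cpu` with its parameters is a hypothetical helper.

```python
def run_cpu(program_steps, device_ready_at, device_data):
    """Simulate interrupt-driven I/O; returns (log, memory_buffer)."""
    log, buffer = [], []
    log.append("CPU: issue I/O command")
    for step in range(program_steps):
        log.append(f"CPU: execute instruction {step}")   # no busy waiting
        interrupt = (step == device_ready_at)             # device raises the interrupt
        if interrupt:
            log.append("CPU: save state, jump to ISR")
            buffer.append(device_data)                    # ISR moves the data
            log.append("CPU: restore state, resume program")
    return log, buffer


log, buffer = run_cpu(program_steps=3, device_ready_at=1, device_data="A")
print("\n".join(log))
```

The CPU keeps executing instructions 0 and 1 while the device is busy, and instruction 2 runs only after the ISR has returned.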

Adv:

- No busy waiting
- faster and more efficient than programmed I/O.
- Good for device with unpredictable time delay.

disadv:

- Requires interrupt handling hardware
- frequent interrupt can slow down the CPU
- ISR adds overhead.

a) Instruction : elements of instruction



opcode: this field specifies the operation to be performed by the CPU.

Operand: these fields contain the data or references to the data.

Addressing mode: this specifies how to interpret or locate the Operand, such as direct, indirect.

[An instruction tells the CPU what to do and which data to use]

Data processing: involves operations that modify data.

Data movement: involves operations that transfer data between memory, register, and I/O device.

Instruction cycle:



Fetch cycle: the CPU retrieves the instruction from memory using the PC.

Steps:

- the address in the PC is transferred to the MAR
- the Control Unit sends a read signal to memory
- the instruction read from memory arrives in the MDR and is copied to the IR

Decode: the control unit interprets the fetched instruction stored in the IR.

Execute: performs the actual operation determined during the decode stage.

More addresses: bigger instructions, fewer instructions needed.

Fewer addresses: smaller instructions, more instructions needed.


## Synchronous timing diagram



## Asynchronous Timing Diagram



## Moore's Law


Moore's Law states that the number of transistors on a silicon chip roughly doubles every two years, so the performance and capability of computers will continue to increase while the price of computers decreases. It is a prediction made by the American engineer Gordon Moore in 1965.
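As a rough worked example of the doubling rule (the symbols $N_0$ for the starting transistor count and $t$ for elapsed years are introduced here only for illustration):

$$N(t) = N_0 \cdot 2^{t/2}$$

Under this idealized model, after $t = 10$ years the count grows by a factor of $2^{10/2} = 2^5 = 32$.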

### Explained:

Moore's law was one of the best technological predictions of the last 50 years. Gordon E. Moore predicted in 1965 that the number of components on an integrated circuit would keep doubling at a regular interval. In 1975 he revisited the prediction, which had held up to then, and set the rate at a doubling about every two years. His postulation became known as Moore's law.



The majority of this growth in chip density is due to four primary factors: die size, line dimension, technical brilliance, and technology innovation.

According to Moore's observation, one of the major attractions of integrated electronics is low cost. This benefit grows as technology progresses, because a single semiconductor substrate can carry more complex circuit functions.

CPU: It fetches instructions and data from memory, decodes them, executes the required operations, and stores the results back in memory.

Instruction Register (IR): Responsible for temporarily holding the current instruction being executed. When the CPU fetches an instruction from memory, it is loaded into the IR.

Program Counter (PC): The primary function of the program counter is to keep track of the memory address of the next instruction to be fetched and executed by the CPU.

Discuss SDRAM: SDRAM which stands for Synchronous Dynamic Random access memory is a type of volatile computer memory that has been widely used in computing systems for several decades. It is commonly found in computers, servers and other electronic devices. SDRAM has played a crucial role in the advancement of computing technologies by providing fast and efficient memory access.

- Synchronous operation: SDRAM is synchronous, meaning it synchronizes its operation with the system bus clock. This synchronization allows for faster and more efficient data transfer between memory and the processor.
- Data organization: SDRAM stores data in individual cells, each consisting of a capacitor and a transistor. These cells are organized into rows and columns, forming an array structure.
- Speed and bandwidth: SDRAM offers faster access times and higher bandwidth compared to traditional DRAM.
- Types of SDRAM: There are several generations of SDRAM that have been developed, and some notable ones include SDR SDRAM, DDR SDRAM, DDR2, DDR3, DDR4, and the latest DDR5.
- Advantages and limitations: SDRAM offers several advantages, including faster data transfer rates, increased system performance, and compatibility with a wide range of computer architectures.

If the 8080 has a 16-bit address bus, how much memory space can it address?

It can address a maximum of 64 kilobytes of memory. Each memory address represents a unique location in the memory space, and with 16 bits there are $2^{16} = 65536$ different addresses, i.e. 65536 bytes or 64 kilobytes.
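The same arithmetic as a quick check, assuming one byte per address. The helper `addressable_bytes` is a name invented for this example.

```python
def addressable_bytes(address_bits, bytes_per_location=1):
    """Memory reachable with the given number of address lines."""
    return (2 ** address_bits) * bytes_per_location


print(addressable_bytes(16))          # 65536
print(addressable_bytes(16) // 1024)  # 64 (KB)
```

Each extra address line doubles the space, so a 20-bit bus would reach 1 MB under the same assumption.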

## Slide-2:

### Structure of von Neumann machine



### Structure of IAS - detail

#### central processing unit

