Stored Program Computer Architecture

Von Neumann Architecture

  1. Instruction fetch and decode
  2. fetch operands from memory
  3. perform operation
  4. write back
  5. next instruction
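The cycle above can be sketched as a tiny interpreter for a hypothetical stored-program machine (the 3-byte instruction format and the opcodes `OP_ADD`/`OP_HALT` are illustrative inventions, not a real ISA); code and data share one memory, which is the defining von Neumann property:

```c
#include <stdint.h>

/* Hypothetical machine: each instruction is 3 bytes
   (opcode, operand address a, operand address b). */
enum { OP_HALT = 0, OP_ADD = 1 };

void run(uint8_t memory[], uint32_t pc) {
    for (;;) {
        uint8_t op = memory[pc];               /* 1. fetch + decode */
        if (op == OP_HALT)
            break;
        uint8_t a = memory[pc + 1];            /* 2. fetch operands */
        uint8_t b = memory[pc + 2];
        if (op == OP_ADD)                      /* 3. perform operation */
            memory[a] = memory[a] + memory[b]; /* 4. write back */
        pc += 3;                               /* 5. next instruction */
    }
}
```

Running the program `{OP_ADD, 8, 9, OP_HALT, ...}` with the data 40 and 2 stored at addresses 8 and 9 leaves 42 at address 8.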

Instruction set architecture (ISA)

Consists of:

Types of operations

CISC

Instructions are:

RISC

Instructions are:

Explicitly Parallel Instruction Computing (EPIC)

Endianness

Big endian: MSB at lowest address e.g. MIPS, SPARC, PowerPC

Little endian: LSB at lowest address (MSB at highest) e.g. x86/x64
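The byte order of the machine a program runs on can be checked by inspecting the byte at the lowest address of a multi-byte integer (a minimal sketch; the function name is ours):

```c
#include <stdint.h>

/* Store 1 in a 32-bit integer and look at its first byte:
   on a little-endian machine the LSB (1) is at the lowest address. */
int is_little_endian(void) {
    uint32_t x = 1;
    return *(uint8_t *)&x == 1;
}
```

On x86/x64 this returns 1; on a big-endian machine such as classic PowerPC it returns 0.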

Microarchitecture

Pipelining

Instructions take multiple cycles and pass through various stages

Idea: overlap instruction execution => same latency per instruction => increased throughput

Operation Pipelining

Idea: subdivide complex operations (e.g. FP addition, multiplication) into simpler stages

Floating point sub-stages:

  1. Instruction decode
  2. Operand exponent alignment
  3. Actual operation
  4. Normalization

Wind-up/wind-down: time until the pipeline is full/empty again

Requires a large number of independent instructions

Cycle time is defined by the longest stage => pipeline stages need to take the same time => split stages further to achieve this

Instruction Pipelining

At least three stages:

  1. fetch
  2. decode
  3. execute

Speedup: \(S = \frac{T_{seq}}{T_{pipe}} = \frac{mN}{N + m - 1}\) for \(m\) stages and \(N\) instructions. As \(N \to \infty\), \(S \to m\)

Throughput: $$\frac{N}{T_{pipe}} = \frac{N}{N + m - 1}$$ instructions per cycle; approaches 1 instruction per cycle as \(N \to \infty\)
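The speedup formula can be evaluated numerically to see the wind-up effect (a small sketch assuming the model above: \(T_{seq} = mN\), \(T_{pipe} = N + m - 1\); the function name and the 5-stage example are ours):

```c
/* Pipeline speedup for m stages and N independent instructions:
   sequential time m*N cycles vs. pipelined time N + m - 1 cycles. */
double pipeline_speedup(double m, double N) {
    return (m * N) / (N + m - 1.0);
}
```

For an assumed m = 5: at N = 10 the speedup is only about 3.57 (wind-up/wind-down dominates), while at N = 10000 it is about 4.998, approaching m.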

Pipeline issues

Sequencing overhead

Hazards

CISC architecture:

C/C++ aliasing: compiler must assume pointers may overlap; mitigations: -fno-alias, -fargument-noalias compiler flags, restrict keyword
-> Fortran is often faster than C because arguments cannot alias
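The effect of `restrict` can be seen in a simple array addition (a sketch; the function names are ours):

```c
#include <stddef.h>

/* Without restrict, the compiler must assume a, b, c may overlap,
   so every store to a[i] can invalidate b and c; this inhibits
   pipelining and vectorization of the loop. */
void add_norestrict(double *a, const double *b, const double *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* restrict promises the arrays do not alias (the guarantee Fortran
   arguments have by default), so the loop can be optimized freely. */
void add_restrict(double * restrict a, const double * restrict b,
                  const double * restrict c, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}
```

Both functions compute the same result; the second merely gives the compiler more freedom.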

Pipeline Hazards
Pipeline optimization

Software pipelining

inlining might help

Superscalarity

Processor designed to execute multiple instructions per cycle
Additional hardware needed

A kind of ILP

SIMD

Can be realized through vector instructions or Superscalarity

SSE: 128-bit registers; AVX: 256-bit registers

Operations need to be independent

Vectorized code can be produced by compiler

compiler directives

#pragma vector always
#pragma novectorize
#pragma vector aligned
#pragma omp simd
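A typical use of `#pragma omp simd` is a loop whose iterations are independent, such as this SAXPY-style kernel (a sketch; the function name is ours; compile with e.g. `-fopenmp-simd` on GCC/Clang):

```c
#include <stddef.h>

/* Iterations are independent, so the compiler may pack several of them
   into one vector instruction (SSE/AVX); restrict rules out aliasing. */
void saxpy(float a, const float * restrict x, float * restrict y, size_t n) {
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

If the pragma is not recognized, the loop still compiles and runs as scalar code; the directive only grants permission to vectorize.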

Ways to utilize SIMD as a programmer:

Hierarchies

Caches

Memory bottleneck
spatial and temporal locality
-> Caches
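Spatial locality is largely a matter of access order. In C, 2D arrays are stored row-major, so the loop order decides whether each cache line is fully used (a sketch; the function name is ours):

```c
#include <stddef.h>

/* Row-major layout: a[i][j] and a[i][j+1] are adjacent in memory.
   Keeping j in the inner loop gives unit stride, so every double in a
   fetched cache line is used before it is evicted. Swapping the loops
   gives a stride of n doubles per access and far more cache misses. */
double sum_rowmajor(size_t n, double a[n][n]) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)  /* unit stride: cache friendly */
            s += a[i][j];
    return s;
}
```

The result is identical either way; only the number of cache misses, and hence the runtime, differs.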

Types of caches:

Speed capacity tradeoff

Levels of caches:
L1 usually separate data and instruction caches
L2, L3 unified and may be shared between cores

Prefetching

Cache mapping

Cache coherence