# CS 5220
## Distributed memory
### Networks and models
## 06 Oct 2015
### Basic questions
- How much does a message cost? (a simple cost model is sketched after this list)
    - *Latency*: time to get between processors
    - *Bandwidth*: data transferred per unit time
- How does *contention* affect communication?
- This is a combined hardware-software question!
- We want to understand just enough for reasonable modeling.
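A common first-cut cost model (not stated on the slide; $\alpha$ and $\beta$ are assumed notation): sending an $n$-byte message takes roughly

$$T(n) \approx \alpha + \beta n,$$

where $\alpha$ is the latency and $\beta$ is the inverse bandwidth (time per byte). With illustrative numbers such as $\alpha \approx 1\,\mu\text{s}$ and $\beta \approx 1\,\text{ns/byte}$ (not measured values), a 1 MB message costs about 1 ms: latency dominates short messages, bandwidth dominates long ones.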
### Thinking about interconnects
Several features characterize an interconnect:
- *Topology*: who do the wires connect?
- *Routing*: how do we get from A to B?
- *Switching*: circuits, store-and-forward?
- *Flow control*: how do we manage limited resources?
### Thinking about interconnects
- Links are like streets
- Switches are like intersections
- Hops are like blocks traveled
- Routing algorithm is like a travel plan
- Stop lights are like flow control
- Short packets are like cars, long ones like buses?
At some point the analogy breaks down...
### Bus topology
- One set of wires (the bus)
- Only one processor may use the bus at a time
- Contention for the bus is an issue
- Example: basic Ethernet, some SMPs
### Crossbar
- Dedicated path from every input to every output
- Takes $O(p^2)$ switches and wires!
### Bus vs. crossbar
- Crossbar: more hardware
- Bus: more contention (less capacity?)
- Generally seek happy medium
    - Less contention than bus
    - Less hardware than crossbar
    - May give up one-hop routing
### Network properties
Think about latency and bandwidth via two quantities:
- *Diameter*: max distance between nodes (see the sketch after this list)
- *Bisection bandwidth*: smallest bandwidth over all cuts that bisect the network
    - Particularly important for all-to-all communication
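A minimal sketch of the diameter definition in code (not from the slides; the 6-node ring adjacency matrix is just an assumed example input): run BFS from every node and take the largest hop count. Bisection bandwidth is much harder to compute for a general graph, so it is not attempted here.

```c
#include <stdio.h>

#define P 6  /* example: a 6-node ring */

/* Adjacency matrix for the 6-node ring (illustrative input). */
static const int adj[P][P] = {
    {0,1,0,0,0,1},
    {1,0,1,0,0,0},
    {0,1,0,1,0,0},
    {0,0,1,0,1,0},
    {0,0,0,1,0,1},
    {1,0,0,0,1,0},
};

/* BFS hop counts from src to every node. */
static void bfs(int src, int dist[P])
{
    int queue[P], head = 0, tail = 0;
    for (int i = 0; i < P; i++) dist[i] = -1;
    dist[src] = 0;
    queue[tail++] = src;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < P; v++)
            if (adj[u][v] && dist[v] < 0) {
                dist[v] = dist[u] + 1;
                queue[tail++] = v;
            }
    }
}

int main(void)
{
    int dist[P], diameter = 0;
    for (int s = 0; s < P; s++) {        /* max over all source nodes */
        bfs(s, dist);
        for (int v = 0; v < P; v++)
            if (dist[v] > diameter) diameter = dist[v];
    }
    printf("diameter = %d\n", diameter); /* 3 for this ring: matches p/2 */
    return 0;
}
```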
### Linear topology
- $p-1$ links
- Diameter $p-1$
- Bisection bandwidth $1$
### Ring topology
- $p$ links
- Diameter $p/2$
- Bisection bandwidth $2$
### Mesh
- May be more than two dimensions
- Route along each dimension in turn
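A small sketch of "route along each dimension in turn" (dimension-ordered routing) on an assumed $4 \times 4$ mesh; the coordinates and grid size are illustrative, not from the slide.

```c
#include <stdio.h>

#define NX 4  /* illustrative 4 x 4 mesh */
#define NY 4

/* Dimension-ordered routing: correct the x coordinate, then y.
 * The hop count on a mesh is the Manhattan distance |dx| + |dy|. */
static int route_xy(int sx, int sy, int dx, int dy)
{
    int hops = 0;
    while (sx != dx) { sx += (dx > sx) ? 1 : -1; hops++; }  /* dimension 0 */
    while (sy != dy) { sy += (dy > sy) ? 1 : -1; hops++; }  /* dimension 1 */
    return hops;
}

int main(void)
{
    /* Corner to corner is the worst case: diameter (NX-1)+(NY-1) = 6 here. */
    printf("hops from (0,0) to (%d,%d): %d\n", NX-1, NY-1,
           route_xy(0, 0, NX-1, NY-1));
    return 0;
}
```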
### Torus
- Torus : Mesh :: Ring : Linear
### Hypercube
- Label processors with binary numbers
- Connect $p_1$ to $p_2$ if labels differ in one bit
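A quick sketch of the labeling rule: the hop count between two hypercube nodes is the number of bits in which their labels differ, i.e. the popcount of the XOR, so the diameter is $\log_2 p$. `__builtin_popcount` is a GCC/Clang builtin, assumed here for brevity.

```c
#include <stdio.h>

/* Hop distance on a hypercube: count the bit positions where the labels
 * differ.  __builtin_popcount is a GCC/Clang builtin (assumed here). */
static int hypercube_hops(unsigned a, unsigned b)
{
    return __builtin_popcount(a ^ b);
}

int main(void)
{
    /* 3-bit labels give an 8-node hypercube: p = 8, diameter log2(p) = 3. */
    printf("hops(011 -> 001) = %d (neighbors)\n", hypercube_hops(3, 1));
    printf("hops(000 -> 111) = %d (diameter)\n",  hypercube_hops(0, 7));
    return 0;
}
```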
### Fat tree
- Processors at leaves
- Increase link bandwidth near root
### Others...
- Butterfly network
- Omega network
- Cayley graph
### Conventional wisdom
- Roughly constant latency (?)
    - Wormhole routing (or cut-through) flattens latencies vs store-and-forward at the hardware level
    - Software stack dominates HW latency!
    - Latencies *not* the same between networks (in box vs across)
    - May also have store-and-forward at the library level
- Avoid topology-specific optimization
    - Want code that runs on next year's machine, too!
    - Bundle topology awareness in vendor MPI libraries?
    - Sometimes specify a *software* topology (see the MPI sketch below)
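A minimal MPI sketch of specifying a *software* topology (the periodic 2D grid is an illustrative choice, not the course's setup): `MPI_Cart_create` builds a Cartesian communicator that a topology-aware MPI implementation may map onto the physical network, and `MPI_Cart_shift` reports neighbor ranks.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int p, dims[2] = {0, 0}, periods[2] = {1, 1};
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Dims_create(p, 2, dims);                  /* factor p into a 2D grid */

    MPI_Comm grid;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods,
                    1 /* allow rank reordering */, &grid);

    int rank, coords[2], left, right;
    MPI_Comm_rank(grid, &rank);
    MPI_Cart_coords(grid, rank, 2, coords);
    MPI_Cart_shift(grid, 0, 1, &left, &right);    /* neighbors in dimension 0 */
    printf("rank %d at (%d,%d): dim-0 neighbors %d, %d\n",
           rank, coords[0], coords[1], left, right);

    MPI_Comm_free(&grid);
    MPI_Finalize();
    return 0;
}
```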