# CS 5220
## Parallelism and locality in simulation
### Life
## 15 Sep 2015
### Discrete events
Basic setup:
- Finite set of variables, updated via transition function
- *Synchronous* case: finite state machine
- *Asynchronous* case: event-driven simulation
- Synchronous example: Game of Life
Nice starting point — no discretization concerns!
### Game of Life
Game of Life (John Conway):
- Live cell dies with < 2 live neighbors
- Live cell dies with > 3 live neighbors
- Live cell lives with 2–3 live neighbors
- Dead cell becomes live with exactly 3 live neighbors
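
The four rules above fit in a few lines. The following is a minimal sketch, assuming a finite board stored as a list of lists of 0/1 values, with every cell outside the board treated as dead; the name `life_step` is illustrative only.

```python
def life_step(grid):
    """Return the next generation of a 2D grid of 0/1 cells.

    Assumption: cells outside the grid are permanently dead.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Count live cells among the (up to) eight neighbors.
            n = sum(
                grid[i + di][j + dj]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
                and 0 <= i + di < rows
                and 0 <= j + dj < cols
            )
            if grid[i][j]:
                # Live cell survives with 2 or 3 live neighbors.
                new[i][j] = 1 if n in (2, 3) else 0
            else:
                # Dead cell becomes live with exactly 3 live neighbors.
                new[i][j] = 1 if n == 3 else 0
    return new


# A "blinker" oscillates between a horizontal and a vertical bar.
blinker = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
vertical = life_step(blinker)
```

Every cell reads only the previous generation, which is what makes this the synchronous (finite state machine) case.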
### Game of Life
Easy to parallelize by domain decomposition.
- Update work scales with the volume of each subdomain
- Communication per step scales with its surface (cyan in the slide figure)
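
To make the volume/surface split concrete, here is a rough sketch that splits a board into two subdomains, each padded with a layer of ghost cells, and checks the result against an undecomposed run. The two "processors" are simulated serially in one process; the helper names (`pad`, `exchange`, `step_padded`, `interior`) and the 8x8 board are assumptions made for illustration.

```python
def pad(block):
    """Surround a block with a one-cell border of dead ghost cells."""
    cols = len(block[0])
    return ([[0] * (cols + 2)]
            + [[0] + row + [0] for row in block]
            + [[0] * (cols + 2)])


def step_padded(p):
    """Advance the interior of a padded block; ghost cells are read, not updated."""
    out = [row[:] for row in p]
    for i in range(1, len(p) - 1):
        for j in range(1, len(p[0]) - 1):
            # Sum the 3x3 window, then remove the cell itself.
            n = sum(p[i + di][j + dj]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)) - p[i][j]
            out[i][j] = 1 if n == 3 or (p[i][j] and n == 2) else 0
    return out


def exchange(top, bottom):
    """Surface communication: copy each boundary row into the neighbor's ghost row."""
    top[-1] = bottom[1][:]
    bottom[0] = top[-2][:]


def interior(p):
    """Strip the ghost border."""
    return [row[1:-1] for row in p[1:-1]]


# A glider on an 8x8 board, split into two 4x8 subdomains.
board = [[0] * 8 for _ in range(8)]
for i, j in [(1, 2), (2, 3), (3, 1), (3, 2), (3, 3)]:
    board[i][j] = 1

top, bottom = pad(board[:4]), pad(board[4:])
whole = pad(board)
for _ in range(8):
    exchange(top, bottom)             # surface: 8 cells each way per step
    top = step_padded(top)            # volume: 32 cell updates per block
    bottom = step_padded(bottom)
    whole = step_padded(whole)

matches = interior(top) + interior(bottom) == interior(whole)
```

Doubling the block edge quadruples the update work but only doubles the data exchanged, which is why larger subdomains give a better computation-to-communication ratio.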
### Game of Life: Pioneers and Settlers
What if the pattern is “dilute”?
- Few or no live cells at surface at each step
- Think of live cell at a surface as an “event”
- Only communicate events!
- This is *asynchronous*
- Harder with message passing — when do you receive?
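
One way to picture "only communicate events" is to send the positions of live boundary cells rather than the whole boundary. The sketch below assumes each event is a `(step, column)` pair, so that the receiver can tell which generation it belongs to; this encoding and the function names are illustrative, not taken from the lecture.

```python
def boundary_events(step, row):
    """List (step, column) events for the live cells in a boundary row."""
    return [(step, j) for j, cell in enumerate(row) if cell]


def apply_events(events, width):
    """Rebuild a ghost row from the events received for one step."""
    ghost = [0] * width
    for _, j in events:
        ghost[j] = 1
    return ghost


dilute_row = [0, 0, 0, 0, 0, 1, 0, 0]
events = boundary_events(7, dilute_row)      # one event instead of eight cells
quiet = boundary_events(8, [0] * 8)          # nothing to send at all
```

The empty list is the hard case: a receiver that sees no message cannot tell "no events at this step" from "the sender has not reached this step yet".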
### Asynchronous Game of Life
How do we manage events?
- Could be *speculative*: assume no communication across the
  boundary for many steps, back up if needed
- Or *conservative*: wait whenever communication is possible
  - possible $\not\equiv$ guaranteed!
  - Deadlock: everyone waits for everyone else to send
  - Can get around this with NULL messages
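
The deadlock and its NULL-message fix can be seen in a toy model that tracks only clocks, not cells. This is a sketch under strong assumptions: two blocks, a dilute pattern that never touches the shared boundary, and a round-based loop standing in for real message passing. `Block` and `run` are hypothetical names.

```python
from collections import deque


class Block:
    """Hypothetical subdomain that tracks only clocks, not cell data."""

    def __init__(self):
        self.time = 0             # generations completed locally
        self.neighbor_time = -1   # latest generation whose neighbor boundary is known
        self.inbox = deque()

    def can_step(self):
        # Conservative rule: advance only once the neighbor's boundary
        # at the current generation is known.
        return self.neighbor_time >= self.time


def run(send_null, max_rounds=10):
    a, b = Block(), Block()
    for _ in range(max_rounds):
        for src, dst in ((a, b), (b, a)):
            events = []  # dilute pattern: no live cells on the boundary
            # A NULL message is a timestamp with an empty event list.
            if events or send_null:
                dst.inbox.append((src.time, events))
        progressed = False
        for blk in (a, b):
            while blk.inbox:
                t, _ = blk.inbox.popleft()
                blk.neighbor_time = max(blk.neighbor_time, t)
            if blk.can_step():
                blk.time += 1
                progressed = True
        if not progressed:
            return ("deadlock", a.time, b.time)
    return ("ok", a.time, b.time)
```

With `send_null=False` neither block ever learns that the other's boundary is quiet, so both wait forever; with `send_null=True` each block pays for one small message per step in exchange for guaranteed progress.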
How do we manage load balance?
- No need to simulate quiescent parts of the game!
- Maybe dynamically assign smaller blocks to processors?
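
A rough sketch of the load-balancing idea: a cell can change only if it lies within one cell of a live cell, so only blocks containing such cells need to be simulated. The code assumes live cells are stored as a set of `(row, column)` pairs and uses a round-robin deal as a stand-in for a real dynamic scheduler; `active_blocks` and `assign` are hypothetical names.

```python
def active_blocks(live, block):
    """Block indices containing a live cell or a neighbor of one.

    Assumption: the caller discards indices that fall off the board.
    """
    active = set()
    for i, j in live:
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                active.add(((i + di) // block, (j + dj) // block))
    return active


def assign(blocks, nworkers):
    """Deal active blocks out to workers round-robin."""
    work = [[] for _ in range(nworkers)]
    for k, b in enumerate(sorted(blocks)):
        work[k % nworkers].append(b)
    return work


# One glider on a 64x64 board cut into 4x4 blocks: 256 blocks in total.
glider = {(1, 2), (2, 3), (3, 1), (3, 2), (3, 3)}
active = active_blocks(glider, 4)
schedule = assign(active, 2)
```

Here only 4 of the 256 blocks need any work, and recomputing the active set each step lets the assignment follow the pattern as it moves.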
### High-Performance Game of Life
- Lots of implementations use fancy bit representations
- [Ch 17](http://downloads.gamedev.net/pdf/gpbb/gpbb17.pdf) and
[Ch 18](http://downloads.gamedev.net/pdf/gpbb/gpbb18.pdf) of
Abrash's _Graphics Programming Black Book_ have an old but still
illuminating discussion of low-level (serial) optimizations
- [HashLife](https://en.wikipedia.org/wiki/Hashlife) is a triumph
of algorithm design.
### High-Performance Game of Life
How would I tackle this? Assuming matrix version, I might:
- Build a bit-packed representation
- Use a fast vectorized kernel to update small blocks
- Coarse blocking with generation skipping
- Dynamic scheduling of coarse block updates
What would you do?
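
As one possible starting point for the first two items in the list above, here is a sketch of a bit-packed update. It assumes each row is stored as a Python integer with bit `j` holding column `j`, and it counts neighbors for a whole row at once with bit-sliced ripple-carry adders. Python integers stand in for machine words or vector registers; `pack` and `step_bits` are illustrative names.

```python
def pack(row):
    """Pack a list of 0/1 cells into an integer, column j in bit j."""
    return sum(cell << j for j, cell in enumerate(row))


def step_bits(rows, width):
    """Advance bit-packed rows one generation; cells off the board are dead."""
    mask = (1 << width) - 1
    padded = [0] + rows + [0]
    out = []
    for i in range(1, len(padded) - 1):
        above, here, below = padded[i - 1], padded[i], padded[i + 1]
        # The eight neighbor bitboards, shifted so they line up with `here`.
        neighbors = [
            (above << 1) & mask, above, above >> 1,
            (here << 1) & mask, here >> 1,
            (below << 1) & mask, below, below >> 1,
        ]
        # Bit-sliced counters: planes[k] holds bit k of every column's count.
        planes = [0, 0, 0, 0]
        for x in neighbors:
            for k in range(4):
                # Half adder: sum stays in the plane, carry moves up.
                planes[k], x = planes[k] ^ x, planes[k] & x
        ones, twos, fours, eights = planes
        # Alive next step iff count == 3, or count == 2 and alive now.
        out.append(twos & ~fours & ~eights & (ones | here))
    return out


blinker = [pack(r) for r in ([0, 0, 0], [1, 1, 1], [0, 0, 0])]
flipped = step_bits(blinker, 3)
```

The inner loop contains no per-cell branches, only shifts and bitwise logic applied to a whole row, which is the property a vectorized kernel would exploit.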