# CS 5220
## Shared memory
### Basic concepts
## 22 Sep 2015
### Overview
- [Basic concepts (these slides)](/slides/2015-09-22-shared.html)
- [Monte Carlo example](/slides/2015-09-22-mc.html)
- [Pthreads programming](/slides/2015-09-22-pthreads.html)
- [OpenMP programming](/slides/2015-09-24-openmp.html)
- [Memory models and hardware implications](/slides/2015-09-24-memory.html)
### Parallel programming model
- Control
- How is parallelism created?
- What ordering is there between operations?
- Data
- What data is private or shared?
- How is data logically shared or communicated?
- Synchronization
- What operations are used to coordinate?
- What operations are atomic?
- Cost: how do we reason about each of the above?
### Shared memory programming model
Program consists of *threads* of control.
- Can be created dynamically
- Each has private variables (e.g. locals on the stack)
- Each has shared variables (e.g. data on the heap)
- Communication through shared variables
- Coordinate by synchronizing on variables
- Examples: OpenMP, pthreads, Cilk, Java threads
### Thread birth and death
![](/img/shared/forkjoin.svg)
A thread is created by forking; when it is done, it joins the original thread.
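As a minimal sketch of this fork/join pattern with pthreads (the `worker` function and the thread count are invented for illustration):

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4            /* thread count chosen for illustration */

/* Hypothetical worker: each thread just announces itself */
void* worker(void* arg)
{
    long id = (long) arg;
    printf("Hello from thread %ld\n", id);
    return NULL;
}

int main(void)
{
    pthread_t threads[NTHREADS];
    for (long i = 0; i < NTHREADS; ++i)          /* fork */
        pthread_create(&threads[i], NULL, worker, (void*) i);
    for (int i = 0; i < NTHREADS; ++i)           /* join */
        pthread_join(threads[i], NULL);
    return 0;
}
```

Compile with `-pthread`; the joins ensure every worker finishes before `main` returns.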
### Mechanisms for thread birth/death
- Statically allocate threads at start
- Fork/join (pthreads)
- Fork detached threads (pthreads)
- Cobegin/coend (OpenMP?)
- Like fork/join, but lexically scoped
- Futures
- `v = future(somefun(x))`
  - Attempts to use `v` wait for the evaluation to finish
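C has no built-in futures, but the idea can be sketched on top of pthreads: starting the future forks a thread, and the first use joins it. All names below (`future_t`, `future_start`, `future_get`, `somefun`) are hypothetical.

```c
#include <pthread.h>

/* Hypothetical one-shot future for a double-valued computation. */
typedef struct {
    pthread_t thread;
    double    x, result;
} future_t;

static double somefun(double x) { return x*x; }   /* stand-in computation */

static void* future_body(void* arg)
{
    future_t* f = (future_t*) arg;
    f->result = somefun(f->x);
    return NULL;
}

/* v = future(somefun(x)): forks a thread to evaluate somefun(x) */
void future_start(future_t* f, double x)
{
    f->x = x;
    pthread_create(&f->thread, NULL, future_body, f);
}

/* First use of the value blocks until evaluation finishes (join once). */
double future_get(future_t* f)
{
    pthread_join(f->thread, NULL);
    return f->result;
}
```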
### Mechanisms for synchronization
- Locks/mutexes (enforce mutual exclusion)
- Monitors (like locks with lexical scoping)
- Barriers
- Condition variables (notification)
### Mutex
![](/img/shared/mutex.svg)
Allow only one thread at a time into the critical section (red in the figure).
Synchronize via locks, a.k.a. mutexes (mutual exclusion variables).
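A sketch of the idiom with a pthreads mutex; the shared `counter` and the `increment` worker are invented for illustration:

```c
#include <pthread.h>

long counter = 0;                                  /* shared variable */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void* increment(void* arg)
{
    for (int i = 0; i < 100000; ++i) {
        pthread_mutex_lock(&lock);      /* enter critical section */
        ++counter;                      /* at most one thread here at a time */
        pthread_mutex_unlock(&lock);    /* leave critical section */
    }
    return NULL;
}
```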
### Condition variables
![](/img/shared/condvar.svg)
Thread waits until condition holds (e.g. work available).
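The usual pthreads idiom, sketched below; the `work_ready` flag stands in for whatever the real condition is:

```c
#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
int work_ready = 0;                  /* hypothetical shared condition */

void wait_for_work(void)
{
    pthread_mutex_lock(&lock);
    while (!work_ready)              /* re-check: wakeups can be spurious */
        pthread_cond_wait(&cond, &lock);
    /* ... consume the work while holding the lock ... */
    work_ready = 0;
    pthread_mutex_unlock(&lock);
}

void post_work(void)
{
    pthread_mutex_lock(&lock);
    work_ready = 1;
    pthread_cond_signal(&cond);      /* notify one waiting thread */
    pthread_mutex_unlock(&lock);
}
```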
### Barriers
![](/img/shared/barrier.svg)
Computation phases separated by barriers.
Everyone reaches the barrier, then proceeds.
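A sketch with the POSIX barrier API (optional in POSIX, so not available on every platform); the phase functions are placeholders:

```c
#include <pthread.h>

pthread_barrier_t barrier;  /* init once: pthread_barrier_init(&barrier, NULL, nthreads) */

void compute_phase1(void) { /* placeholder: per-thread phase 1 work */ }
void compute_phase2(void) { /* placeholder: phase 2 work that reads phase 1 results */ }

void* phased_worker(void* arg)
{
    compute_phase1();
    pthread_barrier_wait(&barrier);   /* nobody starts phase 2 until all finish phase 1 */
    compute_phase2();
    return NULL;
}
```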
### Synchronization pitfalls
- Incorrect synchronization $\implies$ *deadlock*
- All threads waiting for what the others have
- Doesn’t always happen! $\implies$ hard to debug
- Too little synchronization $\implies$ data races
- Again, doesn’t always happen!
- Too much synchronization $\implies$ poor performance
- ... but makes it easier to think through correctness
### Deadlock

    // Thread 0:
    lock(l1); lock(l2);
    do_something();
    unlock(l2); unlock(l1);

    // Thread 1:
    lock(l2); lock(l1);
    do_something();
    unlock(l1); unlock(l2);

If thread 0 grabs `l1` while thread 1 grabs `l2`, each waits forever for the lock the other holds.
Conditions:
- Mutual exclusion
- Hold and wait
- No preemption
- Circular wait
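Breaking any one of these conditions prevents deadlock. A common fix breaks circular wait by agreeing on a global lock order, sketched here with pthreads mutexes (the thread bodies are illustrative):

```c
#include <pthread.h>

pthread_mutex_t l1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t l2 = PTHREAD_MUTEX_INITIALIZER;

/* Both threads acquire l1 before l2, so no circular wait can form. */
void* thread0(void* arg)
{
    pthread_mutex_lock(&l1);
    pthread_mutex_lock(&l2);
    /* ... do_something ... */
    pthread_mutex_unlock(&l2);
    pthread_mutex_unlock(&l1);
    return NULL;
}

void* thread1(void* arg)
{
    pthread_mutex_lock(&l1);           /* same order as thread 0 */
    pthread_mutex_lock(&l2);
    /* ... do_something ... */
    pthread_mutex_unlock(&l2);
    pthread_mutex_unlock(&l1);
    return NULL;
}
```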
### Race to the dot
Consider `S += partial_sum` on 2 CPUs:
- P1: Load `S`
- P1: Add `partial_sum`
- P2: Load `S`
- P1: Store new `S`
- P2: Add `partial_sum`
- P2: Store new `S`
P2 loaded `S` before P1 stored, so P2's final store overwrites P1's update: one partial sum is lost.
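Sketched with pthreads, the racy update and the mutex fix side by side (the `accumulate` worker is invented for illustration):

```c
#include <pthread.h>

double S = 0.0;                                  /* shared accumulator */
pthread_mutex_t S_lock = PTHREAD_MUTEX_INITIALIZER;

void* accumulate(void* arg)
{
    double partial_sum = *(double*) arg;         /* this thread's contribution */

    /* Racy version: S += partial_sum;
       the load/add/store can interleave with another thread's, losing an update. */

    /* Safe version: make the read-modify-write atomic via a lock. */
    pthread_mutex_lock(&S_lock);
    S += partial_sum;
    pthread_mutex_unlock(&S_lock);
    return NULL;
}
```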
### Shared memory
- Multi-threaded programming looks easy
- ... except for synchronization
- Too little kills correctness
- Too much kills performance
- Easy to get wrong either way
- Next up: [a Monte Carlo example](/slides/2015-09-22-mc.html)