# CS 5220

## Shared memory

### Basic concepts

## 22 Sep 2015
### Overview

- [Basic concepts (these slides)](/slides/2015-09-22-shared.html)
- [Monte Carlo example](/slides/2015-09-22-mc.html)
- [Pthreads programming](/slides/2015-09-22-pthreads.html)
- [OpenMP programming](/slides/2015-09-24-openmp.html)
- [Memory models and hardware implications](/slides/2015-09-24-memory.html)
### Parallel programming model

- Control
  - How is parallelism created?
  - What ordering is there between operations?
- Data
  - What data is private or shared?
  - How is data logically shared or communicated?
- Synchronization
  - What operations are used to coordinate?
  - What operations are atomic?
- Cost: how do we reason about each of the above?
### Shared memory programming model

Program consists of *threads* of control.

- Can be created dynamically
- Each has private variables (e.g. local)
- Each has shared variables (e.g. heap)
- Communication through shared variables (see the sketch below)
- Coordinate by synchronizing on variables
- Examples: OpenMP, pthreads, Cilk, Java threads
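A minimal sketch of the private/shared split in C with pthreads; the variable and function names here are invented for illustration:

```c
#include <pthread.h>

int shared_count = 0;             /* global: shared, visible to every thread */

void* thread_body(void* arg)      /* hypothetical thread routine */
{
    (void) arg;
    int local_count = 0;          /* stack variable: private to this thread */
    ++local_count;                /* always safe: no other thread sees it */
    ++shared_count;               /* shared update: a data race without synchronization */
    return NULL;
}
```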

### Thread birth and death

Thread is created by forking.
When done, join original thread.

### Mechanisms for thread birth/death

- Statically allocate threads at start
- Fork/join (pthreads; sketched below)
- Fork detached threads (pthreads)
- Cobegin/coend (OpenMP?)
  - Like fork/join, but lexically scoped
- Futures
  - `v = future(somefun(x))`
  - Attempts to use `v` wait on evaluation
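A minimal fork/join sketch with pthreads, assuming a made-up `worker` routine and a fixed thread count:

```c
#include <pthread.h>
#include <stdio.h>

void* worker(void* arg)                      /* hypothetical per-thread work */
{
    printf("hello from thread %ld\n", (long) arg);
    return NULL;
}

int main(void)
{
    pthread_t threads[4];
    for (long i = 0; i < 4; ++i)             /* fork: spawn child threads */
        pthread_create(&threads[i], NULL, worker, (void*) i);
    for (int i = 0; i < 4; ++i)              /* join: wait for all to finish */
        pthread_join(threads[i], NULL);
    return 0;
}
```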
### Mechanisms for synchronization

- Locks/mutexes (enforce mutual exclusion)
- Monitors (like locks with lexical scoping)
- Barriers
- Condition variables (notification)

### Mutex

Allow only one process at a time in the critical section.
Synchronize via locks, aka mutexes (mutual exclusion vars).
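For example, a pthreads mutex guarding a shared counter might look like this; the counter and lock names are illustrative, not from the slides:

```c
#include <pthread.h>

static long counter = 0;                                  /* shared state */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void increment(void)
{
    pthread_mutex_lock(&lock);    /* enter critical section: one thread at a time */
    ++counter;
    pthread_mutex_unlock(&lock);  /* leave critical section */
}
```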

### Condition variables

Thread waits until condition holds (e.g. work available).
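A rough sketch of the usual wait/notify pattern with a pthreads condition variable; the `have_work` flag is an assumed example, not from the slides:

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  work_ready = PTHREAD_COND_INITIALIZER;
static int have_work = 0;                    /* predicate guarded by the lock */

void wait_for_work(void)
{
    pthread_mutex_lock(&lock);
    while (!have_work)                       /* loop: wakeups can be spurious */
        pthread_cond_wait(&work_ready, &lock);
    have_work = 0;                           /* consume the work */
    pthread_mutex_unlock(&lock);
}

void post_work(void)
{
    pthread_mutex_lock(&lock);
    have_work = 1;
    pthread_cond_signal(&work_ready);        /* notify one waiting thread */
    pthread_mutex_unlock(&lock);
}
```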

### Barriers

Computation phases separated by barriers.
Everyone reaches the barrier, then proceeds.
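A small phase-structured sketch using a POSIX barrier; the thread count and phase bodies are placeholders:

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
static pthread_barrier_t phase_barrier;

void* worker(void* arg)
{
    long id = (long) arg;
    printf("thread %ld: phase 1\n", id);     /* ... phase 1 computation ... */
    pthread_barrier_wait(&phase_barrier);    /* nobody starts phase 2 early */
    printf("thread %ld: phase 2\n", id);     /* ... phase 2 computation ... */
    return NULL;
}

int main(void)
{
    pthread_t threads[NTHREADS];
    pthread_barrier_init(&phase_barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; ++i)
        pthread_create(&threads[i], NULL, worker, (void*) i);
    for (int i = 0; i < NTHREADS; ++i)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&phase_barrier);
    return 0;
}
```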

### Synchronization pitfalls

- Incorrect synchronization $\implies$ *deadlock*
  - All threads waiting for what the others have
  - Doesn’t always happen! $\implies$ hard to debug
- Too little synchronization $\implies$ data races
  - Again, doesn’t always happen!
- Too much synchronization $\implies$ poor performance
  - ... but makes it easier to think through correctness

### Deadlock


```
// Thread 0:
lock(l1); lock(l2);
do_something();
unlock(l2); unlock(l1);

// Thread 1:
lock(l2); lock(l1);
do_something();
unlock(l1); unlock(l2);
```

Conditions:

  1. Mutual exclusion
  2. Hold and wait
  3. No preemption
  4. Circular wait
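One standard remedy, not shown on the slide, is to break the circular wait (condition 4) by having every thread acquire the locks in the same global order:

```
// Both threads: always acquire l1 before l2, so no cycle of waits can form.
lock(l1); lock(l2);
do_something();
unlock(l2); unlock(l1);
```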
### Race to the dot

Consider `S += partial_sum` on 2 CPUs:

- P1: Load `S`
- P1: Add `partial_sum`
- P2: Load `S`
- P1: Store new `S`
- P2: Add `partial_sum`
- P2: Store new `S`

P2's stale load overwrites P1's result, so one partial sum is lost (a fix is sketched below).
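One way to repair this, sketched here with an added pthreads mutex; the lock is an assumption, not part of the slide's code:

```c
#include <pthread.h>

static double S = 0.0;                        /* shared accumulator */
static pthread_mutex_t S_lock = PTHREAD_MUTEX_INITIALIZER;

void add_partial(double partial_sum)
{
    pthread_mutex_lock(&S_lock);              /* load, add, store happen without interleaving */
    S += partial_sum;
    pthread_mutex_unlock(&S_lock);
}
```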
### Shared memory

- Multi-threaded programming looks easy
- ... except for synchronization
  - Too little kills correctness
  - Too much kills performance
  - Easy to get wrong either way
- Next up: [a Monte Carlo example](/slides/2015-09-22-mc.html)