# CS 5220

## Shared memory

### Memory issues

## 24 Sep 2015
### Memory model

- Single processor: return last write
  - What about DMA and memory-mapped I/O?
- Simplest generalization: *sequential consistency* – as if
  - Each process runs in program order
  - Instructions from different processes are interleaved
  - Interleaved instructions ran on one processor (see the litmus test below)
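
The slides define sequential consistency by interleaving; the classic litmus test below (not from the slides; a pthreads sketch with deliberately racy plain `int` variables and illustrative names) makes it concrete. Under sequential consistency the outcome `r1 == 0 && r2 == 0` is impossible, since whichever write comes first in the interleaving must be seen by the later read; relaxed hardware and optimizing compilers can and do produce it.

```c
/* Litmus test: under sequential consistency, at least one thread must
 * see the other's write, so (r1, r2) == (0, 0) is impossible.  The
 * races here are deliberate; real hardware may show (0, 0). */
#include <pthread.h>
#include <stdio.h>

int x = 0, y = 0;
int r1, r2;

void* t1(void* arg) { x = 1; r1 = y; return NULL; }
void* t2(void* arg) { y = 1; r2 = x; return NULL; }

int main(void)
{
    pthread_t p1, p2;
    pthread_create(&p1, NULL, t1, NULL);
    pthread_create(&p2, NULL, t2, NULL);
    pthread_join(p1, NULL);
    pthread_join(p2, NULL);
    printf("r1 = %d, r2 = %d\n", r1, r2);  /* (0, 0) would violate SC */
    return 0;
}
```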

### Sequential consistency

> A multiprocessor is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.
>
> – Lamport, 1979
### Example: Spin lock

Initially, `flag = 0` and `sum = 0`

```c
/* Processor 1 */
sum += p1;
flag = 1;

/* Processor 2 */
while (!flag);
sum += p2;
```

Without sequential consistency support, what if

1. Processor 2 caches `flag`?
2. Compiler optimizes away the loop?
3. Compiler reorders assignments on P1?

Starts to look restrictive!
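
The slide leaves the fix implicit; one standard remedy (a sketch assuming a C11 compiler, with illustrative function names `produce`/`consume`) is to make `flag` an atomic and use release/acquire ordering. The release store cannot be moved above the update to `sum`, and the acquire load cannot be optimized out of the loop, which addresses all three questions above.

```c
#include <stdatomic.h>

double sum = 0;
atomic_int flag = 0;

/* Processor 1: publish sum, then signal */
void produce(double p1)
{
    sum += p1;
    atomic_store_explicit(&flag, 1, memory_order_release);
}

/* Processor 2: wait for the signal, then update sum */
void consume(double p2)
{
    while (!atomic_load_explicit(&flag, memory_order_acquire))
        ;  /* spin; the load cannot be hoisted out of the loop */
    sum += p2;
}
```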
### Sequential consistency: the good, the bad, the ugly

- Program behavior is “intuitive”:
  - Nobody sees garbage values
  - Time always moves forward
- One issue is *cache coherence*:
  - Coherence: different copies, same value
  - Requires (nontrivial) hardware support
- Also an issue for optimizing compiler!
- There are cheaper *relaxed* consistency models.
### Snoopy bus protocol

- Basic idea:
  - Broadcast operations on memory bus
  - Cache controllers “snoop” on all bus transactions
  - Memory writes induce serial order
  - Act to enforce coherence (invalidate, update, etc; see the sketch below)
- Problems:
  - Bus bandwidth limits scaling
  - Contending writes are slow
- There are other protocol options (e.g. directory-based).
- But usually give up on *full* sequential consistency.
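
The slide names invalidation without spelling out a protocol; the toy MSI-style state machine below is my own sketch (states and function names are illustrative, data movement omitted) of what a snooping controller enforces for a single cache line: a write by one processor drives every other copy of that line to invalid.

```c
/* Toy MSI-style state machine for ONE cache line in ONE cache,
 * keeping only the invalidation logic a snooping controller applies. */
#include <stdio.h>

typedef enum { INVALID, SHARED, MODIFIED } line_state_t;

/* Local write: must gain exclusive ownership, so broadcast an
 * invalidation on the bus unless we already own the line. */
line_state_t local_write(line_state_t s)
{
    if (s != MODIFIED)
        printf("broadcast invalidate on bus\n");  /* others drop copies */
    return MODIFIED;
}

/* Local read: a miss fetches a shared copy; otherwise keep state. */
line_state_t local_read(line_state_t s)
{
    return (s == INVALID) ? SHARED : s;
}

/* Snooped a write by ANOTHER processor to this line: our copy dies. */
line_state_t snoop_remote_write(line_state_t s)
{
    if (s == MODIFIED)
        printf("write back dirty data before invalidating\n");
    return INVALID;
}
```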
### Weakening sequential consistency

Try to reduce to the *true* cost of sharing

- `volatile` tells compiler when to worry about sharing
- Memory fences tell when to force consistency
- Synchronization primitives (lock/unlock) include fences (see the sketch below)
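
As an illustration of the last bullet, here is a lock-based version of the spin-lock example (a sketch using pthreads; the function names are mine). The lock and condition-variable calls already contain the needed fences, so no `volatile` or explicit atomics are required for `sum` and `flag` inside the critical sections.

```c
#include <pthread.h>

double sum = 0;
int flag = 0;
pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;

/* Processor 1: update sum, then signal under the lock */
void produce(double p1)
{
    pthread_mutex_lock(&lock);
    sum += p1;
    flag = 1;
    pthread_cond_signal(&ready);
    pthread_mutex_unlock(&lock);
}

/* Processor 2: sleep (rather than spin) until the signal arrives */
void consume(double p2)
{
    pthread_mutex_lock(&lock);
    while (!flag)
        pthread_cond_wait(&ready, &lock);
    sum += p2;
    pthread_mutex_unlock(&lock);
}
```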
### Sharing

True sharing:

- Frequent writes cause a bottleneck.
- Idea: make independent copies (if possible).
- Example problem: `malloc`/`free` data structure.

False sharing:

- Distinct variables on same cache block
- Idea: make processor memory contiguous (if possible)
- Example problem: array of ints, one per processor (see the sketch below)
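
Here is a minimal illustration of the false-sharing example from the last bullet (the 64-byte line size, `NTHREADS`, and names are assumptions): packing one counter per thread into a plain array puts several counters on one cache line, so writes that are logically independent still ping-pong the line between cores; padding each counter out to a full line removes the contention.

```c
#define CACHE_LINE 64
#define NTHREADS   8

/* Bad: counts_bad[0..7] likely share one or two cache lines. */
long counts_bad[NTHREADS];

/* Better: one counter per cache line. */
typedef struct {
    long count;
    char pad[CACHE_LINE - sizeof(long)];
} padded_count_t;

padded_count_t counts_good[NTHREADS];

/* Each thread updates only its own slot... */
void tally(long* slot, long niters)
{
    for (long i = 0; i < niters; ++i)
        *slot += 1;  /* no logical sharing, but maybe physical sharing */
}
```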
### Take-home message

- Sequentially consistent shared memory is a useful idea...
  - “Natural” analogue to serial case
  - Architects work hard to support it
- ... but implementation is costly!
  - Makes life hard for optimizing compilers
  - Coherence traffic slows things down
- Helps to limit sharing

Have to think about these things to get good performance.