# CS 5220
## Shared memory
### Monte Carlo
## 22 Sep 2015

### Monte Carlo

Basic idea: Express answer $a$ as $$a = E[f(X)]$$ for some random variable(s) $X$.

Typical toy example: $$\pi/4 = E[\chi_{[0,1]}(X^2 + Y^2)] \mbox{ where } X, Y \sim U(-1,1).$$ A quick sketch of this estimator appears below; we'll do something slightly more interesting after that...
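A minimal serial sketch of the $\pi/4$ estimator in C (the trial count and seed here are arbitrary choices for illustration):

```c
#include <stdio.h>
#include <stdlib.h>

/* Estimate pi/4 = P(X^2 + Y^2 <= 1) for X, Y ~ U(-1,1). */
int main(void)
{
    long n = 10000000, hits = 0;
    srand(12345);                         /* arbitrary seed */
    for (long i = 0; i < n; ++i) {
        double x = 2.0*rand()/(RAND_MAX + 1.0) - 1.0;
        double y = 2.0*rand()/(RAND_MAX + 1.0) - 1.0;
        if (x*x + y*y <= 1.0)
            ++hits;
    }
    printf("pi/4 estimate: %g\n", (double) hits / n);
    return 0;
}
```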

### A toy problem

Given ten points $(X_i, Y_i)$ drawn uniformly in $[0,1]^2$, what is the expected minimum distance between any pair?
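For concreteness in the sketches that follow, here is one way to evaluate $f(X)$ in C. The name `min_pair_dist` is chosen here for illustration (it is not from the course code), and the direct $O(p^2)$ loop is fine for $p = 10$:

```c
#include <math.h>

/* Minimum pairwise distance among p points (x[i], y[i]) in [0,1]^2.
   Direct O(p^2) comparison of all pairs -- fine for p = 10. */
double min_pair_dist(const double* x, const double* y, int p)
{
    double d2min = INFINITY;
    for (int i = 0; i < p; ++i) {
        for (int j = i+1; j < p; ++j) {
            double dx = x[i] - x[j];
            double dy = y[i] - y[j];
            double d2 = dx*dx + dy*dy;
            if (d2 < d2min)
                d2min = d2;
        }
    }
    return sqrt(d2min);
}
```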
### Toy problem: Version 1

Serial version:

```
sum_fX = 0;
for i = 1:ntrials
    x = rand(10,2);
    fX = min distance between points in x;
    sum_fX = sum_fX + fX;
end
result = sum_fX/ntrials;
```

Parallel version: run twice and average results?!

- No communication needed: this is *embarrassingly parallel*.
- Need to worry a bit about `rand`...
### Error estimators

Central limit theorem: if $R$ is the computed result, then $$R \sim N\left( E[f(X)], \frac{\sigma_{f(X)}}{\sqrt{n}} \right).$$

So:

- Compute the sample standard deviation $\hat{\sigma}_{f(X)}$
- Error bars are $\pm \hat{\sigma}_{f(X)}/\sqrt{n}$
- Use error bars to monitor convergence
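The running-sum form used in the next version follows from expanding the square in the sample variance:

$$\hat{\sigma}^2_{f(X)} = \frac{1}{n-1} \sum_{i=1}^n \left( f_i - \bar{f} \right)^2 = \frac{1}{n-1} \left( \sum_{i=1}^n f_i^2 - \frac{1}{n} \Big( \sum_{i=1}^n f_i \Big)^2 \right),$$

so (ignoring the $n-1$ versus $n$ distinction for large $n$) the error bar is $\hat{\sigma}_{f(X)}/\sqrt{n} \approx \sqrt{\sum_i f_i^2 - (\sum_i f_i)^2/n}\,/\,n$, which is exactly the `errbar` expression below.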
### Toy problem: Version 2

Serial version:

```
sum_fX = 0;
sum_fX2 = 0;
for i = 1:ntrials
    x = rand(10,2);
    fX = min distance between points in x;
    sum_fX  = sum_fX  + fX;
    sum_fX2 = sum_fX2 + fX*fX;
    result = sum_fX/i;
    errbar = sqrt(sum_fX2-sum_fX*sum_fX/i)/i;
    if (abs(errbar/result) < reltol), break; end
end
result = sum_fX/i;
```

Parallel version: ?
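A runnable C version of this loop, reusing the `min_pair_dist` sketch from earlier. The trial cap, tolerance, and seed are arbitrary, and a small guard is added so the loop cannot exit on the degenerate first-iteration error bar (which is identically zero):

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define NPTS 10

double min_pair_dist(const double* x, const double* y, int p);  /* as sketched above */

int main(void)
{
    long ntrials = 1000000;      /* arbitrary cap */
    double reltol = 1e-3;        /* arbitrary tolerance */
    double sum_fX = 0, sum_fX2 = 0, result = 0, errbar = 0;
    long i;
    srand(12345);
    for (i = 1; i <= ntrials; ++i) {
        double x[NPTS], y[NPTS];
        for (int k = 0; k < NPTS; ++k) {
            x[k] = rand()/(RAND_MAX + 1.0);
            y[k] = rand()/(RAND_MAX + 1.0);
        }
        double fX = min_pair_dist(x, y, NPTS);
        sum_fX  += fX;
        sum_fX2 += fX*fX;
        result = sum_fX/i;
        errbar = sqrt(sum_fX2 - sum_fX*sum_fX/i)/i;
        /* Guard: wait for a few trials before trusting the error bar. */
        if (i >= 100 && fabs(errbar/result) < reltol)
            break;
    }
    printf("result = %g +/- %g after %ld trials\n",
           result, errbar, i < ntrials ? i : ntrials);
    return 0;
}
```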
### Pondering parallelism

Two major points:

- How should we handle random number generation?
- How should we manage termination criteria?

Some additional points (briefly):

- How quickly can we compute $f(X)$?
- Can we accelerate convergence (variance reduction)?
### Pseudo-random number generation

- Pretend a *deterministic* process is random.
- We lose if it doesn't *look* random!
- ... but *quasi-random* numbers (low-discrepancy sequences) are also useful.
- RNG functions have *state*.
- The basic `random()` call is *not* thread-safe!
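For example, POSIX `rand_r` takes its state as an explicit argument, so each thread can keep its own stream. A minimal OpenMP sketch (the seeding scheme here is ad hoc, purely for illustration, and `rand_r` itself is a weak generator):

```c
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    #pragma omp parallel
    {
        /* Per-thread state: no hidden global for the threads to race on. */
        unsigned seed = 12345u + 6789u * omp_get_thread_num();
        double u = rand_r(&seed) / (RAND_MAX + 1.0);
        printf("thread %d drew u = %g\n", omp_get_thread_num(), u);
    }
    return 0;
}
```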
### Parallel PRNG strategies

- Put the RNG in a critical section (slow)
- Run independent RNGs per thread
  - Concern: correlation between streams
  - Requires an RNG with a very long period
- Split the stream from one RNG across threads
  - Leapfrog: thread 0 uses even steps, thread 1 uses odd steps
  - Block: like leapfrog, but in a blocked fashion
  - Helpful if it's cheap to skip steps! (often the case; see the sketch below)

Good libraries help! Mersenne twister, SPRNG, ...?
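To see why leapfrogging can be cheap: composing $p$ steps of a linear congruential generator $x_{k+1} = a x_k + c \pmod m$ is again an affine map, so each thread can stride by $p$ at the cost of one multiply-add per draw. A toy sketch with a 64-bit LCG (constants are Knuth's MMIX values; a real code would use a serious library):

```c
#include <stdint.h>
#include <stdio.h>

/* Toy LCG: x <- A*x + C (mod 2^64).  MMIX constants (Knuth). */
static const uint64_t A = 6364136223846793005ULL;
static const uint64_t C = 1442695040888963407ULL;

/* Coefficients of the p-step map: x_{k+p} = Ap*x_k + Cp (mod 2^64).
   O(p) here; repeated squaring would give O(log p). */
static void leap_coeffs(int p, uint64_t* Ap, uint64_t* Cp)
{
    uint64_t a = 1, c = 0;
    for (int i = 0; i < p; ++i) {
        c = A*c + C;
        a = A*a;
    }
    *Ap = a;
    *Cp = c;
}

int main(void)
{
    int nthreads = 2;
    uint64_t Ap, Cp;
    leap_coeffs(nthreads, &Ap, &Cp);

    /* Thread t starts at the t-th state and then leaps by nthreads,
       so thread 0 sees the even steps and thread 1 the odd steps. */
    uint64_t x0 = 12345u;          /* shared seed: step 0 */
    uint64_t x1 = A*x0 + C;        /* step 1 */
    for (int k = 0; k < 4; ++k) {
        printf("step %d: %llu   step %d: %llu\n",
               2*k,   (unsigned long long) x0,
               2*k+1, (unsigned long long) x1);
        x0 = Ap*x0 + Cp;
        x1 = Ap*x1 + Cp;
    }
    return 0;
}
```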
### One solution

- Use a version of the Mersenne twister with no global state:
  - Choose pseudo-random seeds per thread at startup.
- See: [Dynamic Creation of Pseudorandom Number Generators](http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dgene.pdf) (Matsumoto and Nishimura)
- [Code is also available](http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html)
### Toy problem: Version 2.1p

```
sum_fX = 0; sum_fX2 = 0; n = 0;
for each thread in parallel
    do
        fX = result of one random trial
        ++n;
        sum_fX  += fX;
        sum_fX2 += fX*fX;
        errbar = ...
        if (abs(errbar/result) < reltol), break; end
    loop
end
result = sum_fX/n;
```

Uh-oh! The threads race on the shared counter and sums.
### Toy problem: Version 2.2p

```
sum_fX = 0; sum_fX2 = 0; n = 0; done = false;
for each thread in parallel
    do
        fX = result of one random trial
        get lock
        ++n;
        sum_fX  = sum_fX  + fX;
        sum_fX2 = sum_fX2 + fX*fX;
        errbar = ...
        if (abs(errbar/result) < reltol)
            done = true;
        end
        release lock
    until done
end
result = sum_fX/n;
```

Better, but expensive: one lock acquisition per trial.
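In OpenMP terms, "get lock / release lock" might be a `critical` section. A sketch along the lines of Version 2.2p, reusing the `min_pair_dist` helper and per-thread `rand_r` streams from earlier (seeds and tolerance are again arbitrary):

```c
#include <math.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NPTS 10

double min_pair_dist(const double* x, const double* y, int p);  /* as sketched above */

int main(void)
{
    double sum_fX = 0, sum_fX2 = 0;
    long n = 0;
    int done = 0;
    double reltol = 1e-3;

    #pragma omp parallel
    {
        unsigned seed = 12345u + 6789u * omp_get_thread_num();
        int mydone = 0;
        while (!mydone) {
            double x[NPTS], y[NPTS];
            for (int k = 0; k < NPTS; ++k) {
                x[k] = rand_r(&seed)/(RAND_MAX + 1.0);
                y[k] = rand_r(&seed)/(RAND_MAX + 1.0);
            }
            double fX = min_pair_dist(x, y, NPTS);

            /* One trial per lock acquisition -- correct, but expensive. */
            #pragma omp critical
            {
                ++n;
                sum_fX  += fX;
                sum_fX2 += fX*fX;
                double result = sum_fX/n;
                double errbar = sqrt(sum_fX2 - sum_fX*sum_fX/n)/n;
                if (n >= 100 && fabs(errbar/result) < reltol)
                    done = 1;
                mydone = done;  /* read the flag inside the critical section */
            }
        }
    }
    printf("result = %g after %ld trials\n", sum_fX/n, n);
    return 0;
}
```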
### Toy problem: Version 2.3p

```
sum_fX = 0; sum_fX2 = 0; n = 0; done = false;
for each thread in parallel
    do
        batch_sum_fX, batch_sum_fX2 = B trials
        get lock
        n += B;
        sum_fX  += batch_sum_fX;
        sum_fX2 += batch_sum_fX2;
        errbar = ...
        if (abs(errbar/result) < reltol)
            done = true;
        end
        release lock
    until done or n > n_max
end
result = sum_fX/n;
```
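The same sketch with batching: each thread accumulates $B$ trials privately and only then touches the shared state, so locking happens once per $B$ trials (the values of `B` and `n_max` are arbitrary choices here):

```c
#include <math.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NPTS 10
#define B 100                     /* batch size: arbitrary choice */

double min_pair_dist(const double* x, const double* y, int p);  /* as sketched above */

int main(void)
{
    double sum_fX = 0, sum_fX2 = 0;
    long n = 0, n_max = 10000000;
    int done = 0;
    double reltol = 1e-3;

    #pragma omp parallel
    {
        unsigned seed = 12345u + 6789u * omp_get_thread_num();
        int mydone = 0;
        while (!mydone) {
            /* Accumulate B trials in thread-private sums: no lock needed. */
            double batch_sum_fX = 0, batch_sum_fX2 = 0;
            for (int t = 0; t < B; ++t) {
                double x[NPTS], y[NPTS];
                for (int k = 0; k < NPTS; ++k) {
                    x[k] = rand_r(&seed)/(RAND_MAX + 1.0);
                    y[k] = rand_r(&seed)/(RAND_MAX + 1.0);
                }
                double fX = min_pair_dist(x, y, NPTS);
                batch_sum_fX  += fX;
                batch_sum_fX2 += fX*fX;
            }

            /* One lock acquisition per B trials. */
            #pragma omp critical
            {
                n += B;
                sum_fX  += batch_sum_fX;
                sum_fX2 += batch_sum_fX2;
                double result = sum_fX/n;
                double errbar = sqrt(sum_fX2 - sum_fX*sum_fX/n)/n;
                if (fabs(errbar/result) < reltol || n > n_max)
                    done = 1;
                mydone = done;
            }
        }
    }
    printf("result = %g after %ld trials\n", sum_fX/n, n);
    return 0;
}
```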
### Some loose ends

- Alternative: "master-slave" organization
  - Master sends out batches of work to slaves
  - Example: SETI at Home, Folding at Home, ...
- What is the right batch size?
  - Large $B$ $\implies$ amortized synchronization overhead (and variance in batch completion times actually helps with contention!)
  - Small $B$ avoids too much extra work
### Some loose ends

- How should we evaluate $f(X)$?
  - For $p$ points, the obvious algorithm is $O(p^2)$
  - Binning points is better asymptotically, but there is no gain for small $p$...
- Is $f(X)$ the right thing to evaluate?
  - Maybe $E[g(X)] = E[f(X)]$ but $\mathrm{Var}[g(X)] \ll \mathrm{Var}[f(X)]$?
  - May make much more difference than parallelism!