# CS 5220
## Shared memory
### Monte Carlo
## 22 Sep 2015
### Monte Carlo
Basic idea: Express answer $a$ as $$a = E[f(X)]$$ for some random
variable(s) $X$.
Typical toy example:
$$\pi/4 = E[\chi_{[0,1]}(X^2 + Y^2)] \mbox{ where } X, Y \sim U(-1,1).$$
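For concreteness, a minimal serial C sketch of that estimator (the seed, trial count, and use of the C library's `rand` are illustrative choices only):

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Estimate pi/4 as the fraction of draws (X,Y) ~ U(-1,1)^2 that
   land inside the unit circle. */
int main(void)
{
    long n = 1000000, hits = 0;
    srand(12345);
    for (long i = 0; i < n; ++i) {
        double x = 2.0*rand()/RAND_MAX - 1.0;
        double y = 2.0*rand()/RAND_MAX - 1.0;
        if (x*x + y*y <= 1.0)
            ++hits;
    }
    printf("estimate: %g, exact: %g\n", (double) hits/n, atan(1.0));
    return 0;
}
```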
We’ll do something slightly more interesting...
### A toy problem
Given ten points $(X_i, Y_i)$ drawn uniformly in $[0,1]^2$, what is the
expected minimum distance between any pair?
### Toy problem: Version 1
Serial version:
```
sum_fX = 0;
for i = 1:ntrials
    x = rand(10,2);
    fX = min distance between points in x;
    sum_fX = sum_fX + fX;
end
result = sum_fX/ntrials;
```
Parallel version: run twice and average results?!
No communication — *embarrassingly parallel*
Need to worry a bit about `rand`...
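The elided "min distance" step might look like the following C sketch (`min_dist_trial` is a name chosen here for illustration; POSIX `rand_r` is used so the generator state is explicit, anticipating the thread-safety discussion below):

```c
#include <float.h>
#include <math.h>
#include <stdlib.h>

#define NPTS 10

/* One trial: draw NPTS points uniformly in [0,1]^2 and return the
   minimum pairwise distance via the obvious O(p^2) scan. */
double min_dist_trial(unsigned int* seed)
{
    double x[NPTS], y[NPTS], dmin = DBL_MAX;
    for (int i = 0; i < NPTS; ++i) {
        x[i] = (double) rand_r(seed)/RAND_MAX;
        y[i] = (double) rand_r(seed)/RAND_MAX;
    }
    for (int i = 0; i < NPTS; ++i)
        for (int j = i+1; j < NPTS; ++j) {
            double dx = x[i]-x[j], dy = y[i]-y[j];
            double d = sqrt(dx*dx + dy*dy);
            if (d < dmin)
                dmin = d;
        }
    return dmin;
}
```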
### Error estimators
Central limit theorem: if $R$ is the computed result after $n$ trials, then $R$ is approximately normal with mean $E[f(X)]$ and standard deviation $\sigma_{f(X)}/\sqrt{n}$:
$$R \sim N\left( E[f(X)], \frac{\sigma_{f(X)}^2}{n} \right).$$ So:
- Compute the sample standard deviation $\hat{\sigma}_{f(X)}$
- Error bars are $\pm \hat{\sigma}_{f(X)}/\sqrt{n}$
- Use error bars to monitor convergence
### Toy problem: Version 2
Serial version:
```
sum_fX = 0;
sum_fX2 = 0;
for i = 1:ntrials
    x = rand(10,2);
    fX = min distance between points in x;
    sum_fX = sum_fX + fX;
    sum_fX2 = sum_fX2 + fX*fX;
    result = sum_fX/i;
    errbar = sqrt(sum_fX2 - sum_fX*sum_fX/i)/i;
    if (abs(errbar/result) < reltol), break; end
end
result = sum_fX/i;
```
Parallel version: ?
### Pondering parallelism
Two major points:
- How should we handle random number generation?
- How should we manage termination criteria?
Some additional points (briefly):
- How quickly can we compute $f(X)$?
- Can we accelerate convergence (variance reduction)?
### Pseudo-random number generation
- Pretend a *deterministic* process is random.
- We lose if it doesn’t *look* random!
- ... but *quasi-random* numbers (low-discrepancy
sequences) are also useful.
- RNG functions have *state*
- Basic `random()` call is *not* thread-safe!
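To make the thread-safety point concrete, here is a small OpenMP sketch assuming POSIX `rand_r` (the multiplicative hash of the thread ID into a seed is a quick hack, not a guarantee of independent streams; that concern is the next slide's topic):

```c
#include <omp.h>
#include <stdlib.h>

/* Sum n uniform draws with one RNG state per thread.  Unlike a bare
   rand()/random() call, rand_r touches only the state we pass in,
   so threads neither race nor contend on hidden global state. */
double sum_uniform(long n)
{
    double sum = 0.0;
    #pragma omp parallel reduction(+:sum)
    {
        unsigned int seed = 0x9E3779B9u * (unsigned) (omp_get_thread_num() + 1);
        #pragma omp for
        for (long i = 0; i < n; ++i)
            sum += (double) rand_r(&seed)/RAND_MAX;
    }
    return sum;
}
```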
### Parallel PRNG strategies
- Put RNG in critical section (slow)
- Run independent RNGs per thread
    - Concern: correlation between streams
    - Requires an RNG with a very long period
- Split one RNG's stream across threads
    - Leapfrog: thread 0 takes the even steps, thread 1 the odd steps
    - Block: each thread takes one contiguous block of the stream
    - Helpful if it's cheap to skip ahead in the stream! (often the
      case; see the LCG sketch below)
Good libraries help! Mersenne twister, SPRNG, ...?
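To see why cheap skipping matters, here is a minimal leapfrog sketch for a 64-bit LCG, chosen only because its $k$-step jump is easy to derive (the constants are Knuth's MMIX parameters; a real code would prefer the libraries above, since LCG streams have known statistical weaknesses):

```c
#include <stdint.h>

/* Leapfrog splitting for the LCG x <- a*x + c (mod 2^64).  The k-step
   map is itself affine: x <- A*x + C with A = a^k and
   C = c*(a^(k-1) + ... + a + 1), so thread t of P can start t steps in
   and then take one P-step jump per draw. */
typedef struct { uint64_t x, A, C; } lcg_stream;

static const uint64_t LCG_A = 6364136223846793005ULL;  /* Knuth MMIX */
static const uint64_t LCG_C = 1442695040888963407ULL;

/* Set up thread t's stream (t in 0..nthreads-1) from a common seed. */
lcg_stream lcg_leapfrog(uint64_t seed, int t, int nthreads)
{
    lcg_stream s = { seed, 1, 0 };
    for (int i = 0; i < t; ++i)           /* advance to term t */
        s.x = LCG_A*s.x + LCG_C;
    for (int i = 0; i < nthreads; ++i) {  /* compose the step map P times */
        s.C = LCG_A*s.C + LCG_C;
        s.A = LCG_A*s.A;
    }
    return s;
}

/* Return this thread's current term, then jump P steps to the next. */
uint64_t lcg_next(lcg_stream* s)
{
    uint64_t out = s->x;
    s->x = s->A*s->x + s->C;
    return out;
}
```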
### One solution
- Use a version of Mersenne twister with no global state:
    - Choose pseudo-random seeds per thread at startup.
- See: [Dynamic Creation of Pseudorandom Number
  Generators](http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dgene.pdf)
  (Matsumoto and Nishimura)
- [Code is also available](http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html)
### Toy problem: Version 2.1p
```
sum_fX = 0; sum_fX2 = 0; n = 0;
for each thread in parallel
    do
        fX = result of one random trial
        ++n;
        sum_fX += fX;
        sum_fX2 += fX*fX;
        errbar = ...
        if (abs(errbar/result) < reltol), break; end
    loop
end
result = sum_fX/n;
```
Uh-oh! Every thread updates `n`, `sum_fX`, and `sum_fX2` with no synchronization, a classic data race.
### Toy problem: Version 2.2p
```
sum_fX = 0; sum_fX2 = 0; n = 0; done = false;
for each thread in parallel
    do
        fX = result of one random trial
        get lock
        ++n;
        sum_fX = sum_fX + fX;
        sum_fX2 = sum_fX2 + fX*fX;
        errbar = ...
        if (abs(errbar/result) < reltol)
            done = true;
        end
        release lock
    until done
end
result = sum_fX/n;
```
Better, but expensive.
### Toy problem: Version 2.3p
```
sum_fX = 0; sum_fX2 = 0; n = 0; done = false;
for each thread in parallel
    do
        batch_sum_fX, batch_sum_fX2 = B trials
        get lock
        n += B;
        sum_fX += batch_sum_fX;
        sum_fX2 += batch_sum_fX2;
        errbar = ...
        if (abs(errbar/result) < reltol)
            done = true;
        end
        release lock
    until done or n > n_max
end
result = sum_fX/n;
```
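As a concrete rendering, here is a C/OpenMP sketch of Version 2.3p (it assumes the `min_dist_trial` routine sketched earlier and POSIX `rand_r`; the batch size `B`, tolerance `reltol`, and cap `nmax` are tuning knobs, not prescribed values):

```c
#include <math.h>
#include <omp.h>
#include <stdlib.h>

double min_dist_trial(unsigned int* seed);  /* from the earlier sketch */

/* Batched Monte Carlo with locked accumulation (Version 2.3p). */
double monte_carlo(long nmax, int B, double reltol)
{
    double sum_fX = 0, sum_fX2 = 0, result = 0;
    long n = 0;
    int done = 0;

    #pragma omp parallel
    {
        unsigned int seed = 0x9E3779B9u * (unsigned) (omp_get_thread_num() + 1);
        /* A stale read of done costs at most one extra batch, since
           the critical section below flushes shared state each pass. */
        while (!done) {
            double bsum = 0, bsum2 = 0;
            for (int i = 0; i < B; ++i) {   /* B trials, local tallies */
                double fX = min_dist_trial(&seed);
                bsum  += fX;
                bsum2 += fX*fX;
            }
            #pragma omp critical            /* merge into shared tallies */
            {
                n       += B;
                sum_fX  += bsum;
                sum_fX2 += bsum2;
                result = sum_fX/n;
                double errbar = sqrt(sum_fX2 - sum_fX*sum_fX/n)/n;
                if (fabs(errbar/result) < reltol || n >= nmax)
                    done = 1;
            }
        }
    }
    return result;
}
```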
### Some loose ends
- Alternative: “master-slave” organization
    - Master sends out batches of work to slaves
    - Example: SETI@home, Folding@home, ...
- What is the right batch size?
    - Large $B$ $\implies$ amortized synchronization overhead
      (and variance in batch times actually helps spread out contention!)
    - Small $B$ avoids doing too much extra work past the stopping point
### Some loose ends
- How to evaluate $f(X)$?
    - For $p$ points, the obvious algorithm is $O(p^2)$
      (the double loop in the earlier sketch)
    - Binning points is better asymptotically, but there's no gain
      for small $p$...
- Is $f(X)$ the right thing to evaluate?
    - Maybe $E[g(X)] = E[f(X)]$ but
      $\mathrm{Var}[g(X)] \ll \mathrm{Var}[f(X)]$?
      (one standard recipe is sketched below)
    - May make much more difference than parallelism!
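One standard recipe is *antithetic variates* (named here as an illustration, not taken from the slides): draw a second input $X'$ so that $f(X')$ has the same distribution as $f(X)$ but correlation $\rho$ with it, and average the pair:
$$g(X, X') = \frac{f(X) + f(X')}{2}, \qquad
E[g] = E[f(X)], \qquad
\mathrm{Var}[g] = \frac{\sigma_{f(X)}^2}{2}(1 + \rho).$$
Any $\rho < 0$ makes a pair worth more than two independent trials; whether a suitable $X'$ exists is problem-dependent (for the minimum-distance problem, naive coordinate reflections are isometries, so $f(X') = f(X)$ exactly and there is no gain).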