# CS 5220 ## Parallelism and locality in simulation ### Distributed parameter systems ## 17 Sep 2015

Distributed parameter problems

Mostly PDEs:

Type Example Time? Space dependence?
Elliptic electrostatics steady global
Hyperbolic sound waves yes local
Parabolic diffusion yes global

Different types involve different communication:

  • Global dependence $\implies$ lots of communication
  • Finite wave speed $\implies$ local dependence

Example: 1D heat equation

Consider flow (e.g. of heat) in a uniform rod

  • Heat ($Q$) $\propto$ temperature ($u$) $\times$ mass ($\rho h$)
  • Heat flow $\propto$ temperature gradient (Fourier’s law)

Example: 1D heat equation

$$\begin{aligned} \frac{\partial Q}{\partial t} \propto h \frac{\partial u}{\partial t} &\approx C \left[ \left( \frac{u(x-h)-u(x)}{h} \right) + \left( \frac{u(x)-u(x+h)}{h} \right) \right] \\ \frac{\partial u}{\partial t} &\approx C \left[ \frac{u(x-h)-2u(x)+u(x+h)}{h^2} \right] \rightarrow C \frac{\partial^2 u}{\partial x^2} \end{aligned}$$

Spatial discretization

Heat equation with $u(0) = u(1) = 0$ $$\frac{\partial u}{\partial t} = C \frac{\partial^2 u}{\partial x^2}$$

Spatial semi-discretization: $$\frac{\partial^2 u}{\partial x^2} \approx \frac{u(x-h)-2u(x)+u(x+h)}{h^2}$$

Spatial discretization

Yields a system of ODEs $$ \begin{align} \frac{du}{dt} & = C h^{-2} (-T) u \\ &= -C h^{-2} \begin{bmatrix} 2 & -1 & & & & \\ -1 & 2 & -1 & & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_{n-2} \\ u_{n-1} \end{bmatrix} \end{align}$$

Explicit time stepping

Approximate PDE by ODE system (“method of lines”): $$\frac{du}{dt} = C h^{-2} T u$$ Now need a time-stepping scheme for the ODE:

  • Simplest scheme is Euler: $$u(t+\delta) \approx u(t) + u'(t) \delta = \left( I - C \frac{\delta}{h^2} T \right) u(t)$$
  • Taking a time step $\equiv$ sparse matvec with $\left( I - C \frac{\delta}{h^2} T \right)$
  • This may not end well...

Explicit time stepping data dependence


Nearest neighbor interactions per step $\implies$
finite rate of numerical information propagation

Explicit time stepping in parallel



for t = 1 to N
  communicate boundary data ("ghost cell")
  take time steps locally
end

Overlap communication + computation



for t = 1 to N
  start boundary data sendrecv
  compute new interior values
  finish sendrecv
  compute new boundary values
end

Batching time steps



for t = 1 to N by B
  start boundary data sendrecv (B values)
  compute new interior values
  finish sendrecv (B values)
  compute new boundary values
end

Explicit pain


Unstable for $\delta > O(h^2)$!

### Implicit time stepping - Backward Euler uses backward difference for $d/dt$ $$u(t+\delta) \approx u(t) + u'(t + \delta t) \delta$$ - Taking a time step $\equiv$ sparse matvec with $\left( I + C \frac{\delta}{h^2} T \right)^{-1}$ - No time step restriction for stability (good!) - But each step involves linear solve (not so good!) - Good if you like numerical linear algebra?

Explicit and implicit

  • Propagates information at finite rate
  • Steps look like sparse matvec (in linear case)
  • Stable step determined by fastest time scale
  • Works fine for hyperbolic PDEs

Explicit and implicit

  • No need to resolve fastest time scales
  • Steps can be long... but expensive
    • Linear/nonlinear solves at each step
    • Often these solves involve sparse matvecs
  • Critical for parabolic PDEs

Poisson problems

Consider 2D Poisson $$-\nabla^2 u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f $$

  • Prototypical elliptic problem (steady state)
  • Similar to a backward Euler step on heat equation

Poisson problem discretization


$$ u_{i,j} = h^{-2} \left( 4u_{i,j}-u_{i-1,j}-u_{i+1,j}-u_{i,j-1}-u_{i,j+1} \right) $$

Poisson problem discretization

$$L = \left[ \begin{array}{ccc|ccc|ccc} 4 & -1 & & -1 & & & & & \\ -1 & 4 & -1 & & -1 & & & & \\ & -1 & 4 & & & -1 & & & \\ \hline -1 & & & 4 & -1 & & -1 & & \\ & -1 & & -1 & 4 & -1 & & -1 & \\ & & -1 & & -1 & 4 & & & -1 \\ \hline & & & -1 & & & 4 & -1 & \\ & & & & -1 & & -1 & 4 & -1 \\ & & & & & -1 & & -1 & 4 \end{array} \right]$$

Poisson solvers in 2D

Demmel, Applied Numerical Linear Algebra.

Method Time Space
Dense LU $N^3$ $N^2$
Band LU $N^2$ $N^{3/2}$
Jacobi $N^2$ $N$
Explicit inv $N^2$ $N^2$
CG $N^{3/2}$ $N$
Red-black SOR $N^{3/2}$ $N$
Sparse LU $N^{3/2}$ $N \log N$
FFT $N \log N$ $N$
Multigrid $N$ $N$

Remember: best MFlop/s $\neq$ fastest solution!

### General implicit picture - Implicit solves or steady state $\implies$ solving systems - Nonlinear solvers generally linearize - Linear solvers can be - Direct (hard to scale) - Iterative (often problem-specific) - Iterative solves boil down to matvec!
### PDE solver summary Can be implicit or explicit (as with ODEs) - Explicit (sparse matvec) - fast, but short steps? - works fine for hyperbolic PDEs - Implicit (sparse solve) - Direct solvers are hard! - Sparse solvers turn into matvec again
### PDE solver summary - Differential operators turn into local mesh stencils - Matrix connectivity looks like mesh connectivity - Can partition into subdomains that communicate only through boundary data - More on graph partitioning later - Not all nearest neighbor ops are equally efficient! - Depends on mesh structure - Also depends on flops/point