# CS 5220
## Shared memory
### OpenMP
## 24 Sep 2015
### Shared memory programming model
Program consists of *threads* of control.
- Can be created dynamically
- Each has private variables (e.g. local)
- Each has shared variables (e.g. heap)
- Communication through shared variables
- Coordinate by synchronizing on variables
- Examples: *OpenMP*, pthreads, Cilk, Java threads
### The problem with pthreads revisited
- pthreads can be painful!
- Makes code verbose
- Synchronization is hard to think about
- Would like to make this more automatic!
- ... and have been trying for a couple decades.
- OpenMP gets us *part* of the way
### OpenMP: Open spec for MultiProcessing
- Standard API for multi-threaded code
- Only a spec — multiple implementations
- Lightweight syntax
- C or Fortran (with appropriate compiler support)
- High level:
- Preprocessor/compiler directives (80%)
- Library calls (19%)
- Environment variables (1%)
### Compiling OpenMP
A practical aside...
### Parallel “hello world”
```c
#include <stdio.h>
#include <omp.h>

int main()
{
    #pragma omp parallel
    printf("Hello world from %d\n",
           omp_get_thread_num());
    return 0;
}
```
### Parallel sections
- Basic model: fork-join
- Each thread runs the same code block
- Annotations distinguish shared (`s`) and private (`i`) data
- Relaxed consistency for shared data
### Parallel sections
```c
double s[MAX_THREADS];
int i;
#pragma omp parallel shared(s) private(i)
{
    i = omp_get_thread_num();
    s[i] = i;
}
```
### Critical sections
- Automatic lock/unlock at the ends of the critical section
- Automatic memory flushes for consistency
- Locks are still there if you really need them...
### Critical sections
```c
#pragma omp parallel
{
    //...
    #pragma omp critical(my_data_cs)
    {
        //... modify data structure here ...
    }
}
```
### Barriers
```c
#pragma omp parallel private(i)
for (i = 0; i < nsteps; ++i) {
    do_stuff();
    #pragma omp barrier
}
```
### Parallel loops
- Independent loop body? At least order doesn’t matter.
- Partition index space among threads
- Implicit barrier at end (except with `nowait`)
### Parallel loops
```c
/* Compute dot of x and y of length n */
int i, tid;
double my_dot, dot = 0;
#pragma omp parallel \
        shared(dot,x,y,n) \
        private(i,tid,my_dot)
{
    tid = omp_get_thread_num();
    my_dot = 0;
    #pragma omp for
    for (i = 0; i < n; ++i)
        my_dot += x[i]*y[i];
    #pragma omp critical
    dot += my_dot;
}
```
### Parallel loops
```c
/* Compute dot of x and y of length n */
int i;
double dot = 0;
#pragma omp parallel \
        shared(x,y,n) \
        private(i) \
        reduction(+:dot)
{
    #pragma omp for
    for (i = 0; i < n; ++i)
        dot += x[i]*y[i];
}
```
### Parallel loop scheduling
Partition the index space in different ways:
- `static[(chunk)]`: decide at start of loop; default chunk
  is `n/nthreads`. Low overhead, potential load imbalance.
- `dynamic[(chunk)]`: each thread takes `chunk`
  iterations when it has time; default `chunk` is 1. Higher
  overhead, but automatically balances load.
- `guided`: take chunks of size (unassigned iterations)/(number of
  threads); chunks get smaller toward the end of the loop. Somewhere
  between `static` and `dynamic`.
- `auto`: up to the system!

Default behavior is implementation-dependent.
### Other parallel work divisions
- `single`: do only in one thread (e.g. I/O); others wait at an implicit barrier
- `master`: do only in the master thread; others skip (no barrier)
- `sections`: like cobegin/coend
### Tasks
- So far, very static flavors of parallelism
- *Tasks* allow more dynamic parallel patterns
- From OpenMP 3.0 on, [explicit tasking support](http://openmp.org/sc13/sc13.tasking.ruud.pdf)
### Tasks
```c
#pragma omp parallel
{
    #pragma omp single
    {
        // General setup work
        #pragma omp task
        task1();
        #pragma omp task
        task2();
        #pragma omp taskwait
        depends_on_both_tasks();
    }
}
```
### Linked list
Adapted from [an SC13
presentation](http://openmp.org/sc13/sc13.tasking.ruud.pdf)
```c
node_t* p = head;
#pragma omp parallel
{
    #pragma omp single nowait
    while (p != NULL) {
        #pragma omp task firstprivate(p)
        do_work(p);
        p = p->next;
    }
} // Implied barrier at end of parallel region
```
### [Post-order traversal](http://openmp.org/wp/presos/sc07openmpbof.pdf)
```c
void traverse(node_t* p)
{
    if (p->left) {
        #pragma omp task
        traverse(p->left);
    }
    if (p->right) {
        #pragma omp task
        traverse(p->right);
    }
    #pragma omp taskwait
    process(p->data);
}
```
### Essential complexity?
Fred Brooks (*The Mythical Man-Month*) identified two types of
software complexity: essential and accidental.

Does OpenMP address accidental complexity? Yes, somewhat!

Essential complexity is harder.
### Things to still think about with OpenMP
- Proper serial performance tuning?
- Minimizing false sharing?
- Minimizing synchronization overhead?
- Minimizing loop scheduling overhead?
- Load balancing?
- Finding enough parallelism in the first place?
Let’s focus again on memory issues...