SuperGlue Tutorial

First example

The following example uses SuperGlue to execute a single, independent task:
#include "sg/superglue.hpp"
#include <iostream>

struct Options : public DefaultOptions<Options> {};

struct MyTask : public Task<Options> {
    void run() {
        std::cout << "Hello world!" << std::endl;
    }
};

int main() {
    SuperGlue<Options> sg;
    sg.submit(new MyTask());
    return 0;
}
Compile with
g++ -I $(SUPERGLUE)/include helloworld.cpp -pthread
where $(SUPERGLUE) is replaced with the path to where SuperGlue was downloaded.

The Code Example Explained

First, the SuperGlue header file is included:
#include "sg/superglue.hpp"

SuperGlue is designed to be customizable, and this is implemented by having an Options struct that is a template parameter to most classes. In this example, the default options are used, and the Options struct is defined by
struct Options : public DefaultOptions<Options> {};
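
The pattern of an options struct that inherits from a template instantiated on itself can be sketched without SuperGlue. All names below (SketchDefaults, SketchRuntime, enable_logging, queue_capacity) are hypothetical and are not part of SuperGlue; the block only illustrates how a user-defined struct can keep or override compile-time defaults.

```cpp
#include <cassert>
#include <string>

// Hypothetical defaults; NOT SuperGlue's DefaultOptions.
// The Derived parameter is unused here, but it would let a default
// refer to the final options type.
template <typename Derived>
struct SketchDefaults {
    static const bool enable_logging = false;
    static const int queue_capacity = 64;
};

// Uses every default.
struct PlainOptions : public SketchDefaults<PlainOptions> {};

// Overrides one setting by hiding the inherited member.
struct LoggingOptions : public SketchDefaults<LoggingOptions> {
    static const bool enable_logging = true;
};

// A class that is configured at compile time through its Options parameter.
template <typename Options>
struct SketchRuntime {
    static std::string describe() {
        return std::string(Options::enable_logging ? "logging on" : "logging off")
             + ", capacity " + std::to_string(Options::queue_capacity);
    }
};
```

With these definitions, SketchRuntime<PlainOptions> and SketchRuntime<LoggingOptions> are two distinct types, so the choice of options costs nothing at run time.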

Tasks are classes that inherit from the Task<Options> class and provide a run() method. Tasks typically register which shared data structures they access in the constructor. The task in this example has no dependencies, so no constructor is needed. The task is defined by
struct MyTask : public Task<Options> {
    void run() {
        std::cout << "Hello world!" << std::endl;
    }
};

The run-time system and the worker threads are started by instantiating an object of the SuperGlue class, like this:
SuperGlue<Options> sg;
This will create one worker thread per available core, except for one which is reserved for the main thread.

It is also possible to specify how many CPUs to use as an argument to the constructor. To limit the number of CPUs used to 4 (main thread plus three workers), the run-time system is started like this:
SuperGlue<Options> sg(4);

WARNING: This number must be less than or equal to the number of available CPUs.
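
One way to respect this limit is to clamp the requested count before constructing the run-time system. The sketch below uses only the C++ standard library; the helper names are hypothetical, and it assumes that std::thread::hardware_concurrency() reports the same CPUs that SuperGlue considers available, which this tutorial does not confirm.

```cpp
#include <cassert>
#include <thread>

// Returns a CPU count that does not exceed `available`.
// An `available` value of 0 means "unknown"; the request is then
// returned unchanged and the caller has to take care.
unsigned clamp_to_available_cpus(unsigned requested, unsigned available) {
    if (available == 0) return requested;
    return requested < available ? requested : available;
}

// Clamps against the count reported by the standard library, which may
// be 0 when the implementation cannot determine it.
unsigned safe_cpu_count(unsigned requested) {
    return clamp_to_available_cpus(requested,
                                   std::thread::hardware_concurrency());
}
```

The result could then be passed to the constructor, for example SuperGlue<Options> sg(safe_cpu_count(4));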

Tasks are submitted to SuperGlue using the submit() method:
sg.submit(new MyTask());

When a task has been submitted to SuperGlue, the ownership (that is, the responsibility for deleting it) is transferred to SuperGlue.

When the SuperGlue object falls out of scope, there will be an implied barrier that waits for all tasks to finish, and the worker threads will then be terminated.
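
The ownership and lifetime rules described above can be modeled with a toy run-time system built on std::thread. This is a sketch of the semantics only, not of SuperGlue's implementation: SketchTask, SketchRuntime and CountTask are hypothetical names, and starting one thread per task is far simpler than a real scheduler.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical task interface; NOT SuperGlue's Task class.
struct SketchTask {
    virtual ~SketchTask() {}
    virtual void run() = 0;
};

// Hypothetical run-time system that starts one thread per task.
class SketchRuntime {
    std::vector<std::thread> threads;
public:
    // Takes ownership of the raw pointer; the caller must not delete it.
    void submit(SketchTask *task) {
        threads.emplace_back([task]() {
            task->run();
            delete task;  // the run-time system frees the task
        });
    }
    // Waits for every task submitted so far.
    void barrier() {
        for (std::thread &t : threads) t.join();
        threads.clear();
    }
    // Leaving scope implies a barrier.
    ~SketchRuntime() { barrier(); }
};

struct CountTask : public SketchTask {
    std::atomic<int> &counter;
    explicit CountTask(std::atomic<int> &c) : counter(c) {}
    void run() { ++counter; }
};

int run_three_tasks() {
    std::atomic<int> counter(0);
    {
        SketchRuntime rt;
        for (int i = 0; i < 3; ++i)
            rt.submit(new CountTask(counter));
    }  // the destructor waits here until all three tasks have run
    return counter.load();
}
```

Because the destructor joins all threads, the counter is guaranteed to be fully updated once the inner scope ends, which mirrors the implied barrier described above.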

Tasks with Dependencies

The main objective of SuperGlue is to manage dependencies between tasks in a flexible and efficient way. The following code is an example with task dependencies.
#include "sg/superglue.hpp"
#include <iostream>

const size_t numSlices = 5;
const size_t sliceSize = 100;

struct Options : public DefaultOptions<Options> {};

// Task that inputs a vector and outputs a scaled vector.
struct ScaleTask : public Task<Options> {
    double s, *a, *b;
    ScaleTask(double s_,
              double *a_, Handle<Options> &hA,
              double *b_, Handle<Options> &hB)
    : s(s_), a(a_), b(b_)
    {
        register_access(ReadWriteAdd::read, &hA);
        register_access(ReadWriteAdd::write, &hB);
    }
    void run() {
        for (size_t i = 0; i < sliceSize; ++i)
            b[i] = s*a[i];
    }
};

// Task that inputs two vectors and sums them into an output vector.
struct SumTask : public Task<Options> {
    double *a, *b, *c;
    SumTask(double *a_, Handle<Options> &hA,
            double *b_, Handle<Options> &hB,
            double *c_, Handle<Options> &hC)
    : a(a_), b(b_), c(c_)
    {
        register_access(ReadWriteAdd::read, &hA);
        register_access(ReadWriteAdd::read, &hB);
        register_access(ReadWriteAdd::write, &hC);
    }
    void run() {
        for (size_t i = 0; i < sliceSize; ++i)
            c[i] = a[i]+b[i];
    }
};

int main() {
    double data[numSlices][sliceSize];

    for (size_t i = 0; i < sliceSize; ++i)
        data[0][i] = 1.0;

    // Define handles for the slices
    Handle<Options> h[numSlices];

    SuperGlue<Options> sg;
    sg.submit(new ScaleTask(2.0, data[0], h[0], data[1], h[1]));         // h_1 = 2*h_0
    sg.submit(new ScaleTask(3.0, data[0], h[0], data[2], h[2]));         // h_2 = 3*h_0
    sg.submit(new SumTask(data[0], h[0], data[1], h[1], data[3], h[3])); // h_3 = h_0+h_1
    sg.submit(new SumTask(data[1], h[1], data[2], h[2], data[4], h[4])); // h_4 = h_1+h_2

    // Wait for all tasks to finish
    sg.barrier();

    // The data may be accessed here, after the barrier
    std::cout << "result=[" << data[0][0] << " "  << data[1][0] << " "
              << data[2][0] << " " << data[3][0] << " " << data[4][0]
              << "]" << std::endl;
    return 0;
}
In SuperGlue, dependencies are specified by registering which handles a task accesses. Handles are objects that represent shared variables. In the example above, one handle is created per slice of the vector data:
Handle<Options> h[numSlices];

SuperGlue does not know which variables a handle protects. This allows handles to be used for representing anything.

The handles are passed along when constructing the tasks that use the corresponding data. In the constructor, the task must register that it accesses the handle, and whether it reads or writes the data. The registration in ScaleTask is performed in the following two lines:
register_access(ReadWriteAdd::read, &hA);
register_access(ReadWriteAdd::write, &hB);
Here, ReadWriteAdd is the default class of access types, which defines three different access types: read, add, and write.

The data flow in this example is as follows: h[1] and h[2] are each computed from h[0], h[3] is computed from h[0] and h[1], and h[4] is computed from h[1] and h[2].
From this it can be seen that the task computing h[3] can start as soon as h[1] is available, while the task computing h[4] cannot start until both h[1] and h[2] are available. These kinds of dependencies are irregular in the sense that they cannot be expressed by spawning tasks recursively.
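
How task dependencies follow from registered accesses can be modeled with a small standalone program. The sketch below is a simplified model and not SuperGlue's internal mechanism: it handles only read and write accesses (add is omitted), and the names Access, TaskAccesses and derive_dependencies are hypothetical. Every access waits for the latest earlier write to the same handle, and a write also waits for the reads of the previous value.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

enum AccessKind { READ, WRITE };

struct Access {
    AccessKind kind;
    std::size_t handle;
};

typedef std::vector<Access> TaskAccesses;

// For each task, in submission order, returns the indices of the
// earlier tasks it has to wait for.
std::vector<std::set<std::size_t> >
derive_dependencies(const std::vector<TaskAccesses> &tasks,
                    std::size_t numHandles) {
    const std::size_t none = static_cast<std::size_t>(-1);
    std::vector<std::size_t> lastWriter(numHandles, none);
    std::vector<std::set<std::size_t> > readersSinceWrite(numHandles);
    std::vector<std::set<std::size_t> > deps(tasks.size());

    for (std::size_t t = 0; t < tasks.size(); ++t) {
        for (std::size_t k = 0; k < tasks[t].size(); ++k) {
            const Access &a = tasks[t][k];
            // Every access waits for the latest write to the handle.
            if (lastWriter[a.handle] != none)
                deps[t].insert(lastWriter[a.handle]);
            if (a.kind == READ) {
                readersSinceWrite[a.handle].insert(t);
            } else {
                // A write also waits for the reads of the old value.
                deps[t].insert(readersSinceWrite[a.handle].begin(),
                               readersSinceWrite[a.handle].end());
                readersSinceWrite[a.handle].clear();
                lastWriter[a.handle] = t;
            }
        }
        deps[t].erase(t);  // a task never depends on itself
    }
    return deps;
}

// The four tasks of the example, expressed as accesses to handles 0..4.
std::vector<TaskAccesses> example_tasks() {
    return {
        {{READ, 0}, {WRITE, 1}},             // h_1 = 2*h_0
        {{READ, 0}, {WRITE, 2}},             // h_2 = 3*h_0
        {{READ, 0}, {READ, 1}, {WRITE, 3}},  // h_3 = h_0+h_1
        {{READ, 1}, {READ, 2}, {WRITE, 4}},  // h_4 = h_1+h_2
    };
}
```

In this model the two scale tasks have no dependencies, the first sum task waits only for the first scale task, and the second sum task waits for both scale tasks.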

More Examples

The remainder of the documentation is in the form of examples available in the examples/ directory. These examples are described briefly below.

examples/accesstypes

An example of user-defined access types.

The usual read, write, and add access types are extended with a new access type: mul. The new mul access type behaves like add in the sense that accesses may occur in any order but not concurrently. However, add and mul are kept separate, so accesses of different types are executed in the order they were submitted.

examples/customhandle

Shows how the Handle<Options> class can be extended.

This example extends the handle class with a data member and a new method, that all handles will carry. In the example, it is used to store an index into a global array, and by that associate the handle with the data it represents, since all handles are understood to protect parts of the same array in this application.

examples/dag

Shows how SuperGlue can generate a directed acyclic graph for debugging.

This example generates the DAG of a Cholesky factorization by implementing a tiled Cholesky factorization with dummy tasks that perform no actual work. When the corresponding features are enabled in the Options struct, SuperGlue performs some bookkeeping and can be asked to generate a Graphviz .dot file that illustrates the dependencies. This bookkeeping comes at some cost and is not meant to be enabled during normal execution.

examples/dependencies

Shows how to create tasks with dependencies in SuperGlue.

This is the same example as above.

examples/handlewithdata

Shows a strategy where the handles are included in user-defined data types.

This example also associates handles with the data they represent, using an alternative strategy to the one used in examples/customhandle. This strategy is better suited when different handles contain different data types.

examples/helloworld

A minimal example of using SuperGlue.

The same example as the first one in this tutorial.

examples/logging

Shows the logging support.

This example enables logging and creates a log file of the execution. The generated log file contains one line per executed task, in the following format:
  NODE THREAD: START_TIME LENGTH NAME

An actual line looks something like this:
  0 4: 1053212 1000434 B

which would mean that node 0 executed a task on thread 4 that started at time 1053212 and executed for 1000434 cycles, and was named "B". The start time is the time in cycles since the first executed task.
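
A log line in this format can be split into its fields with a few lines of standard C++. The sketch below is not part of SuperGlue: LogEntry and parse_log_line are hypothetical names, the integer types are assumptions, and the task name is assumed to be the remainder of the line.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct LogEntry {
    int node;
    int thread;
    unsigned long long start;   // cycles since the first executed task
    unsigned long long length;  // duration in cycles
    std::string name;
};

// Parses "NODE THREAD: START_TIME LENGTH NAME".
// Returns false if the line does not match that format.
bool parse_log_line(const std::string &line, LogEntry &out) {
    std::istringstream in(line);
    char colon = 0;
    out.name.clear();
    if (!(in >> out.node >> out.thread >> colon) || colon != ':')
        return false;
    if (!(in >> out.start >> out.length))
        return false;
    in >> std::ws;               // skip the blanks before the name
    std::getline(in, out.name);  // the name is the rest of the line
    return !out.name.empty();
}
```

Applied to the sample line above, the parser yields node 0, thread 4, start time 1053212, length 1000434 and name B.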

These log files can be visualized using the Python script scripts/drawsched.py, or by using the application tools/viewer.

examples/nbody

An implementation of an n-body simulation using SuperGlue.

Calculates the direct forces between a set of particles and moves the particles accordingly, for a number of time steps. Logging is enabled, and a log file is created showing the execution. This example uses a feature called Contributions, which allows several tasks performing add accesses to the same handle to execute at the same time by duplicating the buffer. The interface of this feature is not yet stable and is subject to change.
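
The general idea of letting concurrent additions proceed by duplicating the buffer can be illustrated independently of SuperGlue. The sketch below does not use the Contributions interface, and how SuperGlue creates and merges its copies is not described in this tutorial; add_with_private_buffers is a hypothetical function that gives each thread a private, zero-initialized copy and merges the copies afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Adds `value` to every element of `target` once per worker.
// Each worker accumulates into its own copy, so no locking is needed
// while the workers run.
void add_with_private_buffers(std::vector<double> &target, double value,
                              std::size_t numWorkers) {
    std::vector<std::vector<double> > copies(
        numWorkers, std::vector<double>(target.size(), 0.0));

    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < numWorkers; ++w) {
        workers.emplace_back([&copies, w, value]() {
            for (std::size_t i = 0; i < copies[w].size(); ++i)
                copies[w][i] += value;
        });
    }
    for (std::size_t w = 0; w < numWorkers; ++w)
        workers[w].join();

    // Merge step, done by one thread after all workers have finished.
    for (std::size_t w = 0; w < numWorkers; ++w)
        for (std::size_t i = 0; i < target.size(); ++i)
            target[i] += copies[w][i];
}
```

The price of this approach is extra memory for the copies and a merge step at the end, in exchange for additions that do not have to wait for each other.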

examples/pinnedtasks

Shows how to get fine-grained control over task placement and execution.

In this example, task stealing is disabled and tasks are explicitly placed on specified worker threads. This can be used to gain control over which task is executed where, for experiments. Also, tasks are not allowed to execute as soon as they are submitted, but instead must wait until start_executing() is called.

examples/subtasks

Shows how to submit tasks from tasks.

Submitting tasks is not thread safe by default, but can be made thread safe by specifying so in the Options struct.

The created tasks have no connection to their parent; they are independent. Creating tasks from several threads can cause deadlocks, but it can also be useful for distributing the work of submitting tasks for performance reasons. Another use is to delay task creation until some tasks have already been executed, which avoids creating too many tasks at once without having to wait for all tasks to finish before creating more.

examples/vardeps

Shows an example where the number of dependencies of a task depends on its arguments.

In this example, the number of handles that a task accesses depends on its parameters.

examples/workspace

Shows how to allocate thread-local work buffers.

SuperGlue allows each worker thread to preallocate a certain (configurable) amount of memory that tasks can request for use as work buffers. The memory is automatically reclaimed by the worker thread when each task has finished. The purpose is to avoid allocating memory from within tasks, which can be a performance issue.
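
The idea of a preallocated work buffer that is reclaimed after each task can be sketched as a simple bump allocator. This is not SuperGlue's workspace interface: SketchWorkspace and its methods are hypothetical names, and the rounding to multiples of sizeof(double) is an assumption made to keep consecutive requests aligned for doubles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread workspace; NOT SuperGlue's interface.
class SketchWorkspace {
    std::vector<unsigned char> buffer;  // allocated once, up front
    std::size_t used;
public:
    explicit SketchWorkspace(std::size_t bytes) : buffer(bytes), used(0) {}

    // Hands out `bytes` from the preallocated buffer, or a null pointer
    // if not enough memory is left. No heap allocation happens here.
    void *request(std::size_t bytes) {
        const std::size_t align = sizeof(double);
        const std::size_t left = buffer.size() - used;
        if (bytes > left) return nullptr;
        const std::size_t rounded = (bytes + align - 1) / align * align;
        if (rounded > left) return nullptr;
        void *p = buffer.data() + used;
        used += rounded;
        return p;
    }

    // Called after a task has finished: everything is reclaimed at once.
    void reset() { used = 0; }

    std::size_t bytes_in_use() const { return used; }
};
```

In this model a worker thread would own one SketchWorkspace, let the running task call request() as needed, and call reset() when the task returns.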

Examples with External Dependencies

The examples_dep directory contains examples that depend on external packages.

examples_dep/cholesky

Performs a tiled Cholesky decomposition using Intel MKL.

A tiled Cholesky factorization, based on an example from SMPSs developed by Barcelona Supercomputing Center. The Makefile needs to be modified to contain the correct paths to the Intel MKL installation, which is required for this example.