Techniques: Ensembles and Parallel Processing

Model fitting and model selection usually require multiple runs of each model, for multiple hypotheses and sometimes for multiple datasets, resulting in hundreds of individual model runs.

The Bristlecone.Workflow namespace includes capabilities to orchestrate such complex analyses in parallel. Using Bristlecone's OrchestrationAgent, you can queue jobs so that they run as processor cores become available.

Work Packages

To run your analyses in parallel, you must wrap each analysis up as a work package. A work package is simply an async computation that returns an estimation result, with the type Async<ModelSystem.EstimationResult>. A simple example function to set up work packages for a number of datasets, hypotheses, and replicates is given below:

open Bristlecone

let replicates = 3 // number of replications per analysis
let endCondition = Optimisation.EndConditions.afterIteration 100000

let workPackages datasets hypotheses engine =
    seq {
        for d in datasets do
            for h in hypotheses do
                for _ in 1..replicates do
                    // One work package per dataset, hypothesis, and replicate
                    yield async { return Bristlecone.fit engine endCondition d h }
    }

You can perform any additional steps within each work package. A common approach is to save each model result to disk (e.g., using functions from the Bristlecone.Data namespace) within the async block in the above code, so that each result is available as soon as its run completes.

When this function is applied to an EstimationEngine, datasets, and hypotheses, the resulting seq<Async<EstimationResult>> can be passed to an orchestration agent.
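As a minimal sketch of this pattern, the helper below wraps any work package so that a save step runs as soon as its result is ready. The names withSave and save are hypothetical and not part of Bristlecone; save stands in for whichever function (from Bristlecone.Data or elsewhere) you use to write a result to disk, and a dummy computation takes the place of a model fit.

```fsharp
// Hypothetical helper (not part of Bristlecone): runs a side effect on the
// result of a work package as soon as it completes, then returns the result.
let withSave (save: 'result -> unit) (work: Async<'result>) : Async<'result> =
    async {
        let! result = work // wait for the analysis to finish
        save result        // e.g. write the result to disk
        return result      // pass the result on unchanged
    }

// Stand-in usage: a dummy computation instead of a model fit, and an
// in-memory list instead of a file on disk.
let saved = System.Collections.Generic.List<int>()
let package = async { return 42 } |> withSave (fun r -> saved.Add r)
let result = Async.RunSynchronously package
```

Because the wrapper returns the result unchanged, a wrapped work package keeps the same type as the original and could be queued in the same way.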

Orchestration Agent

An orchestration agent manages the running of work packages depending on your local computer's resources. You can access these features through the following namespace:

open Bristlecone.Workflow

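Three arguments are required to create an OrchestrationAgent: a logger, the number of work packages to run in parallel at any one time, and a boolean flag (set to false in the example below).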

First, let's use one of Bristlecone's built-in loggers to print the progress of each work package:

let logger = Logging.Console.logger 1000

This logger will print the current point in parameter space every thousand iterations, for each chain (along with process IDs). Next, let's create and set up the orchestration agent:

let orchestrator =
    Orchestration.OrchestrationAgent(logger, System.Environment.ProcessorCount, false)

fun datasets hypotheses engine ->

    // Orchestrate the analyses
    let work = workPackages datasets hypotheses engine

    let run () =
        // Post each work package to the orchestrator's queue
        work
        |> Seq.iter (Orchestration.OrchestrationMessage.StartWorkPackage >> orchestrator.Post)

    run ()

If the above code is supplied with datasets, hypotheses, and an EstimationEngine, it will schedule, queue and run the jobs until complete.
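To illustrate the general idea of capping how many jobs run at once, the sketch below uses Async.Parallel from FSharp.Core with its maxDegreeOfParallelism argument. This is a different technique from the OrchestrationAgent and says nothing about how the agent is implemented; it assumes FSharp.Core 4.7 or later, and the jobs are dummy computations rather than model fits.

```fsharp
// Stand-in work packages: each returns its own index after a short delay.
let jobs =
    seq {
        for i in 1 .. 8 do
            yield async {
                do! Async.Sleep 10
                return i
            }
    }

// Run at most `maxParallel` jobs at any one time. Async.Parallel returns
// the results as an array in the same order as the input sequence.
let maxParallel = System.Environment.ProcessorCount

let results =
    Async.Parallel(jobs, maxDegreeOfParallelism = maxParallel)
    |> Async.RunSynchronously
```

Unlike posting messages to an agent, this call blocks until every job has finished and returns all results together.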

