Techniques: Ensembles and Parallel Processing

Model-fitting and model-selection usually require many model runs: several hypotheses, often several datasets, and replicate fits of each. The total can easily reach hundreds of individual runs.

The Bristlecone.Workflow namespace includes tools to orchestrate such analyses in parallel. Using Bristlecone's OrchestrationAgent, you can queue jobs so that they run as processor cores become available.

Work Packages

To run your analyses in parallel, you must wrap each analysis as a work package. A work package is simply an async computation that returns an estimation result; its type is Async<ModelSystem.EstimationResult>. A simple example function that sets up work packages for every combination of dataset, hypothesis, and replicate is given below:

open Bristlecone

let replicates = 3 // number of replicate fits per dataset and hypothesis
let endCondition = Optimisation.EndConditions.afterIteration 100000 // stop each fit after 100,000 iterations

// Build one work package per dataset, hypothesis, and replicate.
let workPackages datasets hypotheses engine =
    seq {
        for d in datasets do
            for h in hypotheses do
                for _ in 1..replicates do
                    // The fit is deferred: it only runs once the package is started.
                    yield async { return Bristlecone.fit engine endCondition d h }
    }

You can perform any additional steps within each work package. A common approach is to save each model result to disk (e.g., using functions from the Bristlecone.Data namespace) within the async block in the above code, so that each result is available as soon as its run completes.
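The save-as-you-go pattern might look roughly like the sketch below. It is self-contained and does not call Bristlecone: `FitResult`, `fitModel`, and `saveResult` are hypothetical stand-ins for `ModelSystem.EstimationResult`, `Bristlecone.fit`, and whichever `Bristlecone.Data` function you choose, and the file naming is only an assumption for illustration.

```fsharp
open System.IO

// Hypothetical stand-in for ModelSystem.EstimationResult.
type FitResult = { Likelihood: float }

// Hypothetical stand-in for a call such as Bristlecone.fit.
let fitModel (datasetId: int) (hypothesisId: int) : FitResult =
    { Likelihood = float (datasetId * 10 + hypothesisId) }

// Hypothetical save function: writes one small text file per model run.
let saveResult (directory: string) (datasetId: int) (hypothesisId: int) (replicate: int) (result: FitResult) =
    let fileName = sprintf "d%i-h%i-r%i.txt" datasetId hypothesisId replicate
    File.WriteAllText(Path.Combine(directory, fileName), string result.Likelihood)

// A work package that fits, saves, and then returns the result.
let workPackage directory datasetId hypothesisId replicate =
    async {
        let result = fitModel datasetId hypothesisId
        // Saving inside the async block means the file exists as soon as this run ends,
        // even if other work packages are still running.
        saveResult directory datasetId hypothesisId replicate result
        return result
    }

// Usage: run a single package and save its output in a temporary folder.
let outDir =
    Directory.CreateDirectory(Path.Combine(Path.GetTempPath(), "work-package-sketch")).FullName

let result = workPackage outDir 1 2 1 |> Async.RunSynchronously
```

Because the package still returns its result after saving, it keeps the shape expected of a work package while leaving a copy on disk.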

When the EstimationEngine, datasets, and hypotheses are applied to this function, the resultant seq<Async<EstimationResult>> can be passed to an orchestration agent.
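One property of this design is worth illustrating: building the sequence does not run any fits, because each async block is only a description of work until something starts it. The sketch below shows this with plain F# and no Bristlecone calls; `fakeFit` and the string labels for datasets and hypotheses are hypothetical stand-ins.

```fsharp
let replicates = 3
let started = ref 0 // counts how many packages have actually run

// Hypothetical stand-in for Bristlecone.fit: returns a label instead of a result.
let fakeFit (dataset: string) (hypothesis: string) =
    started.Value <- started.Value + 1
    sprintf "%s fitted with %s" dataset hypothesis

let workPackages datasets hypotheses =
    seq {
        for d in datasets do
            for h in hypotheses do
                for _ in 1 .. replicates do
                    yield async { return fakeFit d h }
    }

// 2 datasets x 3 hypotheses x 3 replicates = 18 packages.
let packages =
    workPackages [ "site A"; "site B" ] [ "H1"; "H2"; "H3" ] |> Seq.toList

printfn "%i packages, %i started" packages.Length started.Value // prints "18 packages, 0 started"

// Only an explicit start runs a package.
let first = packages.Head |> Async.RunSynchronously
printfn "%s; %i started" first started.Value // prints "site A fitted with H1; 1 started"
```

This deferral is what lets a scheduler decide when each package runs, rather than every fit starting the moment the sequence is enumerated.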

Orchestration Agent

An orchestration agent manages the running of work packages depending on your local computer's resources. You can access these features through the following namespace:

open Bristlecone.Workflow

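Creating an OrchestrationAgent requires three arguments: a logging function (writeOut), the maximum number of work packages to run at once (maxSimultaneous), and a flag indicating whether to retain results (retainResults).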

First, let's use one of Bristlecone's built-in loggers to print the progress of each work package:

let logger = Logging.Console.logger 1000

This logger prints the current point in parameter space every thousand iterations, for each chain (along with process IDs). Next, let's create and set up the orchestration agent:

let orchestrator =
    Orchestration.OrchestrationAgent(logger, System.Environment.ProcessorCount, false)

fun datasets hypotheses engine ->

    // Orchestrate the analyses
    let work = workPackages datasets hypotheses engine

    let run () =
        work
        |> Seq.iter (Orchestration.OrchestrationMessage.StartWorkPackage >> orchestrator.Post)

    run ()

If the above code is supplied with datasets, hypotheses, and an EstimationEngine, it will schedule, queue and run the jobs until complete.
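To give a feel for how an agent of this kind can queue work and cap the number of jobs running at once, here is a minimal sketch built on F#'s `MailboxProcessor`. It is not Bristlecone's implementation: the `Message` type, `throttlingAgent`, and `square` are hypothetical names, jobs return `int` rather than an estimation result, and failures are simply discarded.

```fsharp
open System.Collections.Generic

// Messages understood by the sketch agent (hypothetical names).
type Message =
    | Start of Async<int>                      // queue a new job
    | Finished of int                          // posted by a job when it completes
    | Failed of exn                            // posted by a job that threw
    | GetResults of AsyncReplyChannel<int list> // ask for the results collected so far

let throttlingAgent (maxSimultaneous: int) =
    MailboxProcessor<Message>.Start(fun inbox ->
        let queue = Queue<Async<int>>()

        // Wait for the next message, then update the running count and results.
        let rec loop running results =
            async {
                let! msg = inbox.Receive()
                match msg with
                | Start job ->
                    queue.Enqueue job
                    return! startQueued running results
                | Finished r -> return! startQueued (running - 1) (r :: results)
                | Failed _ -> return! startQueued (running - 1) results
                | GetResults reply ->
                    reply.Reply results
                    return! loop running results
            }

        // Start queued jobs while there is spare capacity.
        and startQueued running results =
            async {
                if running < maxSimultaneous && queue.Count > 0 then
                    let job = queue.Dequeue()
                    let runJob =
                        async {
                            try
                                let! r = job
                                inbox.Post(Finished r)
                            with e ->
                                inbox.Post(Failed e)
                        }
                    Async.Start runJob // runs on the thread pool
                    return! startQueued (running + 1) results
                else
                    return! loop running results
            }

        loop 0 [])

// Usage: queue five jobs, with at most two running at any time.
let square i =
    async {
        do! Async.Sleep 50
        return i * i
    }

let agent = throttlingAgent 2
for i in 1 .. 5 do
    agent.Post(Start(square i))
```

Posting returns immediately; the agent works through the queue in the background. In this sketch, `agent.PostAndReply(fun ch -> GetResults ch)` returns whatever results have been collected at the time of the call.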
