bristlecone


Techniques: Ensembles and Parallel Processing

Model fitting and model selection usually require many model runs: for multiple hypotheses, and sometimes for multiple datasets. Together these can add up to hundreds of individual model runs.

Bristlecone can orchestrate the parallel running of such analyses through the Bristlecone.Workflow namespace. Using Bristlecone's OrchestrationAgent, you can queue jobs that start as processor cores become available.

Work Packages

To run your analyses in parallel, you must wrap each analysis as a work package. A work package is simply an async computation that returns an estimation result, that is, a value of type Async<ModelSystem.EstimationResult>. The example function below sets up work packages for a number of datasets and hypotheses, with several replicates of each analysis:

open Bristlecone

let replicates = 3 // number of replications per analysis
let endCondition = Optimisation.EndConditions.afterIteration 100000

let workPackages datasets hypotheses engine =
    seq {
        for d in datasets do
            for h in hypotheses do
                for _ in 1 .. replicates do
                    // Wrapping the fit in async defers it until the package is started.
                    yield async {
                        return Bristlecone.fit engine endCondition d h
                    }
    }
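To see how the nested loops expand, the following self-contained sketch mirrors the structure of workPackages but replaces Bristlecone.fit with a hypothetical fakeFit that returns a string label, so it runs without Bristlecone. It also counts fit calls, to show that building the packages does not run any fits.

```fsharp
// Counts how many times the (hypothetical) fit function actually runs.
let fitCalls = ref 0

// Hypothetical stand-in for Bristlecone.fit: returns a label instead of an EstimationResult.
let fakeFit (dataset: string) (hypothesis: string) : string =
    fitCalls.Value <- fitCalls.Value + 1
    sprintf "%s/%s" dataset hypothesis

let replicates = 3

let workPackages (datasets: string list) (hypotheses: string list) : seq<Async<string>> =
    seq {
        for d in datasets do
            for h in hypotheses do
                for _ in 1 .. replicates do
                    // The async block only describes the work; nothing runs yet.
                    yield async { return fakeFit d h }
    }

// 2 datasets x 3 hypotheses x 3 replicates = 18 work packages, none of them run yet.
let packages = workPackages [ "siteA"; "siteB" ] [ "h1"; "h2"; "h3" ] |> List.ofSeq

// Starting one package runs exactly one fit and returns its result.
let first = packages.Head |> Async.RunSynchronously
```

Because each package is only a description of work, the caller (here Async.RunSynchronously, in the tutorial the orchestration agent) decides when and how many fits run at once.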

You can perform any additional steps within each work package. A common approach is to save each model result to disk (e.g., using functions from the Bristlecone.Data namespace) within the async block in the above code, so that each result is available as soon as its run completes.
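The pattern of saving inside the async block can be sketched without Bristlecone. This example does not use the Bristlecone.Data functions (their signatures are not shown here); instead, a hypothetical fakeFit stands in for the model fit and plain System.IO writes a small CSV file before the package returns its result.

```fsharp
open System.IO

// Hypothetical stand-in for a model fit: returns (hypothesis name, -log likelihood).
let fakeFit (name: string) : string * float = name, 42.0

// A work package that writes its own result to disk before returning it,
// so the file exists as soon as this run finishes, regardless of other runs.
let workPackageWithSave (outputDir: string) (name: string) : Async<string * float> =
    async {
        let (hypothesis, negLogLik) = fakeFit name
        let path = Path.Combine(outputDir, hypothesis + ".csv")
        File.WriteAllText(path, sprintf "hypothesis,negLogLik\n%s,%g" hypothesis negLogLik)
        return hypothesis, negLogLik
    }

let outDir = Path.Combine(Path.GetTempPath(), "bristlecone-sketch")
Directory.CreateDirectory(outDir) |> ignore
let saved = workPackageWithSave outDir "hypothesis1" |> Async.RunSynchronously
```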

Applying this function to an EstimationEngine, datasets, and hypotheses gives a seq<Async<EstimationResult>>, which can be passed to an orchestration agent.
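Because each work package is an ordinary Async value, such a sequence can also be run with plain FSharp.Core functions. As a sketch that is not part of Bristlecone, the Async.Parallel overload taking maxDegreeOfParallelism (FSharp.Core 4.7 or later) caps how many packages run at once and returns all results in input order. Unlike the orchestration agent, it waits for every package to finish before returning anything and provides no progress logging. The squaring "fits" below are hypothetical stand-ins.

```fsharp
// Hypothetical stand-ins for work packages: each "fit" just squares its index.
let work : seq<Async<int>> =
    seq { for i in 1 .. 12 -> async { return i * i } }

// Run at most one package per processor core; results come back in input order.
let results : int[] =
    Async.Parallel(work, maxDegreeOfParallelism = System.Environment.ProcessorCount)
    |> Async.RunSynchronously
```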

Orchestration Agent

An orchestration agent manages the running of work packages according to the resources available on your computer. You can access these features through the following namespace:

open Bristlecone.Workflow

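Creating an OrchestrationAgent requires three arguments: a logging function (writeOut) that receives progress events, the maximum number of work packages to run at the same time (maxSimultaneous), and a flag (retainResults) that controls whether the agent keeps completed results.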

First, let's use one of Bristlecone's built-in loggers to print the progress of each work package:

let logger = Logging.Console.logger 1000

This logger prints the current point in parameter space every 1,000 iterations for each chain, along with process IDs. Next, create and set up the orchestration agent, allowing one work package per processor core and not retaining results:

let orchestrator = Orchestration.OrchestrationAgent(logger, System.Environment.ProcessorCount, false)

let runAll datasets hypotheses engine =
    // Build the work packages, then post each one to the agent, which starts it
    // whenever fewer than maxSimultaneous packages are running.
    workPackages datasets hypotheses engine
    |> Seq.iter (Orchestration.OrchestrationMessage.StartWorkPackage >> orchestrator.Post)

Given datasets, hypotheses, and an EstimationEngine, the above code queues every work package, and the agent schedules and runs them in parallel until all are complete.
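To make the queueing behaviour concrete, the following self-contained sketch implements a much simpler throttling agent with F#'s MailboxProcessor. It is not Bristlecone's implementation: the QueueMessage type and startThrottledAgent function are hypothetical, and unlike Bristlecone's agent (whose messages also cover failed and cancelled work) it has no logging or error handling. It runs ten short jobs with a limit of two and records the peak number in flight.

```fsharp
open System.Collections.Concurrent
open System.Collections.Generic
open System.Threading

/// Hypothetical messages for this sketch, loosely modelled on Bristlecone's
/// StartWorkPackage and Finished cases.
type QueueMessage<'T> =
    | StartWork of Async<'T>
    | WorkDone of 'T

/// Start an agent that runs at most `maxSimultaneous` jobs at once, queues the
/// rest, and calls `onResult` as each job completes. (No error handling: a job
/// that throws never frees its slot.)
let startThrottledAgent (maxSimultaneous: int) (onResult: 'T -> unit) =
    MailboxProcessor<QueueMessage<'T>>.Start(fun inbox ->
        let waiting = Queue<Async<'T>>()
        // Run a job on the thread pool and report back to the agent when it is done.
        let launch (job: Async<'T>) =
            Async.Start(async {
                let! result = job
                inbox.Post(WorkDone result) })
        let rec loop running =
            async {
                let! msg = inbox.Receive()
                match msg with
                | StartWork job when running < maxSimultaneous ->
                    launch job
                    return! loop (running + 1)
                | StartWork job ->
                    waiting.Enqueue job // no free slot, so wait in the queue
                    return! loop running
                | WorkDone result ->
                    onResult result
                    if waiting.Count > 0 then
                        launch (waiting.Dequeue()) // hand the freed slot to the next job
                        return! loop running
                    else
                        return! loop (running - 1)
            }
        loop 0)

// Usage: ten jobs, at most two at a time, recording the peak number in flight.
let results = ConcurrentBag<int>()
let inFlight = ref 0
let peak = ref 0
let gate = obj ()

let job (i: int) =
    async {
        lock gate (fun () ->
            inFlight.Value <- inFlight.Value + 1
            peak.Value <- max peak.Value inFlight.Value)
        do! Async.Sleep 20
        lock gate (fun () -> inFlight.Value <- inFlight.Value - 1)
        return i
    }

let agent = startThrottledAgent 2 (fun r -> results.Add r)
for i in 1 .. 10 do
    agent.Post(StartWork(job i))

// Poll (for up to about 5 s) until all ten results have been reported.
let mutable waitedMs = 0
while results.Count < 10 && waitedMs < 5000 do
    Thread.Sleep(50)
    waitedMs <- waitedMs + 50
```

Posting returns immediately; jobs beyond the limit wait in the agent's queue and start only when a running job reports back.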
