Model fitting and model selection usually require
multiple model runs for multiple hypotheses, and sometimes
for multiple datasets, resulting in hundreds of individual
model runs.
Bristlecone includes capabilities to orchestrate the parallel
running of such complex analyses within the Bristlecone.Workflow
namespace. Using Bristlecone's OrchestrationAgent, you can queue
jobs to run when processor cores are available.
To run your analyses in parallel, you must wrap up each analysis
as a work package. A work package is simply an async computation
that returns an estimation result, with the type
Async<ModelSystem.EstimationResult>. A simple example function to
set up work packages for a number of datasets, hypotheses, and replicates
is given below:
open Bristlecone

let replicates = 3 // number of replicate runs per dataset and hypothesis
let endCondition = Optimisation.EndConditions.afterIteration 100000

// Builds one work package per dataset, hypothesis, and replicate.
let workPackages datasets hypotheses engine =
    seq {
        for d in datasets do
            for h in hypotheses do
                for _ in 1 .. replicates do
                    yield async {
                        return Bristlecone.fit engine endCondition d h
                    }
    }
You can perform any additional steps within each work package. A common
approach is to save each model result to disk (for example, using functions
from the Bristlecone.Data namespace) within the async
block in the above code, so that each result is available as soon as
its run completes.
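The save-as-you-go pattern described above can be sketched without committing to any particular Bristlecone function. In this minimal sketch, withSave, fit, and save are hypothetical names introduced for illustration only: fit stands in for a call such as Bristlecone.fit, and save for whichever function from the Bristlecone.Data namespace you choose to use.

```fsharp
// Hypothetical helper (not part of Bristlecone): combines a fitting step and
// a save step into a single work package. `fit` and `save` are stand-ins
// supplied by the caller.
let withSave (save: 'result -> unit) (fit: unit -> 'result) : Async<'result> =
    async {
        let result = fit ()   // run the (long-running) model fit
        save result           // persist as soon as this fit completes
        return result         // still hand the result back to the caller
    }

// Usage with stand-in functions: an int plays the role of an estimation result,
// and an in-memory list plays the role of the disk.
let saved = System.Collections.Generic.List<int>()
let package = withSave (fun r -> saved.Add r) (fun () -> 42)
let returned = Async.RunSynchronously package
```

Because the save step runs inside the async block, a result is persisted when its own run finishes, even if other work packages are still queued.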
When the EstimationEngine, datasets, and hypotheses are
applied to this function, the resultant seq<Async<EstimationResult>>
can be passed to an orchestration agent.
An orchestration agent manages the running of work packages depending on
your local computer's resources. You can access these features through the following
namespace:
open Bristlecone.Workflow
There are three arguments required to create an OrchestrationAgent:
- A logger (LogEvent -> unit), which consumes log messages from all threads. You
must ensure that the logger is safe to use from multiple threads. Bristlecone
includes some loggers that are thread-safe. In addition, the Bristlecone.Charts.R
NuGet package contains interoperability with R to produce real-time traces of the movement
of each analysis through parameter space; this is very useful, for example,
with MCMC optimisation techniques to compare chains.
- The number of work packages to run in parallel. The recommended approach is to set
this to System.Environment.ProcessorCount, which gives the number of logical
processors available on your system.
- Whether to cache results in the resultant object. This increases memory usage, but allows
post-hoc analysis of the results. A common approach is to set this to false
but save each result to disk within the work package itself (see above). The results
can then be re-loaded at a later date for diagnostics and further analysis.
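If you write your own logger rather than using a built-in one, one simple way to make it safe to call from several threads is to serialise calls behind a lock. The following is a minimal sketch under that assumption; synchronised is a hypothetical helper, not a Bristlecone function, and strings stand in for LogEvent values so that the example is self-contained.

```fsharp
// Hypothetical helper (not part of Bristlecone): serialises calls to any
// logging function so that it can be shared between work packages running
// on different threads.
let synchronised (log: 'event -> unit) : 'event -> unit =
    let gate = obj ()   // private lock object
    fun event -> lock gate (fun () -> log event)

// Usage with a stand-in event type (string) and a sink that is not
// thread-safe on its own.
let received = System.Collections.Generic.List<string>()
let safeLog = synchronised (fun (msg: string) -> received.Add msg)

// Log from 100 async computations run in parallel.
[ for i in 1 .. 100 -> async { safeLog (sprintf "event %i" i) } ]
|> Async.Parallel
|> Async.RunSynchronously
|> ignore
```

A lock keeps the sketch short; a logger that does slow work (such as plotting) may be better served by posting events to a queue so that model runs are not held up while a message is written.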
First, let's use one of Bristlecone's built-in loggers to print the progress of each
work package:
let logger = Logging.Console.logger 1000
This logger will print the current point in parameter space every thousand
iterations, for each chain (along with process IDs). Next, let's create and set up
the orchestration agent:
let orchestrator = Orchestration.OrchestrationAgent(logger, System.Environment.ProcessorCount, false)

fun datasets hypotheses engine ->

    // Orchestrate the analyses
    let work = workPackages datasets hypotheses engine
    let run() =
        work
        |> Seq.iter (
            Orchestration.OrchestrationMessage.StartWorkPackage
            >> orchestrator.Post)

    run()
If the above code is supplied with datasets, hypotheses, and an
EstimationEngine, it will schedule, queue and run the jobs until complete.
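To give a feel for what scheduling and queueing involve, here is a minimal sketch of a throttling agent built on F#'s MailboxProcessor. It is not Bristlecone's implementation: Message, throttledAgent, and onResult are hypothetical names, and the sketch omits the failure and cancellation handling that a real orchestrator needs.

```fsharp
// Messages understood by the sketch agent (hypothetical, not Bristlecone's type).
type Message<'T> =
    | Start of Async<'T>   // queue a new job
    | Done of 'T           // a running job has finished

// Runs queued jobs with at most `maxSimultaneous` in flight at any time.
let throttledAgent (maxSimultaneous: int) (onResult: 'T -> unit) =
    MailboxProcessor<Message<'T>>.Start(fun inbox ->
        let queue = System.Collections.Generic.Queue<Async<'T>>()

        // Start queued jobs until all slots are taken; returns the new running count.
        let rec startQueued running =
            if running < maxSimultaneous && queue.Count > 0 then
                let job = queue.Dequeue()
                async {
                    let! result = job
                    inbox.Post(Done result)   // report back to the agent
                }
                |> Async.Start
                startQueued (running + 1)
            else
                running

        let rec loop running =
            async {
                let! msg = inbox.Receive()
                let stillRunning =
                    match msg with
                    | Start job ->
                        queue.Enqueue job
                        running
                    | Done result ->
                        onResult result       // a slot has been freed
                        running - 1
                return! loop (startQueued stillRunning)
            }

        loop 0)

// Usage: five stand-in jobs, at most two running at once.
let results = System.Collections.Concurrent.ConcurrentBag<int>()
let agent = throttledAgent 2 (fun (r: int) -> results.Add r)

let square i =
    async {
        do! Async.Sleep 50
        return i * i
    }

for i in 1 .. 5 do
    agent.Post(Start(square i))
```

Posting a job returns immediately; the jobs themselves run in the background, so a script that must wait for completion needs its own signal (for example, counting finished results).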