Running Pathfinder on Turing.jl models

This tutorial demonstrates how Turing can be used with Pathfinder.

We'll demonstrate with a regression example.

using AdvancedHMC, Pathfinder, Random, Turing
Random.seed!(39)

@model function regress(x, y)
    α ~ Normal()
    β ~ Normal()
    σ ~ truncated(Normal(); lower=0)
    y ~ product_distribution(Normal.(α .+ β .* x, σ))
end
x = 0:0.1:10
y = @. 2x + 1.5 + randn() * 0.2

model = regress(collect(x), y)
n_chains = 8

For convenience, pathfinder and multipathfinder can take Turing models as inputs and produce MCMCChains.Chains objects as outputs.

result_single = pathfinder(model; ndraws=1_000)

Single-path Pathfinder result
  tries: 1
  draws: 1000
  fit iteration: 22 (total: 27)
  fit ELBO: -1.95 ± 0.03
  fit distribution: Distributions.MvNormal{Float64, Pathfinder.WoodburyPDMat{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}, Matrix{Float64}, Matrix{Float64}, Pathfinder.WoodburyPDFactorization{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}, LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}, LinearAlgebra.UpperTriangular{Float64, Matrix{Float64}}}}, Vector{Float64}}(
dim: 3
μ: [1.4759589631397645, 2.006026941373265, -1.5478356855806579]
Σ: [0.0018286217896122648 -0.0002601667183752173 -0.0005873937036806257; -0.0002601667183752181 5.100356929144418e-5 5.198418656871739e-5; -0.0005873937036806262 5.198418656871667e-5 0.0064369057163803384]
)

result_multi = multipathfinder(model, 1_000; nruns=n_chains)

Multi-path Pathfinder result
  runs: 8
  draws: 1000
  Pareto shape diagnostic: 0.79 (bad)

Here, the Pareto shape diagnostic indicates that it is likely safe to use these draws to compute posterior estimates.

When passed a DynamicPPL.Model, Pathfinder also gives access to the posterior draws in a familiar Chains object.

result_multi.draws_transformed

Chains MCMC chain (1000×3×1 Array{Float64, 3}):

Iterations        = 1:1:1000
Number of chains  = 1
Samples per chain = 1000
parameters        = α, β, σ

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ⋯
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64   ⋯

           α    1.4767    0.0439    0.0015    864.4081    950.0646    0.9993   ⋯
           β    2.0058    0.0074    0.0002   1124.0753   1068.3139    0.9992   ⋯
           σ    0.2172    0.0157    0.0005    956.9526    925.4764    0.9993   ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           α    1.3951    1.4465    1.4779    1.5045    1.5666
           β    1.9917    2.0008    2.0059    2.0108    2.0208
           σ    0.1887    0.2063    0.2168    0.2272    0.2510

We can also use these posterior draws to initialize MCMC sampling.

init_params = collect.(eachrow(result_multi.draws_transformed.value[1:n_chains, :, 1]))

8-element Vector{Vector{Float64}}:
 [1.4691467301037398, 2.0111813084487418, 0.22204577368872686]
 [1.4225743419997652, 2.0150513003204957, 0.19291870250704835]
 [1.375596352156451, 2.0205385232497464, 0.23071888349800143]
 [1.436418141335313, 2.0118829758483967, 0.20678691584596665]
 [1.4770215057770222, 2.0025924142409632, 0.2236454231227252]
 [1.4483836340938132, 2.015921280636908, 0.21591143831903215]
 [1.5451189039344957, 1.9945619185153087, 0.25102646725887745]
 [1.4018133420228451, 2.02161765671409, 0.2125642337875723]

chns = sample(model, Turing.NUTS(), MCMCThreads(), 1_000, n_chains; init_params, progress=false)

Chains MCMC chain (1000×15×8 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 8
Samples per chain = 1000
Wall duration     = 5.77 seconds
Compute duration  = 4.12 seconds
parameters        = α, β, σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ⋯
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64   ⋯

           α    1.4750    0.0429    0.0007   3592.0524   3898.4925    1.0030   ⋯
           β    2.0061    0.0074    0.0001   3763.7934   4436.1419    1.0026   ⋯
           σ    0.2167    0.0157    0.0002   5087.3685   4657.4357    1.0008   ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           α    1.3902    1.4469    1.4753    1.5036    1.5601
           β    1.9917    2.0011    2.0062    2.0109    2.0207
           σ    0.1884    0.2058    0.2158    0.2267    0.2491

We can use Pathfinder's estimate of the metric and only perform enough warm-up to tune the step size.

inv_metric = result_multi.pathfinder_results[1].fit_distribution.Σ
metric = Pathfinder.RankUpdateEuclideanMetric(inv_metric)
kernel = HMCKernel(Trajectory{MultinomialTS}(Leapfrog(0.0), GeneralisedNoUTurn()))
adaptor = StepSizeAdaptor(0.8, 1.0)  # adapt only the step size
nuts = AdvancedHMC.HMCSampler(kernel, metric, adaptor)

n_adapts = 50
n_draws = 1_000
chns = sample(
    model,
    externalsampler(nuts),
    MCMCThreads(),
    n_draws + n_adapts,
    n_chains;
    n_adapts,
    init_params,
    progress=false,
)[n_adapts + 1:end, :, :]  # drop warm-up draws

Chains MCMC chain (1000×16×8 Array{Float64, 3}):

Iterations        = 51:1:1050
Number of chains  = 8
Samples per chain = 1000
Wall duration     = 4.23 seconds
Compute duration  = 3.87 seconds
parameters        = α, β, σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, is_adapt

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ⋯
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64   ⋯

           α    1.4751    0.0449    0.0006   6678.2638   5558.2071    1.0011   ⋯
           β    2.0059    0.0101    0.0002   6577.4152   5374.9491    1.0010   ⋯
           σ    0.2182    0.0351    0.0009   6448.5094   4827.2103    1.0002   ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           α    1.3920    1.4460    1.4746    1.5040    1.5590
           β    1.9918    2.0011    2.0061    2.0110    2.0204
           σ    0.1883    0.2060    0.2160    0.2267    0.2507

See Initializing HMC with Pathfinder for further examples.