Running Pathfinder on Turing.jl models

This tutorial shows how to run Pathfinder on Turing.jl models.

We'll use a simple linear regression as the running example.

using AbstractMCMC, AdvancedHMC, DynamicPPL, FlexiChains, Pathfinder, Random, Turing
Random.seed!(39)

@model function regress(x)
    α ~ Normal()
    β ~ Normal()
    σ ~ truncated(Normal(); lower=0)
    μ = α .+ β .* x
    y ~ product_distribution(Normal.(μ, σ))
end
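x = 0:0.1:10
true_params = (; α=1.5, β=2, σ=2)
# simulate data by sampling y from the model with the parameters fixed to true_params
(; y) = rand(regress(x) | true_params)

# condition the model on the simulated observations
model = regress(x) | (; y)
n_chains = 8  # number of Pathfinder runs and MCMC chains used below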
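To make the generative model concrete, here is a minimal sketch of the same simulation written without Turing. It assumes only what the model definition states: a linear mean and independent Normal noise with standard deviation σ. The names rng and y_sketch are illustrative, and the values will not match the y drawn above because the random streams differ.

```julia
using Random

rng = MersenneTwister(39)            # any seed works; 39 mirrors the tutorial
x = 0:0.1:10
α, β, σ = 1.5, 2.0, 2.0              # the true parameters used above

μ = α .+ β .* x                      # linear predictor, one entry per x
y_sketch = μ .+ σ .* randn(rng, length(x))   # independent Normal(μ[i], σ) draws
```

The range 0:0.1:10 has 101 points, so the model has 101 observations and 3 parameters.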

For convenience, pathfinder and multipathfinder accept Turing models as inputs and return the draws as MCMCChains.Chains or FlexiChains.VNChain objects. No extra step is needed: run Pathfinder as usual, and the chains representation of the draws (Chains by default) is stored in the draws_transformed field of the result.

result_single = pathfinder(model; ndraws=1_000)

Single-path Pathfinder result
  tries: 1
  draws: 1000
  fit iteration: 14 (total: 15)
  fit ELBO: -213.66 ± 0.09
  fit distribution: MvNormal{Float64, Pathfinder.WoodburyPDMat{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}, Matrix{Float64}, Matrix{Float64}, Pathfinder.WoodburyPDFactorization{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}, LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}, LinearAlgebra.UpperTriangular{Float64, Matrix{Float64}}}}, Vector{Float64}}(
dim: 3
μ: [1.6508971085750266, 1.9311753921107155, 0.5801261338729247]
Σ: [0.11087795709346265 -0.01652728003870209 -0.001420801533652816; -0.016527280038702102 0.003407135732868846 0.000194334574183673; -0.0014208015336527936 0.0001943345741836763 0.004718298769786383]
)
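The printed μ and Σ can be read directly. The sketch below copies rounded values from the output above and assumes that the fit lives in unconstrained space, with the positive parameter σ represented by log(σ); that assumption is not stated in the output itself. The names μ_fit, Σ_fit, σ_center, sds and corr are illustrative.

```julia
using LinearAlgebra

# Values copied from the printed fit distribution above, rounded for readability.
μ_fit = [1.650897, 1.931175, 0.580126]
Σ_fit = [
     0.110878    -0.0165273   -0.00142080
    -0.0165273    0.00340714   0.000194335
    -0.00142080   0.000194335  0.00471830
]

# Assumption: σ > 0 is represented by log(σ) in unconstrained space.
σ_center = exp(μ_fit[3])             # back-transform the third coordinate

sds = sqrt.(diag(Σ_fit))             # marginal standard deviations
corr = Σ_fit ./ (sds * sds')         # correlation matrix of the fitted normal
```

Under that assumption, σ_center is roughly 1.79, and the fitted correlation between α and β is about -0.85, so the fit captures a strong negative dependence between intercept and slope.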

result_single.draws_transformed

Chains MCMC chain (1000×6×1 Array{Float64, 3}):

Iterations        = 1:1:1000
Number of chains  = 1
Samples per chain = 1000
parameters        = α, β, σ
internals         = logprior, loglikelihood, lp

Use `describe(chains)` for summary statistics and quantiles.

To request a different chain type (e.g. VNChain), pass it as the chain_type keyword argument.

pathfinder(model; ndraws=1_000, chain_type=VNChain).draws_transformed

FlexiChain (1000 iterations, 1 chain)
↓ iter=1:1000 | → chain=1:1

Parameter type   VarName
Parameters       α, β, σ
Extra keys       :logprior, :loglikelihood, :lp

Note that while Turing's sample methods default to initializing parameters from the prior with InitFromPrior, Pathfinder defaults to uniformly sampling them in the range [-2, 2] in unconstrained space (equivalent to Turing's InitFromUniform(-2, 2)). To use Turing's default in Pathfinder, specify init_sampler=InitFromPrior().
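As a rough illustration of what uniform initialization in unconstrained space implies, the sketch below draws three coordinates on [-2, 2] and maps them back to the model's parameters. It assumes that σ is mapped to unconstrained space by a log transform; it does not call Pathfinder or Turing, and the names u, α0, β0 and σ0 are illustrative.

```julia
using Random

rng = MersenneTwister(0)

# Uniform draws on [-2, 2] for the three unconstrained coordinates.
u = -2 .+ 4 .* rand(rng, 3)

# Assumption: the positive parameter σ is mapped to unconstrained space by log,
# so the inverse map is exp. α and β are already unconstrained.
α0, β0, σ0 = u[1], u[2], exp(u[3])
```

Under this assumption the initial σ always lies between exp(-2) ≈ 0.135 and exp(2) ≈ 7.39, whatever the prior looks like.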

result_multi = multipathfinder(model, 1_000; nruns=n_chains, init_sampler=InitFromPrior())

Multi-path Pathfinder result
  runs: 8
  draws: 1000
  Pareto shape diagnostic: 0.27 (good)

The Pareto shape diagnostic (0.27, reported as good) indicates that it is likely safe to use these draws to compute posterior estimates.
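For intuition about what the diagnostic measures: multi-path Pathfinder pools the draws from its runs by importance resampling, and the Pareto shape describes how heavy the tail of the importance weights is. The following is a simplified sketch of plain importance resampling, without the Pareto smoothing step, on a hypothetical one-dimensional target and proposal. It is not Pathfinder's implementation, and all names in it are illustrative.

```julia
using Random, Statistics

rng = MersenneTwister(1)

# Hypothetical 1-D example: target p = Normal(0, 1), proposal q = Normal(0, 2).
logp(z) = -0.5 * z^2                   # log density of p, up to a constant
logq(z) = -0.5 * (z / 2)^2             # log density of q, up to a constant

z = 2 .* randn(rng, 5_000)             # draws from the proposal
logw = logp.(z) .- logq.(z)            # unnormalized log importance weights
w = exp.(logw .- maximum(logw))        # subtract the max for numerical stability
w ./= sum(w)                           # normalize so the weights sum to 1

# Resample with replacement in proportion to the weights (inverse-CDF lookup).
cdf = cumsum(w)
idx = [min(searchsortedfirst(cdf, rand(rng)), length(z)) for _ in 1:1_000]
resampled = z[idx]
```

Here the proposal is wider than the target, so the weights are bounded and the resampled draws have a standard deviation close to 1. When the proposal misses regions of the target, a few weights dominate, which is the situation a large Pareto shape value flags.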

chns_pf = result_multi.draws_transformed
describe(chns_pf)

Chains MCMC chain (1000×6×1 Array{Float64, 3}):

Iterations        = 1:1:1000
Number of chains  = 1
Samples per chain = 1000
parameters        = α, β, σ
internals         = logprior, loglikelihood, lp

Summary Statistics

  parameters      mean       std      mcse    ess_bulk   ess_tail      rhat   ess_per_sec 
      Symbol   Float64   Float64   Float64     Float64    Float64   Float64       Missing 

           α    1.6353    0.3333    0.0109    947.6950   837.3904    0.9998       missing
           β    1.9345    0.0591    0.0019    951.1703   903.5186    1.0005       missing
           σ    1.8188    0.1276    0.0040   1039.0218   884.6355    1.0027       missing


Quantiles

  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           α    0.9125    1.4150    1.6323    1.8430    2.3111
           β    1.8159    1.8943    1.9343    1.9717    2.0563
           σ    1.5992    1.7266    1.8180    1.9046    2.0830

We can also use these draws to initialize MCMC sampling with InitFromParams. Below, each of the first n_chains Pathfinder draws becomes the starting point of one chain.

params = AbstractMCMC.to_samples(DynamicPPL.ParamsWithStats, chns_pf[1:n_chains, :, :])
initial_params = [InitFromParams(p.params) for p in vec(params)]

chns = sample(model, Turing.NUTS(), MCMCThreads(), 1_000, n_chains; initial_params, progress=false)
describe(chns)

┌ Warning: Only a single thread available: MCMC chains are not sampled in parallel
└ @ AbstractMCMC ~/.julia/packages/AbstractMCMC/mcqES/src/sample.jl:432
┌ Info: Found initial step size
└   ϵ = 0.046875
┌ Info: Found initial step size
└   ϵ = 0.049218750000000006
┌ Info: Found initial step size
└   ϵ = 0.025
┌ Info: Found initial step size
└   ϵ = 0.05
┌ Info: Found initial step size
└   ϵ = 0.0484375
┌ Info: Found initial step size
└   ϵ = 0.025
┌ Info: Found initial step size
└   ϵ = 0.05
┌ Info: Found initial step size
└   ϵ = 0.025
Chains MCMC chain (1000×17×8 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 8
Samples per chain = 1000
Wall duration     = 7.78 seconds
Compute duration  = 6.46 seconds
parameters        = α, β, σ
internals         = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, lp, logprior, loglikelihood

Summary Statistics

  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec 
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64       Float64 

           α    1.6417    0.3429    0.0083   1735.4593   1761.5648    1.0066      268.6885
           β    1.9328    0.0601    0.0013   2099.6991   2453.3953    1.0050      325.0811
           σ    1.8133    0.1278    0.0017   5393.5441   4908.2402    1.0006      835.0432


Quantiles

  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           α    0.9554    1.4163    1.6490    1.8707    2.2804
           β    1.8166    1.8926    1.9326    1.9724    2.0535
           σ    1.5850    1.7235    1.8057    1.8917    2.0892

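We can also use Pathfinder's estimate of the metric, here the covariance of the fit distribution from the first Pathfinder run, as the inverse metric, and perform only enough warm-up to tune the step size.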

inv_metric = result_multi.pathfinder_results[1].fit_distribution.Σ
metric = Pathfinder.RankUpdateEuclideanMetric(inv_metric)
# NUTS kernel with multinomial trajectory sampling
kernel = HMCKernel(Trajectory{MultinomialTS}(Leapfrog(0.0), GeneralisedNoUTurn()))
adaptor = StepSizeAdaptor(0.8, 1.0)  # adapt only the step size
nuts = AdvancedHMC.HMCSampler(kernel, metric, adaptor)

n_adapts = 50
n_draws = 1_000
chns = sample(
    model,
    externalsampler(nuts),
    MCMCThreads(),
    n_draws + n_adapts,
    n_chains;
    n_adapts,
    initial_params,
    progress=false,
)[(n_adapts + 1):end, :, :]  # drop warm-up draws
describe(chns)

┌ Warning: Only a single thread available: MCMC chains are not sampled in parallel
└ @ AbstractMCMC ~/.julia/packages/AbstractMCMC/mcqES/src/sample.jl:432
[ Info: Found initial step size 1.6
[ Info: Found initial step size 3.2
[ Info: Found initial step size 0.9
[ Info: Found initial step size 1.6
[ Info: Found initial step size 1.6
[ Info: Found initial step size 3.2
[ Info: Found initial step size 3.2
[ Info: Found initial step size 3.2
Chains MCMC chain (1000×18×8 Array{Float64, 3}):

Iterations        = 51:1:1050
Number of chains  = 8
Samples per chain = 1000
Wall duration     = 3.26 seconds
Compute duration  = 2.69 seconds
parameters        = α, β, σ
internals         = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, is_adapt, lp, logprior, loglikelihood

Summary Statistics

  parameters      mean       std      mcse     ess_bulk    ess_tail      rhat   ess_per_sec 
      Symbol   Float64   Float64   Float64      Float64     Float64   Float64       Float64 

           α    1.6394    0.3339    0.0034    9882.8805   6407.8910    1.0003     3678.0352
           β    1.9330    0.0591    0.0006   10477.8223   6162.0016    1.0016     3899.4501
           σ    1.8139    0.1269    0.0013   10356.6453   6152.3306    1.0003     3854.3525


Quantiles

  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           α    0.9968    1.4128    1.6373    1.8665    2.2877
           β    1.8183    1.8930    1.9329    1.9728    2.0484
           σ    1.5899    1.7233    1.8068    1.8951    2.0880
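To see why a covariance estimate is useful as an inverse metric, recall the textbook Euclidean-metric setup for HMC: momentum is drawn from Normal(0, M), the kinetic energy is 0.5 * p' * M⁻¹ * p, and positions move along M⁻¹ * p. The sketch below sets M⁻¹ to the α and β block of the Σ printed for the single-path fit earlier (rounded). It is a sketch under those assumptions, not a description of how AdvancedHMC or RankUpdateEuclideanMetric compute these quantities internally, and all names are illustrative.

```julia
using LinearAlgebra, Random

rng = MersenneTwister(2)

# α and β block of the Σ printed for the single-path fit, rounded.
Σ = [0.110878 -0.0165273; -0.0165273 0.00340714]   # plays the role of M⁻¹
L = cholesky(Symmetric(Σ)).L                        # Σ = L * L'

# Momentum p ~ Normal(0, M) with M = inv(Σ): solve L' * p = z for standard normal z.
z = randn(rng, 2)
p = L' \ z

kinetic = 0.5 * dot(p, Σ * p)     # kinetic energy 0.5 * p' * M⁻¹ * p
velocity = Σ * p                  # direction of the position update, M⁻¹ * p
```

The kinetic energy reduces to 0.5 * dot(z, z) and the velocity to L * z, so the dynamics are equivalent to running with an identity metric on a target that has been rescaled and decorrelated by L. If Σ is close to the posterior covariance, that rescaled target is close to a standard normal.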

See Initializing HMC with Pathfinder for further examples.