Running Pathfinder on Turing.jl models

This tutorial demonstrates how Turing can be used with Pathfinder.

We'll demonstrate with a regression example.

using AbstractMCMC, AdvancedHMC, DynamicPPL, FlexiChains, Pathfinder, Random, Turing
Random.seed!(39)

@model function regress(x)
    α ~ Normal()
    β ~ Normal()
    σ ~ truncated(Normal(); lower=0)
    μ = α .+ β .* x
    y ~ product_distribution(Normal.(μ, σ))
end
x = 0:0.1:10
true_params = (; α=1.5, β=2, σ=2)
# simulate data
y = rand(regress(x) | true_params)[@varname(y)]

model = regress(x) | (; y)
n_chains = 8

For convenience, pathfinder and multipathfinder can take Turing models as inputs and produce MCMCChains.Chains or FlexiChains.VNChain objects as outputs. To access this, we run Pathfinder normally; the chains representation of the draws (defaulting to Chains) is stored in draws_transformed.

result_single = pathfinder(model; ndraws=1_000)
Single-path Pathfinder result
  tries: 1
  draws: 1000
  fit iteration: 26 (total: 26)
  fit ELBO: -213.38 ± 0.06
  fit distribution: MvNormal{Float64, Pathfinder.WoodburyPDMat{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}, Matrix{Float64}, Matrix{Float64}, Pathfinder.WoodburyPDFactorization{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}, LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}, LinearAlgebra.UpperTriangular{Float64, Matrix{Float64}}}}, Vector{Float64}}(
dim: 3
μ: [1.6508971085470592, 1.9311753921149317, 0.5801261338911265]
Σ: [0.12833559665737235 -0.024925831081334386 -0.003747748546056793; -0.024925831081334372 0.007926736754357806 0.000559861078137877; -0.0037477485460567896 0.0005598610781378777 0.00621065939801297]
)
result_single.draws_transformed
Chains MCMC chain (1000×6×1 Array{Float64, 3}):

Iterations        = 1:1:1000
Number of chains  = 1
Samples per chain = 1000
parameters        = α, β, σ
internals         = logprior, loglikelihood, logjoint

Use `describe(chains)` for summary statistics and quantiles.

To request a different chain type (e.g. VNChain), we can specify the chain_type directly.

pathfinder(model; ndraws=1_000, chain_type=VNChain).draws_transformed
╭─FlexiChain (1000 iterations, 1 chain) ───────────────────────────────────────────────────────────╮
 ↓ iter  = 1:1000                                                                                 
 → chain = 1:1                                                                                    
                                                                                                  
 Parameters (3) ── VarName                                                                        
  Float64  α, β, σ                                                                                
                                                                                                  
 Extras (3)                                                                                       
  Float64  logprior, loglikelihood, logjoint                                                      
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

Note that while Turing's sample methods default to initializing parameters from the prior with InitFromPrior, Pathfinder defaults to uniformly sampling them in the range [-2, 2] in unconstrained space (equivalent to Turing's InitFromUniform(-2, 2)). To use Turing's default in Pathfinder, specify init_sampler=InitFromPrior().

result_multi = multipathfinder(model, 1_000; nruns=n_chains, init_sampler=InitFromPrior())
Multi-path Pathfinder result
  runs: 8
  draws: 1000
  Pareto shape diagnostic: 0.78 (bad)

The Pareto shape diagnostic indicates that it is likely safe to use these draws to compute posterior estimates.

chns_pf = result_multi.draws_transformed
describe(chns_pf)
Chains MCMC chain (1000×6×1 Array{Float64, 3}):

Iterations        = 1:1:1000
Number of chains  = 1
Samples per chain = 1000
parameters        = α, β, σ
internals         = logprior, loglikelihood, logjoint

Summary Statistics

  parameters      mean       std      mcse    ess_bulk   ess_tail      rhat   ess_per_sec 
      Symbol   Float64   Float64   Float64     Float64    Float64   Float64       Missing 

           α    1.6334    0.3551    0.0110   1049.8243   965.0031    1.0009       missing
           β    1.9329    0.0624    0.0020   1000.5126   856.2056    1.0001       missing
           σ    1.8213    0.1211    0.0040    933.3720   765.2233    0.9998       missing


Quantiles

  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           α    0.9865    1.3863    1.6219    1.8519    2.4077
           β    1.8024    1.8944    1.9360    1.9780    2.0532
           σ    1.6055    1.7260    1.8177    1.9094    2.0835

We can also use these draws to initialize MCMC sampling with InitFromParams.

params = AbstractMCMC.to_samples(DynamicPPL.ParamsWithStats, chns_pf[1:n_chains, :, :], model)
initial_params = [InitFromParams(p.params) for p in vec(params)]
chns = sample(model, Turing.NUTS(), MCMCThreads(), 1_000, n_chains; initial_params, progress=false)
describe(chns)
┌ Warning: Only a single thread available: MCMC chains are not sampled in parallel
@ AbstractMCMC ~/.julia/packages/AbstractMCMC/C1aKp/src/sample.jl:544
┌ Info: Found initial step size
  ϵ = 0.05
┌ Info: Found initial step size
  ϵ = 0.05
┌ Info: Found initial step size
  ϵ = 0.025
┌ Info: Found initial step size
  ϵ = 0.025
┌ Info: Found initial step size
  ϵ = 0.05
┌ Info: Found initial step size
  ϵ = 0.025
┌ Info: Found initial step size
  ϵ = 0.05
┌ Info: Found initial step size
  ϵ = 0.05
Chains MCMC chain (1000×17×8 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 8
Samples per chain = 1000
Wall duration     = 4.2 seconds
Compute duration  = 2.69 seconds
parameters        = α, β, σ
internals         = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint

Summary Statistics

  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec 
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64       Float64 

           α    1.6470    0.3406    0.0058   3502.6815   3922.4493    1.0015     1300.6615
           β    1.9315    0.0597    0.0010   3508.4206   4116.5602    1.0014     1302.7926
           σ    1.8141    0.1273    0.0018   4825.9828   4624.0206    1.0020     1792.0471


Quantiles

  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           α    0.9806    1.4112    1.6499    1.8763    2.3061
           β    1.8156    1.8920    1.9307    1.9714    2.0487
           σ    1.5881    1.7246    1.8083    1.8931    2.0847

We can use Pathfinder's estimate of the metric and only perform enough warm-up to tune the step size.

inv_metric = result_multi.pathfinder_results[1].fit_distribution.Σ
metric = Pathfinder.RankUpdateEuclideanMetric(inv_metric)
kernel = HMCKernel(Trajectory{MultinomialTS}(Leapfrog(0.0), GeneralisedNoUTurn()))
adaptor = StepSizeAdaptor(0.8, 1.0)  # adapt only the step size
nuts = AdvancedHMC.HMCSampler(kernel, metric, adaptor)

n_adapts = 50
n_draws = 1_000
chns = sample(
    model,
    externalsampler(nuts),
    MCMCThreads(),
    n_draws + n_adapts,
    n_chains;
    n_adapts,
    initial_params,
    progress=false,
)[n_adapts + 1:end, :, :]  # drop warm-up draws
describe(chns)
┌ Warning: Only a single thread available: MCMC chains are not sampled in parallel
@ AbstractMCMC ~/.julia/packages/AbstractMCMC/C1aKp/src/sample.jl:544
[ Info: Found initial step size 3.2
[ Info: Found initial step size 3.2
[ Info: Found initial step size 3.2
[ Info: Found initial step size 3.2
[ Info: Found initial step size 3.2
[ Info: Found initial step size 1.6062500000000002
[ Info: Found initial step size 1.6
[ Info: Found initial step size 1.6
Chains MCMC chain (1000×17×8 Array{Float64, 3}):

Iterations        = 51:1:1050
Number of chains  = 8
Samples per chain = 1000
Wall duration     = 1.52 seconds
Compute duration  = 1.36 seconds
parameters        = α, β, σ
internals         = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint

Summary Statistics

  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec 
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64       Float64 

           α    1.6293    0.3406    0.0063   2901.6511   3763.6943    1.0038     2130.4340
           β    1.9343    0.0595    0.0011   2813.1895   3333.2943    1.0039     2065.4842
           σ    1.8169    0.1258    0.0015   6849.7093   5765.4023    1.0000     5029.1552


Quantiles

  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           α    0.9447    1.3989    1.6350    1.8556    2.2921
           β    1.8153    1.8945    1.9350    1.9748    2.0502
           σ    1.5936    1.7268    1.8105    1.8971    2.0796

See Initializing HMC with Pathfinder for further examples.