Running Pathfinder on Turing.jl models
This tutorial demonstrates how Turing.jl models can be used with Pathfinder, using a simple linear regression example.
using AdvancedHMC, Pathfinder, Random, Turing
Random.seed!(39)
@model function regress(x, y)
α ~ Normal()
β ~ Normal()
σ ~ truncated(Normal(); lower=0)
y ~ product_distribution(Normal.(α .+ β .* x, σ))
end
x = 0:0.1:10
y = @. 2x + 1.5 + randn() * 0.2 # simulate data with true values β = 2, α = 1.5, σ = 0.2
model = regress(collect(x), y)
n_chains = 8
8
For convenience, pathfinder and multipathfinder can take Turing models as inputs and produce MCMCChains.Chains objects as outputs.
result_single = pathfinder(model; ndraws=1_000)
Single-path Pathfinder result
tries: 1
draws: 1000
fit iteration: 22 (total: 27)
fit ELBO: -1.95 ± 0.03
fit distribution: Distributions.MvNormal{Float64, Pathfinder.WoodburyPDMat{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}, Matrix{Float64}, Matrix{Float64}, Pathfinder.WoodburyPDFactorization{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}, LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}, LinearAlgebra.UpperTriangular{Float64, Matrix{Float64}}}}, Vector{Float64}}(
dim: 3
μ: [1.4759589631397645, 2.006026941373265, -1.5478356855806579]
Σ: [0.0018286217896122648 -0.0002601667183752173 -0.0005873937036806257; -0.0002601667183752181 5.100356929144418e-5 5.198418656871739e-5; -0.0005873937036806262 5.198418656871667e-5 0.0064369057163803384]
)
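The fit distribution is a plain Distributions.MvNormal (with a structured covariance), so the usual Distributions.jl API applies to it. For example (a minimal sketch):
# Draw 100 extra samples from the fitted approximation. Note that for Turing
# models the fit lives in the unconstrained space (here σ is log-transformed).
extra_draws = rand(result_single.fit_distribution, 100)
# Evaluate the approximation's log-density at its own mean
logpdf(result_single.fit_distribution, result_single.fit_distribution.μ)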
result_multi = multipathfinder(model, 1_000; nruns=n_chains)
Multi-path Pathfinder result
runs: 8
draws: 1000
Pareto shape diagnostic: 0.79 (bad)
Here, the Pareto shape diagnostic exceeds 0.7, indicating that these draws may be unreliable for computing posterior estimates directly; they remain useful for initializing MCMC, as shown below.
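The diagnostic can also be read off programmatically; this sketch assumes the psis_result field on the multi-path result, which stores the PSIS.jl output:
# Pareto shape of the importance weights; values above 0.7 suggest the
# resampled draws may be unreliable for posterior estimation.
result_multi.psis_result.pareto_shape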
When passed a DynamicPPL.Model, Pathfinder also gives access to the posterior draws in a familiar Chains object.
result_multi.draws_transformed
Chains MCMC chain (1000×3×1 Array{Float64, 3}):
Iterations = 1:1:1000
Number of chains = 1
Samples per chain = 1000
parameters = α, β, σ
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ⋯
Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯
α 1.4767 0.0439 0.0015 864.4081 950.0646 0.9993 ⋯
β 2.0058 0.0074 0.0002 1124.0753 1068.3139 0.9992 ⋯
σ 0.2172 0.0157 0.0005 956.9526 925.4764 0.9993 ⋯
1 column omitted
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
α 1.3951 1.4465 1.4779 1.5045 1.5666
β 1.9917 2.0008 2.0059 2.0108 2.0208
σ 0.1887 0.2063 0.2168 0.2272 0.2510
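As a quick sanity check, MCMCChains' summary utilities can be used to compare these estimates against the true values from the data-generating step above (a minimal sketch):
mean(result_multi.draws_transformed) # expect roughly α ≈ 1.5, β ≈ 2, σ ≈ 0.2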
We can also use these posterior draws to initialize MCMC sampling.
init_params = collect.(eachrow(result_multi.draws_transformed.value[1:n_chains, :, 1]))
8-element Vector{Vector{Float64}}:
[1.4691467301037398, 2.0111813084487418, 0.22204577368872686]
[1.4225743419997652, 2.0150513003204957, 0.19291870250704835]
[1.375596352156451, 2.0205385232497464, 0.23071888349800143]
[1.436418141335313, 2.0118829758483967, 0.20678691584596665]
[1.4770215057770222, 2.0025924142409632, 0.2236454231227252]
[1.4483836340938132, 2.015921280636908, 0.21591143831903215]
[1.5451189039344957, 1.9945619185153087, 0.25102646725887745]
[1.4018133420228451, 2.02161765671409, 0.2125642337875723]
chns = sample(model, Turing.NUTS(), MCMCThreads(), 1_000, n_chains; init_params, progress=false)
Chains MCMC chain (1000×15×8 Array{Float64, 3}):
Iterations = 501:1:1500
Number of chains = 8
Samples per chain = 1000
Wall duration = 5.88 seconds
Compute duration = 4.2 seconds
parameters = α, β, σ
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ⋯
Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯
α 1.4750 0.0429 0.0007 3592.0524 3898.4925 1.0030 ⋯
β 2.0061 0.0074 0.0001 3763.7934 4436.1419 1.0026 ⋯
σ 0.2167 0.0157 0.0002 5087.3685 4657.4357 1.0008 ⋯
1 column omitted
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
α 1.3902 1.4469 1.4753 1.5036 1.5601
β 1.9917 2.0011 2.0062 2.0109 2.0207
σ 0.1884 0.2058 0.2158 0.2267 0.2491
Rather than running full warm-up, we can reuse Pathfinder's estimate of the metric and perform only enough warm-up to tune the step size.
inv_metric = result_multi.pathfinder_results[1].fit_distribution.Σ # covariance estimate from one Pathfinder run
metric = Pathfinder.RankUpdateEuclideanMetric(inv_metric)
kernel = HMCKernel(Trajectory{MultinomialTS}(Leapfrog(0.0), GeneralisedNoUTurn()))
adaptor = StepSizeAdaptor(0.8, 1.0) # adapt only the step size
nuts = AdvancedHMC.HMCSampler(kernel, metric, adaptor)
n_adapts = 50
n_draws = 1_000
chns = sample(
model,
externalsampler(nuts),
MCMCThreads(),
n_draws + n_adapts,
n_chains;
n_adapts,
init_params,
progress=false,
)[n_adapts + 1:end, :, :] # drop warm-up draws
Chains MCMC chain (1000×16×8 Array{Float64, 3}):
Iterations = 51:1:1050
Number of chains = 8
Samples per chain = 1000
Wall duration = 4.3 seconds
Compute duration = 3.94 seconds
parameters = α, β, σ
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, is_adapt
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ⋯
Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯
α 1.4751 0.0449 0.0006 6678.2638 5558.2071 1.0011 ⋯
β 2.0059 0.0101 0.0002 6577.4152 5374.9491 1.0010 ⋯
σ 0.2182 0.0351 0.0009 6448.5094 4827.2103 1.0002 ⋯
1 column omitted
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
α 1.3920 1.4460 1.4746 1.5040 1.5590
β 1.9918 2.0011 2.0061 2.0110 2.0204
σ 0.1883 0.2060 0.2160 0.2267 0.2507
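The diagnostics can also be extracted programmatically; this sketch assumes the ess and rhat functions that MCMCChains re-exports from MCMCDiagnosticTools:
ess(chns) # effective sample size per parameter
rhat(chns) # values near 1 indicate the chains have mixed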
See Initializing HMC with Pathfinder for further examples.