Running Pathfinder on Turing.jl models
This tutorial demonstrates how Turing can be used with Pathfinder.
We'll demonstrate with a regression example.
using AbstractMCMC, AdvancedHMC, DynamicPPL, FlexiChains, Pathfinder, Random, Turing
Random.seed!(39)
@model function regress(x)
α ~ Normal()
β ~ Normal()
σ ~ truncated(Normal(); lower=0)
μ = α .+ β .* x
y ~ product_distribution(Normal.(μ, σ))
end
x = 0:0.1:10
true_params = (; α=1.5, β=2, σ=2)
# simulate data
y = rand(regress(x) | true_params)[@varname(y)]
model = regress(x) | (; y)
n_chains = 8For convenience, pathfinder and multipathfinder can take Turing models as inputs and produce MCMCChains.Chains or FlexiChains.VNChain objects as outputs. To access this, we run Pathfinder normally; the chains representation of the draws (defaulting to Chains) is stored in draws_transformed.
result_single = pathfinder(model; ndraws=1_000)Single-path Pathfinder result
tries: 1
draws: 1000
fit iteration: 26 (total: 26)
fit ELBO: -213.38 ± 0.06
fit distribution: MvNormal{Float64, Pathfinder.WoodburyPDMat{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}, Matrix{Float64}, Matrix{Float64}, Pathfinder.WoodburyPDFactorization{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}, LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}, LinearAlgebra.UpperTriangular{Float64, Matrix{Float64}}}}, Vector{Float64}}(
dim: 3
μ: [1.6508971085470592, 1.9311753921149317, 0.5801261338911265]
Σ: [0.12833559665737235 -0.024925831081334386 -0.003747748546056793; -0.024925831081334372 0.007926736754357806 0.000559861078137877; -0.0037477485460567896 0.0005598610781378777 0.00621065939801297]
)
result_single.draws_transformedChains MCMC chain (1000×6×1 Array{Float64, 3}):
Iterations = 1:1:1000
Number of chains = 1
Samples per chain = 1000
parameters = α, β, σ
internals = logprior, loglikelihood, logjoint
Use `describe(chains)` for summary statistics and quantiles.
To request a different chain type (e.g. VNChain), we can specify the chain_type directly.
pathfinder(model; ndraws=1_000, chain_type=VNChain).draws_transformed╭─FlexiChain (1000 iterations, 1 chain) ───────────────────────────────────────────────────────────╮
│ ↓ iter = 1:1000 │
│ → chain = 1:1 │
│ │
│ Parameters (3) ── VarName │
│ Float64 α, β, σ │
│ │
│ Extras (3) │
│ Float64 logprior, loglikelihood, logjoint │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯Note that while Turing's sample methods default to initializing parameters from the prior with InitFromPrior, Pathfinder defaults to uniformly sampling them in the range [-2, 2] in unconstrained space (equivalent to Turing's InitFromUniform(-2, 2)). To use Turing's default in Pathfinder, specify init_sampler=InitFromPrior().
result_multi = multipathfinder(model, 1_000; nruns=n_chains, init_sampler=InitFromPrior())Multi-path Pathfinder result
runs: 8
draws: 1000
Pareto shape diagnostic: 0.78 (bad)The Pareto shape diagnostic indicates that it is likely safe to use these draws to compute posterior estimates.
chns_pf = result_multi.draws_transformed
describe(chns_pf)Chains MCMC chain (1000×6×1 Array{Float64, 3}):
Iterations = 1:1:1000
Number of chains = 1
Samples per chain = 1000
parameters = α, β, σ
internals = logprior, loglikelihood, logjoint
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Missing
α 1.6334 0.3551 0.0110 1049.8243 965.0031 1.0009 missing
β 1.9329 0.0624 0.0020 1000.5126 856.2056 1.0001 missing
σ 1.8213 0.1211 0.0040 933.3720 765.2233 0.9998 missing
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
α 0.9865 1.3863 1.6219 1.8519 2.4077
β 1.8024 1.8944 1.9360 1.9780 2.0532
σ 1.6055 1.7260 1.8177 1.9094 2.0835We can also use these draws to initialize MCMC sampling with InitFromParams.
params = AbstractMCMC.to_samples(DynamicPPL.ParamsWithStats, chns_pf[1:n_chains, :, :], model)
initial_params = [InitFromParams(p.params) for p in vec(params)]chns = sample(model, Turing.NUTS(), MCMCThreads(), 1_000, n_chains; initial_params, progress=false)
describe(chns)┌ Warning: Only a single thread available: MCMC chains are not sampled in parallel
└ @ AbstractMCMC ~/.julia/packages/AbstractMCMC/C1aKp/src/sample.jl:544
┌ Info: Found initial step size
└ ϵ = 0.05
┌ Info: Found initial step size
└ ϵ = 0.05
┌ Info: Found initial step size
└ ϵ = 0.025
┌ Info: Found initial step size
└ ϵ = 0.025
┌ Info: Found initial step size
└ ϵ = 0.05
┌ Info: Found initial step size
└ ϵ = 0.025
┌ Info: Found initial step size
└ ϵ = 0.05
┌ Info: Found initial step size
└ ϵ = 0.05
Chains MCMC chain (1000×17×8 Array{Float64, 3}):
Iterations = 501:1:1500
Number of chains = 8
Samples per chain = 1000
Wall duration = 4.2 seconds
Compute duration = 2.69 seconds
parameters = α, β, σ
internals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
α 1.6470 0.3406 0.0058 3502.6815 3922.4493 1.0015 1300.6615
β 1.9315 0.0597 0.0010 3508.4206 4116.5602 1.0014 1302.7926
σ 1.8141 0.1273 0.0018 4825.9828 4624.0206 1.0020 1792.0471
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
α 0.9806 1.4112 1.6499 1.8763 2.3061
β 1.8156 1.8920 1.9307 1.9714 2.0487
σ 1.5881 1.7246 1.8083 1.8931 2.0847We can use Pathfinder's estimate of the metric and only perform enough warm-up to tune the step size.
inv_metric = result_multi.pathfinder_results[1].fit_distribution.Σ
metric = Pathfinder.RankUpdateEuclideanMetric(inv_metric)
kernel = HMCKernel(Trajectory{MultinomialTS}(Leapfrog(0.0), GeneralisedNoUTurn()))
adaptor = StepSizeAdaptor(0.8, 1.0) # adapt only the step size
nuts = AdvancedHMC.HMCSampler(kernel, metric, adaptor)
n_adapts = 50
n_draws = 1_000
chns = sample(
model,
externalsampler(nuts),
MCMCThreads(),
n_draws + n_adapts,
n_chains;
n_adapts,
initial_params,
progress=false,
)[n_adapts + 1:end, :, :] # drop warm-up draws
describe(chns)┌ Warning: Only a single thread available: MCMC chains are not sampled in parallel
└ @ AbstractMCMC ~/.julia/packages/AbstractMCMC/C1aKp/src/sample.jl:544
[ Info: Found initial step size 3.2
[ Info: Found initial step size 3.2
[ Info: Found initial step size 3.2
[ Info: Found initial step size 3.2
[ Info: Found initial step size 3.2
[ Info: Found initial step size 1.6062500000000002
[ Info: Found initial step size 1.6
[ Info: Found initial step size 1.6
Chains MCMC chain (1000×17×8 Array{Float64, 3}):
Iterations = 51:1:1050
Number of chains = 8
Samples per chain = 1000
Wall duration = 1.52 seconds
Compute duration = 1.36 seconds
parameters = α, β, σ
internals = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
α 1.6293 0.3406 0.0063 2901.6511 3763.6943 1.0038 2130.4340
β 1.9343 0.0595 0.0011 2813.1895 3333.2943 1.0039 2065.4842
σ 1.8169 0.1258 0.0015 6849.7093 5765.4023 1.0000 5029.1552
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
α 0.9447 1.3989 1.6350 1.8556 2.2921
β 1.8153 1.8945 1.9350 1.9748 2.0502
σ 1.5936 1.7268 1.8105 1.8971 2.0796See Initializing HMC with Pathfinder for further examples.