Running Pathfinder on Turing.jl models
This tutorial demonstrates how Turing can be used with Pathfinder.
We'll demonstrate with a regression example.
using AbstractMCMC, AdvancedHMC, DynamicPPL, FlexiChains, MCMCChains, Pathfinder, Random, Turing
Random.seed!(39)
@model function regress(x)
α ~ Normal()
β ~ Normal()
σ ~ truncated(Normal(); lower=0)
μ = α .+ β .* x
y ~ product_distribution(Normal.(μ, σ))
end
x = 0:0.1:10
true_params = (; α=1.5, β=2, σ=2)
# simulate data
y = rand(regress(x) | true_params)[@varname(y)]
model = regress(x) | (; y)
n_chains = 8For convenience, pathfinder and multipathfinder can take Turing models as inputs and produce FlexiChains.VNChain objects as outputs. To access this, we run Pathfinder normally; the chains representation of the draws is stored in draws_transformed, with the type defaulting to whatever Turing.sample itself defaults to.
Since Turing v0.45, the default chain_type is FlexiChains.VNChain, while previous versions returned MCMCChains.Chains. Pathfinder will return the same default chain type that your installed Turing version returns, but you can always specify the chain_type manually. This tutorial uses the new default (FlexiChains) throughout; see Using MCMCChains below if you still want MCMCChains.Chains.
result_single = pathfinder(model; ndraws=1_000)Single-path Pathfinder result
tries: 1
draws: 1000
fit iteration: 26 (total: 26)
fit ELBO: -213.38 ± 0.06
fit distribution: MvNormal{Float64, Pathfinder.WoodburyPDMat{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}, Matrix{Float64}, Matrix{Float64}, Pathfinder.WoodburyPDFactorization{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}, LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}, LinearAlgebra.UpperTriangular{Float64, Matrix{Float64}}}}, Vector{Float64}}(
dim: 3
μ: [1.6508971085470592, 1.9311753921149317, 0.5801261338911265]
Σ: [0.12833559665737235 -0.024925831081334386 -0.003747748546056793; -0.024925831081334372 0.007926736754357806 0.000559861078137877; -0.0037477485460567896 0.0005598610781378777 0.00621065939801297]
)
result_single.draws_transformed╭─FlexiChain (1000 iterations, 1 chain) ───────────────────────────────────────────────────────────╮
│ ↓ iter = 1:1000 │
│ → chain = 1:1 │
│ │
│ Parameters (3) ── VarName │
│ Float64 α, β, σ │
│ │
│ Extras (3) │
│ Float64 logprior, loglikelihood, logjoint │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯Note that while Turing's sample methods default to initializing parameters from the prior with InitFromPrior, Pathfinder defaults to uniformly sampling them in the range [-2, 2] in unconstrained space (equivalent to Turing's InitFromUniform(-2, 2)). To use Turing's default in Pathfinder, specify init_sampler=InitFromPrior().
result_multi = multipathfinder(model, 1_000; nruns=n_chains, init_sampler=InitFromPrior())Multi-path Pathfinder result
runs: 8
draws: 1000
Pareto shape diagnostic: 0.76 (bad)The Pareto shape diagnostic indicates that it is likely safe to use these draws to compute posterior estimates.
chns_pf = result_multi.draws_transformed
summarystats(chns_pf)╭─FlexiSummary (9 statistics) ─────────────────────────────────────────────────────────────────────╮
│ iter collapsed │
│ chain collapsed │
│ ↓ stat = [mean, std, mcse, ess_bulk, ess_tail, rhat, q5, q50, q95] │
│ │
│ Parameters (3) ── VarName │
│ Float64 α, β, σ │
│ │
│ Extras (3) │
│ Float64 logprior, loglikelihood, logjoint │
│ │
│ Summary │
│ param mean std mcse ess_bulk ess_tail rhat q5 q50 q95 │
│ α 1.6419 0.3151 0.0101 964.6082 907.0437 1.0012 1.1183 1.6388 2.1384 │
│ β 1.9308 0.0554 0.0018 918.3018 970.7639 1.0019 1.8432 1.9322 2.0198 │
│ σ 1.8131 0.1155 0.0039 877.1734 970.1057 1.0020 1.6457 1.8063 2.0158 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯We can also use these draws to initialize MCMC sampling with InitFromParams. FlexiChains.VNChain subsets iterations and chains with keyword arguments rather than MCMCChains.Chains's 3-argument indexing; see the FlexiChains migration guide for more such translations.
params = AbstractMCMC.to_samples(DynamicPPL.ParamsWithStats, chns_pf[iter=1:n_chains], model)
initial_params = [InitFromParams(p.params) for p in vec(params)]chns = sample(model, Turing.NUTS(), MCMCThreads(), 1_000, n_chains; initial_params, progress=false)
summarystats(chns)╭─FlexiSummary (9 statistics) ─────────────────────────────────────────────────────────────────────╮
│ iter collapsed │
│ chain collapsed │
│ ↓ stat = [mean, std, mcse, ess_bulk, ess_tail, rhat, q5, q50, q95] │
│ │
│ Parameters (3) ── VarName │
│ Float64 α, β, σ │
│ │
│ Extras (14) │
│ Float64 n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, │
│ hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, │
│ step_size, nom_step_size, logprior, loglikelihood, logjoint │
│ │
│ Summary │
│ param mean std mcse ess_bulk ess_tail rhat q5 q50 q95 │
│ α 1.6550 0.3374 0.0056 3592.0169 4226.8098 1.0004 1.0955 1.6494 2.2126 │
│ β 1.9303 0.0589 0.0010 3557.7341 4307.1996 1.0003 1.8341 1.9299 2.0285 │
│ σ 1.8151 0.1270 0.0019 4658.9320 4264.2468 1.0007 1.6218 1.8083 2.0347 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯We can use Pathfinder's estimate of the metric and only perform enough warm-up to tune the step size.
inv_metric = result_multi.pathfinder_results[1].fit_distribution.Σ
metric = Pathfinder.RankUpdateEuclideanMetric(inv_metric)
kernel = HMCKernel(Trajectory{MultinomialTS}(Leapfrog(0.0), GeneralisedNoUTurn()))
adaptor = StepSizeAdaptor(0.8, 1.0) # adapt only the step size
nuts = AdvancedHMC.HMCSampler(kernel, metric, adaptor)
n_adapts = 50
n_draws = 1_000
chns = sample(
model,
externalsampler(nuts),
MCMCThreads(),
n_draws + n_adapts,
n_chains;
n_adapts,
initial_params,
progress=false,
)[iter=(n_adapts + 1):(n_adapts + n_draws)] # drop warm-up draws
summarystats(chns)╭─FlexiSummary (9 statistics) ─────────────────────────────────────────────────────────────────────╮
│ iter collapsed │
│ chain collapsed │
│ ↓ stat = [mean, std, mcse, ess_bulk, ess_tail, rhat, q5, q50, q95] │
│ │
│ Parameters (3) ── VarName │
│ Float64 α, β, σ │
│ │
│ Extras (14) │
│ Float64 n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, │
│ hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, │
│ step_size, nom_step_size, logprior, loglikelihood, logjoint │
│ │
│ Summary │
│ param mean std mcse ess_bulk ess_tail rhat q5 q50 q95 │
│ α 1.6465 0.3389 0.0036 9099.7892 6196.5137 1.0002 1.0870 1.6481 2.1984 │
│ β 1.9321 0.0596 0.0006 9272.8141 5845.9523 1.0003 1.8340 1.9316 2.0298 │
│ σ 1.8142 0.1288 0.0013 9520.2387 5884.2924 1.0001 1.6154 1.8050 2.0388 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯See Initializing HMC with Pathfinder for further examples.
Using MCMCChains
To request transformed draws be returned as MCMCChains.Chains instead of the default, you may specify chain_type directly.
result_multi_mcmc = multipathfinder(
model, 1_000; nruns=n_chains, init_sampler=InitFromPrior(), chain_type=MCMCChains.Chains
)
chns_pf_mcmc = result_multi_mcmc.draws_transformedChains MCMC chain (1000×6×1 Array{Float64, 3}):
Iterations = 1:1:1000
Number of chains = 1
Samples per chain = 1000
parameters = α, β, σ
internals = logprior, loglikelihood, logjoint
Use `describe(chains)` for summary statistics and quantiles.
As before, we can use these draws to initialize MCMC sampling with InitFromParams. Note that Chains subsets iterations and chains with 3-argument indexing (chain[iterations, parameters, chains]):
params_mcmcchains = AbstractMCMC.to_samples(
DynamicPPL.ParamsWithStats, chns_pf_mcmc[1:n_chains, :, :], model
)
initial_params_mcmcchains = [InitFromParams(p.params) for p in vec(params_mcmcchains)]