Running Pathfinder on Turing.jl models

This tutorial demonstrates how Turing can be used with Pathfinder.

We'll demonstrate with a regression example.

using AbstractMCMC, AdvancedHMC, DynamicPPL, FlexiChains, MCMCChains, Pathfinder, Random, Turing
Random.seed!(39)

@model function regress(x)
    α ~ Normal()
    β ~ Normal()
    σ ~ truncated(Normal(); lower=0)
    μ = α .+ β .* x
    y ~ product_distribution(Normal.(μ, σ))
end
x = 0:0.1:10
true_params = (; α=1.5, β=2, σ=2)
# simulate data
y = rand(regress(x) | true_params)[@varname(y)]

model = regress(x) | (; y)
n_chains = 8

For convenience, pathfinder and multipathfinder can take Turing models as inputs and produce FlexiChains.VNChain objects as outputs. To access this, we run Pathfinder normally; the chains representation of the draws is stored in draws_transformed, with the type defaulting to whatever Turing.sample itself defaults to.

Turing pre-v0.45 backward-compatibility

Since Turing v0.45, the default chain_type is FlexiChains.VNChain, while previous versions returned MCMCChains.Chains. Pathfinder will return the same default chain type that your installed Turing version returns, but you can always specify the chain_type manually. This tutorial uses the new default (FlexiChains) throughout; see Using MCMCChains below if you still want MCMCChains.Chains.

result_single = pathfinder(model; ndraws=1_000)
Single-path Pathfinder result
  tries: 1
  draws: 1000
  fit iteration: 26 (total: 26)
  fit ELBO: -213.38 ± 0.06
  fit distribution: MvNormal{Float64, Pathfinder.WoodburyPDMat{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}, Matrix{Float64}, Matrix{Float64}, Pathfinder.WoodburyPDFactorization{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}, LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}, LinearAlgebra.UpperTriangular{Float64, Matrix{Float64}}}}, Vector{Float64}}(
dim: 3
μ: [1.6508971085470592, 1.9311753921149317, 0.5801261338911265]
Σ: [0.12833559665737235 -0.024925831081334386 -0.003747748546056793; -0.024925831081334372 0.007926736754357806 0.000559861078137877; -0.0037477485460567896 0.0005598610781378777 0.00621065939801297]
)
result_single.draws_transformed
╭─FlexiChain (1000 iterations, 1 chain) ───────────────────────────────────────────────────────────╮
 ↓ iter  = 1:1000                                                                                 
 → chain = 1:1                                                                                    
                                                                                                  
 Parameters (3) ── VarName                                                                        
  Float64  α, β, σ                                                                                
                                                                                                  
 Extras (3)                                                                                       
  Float64  logprior, loglikelihood, logjoint                                                      
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

Note that while Turing's sample methods default to initializing parameters from the prior with InitFromPrior, Pathfinder defaults to uniformly sampling them in the range [-2, 2] in unconstrained space (equivalent to Turing's InitFromUniform(-2, 2)). To use Turing's default in Pathfinder, specify init_sampler=InitFromPrior().

result_multi = multipathfinder(model, 1_000; nruns=n_chains, init_sampler=InitFromPrior())
Multi-path Pathfinder result
  runs: 8
  draws: 1000
  Pareto shape diagnostic: 0.76 (bad)

The Pareto shape diagnostic indicates that it is likely safe to use these draws to compute posterior estimates.

chns_pf = result_multi.draws_transformed
summarystats(chns_pf)
╭─FlexiSummary (9 statistics) ─────────────────────────────────────────────────────────────────────╮
   iter    collapsed                                                                              
   chain   collapsed                                                                              
 ↓ stat  = [mean, std, mcse, ess_bulk, ess_tail, rhat, q5, q50, q95]                              
                                                                                                  
 Parameters (3) ── VarName                                                                        
  Float64  α, β, σ                                                                                
                                                                                                  
 Extras (3)                                                                                       
  Float64  logprior, loglikelihood, logjoint                                                      
                                                                                                  
 Summary                                                                                          
   param    mean     std    mcse  ess_bulk  ess_tail    rhat      q5     q50     q95              
       α  1.6419  0.3151  0.0101  964.6082  907.0437  1.0012  1.1183  1.6388  2.1384              
       β  1.9308  0.0554  0.0018  918.3018  970.7639  1.0019  1.8432  1.9322  2.0198              
       σ  1.8131  0.1155  0.0039  877.1734  970.1057  1.0020  1.6457  1.8063  2.0158              
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

We can also use these draws to initialize MCMC sampling with InitFromParams. FlexiChains.VNChain subsets iterations and chains with keyword arguments rather than MCMCChains.Chains's 3-argument indexing; see the FlexiChains migration guide for more such translations.

params = AbstractMCMC.to_samples(DynamicPPL.ParamsWithStats, chns_pf[iter=1:n_chains], model)
initial_params = [InitFromParams(p.params) for p in vec(params)]
chns = sample(model, Turing.NUTS(), MCMCThreads(), 1_000, n_chains; initial_params, progress=false)
summarystats(chns)
╭─FlexiSummary (9 statistics) ─────────────────────────────────────────────────────────────────────╮
   iter    collapsed                                                                              
   chain   collapsed                                                                              
 ↓ stat  = [mean, std, mcse, ess_bulk, ess_tail, rhat, q5, q50, q95]                              
                                                                                                  
 Parameters (3) ── VarName                                                                        
  Float64  α, β, σ                                                                                
                                                                                                  
 Extras (14)                                                                                      
  Float64  n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy,                  
           hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error,   
           step_size, nom_step_size, logprior, loglikelihood, logjoint                            
                                                                                                  
 Summary                                                                                          
   param    mean     std    mcse   ess_bulk   ess_tail    rhat      q5     q50     q95            
       α  1.6550  0.3374  0.0056  3592.0169  4226.8098  1.0004  1.0955  1.6494  2.2126            
       β  1.9303  0.0589  0.0010  3557.7341  4307.1996  1.0003  1.8341  1.9299  2.0285            
       σ  1.8151  0.1270  0.0019  4658.9320  4264.2468  1.0007  1.6218  1.8083  2.0347            
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

We can use Pathfinder's estimate of the metric and only perform enough warm-up to tune the step size.

inv_metric = result_multi.pathfinder_results[1].fit_distribution.Σ
metric = Pathfinder.RankUpdateEuclideanMetric(inv_metric)
kernel = HMCKernel(Trajectory{MultinomialTS}(Leapfrog(0.0), GeneralisedNoUTurn()))
adaptor = StepSizeAdaptor(0.8, 1.0)  # adapt only the step size
nuts = AdvancedHMC.HMCSampler(kernel, metric, adaptor)

n_adapts = 50
n_draws = 1_000
chns = sample(
    model,
    externalsampler(nuts),
    MCMCThreads(),
    n_draws + n_adapts,
    n_chains;
    n_adapts,
    initial_params,
    progress=false,
)[iter=(n_adapts + 1):(n_adapts + n_draws)]  # drop warm-up draws
summarystats(chns)
╭─FlexiSummary (9 statistics) ─────────────────────────────────────────────────────────────────────╮
   iter    collapsed                                                                              
   chain   collapsed                                                                              
 ↓ stat  = [mean, std, mcse, ess_bulk, ess_tail, rhat, q5, q50, q95]                              
                                                                                                  
 Parameters (3) ── VarName                                                                        
  Float64  α, β, σ                                                                                
                                                                                                  
 Extras (14)                                                                                      
  Float64  n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy,                  
           hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error,   
           step_size, nom_step_size, logprior, loglikelihood, logjoint                            
                                                                                                  
 Summary                                                                                          
   param    mean     std    mcse   ess_bulk   ess_tail    rhat      q5     q50     q95            
       α  1.6465  0.3389  0.0036  9099.7892  6196.5137  1.0002  1.0870  1.6481  2.1984            
       β  1.9321  0.0596  0.0006  9272.8141  5845.9523  1.0003  1.8340  1.9316  2.0298            
       σ  1.8142  0.1288  0.0013  9520.2387  5884.2924  1.0001  1.6154  1.8050  2.0388            
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

See Initializing HMC with Pathfinder for further examples.

Using MCMCChains

To request transformed draws be returned as MCMCChains.Chains instead of the default, you may specify chain_type directly.

result_multi_mcmc = multipathfinder(
    model, 1_000; nruns=n_chains, init_sampler=InitFromPrior(), chain_type=MCMCChains.Chains
)
chns_pf_mcmc = result_multi_mcmc.draws_transformed
Chains MCMC chain (1000×6×1 Array{Float64, 3}):

Iterations        = 1:1:1000
Number of chains  = 1
Samples per chain = 1000
parameters        = α, β, σ
internals         = logprior, loglikelihood, logjoint

Use `describe(chains)` for summary statistics and quantiles.

As before, we can use these draws to initialize MCMC sampling with InitFromParams. Note that Chains subsets iterations and chains with 3-argument indexing (chain[iterations, parameters, chains]):

params_mcmcchains = AbstractMCMC.to_samples(
    DynamicPPL.ParamsWithStats, chns_pf_mcmc[1:n_chains, :, :], model
)
initial_params_mcmcchains = [InitFromParams(p.params) for p in vec(params_mcmcchains)]