User Guide

This guide demonstrates how to use BayesianDoubleML.jl for causal inference with different inference methods.

Installation

using Pkg
Pkg.add(url = "https://github.com/masonrhayes/BayesianDoubleML.jl")

Basic Usage

Creating a Model

All analyses start by creating a BDMLModel from your data:

using BayesianDoubleML
using StableRNGs

# Your data
n = 200
p = 100
alpha_true = 2.0 # true causal effect
rng = StableRNG(42) # for reproducibility

# Generate data as DataFrame
df = make_plr_DTL2025(n, p, 2.0; alpha = alpha_true, rng = rng)

# Create a model with hierarchical priors (recommended)
# All columns except :y and :d are automatically used as covariates
model = BDMLModel(df, :y, :d; model_type=:hier)

# Or use basic model with fixed priors
model = BDMLModel(df, :y, :d; model_type=:basic)

# Explicit covariate selection (optional)
# model = BDMLModel(df, :y, :d; model_type=:hier, x_cols=[:X1, :X2, :X3])

Model Types

Hierarchical Model (:hier) - Recommended:

  • Uses adaptive shrinkage via hierarchical priors
  • Equivalent to Student-t(4) priors on coefficients
  • Achieves better empirical coverage in the paper's simulations (0.94 vs. 0.91-0.93)

Basic Model (:basic):

  • Uses fixed N(0, 25·I) priors
  • Good for baseline comparisons

Inference Methods

MCMC (NUTS)

MCMC using the No-U-Turn Sampler (NUTS) provides asymptotically exact posterior inference and is recommended for small-to-medium datasets, or whenever fitting time is not a major concern.

# Default NUTS settings
fit!(model, MCMCNUTS())

# Custom settings
fit!(
    model, 
    MCMCNUTS(; target_acceptance=0.9, max_depth=12);
    n_samples=2000,
    n_chains=4
)

When to use:

  • Small to medium datasets
  • When exact inference is critical

Key parameters:

  • target_acceptance: Target acceptance rate (default: 0.8, range: 0.6-0.9)
  • max_depth: Maximum tree depth (default: 10)
  • n_samples: Number of posterior samples per chain (default: 2000)
  • n_chains: Number of parallel chains (default: 4)

Simple VI with Mooncake

Simple VI uses Turing's native ADVI implementation and pairs well with the Mooncake AD backend, which typically delivers a 5-10x speedup over ReverseDiff (after the initial compilation warmup).

# Default: SimpleVI with Mooncake
using Mooncake 

fit!(
    model, 
    SimpleVIMethod(; ad_backend=AutoMooncake);
    n_iterations=1000,
    n_draws=2000
)

Unified VI with AutoReverseDiff

Unified VI uses AdvancedVI.jl with explicit bijectors and supports multiple variational families and subsampling for large datasets.

MeanField (Diagonal Covariance)

The default MeanField approximation uses a diagonal-covariance Gaussian, treating parameters as independent in the posterior approximation:

# MeanField with ReverseDiff (default)
fit!(
    model,
    UnifiedVIMethod(; 
        ad_backend=AutoReverseDiff,
        family=MeanField()
    );
    n_iterations=1000,
    n_draws=2000
)

# Or use the convenience constructor
fit!(model, MeanFieldVI())

LowRank (Low-Rank + Diagonal Covariance)

LowRank captures posterior correlations between parameters using far fewer parameters than a full covariance matrix:

# LowRank with rank 3
fit!(
    model,
    UnifiedVIMethod(; 
        ad_backend=AutoReverseDiff,
        family=LowRank(3)
    );
    n_iterations=1000,
    n_draws=2000
)

# Or use the convenience constructor
fit!(model, LowRankVI(3))

When to use:

  • Large datasets (automatically enables subsampling when n ≥ 10,000)
  • When you need specific variational family control
  • For exploring mean-field vs low-rank tradeoffs

Subsampling: Automatically enabled for n ≥ 10,000:

# Auto-subsampling (default batch size: min(256, ceil(n/1000)))
fit!(model, UnifiedVIMethod())  # Auto-enabled for large n

# Explicit control
fit!(
    model,
    UnifiedVIMethod(; 
        ad_backend=AutoReverseDiff,
        family=MeanField(),
        subsample=true,
        batch_size=512
    );
    n_iterations=1000
)

Understanding Results

All inference methods store results in the model object. Access them using common accessor functions:

# Extract causal effect samples
alpha_samples = extract_alpha(model)  # On original scale

# Generate coefficient table with diagnostics
coeftable(model)

# See full model result summary
summary(model)
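For quick post-processing without the built-in table, the raw draws can also be summarized directly with the Statistics standard library. The snippet below is an illustrative sketch: the synthetic `alpha_samples` vector stands in for the output of `extract_alpha(model)`.

```julia
# Post-processing posterior draws by hand; the synthetic vector below
# is a stand-in for alpha_samples = extract_alpha(model).
using Random, Statistics

alpha_samples = 2.0 .+ 0.1 .* randn(MersenneTwister(0), 2000)

post_mean = mean(alpha_samples)               # posterior mean
post_sd   = std(alpha_samples)                # posterior std. dev.
ci = quantile(alpha_samples, [0.025, 0.975])  # equal-tailed 95% interval
```

Note that `coeftable` reports HPD intervals; the `quantile` call above gives the simpler equal-tailed interval, which is close to the HPD interval when the posterior is roughly symmetric.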

Coefficient Table Output

Bayesian Double ML Coefficient Table
======================================================================
Parameter: α (treatment effect)
Model type: hier
Inference method: VI
Credible interval level: 95.0% (HPD)
Number of posterior samples: 2000

  Parameter  Estimate  Std. Error      MCSE   P-value
  ---------  --------  ----------      ----   -------
          α    1.9832      0.1124    0.0025    0.0000

HPD Credible Intervals:
  α: [1.7623, 2.2031]

Diagnostics:
  Final ELBO: -3421.56

MCMC-Specific Diagnostics

# Effective Sample Size (ESS)
ess(model)

# R-hat convergence diagnostic (should be ≈ 1.0)
rhat(model)

# Monte Carlo Standard Error
mcse(model)

VI-Specific Information

# Access the result object for VI-specific fields
result = model.result

# ELBO convergence history
result.elbo_history

# Convergence flag
result.converged

# Final ELBO value
result.final_elbo

Performance Tips

AD Backend Selection

Backend          Speed                                          Best For
---------------  ---------------------------------------------  ---------------
AutoReverseDiff  Baseline                                       Default choice
AutoMooncake     5-10x faster                                   Speed
AutoZygote       Typically slower than ReverseDiff or Mooncake  Not recommended
AutoForwardDiff  Typically very slow                            Not recommended

Mathematical Background

The Bivariate Reduced Form

The BDML model avoids regularization-induced confounding by parameterizing the causal inference problem as a bivariate regression:

Structural Model:

\[Y = \alpha D + X'\beta + \varepsilon, \quad \varepsilon \perp V\]

Reduced Form (substituting $D = X'\gamma + V$):

\[\begin{aligned} Y &= X'\underbrace{(\beta + \alpha\gamma)}_{\delta} + \underbrace{(\varepsilon + \alpha V)}_{U} \\ D &= X'\gamma + V \end{aligned}\]

Since $\varepsilon \perp V$ by assumption:

\[\text{Cov}(U, V) = \text{Cov}(\varepsilon + \alpha V, V) = \alpha \cdot \text{Var}(V)\]

Therefore, the causal effect is:

\[\alpha = \frac{\text{Cov}(U, V)}{\text{Var}(V)} = \frac{\sigma_{UV}}{\sigma^2_V} = \rho \frac{\sigma_U}{\sigma_V}\]
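This identity is easy to verify numerically. The following standalone simulation (plain Julia, independent of the package; all variable names are illustrative) generates data from the structural model, forms the reduced-form error $U$, and recovers $\alpha$ from $\text{Cov}(U, V)/\text{Var}(V)$:

```julia
# Standalone numerical check of the derivation above (not package code).
using Random, Statistics

rng = MersenneTwister(1)
n, p = 100_000, 5
alpha = 2.0
X = randn(rng, n, p)
beta, gamma = randn(rng, p), randn(rng, p)
V = randn(rng, n)                    # first-stage error
eps = randn(rng, n)                  # structural error, independent of V
D = X * gamma .+ V                   # treatment equation
Y = alpha .* D .+ X * beta .+ eps    # outcome equation
delta = beta .+ alpha .* gamma       # reduced-form coefficients
U = Y .- X * delta                   # reduced-form error: U = ε + αV
alpha_hat = cov(U, V) / var(V)       # ≈ alpha, up to Monte Carlo error
```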

Prior Specifications

BDML-Basic:

\[\begin{aligned} \delta &\sim \mathcal{N}(0, 25 \cdot I_p) \\ \gamma &\sim \mathcal{N}(0, 25 \cdot I_p) \\ \sigma_U, \sigma_V &\sim \text{Cauchy}^+(0, 2.5) \\ R &\sim \text{LKJ}(4) \end{aligned}\]

BDML-Hier:

\[\begin{aligned} \sigma^2_\delta, \sigma^2_\gamma &\sim \text{InvGamma}(2, 2) \\ \delta \mid \sigma^2_\delta &\sim \mathcal{N}(0, \sigma^2_\delta \cdot I_p) \\ \gamma \mid \sigma^2_\gamma &\sim \mathcal{N}(0, \sigma^2_\gamma \cdot I_p) \\ \sigma_U, \sigma_V &\sim \text{Cauchy}^+(0, 2.5) \\ R &\sim \text{LKJ}(4) \end{aligned}\]

The hierarchical prior is equivalent to placing independent Student-t(4) distributions on each coefficient marginally, providing adaptive shrinkage that learns the appropriate regularization from data.
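Concretely, integrating out the coefficient variance yields the marginal prior (shown for a single $\delta_j$; the $\gamma_j$ case is identical):

\[p(\delta_j) = \int_0^\infty \mathcal{N}(\delta_j \mid 0, \sigma^2_\delta)\, \text{InvGamma}(\sigma^2_\delta \mid 2, 2)\, d\sigma^2_\delta = t_4(\delta_j \mid 0, 1),\]

using the standard fact that a $\mathcal{N}(0, \sigma^2)$ scale mixture with $\sigma^2 \sim \text{InvGamma}(a, b)$ has a Student-t marginal with $2a$ degrees of freedom and scale $\sqrt{b/a}$; here $a = b = 2$ gives 4 degrees of freedom and unit scale.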

References

  • DiTraglia, F.J. & Liu, L. (2025). "Bayesian Double Machine Learning for Causal Inference". arXiv:2508.12688v1.
  • Chernozhukov, V., et al. (2018). "Double/debiased machine learning for treatment and structural parameters". The Econometrics Journal, 21(1), C1-C68.