API Reference

Model Types

BayesianDoubleML.AbstractBDMLModelType
AbstractBDMLModel

Abstract type for all BDML model specifications.

Model types encapsulate:

  • Data (Y, D, X) - already standardized
  • Standardization statistics (for transforming back)
  • Pre-allocated temporaries (for performance)
  • Metadata (n, p, model_type)
  • Fitting results and state (is_fitted, result, last_method)

Each concrete subtype represents a different model specification (basic vs hierarchical) and can be fitted with different methods (MCMC, VI unified, VI simple) via multiple dispatch.

After fitting with fit!(), results are stored in the model and can be extracted using coeftable(), extract_alpha(), summary(), etc.

source
BayesianDoubleML.BDMLBasicModelType
BDMLBasicModel <: AbstractBDMLModel

Basic BDML model specification implementing BDML-Basic from DiTraglia & Liu (2025), Algorithm 1.

Fields

  • Y::Vector{Float64}: Standardized outcome variable
  • D::Vector{Float64}: Standardized treatment variable
  • X::Matrix{Float64}: Standardized control variables (covariates)
  • stats::StandardizationStats: Statistics for transforming back to original scale
  • n::Int: Number of observations
  • p::Int: Number of control variables
  • result::Union{Nothing, AbstractBDMLResult}: Stores fitting results after fit!()
  • is_fitted::Bool: Whether the model has been fitted
  • last_method::Union{Nothing, AbstractInferenceMethod}: Method used in last fit

Model Specification (Section 4, Equations 12-13)

Reduced form: $Y = X'\delta + U$ (Eq. 12) and $D = X'\gamma + V$ (Eq. 5)

Priors:

  • $\delta \sim N(0, 25\cdot I_p)$ [Outcome reduced form coefficients]
  • $\gamma \sim N(0, 25\cdot I_p)$ [Treatment reduced form coefficients]
  • $\sigma_U \sim \text{Cauchy}^+(0, 2.5)$ [Outcome error scale]
  • $\sigma_V \sim \text{Cauchy}^+(0, 2.5)$ [Treatment error scale]
  • $R \sim \text{LKJ}(4)$ [Correlation matrix]

For VI, uses $\text{Beta}(2,2)$ correlation parameterization instead of LKJCholesky.

Causal effect recovery: $\alpha = \rho\cdot\sigma_U / \sigma_V$ (Eq. 15)

This is the BDML-Basic variation with fixed prior variances (Section 6, Table 1).

See also: BDMLHierarchicalModel, BDMLModel

References

  • DiTraglia, F.J. & Liu, L. (2025). "Bayesian Double Machine Learning for Causal Inference", arXiv:2508.12688v1, Section 4, Algorithm 1, Section 6.
source
BayesianDoubleML.BDMLHierarchicalModelType
BDMLHierarchicalModel <: AbstractBDMLModel

Hierarchical BDML model specification implementing BDML-Hier from DiTraglia & Liu (2025), Algorithm 1.

Fields

  • Y::Vector{Float64}: Standardized outcome variable
  • D::Vector{Float64}: Standardized treatment variable
  • X::Matrix{Float64}: Standardized control variables (covariates)
  • stats::StandardizationStats: Standardization statistics
  • n::Int: Number of observations
  • p::Int: Number of control variables
  • result::Union{Nothing, AbstractBDMLResult}: Stores fitting results after fit!()
  • is_fitted::Bool: Whether the model has been fitted
  • last_method::Union{Nothing, AbstractInferenceMethod}: Method used in last fit

Model Specification (Section 4, Equations 12-13)

Reduced form: $Y = X'\delta + U$ (Eq. 12) and $D = X'\gamma + V$ (Eq. 5)

Hierarchical priors:

  • $\sigma^2_\delta \sim \text{InvGamma}(2, 2)$ [Variance hyperparameter for $\delta$]
  • $\sigma^2_\gamma \sim \text{InvGamma}(2, 2)$ [Variance hyperparameter for $\gamma$]
  • $\delta \sim N(0, \sigma^2_\delta\cdot I_p)$ [Outcome coefficients with adaptive shrinkage]
  • $\gamma \sim N(0, \sigma^2_\gamma\cdot I_p)$ [Treatment coefficients with adaptive shrinkage]
  • $\sigma_U \sim \text{Cauchy}^+(0, 2.5)$ [Outcome error scale]
  • $\sigma_V \sim \text{Cauchy}^+(0, 2.5)$ [Treatment error scale]
  • $R \sim \text{LKJ}(4)$ [Correlation matrix]

For VI, uses $\text{Beta}(2,2)$ correlation parameterization.

Causal effect recovery: $\alpha = \rho\cdot\sigma_U / \sigma_V$ (Eq. 15)

Hierarchical structure: This is equivalent to placing independent $\text{Student-}t(4)$ distributions on each coefficient marginally. The $\text{InvGamma}(2, 2)$ hyperprior provides adaptive shrinkage that learns the appropriate regularization from data (Section 6, Table 1).
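
The equivalence can be checked by Monte Carlo with only the standard library (a sketch, not package code): an $\text{InvGamma}(2, 2)$ draw is the reciprocal of a $\text{Gamma}(2, \text{rate}=2)$ draw, which is half the sum of two $\text{Exp}(1)$ draws, and the resulting normal scale mixture should match a $\text{Student-}t(4)$ marginal.

```julia
using Statistics, Random

Random.seed!(1)
n = 1_000_000

# sigma^2 ~ InvGamma(2, 2): reciprocal of a Gamma(2, rate = 2) draw,
# where Gamma(2, rate = 2) = (Exp(1) + Exp(1)) / 2
sigma2 = [2 / (-log(rand()) - log(rand())) for _ in 1:n]

# beta | sigma^2 ~ N(0, sigma^2); marginally beta ~ Student-t(4)
beta = sqrt.(sigma2) .* randn(n)

# The 97.5% quantile of Student-t(4) is about 2.776
q975 = quantile(beta, 0.975)
```

The empirical quantile should land close to the $t_4$ value, confirming the scale-mixture representation.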

This can improve performance when p is large relative to n, and the paper's simulations show BDML-Hier achieves better coverage (0.94) than BDML-Basic (0.91-0.93).

See also: BDMLBasicModel, BDMLModel

References

  • DiTraglia, F.J. & Liu, L. (2025). "Bayesian Double Machine Learning for Causal Inference", arXiv:2508.12688v1, Section 4, Algorithm 1, Section 6.
source
BayesianDoubleML.BDMLModelFunction
BDMLModel(Y, D, X; model_type=:basic)

Factory function to create appropriate BDML model type for Algorithm 1 from DiTraglia & Liu (2025).

Data is standardized once during model creation, so the model can be refitted multiple times without re-standardizing.

Arguments

  • Y::Vector{Float64}: Outcome variable (will be standardized)
  • D::Vector{Float64}: Treatment variable (will be standardized)
  • X::Matrix{Float64}: Control variables (will be standardized)

Keyword Arguments

  • model_type::Symbol=:basic:
    • :basic for BDML-Basic (fixed prior variances)
    • :hier for BDML-Hier (adaptive hierarchical priors)

Returns

AbstractBDMLModel: Either BDMLBasicModel or BDMLHierarchicalModel

Algorithm 1 Variations

BDML-Basic (model_type=:basic):

  • Places independent $N(0, 25\cdot I)$ priors on $\delta$ and $\gamma$
  • Fixed shrinkage across all covariates
  • Recommended for interpretability and simplicity

BDML-Hier (model_type=:hier):

  • Places hierarchical $\text{InvGamma}(2, 2)$ priors on $\sigma^2_\delta$ and $\sigma^2_\gamma$
  • Adaptive shrinkage equivalent to $\text{Student-}t(4)$ on coefficients
  • Better coverage in simulations (Table 1: 0.94 vs 0.91-0.93)
  • Recommended as default choice

Both variations recover $\alpha$ via Equation 15: $\alpha = \rho\cdot\sigma_U / \sigma_V$

Examples

# Create basic model
model_basic = BDMLModel(Y, D, X; model_type=:basic)

# Create hierarchical model  
model_hier = BDMLModel(Y, D, X; model_type=:hier)

# Fit with different methods
fit!(model_basic, MCMCMethod(:nuts))
fit!(model_hier, UnifiedVIMethod())

Performance Notes

Standardization is performed once during model creation. Results are stored in the model after calling fit!().

References

  • DiTraglia, F.J. & Liu, L. (2025). "Bayesian Double Machine Learning for Causal Inference", arXiv:2508.12688v1, Section 4, Algorithm 1.

See also: BDMLBasicModel, BDMLHierarchicalModel, fit!

source
BDMLModel(data::BDMLData; model_type=:basic)

Create BDML model from BDMLData struct.

Convenience constructor for working with the existing BDMLData type.

Examples

data = BDMLData(Y, D, X)
model = BDMLModel(data; model_type=:basic)
source
BDMLModel(df::DataFrame, y::Symbol, d::Symbol; model_type::Symbol=:basic, x_cols=nothing)

Create a BDML model from a DataFrame, specifying the outcome and treatment columns.

Arguments

  • df::DataFrame: The data as a DataFrame
  • y::Symbol: Column name for the outcome variable
  • d::Symbol: Column name for the treatment variable

Keyword Arguments

  • model_type::Symbol=:basic:
    • :basic for BDML-Basic (fixed prior variances)
    • :hier for BDML-Hier (adaptive hierarchical priors)
  • x_cols: Column names for covariates. If nothing (default), uses all columns except y and d

Returns

AbstractBDMLModel: Either BDMLBasicModel or BDMLHierarchicalModel

Examples

using DataFrames

# Generate data
df = make_plr_DTL2025(200, 100, 2.0)

# Default: use all columns except :y and :d as covariates
model = BDMLModel(df, :y, :d; model_type=:hier)

# Explicit covariate selection
model = BDMLModel(df, :y, :d; model_type=:hier, x_cols=[:X1, :X2, :X3])

# Using column indices or regex
model = BDMLModel(df, :y, :d; model_type=:hier, x_cols=names(df, r"^X"))

See also: BDMLModel, BDMLBasicModel, BDMLHierarchicalModel

source

Model Accessors

BayesianDoubleML.nobsFunction
nobs(model::AbstractBDMLModel)

Return the number of observations in the model.

Examples

model = BDMLModel(Y, D, X)
n = nobs(model)  # Same as length(Y)
source
BayesianDoubleML.ncovariatesFunction
ncovariates(model::AbstractBDMLModel)

Return the number of control variables (covariates) in the model.

Examples

model = BDMLModel(Y, D, X)
p = ncovariates(model)  # Same as size(X, 2)
source
BayesianDoubleML.model_typeFunction
model_type(model::BDMLBasicModel)
model_type(model::BDMLHierarchicalModel)

Return the model type symbol (:basic or :hier).

Useful for dispatch and debugging.

Examples

model = BDMLModel(Y, D, X; model_type=:basic)
model_type(model)  # Returns :basic
source
BayesianDoubleML.standardization_statsFunction
standardization_stats(model::AbstractBDMLModel)

Return the standardization statistics for the model.

Used to transform results back to original scale.

Examples

model = BDMLModel(Y, D, X)
stats = standardization_stats(model)
# Use stats.Y_sd, stats.D_sd, etc.
source
BayesianDoubleML.isfittedFunction
isfitted(model::AbstractBDMLModel)

Return true if the model has been fitted (i.e., fit!() has been called).

Examples

model = BDMLModel(Y, D, X)
isfitted(model)  # Returns false

fit!(model)
isfitted(model)  # Returns true
source

Inference Methods

Abstract and Types

BayesianDoubleML.AbstractInferenceMethodType
AbstractInferenceMethod

Abstract type for all inference methods/algorithms.

Subtypes define HOW to fit a BDML problem:

  • MCMC methods: NUTS, HMC, etc.
  • VI methods: Unified (AdvancedVI), Simple (Turing vi())

Each concrete method type is dispatched on in the fit() function along with the problem type to execute the appropriate algorithm.

See also: MCMCMethod, UnifiedVIMethod, SimpleVIMethod

source
BayesianDoubleML.MCMCMethodType
MCMCMethod <: AbstractInferenceMethod

MCMC (Markov Chain Monte Carlo) inference method.

Supports multiple MCMC algorithms via the algorithm field:

  • :nuts - No-U-Turn Sampler (default, recommended)

Fields

  • algorithm::Symbol: Which MCMC algorithm (:nuts)
  • target_acceptance::Float64: Target acceptance rate for NUTS (default: 0.8)
  • max_depth::Int: Maximum tree depth for NUTS (default: 10)

Constructors

MCMCMethod(:nuts; target_acceptance=0.8, max_depth=10)
MCMCNUTS(; target_acceptance=0.8, max_depth=10)  # Convenience

Examples

# Default NUTS sampler
method = MCMCMethod(:nuts)

# Custom NUTS settings
method = MCMCMethod(:nuts; target_acceptance=0.9, max_depth=12)

# Convenience constructor
method = MCMCNUTS()
method = MCMCNUTS(; target_acceptance=0.9, max_depth=12)

Notes

NUTS is the recommended MCMC algorithm. It automatically tunes the trajectory length and is robust to step size choices.

See also: MCMCNUTS

source
BayesianDoubleML.UnifiedVIMethodType
UnifiedVIMethod{F<:AbstractVariationalFamily} <: AbstractInferenceMethod

Unified Variational Inference method using AdvancedVI with explicit Bijectors.

This is the primary VI implementation with:

  • Explicit bijector transformations (unconstrained → constrained)
  • Support for all AD backends (ReverseDiff, Mooncake, Zygote, ForwardDiff)
  • Subsampling support for large datasets (n > 10,000)
  • Multiple variational families (MeanField, LowRank) via type parameter
  • Full control over the optimization process

Type Parameters

  • F<:AbstractVariationalFamily: The variational family type (MeanField, LowRank)

Fields

  • ad_backend::Type{<:AbstractADType}: AD backend (AutoReverseDiff, AutoMooncake, etc.)
  • subsample::Union{Bool, Nothing}: Whether to use mini-batch gradients
  • batch_size::Int: Mini-batch size (auto-computed if -1)
  • n_montecarlo::Int: Number of Monte Carlo samples for gradient estimation (default: 10)
  • family::F: Variational family instance (carries type-specific parameters like rank)

Constructors

# Default MeanField
UnifiedVIMethod()

# LowRank with specific rank
UnifiedVIMethod(; family=LowRank(10))

# Convenience aliases
MeanFieldVI()      # Explicit mean-field
LowRankVI(10)      # Low-rank with rank 10
UnifiedVI()        # Backwards-compatible alias (defaults to MeanField)

AD Backend Options

  • AutoReverseDiff (default): Most stable, tape compilation, no warmup needed
  • AutoMooncake: 5-10x faster after warmup, requires compilation
  • AutoZygote: Source-to-source, higher memory usage
  • AutoForwardDiff: Forward-mode, good for small p

Subsampling

  • subsample=nothing: Auto-enable for n >= 10,000
  • subsample=true: Force subsampling with auto batch size
  • subsample=false: Force full-batch
  • batch_size: Manual control (default: min(256, ceil(n/1000)))

Examples

# Default - MeanField with ReverseDiff
method = UnifiedVIMethod()

# Low-rank with rank 10
method = UnifiedVIMethod(; family=LowRank(10))

# With Mooncake (fast after warmup)
method = UnifiedVIMethod(; ad_backend=AutoMooncake, family=LowRank(10))

# Explicit subsampling control with low-rank
method = UnifiedVIMethod(; subsample=true, batch_size=512, family=LowRank(5))

# Convenience aliases
method = MeanFieldVI()           # Explicit mean-field
method = LowRankVI(10)           # Low-rank with rank 10
method = UnifiedVI()             # Backwards-compatible

Notes

This implementation uses explicit bijectors (LogNormal, Beta) to transform parameters from the unconstrained space (where VI optimizes) to the constrained space (where the model is defined). This provides better AD compatibility than relying on Turing's automatic bijectors.
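
The transformation pattern can be sketched with hypothetical helpers (illustrative only, not the package's internals): positive scales map through exp, and the correlation through a scaled logistic onto (-1, 1), with an exact round trip.

```julia
# Map unconstrained z = (z1, z2, z3) to (sigma_U, sigma_V, rho), and back.
# Hypothetical helpers for illustration, not BayesianDoubleML internals.
to_constrained(z) = (exp(z[1]), exp(z[2]), 2 / (1 + exp(-z[3])) - 1)
to_unconstrained(s) = [log(s[1]), log(s[2]), log((1 + s[3]) / (1 - s[3]))]

z = [0.3, -0.7, 1.2]
s = to_constrained(z)                  # sigma_U, sigma_V > 0 and rho in (-1, 1)
z_back = to_unconstrained(collect(s))  # round trip, exact up to fp error
```

VI can then optimize freely over z while the model only ever sees valid constrained values.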

The variational family determines the covariance structure:

  • MeanField: Diagonal only (fastest, assumes independence)
  • LowRank: Diagonal + low-rank factors (balances speed and correlation capture)

See also: MeanField, LowRank, SimpleVIMethod

source
BayesianDoubleML.SimpleVIMethodType
SimpleVIMethod <: AbstractInferenceMethod

Simple Variational Inference method using Turing's native vi() function.

This is an alternative VI implementation that:

  • Uses Turing's built-in ADVI with automatic bijectors
  • Works well with AutoMooncake (5-10x speedup after warmup)
  • Has simpler code, less maintenance overhead
  • Does NOT support subsampling (Turing limitation)

Best for: Production use with Mooncake when n < 10,000 and you want maximum performance after initial warmup.

Fields

  • ad_backend::Type{<:AbstractADType}: AD backend (AutoMooncake recommended, AutoReverseDiff works)

Constructor

SimpleVIMethod(; ad_backend=AutoMooncake)
SimpleVI(; kwargs...)  # Convenience alias

Examples

# Default - Mooncake (recommended for this implementation)
method = SimpleVIMethod()

# With ReverseDiff
method = SimpleVIMethod(; ad_backend=AutoReverseDiff)

# Convenience alias
method = SimpleVI()

Comparison with UnifiedVIMethod

Feature                 UnifiedVIMethod    SimpleVIMethod
Bijectors               Explicit           Automatic (Turing)
Subsampling             Yes                No
Mooncake                Has issues         Works well
ReverseDiff             Excellent          Good
Large data (n > 10k)    Yes                No (memory)
Code complexity         More control       Simpler

See also: UnifiedVIMethod, SimpleVI

source

MCMC Method Constructors

BayesianDoubleML.MCMCNUTSFunction
MCMCNUTS(; target_acceptance=0.8, max_depth=10)

Convenience constructor for NUTS (No-U-Turn Sampler) MCMC method.

NUTS is the default and recommended MCMC algorithm. It automatically tunes the trajectory length and is robust to step size choices.

Arguments

  • target_acceptance::Float64=0.8: Target acceptance rate (typical range: 0.6-0.9)
  • max_depth::Int=10: Maximum tree depth (limits trajectory length)

Examples

# Default NUTS
method = MCMCNUTS()

# Higher target acceptance for better exploration
method = MCMCNUTS(; target_acceptance=0.9)

See also: MCMCMethod

source

VI Method Constructors

BayesianDoubleML.MeanFieldVIFunction
MeanFieldVI(; kwargs...)

Convenience constructor for MeanField VI.

Creates a UnifiedVIMethod with MeanField variational family.

Examples

method = MeanFieldVI()                      # Default MeanField
method = MeanFieldVI(; ad_backend=AutoMooncake)  # With Mooncake

See also: UnifiedVIMethod, LowRankVI

source
BayesianDoubleML.LowRankVIFunction
LowRankVI(rank::Int; kwargs...)

Convenience constructor for LowRank VI.

Creates a UnifiedVIMethod with LowRank variational family.

Arguments

  • rank::Int: Number of low-rank factors to use

Examples

method = LowRankVI(10)                      # Low-rank with 10 factors
method = LowRankVI(5; ad_backend=AutoMooncake)   # With Mooncake

See also: UnifiedVIMethod, MeanFieldVI

source
BayesianDoubleML.LowRankScoreVIFunction
LowRankScoreVI(rank::Int; kwargs...)

Convenience constructor for LowRankScore VI (using score gradient).

Creates a UnifiedVIMethod with LowRankScore variational family. Uses the score gradient (REINFORCE) estimator with VarGrad control variate.

Arguments

  • rank::Int: Number of low-rank factors to use

Examples

method = LowRankScoreVI(10)                      # Low-rank score gradient with 10 factors
method = LowRankScoreVI(5; ad_backend=AutoMooncake)   # With Mooncake

See also: UnifiedVIMethod, LowRankVI

source

Variational Families

BayesianDoubleML.AbstractVariationalFamilyType
AbstractVariationalFamily

Abstract type for variational distribution families in UnifiedVI.

Subtypes define the shape of the variational posterior approximation:

  • MeanField: Diagonal covariance (independent dimensions)
  • LowRank: Low-rank plus diagonal covariance structure

Each family type is dispatched on for initialization in initialize_variational_distribution.

source
BayesianDoubleML.MeanFieldType
MeanField <: AbstractVariationalFamily

Mean-field (factorized) Gaussian variational family.

Uses a diagonal covariance matrix, assuming independence between all parameters in the variational posterior.

Best for: High-dimensional problems, fast inference, when parameters are approximately uncorrelated in the posterior.

source
BayesianDoubleML.LowRankType
LowRank <: AbstractVariationalFamily

Low-rank Gaussian variational family.

Uses a low-rank plus diagonal decomposition of the covariance: Σ = D² + U*U' where U is d×r with r << d.

Best for: Problems with structured correlations, balancing between mean-field and full-rank flexibility.
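
The covariance decomposition can be sketched with the standard library (illustrative values) to see the parameter savings: d*(r + 1) parameters instead of d*(d + 1)/2 for a full covariance, while still representing off-diagonal correlations.

```julia
using LinearAlgebra, Random

Random.seed!(1)
d, r = 50, 5                      # dimension and rank, r << d
Dvec = 0.1 .+ rand(d)             # positive diagonal scales
U = randn(d, r) ./ sqrt(d)        # low-rank factors (d x r)

# Sigma = D^2 + U U': positive-definite diagonal plus a PSD low-rank term
Sigma = Symmetric(Diagonal(Dvec .^ 2) + U * U')
```

The sum is always positive definite, so it is a valid Gaussian covariance for any choice of factors.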

Constructor

LowRank(rank::Int)  # rank is the number of low-rank factors
source
BayesianDoubleML.LowRankScoreType
LowRankScore <: AbstractVariationalFamily

Low-rank Gaussian variational family using score gradient estimator (BBVI).

Uses a low-rank plus diagonal decomposition of the covariance: Σ = D² + U*U' where U is d×r with r << d.

Uses the score gradient (REINFORCE) estimator with VarGrad control variate, which can be more stable than the reparameterization gradient for some problems.

Best for: Problems where reparameterization gradient is unstable or when exploring different gradient estimators.

Constructor

LowRankScore(rank::Int)  # rank is the number of low-rank factors
source

Method Traits

BayesianDoubleML.uses_samplingFunction
uses_sampling(method::AbstractInferenceMethod)

Return true if the method uses sampling (MCMC or Monte Carlo VI).

All current methods return true, but this enables future deterministic methods.

source
BayesianDoubleML.supports_subsamplingFunction
supports_subsampling(method::AbstractInferenceMethod)

Return true if the method supports mini-batch subsampling for large datasets.

Only UnifiedVIMethod supports subsampling. MCMC and SimpleVI do not.

source
BayesianDoubleML.is_deterministicFunction
is_deterministic(method::AbstractInferenceMethod)

Return true if the method is deterministic (no random sampling).

Currently all methods use sampling. Future deterministic methods (e.g., Laplace approximation) would return true.

source
BayesianDoubleML.default_n_samplesFunction
default_n_samples(method::AbstractInferenceMethod)

Return the default number of samples/draws for the method.

Returns 2000 for both MCMC sampling and the VI draw phase.

source
BayesianDoubleML.default_n_iterationsFunction
default_n_iterations(method::AbstractInferenceMethod)

Return the default number of optimization iterations for the method.

MCMC uses iterations as warm-up/tuning. VI uses iterations for optimization.

source

Fitting Functions

BayesianDoubleML.fit!Function
fit!(model::AbstractBDMLModel, method::AbstractInferenceMethod; force=false, kwargs...)

Fit a BDML model using the specified inference method, storing results in the model.

This is a mutating function that modifies the model in-place. After fitting, results can be extracted using coeftable(), extract_alpha(), summary(), etc.

If the model has already been fitted, a warning is shown unless force=true is passed to allow refitting.

Arguments

  • model::AbstractBDMLModel: The model to fit (data + metadata)
  • method::AbstractInferenceMethod: The inference algorithm to use

Keyword Arguments

  • force::Bool=false: Allow refitting an already-fitted model

For MCMC methods:

  • n_samples::Int=2000: Number of posterior samples to draw
  • n_chains::Int=4: Number of MCMC chains to run

For VI methods:

  • n_iterations::Int=1000: Number of optimization iterations
  • n_draws::Int=2000: Number of posterior samples to draw after fitting

Returns

nothing (follows standard Julia mutating function convention)

Examples

# Create model
model = BDMLModel(Y, D, X; model_type=:basic)

# Fit with MCMC (NUTS)
fit!(model, MCMCMethod(:nuts); n_samples=2000, n_chains=4)

# Fit with VI (Unified)
fit!(model, UnifiedVIMethod(); n_iterations=1000)

# Extract results
coef_table = coeftable(model)
summary(model)

# Refit with force (changes the stored result)
fit!(model, SimpleVIMethod(); force=true)

Multiple Dispatch

The actual implementation dispatches on both model and method:

  • _fit_impl(::BDMLBasicModel, ::MCMCMethod) - Basic model MCMC
  • _fit_impl(::BDMLHierarchicalModel, ::MCMCMethod) - Hierarchical MCMC
  • _fit_impl(::BDMLBasicModel, ::UnifiedVIMethod) - Basic model VI (Unified)
  • _fit_impl(::BDMLHierarchicalModel, ::UnifiedVIMethod) - Hierarchical VI (Unified)
  • _fit_impl(::BDMLBasicModel, ::SimpleVIMethod) - Basic model VI (Simple)
  • _fit_impl(::BDMLHierarchicalModel, ::SimpleVIMethod) - Hierarchical VI (Simple)

See also: BDMLModel, MCMCMethod, UnifiedVIMethod, isfitted

source
fit!(model::AbstractBDMLModel; force=false, kwargs...)

Fit a model using default method (MCMC with NUTS).

Convenience method that defaults to NUTS sampler.

source

Result Types

BayesianDoubleML.AbstractBDMLResultType
AbstractBDMLResult

Abstract parent type for all Bayesian Double Machine Learning results.

Subtypes include:

  • BDMLMCMCResult: Results from MCMC inference
  • BDMLVIResult: Results from Variational Inference

All subtypes implement the result interface with methods like:

  • extract_alpha(): Extract α samples
  • coeftable(): Generate coefficient table
  • confint(): Compute credible intervals
source
BayesianDoubleML.BDMLMCMCResultType
BDMLMCMCResult <: AbstractBDMLResult

Results from MCMC inference using NUTS or HMC sampling.

Fields

  • chain::MCMCChains.Chains: Full MCMC chain from Turing
  • alpha_samples::Vector{Float64}: Causal effect samples (original scale)
  • alpha_samples_standardized::Vector{Float64}: Causal effect samples (standardized scale)
  • std_stats::StandardizationStats: Statistics for back-transformation
  • model_type::Symbol: :basic or :hier

Usage

result = fit(problem, MCMCNUTS())
alpha_mean = mean(result.alpha_samples)
ci = credible_interval(result)

See also: BDMLVIResult, AbstractBDMLResult

source
BayesianDoubleML.BDMLVIResultType
BDMLVIResult{P} <: AbstractBDMLResult

Results from Variational Inference (ADVI) approximation.

Type Parameters

  • P: The concrete type of the variational posterior distribution

Fields

  • variational_posterior::P: Variational distribution from AdvancedVI
  • alpha_samples::Vector{Float64}: Causal effect samples (original scale)
  • alpha_samples_standardized::Vector{Float64}: Causal effect samples (standardized scale)
  • std_stats::StandardizationStats: Statistics for back-transformation
  • model_type::Symbol: :basic or :hier
  • variational_family::Symbol: :meanfield, :lowrank, or :fullrank
  • vi_method::Symbol: :unified or :simple (which VI method was used)
  • n_iterations::Int: Number of optimization iterations performed
  • elbo_history::Vector{Float64}: ELBO values during optimization
  • converged::Bool: Whether convergence criteria were met
  • final_elbo::Float64: Final ELBO value

Usage

result = fit(problem, UnifiedVI())
alpha_mean = mean(result.alpha_samples)
println("Converged: ", result.converged)
println("Final ELBO: ", result.final_elbo)

See also: BDMLMCMCResult, AbstractBDMLResult

source
BayesianDoubleML.BDMLDataType
BDMLData

Container for BDML input data with standardized types.

Fields

  • Y::Vector{Float64}: Outcome variable
  • D::Vector{Float64}: Treatment variable
  • X::Matrix{Float64}: Control covariates (n×p)
  • n::Int: Number of observations
  • p::Int: Number of covariates

Constructor

BDMLData(Y, D, X)

Automatically converts inputs to Float64 and validates dimensions.

source
BayesianDoubleML.StandardizationStatsType
StandardizationStats

Statistics from data standardization, used for back-transformation.

Fields

  • Y_mean::Float64: Mean of outcome before standardization
  • Y_sd::Float64: Std dev of outcome before standardization
  • D_mean::Float64: Mean of treatment before standardization
  • D_sd::Float64: Std dev of treatment before standardization
  • X_mean::Vector{Float64}: Means of covariates before standardization
  • X_sd::Vector{Float64}: Std devs of covariates before standardization

Used internally to transform results back to original scale after fitting on standardized data.

source

Extraction Functions

BayesianDoubleML.extract_alphaFunction
extract_alpha(chain::MCMCChains.Chains)

Extract $\alpha$ samples from an MCMC chain using the paper's transformation (Eq. 15).

Mathematical Derivation (DiTraglia & Liu 2025)

The BDML model uses a bivariate reduced form parameterization (Section 4):

Structural equation: $Y = \alpha D + X'\beta + \epsilon$, where $\epsilon \perp V$ (Eq. 6)

Reduced form: $Y = X'\delta + U$, where $U = \epsilon + \alpha V$ (Eq. 12), and $D = X'\gamma + V$ (Eq. 5)

Since $\epsilon$ is uncorrelated with V by assumption, the covariance between the reduced form errors is: $\text{Cov}(U, V) = \text{Cov}(\epsilon + \alpha V, V) = \alpha \cdot \text{Var}(V)$

Therefore, the causal effect $\alpha$ can be recovered from the error covariance: $\alpha = \text{Cov}(U, V) / \text{Var}(V) = \sigma_{UV} / \sigma^2_V$ (Eq. 15)

Using the correlation parameterization $\sigma_{UV} = \rho \cdot \sigma_U \cdot \sigma_V$: $\alpha = \rho \cdot \sigma_U \cdot \sigma_V / \sigma^2_V = \rho \cdot \sigma_U / \sigma_V$
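
The identity is easy to verify by simulation with only the standard library (a sketch with an assumed true value α = 2): build the reduced-form errors exactly as in Eq. 12 and recover α both ways.

```julia
using Statistics, Random

Random.seed!(1)
n, alpha = 100_000, 2.0
V = randn(n)                # treatment reduced-form error
eps = 0.5 .* randn(n)       # structural error, independent of V
U = eps .+ alpha .* V       # outcome reduced-form error, U = eps + alpha*V

alpha_cov = cov(U, V) / var(V)              # Eq. 15
alpha_rho = cor(U, V) * std(U) / std(V)     # correlation parameterization
```

Both estimates agree (they are algebraically identical) and converge to the true α as n grows.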

Implementation

For MCMC with LKJCholesky, the chain stores the Cholesky factor L where: $\rho = L[2,1] / L[1,1]$

Since $L[1,1] = 1$ for correlation matrices, $\rho = L[2,1]$.
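
For the 2×2 case this is easy to check directly with the standard library (illustrative ρ = 0.6):

```julia
using LinearAlgebra

rho = 0.6
R = [1.0 rho; rho 1.0]      # 2x2 correlation matrix
L = cholesky(R).L

# L[1,1] = 1 for a correlation matrix, so rho is read off as L[2,1];
# the remaining entry is L[2,2] = sqrt(1 - rho^2)
```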

This function extracts $\sigma_U$, $\sigma_V$, and $\rho$ from the MCMC chain, then computes: $\alpha = \rho \cdot \sigma_U / \sigma_V$

Arguments

  • chain::MCMCChains.Chains: MCMC chain containing posterior samples

Returns

  • Vector{Float64}: Posterior samples of $\alpha$

References

  • DiTraglia, F.J. & Liu, L. (2025). "Bayesian Double Machine Learning for Causal Inference", arXiv:2508.12688v1, Section 4, Equation 15.
source
extract_alpha(result::BDMLVIResult)

Extract $\alpha$ samples from a BDMLVIResult.

source
extract_alpha(vi_samples::Matrix{Float64}, p::Int, model_type::Symbol)

Extract α samples from VI posterior samples matrix using paper's Equation 15.

Mathematical Derivation (DiTraglia & Liu 2025, Section 4)

From the bivariate reduced form model: Y = X'δ + U, where U = ε + αV (Eq. 12), and D = X'γ + V (Eq. 5)

The causal effect α is recovered from error covariance: α = Cov(U, V) / Var(V) = σUV / σ²V (Eq. 15)

For VI models, the correlation is parameterized as ρ_raw ~ Beta(2, 2) on [0, 1], then transformed to ρ = 2*ρ_raw - 1 on [-1, 1].

Therefore: α = ρ * σU / σV = (2*ρ_raw - 1) * σU / σV

Arguments

  • vi_samples::Matrix{Float64}: Samples from variational posterior (parameters × samples)
  • p::Int: Number of control variables (covariates)
  • model_type::Symbol: :basic or :hier

Returns

  • Vector{Float64}: Alpha samples (α = ρ·σU / σV)

Parameter Indexing

For hierarchical models:

  • [σ²δ, σ²γ, δ(1:p), γ(1:p), σU, σV, ρ_raw]
  • Indices: σ²δ=1, σ²γ=2, δ=3:(2+p), γ=(3+p):(2+2p), σU=(3+2p), σV=(4+2p), ρ_raw=(5+2p)

For basic models:

  • [δ(1:p), γ(1:p), σU, σV, ρ_raw]
  • Indices: δ=1:p, γ=(p+1):2p, σU=(2p+1), σV=(2p+2), ρ_raw=(2p+3)
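
A small hypothetical helper makes the basic-model layout concrete (illustrative only, not the package's implementation):

```julia
# Hypothetical index map for the basic-model parameter layout above.
basic_indices(p) = (delta = 1:p, gamma = (p + 1):2p,
                    sigma_U = 2p + 1, sigma_V = 2p + 2, rho_raw = 2p + 3)

# Recover alpha from a (parameters x samples) matrix of VI draws
function alpha_from_samples(S::AbstractMatrix, p::Int)
    ix = basic_indices(p)
    rho = 2 .* S[ix.rho_raw, :] .- 1      # Beta(2,2) draw mapped to [-1, 1]
    rho .* S[ix.sigma_U, :] ./ S[ix.sigma_V, :]
end

# Sanity check: sigma_U = 2, sigma_V = 1, rho_raw = 0.75 => rho = 0.5, alpha = 1
S = zeros(7, 3)
S[5, :] .= 2.0; S[6, :] .= 1.0; S[7, :] .= 0.75
alphas = alpha_from_samples(S, 2)
```

With p = 2 the basic layout has 2p + 3 = 7 parameters, matching the matrix above.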

References

  • DiTraglia, F.J. & Liu, L. (2025). "Bayesian Double Machine Learning for Causal Inference", arXiv:2508.12688v1, Section 4, Equation 15.
source
extract_alpha(model::AbstractBDMLModel)

Extract alpha samples from a fitted BDML model.

Delegates to the stored result. Throws an error if the model has not been fitted.

source

Statistical Functions

Coefficient Table

BayesianDoubleML.coeftableFunction
coeftable(result::BDMLMCMCResult; level=0.95)

Compute coefficient table for MCMC results with HPD credible intervals. Uses MCMCChains for ESS and MCSE calculations.

source
coeftable(result::BDMLVIResult; level=0.95)

Compute coefficient table for VI results with HPD credible intervals. Note: ESS and MCSE are not computed for VI results as samples are drawn independently.

source
coeftable(model::AbstractBDMLModel; level=0.95)

Compute coefficient table for a fitted BDML model.

Delegates to the stored result. Throws an error if the model has not been fitted.

source
BayesianDoubleML.BDMLCoeftableType
BDMLCoeftable

Coefficient table following StatsAPI conventions for regression results.

Stores parameter estimates, standard errors, credible intervals, and diagnostics for Bayesian Double Machine Learning models.

Fields

  • coefnames::Vector{String}: Parameter names
  • coef::Vector{Float64}: Point estimates
  • stderror::Vector{Float64}: Standard errors
  • mcse::Vector{Float64}: Monte Carlo standard errors
  • cilower::Vector{Float64}: Lower bound of credible interval
  • ciupper::Vector{Float64}: Upper bound of credible interval
  • pvalue::Vector{Float64}: Two-sided p-values
  • ess::Vector{Float64}: Effective sample sizes (MCMC only)
  • elbo::Union{Float64, Nothing}: Final ELBO value (VI only)
  • level::Float64: Credible interval level (e.g., 0.95)
  • nsamples::Int: Number of posterior samples
  • model_type::Symbol: :basic or :hier
  • method_type::Symbol: :mcmc or :vi

Usage

result = fit(problem, MCMCNUTS())
ct = coeftable(result)
println(ct)  # Pretty-printed table

See also: coeftable, coef, stderror

source

Diagnostics

BayesianDoubleML.confintFunction
confint(ct::BDMLCoeftable)

Return confidence/credible intervals as matrix [lower upper].

source
confint(result::AbstractBDMLResult; level=0.95)

Return confidence/credible intervals for the treatment effect from a BDML result. Computes HPD credible intervals at the specified level.

source
confint(model::AbstractBDMLModel; level=0.95)

Return confidence/credible intervals from a fitted BDML model.

Delegates to the stored result. Throws an error if the model has not been fitted.

source
BayesianDoubleML.essFunction
ess(ct::BDMLCoeftable)

Return effective sample size.

source
ess(result::BDMLMCMCResult)

Compute the effective sample size from the MCMC chain. Uses MCMCChains.ess for proper multi-chain ESS calculation. Returns the minimum ESS across all parameters as a conservative estimate.

source
ess(model::AbstractBDMLModel)

Return effective sample size from a fitted BDML model (MCMC only).

Delegates to the stored result. Throws an error if the model has not been fitted or if the model was fitted with VI.

source
BayesianDoubleML.pvaluesFunction
pvalues(ct::BDMLCoeftable)

Return p-values.

source
pvalues(result::AbstractBDMLResult)

Return p-value for testing H0: α = 0 from a BDML result.

source
pvalues(model::AbstractBDMLModel)

Return p-values from a fitted BDML model.

Delegates to the stored result. Throws an error if the model has not been fitted.

source
BayesianDoubleML.hpd_intervalFunction
hpd_interval(samples::Vector{Float64}; level::Real=0.95)

Compute the Highest Posterior Density (HPD) interval for a vector of samples. HPD intervals are the shortest intervals containing the specified probability mass.

source
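As an illustration of the HPD definition above, here is a minimal pure-Julia sketch using a window scan over the sorted draws: among all contiguous windows containing `level` of the samples, return the shortest one. This is illustrative only; the package's hpd_interval may differ in implementation details (ties, interpolation).

```julia
# Minimal HPD sketch: scan fixed-size windows over sorted samples and
# keep the shortest one covering `level` of the probability mass.
function hpd_sketch(samples::Vector{Float64}; level::Real=0.95)
    sorted = sort(samples)
    n = length(sorted)
    m = max(1, ceil(Int, level * n))      # window size covering `level` mass
    lo, hi = sorted[1], sorted[m]
    for i in 2:(n - m + 1)
        if sorted[i + m - 1] - sorted[i] < hi - lo
            lo, hi = sorted[i], sorted[i + m - 1]
        end
    end
    return (lo, hi)
end
```

For a right-skewed sample the returned interval hugs the high-density region near zero, which is exactly what distinguishes HPD from equal-tailed intervals.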
BayesianDoubleML.mcseFunction
mcse(result::BDMLMCMCResult)

Compute Monte Carlo Standard Error from MCMC chain. Uses MCMCChains.mcse for proper multi-chain MCSE calculation. Returns the maximum MCSE across all parameters as a conservative estimate.

source
mcse(model::AbstractBDMLModel)

Return Monte Carlo standard error from a fitted BDML model (MCMC only).

Delegates to the stored result. Throws an error if the model has not been fitted or if the model was fitted with VI.

source
BayesianDoubleML.rhatFunction
rhat(result::BDMLMCMCResult)

Compute R-hat (potential scale reduction factor) from MCMC chain. R-hat ≈ 1.0 indicates good convergence. Values > 1.05 suggest non-convergence. Returns the maximum R-hat across all parameters as a conservative estimate.

source
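For intuition, a toy (non-split) version of the statistic can be sketched in a few lines for draws arranged as an n_samples × n_chains matrix. MCMCChains implements the more robust rank-normalized split-R̂, so treat this purely as an illustration of the between/within-chain comparison.

```julia
using Statistics

# Toy potential scale reduction factor: compare between-chain variance (B)
# to within-chain variance (W). Values near 1 mean the chains agree.
function rhat_sketch(draws::Matrix{Float64})
    n = size(draws, 1)                       # samples per chain
    chain_means = vec(mean(draws, dims=1))
    B = n * var(chain_means)                 # between-chain variance
    W = mean(vec(var(draws, dims=1)))        # within-chain variance
    var_plus = ((n - 1) / n) * W + B / n     # pooled variance estimate
    return sqrt(var_plus / W)
end
```

Chains drawn from the same distribution give values near 1.0; shifting one chain inflates B and pushes the statistic past the 1.05 warning threshold mentioned above.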
rhat(model::AbstractBDMLModel)

Return R-hat convergence diagnostic from a fitted BDML model (MCMC only).

Delegates to the stored result. Throws an error if the model has not been fitted or if the model was fitted with VI.

source
BayesianDoubleML.rhat_statisticFunction
rhat_statistic(result::BDMLMCMCResult)

Alias for rhat(result) - compute R-hat convergence diagnostic. R-hat ≈ 1.0 indicates good convergence.

source
rhat_statistic(model::AbstractBDMLModel)

Alias for rhat(model) - compute R-hat convergence diagnostic (MCMC only).

source
BayesianDoubleML.chain_infoFunction
chain_info(result::BDMLMCMCResult)

Get chain summary information: (n_chains, n_samples_per_chain, total_samples)

source
chain_info(model::AbstractBDMLModel)

Return chain summary information from a fitted BDML model (MCMC only).

Delegates to the stored result. Throws an error if the model has not been fitted or if the model was fitted with VI.

source

StatsAPI Functions

BayesianDoubleML.coefFunction
coef(result::AbstractBDMLResult)

Module-level wrapper for StatsAPI.coef. See StatsAPI.coef for details.

source
BayesianDoubleML.stderrorFunction
stderror(result::AbstractBDMLResult)

Module-level wrapper for StatsAPI.stderror. See StatsAPI.stderror for details.

source
BayesianDoubleML.vcovFunction
vcov(result::AbstractBDMLResult)

Module-level wrapper for StatsAPI.vcov. See StatsAPI.vcov for details.

source

Utility Functions

BayesianDoubleML.credible_intervalFunction
credible_interval(model::AbstractBDMLModel; level=0.95)

Return credible intervals from a fitted BDML model.

Delegates to the stored result. Throws an error if the model has not been fitted.

source
credible_interval(result::BDMLVIResult; level=0.95)

Compute credible interval for α from BDMLVIResult.

source
credible_interval(samples::Vector{Float64}; level=0.95)

Compute credible interval for a vector of samples.

source

Internal Functions

These functions are primarily for internal use but are documented here for developers.

Model Functions

BayesianDoubleML.bdml_basicFunction
bdml_basic(Y, D, X)

Bayesian Double Machine Learning - Basic Model.

Implements the BDML-Basic variation of Algorithm 1 from DiTraglia & Liu (2025), Section 4. This model uses fixed prior variances on the reduced form coefficients.

The Bivariate Reduced Form Model

The BDML approach avoids regularization-induced confounding by using a bivariate reduced form parameterization (Section 4, Equations 13-15):

Outcome equation: $Y = X'\delta + U$ (Eq. 12)

Treatment equation: $D = X'\gamma + V$ (Eq. 5)

Joint error distribution: $[U; V] | X \sim N(0, \Sigma)$ (Eq. 13)

where $\Sigma$ is the $2\times2$ covariance matrix: $\Sigma = [\sigma^2_U, \sigma_{UV}; \sigma_{UV}, \sigma^2_V]$

Causal effect recovery: The causal effect $\alpha$ is recovered from the error covariance via: $\alpha = \text{Cov}(U, V) / \text{Var}(V) = \sigma_{UV} / \sigma^2_V = \rho \cdot \sigma_U / \sigma_V$ (Eq. 15)

This works because $U = \epsilon + \alpha V$ (Eq. 6), where $\epsilon$ is the structural error uncorrelated with V. Therefore: $\text{Cov}(U, V) = \text{Cov}(\epsilon + \alpha V, V) = \alpha \cdot \text{Var}(V)$.
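The identity above is easy to check numerically. The following standalone simulation (not package code) verifies that the sample analogue of $\text{Cov}(U, V) / \text{Var}(V)$ recovers $\alpha$:

```julia
using Random, Statistics

# Numerical check of Eq. 15: with U = ε + α·V and ε independent of V,
# Cov(U, V) / Var(V) recovers the causal effect α.
rng = MersenneTwister(1)
n, α = 100_000, 2.0
V = randn(rng, n)
ε = randn(rng, n)
U = ε .+ α .* V
α_hat = cov(U, V) / var(V)    # ≈ 2.0 up to sampling noise
```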

This Variation: BDML-Basic

Priors on reduced form coefficients:

  • $\delta \sim N(0, 25 \cdot I_p)$ [$\text{Normal}(0, 5^2)$ for outcome coefficients]
  • $\gamma \sim N(0, 25 \cdot I_p)$ [$\text{Normal}(0, 5^2)$ for treatment coefficients]

These are fixed-variance priors that provide uniform shrinkage across all covariates. This corresponds to "BDML-Basic" in paper Section 6, Table 1.

Priors on error covariance:

  • $\sigma_U \sim \text{Cauchy}^+(0, 2.5)$ [Half-Cauchy for outcome error scale]
  • $\sigma_V \sim \text{Cauchy}^+(0, 2.5)$ [Half-Cauchy for treatment error scale]
  • $R \sim \text{LKJ}(4)$ [LKJ prior with $\eta=4$ for correlation matrix]

The correlation parameter $\rho$ is extracted from the LKJCholesky factor $R_\text{chol}.L[2,1]$.

For adaptive shrinkage that learns the appropriate regularization from data, use bdml_hier() (BDML-Hierarchical) instead.

References

  • DiTraglia, F.J. & Liu, L. (2025). "Bayesian Double Machine Learning for Causal Inference", arXiv:2508.12688v1, Section 4, Algorithm 1, Section 6.

See also: bdml_hier, bdml_basic_vi

source
BayesianDoubleML.bdml_hierFunction
bdml_hier(Y, D, X)

Bayesian Double Machine Learning - Hierarchical Model.

Implements the BDML-Hier variation of Algorithm 1 from DiTraglia & Liu (2025), Section 4. This model uses adaptive hierarchical priors for data-driven shrinkage on the reduced form coefficients.

The Bivariate Reduced Form Model

The BDML approach avoids regularization-induced confounding by using a bivariate reduced form parameterization (Section 4, Equations 13-15):

Outcome equation: $Y = X'\delta + U$ (Eq. 12)

Treatment equation: $D = X'\gamma + V$ (Eq. 5)

Joint error distribution: $[U; V] | X \sim N(0, \Sigma)$ (Eq. 13)

where $\Sigma$ is the $2\times2$ covariance matrix: $\Sigma = [\sigma^2_U, \sigma_{UV}; \sigma_{UV}, \sigma^2_V]$

Causal effect recovery: The causal effect $\alpha$ is recovered from the error covariance via: $\alpha = \text{Cov}(U, V) / \text{Var}(V) = \sigma_{UV} / \sigma^2_V = \rho \cdot \sigma_U / \sigma_V$ (Eq. 15)

This works because $U = \epsilon + \alpha V$ (Eq. 6), where $\epsilon$ is the structural error uncorrelated with V. Therefore: $\text{Cov}(U, V) = \text{Cov}(\epsilon + \alpha V, V) = \alpha \cdot \text{Var}(V)$.

This Variation: BDML-Hier

Hierarchical priors on reduced form coefficients: The key difference from BDML-Basic is the adaptive prior structure:

  1. Hyperpriors on variance hyperparameters:

    • $\sigma^2_\delta \sim \text{InvGamma}(2, 2)$
    • $\sigma^2_\gamma \sim \text{InvGamma}(2, 2)$
  2. Conditional priors on coefficients:

    • $\delta | \sigma^2_\delta \sim N(0, \sigma^2_\delta \cdot I_p)$
    • $\gamma | \sigma^2_\gamma \sim N(0, \sigma^2_\gamma \cdot I_p)$

Why this works: This hierarchical structure is equivalent to placing independent $\text{Student-}t(4)$ distributions on each coefficient $\delta_j$ and $\gamma_j$. The $\text{InvGamma}(2, 2)$ hyperprior induces heavy tails that allow some coefficients to escape shrinkage while regularizing others. This provides adaptive shrinkage that learns the appropriate regularization level from the data (Section 6, Table 1).
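The equivalence rests on the standard scale-mixture identity: integrating an $\text{InvGamma}(a, b)$ prior out of a conditionally normal coefficient yields a Student-t marginal,

```latex
p(\delta_j)
  = \int_0^{\infty} \mathcal{N}\!\left(\delta_j \mid 0, \sigma_\delta^2\right)
    \mathrm{InvGamma}\!\left(\sigma_\delta^2 \mid a, b\right) d\sigma_\delta^2
  = t_{2a}\!\left(\delta_j \,\middle|\, 0, \sqrt{b/a}\right),
```

so $a = b = 2$ gives each coefficient a marginal $t_4$ distribution with unit scale, matching the Student-$t(4)$ claim above.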

Priors on error covariance:

  • $\sigma_U \sim \text{Cauchy}^+(0, 2.5)$ [Half-Cauchy for outcome error scale]
  • $\sigma_V \sim \text{Cauchy}^+(0, 2.5)$ [Half-Cauchy for treatment error scale]
  • $R \sim \text{LKJ}(4)$ [LKJ prior with $\eta=4$ for correlation matrix]

The correlation parameter $\rho$ is extracted from the LKJCholesky factor $R_\text{chol}.L[2,1]$.

Recommendation: The paper's simulation study (Section 6, Table 1) shows BDML-Hier achieves better coverage (0.94) compared to BDML-Basic (0.91-0.93), making it the recommended default choice for most applications.

For fixed shrinkage with simpler interpretation, use bdml_basic() instead.

References

  • DiTraglia, F.J. & Liu, L. (2025). "Bayesian Double Machine Learning for Causal Inference", arXiv:2508.12688v1, Section 4, Algorithm 1, Section 6.

See also: bdml_basic, bdml_hier_vi

source

Data Functions

BayesianDoubleML.DGP.make_plr_DTL2025Function
make_plr_DTL2025(n::Int, p::Int, sigma_epsilon::Real; alpha=2.0, rng=Random.default_rng())

Generate synthetic data matching the simulation design from DiTraglia & Liu (2025), Section 6, Table 1.

This DGP corresponds to the "fixed" design described in Equations (20)-(21) of the paper, which is used to generate the results reported in Table 1.

Data Generating Process (Equations 20-21)

Covariates: $X_i \sim \text{iid Normal}_p(0, I_p)$

Errors: $(\epsilon_i, V_i)' | X \sim \text{iid Normal}([0; 0], [\sigma^2_\epsilon, 0; 0, 1])$

where:

  • $\epsilon_i$: Structural error (uncorrelated with $V_i$)
  • $V_i$: Treatment error (variance = 1)
  • $\sigma_\epsilon$: Standard deviation of $\epsilon$ (varies across simulations)

Coefficient distribution: $\beta \sim \text{Normal}_p(\mu_\beta, \sigma^2_\beta \cdot I_p)$

Fixed parameters (per paper):

  • $\alpha = 2$ (true causal effect)
  • $\gamma = \iota_p / \sqrt{p}$ (treatment coefficients)
  • $\mu_\beta = -\gamma/2$ (mean of outcome coefficients)
  • $\sigma^2_\beta = 1/p$ (variance of outcome coefficients)

Construction:

  • $D_i = X_i'\gamma + V_i$ (Equation 4: Treatment reduced form)
  • $Y_i = \alpha \cdot D_i + X_i'\beta + \epsilon_i$ (Equation 5: Outcome structural)
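The construction above can be sketched in plain Julia (illustration only; the package's make_plr_DTL2025 wraps the same steps and returns a DataFrame):

```julia
using Random

# DGP sketch following Equations (20)-(21) and the fixed parameters above.
rng = MersenneTwister(42)
n, p, σ_ε, α = 200, 100, 2.0, 2.0
X = randn(rng, n, p)                       # X_i ~ N_p(0, I_p)
γ = fill(1 / sqrt(p), p)                   # γ = ι_p / √p
β = -γ ./ 2 .+ randn(rng, p) ./ sqrt(p)    # β ~ N(-γ/2, (1/p)·I_p)
V = randn(rng, n)                          # treatment error, Var(V) = 1
ε = σ_ε .* randn(rng, n)                   # structural error
D = X * γ .+ V                             # treatment reduced form
Y = α .* D .+ X * β .+ ε                   # outcome structural equation
```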

Arguments

  • n::Int: Number of observations (paper uses: 200)
  • p::Int: Number of covariates (paper uses: 100)
  • sigma_epsilon::Real: Std dev of structural error $\epsilon \in \{1, 2, 4\}$
  • alpha::Real: True causal effect (default: 2.0)
  • rng::AbstractRNG: Random number generator for reproducibility (default: Random.default_rng())

Returns

  • df::DataFrame: DataFrame containing:
    • y::Vector{Float64}: Outcome variable (length n)
    • d::Vector{Float64}: Treatment variable (length n)
    • X1, X2, ..., Xp::Vector{Float64}: Covariates as columns

Extract the components for use with BDMLModel:

df = make_plr_DTL2025(n, p, sigma_epsilon; alpha=2.0, rng)
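Assuming the DataFrame layout described under Returns (columns y, d, X1, ..., Xp), the components can be pulled out as below. The stand-in DataFrame here only mimics that layout so the snippet runs without the package; with BayesianDoubleML installed, `df` would come from the make_plr_DTL2025 call shown above.

```julia
using DataFrames, Random

# Stand-in frame with the documented column layout (y, d, X1..Xp).
rng = MersenneTwister(42)
n, p = 8, 3
df = DataFrame(:y => randn(rng, n), :d => randn(rng, n))
for j in 1:p
    df[!, Symbol("X", j)] = randn(rng, n)
end

# Extract outcome, treatment, and covariate matrix.
y = df.y
d = df.d
X = Matrix(df[:, [Symbol("X", j) for j in 1:p]])
```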

Paper Reference

Section 6, "Simulation Study", Equations (20)-(21):

DiTraglia, F.J. & Liu, L. (2025). "Bayesian Double Machine Learning for Causal Inference", arXiv:2508.12688v1.

Table 1 Settings

The paper reports results for three values of $\sigma_\epsilon$:

$\sigma_\epsilon$   BDML-Hier Coverage   BDML-Hier RMSE   BDML-Basic Coverage   BDML-Basic RMSE
1                   0.94                 0.09             0.93                  0.11
2                   0.94                 0.18             0.91                  0.22
4                   0.94                 0.35             0.92                  0.46

All settings use n=200, p=100, and $\alpha=2$.

Examples

Basic usage

# Replicate one row of Table 1 (n=200, p=100, σ_ε=2)
using BayesianDoubleML
using Random

df = make_plr_DTL2025(200, 100, 2.0; rng=MersenneTwister(42))

Implementation Notes

Randomness:

  • The function generates new random draws for $X$, $\beta$, $\epsilon$, and $V$ in each call
  • Pass an rng argument (e.g., MersenneTwister(seed)) for reproducibility
  • Following the paper, each replication should use a different seed

Coefficient generation:

  • $\gamma$ is fixed at $\iota_p/\sqrt{p}$ (as specified in paper)
  • $\beta$ is randomly drawn from $N(\mu_\beta, \sigma^2_\beta\cdot I)$ for each replication
  • This mimics the paper's design where $\beta$ varies across replications

Error structure:

  • $\epsilon$ and $V$ are independent (covariance matrix is diagonal)
  • $\text{Var}(V) = 1$ (normalized)
  • $\text{Var}(\epsilon) = \sigma^2_\epsilon$ (varies across simulation settings)

Confounding:

  • $X$ affects both $D$ (through $\gamma$) and $Y$ (through $\beta$)
  • This creates confounding that BDML is designed to handle
  • The specific structure ($\gamma = \iota_p/\sqrt{p}$, $\mu_\beta = -\gamma/2$) creates realistic correlation


source

Index