MCMC analysis

Dendros reads Galacticus posterior-sample (“MCMC”) chain logs given the <parameters> config XML used to drive the run, and provides convergence diagnostics, post-burn analyses, parameter-file emission, and corner plots.

Opening a run

open_mcmc() parses the config and returns an MCMCRun. Per-rank <root>_NNNN.log chain files are discovered automatically from the <logFileRoot> entry and loaded lazily on first access of run.chains.

from dendros import open_mcmc

with open_mcmc("mcmcConfig.xml") as run:
    print(run.parameters)            # active model parameters
    chains = run.chains              # ChainSet, one Chain per MPI rank

Both headerless chain files (the standard Galacticus output) and #-prefixed headered chain files (as on the dmConstraintPipeline branch) are supported. When a header is present, its parameter columns are validated against the config. simulation_kind="particleSwarm" configurations include trailing per-row velocity columns; these are split off into Chain.velocity automatically.

Convergence

Brooks-Gelman corrected \(\hat R\) and the non-parametric \(R_{\mathrm{interval}}\) are returned as functions of step:

result = run.gelman_rubin()
step = run.convergence_step(threshold=1.1)

For chains started from an under-dispersed state (where Gelman-Rubin can appear converged before mixing is achieved), Geweke z-scores are a useful secondary diagnostic:

z = run.geweke()                      # (n_chains, n_params)

Outlier-chain detection iteratively applies a two-sided Grubbs test to each chain’s most recent state:

outliers = run.outlier_chains()       # tuple of chain_index values
step = run.convergence_step(drop_chains=outliers)

Grubbs requires the inverse Student-t quantile from scipy.stats, which ships with the optional mcmc extra alongside corner and matplotlib:

pip install 'dendros[mcmc]'

A clear ImportError is raised if outlier_chains is called without scipy installed.

All post-burn methods accept post_burn=None (the default), which runs gelman_rubin() and convergence_step() internally to pick a burn point. Pass an explicit integer for full control.

Mixing diagnostics

tau = run.autocorrelation_time(post_burn=step)        # per parameter
ess = run.effective_sample_size(post_burn=step)       # per parameter
rate = run.acceptance_rate(post_burn=step)            # per chain
trace = run.acceptance_rate_trace(window=30, post_burn=step)

Maximum posterior, sampling, PCA, and MVN fits

res = run.maximum_posterior()
print(res.state, res.log_posterior, res.chain_index, res.step)

import numpy as np
samples = run.posterior_samples(
    n=1000, post_burn=step, rng=np.random.default_rng(42),
)

pca = run.projection_pursuit(post_burn=step)
print(pca.eigenvalues)            # ascending — smallest = best constrained
print(pca.latex_summary(0))

fit = run.multivariate_normal_fit(post_burn=step)
fit.write_reparameterization_config("reparam.xml")

The reparameterization config declares metaParameter{i} as active unit-normal parameters truncated to \(\pm n_\sigma\) (default 5), together with the original parameters as derived expressions of those metas. Re-running the MCMC against this config samples in coordinates where the posterior is approximately spherical.

Emitting parameter files

For likelihoods that derive from posteriorSampleLikelihoodBaseParameters, a state vector can be written into a Galacticus parameter file by reusing the leaf’s <baseParametersFileName>:

res = run.maximum_posterior()
run.write_parameter_file(res.state, "max_post.xml")

For independentLikelihoods configs, each leaf has its own base file and <parameterMap>; one file is written per leaf:

run.write_parameter_files(res.state, "out_dir")

Chain values are stored in physical (model) space — Galacticus applies the inverse of operatorUnaryMapper before writing each row — so no mapper inversion is performed at emission time.

Galacticus’s parameter selectors are supported in active-parameter <name> paths:

  • a/b and a::b — element navigation (both separators are accepted).

  • a[2] — 1-based integer instance selector.

  • a[@value='x'] — element-with-matching-value-attribute selector.

Corner plots

corner_plot() is a thin wrapper around corner.corner() that defaults to plotting every active parameter with LaTeX labels derived from the config.

fig = run.corner_plot(post_burn=step)
fig = run.corner_plot(parameters=["alpha", "beta"], post_burn=step)

The optional mcmc extra (scipy, corner, matplotlib) is required for both outlier_chains() and corner_plot():

pip install 'dendros[mcmc]'

End-to-end example

import numpy as np
from dendros import open_mcmc

with open_mcmc("mcmcConfig.xml") as run:
    outliers = run.outlier_chains()
    step = run.convergence_step(threshold=1.1, drop_chains=outliers)
    if step is None:
        raise RuntimeError("MCMC did not converge on the default grid.")

    ess = run.effective_sample_size(post_burn=step)
    print(f"ESS per parameter: {dict(zip(run.config.parameter_names, ess))}")

    fit = run.multivariate_normal_fit(
        post_burn=step, drop_chains=outliers,
    )
    fit.write_reparameterization_config("reparam.xml")

    map_ = run.maximum_posterior(drop_chains=outliers)
    run.write_parameter_files(map_.state, "max_posterior")

    samples = run.posterior_samples(
        n=200, post_burn=step, drop_chains=outliers,
        rng=np.random.default_rng(0),
    )
    for i, state in enumerate(samples.state):
        run.write_parameter_files(state, f"samples/{i:04d}")

    fig = run.corner_plot(post_burn=step, drop_chains=outliers)
    fig.savefig("corner.png")