API Reference

dendros.open_outputs(path, output_root='Outputs')[source]

Open a Galacticus output collection.

Parameters:
  • path (str | Path | List[str | Path] | Mapping[str, str | Path | List[str | Path]]) –

    One of:

    • A single filename – e.g. "galacticus.hdf5". If sibling MPI-rank files (*_MPI:????) exist they are included automatically.

    • A glob string – e.g. "run*/galacticus*.hdf5".

    • An explicit list of filenames.

    • A dict {label: path-or-paths} to open several models for side-by-side comparison. Equivalent to open_models(path); returns a ModelCollection.

  • output_root (str) – Top-level HDF5 group containing the Output* groups. Defaults to "Outputs". Pass "Lightcone" for lightcone runs or any other custom group name as needed.

Returns:

  • Collection – For the single-, glob-, or list-of-files forms (one model, possibly MPI-split).

  • ModelCollection – For the dict form (one entry per model, suitable for multi-model comparison plots).

Return type:

Collection | ModelCollection

Examples

Open a single file:

c = open_outputs("galacticus.hdf5")

Auto-detect MPI-split files (given any one rank’s file):

c = open_outputs("galacticus_MPI:0000.hdf5")

Open via glob:

c = open_outputs("run001/galacticus*.hdf5")

Open an explicit list:

c = open_outputs(["file_a.hdf5", "file_b.hdf5"])

Open several models for comparison plots (see also open_models()):

m = open_outputs({"Fiducial": "fid.hdf5", "Variant": "var.hdf5"})
figs = m.plot_analyses()

Lightcone mode:

c = open_outputs("lightcone.hdf5", output_root="Lightcone")

dendros.open_models(models, output_root='Outputs')[source]

Open several Galacticus runs as a labelled ModelCollection.

Each entry is opened with open_outputs(), so a single filename auto-detects its *_MPI:???? peers and a list-of-files entry is accepted as-is. The returned ModelCollection can be passed directly to plot_analyses() to overlay analyses from every model on a single figure.

Parameters:
  • models (Mapping[str, str | Path | List[str | Path]] | Sequence[str | Path | List[str | Path]]) – Either a dict {label: path-or-paths} — keys become legend labels — or a list of path-or-paths, in which case default labels are derived from each model’s primary file stem (with any :MPIxxxx suffix stripped).

  • output_root (str) – Forwarded to open_outputs().

Return type:

ModelCollection

Raises:

ValueError – If a list form produces duplicate default labels. Pass a dict with explicit labels to disambiguate.

Examples

Compare two models with explicit labels:

with open_models({"Fiducial": "fid.hdf5", "Variant": "var.hdf5"}) as m:
    figs = dendros.plot_analyses(m)

Or with labels derived from filenames:

models = open_models(["fid.hdf5", "var.hdf5"])

class dendros.Collection(files, output_root='Outputs')[source]

Bases: object

A collection of one or more Galacticus HDF5 output files.

Prefer constructing instances through open_outputs() rather than calling this constructor directly.

Parameters:
  • files (List[str]) – Paths to the HDF5 files to open (read-only).

  • output_root (str) – Name of the top-level HDF5 group that contains the Output* subgroups. Defaults to "Outputs".

Examples

>>> from dendros import open_outputs
>>> with open_outputs("galacticus.hdf5") as c:
...     c.validate_completion()
...     print(c.list_outputs())
...     data = c.read("Output1", ["nodeData/haloMass"])

property files: List[str]

Paths of the files in this collection (in order).

property output_root: str

Top-level HDF5 group containing the Output* groups.

keys()[source]

Return the top-level group keys of the primary file.

Return type:

List[str]

validate_completion(mode='error')[source]

Check that all files report a successful completion status.

Galacticus writes a statusCompletion attribute to the root of the HDF5 file when it finishes. This method verifies that the attribute equals "complete" for every file in the collection.

Parameters:

mode (str) –

What to do when an incomplete file is found:

Raises:
  • ValueError – If mode is not one of the accepted values.

  • RuntimeError – If mode is "error" and at least one file is incomplete.

Return type:

None

property outputs: OutputIndex

An OutputIndex for this collection.

The index is scanned lazily on first access and then cached.

list_outputs(format='astropy')[source]

Return a table of available outputs.

Scans all Output* groups inside /{output_root}/ and extracts outputTime and outputExpansionFactor attributes. Redshift is computed as z = 1/a − 1.

Parameters:

format (str) – "astropy" (default) returns an astropy.table.Table; "pandas" returns a pandas.DataFrame; "tabulate" returns a str formatted using the tabulate library.

Return type:

astropy.table.Table, pandas.DataFrame, or tabulate-formatted string
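As a small worked illustration of the redshift relation z = 1/a − 1 used by list_outputs(), the sketch below converts expansion factors to redshifts in plain Python. The helper name redshift_from_expansion_factor is hypothetical and not part of dendros; returning None for a missing expansion factor is an assumption modelled on the OutputMeta fields.

```python
from typing import Optional


def redshift_from_expansion_factor(a: Optional[float]) -> Optional[float]:
    """Return z = 1/a - 1, or None when the expansion factor is missing."""
    if a is None:
        return None
    return 1.0 / a - 1.0


# a = 1 is the present day (z = 0); a = 0.5 corresponds to z = 1.
redshifts = [redshift_from_expansion_factor(a) for a in (1.0, 0.5, 0.25, None)]
```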

list_properties(output, format='astropy')[source]

Return a table of datasets available in the nodeData group.

Parameters:
  • output (str | int) – Output name (e.g. "Output1") or 1-based integer index.

  • format (str) – "astropy" (default), "pandas", or "tabulate".

Return type:

astropy.table.Table, pandas.DataFrame, or tabulate-formatted string

read(output, datasets, where=None)[source]

Read one or more datasets from an output group.

For multi-file collections, arrays from all files are concatenated along axis 0 before any selection is applied.

Parameters:
  • output (str | int) – Output name (e.g. "Output1") or 1-based integer index.

  • datasets (List[str] | Dict[str, str]) – Either a list of relative dataset paths under the output group (e.g. ["nodeData/haloMass"]), in which case the same strings are used as dict keys in the return value; or a dict mapping user-chosen labels to relative paths.

  • where – None reads all rows. A boolean mask array of length N_total or an integer index array selects a subset.

Returns:

Mapping from dataset name / label to numpy.ndarray.

Return type:

dict

Notes

The unitsInSI attribute is preserved in the raw array values but not yet applied. Future versions will optionally return astropy.units.Quantity objects.
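The order of operations described for read() (concatenate across files first, then select) can be sketched in plain Python. This is only an illustration under stated assumptions: select_rows is a hypothetical helper, lists stand in for numpy arrays, and the way a boolean mask is told apart from an index list is a simplification, not the library's actual logic.

```python
from typing import List, Optional, Sequence, Union

Selector = Optional[Sequence[Union[bool, int]]]


def select_rows(per_file_arrays: List[List[float]], where: Selector = None) -> List[float]:
    """Concatenate per-file rows, then apply an optional selection."""
    # Step 1: concatenate along the first axis, in file order.
    rows = [value for array in per_file_arrays for value in array]
    if where is None:
        return rows
    # Step 2a: a boolean mask must cover every concatenated row (N_total).
    if len(where) and all(isinstance(flag, bool) for flag in where):
        if len(where) != len(rows):
            raise ValueError("boolean mask length must equal N_total")
        return [value for value, keep in zip(rows, where) if keep]
    # Step 2b: otherwise treat the selector as integer row indices.
    return [rows[i] for i in where]


file_a = [1.0, 2.0]
file_b = [3.0, 4.0, 5.0]
everything = select_rows([file_a, file_b])
masked = select_rows([file_a, file_b], [True, False, True, False, True])
indexed = select_rows([file_a, file_b], [4, 0])
```

Because selection happens after concatenation, mask positions and indices refer to rows of the combined array, not to rows of any single file.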

trace_history(ids, properties, outputs=None, *, id_dataset='nodeData/nodeUniqueIDBranchTip', on_duplicate_file_match='error', int_sentinel=-1)[source]

Trace the history of specified galaxies across outputs.

Convenience wrapper around dendros.trace_galaxy_history(). See that function for full parameter and return-value documentation.

Return type:

Dict[str, ndarray]

list_analyses(format='astropy')[source]

Return a table of function1D analyses in /analyses.

Convenience wrapper around dendros.list_analyses().

Parameters:

format (str)

plot_analyses(name=None, output_directory=None, *, show_target=True, figsize=(7.0, 5.0), dpi=120, file_format='pdf')[source]

Plot function1D analyses from /analyses.

Convenience wrapper around dendros.plot_analyses().

close()[source]

Close all open HDF5 file handles.

Return type:

None

class dendros.ModelCollection[source]

Bases: dict

A dict mapping label → Collection, one entry per model.

Returned by open_models() and accepted by plot_analyses() so analyses from several Galacticus runs can be overlaid on a single figure for comparison.

Acts as a regular dict but also supports the context-manager protocol — __exit__ closes every contained Collection.

close()[source]

Close every contained Collection.

Each close is attempted independently; if one fails the others are still closed and a UserWarning is emitted naming the failing model so the problem is visible.

Return type:

None

list_analyses(format='astropy')[source]

Return the union of function1D analyses across all models.

Each row carries an extra models column listing the labels of the models that contain the analysis (sorted, comma-separated). Per-row metadata (description, axis labels, log flags, target presence) is taken from the first model that supplies the analysis.

Parameters:

format (str)

plot_analyses(name=None, output_directory=None, *, show_target=True, figsize=(7.0, 5.0), dpi=120, file_format='pdf')[source]

Plot function1D analyses overlaid across every model.

Convenience wrapper around dendros.plot_analyses() applied to this ModelCollection. Labels come from this dict’s keys; the target overlay is drawn once.

class dendros.OutputIndex(collection)[source]

Bases: object

Index of all Output* groups found in a Collection.

Instances are obtained via outputs.

Parameters:

collection (Collection) – The parent Collection.

table(format='astropy')[source]

Return a table of output metadata.

Parameters:

format (str) – "astropy" (default) returns an astropy.table.Table; "pandas" returns a pandas.DataFrame.

Return type:

astropy.table.Table or pandas.DataFrame

class dendros.OutputMeta(name, path, index, time, scale_factor, redshift)[source]

Bases: object

Metadata for a single Galacticus output snapshot.

name

Name of the output group (e.g. "Output1").

Type:

str

path

Full HDF5 path to the output group.

Type:

str

index

1-based sequential index (order of discovery, sorted numerically).

Type:

int

time

Output time. Units are determined by the Galacticus configuration (typically Gyr), or None if the attribute is absent.

Type:

float | None

scale_factor

Expansion factor a, or None if absent.

Type:

float | None

redshift

Redshift z = 1/a − 1, computed from scale_factor, or None.

Type:

float | None

name: str
path: str
index: int
time: float | None
scale_factor: float | None
redshift: float | None
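The index attribute is described as a 1-based position after numeric sorting, which differs from plain string sorting once there are ten or more outputs. A minimal sketch of that ordering follows; index_outputs and its regular expression are hypothetical and only assume that group names have the form Output followed by an integer.

```python
import re
from typing import Dict, List


def index_outputs(group_names: List[str]) -> Dict[str, int]:
    """Map Output<N> group names to 1-based indices in numeric order."""
    pattern = re.compile(r"^Output(\d+)$")
    numbered = []
    for name in group_names:
        match = pattern.match(name)
        if match:  # ignore groups that are not named Output<N>
            numbered.append((int(match.group(1)), name))
    numbered.sort()  # numeric order: Output2 sorts before Output10
    return {name: i for i, (_, name) in enumerate(numbered, start=1)}


indices = index_outputs(["Output10", "Output2", "Output1"])
```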
dendros.sfh_collapse_metallicities(dataset)[source]

Collapse a formation history over metallicity.

Collapses (sums) star formation histories over the metallicity axis. If fixed times were used, a 2D numpy.ndarray is returned and any empty entries are filled with zeros. Otherwise, a list of 1D numpy.ndarray objects is returned.

Parameters:

dataset (DatasetProxy) – The dataset containing the star formation history data.

dendros.sfh_times(dataset)[source]

Return times associated with a star formation history.

Returns None if no fixed times are associated with this star formation history.

Parameters:

dataset (DatasetProxy) – The dataset containing the star formation history data.

dendros.trace_galaxy_history(collection, ids, properties, outputs=None, *, id_dataset='nodeData/nodeUniqueIDBranchTip', on_duplicate_file_match='error', int_sentinel=-1)[source]

Extract per-galaxy property histories across Galacticus outputs.

Galaxies are traced across Output* groups via an integer branch-tip identifier (usually nodeUniqueIDBranchTip) that is constant over time for a given object and unique within a single HDF5 file. For each requested property and each chosen output, this function locates every requested ID in every file of the collection, assembles the per-galaxy slice, and stacks the results along a trailing “output” axis.

Slots where a galaxy is absent at a given output are filled with numpy.nan (floating-point properties, and the time, redshift and expansion_factor metadata arrays), with int_sentinel (integer properties), or with False (boolean properties). The returned present mask is the canonical indicator of presence/absence and should be preferred to sentinel checks.

Parameters:
  • collection (Collection) – An open Collection.

  • ids – Array-like of integer nodeUniqueIDBranchTip values to trace. Coerced to numpy.ndarray of int64. Input order is preserved along the first axis of every returned array.

  • properties (Union[List[str], Dict[str, str]]) – Either a list of relative dataset paths under each Output* group (e.g. ["nodeData/basicMass"]), matching Collection.read(), or a dict mapping user-chosen labels to relative paths.

  • outputs (Optional[Sequence[Union[int, str]]]) – Optional iterable selecting a subset of outputs to include. Each element may be a 1-based integer (e.g. 3) or a group name (e.g. "Output3"). A range is accepted. Defaults to all outputs in the collection, in temporal order.

  • id_dataset (str) – Relative path of the tracing ID dataset under each Output* group. Defaults to "nodeData/nodeUniqueIDBranchTip".

  • on_duplicate_file_match (str) –

    What to do if the same ID is found in more than one file at the same output (IDs are only unique within a file in multi-file collections):

    • "error" (default) – raise ValueError.

    • "warn" – emit a UserWarning and keep the first file’s match.

    • "first" – silently keep the first file’s match.

  • int_sentinel (int) – Missing-slot value used for integer-typed properties. Defaults to -1.

Returns:

Contains:

  • one entry per property – numpy.ndarray of shape (n_galaxies,) + per_galaxy_tail + (n_outputs,). A 1-D source dataset yields a 2-D (n_galaxies, n_outputs) array; a 2-D dataset of shape (N, W) yields a 3-D (n_galaxies, W, n_outputs) array; and so on.

  • "time" – float array (n_galaxies, n_outputs) of output times, NaN where the galaxy is absent.

  • "redshift" – float array (n_galaxies, n_outputs) of redshifts, NaN where the galaxy is absent.

  • "expansion_factor" – float array (n_galaxies, n_outputs) of expansion factors, NaN where the galaxy is absent.

  • "present" – bool array (n_galaxies, n_outputs) that is True exactly where the galaxy was located.

  • "output_names" – 1-D object array of output group names in temporal order.

  • "ids" – 1-D int64 array of normalized input IDs.

Return type:

dict

Raises:
  • KeyError – If id_dataset is not present in any chosen output of any file (e.g. the Galacticus run did not emit nodeUniqueIDBranchTip), or if a requested property is missing from a chosen output.

  • ValueError – If properties contains a reserved label, if outputs is empty, if the tail shape of a property differs between outputs, or (by default) if an ID appears in more than one file at the same output.

  • NotImplementedError – If a property has a dtype other than integer, floating, or boolean.

Notes

A galaxy need not be present at every output (it may have formed later or merged earlier); ragged histories are expected. Requesting IDs that are never found anywhere produces a UserWarning rather than an error, since exploratory workflows often probe IDs of uncertain provenance.
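To make the layout of the returned arrays concrete, the sketch below builds a (n_galaxies, n_outputs) table and its matching present mask for one floating-point property. It is a simplified stand-in, not the library's implementation: trace_property is a hypothetical name, each output is modelled as a dict from ID to value, and nested lists replace numpy arrays.

```python
import math
from typing import Dict, List, Tuple


def trace_property(
    ids: List[int], outputs: List[Dict[int, float]]
) -> Tuple[List[List[float]], List[List[bool]]]:
    """Return (values, present), both shaped (n_galaxies, n_outputs)."""
    # Start with every slot marked absent: NaN value, present == False.
    values = [[math.nan] * len(outputs) for _ in ids]
    present = [[False] * len(outputs) for _ in ids]
    for j, output in enumerate(outputs):
        for i, galaxy_id in enumerate(ids):
            if galaxy_id in output:
                values[i][j] = output[galaxy_id]
                present[i][j] = True
    return values, present


# Galaxy 7 exists in the first two outputs, galaxy 9 in the last two.
outputs = [{7: 1.0}, {7: 2.0, 9: 0.5}, {9: 0.8}]
values, present = trace_property([9, 7], outputs)
```

Rows follow the order of the requested IDs (9, then 7), and the present mask distinguishes a truly absent galaxy from a property whose stored value happens to be NaN.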

dendros.list_analyses(collection, format='astropy')[source]

Return a table of function1D analyses available in the collection.

Parameters:
  • collection (Collection) – A Collection. Only the primary file is consulted — for MPI runs, the /analyses data has been reduced over all ranks and is identical in every file.

  • format (str) – "astropy" (default), "pandas", or "tabulate".

Return type:

astropy.table.Table, pandas.DataFrame, or tabulate-formatted string

Raises:

KeyError – If the file has no top-level /analyses group.

dendros.plot_analyses(collection, name=None, output_directory=None, *, labels=None, show_target=True, figsize=(7.0, 5.0), dpi=120, file_format='pdf')[source]

Plot one, several, or all function1D analyses.

A single Collection produces one model curve per figure (legacy behaviour). A list, dict, or ModelCollection of Collections overlays one curve per model on each figure, plotting the target/observational overlay once (since it is shared across models). The union of analyses discovered across models is plotted — figures whose analysis is absent from a given model simply do not include its curve.

Parameters:
  • collection (_MultiInput) – A Collection; a sequence of Collections; or a mapping {label: Collection} (e.g. one returned by open_models()).

  • name (Union[None, str, List[str]]) – None (default) plots every function1D analysis discovered across all models. A single name (str) or list of names plots only those.

  • output_directory (Union[None, str, 'Path']) – If given, each figure is also saved as <output_directory>/<safe_name>.<file_format>. The directory is created if it does not exist.

  • labels (Optional[Sequence[str]]) – Optional sequence of legend labels, one per Collection, used only when collection is a list/tuple of Collections. When omitted, each model is labelled by its primary file’s stem (with any :MPIxxxx suffix stripped). Cannot be combined with a dict input.

  • show_target (bool) – If True (default), overlay target/observational data when present. For multi-model plots the target is plotted only once, from the first model that has it.

  • figsize (Tuple[float, float]) – Forwarded to matplotlib.

  • dpi (int) – Forwarded to matplotlib.

  • file_format (str) – Forwarded to matplotlib.

Returns:

Mapping from analysis name to matplotlib.figure.Figure.

Return type:

dict

Raises:
  • KeyError – If a model has no /analyses group, or if a requested name is missing from every model.

  • ImportError – If matplotlib is not installed; install with pip install 'dendros[plot]'.

MCMC

Entry point

dendros.open_mcmc(config_path)[source]

Open an MCMC run by parsing its config XML.

Parameters:

config_path (str | Path) – Path to the Galacticus MCMC <parameters> XML file.

Return type:

MCMCRun

Examples

>>> from dendros import open_mcmc
>>> with open_mcmc("mcmcConfig.xml") as run:
...     print(run.parameters)
...     chains = run.chains

class dendros.MCMCRun(config)[source]

Bases: object

An MCMC run, parsed from its config file and lazily backed by chain data.

Construct via open_mcmc(). The chain files are not read until chains is first accessed; subsequent accesses return the cached ChainSet.

Parameters:

config (MCMCConfig) – Parsed MCMCConfig.

property config: MCMCConfig

The parsed MCMCConfig.

property parameters: Tuple[ModelParameter, ...]

Active model parameters, in chain-file column order.

property chains: ChainSet

Lazily-loaded ChainSet for this run.

gelman_rubin(*, drop_chains=(), step_grid=None, n_grid=200, min_steps=10, alpha_interval=0.15)[source]

Convenience wrapper around dendros.gelman_rubin().

Return type:

RhatResult

convergence_step(*, threshold=1.1, sustained_for=1, drop_chains=(), step_grid=None, n_grid=200, min_steps=10)[source]

First simulation-step count at which max-Rhat is sustained below threshold.

Computes a Gelman-Rubin trace via gelman_rubin() and returns the RhatResult.steps value at which convergence is first declared. Returns None if convergence is never reached on the chosen grid.

Return type:

int | None

geweke(*, first=0.1, last=0.5)[source]

Convenience wrapper around dendros.geweke().

Return type:

ndarray

outlier_chains(*, alpha=0.05, max_outliers=10, parameters=None)[source]

Convenience wrapper around dendros.outlier_chains().

Return type:

Tuple[int, …]

acceptance_rate(*, post_burn=None)[source]

Convenience wrapper around dendros.acceptance_rate().

Parameters:

post_burn (int | None)

Return type:

ndarray

acceptance_rate_trace(*, window=30, post_burn=0)[source]

Convenience wrapper around dendros.acceptance_rate_trace().

Parameters:
  • window (int)

  • post_burn (int)

autocorrelation_time(*, post_burn=None, c=5.0)[source]

Convenience wrapper around dendros.autocorrelation_time().

Return type:

ndarray

effective_sample_size(*, post_burn=None, c=5.0)[source]

Convenience wrapper around dendros.effective_sample_size().

Return type:

ndarray

maximum_posterior(*, drop_chains=())[source]

Convenience wrapper around dendros.maximum_posterior().

Parameters:

drop_chains (Sequence[int])

Return type:

MaxResult

maximum_likelihood(*, drop_chains=())[source]

Convenience wrapper around dendros.maximum_likelihood().

Parameters:

drop_chains (Sequence[int])

Return type:

MaxResult

posterior_samples(n, *, post_burn=None, drop_chains=(), rng=None, replace=None)[source]

Convenience wrapper around dendros.posterior_samples().

Return type:

PosteriorSamples

projection_pursuit(*, post_burn=None, drop_chains=())[source]

Convenience wrapper around dendros.projection_pursuit().

Return type:

ProjectionPursuitResult

multivariate_normal_fit(*, post_burn=None, drop_chains=())[source]

Convenience wrapper around dendros.multivariate_normal_fit().

Return type:

MVNFit

write_parameter_file(state, out_path, *, likelihood_index=0)[source]

Emit a single Galacticus parameter file for one likelihood leaf.

Reads leaves[likelihood_index].base_parameters_file, applies the leaf’s parameter_map (or the full state when no map is set), and writes the result to out_path.

Parameters:
  • state – (n_params,) state vector in physical (model) space. Galacticus stores chain rows in physical space, so a row from maximum_posterior() / posterior_samples() can be passed directly.

  • out_path – Output path. Parent directory is created if missing.

  • likelihood_index (int) – Which leaf of the likelihood tree to use. Defaults to 0.

Returns:

Resolved output path.

Return type:

pathlib.Path

corner_plot(*, parameters=None, post_burn=None, drop_chains=(), labels=None, **corner_kwargs)[source]

Convenience wrapper around dendros.corner_plot().

write_parameter_files(state, out_dir, *, name_format=None)[source]

Emit one parameter file per likelihood leaf into out_dir.

For independentLikelihoods configs each leaf has its own baseParametersFileName and parameterMap; this writes one file per leaf, with each file’s filename derived from the base file’s stem.

Parameters:
  • state – (n_params,) state vector in physical space.

  • out_dir – Output directory (created if missing).

  • name_format (str | None) – Output-filename format string accepting leaf_index and stem. Defaults to "{stem}.xml" for a single leaf and "{leaf_index:02d}_{stem}.xml" for multiple.

Returns:

One entry per leaf, in document order.

Return type:

list of (leaf_index, pathlib.Path)
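The default name_format rules can be illustrated with ordinary str.format calls. The sketch below is an assumption-laden stand-in: leaf_output_names is a hypothetical helper, the base file paths are made up, and leaf indices are assumed to start at 0 (consistent with likelihood_index defaulting to 0).

```python
from pathlib import Path
from typing import List, Optional, Tuple


def leaf_output_names(
    base_files: List[str], name_format: Optional[str] = None
) -> List[Tuple[int, str]]:
    """Return (leaf_index, filename) pairs, one per leaf, in order."""
    if name_format is None:
        # One leaf keeps the bare stem; several leaves get an index prefix.
        if len(base_files) == 1:
            name_format = "{stem}.xml"
        else:
            name_format = "{leaf_index:02d}_{stem}.xml"
    return [
        (leaf_index, name_format.format(leaf_index=leaf_index, stem=Path(base).stem))
        for leaf_index, base in enumerate(base_files)
    ]


single = leaf_output_names(["base/smf.xml"])
multiple = leaf_output_names(["base/smf.xml", "base/hmf.xml"])
custom = leaf_output_names(["base/smf.xml"], name_format="best_{stem}_{leaf_index}.xml")
```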

Configuration

dendros.parse_mcmc_config(path)[source]

Parse a Galacticus MCMC <parameters> XML file.

Parameters:

path (str | Path) – Path to the MCMC configuration XML.

Return type:

MCMCConfig

Raises:
  • FileNotFoundError – If path does not exist.

  • ValueError – If the file’s root element is not <parameters>, or if the required posteriorSampleSimulation / logFileRoot elements are missing.

class dendros.MCMCConfig(config_path, log_file_root, simulation_kind, parameters, likelihood)[source]

Parsed Galacticus MCMC configuration.

config_path

Absolute path to the parsed XML file.

Type:

pathlib.Path

log_file_root

Resolved chain log-file root (relative paths resolved against config_path’s directory). Per-rank chain files are at f"{log_file_root}_{rank:04d}.log".

Type:

pathlib.Path

simulation_kind

Value attribute of posteriorSampleSimulation, e.g. "differentialEvolution" or "particleSwarm". Determines whether chain rows carry trailing per-particle velocity columns.

Type:

str

parameters

Tuple of active ModelParameter entries in document order. This is the canonical ordering used by chain-file columns.

Type:

Tuple[dendros._mcmc._config.ModelParameter, …]

likelihood

Root of the posteriorSampleLikelihood tree, or None if the config lacks a likelihood block.

Type:

dendros._mcmc._config.Likelihood | None

property parameter_names: Tuple[str, ...]

Tuple of parameter name strings, in column order.

state_indices_for(leaf)[source]

Indices of the global state vector applicable to leaf.

For a leaf inside an independentLikelihoods subtree, returns the positions in parameters named in the leaf’s parameter_map. For other leaves, returns (0, 1, ..., n_params - 1) (identity).

Parameters:

leaf (Likelihood) – A Likelihood from Likelihood.leaves().

Raises:

KeyError – If a name in parameter_map isn’t among the active parameters.

Return type:

Tuple[int, …]
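The mapping performed by state_indices_for() can be sketched as a name-to-position lookup. This is a minimal illustration, not the library's code: state_indices is a hypothetical free function that takes the parameter names and an optional parameter map directly instead of a Likelihood object.

```python
from typing import Optional, Sequence, Tuple


def state_indices(
    parameter_names: Sequence[str], parameter_map: Optional[Sequence[str]]
) -> Tuple[int, ...]:
    """Positions in the global state vector that apply to one leaf."""
    if parameter_map is None:
        # No map: identity, every active parameter applies.
        return tuple(range(len(parameter_names)))
    positions = {name: i for i, name in enumerate(parameter_names)}
    missing = [name for name in parameter_map if name not in positions]
    if missing:
        raise KeyError(f"not among the active parameters: {missing}")
    return tuple(positions[name] for name in parameter_map)


names = ("a", "b", "c")
identity = state_indices(names, None)
subset = state_indices(names, ("c", "a"))

# Picking the leaf's slice out of a full state vector.
state = [0.1, 0.2, 0.3]
leaf_state = [state[i] for i in subset]
```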

class dendros.ModelParameter(name, label=None, prior=None, mapper='identity', perturber=None)[source]

A single <modelParameter value="active"> entry from the config.

name

Galacticus parameter path, e.g. "haloMassFunctionParameters/a".

Type:

str

label

Optional LaTeX label for plotting. None when the config omits the <label> sub-element. Use display_label to obtain a plottable string regardless.

Type:

str | None

prior

Parsed distributionFunction1DPrior block, if present.

Type:

dendros._mcmc._config.PriorSpec | None

mapper

Value of operatorUnaryMapper; defaults to "identity".

Type:

str

perturber

Parsed distributionFunction1DPerturber block, if present.

Type:

dendros._mcmc._config.PerturberSpec | None

property display_label: str

A plottable label: label if set, else the trailing component of name.

Type:

str

class dendros.Likelihood(kind, base_parameters_file=None, parameter_map=None, children=<factory>)[source]

A node in the posteriorSampleLikelihood tree.

kind

Value attribute of the posteriorSampleLikelihood element.

Type:

str

base_parameters_file

Resolved path to the baseParametersFileName element’s value when present. None for non-leaf nodes (e.g. independentLikelihoods without a base file of its own).

Type:

pathlib.Path | None

parameter_map

For children of posteriorSampleLikelihoodIndependentLikelihoods, the parsed <parameterMap value="space separated names"/> for this child. Each entry is a parameter name from the active model parameters. None outside of an independentLikelihoods context, in which case identity mapping (all active parameters) is implied.

Type:

Tuple[str, …] | None

children

Tuple of child Likelihood instances. Empty for leaves.

Type:

Tuple[dendros._mcmc._config.Likelihood, …]

leaves()[source]

Flatten the tree to its leaf likelihoods (in document order).

Return type:

Tuple[Likelihood, …]
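Flattening a likelihood tree to its leaves in document order amounts to a depth-first traversal. The sketch below shows the idea on a stripped-down node type; Node, its fields, and the kind strings are hypothetical, and treating a childless root as its own single leaf is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    kind: str
    children: Tuple["Node", ...] = ()

    def leaves(self) -> Tuple["Node", ...]:
        """Depth-first, left-to-right list of childless nodes."""
        if not self.children:
            return (self,)
        collected = []
        for child in self.children:
            collected.extend(child.leaves())
        return tuple(collected)


tree = Node(
    "independentLikelihoods",
    (
        Node("leafA"),
        Node("independentLikelihoods", (Node("leafB"), Node("leafC"))),
    ),
)
leaf_kinds = [leaf.kind for leaf in tree.leaves()]
```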

class dendros.PriorSpec(kind, params)[source]

Prior on a single model parameter.

kind

Value of the distributionFunction1DPrior element, e.g. "uniform" or "normal".

Type:

str

params

Mapping of sub-element name to its parsed value (as a float when possible, else the raw string). For uniform priors the keys are "limitLower" and "limitUpper"; for (truncated) normal they are "mean", "variance", and optionally "limitLower" / "limitUpper".

Type:

dict
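The "float when possible, else the raw string" rule for params can be sketched as a try/except conversion. The helper parse_values and the "note" key are hypothetical and used only for illustration; the real parser may differ in how it reads the XML.

```python
from typing import Dict, Union


def parse_values(raw: Dict[str, str]) -> Dict[str, Union[float, str]]:
    """Convert each value to float when it parses, else keep the string."""
    parsed: Dict[str, Union[float, str]] = {}
    for key, text in raw.items():
        try:
            parsed[key] = float(text)
        except ValueError:
            parsed[key] = text
    return parsed


params = parse_values({"limitLower": "0.1", "limitUpper": "1.0e3", "note": "see text"})
```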

class dendros.PerturberSpec(kind, params)[source]

Perturber on a single model parameter.

kind

Value of the distributionFunction1DPerturber element.

Type:

str

params

Mapping of sub-element name to its parsed value.

Type:

dict

Chains

dendros.read_chains(config)[source]

Discover and read all per-rank chain files for config.

Parameters:

config (MCMCConfig) – Parsed MCMCConfig.

Return type:

ChainSet

Raises:

FileNotFoundError – If no chain files are found at config.log_file_root.
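The per-rank naming convention documented for MCMCConfig.log_file_root, f"{log_file_root}_{rank:04d}.log", can be exercised without any chain files on disk. In the sketch below, chain_log_paths and chain_index_from_path are hypothetical helpers and the root "chains/run" is a made-up example; the actual discovery logic in read_chains() is not shown in this reference.

```python
import re
from pathlib import Path
from typing import List


def chain_log_paths(log_file_root: Path, n_ranks: int) -> List[Path]:
    """Expected chain log paths for ranks 0 .. n_ranks - 1."""
    return [Path(f"{log_file_root}_{rank:04d}.log") for rank in range(n_ranks)]


def chain_index_from_path(path: Path) -> int:
    """Recover the MPI rank from a _NNNN.log filename suffix."""
    match = re.search(r"_(\d{4})\.log$", path.name)
    if match is None:
        raise ValueError(f"not a chain log filename: {path.name}")
    return int(match.group(1))


paths = chain_log_paths(Path("chains/run"), 3)
names = [p.name for p in paths]
ranks = [chain_index_from_path(p) for p in paths]
```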

class dendros.Chain(chain_index, path, step, eval_time, converged, log_posterior, log_likelihood, state, velocity=None)[source]

One MPI rank’s MCMC chain.

chain_index

MPI rank, parsed from the _NNNN.log filename suffix.

Type:

int

path

Source log-file path.

Type:

pathlib.Path

step

Integer simulation-step index, one per row.

Type:

numpy.ndarray

eval_time

Wall-clock evaluation time per step, in seconds.

Type:

numpy.ndarray

converged

Boolean flag indicating whether the simulation had declared convergence at this step.

Type:

numpy.ndarray

log_posterior

Log posterior probability per step.

Type:

numpy.ndarray

log_likelihood

Log likelihood per step.

Type:

numpy.ndarray

state

(n_steps, n_params) array of parameter values, in MCMCConfig.parameters order. Values are in physical (model) space — Galacticus applies the inverse of operatorUnaryMapper before writing.

Type:

numpy.ndarray

velocity

(n_steps, n_params) array of per-parameter particle velocities for particleSwarm simulations; None for differential-evolution and other state-only simulations.

Type:

numpy.ndarray | None

class dendros.ChainSet(config, chains)[source]

An ordered collection of Chain objects from one MCMC run.

Iteration yields chains in MPI-rank order.

post_burn(burn)[source]

Return a new ChainSet with the first burn steps dropped from each chain.

Parameters:

burn (int) – Number of leading steps to discard. Must be non-negative.

Return type:

ChainSet

concatenated(*, burn=0, drop_chains=())[source]

Return a single (n_total_post_burn, n_params) state array.

Concatenates the post-burn segments of every chain not listed in drop_chains, preserving chain order.

Parameters:
  • burn (int) – Number of leading steps to discard from each chain.

  • drop_chains (Sequence[int]) – Iterable of chain_index values to exclude entirely.

Return type:

ndarray
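The effect of burn and drop_chains in concatenated() can be sketched with nested lists in place of numpy arrays. The function concatenate_states is hypothetical, and chains are modelled as a dict from chain_index to a list of state rows, which is an assumption made for brevity.

```python
from typing import Dict, List, Sequence


def concatenate_states(
    chains: Dict[int, List[List[float]]],
    burn: int = 0,
    drop_chains: Sequence[int] = (),
) -> List[List[float]]:
    """Pool post-burn rows of every kept chain, in chain-index order."""
    if burn < 0:
        raise ValueError("burn must be non-negative")
    dropped = set(drop_chains)
    rows: List[List[float]] = []
    for chain_index in sorted(chains):  # preserve MPI-rank order
        if chain_index in dropped:
            continue
        rows.extend(chains[chain_index][burn:])  # discard leading steps
    return rows


chains = {
    0: [[0.0], [0.1], [0.2]],
    1: [[1.0], [1.1], [1.2]],
    2: [[2.0], [2.1], [2.2]],
}
pooled = concatenate_states(chains, burn=1, drop_chains=[1])
```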

Convergence

dendros.gelman_rubin(chains, *, drop_chains=(), step_grid=None, n_grid=200, min_steps=10, alpha_interval=0.15)[source]

Brooks-Gelman corrected Rhat as a function of simulation step.

For each chosen truncation point s the first s rows of every surviving chain are used to compute the standard between-chain (B) and within-chain (W) variances and the Brooks-Gelman corrected potential-scale reduction factor \(\hat{R}_c\). The non-parametric interval-length ratio \(R_{\rm interval}\) (Brooks & Gelman 1998 section 1.3) is also computed at the same evaluation points.

Parameters:
  • chains (ChainSet) – ChainSet to evaluate. Must contain at least two non-dropped chains and at least min_steps rows per chain.

  • drop_chains (Sequence[int]) – Iterable of chain_index values to exclude before computing. Use this with the indices returned by outlier_chains().

  • step_grid (Sequence[int] | None) – Optional explicit 1-D iterable of truncation step counts (1-based). When given, n_grid and min_steps are ignored.

  • n_grid (int) – Number of evenly-spaced evaluation points to use when step_grid is None. Capped at the shortest surviving chain length minus min_steps + 1.

  • min_steps (int) – Smallest truncation step count to evaluate. Must be >= 2.

  • alpha_interval (float) – Two-sided significance level for R_interval (default 0.15, i.e. 85 % credible intervals — matches the Galacticus Perl reference).

Return type:

RhatResult

Raises:

ValueError – If fewer than two chains survive drop_chains or min_steps is too small.
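For orientation, the between-chain (B) and within-chain (W) variances mentioned above combine as follows in the plain, uncorrected Gelman-Rubin statistic for a single parameter. This sketch deliberately omits the Brooks-Gelman degrees-of-freedom correction and the interval-length ratio, so it is not what gelman_rubin() returns; basic_rhat is a hypothetical name, and the square-root form of the statistic is an assumption about convention.

```python
import math
from statistics import mean, variance
from typing import List


def basic_rhat(chains: List[List[float]], s: int) -> float:
    """Uncorrected R-hat from the first s rows of each chain."""
    m = len(chains)
    if m < 2 or s < 2:
        raise ValueError("need at least two chains and s >= 2")
    if any(len(chain) < s for chain in chains):
        raise ValueError("every chain needs at least s rows")
    truncated = [chain[:s] for chain in chains]
    # W: mean of the per-chain sample variances.
    w = mean(variance(chain) for chain in truncated)
    # B / s: sample variance of the per-chain means.
    b_over_n = variance([mean(chain) for chain in truncated])
    # Pooled variance estimate, then the ratio to W.
    v_hat = (s - 1) / s * w + (1 + 1 / m) * b_over_n
    return math.sqrt(v_hat / w)


mixed = [[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]]
separated = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]]
rhat_mixed = basic_rhat(mixed, s=4)
rhat_separated = basic_rhat(separated, s=4)
```

Overlapping chains give a value near 1, while chains sitting in different regions give a value well above the usual 1.1 threshold. A chain with zero within-chain variance would cause a division by zero in this sketch.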

class dendros.RhatResult(steps, Rhat_c, R_interval, parameter_names, alpha_interval, chains_used)[source]

Result of gelman_rubin().

steps

(n_eval,) 1-D array of truncation step counts at which Rhat was computed (i.e. each entry s means “use the first s rows of every chain”). These are 1-based step counts so the smallest value is the chosen min_steps.

Type:

numpy.ndarray

Rhat_c

(n_eval, n_params) array of Brooks-Gelman corrected potential-scale reduction factors.

Type:

numpy.ndarray

R_interval

(n_eval, n_params) array of non-parametric interval-length ratios (mixed-chain credible interval / mean per-chain credible interval) at the chosen alpha_interval.

Type:

numpy.ndarray

parameter_names

Names of the parameters along axis=1.

Type:

Tuple[str, …]

alpha_interval

Significance level used to compute R_interval.

Type:

float

chains_used

chain_index values of the chains that contributed (after drop_chains was applied).

Type:

Tuple[int, …]

Rhat_c_max:

Per-step max-over-parameters of Rhat_c, useful as the input to convergence_step().

Rhat_c_max()[source]

Return (n_eval,) max-over-parameters of Rhat_c.

Return type:

ndarray

dendros.convergence_step(rhat_max, *, threshold=1.1, sustained_for=1)[source]

Index into the Rhat grid at which convergence is first declared.

Searches for the smallest index i such that every entry of rhat_max[i : i + sustained_for] is at or below threshold.

Parameters:
  • rhat_max (ndarray) – 1-D array of (max-over-parameters) Rhat values, e.g. RhatResult.Rhat_c_max().

  • threshold (float) – Convergence threshold. Defaults to 1.1.

  • sustained_for (int) – Number of consecutive grid points that must all be at or below the threshold before convergence is declared. Defaults to 1 (the first grid point at or below the threshold).

Returns:

Grid index at which convergence is first sustained, or None if the threshold is never met.

Return type:

int or None

Notes

Use RhatResult.steps to translate the returned grid index to a simulation-step count.
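The search rule stated for convergence_step() (smallest i such that every entry of rhat_max[i : i + sustained_for] is at or below threshold) can be sketched directly. The name first_sustained_index is hypothetical, and requiring a full window of sustained_for points near the end of the grid is an assumption; the reference text does not say how a short trailing window is treated.

```python
from typing import Optional, Sequence


def first_sustained_index(
    rhat_max: Sequence[float], threshold: float = 1.1, sustained_for: int = 1
) -> Optional[int]:
    """Smallest i whose window of sustained_for values is <= threshold."""
    if sustained_for < 1:
        raise ValueError("sustained_for must be >= 1")
    # Only consider positions where a full window fits (assumption).
    for i in range(len(rhat_max) - sustained_for + 1):
        window = rhat_max[i:i + sustained_for]
        if all(value <= threshold for value in window):
            return i
    return None


steps = [10, 20, 30, 40, 50]
rhat_max = [1.50, 1.08, 1.20, 1.05, 1.02]

first_crossing = first_sustained_index(rhat_max)
sustained = first_sustained_index(rhat_max, sustained_for=2)
never = first_sustained_index(rhat_max, sustained_for=3)

# Translate the grid index into a simulation-step count.
converged_at_step = steps[sustained] if sustained is not None else None
```

With sustained_for=1 the brief dip at the second grid point counts as convergence; requiring two consecutive points skips it and reports the later, stable crossing.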

dendros.geweke(chains, *, first=0.1, last=0.5)[source]

Per-chain Geweke z-scores comparing the means of two chain segments.

For each chain and each parameter, returns

\[z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2_1/n_1 + s^2_2/n_2}}\]

where \(\bar{x}_k\), \(s^2_k\), and \(n_k\) are the sample mean, sample variance, and number of rows of segment k. Segment 1 covers the leading fraction first of the chain and segment 2 covers the trailing fraction last. A large |z| for any parameter suggests the chain has not yet reached a stationary distribution. This diagnostic is useful when the chains were started from an under-dispersed state (which makes Gelman-Rubin uninformative).

Parameters:
  • chains (ChainSet) – ChainSet.

  • first (float) – Fraction in (0, 1) of the chain, measured from the start, that forms segment 1. Defaults to 0.1 (Geweke’s original recommendation).

  • last (float) – Fraction in (0, 1) of the chain, measured from the end, that forms segment 2. Defaults to 0.5 (Geweke’s original recommendation).

Returns:

(n_chains, n_params) z-score array. A chain with fewer than 4 rows in either segment yields NaN.

Return type:

np.ndarray

Notes

The variance estimator used here is the simple sample variance, which treats each draw as independent. Autocorrelated chains will therefore produce artificially large |z|. Once a proper integrated-autocorrelation-time estimator lands (Phase 3), the variance can be inflated by the autocorrelation length (ACL) to recover the classical spectral-density-at-zero variant.
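
As a rough illustration of the z-score formula with the simple sample variance, the sketch below handles a single chain stored as an (n_steps, n_params) array. The function name geweke_z, the rounding of segment lengths with int(), and the use of the unbiased (ddof=1) variance are assumptions made for this example, not details taken from the library.

```python
import numpy as np

def geweke_z(x, first=0.1, last=0.5):
    """Hypothetical single-chain Geweke z-score; x has shape (n_steps, n_params)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    n1 = int(first * n)  # rows in the leading segment (rounding is an assumption)
    n2 = int(last * n)   # rows in the trailing segment
    if n1 < 4 or n2 < 4:
        # Too few rows in a segment to estimate a variance sensibly.
        return np.full(x.shape[1], np.nan)
    seg1, seg2 = x[:n1], x[n - n2:]
    var1 = seg1.var(axis=0, ddof=1)
    var2 = seg2.var(axis=0, ddof=1)
    return (seg1.mean(axis=0) - seg2.mean(axis=0)) / np.sqrt(var1 / n1 + var2 / n2)

# A steadily drifting one-parameter chain: early and late means clearly differ.
drifting = np.linspace(0.0, 10.0, 1000)[:, None]
z_drift = geweke_z(drifting)

# A ten-row chain leaves only one row in the leading segment, so the result is NaN.
z_short = geweke_z(np.zeros((10, 1)))
```

The drifting chain yields a strongly negative z because its early mean sits far below its late mean relative to the independent-draw standard error.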

dendros.outlier_chains(chains, *, alpha=0.05, max_outliers=10, parameters=None)[source]

Iterative two-sided Grubbs test on each chain’s final state.

Each chain contributes its last row (the most recent state) as a single multivariate point. The Grubbs test is applied iteratively over the active chains, dropping the chain whose maximum per-parameter deviation exceeds the critical value at each step, until none exceed it or max_outliers chains have been removed.

Parameters:
  • chains (ChainSet) – ChainSet. Must contain at least three chains.

  • alpha (float) – Two-sided significance level. Defaults to 0.05 to match the Galacticus Perl reference’s hard-coded value.

  • max_outliers (int) – Maximum number of chains to declare as outliers.

  • parameters (Iterable[str] | None) – Optional iterable of parameter names to restrict the test to a subset. Unknown names raise KeyError.

Returns:

chain_index values of the chains flagged as outliers, in the order they were removed.

Return type:

tuple of int

Mixing diagnostics

dendros.autocorrelation_function(chains, *, post_burn=0, max_lag=None)[source]

Per-chain, per-parameter normalized autocorrelation function.

Parameters:
  • chains (ChainSet) – ChainSet.

  • post_burn (int) – Number of leading rows to skip in each chain before computing the ACF.

  • max_lag (int | None) – Truncate the returned ACF to this lag (inclusive). None returns the full lag range out to the post-burn chain length.

Returns:

Array of shape (n_chains, max_lag + 1, n_params). Chains are truncated to a common length (the shortest post-burn chain) so this is rectangular.

Return type:

np.ndarray

dendros.autocorrelation_time(chains, *, post_burn=None, c=5.0)[source]

Integrated autocorrelation time per parameter, in steps.

Implements the standard Sokal automatic-windowing estimator over the chain-averaged autocovariance. For each parameter, the per-chain autocovariances are averaged before integrating, which is more stable than averaging per-chain τ_int estimates.

Parameters:
  • chains (ChainSet) – ChainSet. All chains are truncated to the shortest post-burn length.

  • post_burn (int | None) – Number of leading rows to skip from each chain. None triggers automatic detection via gelman_rubin() / convergence_step(); if convergence is not reached a UserWarning is emitted and 0 is used.

  • c (float) – Sokal window constant. Defaults to 5.0.

Returns:

(n_params,) array of integrated autocorrelation times in steps.

Return type:

np.ndarray
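
The following sketch shows one common way to write Sokal's automatic windowing over a chain-averaged autocovariance, for a single parameter. It is a minimal illustration under stated assumptions rather than the library's implementation: the helper names, the biased (divide-by-n) autocovariance, and the fallback to the largest lag when the window condition is never met are all choices made here.

```python
import numpy as np

def autocovariance(x):
    """Biased autocovariance of a 1-D series at lags 0 .. n-1."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    n = d.size
    # mode="full" returns lags -(n-1) .. n-1; keep the non-negative half.
    return np.correlate(d, d, mode="full")[n - 1:] / n

def sokal_tau(chains, c=5.0):
    """Integrated autocorrelation time for one parameter.

    chains: (n_chains, n_steps) array, already truncated to a common length.
    """
    # Average the per-chain autocovariances before integrating.
    acov = np.mean([autocovariance(row) for row in chains], axis=0)
    rho = acov / acov[0]
    tau = 2.0 * np.cumsum(rho) - 1.0           # tau[M] = 1 + 2 * sum_{t=1..M} rho[t]
    window_ok = np.arange(tau.size) >= c * tau  # Sokal: smallest M with M >= c * tau[M]
    m = int(np.argmax(window_ok)) if window_ok.any() else tau.size - 1
    return float(tau[m])

# Toy data: four AR(1) chains with coefficient phi, plus the white noise driving them.
rng = np.random.default_rng(42)
phi, n_chains, n_steps = 0.9, 4, 5000
noise = rng.standard_normal((n_chains, n_steps))
ar1 = np.zeros((n_chains, n_steps))
for t in range(1, n_steps):
    ar1[:, t] = phi * ar1[:, t - 1] + noise[:, t]

tau_ar1 = sokal_tau(ar1)      # theory for AR(1): (1 + phi) / (1 - phi) = 19
tau_white = sokal_tau(noise)  # uncorrelated draws: close to 1
```

Averaging the autocovariances first means the window is chosen once per parameter from a less noisy curve, which is the stability argument made above.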

dendros.effective_sample_size(chains, *, post_burn=None, c=5.0)[source]

Effective sample size per parameter.

Defined as N_total / τ_int, where N_total is the total number of post-burn samples summed across all chains and τ_int is the chain-averaged integrated autocorrelation time from autocorrelation_time().

Parameters:
Returns:

(n_params,) array of effective sample sizes.

Return type:

np.ndarray
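
A worked instance of the definition, using made-up chain lengths and autocorrelation times (none of these numbers come from a real run):

```python
import numpy as np

# Hypothetical post-burn row counts for three chains.
chain_lengths = np.array([5000, 5000, 4800])
# Hypothetical integrated autocorrelation times for two parameters, in steps.
tau_int = np.array([19.0, 42.5])

n_total = int(chain_lengths.sum())  # total post-burn samples across chains
ess = n_total / tau_int             # one effective sample size per parameter
```

The slower-mixing parameter (larger τ_int) ends up with proportionally fewer effective samples from the same pool of rows.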

dendros.acceptance_rate(chains, *, post_burn=None)[source]

Per-chain post-burn acceptance rate.

A step is “accepted” iff any parameter component differs from the previous step. Galacticus emits the same row when a proposal is rejected, so this is the canonical acceptance count.

Returns:

(n_chains,) array. NaN for any chain with fewer than two post-burn rows.

Return type:

np.ndarray
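
The acceptance rule above amounts to comparing each row with its predecessor. A minimal sketch for one chain, with a hypothetical function name and a hand-built state array:

```python
import numpy as np

def acceptance_rate_sketch(states):
    """Fraction of transitions in which any parameter changed; states is (n_steps, n_params)."""
    states = np.asarray(states, dtype=float)
    if states.shape[0] < 2:
        return float("nan")  # no transitions to count
    moved = np.any(states[1:] != states[:-1], axis=1)
    return float(moved.mean())

# Five rows, four transitions: rows 2 and 5 differ from their predecessors.
chain = np.array([
    [0.1, 1.0],
    [0.1, 1.0],  # rejected proposal: row repeated
    [0.2, 1.0],  # accepted
    [0.2, 1.0],  # rejected
    [0.2, 1.3],  # accepted
])
rate = acceptance_rate_sketch(chain)
rate_single = acceptance_rate_sketch(chain[:1])
```

Exact floating-point comparison is appropriate here only because a rejected proposal is written out as an identical copy of the previous row.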

dendros.acceptance_rate_trace(chains, *, window=30, post_burn=0)[source]

Sliding-window acceptance rate as a function of step.

For each chain, returns a 1-D array whose i-th entry is the fraction of the previous window transitions that were accepted (i.e. changed at least one parameter). The first window entries are filled with numpy.nan because the window is not yet full.

Parameters:
  • chains (ChainSet) – ChainSet.

  • window (int) – Sliding-window length in steps. Defaults to 30 to match the Galacticus Perl reference.

  • post_burn (int) – Number of leading rows to skip in each chain before evaluation.

Returns:

One 1-D array per chain, each of length n_steps_post_burn. Returned as a list because chains may have different post-burn lengths.

Return type:

list of np.ndarray
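
One way to realise the sliding window for a single chain is with a cumulative sum over the accepted-transition flags. The function name and the exact alignment of the window with the row index are assumptions of this sketch; it follows the description above in leaving the first window entries as NaN.

```python
import numpy as np

def acceptance_trace_sketch(states, window=30):
    """Sliding-window acceptance rate for one chain; states is (n_steps, n_params)."""
    states = np.asarray(states, dtype=float)
    n = states.shape[0]
    trace = np.full(n, np.nan)
    # moved[k] is 1.0 when the transition from row k to row k + 1 changed any parameter.
    moved = np.any(states[1:] != states[:-1], axis=1).astype(float)
    csum = np.concatenate([[0.0], np.cumsum(moved)])
    for i in range(window, n):
        # Transitions moved[i - window : i] are the `window` transitions leading up to row i.
        trace[i] = (csum[i] - csum[i - window]) / window
    return trace

# One parameter that changes on every second transition.
toy = np.array([0, 0, 1, 1, 2, 2, 3, 3], dtype=float)[:, None]
trace = acceptance_trace_sketch(toy, window=2)
```

Every full window in the toy chain contains exactly one accepted and one rejected transition, so the non-NaN part of the trace is flat.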

Posterior analyses

dendros.maximum_posterior(chains, *, drop_chains=())[source]

State vector at the maximum log posterior across all surviving chains.

Return type:

MaxResult

dendros.maximum_likelihood(chains, *, drop_chains=())[source]

State vector at the maximum log likelihood across all surviving chains.

Return type:

MaxResult

class dendros.MaxResult(state, log_posterior, log_likelihood, chain_index, step, parameter_names)[source]

Result of maximum_posterior() or maximum_likelihood().

Parameters:
state

(n_params,) parameter vector at the maximizing step.

Type:

numpy.ndarray

log_posterior

Log posterior at that step.

Type:

float

log_likelihood

Log likelihood at that step.

Type:

float

chain_index

chain_index of the chain holding the maximum.

Type:

int

step

Simulation-step value of the maximizing row (1-based, matching Chain.step).

Type:

int

parameter_names

Names along state.

Type:

Tuple[str, …]

dendros.posterior_samples(chains, n, *, post_burn=None, drop_chains=(), rng=None, replace=None)[source]

Draw n uniformly-random rows from the post-burn concatenated chain.

Parameters:
  • chains (ChainSet) – ChainSet.

  • n (int) – Number of samples to draw. Must be positive.

  • post_burn (int | None) – Number of leading rows to skip in each chain. None triggers automatic detection via gelman_rubin() / convergence_step(); if convergence is not reached a UserWarning is emitted and 0 is used.

  • drop_chains (Sequence[int]) – Iterable of chain_index values to exclude.

  • rng (Generator | None) – numpy.random.Generator. Defaults to numpy.random.default_rng(), which seeds from system entropy. Pass an explicit generator for reproducibility.

  • replace (bool | None) – Whether to sample with replacement. None (default) means “with replacement only when n exceeds the available pool”, matching the common case where a small n is desired and identical rows would be misleading.

Return type:

PosteriorSamples

Raises:

ValueError – If n is non-positive, or if all chains are dropped, or if replace=False and n exceeds the pool size.
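
The documented behaviour of replace=None can be mimicked with numpy.random.Generator.choice. The sketch below works on a plain array standing in for the post-burn pool; draw_rows is a hypothetical helper, not part of dendros.

```python
import numpy as np

def draw_rows(pool, n, rng=None, replace=None):
    """Draw n rows uniformly at random from pool, following the documented replace rule."""
    if n <= 0:
        raise ValueError("n must be positive")
    rng = np.random.default_rng() if rng is None else rng
    n_pool = pool.shape[0]
    if replace is None:
        # Sample with replacement only when the request exceeds the pool.
        replace = n > n_pool
    if not replace and n > n_pool:
        raise ValueError("n exceeds the pool size and replace=False")
    index = rng.choice(n_pool, size=n, replace=replace)
    return pool[index]

rng = np.random.default_rng(123)        # explicit generator for reproducibility
pool = np.arange(20.0).reshape(10, 2)   # ten distinct rows, two parameters

small = draw_rows(pool, 5, rng=rng)     # n <= pool size: no repeated rows
large = draw_rows(pool, 25, rng=rng)    # n > pool size: repeats are unavoidable
```

Passing the same seeded generator makes the draw repeatable, which matters when the sampled rows feed later plots or parameter files.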

class dendros.PosteriorSamples(state, log_posterior, log_likelihood, chain_index, step, parameter_names)[source]

Sampled rows from the post-burn concatenated chain.

Parameters:
state

(n_samples, n_params) parameter values.

Type:

numpy.ndarray

log_posterior

(n_samples,) log posterior at each sample.

Type:

numpy.ndarray

log_likelihood

(n_samples,) log likelihood at each sample.

Type:

numpy.ndarray

chain_index

(n_samples,) source chain indices.

Type:

numpy.ndarray

step

(n_samples,) source simulation-step values.

Type:

numpy.ndarray

parameter_names

Parameter names along axis 1 of state.

Type:

Tuple[str, …]

Notes

Adjacent steps in an MCMC chain are correlated — N draws here represent significantly fewer independent samples from the posterior. Use effective_sample_size() to estimate the equivalent count, and thin by the integrated autocorrelation time if independence is required.

dendros.projection_pursuit(chains, *, post_burn=None, drop_chains=())[source]

Find the linear combinations of parameters best constrained by the data.

Each post-burn parameter column is mapped via its operatorUnaryMapper, normalised by sqrt(prior variance), and mean-centred. The covariance matrix of the resulting samples is eigendecomposed, and the eigenvalues/eigenvectors are returned sorted by ascending eigenvalue — so eigenvectors[:, 0] is the linear combination most tightly constrained relative to the prior.

Parameters:
  • chains (ChainSet) – ChainSet.

  • post_burn (int | None) – Number of leading rows to skip per chain. None triggers automatic detection.

  • drop_chains (Sequence[int]) – chain_index values to exclude.

Return type:

ProjectionPursuitResult

Raises:
  • NotImplementedError – If any active parameter uses a prior or mapper not yet supported by projection_pursuit() (currently uniform/normal priors and identity mapper only).

  • ValueError – If the post-burn pool is empty or contains fewer than two rows.
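
The rescale, centre, and eigendecompose recipe can be illustrated on synthetic samples. Everything below is an assumption for the sake of the example: the identity mapper, the prior widths, and a toy posterior in which the sum of two parameters is tightly constrained while their difference is not.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 5000

# Toy posterior: a + b is tightly constrained, a - b is loosely constrained.
tight = rng.normal(0.0, 0.1, n_samples)
loose = rng.normal(0.0, 1.0, n_samples)
samples = np.column_stack([(tight + loose) / np.sqrt(2.0),
                           (tight - loose) / np.sqrt(2.0)])

prior_sigma = np.array([2.0, 2.0])       # hypothetical sqrt(prior variance)
rescaled = samples / prior_sigma         # identity mapper assumed
centred = rescaled - rescaled.mean(axis=0)

covariance = np.cov(centred, rowvar=False)
# numpy.linalg.eigh returns eigenvalues in ascending order, so column 0
# is the direction with the smallest variance relative to the prior.
eigenvalues, eigenvectors = np.linalg.eigh(covariance)
best = eigenvectors[:, 0]
```

Here best comes out close to (1, 1)/√2 up to an overall sign, recovering the a + b combination that the toy data constrain most tightly.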

class dendros.ProjectionPursuitResult(eigenvalues, eigenvectors, parameter_names, parameter_labels, prior_sigma)[source]

Result of projection_pursuit().

Parameters:
eigenvalues

(n_params,) ascending eigenvalues of the rescaled-sample covariance matrix. Smaller is “better constrained”.

Type:

numpy.ndarray

eigenvectors

(n_params, n_params) matrix whose [:, k] column is the eigenvector for eigenvalues[k], expressed in rescaled-mapped space (i.e. in the same coordinates used for the eigendecomposition).

Type:

numpy.ndarray

parameter_names

Parameter names along axis 0.

Type:

Tuple[str, …]

parameter_labels

ModelParameter.display_label strings parallel to parameter_names.

Type:

Tuple[str, …]

prior_sigma

(n_params,) square root of the prior variance for each parameter, the rescaling that was applied before eigendecomposition.

Type:

numpy.ndarray

direction:

Components of one eigenvector that exceed a contribution threshold.

latex_summary:

LaTeX-rendered summary line for a chosen direction.

direction(index, *, contribution_minimum=0.05)[source]

Significant components of eigenvector index.

Parameters:
  • index (int) – Eigenvector index (0 = best constrained).

  • contribution_minimum (float) – Drop components whose squared loading is below this threshold.

Returns:

The retained components, sorted by descending absolute loading.

Return type:

list of (label, loading) pairs
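
A small sketch of the thresholding and ordering described here, assuming a unit-norm eigenvector so that squared loadings can be read as fractional contributions. The function name, labels, and loadings are invented for the example.

```python
def significant_components(loadings, labels, contribution_minimum=0.05):
    """Keep components whose squared loading reaches the threshold, largest first."""
    kept = [
        (label, float(w))
        for label, w in zip(labels, loadings)
        if float(w) ** 2 >= contribution_minimum
    ]
    return sorted(kept, key=lambda pair: abs(pair[1]), reverse=True)

# Squared loadings are 0.36, 0.6241 and 0.0144; the last falls below 0.05.
components = significant_components([0.60, -0.79, 0.12], ["a", "b", "c"])
```

Sorting by absolute value keeps the sign of each loading, so the result still shows whether parameters enter the combination with the same or opposite sense.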

latex_summary(index, *, contribution_minimum=0.05, precision=3)[source]

LaTeX-formatted summary of eigenvector index’s significant components.

Parameters:
  • index (int)

  • contribution_minimum (float)

  • precision (int)

Return type:

str

dendros.multivariate_normal_fit(chains, *, post_burn=None, drop_chains=())[source]

Fit a multivariate normal to the post-burn concatenated chain.

Return type:

MVNFit

Raises:
  • ValueError – If fewer than n_params + 1 post-burn samples remain, since the sample covariance would then be rank-deficient.

  • np.linalg.LinAlgError – If the sample covariance is not positive-definite (which can happen for parameters that are degenerate post-burn). Drop the offending parameter or supply more samples.

class dendros.MVNFit(mean, covariance, cholesky, parameter_names)[source]

Multivariate-normal fit to post-burn samples.

Parameters:
mean

(n_params,) sample mean.

Type:

numpy.ndarray

covariance

(n_params, n_params) sample covariance, symmetrised.

Type:

numpy.ndarray

cholesky

(n_params, n_params) lower-triangular Cholesky factor of covariance. Writing the factor as L, it satisfies L @ L.T == covariance up to floating-point rounding.

Type:

numpy.ndarray

parameter_names

Parameter names along the axes.

Type:

Tuple[str, …]

write_reparameterization_config:

Emit a Galacticus-style XML config that re-parameterizes the active parameters in terms of independent unit-normal meta parameters.

write_reparameterization_config(out_path, *, n_sigma=5.0, perturber_scale=1e-05)[source]

Write a Galacticus reparameterization XML config.

For an n-parameter MVN fit with mean \(\mu\) and Cholesky factor \(L\), the emitted config declares n active metaParameter{i} parameters with truncated unit-normal priors (limits \(\pm n_\sigma\)), and n derived parameters expressing the original active parameters as

\[x_i = \mu_i + \sum_j L_{ij} \, m_j .\]

Re-running the MCMC against this config samples in coordinates where the posterior is approximately spherical.

Parameters:
  • out_path (str | Path) – Destination path.

  • n_sigma (float) – Truncation half-width for the meta-parameter priors, in units of their (unit) standard deviation. Defaults to 5.0.

  • perturber_scale (float) – Cauchy scale of the per-meta-parameter perturber. Defaults to 1.0e-5 to match the Galacticus reference.

Returns:

Resolved path of the written file.

Return type:

pathlib.Path
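
The mapping from meta parameters back to the original parameters can be checked numerically. The sketch below uses a made-up two-parameter mean and covariance, draws untruncated unit-normal meta parameters (the ±n_sigma truncation is ignored here), and applies the formula above row by row.

```python
import numpy as np

# Hypothetical MVN fit for two parameters.
mean = np.array([1.0, -2.0])
covariance = np.array([[4.0, 1.2],
                       [1.2, 1.0]])

# Lower-triangular factor with L @ L.T equal to the covariance.
L = np.linalg.cholesky(covariance)

rng = np.random.default_rng(1)
m = rng.standard_normal((20000, 2))  # independent unit-normal meta parameters

# x_i = mu_i + sum_j L_ij m_j, applied to every row of m at once.
x = mean + m @ L.T

recovered_covariance = np.cov(x, rowvar=False)
```

The recovered covariance matches the input to within sampling noise, which is why a sampler working in the m coordinates sees an approximately spherical target.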

Parameter-file emission

dendros.read_parameter_file(path)[source]

Parse a Galacticus parameter XML file.

Parameters:

path (str | Path) – Path to the file.

Return type:

xml.etree.ElementTree.ElementTree

dendros.resolve_parameter_path(root, path)[source]

Locate the XML element identified by a Galacticus parameter path.

Parameters:
  • root (Element) – Root xml.etree.ElementTree.Element to search.

  • path (str) – Slash- or ::-separated parameter path. Each segment is an element name, optionally followed by a [N] integer (1-based) instance selector or a [@value='x'] attribute filter — matching the Galacticus parameter-file convention.

Return type:

xml.etree.ElementTree.Element

Raises:
  • KeyError – If any segment does not match an element under the current node.

  • ValueError – If a path segment is malformed.
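
To make the path syntax concrete, here is a simplified resolver built on xml.etree.ElementTree. It is a sketch under assumptions, not the library's parser: resolve_sketch and the regular expression are hypothetical, the path is taken to be relative to the children of root, a segment without a selector picks the first match, and separators inside attribute values are not handled. The XML fragment is illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# name, optionally followed by [N] (1-based) or [@value='x'].
_SEGMENT = re.compile(
    r"^(?P<name>[^\[\]]+)(?:\[(?P<index>\d+)\]|\[@value='(?P<value>[^']*)'\])?$"
)

def resolve_sketch(root, path):
    """Walk root along a slash- or ::-separated parameter path."""
    node = root
    for segment in re.split(r"/|::", path):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"malformed path segment: {segment!r}")
        children = node.findall(match["name"])  # direct children with this tag
        if match["value"] is not None:
            children = [c for c in children if c.get("value") == match["value"]]
        position = int(match["index"] or 1) - 1  # convert the 1-based selector
        if position < 0 or position >= len(children):
            raise KeyError(segment)
        node = children[position]
    return node

root = ET.fromstring(
    "<parameters>"
    "<cosmologyParameters value='simple'><HubbleConstant value='67.4'/></cosmologyParameters>"
    "<nodeOperator value='a'/>"
    "<nodeOperator value='b'><timescale value='1.0'/></nodeOperator>"
    "</parameters>"
)

hubble = resolve_sketch(root, "cosmologyParameters/HubbleConstant")
timescale = resolve_sketch(root, "nodeOperator[2]::timescale")
operator_b = resolve_sketch(root, "nodeOperator[@value='b']")
```

Once an element is located, updating it is a matter of element.set("value", ...), which is the in-place modification that apply_state() is documented to perform on the tree.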

dendros.apply_state(tree, parameters, state, *, parameter_map=None)[source]

Set value= attributes in tree from a state vector.

Parameters:
  • tree (ElementTree) – Parsed XML tree to modify in place.

  • parameters (Sequence[ModelParameter]) – Active model parameters (the same ordering used by chain columns).

  • state (ndarray) – (n_params,) state vector aligned with parameters.

  • parameter_map (Iterable[str] | None) – Optional iterable of parameter names to apply. None applies every entry of parameters. Otherwise only the named parameters are applied, which is the typical case for an independentLikelihoods leaf whose base parameter file mentions only that leaf’s parameters.

Raises:
  • KeyError – If a parameter’s path does not resolve in tree, or if a name in parameter_map is not among parameters.

  • ValueError – If state doesn’t match the length of parameters.

Return type:

None

dendros.emit_parameter_files(state, config, out_dir, *, name_format=None)[source]

Write one Galacticus parameter file per leaf of config.likelihood.

Reads each leaf’s base_parameters_file, applies the subset of state selected by that leaf’s parameter_map (or the full state when no map is set), and writes the modified XML into out_dir.

Parameters:
  • state (ndarray) – (n_params,) state vector (in physical / model space, as stored in the chain log file — no mapper inversion is applied).

  • config – MCMCConfig.

  • out_dir (str | Path) – Output directory (created if missing).

  • name_format (str | None) – Format string for output filenames; receives leaf_index and stem (the base file’s stem). Defaults to "{stem}.xml" for a single leaf and "{leaf_index:02d}_{stem}.xml" for multiple, so per-leaf files don’t collide when several leaves share a base stem.

Returns:

One tuple per leaf, in document order.

Return type:

list of (leaf_index, written_path)

Raises:
  • ValueError – If config.likelihood is None, or if any leaf lacks a base_parameters_file.

  • KeyError – If a parameter’s path does not resolve in the corresponding base file, or a parameter_map references an unknown parameter.
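
The default filename rule can be reproduced with str.format. The helper below is hypothetical and only mirrors the documented defaults; in particular, whether leaf_index starts at 0 or 1 is not stated above, so the indices used here are arbitrary.

```python
from pathlib import Path

def output_name(base_file, leaf_index, n_leaves, name_format=None):
    """Build an output filename from the documented name_format fields."""
    if name_format is None:
        # Single leaf: keep the stem. Several leaves: prefix the index to avoid collisions.
        name_format = "{stem}.xml" if n_leaves == 1 else "{leaf_index:02d}_{stem}.xml"
    return name_format.format(leaf_index=leaf_index, stem=Path(base_file).stem)

single = output_name("base/model.xml", 0, 1)
multi = output_name("base/model.xml", 3, 4)
custom = output_name("base/model.xml", 1, 4, name_format="{stem}-leaf{leaf_index}.xml")
```

A custom name_format that omits leaf_index will make leaves sharing a base stem overwrite each other, so keeping the index in multi-leaf runs is the safer choice.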

dendros.write_parameter_file_to(tree, path)[source]

Write tree to path with an XML declaration.

Returns the resolved output path.

Return type:

Path

Corner plots

dendros.corner_plot(chains, *, parameters=None, post_burn=None, drop_chains=(), labels=None, **corner_kwargs)[source]

Render a corner plot of post-burn chain samples.

Parameters:
  • chains (ChainSet) – ChainSet whose post-burn samples will be plotted.

  • parameters (Iterable[str] | None) – Optional iterable of parameter names to restrict the plot to a subset (in the order given). None plots every active parameter.

  • post_burn (int | None) – Number of leading rows to skip per chain. None triggers automatic detection via gelman_rubin() / convergence_step().

  • drop_chains (Sequence[int]) – Iterable of chain_index values to exclude.

  • labels (Sequence[str] | None) – Optional axis labels. None uses each parameter’s LaTeX ModelParameter.display_label, wrapped in $...$ so corner renders them in math mode.

  • **corner_kwargs – Additional keyword arguments forwarded to corner.corner().

Return type:

matplotlib.figure.Figure

Raises:
  • ImportError – If the optional corner package is not installed. Install via pip install 'dendros[mcmc]'.

  • KeyError – If a name in parameters is not among the active parameters.

  • ValueError – If the post-burn pool is empty.

Internal helpers

class dendros._collection.GroupProxy(collection, path)[source]

Read-only h5py-like proxy for an HDF5 group.

property attrs: dict

Return group attributes as a plain dict.

property name: str

HDF5 path of this group.

keys()[source]

Return the immediate children of this group.

Return type:

List[str]

class dendros._collection.DatasetProxy(collection, path)[source]

Read-only h5py-like proxy for an HDF5 dataset.

For multi-file Collection instances, read() concatenates data from all files along axis 0.

property attrs: dict

Return dataset attributes as a plain dict.

property dtype

NumPy dtype of the dataset.

property shape: tuple

Total shape; for multi-file collections axis-0 is the sum across files.

property name: str

HDF5 path of this dataset.

read(where=None)[source]

Read the dataset into a numpy.ndarray.

For multi-file collections the arrays from all files are concatenated along axis 0 before the optional where selection is applied.

Parameters:

where – None reads everything. A boolean mask or integer index array is applied after concatenation.

Return type:

ndarray
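
Because the where selection is applied after concatenation, masks and indices refer to positions in the combined array rather than in any single file. The plain-NumPy sketch below imitates that order of operations with two made-up per-file arrays; it does not use the proxy classes themselves.

```python
import numpy as np

# Stand-ins for the same dataset read from two files of one collection.
per_file = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])]

# Step 1: concatenate along axis 0, as for a multi-file collection.
full = np.concatenate(per_file, axis=0)

# Step 2: apply the selection to the combined array.
mask = full > 2.5                    # boolean mask of length 5
by_mask = full[mask]
by_index = full[np.array([0, 4])]    # index 4 lives in the second file
```

A mask built from another dataset of the same collection lines up row for row, provided both were read through the same concatenation.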