Quickstart

Installation

pip install dendros

# With optional pandas support:
pip install 'dendros[pandas]'

# With matplotlib for plotting analyses:
pip install 'dendros[plot]'

# Development version from GitHub:
pip install git+https://github.com/galacticusorg/dendros.git

Opening files

from dendros import open_outputs

# Single file
c = open_outputs("galacticus.hdf5")

# Auto-detect MPI-split outputs
c = open_outputs("galacticus_MPI:0000.hdf5")

# Explicit list or glob
c = open_outputs(["rank0.hdf5", "rank1.hdf5"])
c = open_outputs("run001/galacticus*.hdf5")

# Lightcone mode
c = open_outputs("lightcone.hdf5", output_root="Lightcone")

Use Collection as a context manager to ensure file handles are closed automatically:

with open_outputs("galacticus.hdf5") as c:
    tbl = c.list_outputs()

Checking completion status

with open_outputs("galacticus.hdf5") as c:
    c.validate_completion()           # raises RuntimeError if incomplete
    c.validate_completion(mode="warn")    # emit UserWarning instead
    c.validate_completion(mode="ignore")  # silent

Listing outputs

with open_outputs("galacticus.hdf5") as c:
    tbl = c.list_outputs()          # astropy Table
    df  = c.list_outputs(format="pandas")

The table contains columns: index, name, time, scale_factor, and redshift.

Listing properties

with open_outputs("galacticus.hdf5") as c:
    tbl = c.list_properties("Output1")  # by name
    tbl = c.list_properties(1)          # by 1-based integer index

Columns: name, dtype, shape, description, unitsInSI.

Reading datasets

with open_outputs("galacticus.hdf5") as c:
    data = c.read("Output1", ["nodeData/basicMass"])
    # data["nodeData/basicMass"] is a numpy array

    # Custom labels via dict
    data = c.read(
        "Output1",
        {"Mhalo": "nodeData/basicMass", "Mstar": "nodeData/diskMassStellar"},
    )

Filtering

Pass a boolean mask or integer index array as where:

with open_outputs("galacticus.hdf5") as c:
    masses = c.read("Output1", ["nodeData/basicMass"])["nodeData/basicMass"]
    mask = masses > 1e12

    data = c.read(
        "Output1",
        {"Mhalo": "nodeData/basicMass", "Mstar": "nodeData/diskMassStellar"},
        where=mask,
    )

Tracing galaxy histories

Given one or more nodeUniqueIDBranchTip values, dendros can assemble each galaxy’s full history across all outputs:

from dendros import open_outputs

with open_outputs("galacticus.hdf5") as c:
    ids = [101, 104]   # branch-tip IDs of galaxies of interest
    hist = c.trace_history(
        ids,
        {"Mstar": "nodeData/diskMassStellar"},
    )

# hist["Mstar"]             shape (2, n_outputs); NaN where absent
# hist["time"]              shape (2, n_outputs); NaN where absent
# hist["expansion_factor"]  shape (2, n_outputs)
# hist["present"]           bool mask, shape (2, n_outputs)
# hist["output_names"]      object array of output group names
# hist["ids"]               int64 array, the normalized input

A 2-D per-galaxy property (e.g. a spectrum of shape (N_gals, n_bins)) is returned as a 3-D array of shape (n_galaxies, n_bins, n_outputs) — one extra trailing axis for time. Each galaxy need not be present at every output (it may have formed later or merged earlier), so history arrays are ragged in time. Absent slots are filled with:

NaN for floating-point properties (and for the time and expansion_factor arrays);
the value of int_sentinel (default -1) for integer properties;
False for boolean properties.

The present mask is the canonical indicator of presence and should be preferred to sentinel checks:

import numpy as np
mask = hist["present"][0]         # galaxy 0 presence across outputs
times  = hist["time"][0][mask]
masses = hist["Mstar"][0][mask]

Restrict to a subset of outputs with the outputs= argument (accepts a range, a list of 1-based integers, or output group names):

hist = c.trace_history(ids, ["nodeData/basicMass"], outputs=range(1, 6))

Multi-file collections search each file independently. For arbitrary user-provided file lists or globs, nodeUniqueIDBranchTip collisions are possible across files, so by default if the same ID is found in more than one file at the same output the call raises ValueError; pass on_duplicate_file_match="warn" or "first" to keep the first match instead. True Galacticus MPI-split outputs are a separate case: there nodeUniqueIDBranchTip is expected to be unique across ranks/files for a given output.

If nodeUniqueIDBranchTip was not included in the Galacticus run, the function raises a KeyError that points you at the missing output property.

MPI outputs

Galacticus MPI runs produce files suffixed _MPI:NNNN. Dendros detects and groups them automatically when you pass any single rank’s filename or a glob:

c = open_outputs("galacticus_MPI:0000.hdf5")  # auto-detects all peers

read() concatenates arrays from all ranks along axis 0.

Star formation histories

Star formation histories are output as lists of 2D numpy.ndarray objects, with one dimension being time, and the other metallicity. Dendros provides functions to collapse (sum) over metallicity:

from dendros import sfh_collapse_metallicities, sfh_times

with open_outputs("galacticus.hdf5") as c:
    sfh = c["Outputs/Output1/nodeData/diskStarFormationHistoryMass"]
    sfh_times = sfh_times(sfh)
    sfh_collapsed = sfh_collapse_metallicities(sfh)

Plotting analyses

If a Galacticus run was configured to write reduced analysis results, the HDF5 file will contain a top-level /analyses group with one subgroup per analysis. Dendros can list and plot every function1D analysis, overlaying the model curve with the observational/target data when present. Requires the plot extra (pip install 'dendros[plot]').

For MPI runs, the /analyses data is reduced over all ranks and is identical in every rank’s file, so dendros reads only the primary file.

from dendros import open_outputs

with open_outputs("galacticus.hdf5") as c:
    # Tabulate available analyses (no matplotlib needed for this).
    print(c.list_analyses())

    # Plot every analysis; returns dict[name, matplotlib.figure.Figure].
    figs = c.plot_analyses()

    # Plot one analysis and also save to disk.
    figs = c.plot_analyses(
        name="stellarMassFunction",
        output_directory="figs",
        file_format="pdf",
    )

    # Hide the target overlay (model only).
    figs = c.plot_analyses(show_target=False)

To compare several models on the same figure, open them as a ModelCollection with open_models() and pass the result to the module-level plot_analyses(). The target overlay is shared across models so it is drawn only once; each model contributes its own curve, labelled by the dict key (or by its primary file stem when no dict is supplied). MPI-split files are still treated as a single model — only inter-model differences produce additional curves.

from dendros import open_models, plot_analyses

with open_models({"Fiducial": "fid.hdf5", "Variant": "var.hdf5"}) as m:
    figs = plot_analyses(m)

    # Or with default labels derived from filenames:
    # figs = plot_analyses(open_models(["fid.hdf5", "var.hdf5"]))

    # Or pass an explicit list with custom labels:
    # figs = plot_analyses(list(m.values()), labels=["A", "B"])