Quickstart
==========

Installation
------------

.. code-block:: bash

   pip install dendros

   # With optional pandas support:
   pip install 'dendros[pandas]'

   # With matplotlib for plotting analyses:
   pip install 'dendros[plot]'

   # Development version from GitHub:
   pip install git+https://github.com/galacticusorg/dendros.git

Opening files
-------------

.. code-block:: python

   from dendros import open_outputs

   # Single file
   c = open_outputs("galacticus.hdf5")

   # Auto-detect MPI-split outputs
   c = open_outputs("galacticus_MPI:0000.hdf5")

   # Explicit list or glob
   c = open_outputs(["rank0.hdf5", "rank1.hdf5"])
   c = open_outputs("run001/galacticus*.hdf5")

   # Lightcone mode
   c = open_outputs("lightcone.hdf5", output_root="Lightcone")

Use :class:`~dendros.Collection` as a context manager to ensure file handles
are closed automatically::

   with open_outputs("galacticus.hdf5") as c:
       tbl = c.list_outputs()

Checking completion status
--------------------------

.. code-block:: python

   with open_outputs("galacticus.hdf5") as c:
       c.validate_completion()           # raises RuntimeError if incomplete
       c.validate_completion(mode="warn")    # emit UserWarning instead
       c.validate_completion(mode="ignore")  # silent

Listing outputs
---------------

.. code-block:: python

   with open_outputs("galacticus.hdf5") as c:
       tbl = c.list_outputs()          # astropy Table
       df  = c.list_outputs(format="pandas")

The table contains columns: ``index``, ``name``, ``time``,
``scale_factor``, and ``redshift``.
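
The last two columns are linked by the standard cosmological relation
``redshift = 1 / scale_factor - 1`` (with ``scale_factor = 1`` today). The
sketch below recomputes one from the other as a sanity check. The values are
made up, not taken from a Galacticus run:

.. code-block:: python

   import numpy as np

   # Illustrative scale factors only; a real table comes from c.list_outputs().
   scale_factor = np.array([0.25, 0.5, 1.0])

   # Standard relation between scale factor a and redshift z (a = 1 today).
   redshift = 1.0 / scale_factor - 1.0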

Listing properties
------------------

.. code-block:: python

   with open_outputs("galacticus.hdf5") as c:
       tbl = c.list_properties("Output1")  # by name
       tbl = c.list_properties(1)          # by 1-based integer index

Columns: ``name``, ``dtype``, ``shape``, ``description``, ``unitsInSI``.

Reading datasets
----------------

.. code-block:: python

   with open_outputs("galacticus.hdf5") as c:
       data = c.read("Output1", ["nodeData/basicMass"])
       # data["nodeData/basicMass"] is a numpy array

       # Custom labels via dict
       data = c.read(
           "Output1",
           {"Mhalo": "nodeData/basicMass", "Mstar": "nodeData/diskMassStellar"},
       )

Filtering
---------

Pass a boolean mask or integer index array as ``where``:

.. code-block:: python

   with open_outputs("galacticus.hdf5") as c:
       masses = c.read("Output1", ["nodeData/basicMass"])["nodeData/basicMass"]
       mask = masses > 1e12

       data = c.read(
           "Output1",
           {"Mhalo": "nodeData/basicMass", "Mstar": "nodeData/diskMassStellar"},
           where=mask,
       )
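
The two forms select the same rows whenever the integer array lists the
positions at which the mask is true. The sketch below shows that equivalence
with a plain NumPy array standing in for a dataset. The values are made up,
and how dendros applies ``where`` internally is not shown:

.. code-block:: python

   import numpy as np

   masses = np.array([3e11, 2e12, 8e11, 5e12])  # made-up halo masses
   mask = masses > 1e12                          # boolean mask
   idx = np.flatnonzero(mask)                    # equivalent integer indices

   # Both forms pick out the same elements, in the same order.
   selected_by_mask = masses[mask]
   selected_by_index = masses[idx]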

Tracing galaxy histories
------------------------

Given one or more ``nodeUniqueIDBranchTip`` values, dendros can assemble each
galaxy's full history across all outputs:

.. code-block:: python

   from dendros import open_outputs

   with open_outputs("galacticus.hdf5") as c:
       ids = [101, 104]   # branch-tip IDs of galaxies of interest
       hist = c.trace_history(
           ids,
           {"Mstar": "nodeData/diskMassStellar"},
       )

   # hist["Mstar"]             shape (2, n_outputs); NaN where absent
   # hist["time"]              shape (2, n_outputs); NaN where absent
   # hist["expansion_factor"]  shape (2, n_outputs)
   # hist["present"]           bool mask, shape (2, n_outputs)
   # hist["output_names"]      object array of output group names
   # hist["ids"]               int64 array, the normalized input

A 2-D per-galaxy property (e.g. a spectrum of shape ``(n_galaxies, n_bins)``)
is returned as a 3-D array of shape ``(n_galaxies, n_bins, n_outputs)``, with
one extra trailing axis for time. A galaxy need not be present at every output
(it may have formed later or merged earlier), so histories are *ragged* in
time. The arrays themselves stay rectangular; absent slots are filled with:

* ``NaN`` for floating-point properties (and for the ``time`` and
  ``expansion_factor`` arrays);
* the value of ``int_sentinel`` (default ``-1``) for integer properties;
* ``False`` for boolean properties.
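
The fill rule can be sketched as a small helper. ``fill_value_for`` is a
hypothetical name used for illustration, not part of the dendros API; only
the three cases listed above come from the documentation:

.. code-block:: python

   import numpy as np

   def fill_value_for(dtype, int_sentinel=-1):
       """Return the placeholder for an absent slot (hypothetical helper)."""
       dtype = np.dtype(dtype)
       if np.issubdtype(dtype, np.floating):
           return np.nan
       if np.issubdtype(dtype, np.bool_):
           return False
       if np.issubdtype(dtype, np.integer):
           return int_sentinel
       raise TypeError(f"no fill value defined for dtype {dtype}")

   # One galaxy, three outputs, present only at the last two.
   present = np.array([False, True, True])
   history = np.full(present.shape, fill_value_for(np.float64))
   history[present] = [1.0e9, 2.5e9]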

The ``present`` mask is the canonical indicator of presence and should be
preferred to sentinel checks::

   import numpy as np
   mask = hist["present"][0]         # galaxy 0 presence across outputs
   times  = hist["time"][0][mask]
   masses = hist["Mstar"][0][mask]
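
The same mask also works for all galaxies at once. The sketch below uses
synthetic arrays shaped like ``hist["present"]`` and ``hist["time"]`` (the
values are made up) to find each galaxy's first output and the time there:

.. code-block:: python

   import numpy as np

   # Synthetic stand-ins: 2 galaxies, 4 outputs.
   present = np.array([[False, True, True, True],
                       [True, True, False, False]])
   time = np.where(present, np.array([1.0, 3.0, 7.0, 13.8]), np.nan)

   # argmax gives the first True along the output axis. It also returns 0
   # for a galaxy that is never present, so check n_present as well.
   first_output = present.argmax(axis=1)
   first_time = time[np.arange(len(time)), first_output]

   # Number of outputs at which each galaxy exists.
   n_present = present.sum(axis=1)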

Restrict to a subset of outputs with the ``outputs=`` argument (accepts a
``range``, a list of 1-based integers, or output group names)::

   hist = c.trace_history(ids, ["nodeData/basicMass"], outputs=range(1, 6))

Multi-file collections search each file independently. With arbitrary
user-provided file lists or globs, the same ``nodeUniqueIDBranchTip`` can
appear in more than one file. By default, finding an ID in more than one
file at the same output raises :class:`ValueError`; pass
``on_duplicate_file_match="warn"`` or ``"first"`` to keep the first
match instead. True Galacticus MPI-split outputs are a separate case:
there ``nodeUniqueIDBranchTip`` is expected to be unique across ranks/files
for a given output.
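
The duplicate policy can be pictured with a short sketch. ``resolve_matches``
is a hypothetical helper, not dendros code, and ``"error"`` is an assumed name
for the default raising mode; only ``"warn"`` and ``"first"`` appear in the
text above:

.. code-block:: python

   import warnings

   def resolve_matches(node_id, matches, on_duplicate="error"):
       """Pick one (filename, row) match for node_id (hypothetical helper).

       matches lists the (filename, row_index) pairs found at one output.
       """
       if not matches:
           return None  # galaxy absent at this output
       if len(matches) > 1:
           files = [name for name, _ in matches]
           if on_duplicate == "error":
               raise ValueError(f"ID {node_id} found in multiple files: {files}")
           if on_duplicate == "warn":
               warnings.warn(
                   f"ID {node_id} found in multiple files: {files}; "
                   "keeping the first match"
               )
           elif on_duplicate != "first":
               raise ValueError(f"unknown policy: {on_duplicate!r}")
       return matches[0]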

If ``nodeUniqueIDBranchTip`` was not included in the Galacticus run,
``trace_history`` raises a :class:`KeyError` that names the missing output
property.

MPI outputs
-----------

Galacticus MPI runs produce files suffixed ``_MPI:NNNN``.  Dendros detects and
groups them automatically when you pass any single rank's filename or a glob::

   c = open_outputs("galacticus_MPI:0000.hdf5")  # auto-detects all peers

:meth:`~dendros.Collection.read` concatenates arrays from all ranks along
axis 0.
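
Concatenation along axis 0 means only the leading (galaxy) axis grows, while
any trailing axes keep their size. A minimal NumPy sketch with made-up
per-rank arrays (the rank ordering shown is an assumption):

.. code-block:: python

   import numpy as np

   # One property: rank 0 holds 3 galaxies, rank 1 holds 2.
   rank0 = np.array([1.0e12, 3.0e11, 7.0e11])
   rank1 = np.array([2.0e12, 5.0e10])
   combined = np.concatenate([rank0, rank1], axis=0)

   # For a 2-D property the bin axis is unchanged.
   spec0 = np.zeros((3, 4))
   spec1 = np.ones((2, 4))
   combined_spec = np.concatenate([spec0, spec1], axis=0)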

Star formation histories
------------------------

Star formation histories are output as lists of 2D :class:`numpy.ndarray` objects,
with one dimension being time, and the other metallicity. Dendros provides
functions to collapse (sum) over metallicity:

.. code-block:: python

   from dendros import open_outputs, sfh_collapse_metallicities, sfh_times

   with open_outputs("galacticus.hdf5") as c:
       sfh = c["Outputs/Output1/nodeData/diskStarFormationHistoryMass"]
       # Use a distinct name so the imported sfh_times function is not shadowed.
       times = sfh_times(sfh)
       sfh_collapsed = sfh_collapse_metallicities(sfh)
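
Conceptually, collapsing is a sum over the metallicity axis of each galaxy's
array. The plain NumPy sketch below uses made-up data and assumes axis 0 is
metallicity and axis 1 is time; check the layout of your own files before
relying on that ordering:

.. code-block:: python

   import numpy as np

   # Two galaxies with different numbers of time bins (assumed layout:
   # axis 0 = metallicity, axis 1 = time).
   sfh = [
       np.array([[1.0, 2.0, 3.0],
                 [0.5, 0.5, 1.0]]),
       np.array([[4.0, 0.0],
                 [1.0, 2.0]]),
   ]

   # Summing over the metallicity axis leaves one 1-D array per galaxy.
   collapsed = [galaxy.sum(axis=0) for galaxy in sfh]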

Plotting analyses
-----------------

If a Galacticus run was configured to write reduced analysis results, the
HDF5 file will contain a top-level ``/analyses`` group with one subgroup per
analysis.  Dendros can list and plot every ``function1D`` analysis,
overlaying the model curve with the observational/target data when
present.  Requires the ``plot`` extra (``pip install 'dendros[plot]'``).

For MPI runs, the ``/analyses`` data is reduced over all ranks and is
identical in every rank's file, so dendros reads only the primary file.

.. code-block:: python

   from dendros import open_outputs

   with open_outputs("galacticus.hdf5") as c:
       # Tabulate available analyses (no matplotlib needed for this).
       print(c.list_analyses())

       # Plot every analysis; returns dict[name, matplotlib.figure.Figure].
       figs = c.plot_analyses()

       # Plot one analysis and also save to disk.
       figs = c.plot_analyses(
           name="stellarMassFunction",
           output_directory="figs",
           file_format="pdf",
       )

       # Hide the target overlay (model only).
       figs = c.plot_analyses(show_target=False)

To compare several models on the same figure, open them as a
:class:`~dendros.ModelCollection` with :func:`~dendros.open_models` and
pass the result to the module-level :func:`~dendros.plot_analyses`.  The
target overlay is shared across models so it is drawn only once; each
model contributes its own curve, labelled by the dict key (or by its
primary file stem when no dict is supplied).  MPI-split files are still
treated as a single model — only inter-model differences produce
additional curves.

.. code-block:: python

   from dendros import open_models, plot_analyses

   with open_models({"Fiducial": "fid.hdf5", "Variant": "var.hdf5"}) as m:
       figs = plot_analyses(m)

       # Or with default labels derived from filenames:
       # figs = plot_analyses(open_models(["fid.hdf5", "var.hdf5"]))

       # Or pass an explicit list with custom labels:
       # figs = plot_analyses(list(m.values()), labels=["A", "B"])