{ "cells": [ { "cell_type": "markdown", "id": "d1cd6238", "metadata": {}, "source": [ "# Custom derived quantities\n", "\n", "When you fit an SED with `galapy-fit`, the sampler explores the posterior of the\n", "*free model parameters* (ages, star-formation parameters, ISM parameters, ...).\n", "On their own those numbers are rarely what you want to report. What you usually\n", "report are **derived quantities**; i.e. physical numbers computed *from* the model at\n", "each posterior sample: a stellar mass, a star-formation rate, a dust temperature,\n", "a model SED.\n", "\n", "`galapy` computes a set of these for you automatically and stores their full\n", "posterior (one value per sample) inside the `Results` object. This guide shows\n", "you how to\n", "\n", "1. choose *which* built-in quantities are stored when you run a fit,\n", "2. add *your own* quantities to an existing `Results` object, and\n", "3. summarise any of them (mean, median, credible intervals).\n", "\n", "It builds up in four levels — feel free to stop at the one that covers your use\n", "case. The cookbook at the end collects ready-to-paste recipes." ] }, { "cell_type": "markdown", "id": "89334d89", "metadata": {}, "source": [ "## The built-in quantities\n", "\n", "Every `Results` object stores the posterior of these nine quantities by default:\n", "\n", "| key | quantity | units | shape |\n", "|---------|--------------------------------|------------------------------------|-------------|\n", "| `SED` | spectral energy distribution | flux, mJy | array |\n", "| `Mstar` | stellar mass | $M_\\odot$ | scalar |\n", "| `Mdust` | dust mass | $M_\\odot$ | scalar |\n", "| `Mgas` | gas mass | $M_\\odot$ | scalar |\n", "| `Zstar` | stellar metallicity | absolute (metal mass fraction) | scalar |\n", "| `Zgas` | gas metallicity | absolute (metal mass fraction) | scalar |\n", "| `SFR` | star-formation rate | $M_\\odot$ / yr | scalar |\n", "| `TMC` | molecular-cloud temperature | K | scalar |\n", "| `TDD` | diffuse-dust temperature | K | scalar |\n", "\n", "Each is exposed as an attribute holding the posterior: a `(Nsamples,)` array for\n", "the scalar quantities and a `(Nsamples, Nwavelength)` array for the `SED`." ] }, { "cell_type": "code", "execution_count": null, "id": "131ecef7", "metadata": {}, "outputs": [], "source": [ "from galapy.sampling.Results import load_results\n", "\n", "# load a results file written by galapy-fit (HDF5 or pickle); the format is\n", "# inferred from the extension\n", "res = load_results( 'data/sampling_4D+noise_dynesty_results_light.galapy.hdf5' )\n", "\n", "res.Mstar.shape # -> (Nsamples,)\n", "res.SED.shape # -> (Nsamples, Nwl)\n", "res.get_stored_quantities() # everything currently stored on this object" ] }, { "cell_type": "markdown", "id": "2df85f08", "metadata": {}, "source": [ "## Level 1 — choosing which built-ins to store (no code)\n", "\n", "Computing and storing every quantity for every posterior sample takes time and\n", "disk space (the `SED` in particular is a full array per sample). If you only care\n", "about a few numbers, tell `galapy-fit` to store just those, through the\n", "`store_quantities` hyperparameter in your parameter file:\n", "\n", "```python\n", "# parameter file (galapy-genparams)\n", "store_quantities = ['Mstar', 'SFR', 'Mdust']\n", "```\n", "\n", "- `None` (the default) stores **all** nine quantities.\n", "- A shorter list makes the output file smaller and post-processing faster.\n", "- `'SED'` is **always** stored, whether or not you list it — listing it just\n", " prints a harmless warning and is ignored. (The SED is needed internally to\n", " validate each sample, so you always get it for free.)\n", "\n", "That is all most users ever need. The remaining levels are for when you want a\n", "quantity that is **not** in the built-in list." ] }, { "cell_type": "markdown", "id": "9a101008", "metadata": {}, "source": [ "## Level 2 — adding one custom quantity\n", "\n", "Anything you can compute from a fully-configured `galapy` model, you can store as\n", "a derived quantity, *after* the run, with `Results.add_property`.\n", "\n", "You pass a function that takes **one argument, the model**, with the parameters of\n", "the current sample already set, and returns the value you want for that sample:" ] }, { "cell_type": "code", "execution_count": null, "id": "42fa1835", "metadata": {}, "outputs": [], "source": [ "# bolometric-ish luminosity: integrate the model SED\n", "res.add_property( lambda model : model.get_SED().sum(), name = 'Lsed' )\n", "\n", "res.Lsed.shape # -> (Nsamples,) one value per posterior sample" ] }, { "cell_type": "markdown", "id": "d3af06bb", "metadata": {}, "source": [ "`add_property` evaluates your function on every posterior sample exactly the way\n", "the built-ins are evaluated, stores the result as a new attribute (`res.Lsed`\n", "here), and returns the list of names it added:\n", "\n", "```python\n", "res.add_property( lambda model : model.age, name = 'age' )\n", "# -> ['age']\n", "```\n", "\n", "What can the `model` give you? Anything the model exposes, for example:\n", "\n", "```python\n", "model.age # galaxy age [yr]\n", "model.sfh.Mstar( model.age ) # stellar mass at that age\n", "model.sfh( model.age ) # SFR at that age\n", "model.ism.mc.T # molecular-cloud temperature\n", "model.get_SED() # the model SED [mJy]\n", "model.get_emission() # rest-frame luminosity [Lsun]\n", "```\n", "\n", "If you omit `name` for a single function, it is auto-named `custom0`, `custom1`,\n", "... — handy for quick exploration, but pass an explicit `name` for anything you\n", "intend to keep." ] }, { "cell_type": "markdown", "id": "08f6587d", "metadata": {}, "source": [ "> **⚠️ The operation of adding a new property to the list can take up to several 10s of seconds.**\n", "> This is a consequence of the necessity to re-build the model for each sample in the parameter space.\n", "> Since this additional cost is payed every time you call `add_property`, in the case you need\n", "> to store more than 1 additional quantity, it is desirable to list all of them in a single call of\n", "> `add_property`, as explained in the following section." ] }, { "cell_type": "markdown", "id": "84a64f23", "metadata": {}, "source": [ "## Level 3 — several quantities at once, and summarising them\n", "\n", "To add more than one quantity in a single pass over the posterior, hand\n", "`add_property` a dictionary of `{ name : function }`. The posterior is walked\n", "once and all the functions are evaluated per sample. **Way cheaper than calling\n", "`add_property` repeatedly:**" ] }, { "cell_type": "code", "execution_count": null, "id": "6b5616b1", "metadata": {}, "outputs": [], "source": [ "res.add_property( {\n", " 'Lsed' : lambda model : model.get_SED().sum(),\n", " 'fdust' : lambda model : model.sfh.Mdust( model.age ) / model.sfh.Mstar( model.age ),\n", "} )\n", "# -> ['Lsed', 'fdust']" ] }, { "cell_type": "markdown", "id": "213173c0", "metadata": {}, "source": [ "Once stored, a custom quantity behaves *exactly* like a built-in: every summary\n", "helper accepts its name." ] }, { "cell_type": "code", "execution_count": null, "id": "12b445a9", "metadata": {}, "outputs": [], "source": [ "res.get_mean( 'fdust' ) # posterior weighted mean" ] }, { "cell_type": "code", "execution_count": null, "id": "5a8c78d1", "metadata": {}, "outputs": [], "source": [ "res.get_median( 'fdust' ) # posterior median" ] }, { "cell_type": "code", "execution_count": null, "id": "8f020cc6", "metadata": {}, "outputs": [], "source": [ "res.get_quantile( 'fdust', quantile = 0.84 ) # any weighted quantile" ] }, { "cell_type": "code", "execution_count": null, "id": "603c5334", "metadata": {}, "outputs": [], "source": [ "res.get_std( 'fdust' ) # weighted standard deviation" ] }, { "cell_type": "code", "execution_count": null, "id": "5f3b5499", "metadata": {}, "outputs": [], "source": [ "res.get_bestfit( 'fdust' ) # value at the max-likelihood sample" ] }, { "cell_type": "code", "execution_count": null, "id": "57565fba", "metadata": {}, "outputs": [], "source": [ "res.get_credible_interval( 'fdust', percent = 0.68 ) # central 68% interval" ] }, { "cell_type": "markdown", "id": "dc190ff2", "metadata": {}, "source": [ "These helpers are *posterior-weighted* (they use the sampling weights) and they\n", "automatically **exclude invalid samples** (see the sentinel note below), so you\n", "do not have to filter anything by hand." ] }, { "cell_type": "markdown", "id": "385abf00", "metadata": {}, "source": [ "## Level 4 — array-valued quantities, and a numerical pitfall\n", "\n", "A derived quantity can be an array, not just a scalar — the `SED` is the built-in\n", "example. The classic user case is the **average attenuation curve** as a function\n", "of wavelength.\n", "\n", "It is tempting to store the attenuation in magnitudes via the model's\n", "`get_avgAtt()`. **Do not** — and this is the single most common trap, so it gets\n", "its own box:\n", "\n", "> **⚠️ Average in linear space, not in magnitudes / logs.**\n", "> Magnitude attenuation is `-2.5 * log10( attenuated / unattenuated )`. When a\n", "> sample attenuates a band to (near) zero flux that is `+inf`; round-off can even\n", "> make it `nan`. Those non-finite values are *correct* per sample, but they\n", "> poison any posterior mean. **Store the linear quantity and convert at the very\n", "> end.** For attenuation, `galapy` already keeps the linear average attenuation\n", "> factor (in `[0, 1]`) on the model as `model.Aavg` after every parameter update:" ] }, { "cell_type": "code", "execution_count": null, "id": "aba5651f", "metadata": {}, "outputs": [], "source": [ "res.add_property( lambda model : model.Aavg, name = 'Aavg' )\n", "\n", "res.Aavg.shape # -> (Nsamples, Nwl), finite by construction\n", "mean_Aavg = res.get_mean( 'Aavg' ) # safe: linear, finite\n", "\n", "# convert the *summary* to magnitudes only at the end, if you want them:\n", "import numpy as np\n", "mean_att_mag = -2.5 * np.log10( mean_Aavg )" ] }, { "cell_type": "markdown", "id": "4f6ed67d", "metadata": {}, "source": [ "The same principle applies to any quantity you would normally view in log space\n", "(luminosities spanning orders of magnitudes, metallicities, ...): **store it linear, average,\n", "then take the log of the summary.**" ] }, { "cell_type": "markdown", "id": "0375892a", "metadata": {}, "source": [ "### Saving the newly computed derived properties\n", "\n", "Custom quantities are first-class: they are serialised together with the\n", "built-ins, so you only pay the per-sample evaluation cost once. After adding\n", "properties you can re-save the `Results` object and reload it later.\n", "\n", "The lowest-friction option is to pickle the whole object:\n", "\n", "```python\n", "import pickle\n", "res.add_property( lambda model : model.Aavg, name = 'Aavg' )\n", "\n", "with open( 'myfit_with_Aavg.pickle', 'wb' ) as f :\n", " pickle.dump( res, f )\n", "\n", "from galapy.sampling.Results import load_results\n", "later = load_results( 'myfit_with_Aavg.pickle' ) # format inferred from extension\n", "later.get_mean( 'Aavg' ) # the custom quantity is right there\n", "```\n", "\n", "To re-save as HDF5 instead (portable, the format `galapy-fit` uses), dump the\n", "object to a dictionary and write it with the galapy IO helper:\n", "\n", "```python\n", "import galapy\n", "from galapy.io.hdf5 import write_to_hdf5\n", "\n", "write_to_hdf5(\n", " 'myfit_with_Aavg.galapy.hdf5',\n", " metadata = dict( storage_method = 'hard',\n", " galapy_version = galapy.__version__ ),\n", " hard = True,\n", " results = res.dump(),\n", ")\n", "later = load_results( 'myfit_with_Aavg.galapy.hdf5' )\n", "```" ] }, { "cell_type": "markdown", "id": "e30a1373", "metadata": {}, "source": [ "(Under the hood the in-memory round-trip is `Results.load( res.dump() )`:\n", "`dump()` returns a plain dictionary and `Results.load()` rebuilds the object from\n", "it — `load_results` is just the file-level wrapper around that.)\n", "\n", "> **ℹ️ Invalid samples are flagged, not dropped.**\n", "> If a sample's parameters are unphysical (e.g. its SED is non-finite), every\n", "> derived quantity for that sample is stored as `-inf` rather than silently\n", "> skipped, so all the per-sample arrays keep the same length and stay aligned\n", "> with the chain. The summary helpers exclude these sentinels automatically;\n", "> if you slice the raw arrays yourself, mask them with `numpy.isfinite` first." ] }, { "cell_type": "markdown", "id": "d30db420", "metadata": {}, "source": [ "## Cookbook\n", "\n", "Short, paste-ready recipes. Each assumes a loaded `Results` object `res`." ] }, { "cell_type": "markdown", "id": "df26f236", "metadata": {}, "source": [ "**Specific star-formation rate (sSFR):**\n", "```python\n", "res.add_property(\n", " lambda model : model.sfh( model.age ) - model.sfh.Mstar( model.age ),\n", " name = 'sSFR' )\n", "```" ] }, { "cell_type": "markdown", "id": "346711a4", "metadata": {}, "source": [ "**Dust-to-stellar mass ratio:**\n", "```python\n", "res.add_property(\n", " lambda model : model.sfh.Mdust( model.age ) - model.sfh.Mstar( model.age ),\n", " name = 'fdust' )\n", "```" ] }, { "cell_type": "markdown", "id": "71bb3985", "metadata": {}, "source": [ "**Monochromatic luminosity at a chosen rest-frame wavelength** (e.g. 1500 Å):\n", "```python\n", "import numpy as np\n", "def L1500 ( model ) :\n", " wl = model.wl() # rest-frame wavelength grid [Angstrom]\n", " return np.interp( 1500., wl, model.get_emission() )\n", "res.add_property( L1500, name = 'L1500' )\n", "```" ] }, { "cell_type": "markdown", "id": "b76cc6c0", "metadata": {}, "source": [ "**A rest-frame colour from the model SED** (flux ratio in two windows):\n", "```python\n", "import numpy as np\n", "def colour ( model ) :\n", " wl, sed = model.wl(), model.get_SED()\n", " blue = sed[ ( wl > 4000. ) & ( wl < 5000. ) ].mean()\n", " red = sed[ ( wl > 6000. ) & ( wl < 7000. ) ].mean()\n", " return -2.5 * np.log10( blue / red ) # scalar -> magnitudes is fine here\n", "res.add_property( colour, name = 'blue_red' )\n", "```\n", "(Here the per-sample value is finite, so magnitudes are safe — contrast with the\n", "average-attenuation case, where it is the *averaging* across samples that breaks.)" ] }, { "cell_type": "markdown", "id": "60c4b1b7", "metadata": {}, "source": [ "**Linear average attenuation curve** (then convert the summary):\n", "```python\n", "res.add_property( lambda model : model.Aavg, name = 'Aavg' )\n", "att_mag_median = -2.5 * np.log10( res.get_median( 'Aavg' ) )\n", "```" ] }, { "cell_type": "markdown", "id": "591d780d", "metadata": {}, "source": [ "**Several quantities in one pass:**\n", "```python\n", "res.add_property( {\n", " 'sSFR' : lambda model : model.sfh( model.age ) / model.sfh.Mstar( model.age ),\n", " 'fdust' : lambda model : model.sfh.Mdust( model.age ) / model.sfh.Mstar( model.age ),\n", " 'Lsed' : lambda model : model.get_SED().sum(),\n", "} )\n", "```" ] }, { "cell_type": "markdown", "id": "de1343bb", "metadata": {}, "source": [ "## Plotting\n", "\n", "The posteriors of derived properties added via `add_property` can be plotted through the `corner_derived` helper function along with the default derived properties:" ] }, { "cell_type": "code", "execution_count": null, "id": "288d7c8e", "metadata": {}, "outputs": [], "source": [ "from galapy.analysis.plot import corner_derived\n", "\n", "fig, axes = corner_derived(\n", " res, which_keys = [\n", " 'SFR', 'TMC', 'TDD', # <-- default\n", " 'fdust', 'Lsed' # <-- custom\n", " ],\n", " getdist_settings={\n", " 'contours' : [0.68, 0.95],\n", " 'smooth_scale_2D' : 0.5,\n", " 'fine_bins': 1024,\n", " 'fine_bins_2D': 1024,\n", " },\n", ")" ] }, { "cell_type": "markdown", "id": "6a7eba83", "metadata": {}, "source": [ "Since a user-defined property does not have built-in metadata, `corner_derived` falls back to showing the quantity's *name* as the axis label (rendered as, e.g., `Lsed` or `fdust`, above).\n", "Built-in quantities, by contrast, get a nice LaTeX symbol ($\\psi$, $T_\\mathrm{MC}$, $T_\\mathrm{DD}$ above).\n", "\n", "Label and log-axis choices for the derived quantities live entirely on the **analysis-side**: they are *not* stored. \n", "You can supply them at plotting time through keyword arguments:\n", "\n", "- `labels`: a `{key: latex}` dictionary giving a LaTeX symbol for a quantity (raw LaTeX, no surrounding `$` are required);\n", "- `log_scale`: a list of keys to draw on a `log10` axis.\n", "\n", "For example, having added `Lsed` and `fdust`:" ] }, { "cell_type": "code", "execution_count": null, "id": "a02f1306", "metadata": {}, "outputs": [], "source": [ "fig, axes = corner_derived(\n", " res, which_keys = [\n", " 'SFR', 'TMC', 'TDD', # <-- default\n", " 'fdust', 'Lsed' # <-- custom\n", " ],\n", " labels = {\n", " 'Lsed' : 'L_\\mathrm{SED}', # proper symbol instead of Lsed\n", " 'fdust' : 'f_\\mathrm{dust}' # proper symbol instead of fdust\n", " },\n", " log_scale = ['SFR', 'fdust'], # fdust on log-axis too\n", " getdist_settings={\n", " 'contours' : [0.68, 0.95],\n", " 'smooth_scale_2D' : 0.5,\n", " 'fine_bins': 1024,\n", " 'fine_bins_2D': 1024,\n", " },\n", ")" ] }, { "cell_type": "markdown", "id": "16e6a008", "metadata": {}, "source": [ "The `labels` mapping also overrides the default label of a built-in quantity if you list it, and the `\\log_{10}(...)` wrapping is applied automatically on top of a label whenever the corresponding key also appears in the `log_scale` list.\n", "\n", "Since these are purely cosmetics the same `Results` object can be re-plotted with different labels without changing anything stored in it." ] }, { "cell_type": "markdown", "id": "14bdf058", "metadata": {}, "source": [ "## Recap\n", "\n", "- **Storing built-ins:** set `store_quantities` in the parameter file; `SED` is\n", " always kept.\n", "- **Adding your own:** `res.add_property( f, name = ... )`, where `f(model)`\n", " returns the per-sample value; pass a dict to add several at once.\n", "- **Summarising:** `get_mean`, `get_median`, `get_quantile`, `get_std`,\n", " `get_bestfit`, `get_credible_interval` — all accept custom names and are\n", " posterior-weighted.\n", "- **Two rules that save you:** return the *same shape* for every sample, and\n", " *average linear quantities in linear space* (convert to log/mag only at the\n", " end)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }