Results Analysis

We have successfully run a sampling of some source with a given photometric system using the terminal commands (i.e. galapy-fit). We now want to analyse the results. To this end, we first have to load the results object that has been stored at the end of the sampling run.

This can be done with the dedicated function galapy.sampling.Results.load_results. First of all, we import the function from the galapy.sampling sub-package:

[1]:
from galapy.sampling.Results import load_results

We can then simply call the function with the path to the output file as an argument.

Since we have selected store_lightweight = True in the parameter file for this run,
loading the results object will require some tens of seconds. The load_results function automatically understands what type of file is being passed (from the extension and from internal meta-data), unless specified otherwise in the optional arguments.
[2]:
res = load_results('data/sampling_4D+noise_dynesty_results_light.galapy.hdf5')
Read from file data/sampling_4D+noise_dynesty_results_light.galapy.hdf5
Now processing the sampling results, this might require some time ...
... done in 53.41186833381653 seconds.

res is an instance of type Results; these objects contain:

  • attributes and functions to re-construct the characteristics of the run, such as

    • the model

    • the observation

    • the handler

[3]:
model = res.get_model()
observation = res.get_observation()
handler = res.get_handler()
print('The run has ', res.Ndof, 'degrees of freedom')
The run has  19 degrees of freedom
  • tables of relevant physical quantities, pre-computed at each sampled position in the parameter space; the full list can be shown by calling

[4]:
res.get_stored_quantities()
[4]:
['_mod',
 '_han',
 'Ndof',
 '_obs',
 '_noise',
 'sampler',
 'ndim',
 'size',
 'params',
 'logl',
 'samples',
 'weights',
 'wnot0',
 'SED',
 'Mstar',
 'Mdust',
 'Mgas',
 'Zstar',
 'Zgas',
 'SFR',
 'TMC',
 'TDD']
  • a set of useful functions to analyse the results, such as get_mean, get_std and get_bestfit, all of which take as first argument a string with one of the stored quantities (e.g. res.get_mean('Mstar') will return the weighted mean of all the sampled positions); weighting is done automatically by these functions. Some other functions, such as res.get_chi2 and res.get_residuals, compute quantities related to some chosen statistics (e.g. res.get_chi2('bestfit') will return the reduced \(\chi^2\) computed at the best-fitting values of the parameters). Note that all these functions also account for the noise treatment chosen when running the sampling.

[5]:
print(f"Best-fitting stellar mass: {res.get_bestfit('Mstar'):.2e} Msol")
Best-fitting stellar mass: 3.97e+09 Msol
[6]:
print(f"Median stellar mass: {res.get_median('Mstar'):.2e} Msol")
Median stellar mass: 4.11e+09 Msol
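For illustration, weighting by the sampler weights (which get_mean and get_std handle automatically) can be sketched in plain numpy. This is an illustrative sketch of the underlying statistics, not GalaPy's actual implementation:

```python
import numpy as np

# Illustrative sketch (not GalaPy internals): weighted mean and standard
# deviation over a set of sampled values, as done by get_mean / get_std.
def weighted_mean_std(values, weights):
    mean = np.average(values, weights=weights)
    var = np.average((values - mean)**2, weights=weights)
    return mean, np.sqrt(var)

# Hypothetical sampled values with their importance weights:
values = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.array([0.1, 0.2, 0.3, 0.4])
mean, std = weighted_mean_std(values, weights)
print(mean, std)
```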

Tip: The handler object contains information on the chosen parameterisation and provides a function to easily return the argument that has to be passed to the function model.set_parameters (i.e. handler.return_nested()). For instance, it contains the prior limits, the list of free parameters (by keyword) and which of them have been sampled in logarithmic space:

[7]:
handler.par_prior, handler.par_free, handler.par_log
[7]:
(array([[  8.,  10.],
        [  0.,   5.],
        [  0.,   3.],
        [  7.,  10.],
        [-10.,   1.]]),
 array(['galaxy.age', 'galaxy.redshift', 'galaxy.sfh.psi_max',
        'galaxy.sfh.tau_star', 'noise.f_cal'], dtype='<U19'),
 array([ True, False,  True,  True,  True]))
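Given the par_log flags above, a sampled position mixes logarithmic and linear coordinates. As an illustrative sketch (not GalaPy's internal code), converting such a position back to physical values amounts to exponentiating only the log-sampled entries:

```python
import numpy as np

# Flags as returned by handler.par_log in the output above:
par_log = np.array([True, False, True, True, True])

# A hypothetical sampled position (log-space where par_log is True):
position = np.array([8.7, 2.5, 1.8, 9.1, -5.6])

# Exponentiate only the log-sampled coordinates:
physical = np.where(par_log, 10.0**position, position)
print(physical)
```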

Plots

The galapy.analysis sub-package contains sub-modules to ease the analysis and plotting of a sampling run results. To have access to plotting functions, we can import

[8]:
import galapy.analysis.plot as gplot

Let us also import the formatted version of matplotlib.pyplot present inside the module, for consistency:

[9]:
from galapy.analysis.plot import plt

The function gplot.sed_flux_res plots the sampled model and compares it to the input dataset; it takes the results object as input argument (it also provides some further optional arguments to customise the plot, e.g. axes limits, colours, legend keyword arguments). The returned value is an object of type matplotlib.axes.Axes, which the user can access to further customise the plot (or overplot additional lines and artists).

[10]:
ax = gplot.sed_flux_res(
    res, plot_components=True, plot_observation=True, plot_contours=True,
    ax_kwargs = {
       'xlim':(1.e+3, 2.e+8),
       'ylim':(2.e-6,1.e+3),
    },
    legend_kwargs = {
        'l1': {'loc': 'upper left', 'fontsize':12},
        'l2': {'loc': 'upper right', 'ncol': 2, 'fontsize': 12}
    }
)
../_images/notebooks_results_analysis_18_0.png

The function gplot.sed_residuals_res, equivalently and consistently, plots the standardised residuals (i.e. the residuals in units of the error in the measurement):

[11]:
# we can also build matplotlib figure and axes before calling the function
# e.g. if we want to overwrite the default figure size:
fig, ax = plt.subplots(1,1,figsize=(6,2))

# it is then sufficient to pass the generated axes to the galapy function:
_ = gplot.sed_residuals_res(
    res, frame='obs', plot_contours=True, plot_chi2 = True,
    ax = ax, # <------ here!
    ax_kwargs={
        'xlim':(1.e+3, 2.e+8),
        'ylim':(-3., +3.)
    },
    text_kwargs={'loc':'lower right'}
)
../_images/notebooks_results_analysis_20_0.png

The library leaves users the freedom to customise the plots as they prefer. By combining the different functions we can, e.g., obtain a two-panel plot of the information above:

[12]:
fig, axes = plt.subplots(2,1,figsize=(6,5),tight_layout=True,
                                      sharex = True,
                         gridspec_kw={'height_ratios':(4.5,1.5), 'hspace':0.0})

###################################################
# Plot the SED:

_ = gplot.sed_flux_res(
    res, plot_components=True, plot_observation=True, plot_contours=True,
    ax = axes[0], # passing the first Axis to the ax argument
    ax_kwargs = {
       'xlim':(1.e+3, 2.e+8),
       'ylim':(2.e-6,1.e+4),
    },
    legend_kwargs = {
        'l1': {'loc': 'upper left', 'fontsize':12},
        'l2': {'loc': 'upper right', 'ncol': 2, 'fontsize': 12}
    }
)

###################################################
# Plot the residuals

_ = gplot.sed_residuals_res(
    res, frame='obs', plot_contours=True, plot_chi2 = True,
    ax = axes[1], # passing the second Axis to the ax argument
    ax_kwargs={
        'xlim':(1.e+3, 2.e+8),
        'ylim':(-3., +3.)
    },
    text_kwargs={'loc':'lower right'}
)
../_images/notebooks_results_analysis_22_0.png

Plot the derived attenuation curve

With GalaPy it is possible to compute the derived attenuation curve for a given model among those sampled by the algorithm.

Let us first find the index of the wavelength-grid element that we will assume to approximate emission in the visible band (so as to normalise the derived attenuation curve).

[13]:
from galapy.internal.utils import find_nearest
ll = model.wl()
w5500 = find_nearest(ll, 5500) # index in the wavelength grid corresponding to 5500 Angstrom
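For reference, the behaviour of find_nearest can be sketched with plain numpy. This is an illustrative equivalent, not GalaPy's actual implementation:

```python
import numpy as np

# Illustrative sketch of what find_nearest does: return the index of the
# grid element closest to the requested value.
def find_nearest_sketch(grid, value):
    grid = np.asarray(grid)
    return int(np.argmin(np.abs(grid - value)))

# Hypothetical coarse wavelength grid (Angstrom):
grid = np.array([5000., 5450., 5600., 6000.])
idx = find_nearest_sketch(grid, 5500.)
print(idx)  # 1 (5450 is the closest element)
```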

We will plot the average attenuation for the best-fit model; to that purpose, we set the parameters to the position in parameter space that maximises the likelihood:

[14]:
model.set_parameters(**res.get_bestfit('params'))

Before actually computing the attenuation, we have to compute the emission for this model:

[15]:
_ = model.get_emission()

We can now compute the average total attenuation and normalise it to its value in the visible band:

[16]:
Abest = model.get_avgAtt()
Abest /= Abest[w5500]

Before plotting, let’s also define a function that computes Calzetti-like attenuation curves (Calzetti et al., 2000) for reference:

[17]:
import numpy

def att_calzetti(ll, Rv):
    """Attenuation curve from Calzetti et al., 2000"""
    # convert Angstrom to micron:
    ll = numpy.array(ll) * 1.e-4
    # short-wavelength expression (valid below 0.63 micron):
    kp = 2.659 * (-2.156 + 1.509 / ll - 0.198 / ll**2. + 0.011 / ll**3.) + Rv
    # overwrite with the long-wavelength expression where ll >= 0.63 micron:
    wl = numpy.where(ll >= 0.63)
    kp[wl] = 2.659 * (-1.857 + 1.04 / ll[wl]) + Rv
    return kp / Rv
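As a quick sanity check on the expression above (illustrative, independent of GalaPy): at 5500 Å the bracketed polynomial nearly vanishes, so the normalised curve k'(λ)/R_v is close to 1 at the V band for any choice of R_v:

```python
# Evaluate the Calzetti et al. (2000) short-wavelength polynomial at
# 5500 Angstrom (0.55 micron); the bracketed term is ~0 there, hence
# A(5500)/A_V ~ 1 regardless of Rv.
ll = 0.55  # micron
poly = 2.659 * (-2.156 + 1.509 / ll - 0.198 / ll**2 + 0.011 / ll**3)
for Rv in (2.0, 4.0, 6.0):
    print(round((poly + Rv) / Rv, 3))  # all close to 1.0
```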

With a bit of cosmetics, we can plot (in black in the plot below) the derived average total attenuation of the best-fitting galaxy model:

[18]:
# Plot set-up
fig, ax = plt.subplots(1,1,figsize=(6,3),constrained_layout=True)
ax.set(
    xscale='log', yscale='log',
    xlim=(9.12e+2,5.e+4), ylim=(2.e-1,1.e+1),
    xlabel='$\\lambda_\\mathrm{rest}\\ [\\AA]$',
    ylabel='$A(\\lambda)/A_V$',
)

#########################
# Plot Calzetti reference

# Get a sequence of colors for values of the Rv parameter
Rvs = [ 2.,4.,6.,8.,10.,20. ]
cmaplist = plt.cm.viridis(numpy.linspace(0.1,0.9,len(Rvs)))

# Plot for each Rv value
_ = [ ax.plot(ll, att_calzetti(ll, Rv),
              color=clr, ls=':', lw=2.5, label=f'$R_v = ${Rv:.0f}')
      for Rv, clr in zip(Rvs,cmaplist) ]

##########################
# Plot derived Attenuation

ax.plot( ll, Abest, color = 'k', lw=2.5, label='derived' )

# legend
_ = ax.legend(ncol=2, fontsize=11)
../_images/notebooks_results_analysis_34_0.png

Posteriors

Another useful diagnostic plot one might want to check is the triangular 2D and 1D marginalisation of the parameters' posteriors (i.e. the contour plot, corner plot or triangle plot).

This can be done with the gplot.corner_res function, which also accepts an argument which_params to select the sub-set of free parameters for which to compute the marginal probabilities. So, if our free parameters are:

[19]:
handler.par_free
[19]:
array(['galaxy.age', 'galaxy.redshift', 'galaxy.sfh.psi_max',
       'galaxy.sfh.tau_star', 'noise.f_cal'], dtype='<U19')

and we want to plot only the free parameters of the galaxy model (thus marginalising over the noise parameters), we can tell the function to only select parameters with the word 'galaxy' in the name (note the use of the wildcard character '*'): which_params='galaxy*'
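GalaPy's internal matching may differ, but the idea behind such a wildcard selection can be illustrated with Python's standard fnmatch module and the free-parameter names listed above:

```python
import fnmatch

# Free-parameter keywords as returned by handler.par_free above:
par_free = ['galaxy.age', 'galaxy.redshift', 'galaxy.sfh.psi_max',
            'galaxy.sfh.tau_star', 'noise.f_cal']

# Keep only the parameters whose name matches the 'galaxy*' pattern,
# i.e. drop the noise parameters from the corner plot:
selected = fnmatch.filter(par_free, 'galaxy*')
print(selected)
```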

Tip: Internally, GalaPy uses getdist (see the online documentation of the package) to generate corner plots; by passing a dictionary of keyword arguments to the getdist_settings argument, one can modify the internal defaults of getdist.

Once again, conveniently, the function accepts the res object as first argument:

[20]:
fig, axes = gplot.corner_res(
    res, # results object
    which_params='galaxy*', # what parameters to plot (use None for all the free parameters)
    mark = 'median', # what statistics to plot dashed lines ('median','mean' or 'bestfit')
    getdist_settings={
        'contours' : [0.68, 0.95],
        'smooth_scale_2D' : 0.5,
        'fine_bins': 1024
    },
)
Removed no burn in
Falling back to default quantiles: (0.16,0.5,0.84)
../_images/notebooks_results_analysis_38_1.png

Tables and TeX

It is possible to print the results of the sampling run as values already formatted for TeX math mode.

Import the sub-module:

[21]:
import galapy.analysis.funcs as gtabs

The function get_parameters_label_strings returns a dictionary with the keyword identifying a specific free parameter as key and the corresponding TeX math-mode symbol as value (it takes the handler as input):

[22]:
gtabs.get_parameters_label_strings(handler)
[22]:
{np.str_('galaxy.age'): '\\log~age',
 np.str_('galaxy.redshift'): 'redshift',
 np.str_('galaxy.sfh.psi_max'): '\\log~\\psi_\\mathrm{max}',
 np.str_('galaxy.sfh.tau_star'): '\\log~\\tau_\\star',
 np.str_('noise.f_cal'): '\\log~f_\\mathrm{cal}'}

To get a collection of summary statistics computed on the free parameters, already converted to TeX math mode, use the function:

[23]:
gtabs.get_parameters_summary_strings(res, stat_type='quantiles', quantile=(0.025, 0.5, 0.975))
[23]:
array(['8.72_{-0.13}^{+0.17}', '2.55_{-0.19}^{+0.14}',
       '1.84_{-0.29}^{+0.48}', '9.13_{-0.33}^{+0.71}',
       '-5.64_{-4.13}^{+4.41}'], dtype='<U21')
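How such a `median_{-low}^{+high}` string can be assembled from weighted sample quantiles is sketched below in plain numpy. This is an illustrative sketch on synthetic samples, not GalaPy's actual implementation:

```python
import numpy as np

# Illustrative sketch (not GalaPy's implementation): build a TeX
# 'median_{-low}^{+high}' string from weighted quantiles of the samples.
def summary_string(samples, weights, quantile=(0.025, 0.5, 0.975)):
    order = np.argsort(samples)
    # weighted empirical CDF over the sorted samples:
    cdf = np.cumsum(weights[order]) / np.sum(weights)
    lo, med, hi = np.interp(quantile, cdf, samples[order])
    return f'{med:.2f}_{{-{med - lo:.2f}}}^{{+{hi - med:.2f}}}'

# Synthetic, equally-weighted samples mimicking a posterior:
rng = np.random.default_rng(42)
s = rng.normal(8.72, 0.08, size=10000)
w = np.ones_like(s)
print(summary_string(s, w))
```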