Load an ASCII catalogue

A format widely used in the field is the ASCII catalogue: a header line lists the fields present in the rows that follow, and each row typically holds one source and its properties (catalogues formatted this way can be written, for example, with the popular tool TOPCAT).

The first two lines of a catalogue formatted in such a way are shown below:

[1]:
input_file = 'data/dustpedia_z0.dat'
!head -2 $input_file
# name  redshift        redshift_err    ra      dec     semimaj_arcsec  axial_ratio     pos_angle       global_flag     GALEX.FUV       GALEX.FUV_err   GALEX.FUV_flag  GALEX.NUV       GALEX.NUV_err   GALEX.NUV_flag  SDSS.u  SDSS.u_err      SDSS.u_flag     SDSS.g  SDSS.g_err      SDSS.g_flag     SDSS.r  SDSS.r_err      SDSS.r_flag     SDSS.i  SDSS.i_err      SDSS.i_flag     SDSS.z  SDSS.z_err      SDSS.z_flag     2MASS.J 2MASS.J_err     2MASS.J_flag    2MASS.H 2MASS.H_err     2MASS.H_flag    2MASS.Ks        2MASS.Ks_err    2MASS.Ks_flag   WISE.W1 WISE.W1_err     WISE.W1_flag    WISE.W2 WISE.W2_err     WISE.W2_flag    WISE.W3 WISE.W3_err     WISE.W3_flag    WISE.W4 WISE.W4_err     WISE.W4_flag    Spitzer.IRAC.I1 Spitzer.IRAC.I1_err     Spitzer.IRAC.I1_flag    Spitzer.IRAC.I2 Spitzer.IRAC.I2_err     Spitzer.IRAC.I2_flag    Spitzer.IRAC.I3 Spitzer.IRAC.I3_err     Spitzer.IRAC.I3_flag    Spitzer.IRAC.I4 Spitzer.IRAC.I4_err     Spitzer.IRAC.I4_flag    Spitzer.MIPS.24mu       Spitzer.MIPS.24mu_err   Spitzer.MIPS.24mu_flag  Spitzer.MIPS.70mu       Spitzer.MIPS.70mu_err   Spitzer.MIPS.70mu_flag  Spitzer.MIPS.160mu      Spitzer.MIPS.160mu_err  Spitzer.MIPS.160mu_flag Herschel.PACS.blue      Herschel.PACS.blue_err  Herschel.PACS.blue_flag Herschel.PACS.green     Herschel.PACS.green_err Herschel.PACS.green_flag        Herschel.PACS.red       Herschel.PACS.red_err   Herschel.PACS.red_flag  Herschel.SPIRE.PSW      Herschel.SPIRE.PSW_err  Herschel.SPIRE.PSW_flag Herschel.SPIRE.PMW      Herschel.SPIRE.PMW_err  Herschel.SPIRE.PMW_flag Herschel.SPIRE.PLW      Herschel.SPIRE.PLW_err  Herschel.SPIRE.PLW_flag
  NGC3898       0.00386 0.00001 177.31365 56.08403 310.928317908  1.63041716814 17.0123398018 ""          0.00537977803468 2.99682855102E-4 ""             0.00545318225825 1.94574384382E-4 ""             0.0354973057801 0.00174332394312 ""          0.145672229925 0.00187256903554 ""          0.278452576451 0.00267294643895 ""          0.403647371746 0.00401672825073 ""          0.476932453667 0.00913868125733 ""          0.650577564183 0.0144630672616 ""           0.722486083351 0.0214573061148 ""           0.676177213551 0.0211666411658 ""            0.314438239227 0.00965063628585 ""            0.171267918225 0.00726116093413 ""            0.160404159281 0.0116467021931 ""           0.111005453721 0.0264470537155 ""           0.26982093328 0.00944363854825 n                0.164146624028 0.00588898048301 n                ""          ""              ""               ""          ""              ""               ""         ""             ""              ""         ""             ""              ""          ""              ""               ""      ""          ""           3.39943849824 3.67213156703 n             4.20816358926 3.79007708028 n             3.8617382301 0.391000959086 ""             2.04115250933 0.300802465629 ""             0.867412522553 0.163957437494 ""

Credit for the data in the above catalogue goes to the DustPedia Collaboration, and to Casasola et al. (2020, A&A, 633, 100) in particular.

Note that, as already mentioned, the first line lists all the fields of the lines that follow. From the second line onwards, each line hosts the data of a different source.

To automate and ease the loading of such catalogues in a format compliant with the requirements of the library, a dedicated function that converts catalogues into dictionaries is available and can be imported by calling

[2]:
from galapy.internal.utils import cat_to_dict

The documentation of the function can be retrieved with:

[3]:
help(cat_to_dict)
Help on function cat_to_dict in module galapy.internal.utils:

cat_to_dict(infile, id_field='id', err_field='_err', meta_fields=[], skip_fields=[])
    Converts an ASCII (e.g. Topcat-like) catalogue into a 2-levels dictionary.
    The 1st order dictionary contains a 2nd order dictionary for each entry in the catalogue.
    Each 2nd order dictionary contains data and meta-data about the named entry.

    Parameters
    ----------
    infile : str
        Path to the ASCII file
    id_field : str
        (Optional, default='id') header name of the field containing the sources' ID
    err_field : str
        (Optional, default='_err') the sub-string identifying fields
        containing error measurements
    meta_fields : str or sequence of str
        (Optional, default empty list) which fields to consider meta-data
        Either a single string or a sequence of strings.
        Also accepts wildcards (e.g. ``meta_fields = 'R*'`` will
        consider all the fields in header that start with ``R`` as meta-data).
    skip_fields : str or sequence of str
        (Optional, default empty list) which fields to skip
        Either a single string or a sequence of strings.
        Also accepts wildcards (e.g. ``skip_fields = 'R*'`` will
        ignore all the fields in header that start with ``R``).


    Returns
    -------
    : dict

The purpose is to read a list of bands, fluxes and associated errors from catalogue files into a nested (2-level) dictionary.
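As a rough illustration of this grouping, the snippet below sorts one header/row pair into bands, fluxes and errors using the '_err' sub-string. It is a simplified sketch and not GalaPy's actual implementation: the helper name row_to_entry is hypothetical, the sample values are made up, and dropping the bands without a usable flux/error pair is an assumption made here.

```python
def row_to_entry(header, row, err_field="_err"):
    """Group one catalogue row into bands, fluxes and errors (illustrative only)."""
    columns = dict(zip(header, row))
    bands, fluxes, errors = [], [], []
    for name, value in columns.items():
        if err_field in name:
            continue  # error columns are read together with their flux column
        try:
            flux = float(value)
            error = float(columns[name + err_field])
        except (ValueError, KeyError):
            continue  # assumption: drop bands without a usable flux/error pair
        bands.append(name)
        fluxes.append(flux)
        errors.append(error)
    return {"bands": bands, "fluxes": fluxes, "errors": errors}

# Made-up header and row: the second band has empty ("") entries
header = ["GALEX.FUV", "GALEX.FUV_err", "SDSS.u", "SDSS.u_err"]
row = ["0.0054", "3.0E-4", '""', '""']

entry = row_to_entry(header, row)
print(entry)
```

Only the band with convertible entries ends up in the result; the real function additionally handles the id, meta and skipped fields.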

Some features:

  • the comment character '#' at the beginning of the header line is optional. If it is present, though, each following line must begin with a space.

  • the entries are automatically converted into floats (except for the column passed as id_field). All the entries that cannot be converted into floats (again, except for the column passed as id_field) will be skipped and a warning message will be printed on screen.

  • the system automatically interprets tabs, spaces and commas as separators (i.e. '1 2,3\t4' is automatically interpreted as 4 different entries: 1, 2, 3 and 4). Note, though, that this also means the id_field entries must not contain spaces or other separators.
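The separator and conversion rules listed above can be mimicked with the standard library. This is only a sketch of the described behaviour, not the code GalaPy runs internally; the helper names split_entries and to_float_or_none are hypothetical.

```python
import re

def split_entries(line):
    """Split a row on any run of tabs, spaces and commas."""
    return [tok for tok in re.split(r"[,\s]+", line.strip()) if tok]

def to_float_or_none(token):
    """Return the token as a float, or None when it cannot be converted."""
    try:
        return float(token)
    except ValueError:
        return None

# The example string used in the list above
tokens = split_entries("1 2,3\t4")
print(tokens)

# Empty strings ("") and flag characters cannot be converted to float
values = [to_float_or_none(tok) for tok in split_entries('0.0053 2.99E-4 "" n')]
print(values)
```

An identifier such as 'NGC 3898' would be split into two tokens by the same rule, which is why the id_field entries should not contain spaces.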

Let’s try it out:

[4]:
catalogue = cat_to_dict(
    input_file, id_field='name', err_field='_err',
    meta_fields = ['redshift', 'redshift_err'],
    skip_fields = ['ra', 'dec', 'semimaj_arcsec', 'axial_ratio', 'pos_angle', '*_flag']
)

We have made the following choices, based on the format of the file:

  • id_field = 'name': the column named 'name' holds the identification name of the source.

  • err_field = '_err': all fields containing the '_err' string are interpreted as errors on fluxes.

  • the meta_fields argument lists all the entries that have to be read but interpreted neither as a flux nor as an error on a flux.

  • the skip_fields argument lists all the entries that have to be ignored when reading the file.

Note also that we have used a wildcard character in '*_flag': this tells the system to apply the specified behaviour to all fields that end with the string '_flag'.
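To get a feeling for what a pattern such as '*_flag' selects, shell-style matching from the standard library can be applied to a few of the header fields. Whether GalaPy relies on fnmatch internally is not stated here, so take this as an assumption used only for illustration.

```python
from fnmatch import fnmatchcase

# A few field names taken from the catalogue header shown earlier
header = [
    "name", "redshift", "global_flag",
    "GALEX.FUV", "GALEX.FUV_err", "GALEX.FUV_flag",
    "SDSS.u", "SDSS.u_err", "SDSS.u_flag",
]

pattern = "*_flag"
skipped = [field for field in header if fnmatchcase(field, pattern)]
kept = [field for field in header if not fnmatchcase(field, pattern)]

print("skipped:", skipped)
print("kept:   ", kept)
```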

The object we named catalogue is a standard Python dictionary; we can list the stored sources with:

[5]:
list(catalogue)
[5]:
['NGC3898', 'NGC4351', 'NGC3364', 'NGC4254']

Each source is a subdictionary with the following entries:

[6]:
list(catalogue['NGC3898'])
[6]:
['bands', 'fluxes', 'errors', 'redshift', 'redshift_err']

(we chose to list the entries of NGC3898, but the same would have been shown for any other source in the catalogue)

As you can see, we have 3 default entries:

  • bands: a list of band (filter) names

  • fluxes: a list of fluxes associated with the bands listed in bands

  • errors: a list of errors associated with the flux measurements listed in fluxes

NOTE THAT GalaPy USES MILLIJANSKY AS FLUX UNIT, THEREFORE, IF THE FLUXES IN THE CATALOGUE ARE NOT IN MILLIJANSKY, CONVERT THEM AS SOON AS YOU LOAD THEM!

Additionally, the system also loaded the redshift and associated error as meta-fields, as requested when calling the function.
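The layout described above, together with the unit conversion recommended in the note, can be sketched on a small hand-built dictionary. The mock below copies the first two GALEX entries of the NGC3898 row; the assumption that these values are in jansky is made for this example only, so check the units of your own catalogue before applying any factor.

```python
import numpy as np

# Hand-built mock with the same layout as the loaded catalogue.
# ASSUMPTION (for this example only): the values are expressed in jansky.
mock_catalogue = {
    "NGC3898": {
        "bands": np.array(["GALEX.FUV", "GALEX.NUV"]),
        "fluxes": np.array([0.00537977803468, 0.00545318225825]),
        "errors": np.array([2.99682855102e-4, 1.94574384382e-4]),
        "redshift": 0.00386,
        "redshift_err": 0.00001,
    }
}

JY_TO_MJY = 1.0e+3  # 1 Jy = 1000 mJy

for name, obj in mock_catalogue.items():
    # fluxes and errors must be rescaled by the same factor
    obj["fluxes"] = obj["fluxes"] * JY_TO_MJY
    obj["errors"] = obj["errors"] * JY_TO_MJY
    for band, flux, err in zip(obj["bands"], obj["fluxes"], obj["errors"]):
        print(f"{name} {band}: {flux:.4f} +/- {err:.4f} mJy (z = {obj['redshift']})")
```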

Let’s see what we have loaded

First of all, we can check for which bands we have measured fluxes. This can be done by calling

[7]:
catalogue['NGC3364']['bands']
[7]:
array(['GALEX.FUV', 'GALEX.NUV', '2MASS.J', '2MASS.H', '2MASS.Ks',
       'WISE.W1', 'WISE.W2', 'WISE.W3', 'WISE.W4', 'Spitzer.IRAC.I1',
       'Spitzer.IRAC.I2', 'Herschel.PACS.green', 'Herschel.PACS.red',
       'Herschel.SPIRE.PSW', 'Herschel.SPIRE.PMW', 'Herschel.SPIRE.PLW'],
      dtype='<U19')

Conveniently, we have stored the band names with the same format accepted by GalaPy, i.e.:

Experiment[.Instrument].BandName

and all the filters are present in the database.
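The naming convention can be unpacked with plain string handling. The helper below is hypothetical (it is not part of GalaPy) and only shows how the optional Instrument level fits between the experiment and the band name.

```python
def split_band_name(name):
    """Split 'Experiment[.Instrument].BandName' into its components."""
    parts = name.split(".")
    if len(parts) == 2:
        experiment, band = parts
        instrument = None  # the instrument level is optional
    elif len(parts) == 3:
        experiment, instrument, band = parts
    else:
        raise ValueError(f"unexpected band name format: {name!r}")
    return experiment, instrument, band

for name in ["GALEX.FUV", "Spitzer.IRAC.I1", "Herschel.SPIRE.PSW"]:
    print(name, "->", split_band_name(name))
```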

Note that you can check what filters are present in the database by calling

from galapy.PhotometricSystem import print_filters
print_filters()

Therefore, we can build a photometric system for each object in the catalogue:

[8]:
from galapy.PhotometricSystem import PMS

for obj in catalogue.values() :
    obj['pms'] = PMS(*obj['bands'])

In the catalogue above, upper limits have been marked as fluxes with negative errors. Negative error values are not handled well by the system, so we want to change that while keeping track of the “is an upper limit” flag.

We therefore save, for each object, a boolean array marking where the errors were negative in the original catalogue, and we replace the negative errors with the corresponding flux values (i.e. we assume the flux is 1 times the error):

[9]:
for key in catalogue :
    obj = catalogue[key]
    obj['uplims'] = obj['errors']<0.0
    obj['errors'][obj['uplims']] = obj['fluxes'][obj['uplims']]
    print(f"object {key} has {obj['uplims'].sum()} non-detections")
object NGC3898 has 0 non-detections
object NGC4351 has 1 non-detections
object NGC3364 has 0 non-detections
object NGC4254 has 1 non-detections
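The same masking logic can be followed step by step on a small array. The numbers below are made up for illustration and do not come from the catalogue.

```python
import numpy as np

# Made-up values: the second entry is an upper limit (negative error)
fluxes = np.array([1.2, 0.8, 0.05])
errors = np.array([0.1, -1.0, 0.01])

uplims = errors < 0.0            # boolean mask, True where the error is negative
errors[uplims] = fluxes[uplims]  # replace each negative error with the flux value

print(uplims)        # which entries are upper limits
print(errors)        # errors after the replacement
print(uplims.sum())  # number of non-detections
```

Note that the mask has to be computed before the errors are overwritten, otherwise the information on which entries were upper limits is lost.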

The functions of the galapy.analysis.plot module allow generating formatted plots to inspect these datasets:

[10]:
from galapy.analysis import plot as gplot

# build a matplotlib figure and axes array with the internal pyplot format
fig, axes = gplot.plt.subplots(2,2,figsize=(10,6), constrained_layout=True)

# Loop on the 4 objects and axes
for ax, key in zip(axes.flatten(), catalogue) :

    # set a title for the axes
    ax.set_title(key)

    # extract object from catalogue
    obj = catalogue[key]

    # set the image layout (with axis limits)
    ax = gplot.sed_layout(
        redshift=obj['redshift'], frame='rest', ax = ax,
        xlim=(1.e+3, 1.e+7), ylim=(1.e-4, 1.e+3)
    )

    # plot the fluxes
    _ = gplot.sed_obs(
        obj['pms'].lpiv, obj['fluxes'], obj['errors'],
        lo = obj['uplims'],
        redshift = obj['redshift'], frame = 'rest', ax = ax
    )
../_images/notebooks_load_catalogue_21_0.png

Note that the galapy.analysis.plot module also contains a function for showing the photometric system used:

[11]:
fig, ax = gplot.plt.subplots(1,1,figsize=(12,3), constrained_layout=True)
_ = gplot.photometric_system(catalogue['NGC3364']['pms'], ax=ax)
../_images/notebooks_load_catalogue_23_0.png