Load an ASCII catalogue
A format widely used in the field is the ASCII catalogue, in which a header line lists the fields present in the following rows; each row typically describes one source and its properties (catalogues in this format can be written, for example, with the popular tool TOPCAT).
The first two lines of a catalogue formatted in such a way are shown below:
[1]:
input_file = 'data/dustpedia_z0.dat'
!head -2 $input_file
# name redshift redshift_err ra dec semimaj_arcsec axial_ratio pos_angle global_flag GALEX.FUV GALEX.FUV_err GALEX.FUV_flag GALEX.NUV GALEX.NUV_err GALEX.NUV_flag SDSS.u SDSS.u_err SDSS.u_flag SDSS.g SDSS.g_err SDSS.g_flag SDSS.r SDSS.r_err SDSS.r_flag SDSS.i SDSS.i_err SDSS.i_flag SDSS.z SDSS.z_err SDSS.z_flag 2MASS.J 2MASS.J_err 2MASS.J_flag 2MASS.H 2MASS.H_err 2MASS.H_flag 2MASS.Ks 2MASS.Ks_err 2MASS.Ks_flag WISE.W1 WISE.W1_err WISE.W1_flag WISE.W2 WISE.W2_err WISE.W2_flag WISE.W3 WISE.W3_err WISE.W3_flag WISE.W4 WISE.W4_err WISE.W4_flag Spitzer.IRAC.I1 Spitzer.IRAC.I1_err Spitzer.IRAC.I1_flag Spitzer.IRAC.I2 Spitzer.IRAC.I2_err Spitzer.IRAC.I2_flag Spitzer.IRAC.I3 Spitzer.IRAC.I3_err Spitzer.IRAC.I3_flag Spitzer.IRAC.I4 Spitzer.IRAC.I4_err Spitzer.IRAC.I4_flag Spitzer.MIPS.24mu Spitzer.MIPS.24mu_err Spitzer.MIPS.24mu_flag Spitzer.MIPS.70mu Spitzer.MIPS.70mu_err Spitzer.MIPS.70mu_flag Spitzer.MIPS.160mu Spitzer.MIPS.160mu_err Spitzer.MIPS.160mu_flag Herschel.PACS.blue Herschel.PACS.blue_err Herschel.PACS.blue_flag Herschel.PACS.green Herschel.PACS.green_err Herschel.PACS.green_flag Herschel.PACS.red Herschel.PACS.red_err Herschel.PACS.red_flag Herschel.SPIRE.PSW Herschel.SPIRE.PSW_err Herschel.SPIRE.PSW_flag Herschel.SPIRE.PMW Herschel.SPIRE.PMW_err Herschel.SPIRE.PMW_flag Herschel.SPIRE.PLW Herschel.SPIRE.PLW_err Herschel.SPIRE.PLW_flag
NGC3898 0.00386 0.00001 177.31365 56.08403 310.928317908 1.63041716814 17.0123398018 "" 0.00537977803468 2.99682855102E-4 "" 0.00545318225825 1.94574384382E-4 "" 0.0354973057801 0.00174332394312 "" 0.145672229925 0.00187256903554 "" 0.278452576451 0.00267294643895 "" 0.403647371746 0.00401672825073 "" 0.476932453667 0.00913868125733 "" 0.650577564183 0.0144630672616 "" 0.722486083351 0.0214573061148 "" 0.676177213551 0.0211666411658 "" 0.314438239227 0.00965063628585 "" 0.171267918225 0.00726116093413 "" 0.160404159281 0.0116467021931 "" 0.111005453721 0.0264470537155 "" 0.26982093328 0.00944363854825 n 0.164146624028 0.00588898048301 n "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" 3.39943849824 3.67213156703 n 4.20816358926 3.79007708028 n 3.8617382301 0.391000959086 "" 2.04115250933 0.300802465629 "" 0.867412522553 0.163957437494 ""
Credit for the data in the above catalogue goes to the DustPedia Collaboration and, in particular, to Casasola et al. (2020, A&A, 633, 100).
Note that, as already mentioned, the first line lists all the fields of the following lines. Starting from the second line, each line hosts the data of a different source.
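The correspondence between header fields and row values can be made explicit with a few lines of plain Python. This is only a sketch of the file layout, not of how the library parses it: it uses the first five columns of the two lines shown above and pairs each field name with the value in the same position.

```python
# First five header fields and the matching values of the first source,
# copied from the two lines shown above (the real file has many more columns).
header_line = "# name redshift redshift_err ra dec"
data_line = "NGC3898 0.00386 0.00001 177.31365 56.08403"

# Drop the optional leading comment character, then split on whitespace.
fields = header_line.lstrip("#").split()
values = data_line.split()

# Pair each field name with the value found in the same column.
row = dict(zip(fields, values))
print(row["name"], row["redshift"])
```

At this stage all values are still strings; converting them to numbers is a separate step.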
To automate and ease the loading of such catalogues in a format compliant with the requirements of the library, a dedicated function for converting catalogues into dictionaries is available and can be imported with:
[2]:
from galapy.internal.utils import cat_to_dict
The documentation of the function can be retrieved with:
[3]:
help(cat_to_dict)
Help on function cat_to_dict in module galapy.internal.utils:

cat_to_dict(infile, id_field='id', err_field='_err', meta_fields=[], skip_fields=[])
    Converts an ASCII (e.g. Topcat-like) catalogue into a 2-levels dictionary.
    The 1st order dictionary contains a 2nd order dictionary for each entry in the catalogue.
    Each 2nd order dictionary contains data and meta-data about the named entry.

    Parameters
    ----------
    infile : str
        Path to the ASCII file
    id_field : str
        (Optional, default='id') header name of the field containing the sources' ID
    err_field : str
        (Optional, default='_err') the sub-string identifying fields
        containing error measurements
    meta_fields : str or sequence of str
        (Optional, default empty list) which fields to consider meta-data
        Either a single string or a sequence of strings.
        Also accepts wildcards (e.g. ``meta_fields = 'R*'`` will
        consider all the fields in header that start with ``R`` as meta-data).
    skip_fields : str or sequence of str
        (Optional, default empty list) which fields to skip
        Either a single string or a sequence of strings.
        Also accepts wildcards (e.g. ``skip_fields = 'R*'`` will
        ignore all the fields in header that start with ``R``).

    Returns
    -------
    : dict
The purpose is to read from catalogue files a list of bands, fluxes and associated errors into a nested (2-levels) dictionary.
Some features:
- the comment character '#' at the beginning of the header line is optional. If it is present, though, a space at the beginning of each following line is necessary.
- the entries are automatically converted into floats (except for the column passed as id_field). All the entries that cannot be converted into floats (except for the column passed as id_field) will be skipped and a warning message will be printed on screen.
- the system automatically interprets tabs, spaces and commas as separators (i.e. '1 2,3\t4' is automatically interpreted as 4 different entries: 1, 2, 3 and 4). Note that this also means that the id_field entries must not contain spaces.
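The separator rule can be reproduced with a regular expression. The helper below, `split_entries`, is hypothetical and written only for illustration; it is not the library's implementation, merely a sketch of the behaviour described above. The second call shows why an identifier containing a space would be broken into two entries.

```python
import re

def split_entries(line):
    """Split a line on any run of spaces, tabs or commas (illustrative helper)."""
    return [tok for tok in re.split(r"[ \t,]+", line.strip()) if tok]

# The example from the text: four separate entries.
print(split_entries("1 2,3\t4"))

# An identifier written with a space is split into two tokens.
print(split_entries("NGC 3898 0.00386"))
```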
Let’s try it out:
[4]:
catalogue = cat_to_dict(
    input_file, id_field='name', err_field='_err',
    meta_fields = ['redshift', 'redshift_err'],
    skip_fields = ['ra', 'dec', 'semimaj_arcsec', 'axial_ratio', 'pos_angle', '*_flag']
)
We have made the following choices, based on the format of the file:
- id_field = 'name': the column named 'name' is the identification name of the source.
- err_field = '_err': all fields containing the '_err' string are interpreted as errors on fluxes.
- the meta_fields argument lists all the entries that have to be read but interpreted neither as a flux nor as an error on a flux.
- the skip_fields argument lists all the entries that have to be ignored when reading the file.
Note also that we have used a wildcard character in '*_flag': this tells the system to apply the specified behaviour to all fields that end with the string '_flag'.
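To preview which header fields a wildcard pattern would select, the standard-library `fnmatch` module can be used. This is a sketch under the assumption that the wildcards behave like shell-style patterns; whether `cat_to_dict` relies on `fnmatch` internally is not stated here. The header below is a shortened excerpt of the real one.

```python
from fnmatch import fnmatchcase

# Shortened excerpt of the header shown at the top of this section.
header = [
    "name", "redshift", "redshift_err", "ra", "dec",
    "GALEX.FUV", "GALEX.FUV_err", "GALEX.FUV_flag", "global_flag",
]

# Exact names and one wildcard pattern, as in the skip_fields argument.
patterns = ["ra", "dec", "*_flag"]

# A field is skipped if it matches at least one of the patterns.
skipped = [f for f in header if any(fnmatchcase(f, p) for p in patterns)]
kept = [f for f in header if f not in skipped]

print("skipped:", skipped)
print("kept:   ", kept)
```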
The object we named catalogue is a standard Python dictionary; we can list the sources stored in it with:
[5]:
list(catalogue)
[5]:
['NGC3898', 'NGC4351', 'NGC3364', 'NGC4254']
Each source is a subdictionary with the following entries:
[6]:
list(catalogue['NGC3898'])
[6]:
['bands', 'fluxes', 'errors', 'redshift', 'redshift_err']
(we chose to list the entries of NGC3898, but the same would have been shown for any other source in the catalogue)
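The two-level layout can be walked with ordinary dictionary iteration. The sketch below uses a stand-in dictionary, `catalogue_like`, so that it runs on its own: the keys are those listed above and the numbers are the first two bands of NGC3898 copied from the catalogue line shown earlier. Plain lists and floats are used here for simplicity; the types stored by the real function may differ.

```python
# Stand-in with the same keys as the dictionary described above; the values
# are the first two bands of NGC3898 copied from the catalogue line shown earlier.
catalogue_like = {
    "NGC3898": {
        "bands": ["GALEX.FUV", "GALEX.NUV"],
        "fluxes": [0.00537977803468, 0.00545318225825],
        "errors": [2.99682855102e-4, 1.94574384382e-4],
        "redshift": 0.00386,
        "redshift_err": 0.00001,
    },
}

# First level: one entry per source. Second level: data and meta-data.
for name, obj in catalogue_like.items():
    print(f"{name}: z = {obj['redshift']} +/- {obj['redshift_err']}")
    for band, flux, err in zip(obj["bands"], obj["fluxes"], obj["errors"]):
        print(f"  {band:<12s} {flux:.3e} +/- {err:.3e}")
```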
As you can see, we have 3 default entries:
- bands: a list of band (filter) names
- fluxes: a list of fluxes associated with the bands listed in bands
- errors: a list of errors associated with the flux measurements listed in fluxes
NOTE THAT GalaPy USES MILLIJANSKY AS FLUX UNIT, THEREFORE, IF THE FLUXES IN THE CATALOGUE ARE NOT IN MILLIJANSKY, CONVERT THEM AS SOON AS YOU LOAD THEM!
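As an illustration of such a conversion, the sketch below rescales a source whose fluxes are assumed to be in Jansky. The helper `to_millijansky`, the source entry and its numbers are all hypothetical and made up for the example; the only fixed fact is that 1 Jy = 1000 mJy. Note that the errors must be rescaled by the same factor as the fluxes.

```python
JY_TO_MJY = 1.0e3  # 1 Jy = 1000 mJy

def to_millijansky(obj, factor=JY_TO_MJY):
    """Return a copy of one source entry with fluxes and errors rescaled."""
    converted = dict(obj)
    converted["fluxes"] = [f * factor for f in obj["fluxes"]]
    converted["errors"] = [e * factor for e in obj["errors"]]
    return converted

# Made-up entry for a hypothetical source with fluxes given in Jansky.
source_in_jy = {"bands": ["GALEX.FUV"], "fluxes": [0.5], "errors": [0.125]}
source_in_mjy = to_millijansky(source_in_jy)

print(source_in_mjy["fluxes"], source_in_mjy["errors"])
```

When the fluxes and errors are stored as NumPy float arrays, the same conversion reduces to multiplying both arrays by the factor.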
Additionally, the system also loaded the redshift and associated error as meta-fields, as requested when calling the function.
Let’s see what we have loaded
First of all, we can check for which bands we have measured fluxes. This can be done by calling:
[7]:
catalogue['NGC3364']['bands']
[7]:
array(['GALEX.FUV', 'GALEX.NUV', '2MASS.J', '2MASS.H', '2MASS.Ks',
'WISE.W1', 'WISE.W2', 'WISE.W3', 'WISE.W4', 'Spitzer.IRAC.I1',
'Spitzer.IRAC.I2', 'Herschel.PACS.green', 'Herschel.PACS.red',
'Herschel.SPIRE.PSW', 'Herschel.SPIRE.PMW', 'Herschel.SPIRE.PLW'],
dtype='<U19')
Conveniently, we have stored the band names with the same format accepted by GalaPy, i.e.:
Experiment[.Instrument].BandName
and all the filters are present in the database.
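The naming convention can be unpacked with a small helper. The function `split_band_name` below is hypothetical and not part of GalaPy; it only assumes that the parts of a band name are separated by dots, with the instrument being optional.

```python
def split_band_name(band):
    """Split 'Experiment[.Instrument].BandName' into its three parts."""
    parts = band.split(".")
    experiment, band_name = parts[0], parts[-1]
    # Whatever sits between the first and last part is the optional instrument.
    instrument = ".".join(parts[1:-1]) or None
    return experiment, instrument, band_name

for band in ["GALEX.FUV", "Spitzer.IRAC.I1", "Herschel.SPIRE.PSW"]:
    print(split_band_name(band))
```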
Note that you can check which filters are present in the database by calling
from galapy.PhotometricSystem import print_filters
print_filters()
Therefore, we can build a photometric system for each object in the catalogue:
[8]:
from galapy.PhotometricSystem import PMS
for obj in catalogue.values() :
    obj['pms'] = PMS(*obj['bands'])
In the catalogue above, upper limits are marked as fluxes with negative errors. The system does not accept negative errors, so we want to change that while keeping track of which measurements are upper limits.
We therefore save, for each object, a boolean array marking where the errors were negative in the original catalogue, and we replace the negative errors with the corresponding flux value (i.e. we assume that the flux is 1 times the error):
[9]:
for key in catalogue :
    obj = catalogue[key]
    obj['uplims'] = obj['errors']<0.0
    obj['errors'][obj['uplims']] = obj['fluxes'][obj['uplims']]
    print(f"object {key} has {obj['uplims'].sum()} non-detections")
object NGC3898 has 0 non-detections
object NGC4351 has 1 non-detections
object NGC3364 has 0 non-detections
object NGC4254 has 1 non-detections
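The element-wise logic of the cell above can be spelled out in plain Python. The fluxes and errors below are made-up numbers for a hypothetical source, chosen only to show what the boolean flag and the replacement do; the cell above performs the same steps with array masking.

```python
# Made-up fluxes and errors for one hypothetical source; a negative error
# marks the corresponding flux as an upper limit, as in the catalogue above.
fluxes = [0.5, 2.0, 4.0]
errors = [0.05, -1.0, 0.4]

# Flag the upper limits before overwriting the errors.
uplims = [err < 0.0 for err in errors]

# Replace each negative error with the flux of the same band.
errors = [
    flux if is_uplim else err
    for flux, err, is_uplim in zip(fluxes, errors, uplims)
]

print(uplims)
print(errors)
print(f"{sum(uplims)} non-detections")
```

The flag has to be computed first: once the errors are overwritten, the information about which measurements were upper limits is no longer available from the errors alone.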
The functions in the galapy.analysis.plot module make it possible to generate formatted plots to inspect these datasets:
[10]:
from galapy.analysis import plot as gplot
# build a matplotlib figure and axes array with the internal pyplot format
fig, axes = gplot.plt.subplots(2,2,figsize=(10,6), constrained_layout=True)
# Loop on the 4 objects and axes
for ax, key in zip(axes.flatten(), catalogue) :
    # set a title for the axes
    ax.set_title(key)
    # extract object from catalogue
    obj = catalogue[key]
    # set the image layout (with axis limits)
    ax = gplot.sed_layout(
        redshift=obj['redshift'], frame='rest', ax = ax,
        xlim=(1.e+3, 1.e+7), ylim=(1.e-4, 1.e+3)
    )
    # plot the fluxes
    _ = gplot.sed_obs(
        obj['pms'].lpiv, obj['fluxes'], obj['errors'],
        lo = obj['uplims'],
        redshift = obj['redshift'], frame = 'rest', ax = ax
    )

Note that the galapy.analysis.plot module also contains a function for showing the photometric system used:
[11]:
fig, ax = gplot.plt.subplots(1,1,figsize=(12,3), constrained_layout=True)
_ = gplot.photometric_system(catalogue['NGC3364']['pms'], ax=ax)
