{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fdd60d8c",
   "metadata": {},
   "source": [
    "# Load an ASCII catalogue\n",
    "\n",
    "A format quite used in the field are ASCII catalogues of objects for which an header lists the entries present in the following rows, typically listing a series of sources and relative properties (e.g. catalogues formatted in such a way can be written using the popular tool [TopCat](https://www.star.bris.ac.uk/~mbt/topcat/)).\n",
    "\n",
    "The first two lines of a catalogue formatted in such a way are shown below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30cc4819",
   "metadata": {},
   "outputs": [],
   "source": [
    "input_file = 'data/dustpedia_z0.dat'\n",
    "!head -2 $input_file"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "403ff8e3",
   "metadata": {},
   "source": [
    "> Credits for the data in the above catalogue goes to the [DustPedia Collaboration](http://dustpedia.astro.noa.gr/) and to [Casasola et al. (2020, A&A, 633, 100)](https://www.aanda.org/articles/aa/pdf/2020/01/aa36665-19.pdf) in particular."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9520adc2",
   "metadata": {},
   "source": [
    "Note that, as already mentioned, the first line lists all the fields of the following lines, starting from the second line, each hosts data from a different source.\n",
    "\n",
    "In order to automatise and ease loading such catalogues in a format that is compliant with the requirements of the library, a specific function for the conversion of catalogues into dictionary is made available and can be imported by calling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ce91f361",
   "metadata": {},
   "outputs": [],
   "source": [
    "from galapy.internal.utils import cat_to_dict"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b093d5d0",
   "metadata": {},
   "source": [
    "The documentation of the function can be retrieved with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c726dcc1",
   "metadata": {},
   "outputs": [],
   "source": [
    "help(cat_to_dict)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "285f3191",
   "metadata": {},
   "source": [
    "The purpose is to read from catalogue files a list of bands, fluxes and associated errors into a nested (2-levels) dictionary.\n",
    "\n",
    "Some features:\n",
    "\n",
    "- the comment character ``'#'`` at the beginning of the header line is optional. If it is present though, a space at the beginning of each following line is necessary.\n",
    "- the entries are automatically converted into floats (except for the column passed as ``id_field``). All the entries that cannot be converted into floats (except for the column passed as ``id_field``) will be skipped and a warning message will be print on screen.\n",
    "- the system automatically interprets as separators tabs, spaces and commas (i.e. ``'1 2,3\\t4'`` is automatically interpreted as 4 different entries: ``1``, ``2``, ``3`` and ``4``). Note, though, this also means not to insert spaces into the ``id_field`` entry.\n",
    "\n",
    "Let's try it out:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1e2110f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "catalogue = cat_to_dict( \n",
    "    input_file, id_field='name', err_field='_err', \n",
    "    meta_fields = ['redshift', 'redshift_err'], \n",
    "    skip_fields = ['ra', 'dec', 'semimaj_arcsec', 'axial_ratio', 'pos_angle', '*_flag']\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a84d4e85",
   "metadata": {},
   "source": [
    "We have made the following choices, based on the format of the file:\n",
    "\n",
    "- ``id_field = 'name'`` the column named ``'name'`` is the identification name of the source.\n",
    "- ``err_field = '_err'`` all fields containing the ``'_err'`` string are interpreted as errors on fluxes.\n",
    "- the ``meta_fields`` argument lists all the entries that have to be read but not interpreted as a flux nor an error on a flux.\n",
    "- the ``skip_fields`` argument lists all the entries that have to be ignored when reading the file. \n",
    "\n",
    "Note also that we have used a magic character in ``'*_flag'``, this tells the system to apply the specified behaviour to all fields that end with the string ``'_flag'``.\n",
    "\n",
    "The object we named ``catalogue`` is a standard python dictionary, we can list the sources stored with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "28ae43f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "list(catalogue)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02611707",
   "metadata": {},
   "source": [
    "Each source is a subdictionary with the following entries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8f8c3aad",
   "metadata": {},
   "outputs": [],
   "source": [
    "list(catalogue['NGC3898'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "025b955c",
   "metadata": {},
   "source": [
    "> (we choose to list the entries of the NGC3898 but the same would have been shown for any other source in the catalogue)\n",
    "\n",
    "As you can see, we have 3 default entries:\n",
    "\n",
    "- ``bands``: a list of band (filter) names \n",
    "- ``fluxes``: a list of fluxes associated to the bands listed in ``bands``\n",
    "- ``errors``: a list of errors associated to the flux measurement listed in ``fluxes``\n",
    "\n",
    "**NOTE THAT GalaPy USES MILLIJANSKY AS FLUX UNIT, THEREFORE, IF THE FLUXES IN THE CATALOGUE ARE NOT IN MILLIJANSKY, CONVERT THEM AS SOON AS YOU LOAD THEM!**\n",
    "\n",
    "Additionally, the system also loaded the ``redshift`` and associated error as meta-fields, as requested when calling the function."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19294e58",
   "metadata": {},
   "source": [
    "## Let's see what we have loaded\n",
    "\n",
    "First of all, we can check what bands we have measured fluxes, this can be by calling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cdca64db",
   "metadata": {},
   "outputs": [],
   "source": [
    "catalogue['NGC3364']['bands']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3dd8075c",
   "metadata": {},
   "source": [
    "Conveniently, we have stored the band names with the same format accepted by GalaPy, i.e.:\n",
    "\n",
    "> **Experiment[.Instrument].BandName**\n",
    "\n",
    "and **all the filters** are present in the data-base.\n",
    "\n",
    "> Note that you can check what filters are present in the database by calling \n",
    ">```python\n",
    ">from galapy.PhotometricSystem import print_filters\n",
    ">print_filters()\n",
    ">```\n",
    "\n",
    "Therefore, we can build a photometric-system for each of object in the catalogue:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b0006c90",
   "metadata": {},
   "outputs": [],
   "source": [
    "from galapy.PhotometricSystem import PMS\n",
    "\n",
    "for obj in catalogue.values() :\n",
    "    obj['pms'] = PMS(*obj['bands'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c4d7f9d4",
   "metadata": {},
   "source": [
    "In the catalogue above, upper limits have been marked as fluxes with negative errors. Having a negative value for the error is not liked by the system, we want to change that and keep track of the *\"is an upper-limit\"* flag.\n",
    "\n",
    "We are therefore saving a boolean array for each object pointing out where the errors were negative in the original catalogue **and** we are replacing the negative errors with the corresponding value of flux (i.e. we are assuming the flux is 1 times the error):  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "43c249b0",
   "metadata": {},
   "outputs": [],
   "source": [
    "for key in catalogue :\n",
    "    obj = catalogue[key]\n",
    "    obj['uplims'] = obj['errors']<0.0\n",
    "    obj['errors'][obj['uplims']] = obj['fluxes'][obj['uplims']]\n",
    "    print(f\"object {key} has {obj['uplims'].sum()} non-detections\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38ca9ecb",
   "metadata": {},
   "source": [
    "Functions of the ``galapy.analysis.plot`` allow to generate formatted plots to inspect these datasets:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6b587056",
   "metadata": {},
   "outputs": [],
   "source": [
    "from galapy.analysis import plot as gplot\n",
    "\n",
    "# build a matplotlib figure and axes array with the internal pyplot format \n",
    "fig, axes = gplot.plt.subplots(2,2,figsize=(10,6), constrained_layout=True)\n",
    "\n",
    "# Loop on the 4 objects and axes\n",
    "for ax, key in zip(axes.flatten(), catalogue) :\n",
    "    \n",
    "    # set a title for the axes\n",
    "    ax.set_title(key)\n",
    "    \n",
    "    # extract object from catalogue\n",
    "    obj = catalogue[key]\n",
    "    \n",
    "    # set the image layout (with axis limits)\n",
    "    ax = gplot.sed_layout(\n",
    "        redshift=obj['redshift'], frame='rest', ax = ax, \n",
    "        xlim=(1.e+3, 1.e+7), ylim=(1.e-4, 1.e+3)\n",
    "    )\n",
    "    \n",
    "    # plot the fluxes\n",
    "    _ = gplot.sed_obs(\n",
    "        obj['pms'].lpiv, obj['fluxes'], obj['errors'], \n",
    "        lo = obj['uplims'],\n",
    "        redshift = obj['redshift'], frame = 'rest', ax = ax\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55d03113",
   "metadata": {},
   "source": [
    "Note that the ``galapy.analysis.plot`` module also contains a function for showing the photometric system used:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0f2bdc28",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = gplot.plt.subplots(1,1,figsize=(12,3), constrained_layout=True)\n",
    "_ = gplot.photometric_system(catalogue['NGC3364']['pms'], ax=ax)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}