{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# NCL_xy_18.py\nThis script illustrates the following concepts:\n    - Filling the area between two curves in an XY plot\n    - Labeling the bottom X axis with years\n    - Drawing a main title on three separate lines\n    - Calculating a weighted average\n    - Changing the size/shape of an XY plot using viewport resources\n    - Manually creating a legend\n    - Overlaying XY plots on each other\n    - Maximizing plots after they've been created\n\nSee following URLs to see the reproduced NCL plot & script:\n    - Original NCL script: https://www.ncl.ucar.edu/Applications/Scripts/xy_18.ncl\n    - Original NCL plot: https://www.ncl.ucar.edu/Applications/Images/xy_18_lg.png\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Import packages:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\nimport xarray as xr\nfrom matplotlib import pyplot as plt\n\nimport geocat.datafiles as gdf\nfrom geocat.viz import util as gvutil"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Read in data:\n\nOpen files and read in monthly data\n\nXarray's ``open_mfdataset`` (open multi-file dataset) method will attempt to\nmerge all of the individual datasets (i.e., NetCDF files) into one single\nXarray ``Dataset``.  The ``concat_dim`` and ``combine`` keyword arguments to\nthis method give you control over how this merging takes place (see the\nXarray documentation for more information).\n\nIn the below example, each NetCDF file represents the same variables and\ncoordinates, but from a different ensemble member.  There is no ``case`` (or\nensemble) dimension explicitly declared in the files, so we use the\n``concat_dim`` argument to state that we will create a new dimension called\n``case`` that spans the ensemble members.  Here, each file contains a\n``TREFHT`` variable that depends upon dimensions ``(time, lat, lon)`` and\ncoordinate variables ``time``, ``lat`` and ``lon``.  After opening these\nfiles with ``open_mfdataset``, the resulting Xarray ``Dataset`` will consist\nof a ``TREFHT`` variable that depends upon dimensions ``(case, time, lat, lon)``\nand coordinate variables ``case``, ``time``, ``lat`` and ``lon``.\n\n**NOTE:** One of the files (``TREFHT.B06.69.atm.1890-1999ANN.nc``) contains\na ``time`` coordinate variable with a ``calendar`` attribute having the\nvalue ``noleap`` (i.e., the \"No leap year\" non-standard calendar).  The\n``time`` coordinate variable in all of the other files do not have a\n``calendar`` attribute *at all*.  By default, when Xarray's ``open_mfdataset``\nreads each individual dataset, it will attempt to decode the ``time`` coordinate\ninto an appropriate ``datetime`` object, so that you can then take advantage of\nXarray's (and Pandas's) excellent time-series manipulation capabilities.\nHowever, due to the lacking ``calendar`` attribute in most of the files\n(which, according to CF conventions, defaults to the ``standard`` Gregorian\ncalendar) and the ``noleap`` calendar attribute in one of the files, the\n``time`` coordinate variable will be interpreted as \"non-uniform\" across all\nof the datasets.  To fix this problem, we tell Xarray's ``open_mfdataset``\nfunction to *not* decode the ``time`` coordinate into ``datetime`` objects\nby passing the ``decode_times=False`` argument.  Second, we pass a pre-processing\nfunction via the ``preprocess`` argument to ``open_mfdataset``, telling\nXarray to read each individual dataset from file (with the ``decode_times=False``\noption) and then modify the resulting dataset according to the pre-processing\nfunction.  In this case, the pre-processing function (``assume_noleap_calendar``)\ntakes the single-file dataset, sets the ``calendar`` attribute of the ``time``\ncoordinate variable to ``noleap``, and returns the *decoded* dataset (using\nthe Xarray function ``decode_cf``).  Work-arounds like this are needed\nwhenever you have \"errors\" or \"inconsistancies\" in your data.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Define the xarray.open_mfdataset pre-processing function\n# (Must take an xarray.Dataset as input and return an xarray.Dataset)\ndef assume_noleap_calendar(ds):\n    ds.time.attrs['calendar'] = 'noleap'\n    return xr.decode_cf(ds)\n\n\n# Create a dataset for the \"natural\" (i.e., no anthropogenic effects) data\nnfiles = [\n    gdf.get(\"netcdf_files/TREFHT.B06.66.atm.1890-1999ANN.nc\"),\n    gdf.get(\"netcdf_files/TREFHT.B06.67.atm.1890-1999ANN.nc\"),\n    gdf.get(\"netcdf_files/TREFHT.B06.68.atm.1890-1999ANN.nc\"),\n    gdf.get(\"netcdf_files/TREFHT.B06.69.atm.1890-1999ANN.nc\")\n]\nnds = xr.open_mfdataset(nfiles,\n                        concat_dim='case',\n                        combine='nested',\n                        preprocess=assume_noleap_calendar,\n                        decode_times=False)\n\n# Create a dataset for the \"natural + anthropogenic\" data\nvfiles = [\n    gdf.get(\"netcdf_files/TREFHT.B06.61.atm.1890-1999ANN.nc\"),\n    gdf.get(\"netcdf_files/TREFHT.B06.59.atm.1890-1999ANN.nc\"),\n    gdf.get(\"netcdf_files/TREFHT.B06.60.atm.1890-1999ANN.nc\"),\n    gdf.get(\"netcdf_files/TREFHT.B06.57.atm.1890-1999ANN.nc\")\n]\nvds = xr.open_mfdataset(vfiles,\n                        concat_dim='case',\n                        combine='nested',\n                        preprocess=assume_noleap_calendar,\n                        decode_times=False)\n\n# Read the \"weights\" file\n# (The xarray.Dataset.expand_dims call adds the longitude dimension to the\n# dataset, which originally depends only upon the latitude dimension. This\n# arguably makes computing the weighted means below more straight-forward.)\ngds = xr.open_dataset(gdf.get(\"netcdf_files/gw.nc\"))\ngds = gds.expand_dims(dim={'lon': nds.lon})"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Observations:\n\nRead in the observational data from an ASCII (text) file.  Here, we use\nNumpy's nice ``loadtxt`` method to read the data from the text file and\nreturn a Numpy array with ``float`` type.  Then, we construct an Xarray\n``DataArray`` explicitly, since the time values are not stored in the\nASCII data file (we have to know them!).\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "obs_data = np.loadtxt(gdf.get(\"ascii_files/jones_glob_ann_2002.asc\"),\n                      dtype=float)\nobs_time = xr.cftime_range('1856-07-16T22:00:00',\n                           freq='365D',\n                           periods=len(obs_data),\n                           calendar='noleap')\nobs = xr.DataArray(name='TREFHT', data=obs_data, coords=[('time', obs_time)])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## NCL-based Weighted Mean Function:\n\nWe define this function just for convenience.  This is equivalent to how\nNCL computes the weighted mean.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def horizontal_weighted_mean(var, wgts):\n    return (var * wgts).sum(dim=['lat', 'lon']) / wgts.sum(dim=['lat', 'lon'])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Natural data:\n\nWe compute the weighted mean across the latitude and longitude dimensions\n(leaving only the ``case`` and ``time`` dimensions), and then we compute the\nanomaly measured from the average of the first 30 years.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "gavn = horizontal_weighted_mean(nds[\"TREFHT\"], gds[\"gw\"])\ngavan = gavn - gavn.sel(time=slice('1890', '1920')).mean(dim='time')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Natural + Anthropogenic data:\n\nWe do the same thing for the \"natural + anthropogenic\" data.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "gavv = horizontal_weighted_mean(vds[\"TREFHT\"], gds[\"gw\"])\ngavav = gavv - gavv.sel(time=slice('1890', '1920')).mean(dim='time')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Observation data:\n\nWe do the same thing for the observation data.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "obs_avg = obs.sel(time=slice('1890', '1999')) - obs.sel(\n    time=slice('1890', '1920')).mean(dim='time')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Calculate the ensemble Min. & Max. & Mean:\n\nHere we find the ``min``, ``max``, and ``mean`` along the ``case`` (i.e.,\nensemble) dimension (leaving only the ``time`` dimension) for both of our\ndatasets.  We compute the equivalent anomaly for the observations data.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "gavan_min = gavan.min(dim='case')\ngavan_max = gavan.max(dim='case')\ngavan_avg = gavan.mean(dim='case')\n\ngavav_min = gavav.min(dim='case')\ngavav_max = gavav.max(dim='case')\ngavav_avg = gavav.mean(dim='case')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Plot:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Generate figure (set its size (width, height) in inches) and axes\nfig, ax = plt.subplots(figsize=(10.5, 6))\n\n# We create the time axis data, not as datetime objects, but as just years\n# The following line of code is equivalent to this:\n#     time = [t.year for t in gavan.time.values]\n# but it uses Xarray's convenient DatetimeAccessor functionality.\ntime = gavan.time.dt.year\n\n# Plot data and add a legend\nax.plot(time, obs_avg, color='black', label='Observations', zorder=4)\nax.plot(time, gavan_avg, color='blue', label='Natural', zorder=3)\nax.plot(time, gavav_avg, color='red', label='Anthropogenic + Natural', zorder=2)\nax.legend(loc='upper left', frameon=False, fontsize=18)\n\n# Use geocat.viz.util convenience function to add minor and major tick lines\ngvutil.add_major_minor_ticks(ax,\n                             x_minor_per_major=4,\n                             y_minor_per_major=3,\n                             labelsize=20)\n\n# Use geocat.viz.util convenience function to set axes limits & tick values without calling several matplotlib functions\ngvutil.set_axes_limits_and_ticks(ax,\n                                 xlim=(1890, 2000),\n                                 ylim=(-0.4, 1),\n                                 xticks=np.arange(1900, 2001, step=20),\n                                 yticks=np.arange(-0.3, 1, step=0.3))\n\n# Set three titles on top of each other using axes title and texts\nax.set_title('Parallel Climate Model Ensembles', fontsize=24, pad=60.0)\nax.text(0.5,\n        1.125,\n        'Global Temperature Anomalies',\n        fontsize=18,\n        ha='center',\n        va='center',\n        transform=ax.transAxes)\nax.text(0.5,\n        1.06,\n        'from 1890-1919 average',\n        fontsize=14,\n        ha='center',\n        va='center',\n        transform=ax.transAxes)\nax.set_ylabel('$^\\circ$C', fontsize=24)\nax.fill_between(time, gavan_min, gavan_max, color='lightblue', zorder=0)\nax.fill_between(time, gavav_min, gavav_max, color='lightpink', zorder=1)\n\n# Show the plot\nplt.tight_layout()\nplt.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}