.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/data_transformations.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note Click :ref:`here ` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_data_transformations.py: Data transformations ==================== The statistics of intermittent precipitation rates are particularly non-Gaussian and display an asymmetric distribution bounded at zero. Such properties restrict the usage of well-established statistical methods that assume symmetric or Gaussian data. A common workaround is to introduce a suitable data transformation to approximate a normal distribution. In this example, we test the data transformation methods available in pysteps in order to obtain a more symmetric distribution of the precipitation data (excluding the zeros). The currently available transformations include the Box-Cox, dB, square-root and normal quantile transforms. .. GENERATED FROM PYTHON SOURCE LINES 21-29 .. code-block:: default from datetime import datetime import matplotlib.pyplot as plt import numpy as np from pysteps import io, rcparams from pysteps.utils import conversion, transformation from scipy.stats import skew .. GENERATED FROM PYTHON SOURCE LINES 30-36 Read the radar input images --------------------------- First, we will import the sequence of radar composites. You need the pysteps-data archive downloaded and the pystepsrc file configured with the data_source paths pointing to data folders. .. GENERATED FROM PYTHON SOURCE LINES 36-42 .. code-block:: default # Selected case date = datetime.strptime("201609281600", "%Y%m%d%H%M") data_source = rcparams.data_sources["fmi"] .. GENERATED FROM PYTHON SOURCE LINES 43-45 Load the data from the archive ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 45-69 .. code-block:: default root_path = data_source["root_path"] path_fmt = data_source["path_fmt"] fn_pattern = data_source["fn_pattern"] fn_ext = data_source["fn_ext"] importer_name = data_source["importer"] importer_kwargs = data_source["importer_kwargs"] timestep = data_source["timestep"] # Get 1 hour of observations in the data archive fns = io.archive.find_by_date( date, root_path, path_fmt, fn_pattern, fn_ext, timestep, num_next_files=11 ) # Read the radar composites importer = io.get_method(importer_name, "importer") Z, _, metadata = io.read_timeseries(fns, importer, **importer_kwargs) # Keep only positive rainfall values Z = Z[Z > metadata["zerovalue"]].flatten() # Convert to rain rate R, metadata = conversion.to_rainrate(Z, metadata) .. GENERATED FROM PYTHON SOURCE LINES 70-72 Test data transformations ------------------------- .. GENERATED FROM PYTHON SOURCE LINES 72-103 .. code-block:: default # Define method to visualize the data distribution with boxplots and plot the # corresponding skewness def plot_distribution(data, labels, skw): N = len(data) fig, ax1 = plt.subplots() ax2 = ax1.twinx() ax2.plot(np.arange(N + 2), np.zeros(N + 2), ":r") ax1.boxplot(data, labels=labels, sym="", medianprops={"color": "k"}) ymax = [] for i in range(N): y = skw[i] x = i + 1 ax2.plot(x, y, "*r", ms=10, markeredgecolor="k") ymax.append(np.max(data[i])) # ylims ylims = np.percentile(ymax, 50) ax1.set_ylim((-1 * ylims, ylims)) ylims = np.max(np.abs(skw)) ax2.set_ylim((-1.1 * ylims, 1.1 * ylims)) # labels ax1.set_ylabel(r"Standardized values [$\sigma$]") ax2.set_ylabel(r"Skewness []", color="r") ax2.tick_params(axis="y", labelcolor="r") .. GENERATED FROM PYTHON SOURCE LINES 104-117 Box-Cox transform ~~~~~~~~~~~~~~~~~ The Box-Cox transform is a well-known power transformation introduced by `Box and Cox (1964)`_. In its one-parameter version, the Box-Cox transform takes the form T(x) = ln(x) for lambda = 0, or T(x) = (x**lambda - 1)/lambda otherwise. To find a suitable lambda, we will experiment with a range of values and select the one that produces the most symmetric distribution, i.e., the lambda associated with a value of skewness closest to zero. To visually compare the results, the transformed data are standardized. .. _`Box and Cox (1964)`: https://doi.org/10.1111/j.2517-6161.1964.tb00553.x .. GENERATED FROM PYTHON SOURCE LINES 117-143 .. code-block:: default data = [] labels = [] skw = [] # Test a range of values for the transformation parameter Lambda Lambdas = np.linspace(-0.4, 0.4, 11) for i, Lambda in enumerate(Lambdas): R_, _ = transformation.boxcox_transform(R, metadata, Lambda) R_ = (R_ - np.mean(R_)) / np.std(R_) data.append(R_) labels.append("{0:.2f}".format(Lambda)) skw.append(skew(R_)) # skewness # Plot the transformed data distribution as a function of lambda plot_distribution(data, labels, skw) plt.title("Box-Cox transform") plt.tight_layout() plt.show() # Best lambda idx_best = np.argmin(np.abs(skw)) Lambda = Lambdas[idx_best] print("Best parameter lambda: %.2f\n(skewness = %.2f)" % (Lambda, skw[idx_best])) .. image-sg:: /auto_examples/images/sphx_glr_data_transformations_001.png :alt: Box-Cox transform :srcset: /auto_examples/images/sphx_glr_data_transformations_001.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out Out: .. code-block:: none Best parameter lambda: 0.16 (skewness = 0.02) .. GENERATED FROM PYTHON SOURCE LINES 144-146 Compare data transformations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 146-151 .. code-block:: default data = [] labels = [] skw = [] .. GENERATED FROM PYTHON SOURCE LINES 152-155 Rain rates ~~~~~~~~~~ First, let's have a look at the original rain rate values. .. GENERATED FROM PYTHON SOURCE LINES 155-160 .. code-block:: default data.append((R - np.mean(R)) / np.std(R)) labels.append("R") skw.append(skew(R)) .. GENERATED FROM PYTHON SOURCE LINES 161-164 dB transform ~~~~~~~~~~~~ We transform the rainfall data into dB units: 10*log(R) .. GENERATED FROM PYTHON SOURCE LINES 164-170 .. code-block:: default R_, _ = transformation.dB_transform(R, metadata) data.append((R_ - np.mean(R_)) / np.std(R_)) labels.append("dB") skw.append(skew(R_)) .. GENERATED FROM PYTHON SOURCE LINES 171-174 Square-root transform ~~~~~~~~~~~~~~~~~~~~~ Transform the data using the square-root: sqrt(R) .. GENERATED FROM PYTHON SOURCE LINES 174-180 .. code-block:: default R_, _ = transformation.sqrt_transform(R, metadata) data.append((R_ - np.mean(R_)) / np.std(R_)) labels.append("sqrt") skw.append(skew(R_)) .. GENERATED FROM PYTHON SOURCE LINES 181-184 Box-Cox transform ~~~~~~~~~~~~~~~~~ We now apply the Box-Cox transform using the best parameter lambda found above. .. GENERATED FROM PYTHON SOURCE LINES 184-190 .. code-block:: default R_, _ = transformation.boxcox_transform(R, metadata, Lambda) data.append((R_ - np.mean(R_)) / np.std(R_)) labels.append("Box-Cox\n($\lambda=$%.2f)" % Lambda) skw.append(skew(R_)) .. GENERATED FROM PYTHON SOURCE LINES 191-197 Normal quantile transform ~~~~~~~~~~~~~~~~~~~~~~~~~ At last, we apply the empirical normal quantile (NQ) transform as described in `Bogner et al (2012)`_. .. _`Bogner et al (2012)`: http://dx.doi.org/10.5194/hess-16-1085-2012 .. GENERATED FROM PYTHON SOURCE LINES 197-203 .. code-block:: default R_, _ = transformation.NQ_transform(R, metadata) data.append((R_ - np.mean(R_)) / np.std(R_)) labels.append("NQ") skw.append(skew(R_)) .. GENERATED FROM PYTHON SOURCE LINES 204-211 By plotting all the results, we can notice first of all the strongly asymmetric distribution of the original data (R) and that all transformations manage to reduce its skewness. Among these, the Box-Cox transform (using the best parameter lambda) and the normal quantile (NQ) transform provide the best correction. Despite not producing a perfectly symmetric distribution, the square-root (sqrt) transform has the strong advantage of being defined for zeros, too, while all other transformations need an arbitrary rule for non-positive values. .. GENERATED FROM PYTHON SOURCE LINES 211-216 .. code-block:: default plot_distribution(data, labels, skw) plt.title("Data transforms") plt.tight_layout() plt.show() .. image-sg:: /auto_examples/images/sphx_glr_data_transformations_002.png :alt: Data transforms :srcset: /auto_examples/images/sphx_glr_data_transformations_002.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-timing **Total running time of the script:** ( 0 minutes 6.745 seconds) .. _sphx_glr_download_auto_examples_data_transformations.py: .. only :: html .. container:: sphx-glr-footer :class: sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: data_transformations.py ` .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: data_transformations.ipynb ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_