diff --git a/docs/Precision_Calculation.ipynb b/docs/Precision_Calculation.ipynb new file mode 100644 index 000000000..43ac3ce5a --- /dev/null +++ b/docs/Precision_Calculation.ipynb @@ -0,0 +1,463 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "899290c2", + "metadata": {}, + "source": [ + "In this notebook, we try to provide some calculations that might help clarifying the precision in z-normalize case." + ] + }, + { + "cell_type": "markdown", + "id": "1d62668e", + "metadata": {}, + "source": [ + "# Approach I:" + ] + }, + { + "cell_type": "markdown", + "id": "35abf2be", + "metadata": {}, + "source": [ + "$d = \\sqrt{2m (1 - \\rho)}$\n", + "\n", + "Now, we take the derivative of function $f(\\rho) = \\sqrt{2m (1 - \\rho)}$ with respect to the variable $\\rho$:\n", + "\n", + "$\\frac{df}{d\\rho} = \\sqrt{2m}\\frac{-1}{2\\sqrt{1 - \\rho}}$" + ] + }, + { + "cell_type": "markdown", + "id": "a17ad50d", + "metadata": {}, + "source": [ + "Now, let us assume $d$ is the precise z-normalized distance between two subsequence, and $\\hat{d}$ the z-normalized distance with imprecision. Similarly, $\\rho$ and $\\hat{\\rho}$ are, respectively, the pearson values, wihout and with imprecision.\n", + "\n", + "$d - \\hat{d} = \\Delta{d}$\n", + "\n", + "$\\rho - \\hat{\\rho} = \\Delta{\\rho}$" + ] + }, + { + "cell_type": "markdown", + "id": "87a316ba", + "metadata": {}, + "source": [ + "So, if we assume the two values $\\Delta{d}$ and $\\Delta{\\rho}$ are small values, then we can say:" + ] + }, + { + "cell_type": "markdown", + "id": "88f584ff", + "metadata": {}, + "source": [ + "\n", + "$$\n", + "\\begin{align}\n", + "\\frac{\\Delta{d}}{\\Delta{\\rho}} \\approx{}&\n", + "\\frac{-\\sqrt{2m}}{2\\sqrt{1 - \\rho}}\n", + "\\end{align}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "e399ace5", + "metadata": {}, + "source": [ + "Hence:" + ] + }, + { + "cell_type": "markdown", + "id": "662679a1", + "metadata": {}, + "source": [ + "\n", + "$$\n", + "\\begin{align}\n", + "\\Delta{d} \\approx{}&\n", + "\\frac{-\\sqrt{2m}\\Delta{\\rho}}{2\\sqrt{1 - \\rho}}\n", + "\\end{align}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "05636e81", + "metadata": {}, + "source": [ + "Let's say we want to have small error in z-normalized distance, i.e. $|\\Delta{d}| < \\epsilon_{d}$. (In STUMPY, $\\epsilon_{d}$ is set by config variable `STUMPY_TEST_PRECISION`, which is default to 1e-5.)" + ] + }, + { + "cell_type": "markdown", + "id": "cc569fbc", + "metadata": {}, + "source": [ + "\n", + "$$\n", + "\\begin{align}\n", + "|\\Delta{d}| \\le \\epsilon_{d}\n", + "\\end{align}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "426fe547", + "metadata": {}, + "source": [ + "\n", + "$$\n", + "\\begin{align}\n", + "\\left|\\frac{-\\sqrt{2m}\\Delta{\\rho}}{2\\sqrt{1 - \\rho}}\\right| \\le \\epsilon_{d}\n", + "\\end{align}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "fdea8b61", + "metadata": {}, + "source": [ + "Also, note that:\n", + "$\\Delta_{\\rho} = \\rho - \\hat{\\rho} = \\frac{cov}{s} - \\frac{\\hat{cov}}{s} = \\frac{\\Delta{cov}}{s}$, **where `s` is the multiplication of std values of two subsequenes** for which the cov is being calculated." + ] + }, + { + "cell_type": "markdown", + "id": "5cf90e29", + "metadata": {}, + "source": [ + "Note that we assumed the standard deviation has no imprecision." + ] + }, + { + "cell_type": "markdown", + "id": "cf75b5d0", + "metadata": {}, + "source": [ + "\n", + "$$\n", + "\\begin{align}\n", + "\\left|\\frac{\\sqrt{2m}\\Delta{cov}}{2s\\sqrt{1 - \\rho}}\\right| \\le \\epsilon_{d}\n", + "\\end{align}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "880334cb", + "metadata": {}, + "source": [ + "Hence:" + ] + }, + { + "cell_type": "markdown", + "id": "4fc3aca6", + "metadata": {}, + "source": [ + "\n", + "$$\n", + "\\begin{align}\n", + "\\left|\\frac{2s\\sqrt{1 - \\rho}}{\\sqrt{2m}\\Delta{cov}}\\right| \\ge \\frac{1}{\\epsilon_{d}}\n", + "\\end{align}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0e1dd1f5", + "metadata": {}, + "source": [ + "\n", + "$$\n", + "\\begin{align}\n", + "s \\ge \\frac{\\sqrt{2m}|\\Delta{cov}|}{2\\epsilon_{d}\\sqrt{1 - \\rho}}\n", + "\\end{align}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "ab2d537e", + "metadata": {}, + "source": [ + "So, if have no imprecision in `cov`, i.e. $\\Delta{cov}=0$, then, $s \\ge 0$, which is always satisfied. But, if we have imprecision in `cov`, i.e. $\\Delta{cov} \\ne 0$, then `s` should satisfiy the inequality above, otherwise, we the imprecision in distance, $|\\Delta{d}|$, will be more than $\\epsilon_{d}$ (hence, error in STUMPY test suite)" + ] + }, + { + "cell_type": "markdown", + "id": "5d5dbc2a", + "metadata": {}, + "source": [ + "# Approach II:" + ] + }, + { + "cell_type": "markdown", + "id": "310010fd", + "metadata": {}, + "source": [ + "Let us assume $d$ is the precise z-normalized distance between two subsequence, and $\\hat{d}$ is the z-normalized distance with imprecision." + ] + }, + { + "cell_type": "markdown", + "id": "d4a7fdef", + "metadata": {}, + "source": [ + "\n", + "$$\n", + "\\begin{align}\n", + "d - \\hat{d} ={}& \\Delta{d}\n", + "\\\\\n", + "\\sqrt{2m(1-\\rho)} - \\sqrt{2m(1 - \\hat{\\rho})} ={}& \\Delta{d}\n", + "\\\\\n", + "\\sqrt{1-\\rho} - \\sqrt{1 - \\hat{\\rho}} ={}& \\frac{\\Delta{d}}{\\sqrt{2m}}\n", + "\\end{align}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e72edc34", + "metadata": {}, + "source": [ + "Also, recall that $\\rho = \\frac{cov}{stddev}$, where `cov` is the covariance between two subsequence and `stddev` is the mutiplication of the standard devations of the two subsequences. Hereafter, it is denoted as `s`. We also assume `s` is exact and has no imprecision." + ] + }, + { + "cell_type": "markdown", + "id": "527970c3", + "metadata": {}, + "source": [ + "$\\rho = \\frac{cov}{s}$\n", + "\n", + "$\\hat{\\rho} = \\frac{\\hat{cov}}{s}$\n", + "\n", + "\n", + "Hence:\n", + "\n", + "$\\rho - \\hat{\\rho} = \\frac{\\Delta{cov}}{s}$" + ] + }, + { + "cell_type": "markdown", + "id": "7a28b1af", + "metadata": {}, + "source": [ + "Now, let us solve the following equations:" + ] + }, + { + "cell_type": "markdown", + "id": "f177c1a6", + "metadata": {}, + "source": [ + "\n", + "$$\n", + "\\begin{align}\n", + " \\begin{cases}\n", + " \\sqrt{1-\\rho} - \\sqrt{1 - \\hat{\\rho}} ={}& \\frac{\\Delta{d}}{\\sqrt{2m}}\n", + " \\\\\n", + " \\rho - \\hat{\\rho} = \\frac{\\Delta{cov}}{s}\n", + " \\end{cases}\n", + "\\end{align}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e47990d3", + "metadata": {}, + "source": [ + "we define new variables:\n", + "\n", + "$\\rho' \\triangleq 1 - \\rho$; note that $\\rho'$ is in range `[0, 2]`\n", + "\n", + "$\\hat{\\rho}' \\triangleq 1 - \\hat{\\rho}$; note that $\\hat{\\rho}'$ is in range `[0, 2]`" + ] + }, + { + "cell_type": "markdown", + "id": "bee64357", + "metadata": {}, + "source": [ + "Hence, we now have:" + ] + }, + { + "cell_type": "markdown", + "id": "d0d3e4f6", + "metadata": {}, + "source": [ + "\n", + "$$\n", + "\\begin{align}\n", + " \\begin{cases}\n", + " \\sqrt{\\rho'} - \\sqrt{\\hat{\\rho}'} ={}& \\frac{\\Delta{d}}{\\sqrt{2m}}\n", + " \\\\\n", + " \\hat{\\rho}' - \\rho' ={}& \\frac{\\Delta{cov}}{s}\n", + " \\end{cases}\n", + "\\end{align}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1060a03e", + "metadata": {}, + "source": [ + "And, we define new variables:\n", + "\n", + "$x \\triangleq \\sqrt{\\rho'}$, note that x is in range $[0, \\sqrt{2}]$\n", + "\n", + "$y \\triangleq \\sqrt{\\hat{\\rho}'}$, note that x is in range $[0, \\sqrt{2}]$" + ] + }, + { + "cell_type": "markdown", + "id": "521b5726", + "metadata": {}, + "source": [ + "Hence:" + ] + }, + { + "cell_type": "markdown", + "id": "f144188c", + "metadata": {}, + "source": [ + "\n", + "$$\n", + "\\begin{align}\n", + " \\begin{cases}\n", + " x - y ={}& \\frac{\\Delta{d}}{\\sqrt{2m}}\n", + " \\\\\n", + " y^{2} - x^{2} ={}& \\frac{\\Delta{cov}}{s}\n", + " \\end{cases}\n", + "\\end{align}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "d42b7e1f", + "metadata": {}, + "source": [ + "Thus,\n", + "\n", + "\n", + "$$\n", + "\\begin{align}\n", + " \\begin{cases}\n", + " x - y ={}& \\frac{\\Delta{d}}{\\sqrt{2m}}\n", + " \\\\\n", + " (y - x)(y + x) ={}& \\frac{\\Delta{cov}}{s}\n", + " \\end{cases}\n", + "\\end{align}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "172ab9e8", + "metadata": {}, + "source": [ + "By dividing the second equation by the first equation, we can find the equation for `y + x`. Hence:" + ] + }, + { + "cell_type": "markdown", + "id": "d305e9ea", + "metadata": {}, + "source": [ + "\n", + "$$\n", + "\\begin{align}\n", + " \\begin{cases}\n", + " x - y ={}& \\frac{\\Delta{d}}{\\sqrt{2m}}\n", + " \\\\\n", + " y + x ={}& -\\frac{\\Delta{cov}\\sqrt{2m}}{s\\Delta{d}}\n", + " \\end{cases}\n", + "\\end{align}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "e23c1cf3", + "metadata": {}, + "source": [ + "Now, we can solve this system of equations for `x` and `y`. Hence:" + ] + }, + { + "cell_type": "markdown", + "id": "ad92ebe0", + "metadata": {}, + "source": [ + "\n", + "$$\n", + "\\begin{align}\n", + " \\begin{cases}\n", + " x ={}& \\frac{1}{2}\\left(\n", + " \\frac{\\Delta{d}}{\\sqrt{2m}}\n", + " -\n", + " \\frac{\\Delta{cov}\\sqrt{2m}}{s\\Delta{d}}\n", + " \\right)\n", + " \\\\\n", + " y ={}& -\\frac{1}{2}\\left(\n", + " \\frac{\\Delta{cov}\\sqrt{2m}}{s\\Delta{d}}\n", + " +\n", + " \\frac{\\Delta{d}}{\\sqrt{2m}}\n", + " \\right)\n", + " \\end{cases}\n", + "\\end{align}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "dabaa326", + "metadata": {}, + "source": [ + "**What now? :)**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "761f7c89", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/stumpy/config.py b/stumpy/config.py index dad017ec3..1338b1ff1 100644 --- a/stumpy/config.py +++ b/stumpy/config.py @@ -8,9 +8,12 @@ STUMPY_MEAN_STD_NUM_CHUNKS = 1 STUMPY_MEAN_STD_MAX_ITER = 10 STUMPY_DENOM_THRESHOLD = 1e-14 -STUMPY_STDDEV_THRESHOLD = 1e-7 +STUMPY_STDDEV_THRESHOLD = 1e-20 STUMPY_P_NORM_THRESHOLD = 1e-14 STUMPY_TEST_PRECISION = 5 STUMPY_MAX_P_NORM_DISTANCE = np.finfo(np.float64).max STUMPY_MAX_DISTANCE = np.sqrt(STUMPY_MAX_P_NORM_DISTANCE) STUMPY_EXCL_ZONE_DENOM = 4 +STUMPY_MIN_VAR = 1.0 +STUMPY_MIN_STD_AB = 1.0 # denom in equation: pearson_AB = cov / (std_A * std_B) +STUMPY_CORRELATION_THRESHOLD = 0.99999999 # 1 - 1e-08 diff --git a/stumpy/core.py b/stumpy/core.py index 3be3bdc01..b318f1a68 100644 --- a/stumpy/core.py +++ b/stumpy/core.py @@ -577,6 +577,8 @@ def _welford_nanvar(a, w, a_subseq_isfinite): * (a[last_idx] - curr_mean + a[prev_start_idx] - prev_mean) / w ) + if curr_var < config.STUMPY_MIN_VAR: + curr_var = np.nanvar(a[start_idx:stop_idx]) all_variances[start_idx] = curr_var @@ -1738,8 +1740,8 @@ def preprocess_diagonal(T, m): M_T : numpy.ndarray Rolling mean with a subsequence length of `m` - Σ_T_inverse : numpy.ndarray - Inverted rolling standard deviation + Σ_T : numpy.ndarray + Rolling standard deviation M_T_m_1 : numpy.ndarray Rolling mean with a subsequence length of `m-1` @@ -1753,12 +1755,12 @@ def preprocess_diagonal(T, m): """ T, T_subseq_isfinite = preprocess_non_normalized(T, m) M_T, Σ_T = compute_mean_std(T, m) + T_subseq_isconstant = Σ_T < config.STUMPY_STDDEV_THRESHOLD Σ_T[T_subseq_isconstant] = 1.0 # Avoid divide by zero in next inversion step - Σ_T_inverse = 1.0 / Σ_T M_T_m_1, _ = compute_mean_std(T, m - 1) - return T, M_T, Σ_T_inverse, M_T_m_1, T_subseq_isfinite, T_subseq_isconstant + return T, M_T, Σ_T, M_T_m_1, T_subseq_isfinite, T_subseq_isconstant def replace_distance(D, search_val, replace_val, epsilon=0.0): diff --git a/stumpy/scrump.py b/stumpy/scrump.py index 3ec55ef7c..d76193756 100644 --- a/stumpy/scrump.py +++ b/stumpy/scrump.py @@ -529,7 +529,7 @@ def __init__( ( self._T_A, self._μ_Q, - self._σ_Q_inverse, + self._σ_Q, self._μ_Q_m_1, self._T_A_subseq_isfinite, self._T_A_subseq_isconstant, @@ -538,7 +538,7 @@ def __init__( ( self._T_B, self._M_T, - self._Σ_T_inverse, + self._Σ_T, self._M_T_m_1, self._T_B_subseq_isfinite, self._T_B_subseq_isconstant, @@ -639,8 +639,8 @@ def update(self): self._m, self._M_T, self._μ_Q, - self._Σ_T_inverse, - self._σ_Q_inverse, + self._Σ_T, + self._σ_Q, self._M_T_m_1, self._μ_Q_m_1, self._T_A_subseq_isfinite, diff --git a/stumpy/stump.py b/stumpy/stump.py index fbf518045..29cec94ab 100644 --- a/stumpy/stump.py +++ b/stumpy/stump.py @@ -25,8 +25,8 @@ def _compute_diagonal( m, M_T, μ_Q, - Σ_T_inverse, - σ_Q_inverse, + Σ_T, + σ_Q, cov_a, cov_b, cov_c, @@ -66,11 +66,11 @@ def _compute_diagonal( μ_Q : numpy.ndarray Mean of the query sequence, `Q`, relative to the current sliding window - Σ_T_inverse : numpy.ndarray - Inverse sliding standard deviation of time series, `T` + Σ_T : numpy.ndarray + Sliding standard deviation of time series, `T` - σ_Q_inverse : numpy.ndarray - Inverse standard deviation of the query sequence, `Q`, relative to the current + σ_Q : numpy.ndarray + Standard deviation of the query sequence, `Q`, relative to the current sliding window cov_a : numpy.ndarray @@ -182,13 +182,35 @@ def _compute_diagonal( if T_B_subseq_isfinite[i + k] and T_A_subseq_isfinite[i]: # Neither subsequence contains NaNs - if T_B_subseq_isconstant[i + k] or T_A_subseq_isconstant[i]: - pearson = 0.5 - else: - pearson = cov * Σ_T_inverse[i + k] * σ_Q_inverse[i] - if T_B_subseq_isconstant[i + k] and T_A_subseq_isconstant[i]: pearson = 1.0 + elif T_B_subseq_isconstant[i + k] or T_A_subseq_isconstant[i]: + pearson = 0.5 + else: + denom = Σ_T[i + k] * σ_Q[i] + if denom < config.STUMPY_MIN_STD_AB: + cov = ( + np.dot( + (T_B[i + k : i + k + m] - M_T[i + k]), + (T_A[i : i + m] - μ_Q[i]), + ) + * m_inverse + ) + + pearson = cov / denom + if pearson > 1.0: + pearson = 1.0 + + # if config.STUMPY_CORRELATION_THRESHOLD <= pearson < 1.0: + # cov = ( + # np.dot( + # (T_B[i + k : i + k + m] - M_T[i + k]), + # (T_A[i : i + m] - μ_Q[i]), + # ) + # * m_inverse + # ) + + # pearson = cov * Σ_T_inverse[i + k] * σ_Q_inverse[i] if pearson > ρ[thread_idx, i, 0]: ρ[thread_idx, i, 0] = pearson @@ -225,8 +247,8 @@ def _stump( m, M_T, μ_Q, - Σ_T_inverse, - σ_Q_inverse, + Σ_T, + σ_Q, M_T_m_1, μ_Q_m_1, T_A_subseq_isfinite, @@ -259,11 +281,11 @@ def _stump( μ_Q : numpy.ndarray Mean of the query sequence, `Q`, relative to the current sliding window - Σ_T_inverse : numpy.ndarray - Inverse sliding standard deviation of time series, `T` + Σ_T : numpy.ndarray + Sliding standard deviation of time series, `T` - σ_Q_inverse : numpy.ndarray - Inverse standard deviation of the query sequence, `Q`, relative to the current + σ_Q : numpy.ndarray + Standard deviation of the query sequence, `Q`, relative to the current sliding window M_T_m_1 : numpy.ndarray @@ -384,8 +406,8 @@ def _stump( m, M_T, μ_Q, - Σ_T_inverse, - σ_Q_inverse, + Σ_T, + σ_Q, cov_a, cov_b, cov_c, @@ -545,7 +567,7 @@ def stump(T_A, m, T_B=None, ignore_trivial=True, normalize=True, p=2.0): ( T_A, μ_Q, - σ_Q_inverse, + σ_Q, μ_Q_m_1, T_A_subseq_isfinite, T_A_subseq_isconstant, @@ -554,7 +576,7 @@ def stump(T_A, m, T_B=None, ignore_trivial=True, normalize=True, p=2.0): ( T_B, M_T, - Σ_T_inverse, + Σ_T, M_T_m_1, T_B_subseq_isfinite, T_B_subseq_isconstant, @@ -600,8 +622,8 @@ def stump(T_A, m, T_B=None, ignore_trivial=True, normalize=True, p=2.0): m, M_T, μ_Q, - Σ_T_inverse, - σ_Q_inverse, + Σ_T, + σ_Q, M_T_m_1, μ_Q_m_1, T_A_subseq_isfinite, diff --git a/stumpy/stumped.py b/stumpy/stumped.py index 696d49080..8a65b1a58 100644 --- a/stumpy/stumped.py +++ b/stumpy/stumped.py @@ -141,7 +141,7 @@ def stumped(dask_client, T_A, m, T_B=None, ignore_trivial=True, normalize=True, ( T_A, μ_Q, - σ_Q_inverse, + σ_Q, μ_Q_m_1, T_A_subseq_isfinite, T_A_subseq_isconstant, @@ -150,7 +150,7 @@ def stumped(dask_client, T_A, m, T_B=None, ignore_trivial=True, normalize=True, ( T_B, M_T, - Σ_T_inverse, + Σ_T, M_T_m_1, T_B_subseq_isfinite, T_B_subseq_isconstant, @@ -202,8 +202,8 @@ def stumped(dask_client, T_A, m, T_B=None, ignore_trivial=True, normalize=True, T_B_future = dask_client.scatter(T_B, broadcast=True, hash=False) M_T_future = dask_client.scatter(M_T, broadcast=True, hash=False) μ_Q_future = dask_client.scatter(μ_Q, broadcast=True, hash=False) - Σ_T_inverse_future = dask_client.scatter(Σ_T_inverse, broadcast=True, hash=False) - σ_Q_inverse_future = dask_client.scatter(σ_Q_inverse, broadcast=True, hash=False) + Σ_T_future = dask_client.scatter(Σ_T, broadcast=True, hash=False) + σ_Q_future = dask_client.scatter(σ_Q, broadcast=True, hash=False) M_T_m_1_future = dask_client.scatter(M_T_m_1, broadcast=True, hash=False) μ_Q_m_1_future = dask_client.scatter(μ_Q_m_1, broadcast=True, hash=False) T_A_subseq_isfinite_future = dask_client.scatter( @@ -238,8 +238,8 @@ def stumped(dask_client, T_A, m, T_B=None, ignore_trivial=True, normalize=True, m, M_T_future, μ_Q_future, - Σ_T_inverse_future, - σ_Q_inverse_future, + Σ_T_future, + σ_Q_future, M_T_m_1_future, μ_Q_m_1_future, T_A_subseq_isfinite_future, diff --git a/tests/naive.py b/tests/naive.py index cc4b0cc80..0e9ed97e3 100644 --- a/tests/naive.py +++ b/tests/naive.py @@ -6,7 +6,7 @@ from stumpy import core, config -def z_norm(a, axis=0, threshold=1e-7): +def z_norm(a, axis=0, threshold=config.STUMPY_STDDEV_THRESHOLD): std = np.std(a, axis, keepdims=True) std[np.less(std, threshold, where=~np.isnan(std))] = 1.0 diff --git a/tests/test_core.py b/tests/test_core.py index 675cd9e27..99b20292f 100644 --- a/tests/test_core.py +++ b/tests/test_core.py @@ -792,13 +792,12 @@ def test_preprocess_diagonal(): ref_T = np.array([0, 0, 2, 3, 4, 5, 6, 7, 0, 9], dtype=float) ref_M, ref_Σ = naive.compute_mean_std(ref_T, m) - ref_Σ_inverse = 1.0 / ref_Σ ref_M_m_1, _ = naive.compute_mean_std(ref_T, m - 1) ( comp_T, comp_M, - comp_Σ_inverse, + comp_Σ, comp_M_m_1, comp_T_subseq_isfinite, comp_T_subseq_isconstant, @@ -806,14 +805,14 @@ def test_preprocess_diagonal(): npt.assert_almost_equal(ref_T, comp_T) npt.assert_almost_equal(ref_M, comp_M) - npt.assert_almost_equal(ref_Σ_inverse, comp_Σ_inverse) + npt.assert_almost_equal(ref_Σ, comp_Σ) npt.assert_almost_equal(ref_M_m_1, comp_M_m_1) T = pd.Series(T) ( comp_T, comp_M, - comp_Σ_inverse, + comp_Σ, comp_M_m_1, comp_T_subseq_isfinite, comp_T_subseq_isconstant, @@ -821,7 +820,7 @@ def test_preprocess_diagonal(): npt.assert_almost_equal(ref_T, comp_T) npt.assert_almost_equal(ref_M, comp_M) - npt.assert_almost_equal(ref_Σ_inverse, comp_Σ_inverse) + npt.assert_almost_equal(ref_Σ, comp_Σ) npt.assert_almost_equal(ref_M_m_1, comp_M_m_1) diff --git a/tests/test_precision.py b/tests/test_precision.py new file mode 100644 index 000000000..3415f265a --- /dev/null +++ b/tests/test_precision.py @@ -0,0 +1,155 @@ +import numpy as np +import numpy.testing as npt +import pandas as pd +from stumpy import stump, snippets, config +import naive + + +def test_stump_identical_subsequence_self_join_rare_cases_1(): + # This test function is designed to capture the errors that migtht be raised + # due the imprecision in the calculation of pearson values in the edge case + # where two subsequences are identical. + m = 3 + zone = int(np.ceil(m / 4)) + + seed_values = [27343, 84451] + for seed in seed_values: + np.random.seed(seed) + + identical = np.random.rand(8) + T_A = np.random.rand(20) + T_A[1 : 1 + identical.shape[0]] = identical + T_A[11 : 11 + identical.shape[0]] = identical + + ref_mp = naive.stump(T_A, m, exclusion_zone=zone, row_wise=True) + comp_mp = stump(T_A, m, ignore_trivial=True) + naive.replace_inf(ref_mp) + naive.replace_inf(comp_mp) + npt.assert_almost_equal( + ref_mp[:, 0], comp_mp[:, 0], decimal=config.STUMPY_TEST_PRECISION + ) # ignore indices + + comp_mp = stump(pd.Series(T_A), m, ignore_trivial=True) + naive.replace_inf(comp_mp) + npt.assert_almost_equal( + ref_mp[:, 0], comp_mp[:, 0], decimal=config.STUMPY_TEST_PRECISION + ) # ignore indices + + +def test_stump_identical_subsequence_self_join_rare_cases_2(): + m = 3 + zone = int(np.ceil(m / 4)) + + seed_values = [27343, 84451] + for seed in seed_values: + np.random.seed(seed) + + identical = np.random.rand(8) + T_A = np.random.rand(20) + T_A[1 : 1 + identical.shape[0]] = identical * 0.001 + T_A[11 : 11 + identical.shape[0]] = identical * 1000 + + ref_mp = naive.stump(T_A, m, exclusion_zone=zone, row_wise=True) + comp_mp = stump(T_A, m, ignore_trivial=True) + naive.replace_inf(ref_mp) + naive.replace_inf(comp_mp) + npt.assert_almost_equal( + ref_mp[:, 0], comp_mp[:, 0], decimal=config.STUMPY_TEST_PRECISION + ) # ignore indices + + comp_mp = stump(pd.Series(T_A), m, ignore_trivial=True) + naive.replace_inf(comp_mp) + npt.assert_almost_equal( + ref_mp[:, 0], comp_mp[:, 0], decimal=config.STUMPY_TEST_PRECISION + ) # ignore indices + + +def test_stump_identical_subsequence_self_join_rare_cases_3(): + m = 3 + zone = int(np.ceil(m / 4)) + + seed_values = [27343, 84451] + for seed in seed_values: + np.random.seed(seed) + + identical = np.random.rand(8) + T_A = np.random.rand(20) + T_A[1 : 1 + identical.shape[0]] = identical * 0.00001 + T_A[11 : 11 + identical.shape[0]] = identical * 100000 + + ref_mp = naive.stump(T_A, m, exclusion_zone=zone, row_wise=True) + comp_mp = stump(T_A, m, ignore_trivial=True) + naive.replace_inf(ref_mp) + naive.replace_inf(comp_mp) + npt.assert_almost_equal( + ref_mp[:, 0], comp_mp[:, 0], decimal=config.STUMPY_TEST_PRECISION + ) # ignore indices + + comp_mp = stump(pd.Series(T_A), m, ignore_trivial=True) + naive.replace_inf(comp_mp) + npt.assert_almost_equal( + ref_mp[:, 0], comp_mp[:, 0], decimal=config.STUMPY_TEST_PRECISION + ) # ignore indices + + +def test_stump_volatile(): + # return True # bypassing test for now + m = 3 + zone = int(np.ceil(m / 4)) + + seed_values = [0, 1, 2] + for seed in seed_values: + np.random.seed(seed) + T = np.random.rand(64) + scale = np.random.choice(np.array([0.001, 0, 1000]), len(T), replace=True) + T[:] = T * scale + + ref_mp = naive.stump(T, m, exclusion_zone=zone, row_wise=True) + comp_mp = stump(T, m, ignore_trivial=True) + naive.replace_inf(ref_mp) + naive.replace_inf(comp_mp) + + npt.assert_almost_equal( + ref_mp[:, 0], comp_mp[:, 0], decimal=config.STUMPY_TEST_PRECISION + ) # ignore indices + + +def test_snippet_fixed_seeds(): + seed = 15 + np.random.seed(seed) + T = np.random.uniform(-1000, 1000, [64]).astype(np.float64) + m = 8 + k = 3 + s = 3 + + ( + ref_snippets, + ref_indices, + ref_profiles, + ref_fractions, + ref_areas, + ref_regimes, + ) = naive.mpdist_snippets(T, m, k, s=s) + ( + cmp_snippets, + cmp_indices, + cmp_profiles, + cmp_fractions, + cmp_areas, + cmp_regimes, + ) = snippets(T, m, k, s=s) + + npt.assert_almost_equal( + ref_snippets, cmp_snippets, decimal=config.STUMPY_TEST_PRECISION + ) + npt.assert_almost_equal( + ref_indices, cmp_indices, decimal=config.STUMPY_TEST_PRECISION + ) + npt.assert_almost_equal( + ref_profiles, cmp_profiles, decimal=config.STUMPY_TEST_PRECISION + ) + npt.assert_almost_equal( + ref_fractions, cmp_fractions, decimal=config.STUMPY_TEST_PRECISION + ) + npt.assert_almost_equal(ref_areas, cmp_areas, decimal=config.STUMPY_TEST_PRECISION) + npt.assert_almost_equal(ref_regimes, cmp_regimes) diff --git a/tests/test_stump.py b/tests/test_stump.py index 6feaf1598..3ea05bd3e 100644 --- a/tests/test_stump.py +++ b/tests/test_stump.py @@ -240,3 +240,150 @@ def test_stump_nan_zero_mean_self_join(): naive.replace_inf(ref_mp) naive.replace_inf(comp_mp) npt.assert_almost_equal(ref_mp, comp_mp) + + +def test_stump_identical_subsequence_self_join_rare_cases_1(): + # This test function is designed to capture the errors that migtht be raised + # due the imprecision in the calculation of pearson values in the edge case + # where two subsequences are identical. + m = 3 + zone = int(np.ceil(m / 4)) + + seed_values = [27343, 84451] + for seed in seed_values: + np.random.seed(seed) + + identical = np.random.rand(8) + T_A = np.random.rand(20) + T_A[1 : 1 + identical.shape[0]] = identical + T_A[11 : 11 + identical.shape[0]] = identical + + ref_mp = naive.stump(T_A, m, exclusion_zone=zone, row_wise=True) + comp_mp = stump(T_A, m, ignore_trivial=True) + naive.replace_inf(ref_mp) + naive.replace_inf(comp_mp) + npt.assert_almost_equal( + ref_mp[:, 0], comp_mp[:, 0], decimal=config.STUMPY_TEST_PRECISION + ) # ignore indices + + comp_mp = stump(pd.Series(T_A), m, ignore_trivial=True) + naive.replace_inf(comp_mp) + npt.assert_almost_equal( + ref_mp[:, 0], comp_mp[:, 0], decimal=config.STUMPY_TEST_PRECISION + ) # ignore indices + + +def test_stump_identical_subsequence_self_join_rare_cases_2(): + m = 3 + zone = int(np.ceil(m / 4)) + + seed_values = [27343, 84451] + for seed in seed_values: + np.random.seed(seed) + + identical = np.random.rand(8) + T_A = np.random.rand(20) + T_A[1 : 1 + identical.shape[0]] = identical * 0.001 + T_A[11 : 11 + identical.shape[0]] = identical * 1000 + + ref_mp = naive.stump(T_A, m, exclusion_zone=zone, row_wise=True) + comp_mp = stump(T_A, m, ignore_trivial=True) + naive.replace_inf(ref_mp) + naive.replace_inf(comp_mp) + npt.assert_almost_equal( + ref_mp[:, 0], comp_mp[:, 0], decimal=config.STUMPY_TEST_PRECISION + ) # ignore indices + + comp_mp = stump(pd.Series(T_A), m, ignore_trivial=True) + naive.replace_inf(comp_mp) + npt.assert_almost_equal( + ref_mp[:, 0], comp_mp[:, 0], decimal=config.STUMPY_TEST_PRECISION + ) # ignore indices + + +def test_stump_identical_subsequence_self_join_rare_cases_3(): + m = 3 + zone = int(np.ceil(m / 4)) + + seed_values = [27343, 84451] + for seed in seed_values: + np.random.seed(seed) + + identical = np.random.rand(8) + T_A = np.random.rand(20) + T_A[1 : 1 + identical.shape[0]] = identical * 0.00001 + T_A[11 : 11 + identical.shape[0]] = identical * 100000 + + ref_mp = naive.stump(T_A, m, exclusion_zone=zone, row_wise=True) + comp_mp = stump(T_A, m, ignore_trivial=True) + naive.replace_inf(ref_mp) + naive.replace_inf(comp_mp) + npt.assert_almost_equal( + ref_mp[:, 0], comp_mp[:, 0], decimal=config.STUMPY_TEST_PRECISION + ) # ignore indices + + comp_mp = stump(pd.Series(T_A), m, ignore_trivial=True) + naive.replace_inf(comp_mp) + npt.assert_almost_equal( + ref_mp[:, 0], comp_mp[:, 0], decimal=config.STUMPY_TEST_PRECISION + ) # ignore indices + + +def test_stump_volatile(): + # return True # bypassing test for now + m = 3 + zone = int(np.ceil(m / 4)) + + seed_values = [0, 1, 2] + for seed in seed_values: + np.random.seed(seed) + T = np.random.rand(64) + scale = np.random.choice(np.array([0.001, 0, 1000]), len(T), replace=True) + T[:] = T * scale + + ref_mp = naive.stump(T, m, exclusion_zone=zone, row_wise=True) + comp_mp = stump(T, m, ignore_trivial=True) + naive.replace_inf(ref_mp) + naive.replace_inf(comp_mp) + + npt.assert_almost_equal( + ref_mp[:, 0], comp_mp[:, 0], decimal=config.STUMPY_TEST_PRECISION + ) # ignore indices + + +def test_investigate(): + # T is generated as follows: + # T = np.zoros(10, dtype=np.float64) + # T[:3] = Q1; T[-3:] = Q2, where Q1=T_rand[12:15] and Q2=T_rand[9:12], and + # T_rand: + # seed = 1; np.random.seed(seed) + # T_rand = np.random.rand(64) + # scale = np.random.choice(np.array([0.001, 1.0, 1000]), len(T_rand), replace=True) + # T_rand = T_rand * scale + T = np.array( + [ + 0.538816734003357, + 0.4191945144032948, + 0.0006852195003967596, + np.inf, + 0.00020445224973151743, + 0.0008781174363909454, + 27.387593197926165, + np.inf, + 0.0002804439920644052, + 0.0007892793284514885, + 103.22600657764202, + ] + ) + + m = 3 + zone = int(np.ceil(m / 4)) + + ref_mp = naive.stump(T, m, exclusion_zone=zone, row_wise=False) + comp_mp = stump(T, m, ignore_trivial=True) + naive.replace_inf(ref_mp) + naive.replace_inf(comp_mp) + + npt.assert_almost_equal( + ref_mp[:, 0], comp_mp[:, 0], decimal=config.STUMPY_TEST_PRECISION + ) # ignore indices