nustattools.stats package

Module contents

Copyright (c) 2024 Lukas Koch. All rights reserved.

Potentially useful statistical tools that are not available in scipy.stats.

class nustattools.stats.Bee(momtype=1, a=None, b=None, xtol=1e-14, badvalue=None, name=None, longname=None, shapes=None, seed=None)[source]

Bases: rv_continuous

A random variable representing the maximum of df chi distributions.

Each chi disitribution has df = 1.

Note

You probably do not need to use this class directly. Instead work with the instance bee.

Parameters:

df (int) – The number of chi-distirbuted variables to take the maximum of.

class nustattools.stats.Bee2(momtype=1, a=None, b=None, xtol=1e-14, badvalue=None, name=None, longname=None, shapes=None, seed=None)[source]

Bases: rv_continuous

A random variable representing the maximum of df chi2 distributions.

Each chii2 disitribution has df = 1.

Note

You probably do not need to use this class directly. Instead work with the instance bee2.

Parameters:

df (int) – The number of chi-distirbuted variables to take the maximum of.

Notes

This distribution is discussed in [1] in the context of robust test statistics.

References

class nustattools.stats.Cee(momtype=1, a=None, b=None, xtol=1e-14, badvalue=None, name=None, longname=None, shapes=None, seed=None)[source]

Bases: rv_continuous

A random variable representing the maximum of multiple chi distributions.

Each chi disitribution can have a different df. If all df are equal to 1, this is identical to the Bee distribution with df = len(k).

Note

You probably do not need to use this class directly. Instead work with the instance cee.

Parameters:

k (DF or Iterable of DF) – Special class to pass variable length list of degrees of freedom of the chi-distirbuted variables to take the maximum of.

Examples

>>> cee(k=DF(1,2,3)).pdf(1)
0.2501359390297275
class nustattools.stats.Cee2(momtype=1, a=None, b=None, xtol=1e-14, badvalue=None, name=None, longname=None, shapes=None, seed=None)[source]

Bases: rv_continuous

A random variable representing the maximum of multiple chi2 distributions.

Each chi2 disitribution can have a different df. If all df are equal to 1, this is identical to a Bee2 disritbution with df = len(k).

Note

You probably do not need to use this class directly. Instead work with the instance cee2.

Parameters:

k (DF or Iterable of DF) – Special class to pass variable length list of degrees of freedom of the chi-distirbuted variables to take the maximum of.

Examples

>>> cee2(k=DF(1,2,3)).pdf(1)
0.12506796951321578

Notes

TODO: Add reference to paper.

class nustattools.stats.DF(*args)[source]

Bases: object

Helper class to get around limitations of rv_continuous.

This class represents the shape parameter for the Cee and Cee2 distributions.

Parameters:

*args (tuple of int) – The df for each chi(2) distribution

class nustattools.stats.FMaxStatistic(*, k, funcs=None, inv_funcs=None)[source]

Bases: TestStatistic

Test statistic that takes the maximum of functions of chi2 distributed data.

Parameters:
  • k (Iterable of int) – Number of degrees of freedom for the assumed chi2 distributions of the data points.

  • funcs (Iterable of Callable or None, optional) – Functions to apply to each data point. Maximum will be taken from the function outputs. If not specified, the identity function``f(x) = x`` will be used for all data points.

  • invfuncs (Iterable of Callable or None, optional) – Inverse of the functions that are applied to each data point. These are used to calculate the CDF. If not specified, the inverse will be calculated numerically using the funcs.

Notes

TODO: Cite paper.

Parameters:

inv_funcs (Optional[Iterable[Optional[Callable[..., ndarray[Any, dtype[Any]]]]]])

class nustattools.stats.OptimalFMaxStatistic(*, k)[source]

Bases: FMaxStatistic

FMaxStatistic that minimizes the maximum M-distance for any given p-value.

Parameters:

k (Iterable of int) – Number of degrees of freedom for the assumed chi2 distributions of the data points.

Notes

TODO: Cite paper.

class nustattools.stats.QMaxStatistic(*, k)[source]

Bases: FMaxStatistic

FMaxStatistic where the functions are the assumed CDFs of the data.

This is equivalent to taking the minimum of the individual p-values as the test statistic.

Parameters:

k (Iterable of int) – Number of degrees of freedom for the assumed chi2 distributions of the data points.

Notes

TODO: Cite paper.

class nustattools.stats.RVTestStatistic(momtype=1, a=None, b=None, xtol=1e-14, badvalue=None, name=None, longname=None, shapes=None, seed=None)[source]

Bases: rv_continuous

A random variable distributed as the expectation of a TestStatistic.

Note

You probably do not need to use this class directly. Instead work with the instance rvteststatistic.

Parameters:

statistic (TestStatistic or Iterable of TestStatistic)

Examples

>>> ts = OptimalFMaxStatistic(k=[1,2,3])
>>> rvteststatistic(statistic=ts).pdf(1)
0.24601379637056994
class nustattools.stats.TestStatistic[source]

Bases: ABC

General class for test statistcs.

A TestStatistic must implement a way to calculate the test statistic from some data, as well as a way to calculate the CDF of the expected distribution of the test statistic if an assumed model is correct.

calculate(data)[source]

Calculate the test statistic from some given data.

You can also call the object directly, e.g. statistic(data) instead of statistic.calculate(data).

Parameters:

data (numpy.ndarray) – The data to calculate the test statistic for. The test statstic is calculated over the last dimension of the array, so the return shape will be input_shape[:-1].

Returns:

statistic_value

Return type:

numpy.ndarray

cdf(statistic)[source]

Calculate the CDF of the expected distribution of the test statistic.

Assumes that certain model assumptions are true.

Parameters:

statistic (numpy.ndarray) – The test statistic to calculate the CDF for.

Returns:

cdf

Return type:

numpy.ndarray

nustattools.stats.bee = <nustattools.stats._dist.Bee object>

Use this instance of Bee.

nustattools.stats.bee2 = <nustattools.stats._dist.Bee2 object>

Use this instance of Bee2.

nustattools.stats.cee = <nustattools.stats._dist.Cee object>

Use this instance of Cee.

nustattools.stats.cee2 = <nustattools.stats._dist.Cee2 object>

Use this instance of Cee2.

nustattools.stats.derate_covariance(cov, *, jacobian=None, sigma=3.0, accuracy=0.01, return_dict=None)[source]

Derate the covariance of some data to account for unknown correlations.

See TODO: Ref to paper.

Parameters:
  • cov (numpy.ndarray or list of numpy.ndarray) – The covariance matrix of the data or a list of covariances that add up to the total. Unknown covariance blocks must be np.nan. Off diagonal blocks may only be ``0’’ or ``np.nan’’. Diagonal blocks must not be ``np.nan’’.

  • jacobian (numpy.ndarray, default=None) – Jacobian matrix of the model prediction wrt the best-fit parameters.

  • sigma (float, default=3.) – The desired confidence level up to which the derated covariance should be conservative, expressed in standard-normal standard deviations. E.g. sigma=3. corresponds to CL=0.997.

  • accuracy (float, default=0.01) – The derating factor is calculated using numerical sampling. This parameter determines how many samples to throw. Lower values mean more samples.

  • return_dict (dict, optional) – If specified, the nightmare covariance and thrown data samples are added to this dictionary for detailed studies outside the function.

Returns:

a – The derating factor for the total covariance.

Return type:

float

nustattools.stats.rvteststatistic = <nustattools.stats._dist.RVTestStatistic object>

Use this instance of RVTestStatistic.