pelicun.uq module

This module defines constants, classes and methods for uncertainty quantification in pelicun.

Contents

`RandomVariable`(ID, dimension_tags[, …])	Characterizes a Random Variable (RV) that represents a source of uncertainty in the calculation.
`RandomVariableSubset`(RV, tags)	Provides convenient access to a subset of components of a RandomVariable.
`tmvn_rvs`(mu, COV[, lower, upper, size])	Sample a truncated MVN distribution.
`mvn_orthotope_density`(mu, COV[, lower, upper])	Estimate the probability density within a hyperrectangle for an MVN distr.
`tmvn_MLE`(samples[, tr_lower, tr_upper, …])	Fit a truncated multivariate normal distribution to samples using MLE.

pelicun.uq.tmvn_rvs(mu, COV, lower=None, upper=None, size=1)

Sample a truncated MVN distribution.

Truncation of the multivariate normal distribution is currently handled through rejection sampling. The applicability of this method is limited by the amount of probability density enclosed by the hyperrectangle defined by the truncation limits. The lower that density is, the more samples need to be rejected, which makes the method inefficient when the tails of the MVN are sampled in a high-dimensional space. Such cases can be handled by a Gibbs sampler, which is a planned future feature of this function.

Parameters

mu: float scalar or ndarray: Mean(s) of the non-truncated distribution.
COV: float ndarray: Covariance matrix of the non-truncated distribution.
lower: float vector, optional, default: None: Lower bound(s) for the truncated distributions. A scalar value can be used for a univariate case, while a list of bounds is expected in multivariate cases. If the distribution is non-truncated from below in a subset of the dimensions, assign an infinite value (i.e. -numpy.inf) to those dimensions.
upper: float vector, optional, default: None: Upper bound(s) for the truncated distributions. A scalar value can be used for a univariate case, while a list of bounds is expected in multivariate cases. If the distribution is non-truncated from above in a subset of the dimensions, assign an infinite value (i.e. numpy.inf) to those dimensions.
size: int: Number of samples requested.

Returns

samples: float ndarray: Samples generated from the truncated distribution.
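
The rejection approach described above can be sketched with NumPy alone. This is a minimal illustration, not the pelicun implementation: the function name `rejection_sample_tmvn`, the `batch` argument, and the (samples × dimensions) layout of the returned array are assumptions made for the example.

```python
import numpy as np


def rejection_sample_tmvn(mu, cov, lower, upper, size, rng=None, batch=1000):
    """Draw `size` samples from an MVN truncated to [lower, upper] by rejection.

    Hypothetical helper for illustration; it loops forever if the
    hyperrectangle encloses no probability mass.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    lower = np.broadcast_to(np.asarray(lower, dtype=float), mu.shape)
    upper = np.broadcast_to(np.asarray(upper, dtype=float), mu.shape)

    accepted = []
    n_accepted = 0
    while n_accepted < size:
        # Propose from the non-truncated distribution ...
        candidates = rng.multivariate_normal(mu, cov, size=batch)
        # ... and keep only the proposals inside the hyperrectangle.
        inside = np.all((candidates >= lower) & (candidates <= upper), axis=1)
        accepted.append(candidates[inside])
        n_accepted += int(inside.sum())
    return np.concatenate(accepted)[:size]


samples = rejection_sample_tmvn(
    mu=[0.0, 0.0],
    cov=[[1.0, 0.5], [0.5, 1.0]],
    lower=[-1.0, -np.inf],  # second dimension is not truncated from below
    upper=[np.inf, 2.0],    # first dimension is not truncated from above
    size=500,
    rng=np.random.default_rng(42),
)
```

The number of rejected proposals grows as the probability enclosed by the limits shrinks, which is the inefficiency noted in the description.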

pelicun.uq.mvn_orthotope_density(mu, COV, lower=None, upper=None)

Estimate the probability density within a hyperrectangle for an MVN distr.

Use the method of Alan Genz (1992) to estimate the probability density of a multivariate normal distribution within an n-orthotope (i.e., hyperrectangle) defined by its lower and upper bounds. Limits can be relaxed in any direction by assigning infinite bounds (i.e. numpy.inf).

Parameters

mu: float scalar or ndarray: Mean(s) of the non-truncated distribution.
COV: float ndarray: Covariance matrix of the non-truncated distribution.
lower: float vector, optional, default: None: Lower bound(s) for the truncated distributions. A scalar value can be used for a univariate case, while a list of bounds is expected in multivariate cases. If the distribution is non-truncated from below in a subset of the dimensions, use either None or assign an infinite value (i.e. -numpy.inf) to those dimensions.
upper: float vector, optional, default: None: Upper bound(s) for the truncated distributions. A scalar value can be used for a univariate case, while a list of bounds is expected in multivariate cases. If the distribution is non-truncated from above in a subset of the dimensions, use either None or assign an infinite value (i.e. numpy.inf) to those dimensions.

Returns

alpha: float: Estimate of the probability density within the hyperrectangle.
eps_alpha: float: Estimate of the error in alpha.
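
To illustrate what the two return values represent, the sketch below estimates the same quantity with plain Monte Carlo sampling. It is deliberately not the Genz (1992) algorithm, which is more efficient and more accurate; the function name `mc_orthotope_probability` and its `n` and `seed` arguments are hypothetical.

```python
import numpy as np


def mc_orthotope_probability(mu, cov, lower, upper, n=200_000, seed=0):
    """Crude Monte Carlo estimate of the MVN probability inside [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    draws = rng.multivariate_normal(
        np.asarray(mu, dtype=float), np.asarray(cov, dtype=float), size=n
    )
    inside = np.all((draws >= lower) & (draws <= upper), axis=1)
    alpha = inside.mean()
    # Standard error of a binomial proportion serves as the error estimate
    eps_alpha = np.sqrt(alpha * (1.0 - alpha) / n)
    return float(alpha), float(eps_alpha)


alpha, eps_alpha = mc_orthotope_probability(
    mu=[0.0, 0.0],
    cov=[[1.0, 0.0], [0.0, 1.0]],
    lower=[-1.0, -np.inf],  # relaxed lower limit in the second dimension
    upper=[1.0, 0.0],
)
```

For these independent standard normal variables the exact value is the product of the two marginal probabilities, roughly 0.683 × 0.5, which gives a simple sanity check for the estimate.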

pelicun.uq.tmvn_MLE(samples, tr_lower=None, tr_upper=None, censored_count=0, det_lower=None, det_upper=None, alpha_lim=None)

Fit a truncated multivariate normal distribution to samples using MLE.

The number of dimensions of the distribution function is inferred from the shape of the sample data. Censoring is automatically considered if the number of censored samples and the corresponding detection limits are provided. Infinite or unspecified truncation limits lead to fitting a non-truncated normal distribution in that dimension.

Parameters

samples: ndarray: Raw data that serves as the basis of estimation. The number of samples equals the number of columns and each row introduces a new feature. In other words: a list of sample lists is expected where each sample list is a collection of samples of one variable.
tr_lower: float vector, optional, default: None: Lower bound(s) for the truncated distributions. A scalar value can be used for a univariate case, while a list of bounds is expected in multivariate cases. If the distribution is non-truncated from below in a subset of the dimensions, use either None or assign an infinite value (i.e. -numpy.inf) to those dimensions.
tr_upper: float vector, optional, default: None: Upper bound(s) for the truncated distributions. A scalar value can be used for a univariate case, while a list of bounds is expected in multivariate cases. If the distribution is non-truncated from above in a subset of the dimensions, use either None or assign an infinite value (i.e. numpy.inf) to those dimensions.
censored_count: int, optional, default: 0: The number of censored samples that are beyond the detection limits. All samples outside the detection limits are aggregated into one set. This works the same way in one and in multiple dimensions. Prescription of specific censored sample counts for sub-regions of the input space outside the detection limits is not supported.
det_lower: float ndarray, optional, default: None: Lower detection limit(s) for censored data. In multivariate cases the limits need to be defined as a vector; a scalar value is sufficient in a univariate case. If the data is not censored from below in a particular dimension, assign None to that position of the ndarray.
det_upper: float ndarray, optional, default: None: Upper detection limit(s) for censored data. In multivariate cases the limits need to be defined as a vector; a scalar value is sufficient in a univariate case. If the data is not censored from above in a particular dimension, assign None to that position of the ndarray.
alpha_lim: float, optional, default: None: Introduces a lower limit to the probability density within the n-orthotope defined by the truncation limits. Assigning a reasonable minimum (such as 1e-4) can be useful when the mean of the distribution is several standard deviations from the truncation limits and the sample size is small. Such cases without a limit often converge to distant means with inflated variances. Besides being incorrect estimates, those solutions only offer negligible reduction in the negative log likelihood, while making subsequent sampling of the truncated normal distribution very challenging.

Returns

mu: float scalar or ndarray: Mean of the fitted probability distribution. A vector of means is returned in a multivariate case.
COV: float scalar or 2D ndarray: Covariance matrix of the fitted probability distribution. A 2D square ndarray is returned in a multi-dimensional case, while a single variance (not standard deviation!) value is returned in a univariate case.
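
The objective that such a fit minimizes can be sketched for the univariate, uncensored case using only the standard library. The function name `truncated_normal_nll` is hypothetical and the sketch makes no claim about how pelicun organizes the calculation. The `alpha` term in the code is the probability within the truncation limits, which is the quantity that `alpha_lim` bounds from below.

```python
import math
from statistics import NormalDist


def truncated_normal_nll(samples, mu, sigma, tr_lower=-math.inf, tr_upper=math.inf):
    """Negative log-likelihood of samples under a normal truncated to [tr_lower, tr_upper]."""
    dist = NormalDist(mu, sigma)
    # Probability that the non-truncated normal places inside the limits
    alpha = dist.cdf(tr_upper) - dist.cdf(tr_lower)
    log_pdf_sum = sum(math.log(dist.pdf(x)) for x in samples)
    # Truncated pdf = pdf / alpha, so every sample adds log(alpha) to the NLL
    return -log_pdf_sum + len(samples) * math.log(alpha)


data = [0.2, 0.5, 0.9, 1.4, 0.1]
nll_free = truncated_normal_nll(data, mu=0.5, sigma=1.0)
nll_truncated = truncated_normal_nll(data, mu=0.5, sigma=1.0, tr_lower=0.0)
```

With infinite limits `alpha` equals 1 and the expression reduces to the ordinary normal negative log-likelihood, matching the statement that unspecified limits lead to a non-truncated fit.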

class pelicun.uq.RandomVariable(ID, dimension_tags, raw_data=None, detection_limits=None, censored_count=None, distribution_kind=None, theta=None, COV=None, corr_ref='pre', p_set=None, truncation_limits=None)

Bases: object

Characterizes a Random Variable (RV) that represents a source of uncertainty in the calculation.

The uncertainty can be described either through raw data or through a pre-defined distribution function. When using raw data, provide potentially correlated raw samples in a two-dimensional array. If the data is left- or right-censored in any number of its dimensions, provide the list of detection limits and the number of censored samples. No other information is needed to define the object from raw data. Then, either resample the raw data, or fit a prescribed distribution to the samples and sample from that distribution later.

Alternatively, one can choose to prescribe a distribution type and its parameters and sample from that distribution later.

Parameters

ID: int
dimension_tags: str array: A series of strings that identify the stochastic model parameters that correspond to each dimension of the random variable. When the RV is one-dimensional, the tag is a single string. In multi-dimensional cases, the order of strings shall match the order of elements provided as other inputs.
raw_data: float scalar or ndarray, optional, default: None: Samples of an uncertain variable. The samples can describe a multi-dimensional random variable if they are arranged in a 2D ndarray.
detection_limits: float ndarray, optional, default: None: Defines the limits for censored data. The limits need to be defined in a 2D ndarray that is structured as two vectors with N elements. The vectors collect left and right limits for the N dimensions. If the data is not censored in a particular direction, assign None to that position of the ndarray. Replacing one of the vectors with None will assign no censoring to all dimensions in that direction. The default value corresponds to no censoring in either direction.
censored_count: int, optional, default: None: The number of censored samples that are beyond the detection limits. All samples outside the detection limits are aggregated into one set. This works the same way in one and in multiple dimensions. Prescription of censored sample counts for sub-regions of the input space outside the detection limits is not yet supported. If such an approach is desired, the censored raw data shall be used to fit a distribution in a pre-processing step and the fitted distribution can be specified for this random variable.
distribution_kind: {‘normal’, ‘lognormal’, ‘multinomial’}, optional, default: None: Defines the type of probability distribution when raw data is not provided, but the distribution is directly specified. When part of the data is normal in log space, while the other part is normal in linear space, define a list of distribution tags such as [‘normal’, ‘normal’, ‘lognormal’]. Make sure that the covariance matrix is based on log transformed data for the lognormally distributed variables! Mixing normal distributions with multinomials is not supported.
theta: float scalar or ndarray, optional, default: None: Median of the probability distribution. A vector of medians is expected in a multi-dimensional case.
COV: float scalar or 2D ndarray, optional, default: None: Covariance matrix of the random variable. In a multi-dimensional case this parameter has to be a 2D square ndarray, and the number of its rows has to be equal to the number of elements in the supplied theta vector. In a one-dimensional case, a single value is expected that equals the variance (not the standard deviation!) of the distribution. The COV for lognormal variables is assumed to be specified in logarithmic space.
corr_ref: {‘pre’, ‘post’}, optional, default: ‘pre’: Determines whether the correlations prescribed by the covariance matrix refer to the distribution functions before or after truncation. The default ‘pre’ setting assumes that pre-truncation correlations are prescribed and creates a multivariate normal distribution using the COV matrix. That distribution is truncated according to the prescribed truncation limits. The other option assumes that post-truncation correlations are prescribed. The post-truncation distribution is not multivariate normal in general. Currently we use a Gaussian copula to describe the dependence between the truncated variables. Similarly to other characteristics, the corr_ref can be defined as a single string, or a vector of strings. The former assigns the same option to all dimensions, while the latter allows for more flexible assignment by setting the corr_ref for each dimension individually.
p_set: float vector, optional, default: None: Probabilities of a finite set of events described by a multinomial distribution. The RV will have binomial distribution if only one element is provided in this vector. The number of events equals the number of vector elements if their probabilities sum up to 1.0. If the sum is less than 1.0, then an additional event is assumed with the remaining probability of occurrence assigned to it. The sum of event probabilities shall never be more than 1.0.
truncation_limits: float ndarray, optional, default: None: Defines the limits for truncated distributions. The limits need to be defined in a 2D ndarray that is structured as two vectors with N elements. The vectors collect left and right limits for the N dimensions. If the distribution is not truncated in a particular direction, assign None to that position of the ndarray. Replacing one of the vectors with None will assign no truncation to all dimensions in that direction. The default value corresponds to no truncation in either direction.
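
The rule described for p_set, where a missing remainder becomes an additional event, can be sketched as follows. The helper `complete_event_probabilities` and its tolerance argument are hypothetical and only illustrate the bookkeeping.

```python
def complete_event_probabilities(p_set, tol=1e-12):
    """Return the full list of event probabilities implied by p_set."""
    total = sum(p_set)
    if total > 1.0 + tol:
        raise ValueError("event probabilities must not sum to more than 1.0")
    remainder = 1.0 - total
    if remainder > tol:
        # An additional event takes the remaining probability
        return list(p_set) + [remainder]
    return list(p_set)


binomial_case = complete_event_probabilities([0.3])          # two events
complete_case = complete_event_probabilities([0.2, 0.3, 0.5])  # three events
```

A single probability therefore yields two events (the binomial case), while a vector that already sums to 1.0 is left unchanged.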

Attributes

COV: Return the covariance matrix of the probability distribution.
censored_count: Return the number of samples beyond the detection limits.
corr: Return the correlation matrix of the probability distribution.
det_lower: Return the lower detection limit(s) corresponding to the raw data in either linear or log space according to the distribution.
det_upper: Return the upper detection limit(s) corresponding to the raw data in either linear or log space according to the distribution.
detection_limits: Return the detection limits corresponding to the raw data in linear space.
dimension_tags: Return the tags corresponding to the dimensions of the variable.
distribution_kind: Return the assigned probability distribution family.
mu: Return the mean value(s) of the probability distribution.
raw: Return the pre-assigned raw data.
samples: Return the pre-generated samples from the distribution.
sig: Return the standard deviation vector of the probability distribution.
theta: Return the median value(s) of the probability distribution.
tr_limits_post: Return the post truncation limits of the probability distribution in linear space.
tr_limits_pre: Return the pre truncation limits of the probability distribution in linear space.
tr_lower_post: Return the lower post truncation limit(s) corresponding to the distribution in either linear or log space according to the distribution.
tr_lower_pre: Return the lower pre truncation limit(s) corresponding to the distribution in either linear or log space according to the distribution.
tr_upper_post: Return the upper post truncation limit(s) corresponding to the distribution in either linear or log space according to the distribution.
tr_upper_pre: Return the upper pre truncation limit(s) corresponding to the distribution in either linear or log space according to the distribution.
var: Return the variance vector of the probability distribution.

Methods

`fit_distribution`(self, distribution_kind[, …])	Estimate the parameters of a probability distribution from raw data.
`orthotope_density`(self[, lower, upper])	Estimate the probability density within an orthotope for a TMVN distr.
`sample_distribution`(self, sample_size[, …])	Sample the probability distribution assigned to the random variable.

property distribution_kind: Return the assigned probability distribution family.

property theta: Return the median value(s) of the probability distribution.

property mu: Return the mean value(s) of the probability distribution. Note that the mean value is in log space for lognormal distributions.

property COV: Return the covariance matrix of the probability distribution. Note that the covariances are in log space for lognormal distributions.

property corr: Return the correlation matrix of the probability distribution. Note that the correlation coefficients correspond to the joint distribution in log space for lognormal distributions.

property var: Return the variance vector of the probability distribution. Note that the variances are in log space for lognormal distributions.

property sig: Return the standard deviation vector of the probability distribution. Note that the standard deviations are in log space for lognormal distributions.

property dimension_tags: Return the tags corresponding to the dimensions of the variable.

property detection_limits: Return the detection limits corresponding to the raw data in linear space.

property det_lower: Return the lower detection limit(s) corresponding to the raw data in either linear or log space according to the distribution.

property det_upper: Return the upper detection limit(s) corresponding to the raw data in either linear or log space according to the distribution.

property tr_limits_pre: Return the pre truncation limits of the probability distribution in linear space.

property tr_limits_post: Return the post truncation limits of the probability distribution in linear space.

property tr_lower_pre: Return the lower pre truncation limit(s) corresponding to the distribution in either linear or log space according to the distribution.

property tr_upper_pre: Return the upper pre truncation limit(s) corresponding to the distribution in either linear or log space according to the distribution.

property tr_lower_post: Return the lower post truncation limit(s) corresponding to the distribution in either linear or log space according to the distribution.

property tr_upper_post: Return the upper post truncation limit(s) corresponding to the distribution in either linear or log space according to the distribution.

property censored_count: Return the number of samples beyond the detection limits.

property samples: Return the pre-generated samples from the distribution.

property raw: Return the pre-assigned raw data.
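
The linear-space and log-space conventions noted for theta, mu, var, and sig can be checked numerically for a univariate lognormal variable. The sketch below uses only the standard library; the variable names are illustrative and it relies on the general fact that the log-space mean of a lognormal distribution equals the logarithm of its median.

```python
import math
import random
import statistics

theta = 2.0     # median, specified in linear space
var_log = 0.25  # variance, specified in log space

mu_log = math.log(theta)       # mean in log space
sig_log = math.sqrt(var_log)   # standard deviation in log space

random.seed(1)
samples = [math.exp(random.gauss(mu_log, sig_log)) for _ in range(20_000)]

# The sample median approaches theta; the spread is recovered in log space
sample_median = statistics.median(samples)
sample_log_std = statistics.stdev([math.log(x) for x in samples])
```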

fit_distribution(self, distribution_kind, truncation_limits=None)

Estimate the parameters of a probability distribution from raw data.

Parameter estimates are calculated using maximum likelihood estimation. If the data spans multiple dimensions, the estimates will also describe a multi-dimensional distribution automatically. Data censoring is also automatically taken into consideration following the detection limits specified previously for the random variable. Truncated target distributions can be specified through the truncation limits. The specified truncation limits are applied after the correlations are set. In other words, the corr_ref property of the RV is set to ‘pre’ when fitting a distribution.

Besides returning the parameter estimates, their values are also stored as theta and COV attributes of the RandomVariable object for future use.

Parameters

distribution_kind: {‘normal’, ‘lognormal’} or a list of those: Specifies the type of the probability distribution that is fit to the raw data. When part of the data is normal in log space, while the other part is normal in linear space, define a list of distribution tags such as [‘normal’, ‘normal’, ‘lognormal’].
truncation_limits: float ndarray, optional, default: None: Defines the limits for truncated distributions. The limits need to be defined in a 2D ndarray that is structured as two vectors with N elements. The vectors collect left and right limits for the N dimensions. If the distribution is not truncated in a particular direction, assign None to that position of the ndarray. Replacing one of the vectors with None will assign no truncation to all dimensions in that direction. The default value corresponds to no truncation in either direction.

Returns

theta: float scalar or ndarray: Median of the probability distribution. A vector of medians is returned in a multi-dimensional case.
COV: float scalar or 2D ndarray: Covariance matrix of the probability distribution. A 2D square ndarray is returned in a multi-dimensional case.
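
For data without censoring or truncation, the maximum likelihood estimates have a closed form, which makes the mixed normal and lognormal case easy to sketch. The helper `fit_mixed_normal` is hypothetical, and the assumption that each row of the raw data holds one variable is made for this example only; the censored and truncated cases need numerical optimization and are not covered here.

```python
import numpy as np


def fit_mixed_normal(raw, kinds):
    """Closed-form MLE for untruncated, uncensored data; rows of `raw` are variables."""
    data = np.array(raw, dtype=float)
    is_log = np.array([kind == "lognormal" for kind in kinds])
    # Lognormal dimensions are fitted in log space
    data[is_log] = np.log(data[is_log])
    mean = data.mean(axis=1)
    cov = np.cov(data, bias=True)  # MLE normalizes by N rather than N - 1
    # Medians in linear space: exp(mean) for lognormal, mean for normal
    theta = np.where(is_log, np.exp(mean), mean)
    return theta, cov


rng = np.random.default_rng(7)
normal_part = rng.normal(10.0, 2.0, size=5000)
lognormal_part = np.exp(rng.normal(np.log(3.0), 0.4, size=5000))
theta, cov = fit_mixed_normal(
    [normal_part, lognormal_part], ["normal", "lognormal"]
)
```

The returned covariance entry for the lognormal dimension is in log space, in line with the convention stated for the COV parameter.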

sample_distribution(self, sample_size, preserve_order=False)

Sample the probability distribution assigned to the random variable.

Normal distributions (including truncated and/or multivariate normal and lognormal) are sampled using the tmvn_rvs() function in this module. If post-truncation correlations are set for a dimension, the corresponding truncations are enforced after sampling by first applying probability integral transformation to transform samples from the non-truncated normal to standard uniform distribution, and then applying inverse probability integral transformation to transform the samples from standard uniform to the desired truncated normal distribution. Multinomial distributions are sampled using the multinomial method in scipy. The samples are returned and also stored in the samples attribute of the RV.

If the random variable is defined by raw data only, we sample from the raw data.

Parameters

sample_size: int: Number of samples requested.
preserve_order: bool, default: False: Influences sampling from raw data. If True, the samples are copies of the first n rows of the raw data where n is the sample_size. This only works for sample_size <= raw data size. If False, the samples are drawn from the raw data pool with replacement.

Returns

samples: DataFrame: Samples generated from the distribution. Columns correspond to the dimension tags that identify the variables.
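
The two-step transformation used for post-truncation correlations can be sketched for a single dimension with the standard library. The function `to_truncated_normal` is a hypothetical helper and finite truncation limits are assumed so that the inverse CDF is always evaluated strictly between 0 and 1.

```python
import random
from statistics import NormalDist


def to_truncated_normal(x, mu, sigma, lower, upper):
    """Map a sample of N(mu, sigma) onto the same normal truncated to [lower, upper]."""
    dist = NormalDist(mu, sigma)
    # Probability integral transformation: non-truncated normal -> standard uniform
    u = dist.cdf(x)
    # Inverse transformation: standard uniform -> truncated normal
    p_lower, p_upper = dist.cdf(lower), dist.cdf(upper)
    return dist.inv_cdf(p_lower + u * (p_upper - p_lower))


random.seed(3)
raw = [random.gauss(1.0, 2.0) for _ in range(1000)]
truncated = [to_truncated_normal(x, 1.0, 2.0, lower=0.0, upper=4.0) for x in raw]
```

Because the mapping is monotonic, the ranks of the samples are preserved, which is how the dependence introduced before the transformation carries over to the truncated variables.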

orthotope_density(self, lower=None, upper=None)

Estimate the probability density within an orthotope for a TMVN distr.

Use the mvn_orthotope_density function in this module for the calculation. Pre-defined truncation limits for the RV are automatically taken into consideration. Limits for lognormal distributions shall be provided in linear space; the algorithm performs the conversion automatically. Pre- and post-truncation correlation is also considered automatically.

Parameters

lower: float vector, optional, default: None: Lower bound(s) of the orthotope. A scalar value can be used for a univariate RV; a list of bounds is expected in multivariate cases. If the orthotope is not bounded from below in any dimension, use either ‘None’ or assign an infinite value (i.e. -numpy.inf) to that dimension.
upper: float vector, optional, default: None: Upper bound(s) of the orthotope. A scalar value can be used for a univariate RV; a list of bounds is expected in multivariate cases. If the orthotope is not bounded from above in any dimension, use either ‘None’ or assign an infinite value (i.e. numpy.inf) to that dimension.

Returns

alpha: float: Estimate of the probability density within the orthotope.
eps_alpha: float: Estimate of the error in alpha.
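
For a univariate lognormal RV, the combination of linear-space limits and pre-defined truncation limits can be sketched in closed form. The function `lognormal_interval_probability` and its argument names are hypothetical; the multivariate case handled by mvn_orthotope_density is not reproduced here.

```python
import math
from statistics import NormalDist


def lognormal_interval_probability(theta, sig_log, lower, upper,
                                   tr_lower=0.0, tr_upper=math.inf):
    """P(lower < X < upper) for a truncated lognormal; all limits in linear space."""
    dist = NormalDist(math.log(theta), sig_log)

    def cdf_linear(x):
        # Convert a linear-space limit to log space before evaluating the CDF
        if x <= 0.0:
            return 0.0
        if math.isinf(x):
            return 1.0
        return dist.cdf(math.log(x))

    # The requested interval cannot extend past the truncation limits
    lo = max(lower, tr_lower)
    hi = min(upper, tr_upper)
    if hi <= lo:
        return 0.0
    mass_inside_truncation = cdf_linear(tr_upper) - cdf_linear(tr_lower)
    return (cdf_linear(hi) - cdf_linear(lo)) / mass_inside_truncation


# Probability of falling below the median, without and with an upper truncation
p_free = lognormal_interval_probability(1.0, 0.5, lower=0.0, upper=1.0)
p_truncated = lognormal_interval_probability(
    1.0, 0.5, lower=0.0, upper=1.0, tr_upper=math.exp(0.5)
)
```

Truncating the upper tail removes probability from outside the interval, so the same interval captures a larger share of the truncated distribution.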

class pelicun.uq.RandomVariableSubset(RV, tags)

Bases: object

Provides convenient access to a subset of components of a RandomVariable.

This object is useful when working with multivariate RVs, but it is used in all cases to provide a general approach.

Parameters

RV: RandomVariable: The potentially multivariate random variable that is accessed through this object.
tags: str or list of str: A string or list of strings that identify the subset of components we are interested in. These strings shall be among the dimension_tags of the RV.

Attributes

samples: Return the pre-generated samples of the selected component from the RV distribution.
tags: Return the tags corresponding to the components in the RV subset.

Methods

`orthotope_density`(self[, lower, upper])	Return the density within the orthotope in the marginal pdf of the RVS.
`sample_distribution`(self, sample_size[, …])	Sample the probability distribution assigned to the connected RV.

property tags: Return the tags corresponding to the components in the RV subset.

property samples: Return the pre-generated samples of the selected component from the RV distribution.

sample_distribution(self, sample_size, preserve_order=False)

Sample the probability distribution assigned to the connected RV.

Note that this function will sample the full, potentially multivariate, distribution.

Parameters

sample_size: int: Number of samples requested.
preserve_order: bool, default: False: Influences sampling from raw data. If True, the samples are copies of the first n rows of the raw data where n is the sample_size. This only works for sample_size <= raw data size. If False, the samples are drawn from the raw data pool with replacement.

Returns

samples: DataFrame: Samples of the selected component generated from the distribution.
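
The two behaviors of preserve_order when sampling from raw data can be sketched with plain lists. The helper `sample_raw` is hypothetical and treats each row as one joint sample, which is an assumption of this example.

```python
import random


def sample_raw(raw_rows, sample_size, preserve_order=False, rng=None):
    """Sample rows of raw data, either in their original order or with replacement."""
    if preserve_order:
        if sample_size > len(raw_rows):
            raise ValueError("preserve_order requires sample_size <= raw data size")
        # Copy the first n rows unchanged
        return [list(row) for row in raw_rows[:sample_size]]
    rng = rng or random.Random()
    # Draw rows with replacement from the raw data pool
    return [list(rng.choice(raw_rows)) for _ in range(sample_size)]


raw_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
ordered = sample_raw(raw_rows, 2, preserve_order=True)
resampled = sample_raw(raw_rows, 5, rng=random.Random(0))
```

Sampling whole rows keeps the values of the different variables paired, so correlations present in the raw data survive resampling.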

orthotope_density(self, lower=None, upper=None)

Return the density within the orthotope in the marginal pdf of the RVS.

The function considers the influence of every dependent variable in the RV on the marginal pdf of the RVS. Note that such influence only occurs when the RV is a truncated distribution and at least two variables are dependent. Pre- and post-truncation correlation is considered automatically.

Parameters

lower: float vector, optional, default: None: Lower bound(s) of the orthotope. A scalar value can be used for a univariate RVS; a list of bounds is expected in multivariate cases. If the orthotope is not bounded from below in any dimension, use either ‘None’ or assign an infinite value (i.e. -numpy.inf) to that dimension.
upper: float vector, optional, default: None: Upper bound(s) of the orthotope. A scalar value can be used for a univariate RVS; a list of bounds is expected in multivariate cases. If the orthotope is not bounded from above in any dimension, use either ‘None’ or assign an infinite value (i.e. numpy.inf) to that dimension.

Returns

alpha: float: Estimate of the probability density within the orthotope.
eps_alpha: float: Estimate of the error in alpha.
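
The influence of a dependent, truncated variable on the marginal distribution of another can be demonstrated with a small Monte Carlo experiment. This sketch uses NumPy only and does not call pelicun; the two variables A and B, the correlation of 0.8, and the truncation B > 0 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(11)
rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])
draws = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
a, b = draws[:, 0], draws[:, 1]

# Marginal probability of A > 0 when nothing is truncated
p_untruncated = float(np.mean(a > 0.0))

# Truncate B to B > 0 (pre-truncation correlation) and evaluate A again
kept = b > 0.0
p_truncated = float(np.mean(a[kept] > 0.0))

# Closed-form reference for this standard bivariate normal case
p_reference = 0.5 + np.arcsin(rho) / np.pi
```

Although no limit is placed on A itself, truncating the correlated variable B shifts the marginal probability of A well above 0.5. With independent variables the two probabilities would coincide.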