General Discussion

Pseudo-observations

When estimating copulas, one usually observes data on the scale of the marginals and not on the copula scale. This discrepancy between the observed information and the modeled distribution must be taken into account. A key concept is that of pseudo-observations.

Definition (Pseudo-observations): If $\bm x \in \mathbb R^{N\times d}$ is an $N$-sample of a $d$-variate real-valued random vector $\bm X$, then the pseudo-observations are the normalized ranks of the marginals of $\bm x$, defined as:

\[\bm u \in [0,1]^{N\times d}:\; u_{i,j} = \frac{\mathrm{Rank}(x_{i,j},\,\bm x_{\cdot,j})}{N+1} = \frac{1}{N+1}\sum_{k=1}^N \mathbb 1_{x_{k,j} \le x_{i,j}},\]

where $\mathrm{Rank}(y,\bm x) = \sum\limits_{x_i \in \bm x} \mathbb 1_{x_i \le y}$.

In Copulas.jl, we provide a function pseudos that implements this transformation directly.

Copulas.pseudos - Function
pseudos(sample)

Compute the pseudo-observations of a multivariate sample. Note that the sample has to be given in wide format (d,n), where d is the dimension and n the number of observations.

Warning: the order used is ordinal ranking (see https://en.wikipedia.org/wiki/Ranking#Ordinalranking.28.221234.22_ranking.29); see StatsBase.ordinalrank for the ordering we use. If you want more flexibility, check out NormalizeQuantiles.sampleranks.
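A minimal sketch of what this transformation computes, written with Base Julia only. The name `pseudo_obs` is hypothetical and this is not the package's implementation; it assumes the (d,n) layout described above and breaks ties by order of appearance, as ordinal ranking does.

```julia
# Hypothetical helper (not part of Copulas.jl): normalized ordinal ranks,
# computed row by row for a sample stored as a (d, n) matrix.
function pseudo_obs(x::AbstractMatrix{<:Real})
    d, n = size(x)
    u = Matrix{Float64}(undef, d, n)
    for j in 1:d
        # sortperm is stable, so tied values are ranked by order of appearance.
        perm = sortperm(view(x, j, :))
        for (rank, i) in enumerate(perm)
            u[j, i] = rank / (n + 1)
        end
    end
    return u
end

# Toy sample with d = 2 marginals and n = 4 observations.
x = [0.3 -1.2  2.5  0.9;
     10.0 40.0 20.0 30.0]
u = pseudo_obs(x)
```

Every entry of `u` lies strictly between 0 and 1, and each row is a permutation of `1/(n+1), ..., n/(n+1)`.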


Deheuvels' empirical copula

From these pseudo-observations, an empirical copula is defined as follows:

Definition (Deheuvels' empirical copula [34]): The empirical distribution function of the normalized ranks,

\[\hat{C}_N(\bm u) = \frac{1}{N} \sum_{i=1}^N \mathbb 1_{\bm u_i \le \bm u},\]

is called the empirical copula function.
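The definition can be evaluated directly by counting. The sketch below uses Base Julia only; `empirical_copula_cdf` is a hypothetical name, and the pseudo-observations are assumed to be stored as a (d,N) matrix.

```julia
# Hypothetical helper: evaluates the empirical copula at a point `x`,
# given pseudo-observations `u` stored as a (d, N) matrix.
function empirical_copula_cdf(u::AbstractMatrix{<:Real}, x::AbstractVector{<:Real})
    d, N = size(u)
    length(x) == d || throw(DimensionMismatch("x must have length $d"))
    # Count the observations that are componentwise below x.
    hits = count(i -> all(u[j, i] <= x[j] for j in 1:d), 1:N)
    return hits / N
end

# Toy pseudo-observations with d = 2 and N = 4.
u = [0.2 0.4 0.6 0.8;
     0.4 0.2 0.8 0.6]
c = empirical_copula_cdf(u, [0.5, 0.5])  # two of the four points lie below (0.5, 0.5)
```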

Theorem (Exhaustivity and consistency [34]): $\hat{C}_N$ is an exhaustive estimator of $C$, and moreover for any normalizing constants $\{\phi_N, N\in \mathbb N\}$ such that $\lim\limits_{N \to \infty} \phi_N \sqrt{N^{-1}\ln \ln N} = 0$,

\[\lim\limits_{N\to\infty} \phi_N \sup_{\bm u \in [0,1]^d} \lvert\hat{C}_N(\bm u) - C(\bm u) \rvert = 0 \text{ a.s.}\]

The estimator $\hat{C}_N$ then converges (weakly) to $C$, the true copula of the random vector $\bm X$, as the number of observations $N$ goes to infinity.

The empirical copula is not a true copula

Despite its name, $\hat{C}_N$ is not a copula since it does not have uniform marginals. Be careful.
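To see why, here is a short worked computation, assuming there are no ties in the sample. Each margin of the pseudo-observations then takes exactly the values $1/(N+1),\dots,N/(N+1)$, so the $j$-th margin of $\hat{C}_N$ is

\[\hat{C}_N(1,\dots,1,t,1,\dots,1) = \frac{1}{N}\sum_{k=1}^N \mathbb 1_{\frac{k}{N+1} \le t} = \frac{\min\left(N, \lfloor (N+1)t \rfloor\right)}{N}, \quad t \in [0,1],\]

which is a step function and not the uniform distribution function $t \mapsto t$. For instance, with $N = 4$ and $t = 0.3$ this gives $\lfloor 1.5 \rfloor / 4 = 0.25 \neq 0.3$.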

In the package, this copula is implemented as the EmpiricalCopula:

Copulas.EmpiricalCopula - Type
EmpiricalCopula{d,MT}

Fields:

  • u::MT - the matrix of observations.

Constructor

EmpiricalCopula(u;pseudos=true)

The EmpiricalCopula in dimension $d$ is parameterized by a pseudo-data matrix which should have shape (d,N). Its expression is given as:

\[C(\mathbf x) = \frac{1}{N}\sum_{i=1}^N \mathbf 1_{\mathbf u_i \le \mathbf x}\]

This function is very practical, but be aware that it is not a true copula (since $\mathbf u$ are only pseudo-observations). The constructor lets you pass pseudo-observations directly (the default), or it will compute them for you. You can then compute the cdf of the copula and sample from it through the standard interface.

References:

  • [3] Nelsen, Roger B. An introduction to copulas. Springer, 2006.

Beta copula

The empirical copula function is not a copula. An easy way to fix this problem is to smooth out the marginals with beta distribution functions:

Definition (Beta Copula [35]): Denoting $F_{N,r}(x) = \sum_{s=r}^N \binom{N}{s} x^s(1-x)^{N-s}$ the distribution function of a $\mathrm{Beta}(r,N+1-r)$ random variable, the function

\[\hat{C}_N^\beta : \bm x \mapsto \frac{1}{N} \sum_{i=1}^N \prod\limits_{j=1}^d F_{N,(N+1)u_{i,j}}(x_j)\]

is a genuine copula, called the Beta copula.
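Although the package does not provide this estimator yet, the formula is simple enough to sketch with Base Julia. The names `beta_cdf` and `beta_copula_cdf` are hypothetical. The sketch takes the integer ranks $(N+1)u_{i,j}$ as a (d,N) matrix and uses the binomial sum from the definition, which is only suitable for small $N$ because `binomial` on machine integers overflows for large arguments.

```julia
# Hypothetical helpers, for illustration only.
# Distribution function of a Beta(r, N + 1 - r) variable, via the binomial sum.
beta_cdf(N::Integer, r::Integer, x::Real) =
    sum(binomial(N, s) * x^s * (1 - x)^(N - s) for s in r:N)

# Empirical beta copula evaluated at `x`, from the (d, N) matrix of integer ranks.
function beta_copula_cdf(ranks::AbstractMatrix{<:Integer}, x::AbstractVector{<:Real})
    d, N = size(ranks)
    total = 0.0
    for i in 1:N
        total += prod(beta_cdf(N, ranks[j, i], x[j]) for j in 1:d)
    end
    return total / N
end

# Ranks of a toy sample with d = 2 and N = 4.
ranks = [1 2 3 4;
         2 1 4 3]
m1 = beta_copula_cdf(ranks, [0.3, 1.0])  # first margin evaluated at 0.3
```

Unlike the empirical copula, the margins here are exactly uniform: averaging $F_{N,r}(t)$ over $r = 1,\dots,N$ gives the mean of a binomial variable divided by $N$, that is, $t$.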

Property (Proximity of $\hat{C}_N$ and $\hat{C}_N^\beta$ [35]):

\[\sup\limits_{\bm u \in [0,1]^d} \lvert \hat{C}_N(\bm u) - \hat{C}_N^\beta(\bm u) \rvert \le d\left(\sqrt{\frac{\ln N}{N}} + \sqrt{\frac{1}{N}} + \frac{1}{N}\right)\]

Not implemented yet!

Do not hesitate to come talk on our GitHub!

Bernstein Copula

Bernstein copulas are simply another smoothing of the empirical copula using Bernstein polynomials.

Not implemented yet!

Do not hesitate to come talk on our GitHub!

Checkerboard Copulas

There are other nonparametric estimators of the copula function that are true copulas. Of interest to our work is the Checkerboard construction (see [36, 37]), detailed below.

First, for any $\bm m \in \mathbb N^d$, let $\left\{B_{\bm i,\bm m}, \bm i < \bm m\right\}$ be a partition of the unit hypercube defined by

\[B_{\bm i, \bm m} = \left]\frac{\bm i}{\bm m}, \frac{\bm i+1}{\bm m}\right].\]

Furthermore, for any copula $C$ (or more generally distribution function $F$), we denote by $\mu_{C}$ (resp. $\mu_F$) the associated measure. For example, for the independence copula $\Pi$, $\mu_{\Pi}(A) = \lambda(A \cap [\bm 0, \bm 1])$ where $\lambda$ is the Lebesgue measure.

Definition (Empirical Checkerboard copulas [36]): Let $\bm m \in \mathbb N^d$. The $\bm m$-Checkerboard copula $\hat{C}_{N,\bm m}$, defined by

\[\hat{C}_{N,\bm m}(\bm x) = \bm m^{\bm 1} \sum_{\bm i < \bm m} \mu_{\hat{C}_N}(B_{\bm i, \bm m}) \mu_{\Pi}(B_{\bm i, \bm m} \cap [0,\bm x]),\]

is a genuine copula as soon as $m_1,\dots,m_d$ all divide $N$. Here $\bm m^{\bm 1} = \prod_{j=1}^d m_j$ is the number of boxes in the partition.

Property (Consistency of $\hat{C}_{N,\bm m}$ [36]): If all $m_1,\dots,m_d$ divide $N$,

\[\sup\limits_{\bm u \in [0,1]^d} \lvert \hat{C}_{N,\bm m}(\bm u) - C(\bm u) \rvert \le \frac{d}{2m} + \mathcal O_{\mathbb P}\left(N^{-\frac{1}{2}}\right).\]

This copula is called Checkerboard, as it fills the unit hypercube with hyperrectangles $B_{\bm i, \bm m}$ of the same shape, conditionally on which the distribution is uniform, and the mixing weights are the empirical frequencies of the hyperrectangles.
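This mixture reading leads to a short evaluation routine, sketched here with Base Julia. The name `checkerboard_cdf` is hypothetical and nothing here is part of the package. Rather than looping over boxes, the sketch loops over observations: each one contributes weight $1/N$, spread uniformly over the box that contains it, which gives the same sum.

```julia
# Hypothetical helper: m-Checkerboard copula evaluated at `x`, from
# pseudo-observations `u` stored as a (d, N) matrix and grid sizes `m`.
function checkerboard_cdf(u::AbstractMatrix{<:Real}, m::AbstractVector{<:Integer},
                          x::AbstractVector{<:Real})
    d, N = size(u)
    total = 0.0
    for i in 1:N
        w = 1.0
        for j in 1:d
            # Index k of the box side ]k/m_j, (k+1)/m_j] containing u[j, i].
            k = ceil(Int, m[j] * u[j, i]) - 1
            # Fraction of that side lying below x[j].
            w *= clamp(m[j] * x[j] - k, 0.0, 1.0)
        end
        total += w
    end
    return total / N
end

# Toy pseudo-observations with d = 2 and N = 4, on a 2 x 2 grid.
u = [0.2 0.4 0.6 0.8;
     0.4 0.2 0.8 0.6]
c = checkerboard_cdf(u, [2, 2], [0.75, 0.75])
```

In this toy case both grid sizes divide $N = 4$, and the margins come out uniform, for instance `checkerboard_cdf(u, [2, 2], [0.25, 1.0])` returns `0.25`.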

Note that the hyperrectangles need not be filled with a uniform distribution ($\mu_{\Pi}$), as long as they are filled with copula measures and weighted according to the empirical measure in them (or to any other copula). The direct extension is then the more general patchwork copulas, whose construction is detailed below.

Denoting $B_{\bm i, \bm m}(\bm x) = B_{\bm i, \bm m} \cap [0,\bm x]$, and since $\mu_{\Pi}(B_{\bm i, \bm m}) = 1/\bm m^{\bm 1}$, we have:

\[\begin{align} \bm m^{\bm 1}\mu_{\Pi}(B_{\bm i, \bm m} \cap [0,\bm x]) &= \frac{\mu_{\Pi}(B_{\bm i, \bm m} \cap [0,\bm x])}{\mu_{\Pi}(B_{\bm i, \bm m})}\\ &= \frac{\mu_{\Pi}(B_{\bm i, \bm m}(\bm x))}{\mu_{\Pi}(B_{\bm i, \bm m})}\\ &= \mu_{\Pi}(\bm m B_{\bm i, \bm m}(\bm x)) \end{align}\]

where we write $\bm m ]\bm a, \bm b] = ] \bm m \bm a, \bm m \bm b]$ (products between vectors are componentwise).
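As a quick check of the first equality, which only uses $\mu_{\Pi}(B_{\bm i,\bm m}) = \prod_{j=1}^d m_j^{-1}$, take an illustrative case (values chosen here for the example): $d = 2$, $\bm m = (2,2)$, $\bm i = (1,1)$ and $\bm x = (0.75, 0.75)$. Then

\[\begin{align} B_{\bm i,\bm m} &= \left]0.5, 1\right]^2, & \mu_{\Pi}(B_{\bm i,\bm m}) &= \tfrac{1}{4},\\ B_{\bm i,\bm m}(\bm x) &= \left]0.5, 0.75\right]^2, & \mu_{\Pi}(B_{\bm i,\bm m}(\bm x)) &= \tfrac{1}{16}, \end{align}\]

so the ratio is $\tfrac{1}{16} / \tfrac{1}{4} = \tfrac{1}{4}$, which is indeed $\prod_j m_j \cdot \mu_{\Pi}(B_{\bm i,\bm m}(\bm x)) = 4 \times \tfrac{1}{16}$. A point falling in this box therefore contributes one quarter of its weight to the distribution function at $\bm x$.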

This allows for an easy generalization in the framework of patchwork copulas:

Definition (Patchwork copulas [38–40]): Let $\bm m \in \mathbb N^d$ all divide $N$, and let $\mathcal C = \{C_{\bm i}, \bm i < \bm m\}$ be a given collection of copulas. The distribution function:

\[\hat{C}_{N,\bm m, \mathcal C}(\bm x) = \sum_{\bm i < \bm m} \mu_{\hat{C}_N}(B_{\bm i, \bm m}) \mu_{C_{\bm i}}(\bm m B_{\bm i, \bm m}(\bm x))\]

is a copula.
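A rough sketch of this construction in Base Julia, under two simplifying assumptions that are not in the definition: the same local copula is used in every box (passed as a function `local_cdf`), and $\mu_{C_{\bm i}}(\bm m B_{\bm i,\bm m}(\bm x))$ is read as the local copula evaluated at $\bm x$ rescaled to the unit cube of box $\bm i$. The names `patchwork_cdf`, `independence` and `comonotone` are hypothetical.

```julia
# Hypothetical helper: patchwork copula with one local copula for all boxes.
# `u` is a (d, N) matrix of pseudo-observations, `m` the grid sizes, and
# `local_cdf` maps a point of [0,1]^d to the value of the local copula there.
function patchwork_cdf(u::AbstractMatrix{<:Real}, m::AbstractVector{<:Integer},
                       x::AbstractVector{<:Real}, local_cdf)
    d, N = size(u)
    total = 0.0
    for i in 1:N
        # Coordinates of x relative to the box containing observation i,
        # rescaled to [0,1]^d and clamped at the box boundaries.
        v = [clamp(m[j] * x[j] - (ceil(Int, m[j] * u[j, i]) - 1), 0.0, 1.0) for j in 1:d]
        total += local_cdf(v)
    end
    return total / N
end

independence(v) = prod(v)     # local independence: recovers the Checkerboard copula
comonotone(v) = minimum(v)    # local comonotonicity (upper Fréchet bound)

u = [0.2 0.4 0.6 0.8;
     0.4 0.2 0.8 0.6]
p_indep = patchwork_cdf(u, [2, 2], [0.75, 0.75], independence)
p_como = patchwork_cdf(u, [2, 2], [0.75, 0.75], comonotone)
```

Changing the local copula changes the value inside the boxes (0.625 versus 0.75 here) but leaves the margins uniform.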

In fact, replacing $\hat{C}_N$ by any copula in the patchwork construction still yields a genuine copula, and the condition that all components of $\bm m$ divide $N$ is then no longer needed. Checkerboard grids are practical in the sense that computations associated with a Checkerboard copula can be really fast: if the grid is coarse, the number of boxes is small, and if the grid is very refined, many boxes are probably empty. On the other hand, the grid is fixed a priori; see [41] for a construction with an adaptive grid.

Convergence results for this kind of copula can be found in [40], with a slightly different parametrization.

Not implemented yet!

Do not hesitate to come talk on our GitHub!

[3]
R. B. Nelsen. An Introduction to Copulas. 2nd Edition, Springer Series in Statistics (Springer, New York, 2006).
[34]
P. Deheuvels. La Fonction de Dépendance Empirique et Ses Propriétés. Académie Royale de Belgique. Bulletin de la Classe des Sciences 65, 274–292 (1979).
[35]
J. Segers, M. Sibuya and H. Tsukahara. The Empirical Beta Copula. Journal of Multivariate Analysis 155, 35–51 (2017).
[36]
A. Cuberos, E. Masiello and V. Maume-Deschamps. Copulas Checker-Type Approximations: Application to Quantiles Estimation of Sums of Dependent Random Variables. Communications in Statistics - Theory and Methods, 1–19 (2019).
[37]
P. Mikusiński and M. D. Taylor. Some Approximations of N-Copulas. Metrika 72, 385–414 (2010).
[38]
F. Durante, E. Foscolo, J. A. Rodríguez-Lallena and M. Úbeda-Flores. A Method for Constructing Higher-Dimensional Copulas. Statistics 46, 387–404 (2012).
[39]
F. Durante, J. Fernández Sánchez and C. Sempi. Multivariate Patchwork Copulas: A Unified Approach with Applications to Partial Comonotonicity. Insurance: Mathematics and Economics 53, 897–905 (2013).
[40]
F. Durante, J. Fernández-Sánchez, J. J. Quesada-Molina and M. Úbeda-Flores. Convergence Results for Patchwork Copulas. European Journal of Operational Research 247, 525–531 (2015).
[41]
O. Laverny. Empirical and Non-Parametric Copula Models with the Cort R Package. Journal of Open Source Software 5, 2653 (2020).