Influence of the method of estimation

The possibility to fit directly the copula and the marginals through the SklarDist interface is very practical if you want a quick and dirty fit to be produce for a given dataset. It works as follows:

using Copulas, Distributions
X₁ = Normal()
X₂ = LogNormal()
X₃ = Gamma()
C = GaussianCopula([
    1.0 0.4 0.1
    0.4 1.0 0.8
    0.1 0.8 1.0
])
D = SklarDist(C,(X₁,X₂,X₃))
x = rand(D,100)

3×100 Matrix{Float64}:
 -0.552908  0.965012  -0.605342  1.41302   …  0.0439251  -0.613191  0.171789
  0.272467  3.29432    2.60837   1.51639      0.911216    0.151278  3.30615
  0.377835  1.76531    3.49677   0.686516     1.96904     0.139437  2.70501

And we can fit the same model directly as follows:

quick_fit = fit(SklarDist{GaussianCopula, Tuple{Normal,LogNormal,Gamma}}, x)

SklarDist{GaussianCopula{3, Matrix{Float64}}, Tuple{Distributions.Normal{Float64}, Distributions.LogNormal{Float64}, Distributions.Gamma{Float64}}}(
C: GaussianCopula{3, Matrix{Float64}}(
Σ: [0.9999999999999997 0.3540155478114256 0.06115369401632672; 0.3540155478114256 0.9999999999999998 0.8111928759812674; 0.06115369401632672 0.8111928759812674 0.9999999999999998]
)

m: (Distributions.Normal{Float64}(μ=-0.053257617060137104, σ=1.0343943081663323), Distributions.LogNormal{Float64}(μ=0.0014500787448678465, σ=1.0138149438525002), Distributions.Gamma{Float64}(α=1.0042102493219283, θ=1.0090108450674184))
)

However, we should be clear about what this is doing. There are several way of estimating compound parametric models:

Joint estimation (JMLE): This method simply compute the joint loglikelyhood $\log f_D$ of the random vector D and maximize, jointly, w.r.t. the parameters of the dependence structure and the parameters of the marginals. This is the easiest to understand, but not the easiest numerically as the produces loglikelihood can be highly non-linear and computations can be tedious.
Inference functions for margins (IFM): This method splits up the process into two parts: the marginal distributions $F_{i}, i \in 1,..d$ are estimated first, separately from each other, through maximum likelihood on marginal data. Denote $\hat{F}_{i}$ these estimators. Then, we fit the copula on pseudo-observations, but these pseudo-observations could be computed in two different ways:
- IFM1: We use empirical ranks to compute the pseudo-observations.
- IFM2: We leverage the estimated distribution functions $\hat{F}_{i}$ for the marginals to compute the pseudo-observations as $u_{i,j} = $\hat{F}_{i}(x_{i,j})$.

The fit(SklarDist{...},...) method in the Copulas.jl package is implemented as follows:

function Distributions.fit(::Type{SklarDist{CT,TplMargins}},x) where {CT,TplMargins}
    # The first thing to do is to fit the marginals : 
    @assert length(TplMargins.parameters) == size(x,1)
    m = Tuple(Distributions.fit(TplMargins.parameters[i],x[i,:]) for i in 1:size(x,1))
    u = pseudos(x)
    C = Distributions.fit(CT,u)
    return SklarDist(C,m)
end

and so clearly performs IFM1 estimation. IFM2 is not much harder to implement, it could be done for our model as follows:

# Marginal fits are the same than IFM1, so we just used those to compute IFM2 ranks:
u = similar(x)
for i in 1:length(C)
    u[i,:] .= cdf.(Ref(quick_fit.m[i]), x[i,:])
end

# estimate a Gaussian copula:
ifm2_cop = fit(GaussianCopula,u)

GaussianCopula{3, Matrix{Float64}}(
Σ: [1.0000000000000002 0.35989801308773267 0.057874968238708; 0.35989801308773267 1.0000000000000002 0.8136277685917885; 0.057874968238708 0.8136277685917885 0.9999999999999997]
)

Let us compare the two obtained correlation matrices:

ifm2_cop.Σ .- quick_fit.C.Σ

3×3 Matrix{Float64}:
  5.55112e-16  0.00588247   -0.00327873
  0.00588247   4.44089e-16   0.00243489
 -0.00327873   0.00243489   -1.11022e-16

We see that the estimated parameter is not exactly the same, which is normal. However, even in this contrived example, the difference between the two is not striking. Whether one method is better than the other is unclear, but the JMLE method is clearly superior by definition. However, due to its complexity, most software do not perform such an estimation.