The ceser package contains an implementation in R of the Cluster Estimated Standard Error (CESE) method proposed by Jackson (2019). The method estimates the covariance matrix of the estimated coefficients of linear models in grouped data sets with correlation among observations within groups. CESE is an alternative solution for the classical Cluster Robust Standard Error (CRSE) (Greene (2012),Eicker (1967),White (1980),Liang and Zeger (1986),MacKinnon and Webb (2017)), which underestimates the standard errors in most of the situations encountered in practice (Esarey and Menger (2018)).
Jackson (2019) proposes an approach labeled CESE to estimate the standard errors in grouped data with within-group correlation in the residuals. The approach is based on the estimated expectation of the product of the residuals. Assuming that the residuals have the same variance-covariance matrix within the groups, if we denote by \(\sigma_{ig} = \sigma_{g}^{2}\) and \(\rho_{ig} = \rho_{g}\) the variance and covariance, respectively, of the residuals within the group \(g\), then the expectation of the product of the residuals is given by (see Jackson (2019) for details):
\[\begin{align} \displaystyle\label{eq-exp-ee} \begin{split} \Sigma_g = \mathbb{E}[e_g e_g^T] = \sigma_{g}^{2} (I_g - P_g) + \rho_{g} \bigg[ \iota_{g} \iota_{g}^{T} - (I_g - P_g) - (P_g \iota_{g} \iota_{g}^{T} + \iota_{g} \iota_{g}^{T} P_g) \\ + X_g(X^{T}X)^{-1} \left( \sum_{g=1}^{G} X_{g}^{T}\iota_{g} \iota_{g}^{T} X_g \right)(X^{T}X)^{-1}X_g \bigg] \end{split} \end{align}\]
where \(\iota_{g}\) is a unitary column vector, \(I_g\) is a \(g \times g\) identity matrix, and \(P_g = X_g (X^{T}X)^{-1}X_{g}^{T}\). The equation above can be rewritten concisely as:
\[\begin{align} \displaystyle\label{eq-exp-ee-reduced} \Sigma_{g}^{} = \sigma_{g}^{2} Q_{1g} + \rho_{g} Q_{2g}. \end{align}\]
This equation explicitly shows that the expectation of the cross-product of the residuals is a function the data through \(Q_{1g}\), \(Q_{2g}\), the unknown variance \(\sigma_{g}^{2}\), and correlation \(\rho_{g}\) of the residuals \(e_g\) in each group \(g\). The CESE solution is to explore the linear structure of that equation and estimate \(\sigma_{g}^{2}\) and \(\rho_{g}\) as if the estimated values of \(e_g e_{g}^{T}\) were random deviances from their expectations. Denote \(\xi\) that deviance. Then
\[\begin{align} \displaystyle\label{eq-cese-reg} \begin{split} e_g e_{g}^{T} &= \mathbb{E}[e_g e_{g}^{T}] + \xi \\ &= \sigma_{g}^{2} Q_{1g} + \rho_{g} Q_{2g} + \xi \\ &= \Sigma_{g}^{} + \xi. \end{split} \end{align}\]
The estimates of \(\sigma_{g}^{2}\) and \(\rho_{g}\) are obtained using the OLS estimator. That is, if we denote \(\Omega_g = (\sigma_{g}^{2} , \rho_{g} )^{T}\), \(q_{1g}\) (or \(q_{2g}\)) the vectorized diagonal and lower triangle of \(Q_{1g}\) (or \(Q_{2g}\)) stacked into a \(n_g(n_g + 1)/2\) column vector, \(q_g = [q_{1g}, q_{2g}]\), and \(s_{eg}\) the corresponding elements of \(e_g e_{g}^{T}\) stacked into a column vector as well, then the OLS CESE estimator \(\hat{\Omega}_{g} = (\hat{\sigma}_{g}^{2} , \hat{\rho}_{g} )^{T}\) of the variance and correlation of the residuals in group \(g\) is given by
\[ \hat{\Omega}_g = \text{arg min}_{\Omega_g } (s_{eg} - q_{g}\Omega_{g} )^{T}(s_{eg} - q_{g}\Omega_{g} ). \]
If we assume that \(q_{g}^{T}q_{g}\) is invertible, the first order condition gives:
\[\begin{align} \displaystyle\label{eq-cese-concise} \hat{\Omega}_{g} = (q_g^{T}q_g)^{-1}q_g^{T}s_{eg}. \end{align}\]
We can rewrite the equation above as:
\[\begin{align} \displaystyle\label{eq-cese-matrix } \begin{bmatrix} \hat{\sigma}_{g}^{2} \\ \hat{\rho}_{g} \end{bmatrix} = \begin{bmatrix} q_{1g}^{T}q_{1g} & q_{1g}^{T}q_{2g}\\ q_{2g}^{T}q_{1g} & q_{2g}^{T}q_{2g} \end{bmatrix}^{-1} \begin{bmatrix} q_{1g}^{T}s_{eg} \\ q_{2g}^{T}s_{eg} \end{bmatrix}. \end{align}\]
The estimators of \(\sigma_{g}^{2}\) and \(\rho_{g}\) do not require any assumption on \(\xi\), unless we want to construct confidence intervals for the estimates of those parameters.
Jackson (2019) shows that CESE produces larger standard errors for the coefficients and much more conservative confidence intervals than the CRSE, which is known to be biased downward. CESE is also less sensitive to the number of clusters and to the heterogeneity of the clusters, which can be a problem for both CRSE and bootstrap methods.
The package CESER provides a function
vcovCESE() that takes the output of the function
lm() (or any other that produces compatible outputs) and
computes the CESE. The basic structure of the function is:
The parameter mod receives the output of the
lm() function. The parameter cluster can
receive a right-hand side R formula with the summation of the variables
in the data that will be used to cluster the standard errors. For
instance, if one wants to cluster the standard error by country, one can
use:
To cluster by country and gender, simply use:
The parameter cluster can also receive, instead of a
formula, a string vector with the name of the variables that contain the
groups to cluster the standard errors. If cluster = NULL,
each observation is considered its own group to cluster the standard
errors.
The parameter type receives the procedure to use for
heterokedasticity correction. Heterokedasticity occurs when the diagonal
elements of \(\Sigma\) are not constant
across observations. The correction can also be used to deal with
underestimation of the true variance of the residuals due to leverage
produced by outliers. We include five types of correction. In
particular, type can be either “HC0”, “HC1”, “HC2”, “HC3”,
and “HC4” (Hayes and Cai (2007)). Denote
\(e_c\) the corrected residuals. Each
option produce the following corretion:
where \(k\) is the number of covariates, \(h_{ii}\) is the \(i^{th}\) diagonal element of the matrix \(P=X(X^{T}X)^{-1}X^{T}\), and \(\delta_{i} = \min(4, h_{ii}\frac{n}{k} )\).
The estimation also corrects for cases in which \(\rho_{g} > \sigma^2{g}\). Following Jackson (2019), we use \(\hat{\sigma}_{g}^{2} = (\hat{\rho}_g + 0.02 )\) in those cases.
To ilustrate how to use the ceser package, we use
the data set dcese, which is provided with the package. The
data set contains information of 310 (i=1,…, 310) observations across 51
countries (g=1,…,51).
The outcome variable is the number of effective legislative parties (enep). The explanatory variables are: the number of presidential candidates (enpc); a measure of presidential power (fapres); the proximity of presidential and legislative elections (proximity); the effective number of ethnic groups (eneg); the log of average district magnitudes (logmag); an interaction term between the number of presidential candidates and the presidential power (enpcfapres = enpc \(\times\) fapres), and another interaction term between the log of the district magnitude and the number of ethnic groups (logmag_eneg = logmag \(\times\) eneg). In the example below, we regress enpc on fapres, enpc, their interaction, and other controls.