Introduction

Factor analysis is one of the most widely used methods in the social and behavioural sciences. When a single latent factor fits the covariance among a set of observed indicators, researchers often treat the latent variable as the causally relevant quantity in subsequent analyses. This practice implicitly assumes a structural interpretation of the latent factor model: that the latent variable itself, rather than the individual indicators, is what is causally efficacious.

But a measurement model that fits well does not guarantee that this structural interpretation is correct. The indicators could each have their own direct causal effects on an outcome, even though their covariance structure is perfectly consistent with a single latent factor. If the structural interpretation is wrong, causal conclusions treating the latent variable as the cause may be misleading.

VanderWeele and Vansteelandt (2022) showed that the structural interpretation imposes testable empirical constraints. The structest package implements two statistical tests, \(T_0\) and \(T_1\), that can detect violations of these constraints, giving researchers a principled way to evaluate whether the structural interpretation is tenable.

The Model

Consider a reflective latent factor model with \(d\) indicators (Equation 1 in the paper). After standardisation so that \(\eta\) has mean zero and unit variance, each centred indicator is:

\[ X_i = \lambda_i\,\eta + \varepsilon_i, \qquad i = 1, \ldots, d \tag{1} \]

where \(\eta\) is the latent variable, \(\lambda_i\) are the reliabilities, and the \(\varepsilon_i\) are mean-zero random errors independent of \(\eta\) and of each other. (Following the paper, we use the term “reliabilities” for \(\lambda_i\); in mainstream psychometrics these are more commonly known as factor loadings.)

Under the structural interpretation, the latent variable \(\eta\) is the causally relevant quantity: the indicators \((X_1, \ldots, X_d)\) do not have causal effects on anything subsequent, and they are only affected by antecedents through \(\eta\). This is a strong assumption—the basic measurement model (1) fitting well does not imply it. If we have any external variable \(Z\) and the structural latent factor model holds, then \(Z\) must be independent of \((X_1, \ldots, X_d)\) conditional on \(\eta\). This yields the following testable constraint:

Theorem 1 (VanderWeele & Vansteelandt, 2022). Suppose that \(Z\) is independent of \((X_1, \ldots, X_d)\) conditional on \(\eta\) and that the basic latent factor model in Equation (1) holds. Then for any \(i\) and \(j\), and any values \(z\) and \(z^*\), we must have

\[ \lambda_i \bigl\{E(X_j \mid Z=z) - E(X_j \mid Z=z^*)\bigr\} = \lambda_j \bigl\{E(X_i \mid Z=z) - E(X_i \mid Z=z^*)\bigr\}. \]

Equivalently, dividing by the reliabilities:

\[ \frac{E(X_j \mid Z = z) - E(X_j \mid Z = z^*)}{\lambda_j} = \frac{E(X_i \mid Z = z) - E(X_i \mid Z = z^*)}{\lambda_i}, \qquad \forall\; i, j. \]

The mean differences scaled by reliabilities must be constant across indicators.
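This constraint is easy to check directly on simulated data. The following sketch (our own illustration, not package code) generates indicators under a structural model in which \(Z\) shifts \(\eta\) by 0.5, so every reliability-scaled mean difference should be close to 0.5:

```r
set.seed(1)
n <- 1e5
z <- rbinom(n, 1, 0.5)
eta <- 0.5 * z + rnorm(n)                  # Z acts on the indicators only through eta
lambda <- c(1, 0.7, 0.4)
X <- sapply(lambda, function(l) l * eta + rnorm(n))

# Reliability-scaled mean differences across the two Z-levels:
diffs <- colMeans(X[z == 1, ]) - colMeans(X[z == 0, ])
diffs / lambda                             # approximately equal (about 0.5) for every i
```

Up to sampling noise, the three scaled differences coincide, exactly as Theorem 1 requires.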

Note that Theorem 1 is completely general: \(Z\) can be continuous or discrete, and the identity holds for any external variable. The Appendix of the paper further generalises the result to allow conditioning on covariates \(C\) and multivariate latent variables \(\eta\) (provided the number of indicators exceeds the dimension of \(\eta\)). However, the \(T_0\) and \(T_1\) tests implemented here require \(Z\) to be discrete with a finite number of levels, because the moment conditions are parameterised using category indicators \(I(z = w)\). In practice, a continuous \(Z\) must therefore be discretised (e.g., into quantile groups) before applying the tests.
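As a concrete example of such discretisation, a continuous \(Z\) can be cut at its sample quartiles with base R (a generic idiom, not a structest function; the variable names are ours):

```r
set.seed(1)
z_cont <- rnorm(500)                       # a continuous auxiliary variable
z_disc <- cut(z_cont,
              breaks = quantile(z_cont, probs = seq(0, 1, 0.25)),
              include.lowest = TRUE, labels = FALSE)
table(z_disc)                              # four equal-sized groups (125 each)
```

The resulting `z_disc` takes values 1 through 4 and can be passed to the tests as a discrete \(Z\).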

The \(T_0\) and \(T_1\) tests formalise the constraint in Theorem 1 using distance metric statistics from the generalised method of moments (GMM) framework (Newey & McFadden, 1994). Here the moment conditions are restrictions on the conditional means \(E(X_i \mid Z = z_j)\): under the null, the residuals (observed minus predicted by the model) should average to zero. GMM stacks these residuals into a vector and measures how far their sample average deviates from zero using a variance-weighted quadratic form; the resulting statistic is asymptotically \(\chi^2\)-distributed under the null.

The \(T_0\) Test (Reliability-Dependent)

The \(T_0\) test (Section 3.2 of the paper) requires pre-estimated reliabilities \(\hat\lambda_i\) (see Estimating Reliabilities below). Consider a discrete \(Z\) with categories \(\{1, \ldots, p\}\). Define \(\gamma_i = E(X_i \mid Z = 1)\) and \(\beta_w = E(X_1 \mid Z = w) - E(X_1 \mid Z = 1)\). Under the null hypothesis, the conditional expectations follow the restricted model (Equation 2 in the paper):

\[ E(X_i \mid Z = z) = \gamma_i + \frac{\lambda_i}{\lambda_1} \sum_{w=2}^{p} \beta_w\, I(z = w) \tag{2} \]

for \(i = 1, \ldots, d\), where the reliabilities \(\lambda_i\) are treated as known (plugged in from the estimation step). This model has \(d + p - 1\) free parameters (\(\gamma_1, \ldots, \gamma_d\) and \(\beta_2, \ldots, \beta_p\), with \(\beta_1 = 0\) as the reference level).

Moment conditions and test statistic

Under the null, the model in Equation (2) implies that for each indicator \(i\) and Z-level \(j\), \(E[X_i \mid Z = z_j] - \gamma_i - (\lambda_i/\lambda_1)\sum\beta_w I(z_j = w) = 0\). GMM constructs moment conditions from these restrictions by defining, for each observation \(k\), the residual \(X_{ik}\) minus its predicted value under the model. Let \(U_k\) be the \((d \times p)\)-dimensional column vector with elements

\[ I(Z_k = z_j)\!\left(X_{ik} - \gamma_i - \frac{\lambda_i}{\lambda_1}\sum_{w=2}^{p}\beta_w\,I(z_j = w)\right) \]

for \(z_j = 1, \ldots, p\) and \(i = 1, \ldots, d\). If the model is correctly specified, \(E[U_k] = 0\); the test measures how far the sample average \(\bar U = N^{-1}\sum_k U_k\) deviates from zero. The distance metric statistic is

\[ T_0 = N\,\bar U^\top\, \Sigma^{-1}\, \bar U, \]

where \(\Sigma\) is the empirical covariance matrix of \(U_k\) after adjusting for the uncertainty in the estimated reliabilities (see below). This quadratic form measures the squared distance of \(\bar U\) from zero, weighted by \(\Sigma^{-1}\) to standardise each moment condition by its variance and account for correlations between them. The minimum of \(T_0\) over the free parameters converges to a \(\chi^2\) distribution with \((d - 1)(p - 1)\) degrees of freedom under the null (Theorem 9.2 of Newey & McFadden, 1994).
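The computation can be sketched in a few lines of base R. The code below is a deliberately simplified illustration (our own, not the package implementation): it treats the reliabilities as known, uses a continuously updated weight matrix, and omits the variance adjustment for estimated reliabilities, so it should not be used in place of test_t0():

```r
set.seed(1)
n <- 4000; d <- 3; p <- 2
lambda <- c(1, 0.8, 0.6)
z <- rbinom(n, 1, 0.5)
eta <- 0.4 * z + rnorm(n)
X <- sapply(lambda, function(l) l * eta + rnorm(n, sd = 0.5))

# Quadratic-form objective in theta = (gamma_1, ..., gamma_d, beta_2)
T0_obj <- function(theta) {
  resid <- sweep(X, 2, theta[1:d]) - outer(theta[d + 1] * z, lambda / lambda[1])
  U <- cbind(resid * (z == 0), resid * (z == 1))   # n x (d*p) moment contributions
  Ubar <- colMeans(U)
  n * drop(crossprod(Ubar, solve(cov(U), Ubar)))
}
opt <- optim(c(colMeans(X), 0), T0_obj, method = "BFGS")
opt$value                                          # minimised distance-metric statistic
pchisq(opt$value, df = (d - 1) * (p - 1), lower.tail = FALSE)
```

Because these data satisfy the null, the minimised value behaves like a draw from \(\chi^2_{(d-1)(p-1)}\) and the resulting p-value will typically be unremarkable.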

Variance adjustment for estimated reliabilities

In practice the reliabilities \(\lambda_i\) are unknown and replaced by estimates \(\hat\lambda_i\). To account for this additional uncertainty (p. 2041 of the paper), the covariance matrix \(\Sigma\) in \(T_0\) is computed from the adjusted moment contributions

\[ U_k^* = U_k - \left(\frac{1}{N}\sum_{k=1}^{N} \frac{\partial U_k}{\partial \lambda}\right) \left(\frac{1}{N}\sum_{k=1}^{N} \frac{\partial V_k}{\partial \lambda}\right)^{\!-1} V_k, \]

where \(\lambda = (\lambda_1, \ldots, \lambda_d)\) and \(V_k\) is the \(d\)-dimensional vector from the reliability estimating equations with elements

\[ V_{ik} = \sum_{\substack{j=1 \\ j \neq i}}^{d} \lambda_j \left\{\!\left(X_i - \bar X_i\right)\! \left(X_j - \bar X_j\right) - \lambda_i\lambda_j\right\}. \]

The empirical covariance of \(U_k^*\) (rather than of \(U_k\)) is then used as \(\Sigma\), yielding a test statistic that remains \(\chi^2_{(d-1)(p-1)}\) under the null.

When to use: \(d \geq 3\) indicators and \(p \geq 2\) levels of \(Z\).

Example: data consistent with the structural model

set.seed(12345)
n <- 5000
d <- 5

# Simulate under the structural model
z <- rbinom(n, 1, 0.5)                        # binary auxiliary variable
eta <- 1 + 0.5 * z + rnorm(n)                 # latent factor affected by Z
lambda <- c(1.0, 0.8, 0.6, 0.9, 0.7)         # reliabilities
X <- matrix(NA, n, d)
for (i in 1:d) {
  X[, i] <- 2 + lambda[i] * eta + rnorm(n, sd = 0.5)
}

# T0 test (should NOT reject)
result_t0 <- test_t0(X, z)
result_t0
#> 
#>   T0: Reliability-dependent test of structural interpretation (VanderWeele & Vansteelandt, 2022) 
#> 
#> data:   X and z 
#> statistic = 6.075, df = 4, p-value = 0.1936
#> n = 5000, d = 5 indicators, p = 2 Z-levels

Because the data were generated under the structural model, the test statistic is small and the \(p\)-value is large—we fail to reject \(H_0\).

Example: data violating the structural model

Now consider data where \(Z\) has direct effects on each indicator that are not proportional to the reliabilities, and does not act through \(\eta\):

set.seed(12345)
eta <- rnorm(n)                                # Z does not affect eta
X_bad <- cbind(
  2 + 1.0 * eta + 0.5 * z + rnorm(n, sd = 0.3),
  3 + 0.8 * eta + 2.0 * z + rnorm(n, sd = 0.3),
  1 + 0.6 * eta - 0.3 * z + rnorm(n, sd = 0.3),
  2 + 0.9 * eta + 0.4 * z + rnorm(n, sd = 0.3),
  1 + 0.7 * eta + 0.8 * z + rnorm(n, sd = 0.3)
)

# T0 test (should reject)
result_t0_bad <- test_t0(X_bad, z)
result_t0_bad
#> 
#>   T0: Reliability-dependent test of structural interpretation (VanderWeele & Vansteelandt, 2022) 
#> 
#> data:   X_bad and z 
#> statistic = 2226, df = 4, p-value = < 2.2e-16
#> n = 5000, d = 5 indicators, p = 2 Z-levels

The test statistic is large and the \(p\)-value is very small, correctly rejecting the structural interpretation.

We can inspect the full output with summary():

summary(result_t0_bad)
#> 
#>   T0: Reliability-dependent test of structural interpretation (VanderWeele & Vansteelandt, 2022) 
#> 
#> data:   X_bad and z 
#> 
#> Test statistic:  2226 
#> Degrees of freedom:  4 
#> P-value:  < 2.2e-16 
#> 
#> Sample size:  5000 
#> Indicators (d):  5 
#> Z-levels (p):  2 
#> Convergence code:  1 
#> 
#> Parameter estimates:
#>   gamma (intercepts):
#>     X1     X2     X3     X4     X5 
#> 1.7006 2.7836 0.8274 1.7447 0.7968 
#>   beta (Z effects):
#>   z=0   z=1 
#> 0.000 1.728 
#>   lambda (reliabilities):
#> [1] 1.0099 1.0213 0.4555 0.9002 0.8222

The \(T_1\) Test (Reliability-Independent)

The \(T_1\) test (Section 3.3 of the paper) avoids estimating reliabilities entirely, relying instead on an equivalent reformulation of Theorem 1:

Theorem 2 (VanderWeele & Vansteelandt, 2022). Under the basic latent factor model in Equation (1), for any discrete \(Z\) the following are equivalent:

  1. For any \(i\) and \(j\), and any values \(z\) and \(z^*\), \(\lambda_i\{E(X_j \mid Z = z) - E(X_j \mid Z = z^*)\} = \lambda_j\{E(X_i \mid Z = z) - E(X_i \mid Z = z^*)\}\).
  2. For any \(i\) and any values \(z\), \(z^*\), \(E(X_i \mid Z = z) - E(X_i \mid Z = z^*) = \alpha_i\,\beta_z\) for some parameters \(\alpha_i\), \(\beta_z\).

Condition 2 says that the \(d \times (p-1)\) matrix of mean differences \(\Delta_{ij} = E(X_i \mid Z = z_j) - E(X_i \mid Z = z_1)\) has rank \(\leq 1\), since it factors as \(\boldsymbol{\alpha}\,\boldsymbol{\beta}^\top\). Rank \(\leq 1\) means every column of \(\Delta\) is a scalar multiple of every other column: the pattern of mean differences across indicators is the same no matter which Z-levels you compare; only the overall magnitude changes. This can be tested without knowing the \(\lambda_i\).
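The rank condition can be inspected directly: estimate \(\Delta\) from group means and look at its singular values. In this illustrative sketch (our own simulation), \(Z\) acts only through \(\eta\), so every singular value beyond the first should be near zero:

```r
set.seed(1)
n <- 1e5; d <- 4; p <- 3
z <- sample(1:p, n, replace = TRUE)
eta <- c(0, 0.4, 0.9)[z] + rnorm(n)        # Z shifts eta only
lambda <- c(1, 0.8, 0.6, 0.9)
X <- sapply(lambda, function(l) l * eta + rnorm(n))

group_means <- sapply(1:p, function(j) colMeans(X[z == j, , drop = FALSE]))
Delta <- group_means[, -1] - group_means[, 1]   # d x (p - 1) mean-difference matrix
svd(Delta)$d                               # all singular values after the first ~ 0
```

Under a violation of the structural model, the second singular value would be well separated from zero, which is exactly the departure \(T_1\) is designed to detect.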

Under the null, the conditional expectations satisfy (Equation 3 in the paper):

\[ E(X_i \mid Z = z) = \gamma_i + \sum_{w=2}^{p} \alpha_i\,\beta_w\,I(z = w) \tag{3} \]

for \(i = 1, \ldots, d\), with \(\alpha_1 = 1\) for identification (the product \(\alpha_i \beta_z\) is invariant to rescaling \(\alpha_i/\tau\), \(\beta_z \tau\)) and \(\beta_1 = 0\) for the reference level. This model has \(2d + p - 2\) free parameters (\(d\) intercepts \(\gamma_i\), \(d - 1\) free \(\alpha_i\)’s, and \(p - 1\) free \(\beta_w\)’s).

Moment conditions and test statistic

As with \(T_0\), the moment conditions are constructed as residuals: observed minus predicted under the null model. Let \(U_k\) be the \((d \times p)\)-dimensional column vector with elements

\[ I(Z_k = z_j)\!\left(X_{ik} - \gamma_i - \alpha_i\,\beta_{z_j}\right) \]

for \(z_j = 1, \ldots, p\) and \(i = 1, \ldots, d\), with \(\beta_1 = 0\) for the reference level. Under Equation (3), \(E[U_k] = 0\); the test measures how far \(\bar U = N^{-1}\sum_k U_k\) deviates from zero. The distance metric statistic is

\[ T_1 = N\,\bar U^\top\, \Sigma^{-1}\, \bar U, \]

where \(\Sigma\) is the empirical covariance matrix of \(U_k\). Unlike \(T_0\), no variance adjustment is needed since \(T_1\) does not use estimated reliabilities. The minimum of \(T_1\) over the free parameters converges to a \(\chi^2\) distribution with

\[ d \times p - (2d + p - 2) = (d - 1)(p - 2) \]

degrees of freedom under the null.

Because \(T_1\) does not depend on reliability estimates, it is not sensitive to misspecification of the error covariance structure (e.g., correlated or heteroscedastic \(\varepsilon_i\)).

When to use: \(d \geq 2\) indicators and \(p \geq 3\) levels of \(Z\). Preferred over \(T_0\) because it does not rely on error-structure assumptions needed for reliability estimation.

Example: data consistent with the structural model

set.seed(12345)
n <- 5000
d <- 4

# Simulate under the structural model with a 4-level Z
z <- sample(0:3, n, replace = TRUE)
eta <- 1 + 0.3 * (z == 1) + 0.7 * (z == 2) + 1.2 * (z == 3) + rnorm(n)
lambda <- c(1.0, 0.8, 0.6, 0.9)
X <- matrix(NA, n, d)
for (i in 1:d) {
  X[, i] <- 2 + lambda[i] * eta + rnorm(n, sd = 0.5)
}

# T1 test (should NOT reject)
result_t1 <- test_t1(X, z)
result_t1
#> 
#>   T1: Reliability-independent test of structural interpretation (VanderWeele & Vansteelandt, 2022) 
#> 
#> data:   X and z 
#> statistic = 3.017, df = 6, p-value = 0.8068
#> n = 5000, d = 4 indicators, p = 4 Z-levels

Again, the test does not reject—consistent with data generated under the structural model.

Example: data violating the structural model

set.seed(12345)
eta <- rnorm(n)
X_bad <- cbind(
  2 + 1.0 * eta + 0.5 * (z == 1) + 0.3 * (z == 2) + 0.4 * (z == 3)
    + rnorm(n, sd = 0.3),
  3 + 0.8 * eta + 2.0 * (z == 1) + 0.1 * (z == 2) + 3.0 * (z == 3)
    + rnorm(n, sd = 0.3),
  1 + 0.6 * eta - 0.3 * (z == 1) + 0.8 * (z == 2) - 0.5 * (z == 3)
    + rnorm(n, sd = 0.3),
  2 + 0.9 * eta + 0.4 * (z == 1) + 0.6 * (z == 2) + 0.3 * (z == 3)
    + rnorm(n, sd = 0.3)
)

# T1 test (should reject)
result_t1_bad <- test_t1(X_bad, z)
result_t1_bad
#> 
#>   T1: Reliability-independent test of structural interpretation (VanderWeele & Vansteelandt, 2022) 
#> 
#> data:   X_bad and z 
#> statistic = 1400, df = 6, p-value = < 2.2e-16
#> n = 5000, d = 4 indicators, p = 4 Z-levels

The structural interpretation is rejected, as expected.

summary(result_t1_bad)
#> 
#>   T1: Reliability-independent test of structural interpretation (VanderWeele & Vansteelandt, 2022) 
#> 
#> data:   X_bad and z 
#> 
#> Test statistic:  1400 
#> Degrees of freedom:  6 
#> P-value:  < 2.2e-16 
#> 
#> Sample size:  5000 
#> Indicators (d):  4 
#> Z-levels (p):  4 
#> Convergence code:  1 
#> 
#> Parameter estimates:
#>   gamma (intercepts):
#>     X1     X2     X3     X4 
#> 2.0872 3.0878 1.0656 2.1163 
#>   alpha:
#>      X1      X2      X3      X4 
#>  1.0000  7.7155 -1.5389  0.6065 
#>   beta (Z effects):
#>     z=0     z=1     z=2     z=3 
#>  0.0000  0.2378 -0.0676  0.3796

Estimating Reliabilities

The estimate_reliability() function estimates the reliabilities \(\lambda_i\) following Section 3.1 of the paper. Under model (1) with \(\text{Var}(\eta) = 1\) and independent errors, the off-diagonal covariances factor as

\[ \text{Cov}(X_i, X_j) = \lambda_i\,\lambda_j, \qquad i \neq j. \]

Consistent estimators \(\hat\lambda_i\) are obtained by solving \(d\) unbiased estimating equations (one per indicator):

\[ \sum_{\substack{j=1 \\ j \neq i}}^{d} \lambda_j \left\{\frac{1}{N} \sum_{k=1}^{N}\left(X_{ik} - \bar X_i\right) \left(X_{jk} - \bar X_j\right) - \lambda_i\lambda_j\right\} = 0, \qquad i = 1, \ldots, d. \]

Taking logs of \(\text{Cov}(X_i, X_j) = \lambda_i\lambda_j\) gives \(\log E(C_{ij}) = \log\lambda_i + \log\lambda_j\), which is a linear model in the log-reliabilities. Writing \(\Lambda_s = \log\lambda_s\):

\[ \log E\!\left(C_{ij}\right) = \sum_{s=1}^{d} \Lambda_s\, I(s \in \{i, j\}) \]

where \(C_{ij}\) is the sample covariance between \(X_i\) and \(X_j\), and \(I(\cdot)\) is the indicator function. The solutions are readily obtained as the exponentiated coefficients from a quasi-Poisson GLM with log link, using the \(d(d-1)/2\) pairwise covariances as the response and a \(d\)-column binary design matrix indicating which indicators are involved in each pair.
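For intuition, the GLM trick can be reproduced by hand in a few lines. This is a simplified sketch of the approach, not the estimate_reliability() source:

```r
set.seed(1)
n <- 2000; d <- 4
lambda <- c(1, 0.8, 0.6, 0.9)
eta <- rnorm(n)
X <- sapply(lambda, function(l) l * eta + rnorm(n, sd = 0.5))

pairs <- t(combn(d, 2))                    # the d(d-1)/2 indicator pairs
cij <- apply(pairs, 1, function(pr) cov(X[, pr[1]], X[, pr[2]]))
D <- matrix(0, nrow(pairs), d)             # binary design: which indicators form each pair
D[cbind(seq_len(nrow(pairs)), pairs[, 1])] <- 1
D[cbind(seq_len(nrow(pairs)), pairs[, 2])] <- 1
fit <- glm(cij ~ D - 1, family = quasipoisson(link = "log"))
exp(coef(fit))                             # reliability estimates, one per indicator
```

The exponentiated coefficients recover the true reliabilities up to sampling error.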

set.seed(12345)
n <- 5000
d <- 5
eta <- rnorm(n)
lambda_true <- c(1.0, 0.8, 0.6, 0.9, 0.7)
X <- matrix(NA, n, d)
for (i in 1:d) {
  X[, i] <- lambda_true[i] * eta + rnorm(n, sd = 0.5)
}

rel <- estimate_reliability(X)

# Compare estimated vs true
data.frame(
  true = lambda_true,
  estimated = round(rel$lambda, 3)
)
#>   true estimated
#> 1  1.0     0.974
#> 2  0.8     0.808
#> 3  0.6     0.594
#> 4  0.9     0.898
#> 5  0.7     0.695

This function is called internally by test_t0(), but you can use it directly if you need the reliability estimates for other purposes.

Choosing Between \(T_0\) and \(T_1\)

                                                     \(T_0\)            \(T_1\)
Relies on reliability estimates                      Yes                No
Sensitive to error distribution misspecification     Yes                No
Minimum indicators (\(d\))                           3                  2
Minimum \(Z\)-levels (\(p\))                         2                  3
Degrees of freedom                                   \((d-1)(p-1)\)     \((d-1)(p-2)\)

Guidance:

  • \(T_1\) is generally preferred because it does not depend on the error-structure assumptions needed to estimate reliabilities (e.g., independent, homoscedastic errors).
  • \(T_0\) is necessary when only 2 levels of \(Z\) are available, since \(T_1\) requires \(p \geq 3\).
  • When both tests are applicable (\(d \geq 3\), \(p \geq 3\)), running both provides a useful sensitivity check. Agreement strengthens the conclusion; disagreement may signal misspecification of the error distribution (which affects \(T_0\) but not \(T_1\)).

References

Newey, W. K., & McFadden, D. (1994). Large sample estimation and hypothesis testing. In R. F. Engle & D. L. McFadden (Eds.), Handbook of Econometrics (Vol. 4, pp. 2111–2245). Elsevier. https://doi.org/10.1016/S1573-4412(05)80005-4

VanderWeele, T. J., & Vansteelandt, S. (2022). A statistical test to reject the structural interpretation of a latent factor model. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 84(5), 2032–2054. https://doi.org/10.1111/rssb.12555