FCO

Overview

Motivation

This package allows users to derive flexible cutoffs for the evaluation of absolute model fit in Covariance-Based Structural Equation Modeling (CBSEM).

For CBSEM, a multitude of fit indices have been proposed, roughly classified as Goodness-of-Fit (GoF), such as CFI (Comparative Fit Index), or Badness-of-Fit (BoF), including SRMR (Standardized Root Mean Square Residual). Despite its initial appeal, all fit indices are distorted by characteristics of the model (e.g., size of the model) and sample characteristics (e.g., sample size) to some degree. Hence, their ability to determine whether a model has a truly “good” fit or not is often masked by an unwanted sensitivity to model and sample characteristics.

Consequentially, widely acknowledged recommendations and guidelines (e.g., CFI >= .95 indicates “good” fit) struggle with these distortions because their “fixed” cutoffs remain constant, irrespective of the model and sample characteristics investigated. For example, CFI tends to be sensitive to sample size. A rather small sample (e.g., N = 200) will almost always produce smaller CFIs than a rather large sample (e.g., N = 1,000). Hence, fixed cutoffs for CFI, such as .95 will likely reject more correct models for small sample sizes and likely accept more incorrect models for larger sample sizes.

Flexible cutoffs aim to overcome this deficit by providing customized cutoff values for a given model with specific model and sample characteristics. For the above example, a flexible cutoff will decrease (to e.g., .94 for N = 200) or increase (to e.g., .98 for N = 1,000) depending on the model and its characteristics.

Flexible cutoffs are derived from simulated distributions of correctly specified Confirmatory Factor Analysis (CFA) models for a wide range of latent variables (or factors), indicators per latent variable, sample sizes, factor loadings, and normal as well as non-normal data. Flexible cutoffs can be understood as the empirical quantile of a given index for a predefined uncertainty. If an uncertainty of 5 percent (or .05) is accepted a-priori, the 5 percent quantile of the simulated distribution for correctly specified CFA models, with the given model and sample characteristics, will determine the flexible cutoff. Depending on the nature of the underlying fit index, the appropriate lower (GoF) or upper (BoF) width of the respective confidence interval, as defined by the quantile, is used to derive the flexible cutoff.

Thereby, flexible cutoffs are also flexible in the recommendations they provide. If users are more uncertain about their model, they can adjust the uncertainty to 10 percent (or .10). When being more certain about the underlying model, the uncertainty can be adjusted to .1, or even .001. Note that this uncertainty is inverse to the understanding of a p-value. A researcher admits how uncertain s/he is about a given model and thus .10 indicates very conservative cutoffs, while .001 determines very lenient cutoffs.

Details on the procedure and methods can be found in the paper “Flexible Cutoff Values for Fit Indices in the Evaluation of Structural Equation Models” (Niemand and Mai, 2018).

Package and tool

From 2018 to 2021, we provided a website-based tool on flexiblecutoffs.org that allowed users to input certain parameters and obtain the respective flexible cutoffs from pre-run simulations. Given the limitations of this approach (parameters to be simulated are finite) and user feedback, we decided to deprecate the tool on the website and instead offer this package FCO. We believe that this provides multiple advantages for users, particularly:

As a side note of transforming the previous tool into a package, flexible cutoffs obtained from tool and package are not identical, but certainly very close to each other. This disadvantage stems from the differential use of seeds in the pre-run simulations and the present package-based simulations as well as the implementation of the new features. If you detect substantial differences between flexible cutoffs, please contact us.

Getting started

Installation

As usual, the package is available on CRAN and can be installed easily:

#install.packages("FCO")
#Add later when published on CRAN

Then, the package can be loaded, for example with (required packages, if installed, will be loaded automatically):

library(FCO)

Basic usage

In an initial example, a normal user has two objects, a dataset x and a (theoretical) model mod to be estimated. For illustrative purposes, we use the data from Babakus and Boller (1992), which is available as a correlation matrix of 22 items (Q1-Q22) for the SERVQUAL scale consisting of five factors (here F1 to F5). We simulated 502 observations from the original correlation matrix via MASS::mvrnorm and provide it as a data.frame. This dataset is included in the package:

data(bb1992)
head(bb1992, 3)
#>           Q1         Q2         Q3        Q4         Q5         Q6         Q7
#> 1 -1.0206565  0.2276041 -1.2568654 0.8136901 -1.6019692  0.7552052 -0.4227379
#> 2  1.5938691 -0.0276403 -1.2359269 0.4527496  0.4198624  1.2915004  0.3614339
#> 3  0.6432085  0.5146161  0.9700383 0.5768854  0.1788848 -0.2423614  0.7111541
#>           Q8          Q9        Q10        Q11        Q12       Q13         Q14
#> 1 -1.2791570  0.13263442 -0.4502009 -0.3691879 -0.5386581 -1.238408  0.02816952
#> 2  0.4577506 -0.05877489  0.6064040  0.9672691  1.1028063  1.784519 -0.47735634
#> 3 -0.5452735 -0.54137900 -0.8755678  0.3516510 -1.9334731 -0.498252 -0.33502753
#>           Q15        Q16        Q17        Q18          Q19        Q20
#> 1 -0.36047328  0.3736429 -0.3124173  0.3765030 -0.956071795 1.50488335
#> 2  0.17534870  0.2421734  1.7328168  0.3081753  0.004130691 0.62142902
#> 3 -0.07986953 -1.4054425 -0.7272241 -0.7679232  0.009754038 0.07108398
#>          Q21        Q22
#> 1 -0.2591684 -0.4093789
#> 2  0.9135299  1.1154500
#> 3 -0.6949648  1.7242517

The second object required is a model to be fitted to the data. A simple lavaan syntax for an empirical five-factor solution looks like this:

mod <- "
F1 =~ Q5 + Q7 + Q8
F2 =~ Q2 + Q4
F3 =~ Q10 + Q11 + Q12 + Q13 + Q18 + Q19 + Q20 + Q21 + Q22
F4 =~ Q1 + Q17
F5 =~ Q6 + Q14 + Q15 + Q16
"

Both objects, the dataset bb1992, and the model mod can then be used to estimate the CFA via lavaan. For simplicity reasons, we focus on two fit indices, CFI and SRMR in this example (Niemand & Mai, 2018, p. 1170).

res <- cfa(mod, data = bb1992)
fitmeasures(res, fit.measures = c("CFI", "SRMR"))
#>   cfi  srmr 
#> 0.960 0.038

Note that both values would suffice fixed cutoffs such as CFI ≥ .95 and SRMR ≤ .09, hence concluding that the model fits the data well. However, given the empirical setting of the example study, it would be beneficial to investigate the uncertainty of this finding. This is where flexible cutoffs come into play.

To derive the flexible cutoffs, one only needs the two objects discussed before. All other arguments are set internally via standard options (see below). In this case, we however only use 10 replications, to speed up the derivation. The function gen_fit generates a list fits.single of fit indicators for the underlying data and model, as well as lists for the other arguments. The function flex_co then takes this list and estimates the intended flexible cutoffs for the selected fit indices. Since the default setting for the preset uncertainty is .05, we can also omit this argument. For now, this code is sufficient:

fits.single <- gen_fit(mod1 = mod, x = bb1992, rep = 10)
#Please use for flexible cutoffs as desribed below:
#fits.single <- gen_fit(mod1 = mod, x = bb1992, rep = 100) 
flex_co(fits = fits.single, index = c("CFI", "SRMR"))
#> Warning in flex_co(fits = fits.single, index = c("CFI", "SRMR")): The number of
#> replications is lower than the recommended minimum of 500. Consider with care.
#> $cutoff
#>        CFI       SRMR 
#> 0.97826871 0.03659316 
#> 
#> $index
#> [1] "CFI"  "SRMR"
#> 
#> $alpha
#> [1] 0.05
#> 
#> $gof
#>   CFI  SRMR 
#>  TRUE FALSE 
#> 
#> $replications
#> [1] 10
#> 
#> $`number of non-converging models`
#> [1] 0
#> 
#> $`share of non-converging models`
#> [1] 0

In this case, the second function returns a warning that the number of replications is below the recommended minimum. For non-illustrative purposes, we recommend higher values (e.g., 500 or 1,000). The function flex_co also returns objects cutoff, index, alpha, gof, replications, number of non-converging models and share of non-converging models. Cutoff provides the flexible cutoff values for the selected indices. Index returns the name of the indices selected and alpha returns the preset uncertainty. Gof returns the type of fit index (GoF or BoF, logically, GoF = FALSE indicates BoF). This can be either stated manually in the function with the argument gof or is guessed by the helper function index_guess, which works for all established fit indices.

Note that the cutoff values for CFI and SRMR are higher (CFI = .971 for 100 replications) and lower (SRMR = .036 for 100 replications) than the respective empirical values (CFI = .960, SRMR = .038). That is, given the uncertainty of .05, as well as the present model and data, the cutoffs are close to the empirical values, but the model is marginally rejected. We can however simulate what happens if we assume less (.001) or more (.10) uncertainty.

flex_co(fits = fits.single,
        index = c("CFI", "SRMR"),
        alpha.lev = .001)
#> Warning in flex_co(fits = fits.single, index = c("CFI", "SRMR"), alpha.lev =
#> 0.001): The number of replications is lower than the recommended minimum of 500.
#> Consider with care.
#> $cutoff
#>        CFI       SRMR 
#> 0.97826871 0.03659316 
#> 
#> $index
#> [1] "CFI"  "SRMR"
#> 
#> $alpha
#> [1] 0.001
#> 
#> $gof
#>   CFI  SRMR 
#>  TRUE FALSE 
#> 
#> $replications
#> [1] 10
#> 
#> $`number of non-converging models`
#> [1] 0
#> 
#> $`share of non-converging models`
#> [1] 0
flex_co(fits = fits.single,
        index = c("CFI", "SRMR"),
        alpha.lev = .10)
#> Warning in flex_co(fits = fits.single, index = c("CFI", "SRMR"), alpha.lev =
#> 0.1): The number of replications is lower than the recommended minimum of 500.
#> Consider with care.
#> $cutoff
#>        CFI       SRMR 
#> 0.97980796 0.03616053 
#> 
#> $index
#> [1] "CFI"  "SRMR"
#> 
#> $alpha
#> [1] 0.1
#> 
#> $gof
#>   CFI  SRMR 
#>  TRUE FALSE 
#> 
#> $replications
#> [1] 10
#> 
#> $`number of non-converging models`
#> [1] 0
#> 
#> $`share of non-converging models`
#> [1] 0

We do not need to re-simulate the flexible cutoffs, as the function flex_co takes the simulated values from the generated fit indices and we only need to change the uncertainty via the alpha.lev argument. As expected, if one assumes less uncertainty—or more certainty, for example because the model has been tested before—the flexible cutoffs tend to be closer to the empirical values for more replications. Likewise, when assuming more uncertainty (e.g., a novel model), the cutoffs more clearly reject the model.

In principle, this is the core usage originally intended for flexible cutoffs. We however included a substantial number of other features that may be interesting for users. The following chapter gives an overview of the possible features.

Details

Population model

The basic idea of flexible cutoffs is that these cutoffs come from a sui-generis (having it’s own shape) distribution of fit indices for a correct, unbiased model. Essentially, this means that the population model is correctly specified with no errors. For example, all observed variables load on the latent variables they are supposed to load on, all correlations are estimated freely, and the data is not excessively skewed. This assumption makes flexible cutoffs potentially more objective since no subjective modification to the model (misspecification) or data (skewness, kurtosis) is needed. However, this also implies that the population model is specified not simply on the basis of the own empirical model. In an anecdotal manner, one can compare this to the introduction of the meter. Assuming that one’s own measure is exactly 1 meter is likely error prone. One needs a validated meter bar. The meter bar in this case is the population model.

The function pop_mod specifies this population model. We offer three types of population models, the NM (Niemand & Mai, 2018) option, the HB (Hu & Bentler, 1999) option, and the EM (empirical) option. Function gen_fit internally calls function pop_mod, but the argument type is also available in the latter function for flexibility. We can compare these models by:

pop_mod(mod, x = bb1992, type = "NM")$pop.mod
#> [1] "F1=~0.7*Q5+0.7*Q7+0.7*Q8 \nF2=~0.7*Q2+0.7*Q4 \nF3=~0.7*Q10+0.7*Q11+0.7*Q12+0.7*Q13+0.7*Q18+0.7*Q19+0.7*Q20+0.7*Q21+0.7*Q22 \nF4=~0.7*Q1+0.7*Q17 \nF5=~0.7*Q6+0.7*Q14+0.7*Q15+0.7*Q16 \nF1~~1*F1 \nF2~~0.3*F1 \nF3~~0.3*F1 \nF4~~0.3*F1 \nF5~~0.3*F1 \nF2~~1*F2 \nF3~~0.3*F2 \nF4~~0.3*F2 \nF5~~0.3*F2 \nF3~~1*F3 \nF4~~0.3*F3 \nF5~~0.3*F3 \nF4~~1*F4 \nF5~~0.3*F4 \nF5~~1*F5 \n \n"
pop_mod(mod, x = bb1992, type = "HB")$pop.mod
#> [1] "F1=~0.7*Q5+0.75*Q7+0.8*Q8 \nF2=~0.75*Q2+0.75*Q4 \nF3=~0.7*Q10+0.7*Q11+0.7*Q12+0.7*Q13+0.75*Q18+0.8*Q19+0.8*Q20+0.8*Q21+0.8*Q22 \nF4=~0.75*Q1+0.75*Q17 \nF5=~0.7*Q6+0.75*Q14+0.75*Q15+0.8*Q16 \nF1~~1*F1 \nF2~~0.5*F1 \nF3~~0.4*F1 \nF4~~0.3*F1 \nF5~~0.5*F1 \nF2~~1*F2 \nF3~~0.4*F2 \nF4~~0.5*F2 \nF5~~0.4*F2 \nF3~~1*F3 \nF4~~0.3*F3 \nF5~~0.5*F3 \nF4~~1*F4 \nF5~~0.4*F4 \nF5~~1*F5 \n \n"
pop_mod(mod, x = bb1992, type = "EM")$pop.mod
#> [1] "F1=~0.807*Q5+0.637*Q7+0.876*Q8 \nF2=~0.747*Q2+0.842*Q4 \nF3=~0.519*Q10+0.568*Q11+0.597*Q12+0.697*Q13+0.603*Q18+0.557*Q19+0.534*Q20+0.578*Q21+0.552*Q22 \nF4=~0.659*Q1+0.772*Q17 \nF5=~0.727*Q6+0.719*Q14+0.826*Q15+0.752*Q16 \nF1~~1*F1 \nF2~~0.36*F1 \nF3~~0.71*F1 \nF4~~0.641*F1 \nF5~~0.807*F1 \nF2~~1*F2 \nF3~~0.349*F2 \nF4~~0.486*F2 \nF5~~0.492*F2 \nF3~~1*F3 \nF4~~0.625*F3 \nF5~~0.8*F3 \nF4~~1*F4 \nF5~~0.831*F4 \nF5~~1*F5 \n \n"

When the type is “NM”, all loadings (default: .7) and correlations (default: .3) are assumed to be equal (\n denotes a line break for the lavaan syntax). When the type is “HB”, the loadings vary by .05 around .75, depending on the number of indicators and the correlations are either .5, .4, or .3, also depending on the number of latent variables. Finally, when the type is “EM”, the function runs a CFA and determines the empirical loadings and correlations. Since one cannot assume the own empirical model to be correct, we advise users to not use the “EM” type for model validation. This type might be useful for other features (see further applications). Hence, the default is set to “NM”. Since the by far most selected standardized factor loading (afl) in our tool was .7, we set the default value to .7. Other options between 0 and 1 are possible. The average correlation between latent variables (aco) is set to a default of .3, but this can be changed likewise. To increase flexibility, the argument standardized (default: TRUE) can also be called, allowing for the specification of standardized (all loadings < 1, all covariances are correlations) and unstandardized (loadings > 1, covariances, not correlations) parameters. The function returns a warning when the empirical model suspects standardized or unstandardized loadings and this is in conflict with the standardized argument. See below:

pop_mod(mod, x = bb1992, type = "NM", afl = .9)$pop.mod
#> [1] "F1=~0.9*Q5+0.9*Q7+0.9*Q8 \nF2=~0.9*Q2+0.9*Q4 \nF3=~0.9*Q10+0.9*Q11+0.9*Q12+0.9*Q13+0.9*Q18+0.9*Q19+0.9*Q20+0.9*Q21+0.9*Q22 \nF4=~0.9*Q1+0.9*Q17 \nF5=~0.9*Q6+0.9*Q14+0.9*Q15+0.9*Q16 \nF1~~1*F1 \nF2~~0.3*F1 \nF3~~0.3*F1 \nF4~~0.3*F1 \nF5~~0.3*F1 \nF2~~1*F2 \nF3~~0.3*F2 \nF4~~0.3*F2 \nF5~~0.3*F2 \nF3~~1*F3 \nF4~~0.3*F3 \nF5~~0.3*F3 \nF4~~1*F4 \nF5~~0.3*F4 \nF5~~1*F5 \n \n"
pop_mod(mod, x = bb1992, type = "NM", aco = .5)$pop.mod
#> [1] "F1=~0.7*Q5+0.7*Q7+0.7*Q8 \nF2=~0.7*Q2+0.7*Q4 \nF3=~0.7*Q10+0.7*Q11+0.7*Q12+0.7*Q13+0.7*Q18+0.7*Q19+0.7*Q20+0.7*Q21+0.7*Q22 \nF4=~0.7*Q1+0.7*Q17 \nF5=~0.7*Q6+0.7*Q14+0.7*Q15+0.7*Q16 \nF1~~1*F1 \nF2~~0.5*F1 \nF3~~0.5*F1 \nF4~~0.5*F1 \nF5~~0.5*F1 \nF2~~1*F2 \nF3~~0.5*F2 \nF4~~0.5*F2 \nF5~~0.5*F2 \nF3~~1*F3 \nF4~~0.5*F3 \nF5~~0.5*F3 \nF4~~1*F4 \nF5~~0.5*F4 \nF5~~1*F5 \n \n"
pop_mod(mod, x = bb1992, type = "EM", standardized = FALSE)$pop.mod
#> Warning in pop_mod(mod, x = bb1992, type = "EM", standardized = FALSE): All
#> loadings are < 1. Consider revision of standardized.
#> [1] "F1=~0.807*Q5+0.637*Q7+0.876*Q8 \nF2=~0.747*Q2+0.842*Q4 \nF3=~0.519*Q10+0.568*Q11+0.597*Q12+0.697*Q13+0.603*Q18+0.557*Q19+0.534*Q20+0.578*Q21+0.552*Q22 \nF4=~0.659*Q1+0.772*Q17 \nF5=~0.727*Q6+0.719*Q14+0.826*Q15+0.752*Q16 \nF1~~1*F1 \nF2~~0.36*F1 \nF3~~0.71*F1 \nF4~~0.641*F1 \nF5~~0.807*F1 \nF2~~1*F2 \nF3~~0.349*F2 \nF4~~0.486*F2 \nF5~~0.492*F2 \nF3~~1*F3 \nF4~~0.625*F3 \nF5~~0.8*F3 \nF4~~1*F4 \nF5~~0.831*F4 \nF5~~1*F5 \n"

Assumptions from data

In order to follow this principle of objectivity, the data x is essentially not that important for deriving flexible cutoffs, unless specified differently. That is, x determines the sample size (N) for the simulations and the multivariate non-normality of the data, if assume.mvn = FALSE (default: TRUE). Both are only relevant for the gen_fit function.

Internally, the function gen_fit calls the simulateData function from lavaan, which itself does not require data, but takes a population model from function pop_mod. Sample size is specified via sample.nobs by nrow(x). Arguments skewness and kurtosis are derived from semTools::mardiaKurtosis(x) and semTools::mardiaSkew(x) in package semTools, if assume.mvn = FALSE.

Since there seems to be no unified agreement on what “no”, “low”, “moderate”, or “high” kurtosis / skewness constitutes (Niemand & Mai, 2018), we omitted the once verbally differentiated levels available in the tool. As mentioned before, x is also needed for the empirical population model type “EM” in pop_mod.

Index guessing

For flexible cutoffs, the type of a fit index (GoF or BoF) plays an essential role, as the lower (GoF) or upper (BoF) width of a confidence interval is required. Since empirically guessing the type may be misleading (e.g., a very bad model may produce a distribution of SRMR that is not different from a distribution for a CFI in a better fitting model), we implemented a helper function index_guess that simply guesses the fit index by name (upper or lowercase considered):

index_guess("cfi")
#> [1] "GoF"
index_guess("CFI")
#> [1] "GoF"
index_guess("srmr")
#> [1] "BoF"
index_guess("SRMR")
#> [1] "BoF"
index_guess("mickey_mouse")
#> [1] "not a fit index"

This function is applied in the flex_co function. If one specifies established fit indices, such as CFI or SRMR, the gof argument is not required. However, this feature does not override the gof argument. For example:

flex_co(
  fits = fits.single,
  index = c("CFI", "SRMR"),
  gof = c(TRUE, FALSE)
)
#> Warning in flex_co(fits = fits.single, index = c("CFI", "SRMR"), gof = c(TRUE, :
#> The number of replications is lower than the recommended minimum of 500.
#> Consider with care.
#> $cutoff
#>        CFI       SRMR 
#> 0.97826871 0.03659316 
#> 
#> $index
#> [1] "CFI"  "SRMR"
#> 
#> $alpha
#> [1] 0.05
#> 
#> $gof
#>   CFI  SRMR 
#>  TRUE FALSE 
#> 
#> $replications
#> [1] 10
#> 
#> $`number of non-converging models`
#> [1] 0
#> 
#> $`share of non-converging models`
#> [1] 0
flex_co(
  fits = fits.single,
  index = c("CFI", "SRMR"),
  gof = c(FALSE, TRUE)
)
#> Warning in flex_co(fits = fits.single, index = c("CFI", "SRMR"), gof =
#> c(FALSE, : The number of replications is lower than the recommended minimum of
#> 500. Consider with care.
#> $cutoff
#>        CFI       SRMR 
#> 1.00000000 0.03096027 
#> 
#> $index
#> [1] "CFI"  "SRMR"
#> 
#> $alpha
#> [1] 0.05
#> 
#> $gof
#>   CFI  SRMR 
#> FALSE  TRUE 
#> 
#> $replications
#> [1] 10
#> 
#> $`number of non-converging models`
#> [1] 0
#> 
#> $`share of non-converging models`
#> [1] 0

The first version (the gof argument can also be omitted as the guess is correct) gives the correct flexible cutoffs, the second version does not, as CFI is not a BoF and SRMR is not a GoF. Hence, users should be careful with specifying this argument. For novel fit indices or alternative uses, it may however be beneficial to maintain this argument.

Multi-core support

Since simulations may need some time, we implemented multi-core support. Depending on the system the package runs on, function gen_fit uses mclapply (on Linux or Mac) and parLapply (on Windows) from package parallel. The default is set to TRUE and two cores.

system.time(gen_fit(mod1 = mod, x = bb1992, rep = 10))
#>        User      System verstrichen 
#>       0.027       0.006       0.618

In the fortunate situation that a user has more cores, more cores can be set by the argument cores. However, note that the function returns an error if the number of available cores of the system is lower than the number of cores provided by the cores argument (e.g., cores = 4). Of course, multi-core support can be switched off by setting the argument multi.core = FALSE.

system.time(gen_fit(
  mod1 = mod,
  x = bb1992,
  rep = 10,
  multi.core = FALSE
))
#>        User      System verstrichen 
#>       1.028       0.028       1.062

Recommendations

One quintessential issue we notice in many papers, reviews, and PhD courses is that non-experts do not know which fit index or fit indicator to choose. To summarize, this is the major question one of our papers investigates (Mai et al., 2021) and the main message is that one should follow a tailor-fit approach. Depending on three settings, a) sample size, b) research purpose (novel or established model), and c) focus (confirming a factorial structure, i.e., CFA or investigating a theoretical, structural model), different fit indicators are recommended.

The differentiation between novel or established model might need some explanation. Fit indicators often yield different Type I and II errors. When an established model that has been empirically investigated many times before (e.g., Theory of Planned Behavior-models) is tested, it makes sense to put more weight on Type I error. However, when the model has never been tested before, it makes sense to put more weight on the Type II error. Since fixed cutoffs show worse hit rates for higher Type II error weights (1:3, 1:4), they may be poorly performing for novel models.

We built the function recommend to incorporate this tailor-fit approach in a user-friendly way. It requires the simulated fit indices and the arguments for purpose and focus. Sample size is determined automatically. Results are rounded to 3 digits, but can be changed to 1 to 5 digits if needed, for example by digits = 5.

Since the most obvious application is to conduct a CFA on a novel model, the standard arguments of purpose and focus are set to this application. Hence, when we use no further arguments, we get the following recommendation:

recommend(fits.single)
#> Warning in recommend(fits.single): The number of replications is lower than the
#> recommended minimum of 500. Consider with care.
#> $recommended
#>      type fit.values
#> SRMR  BoF      0.038
#> 
#> $cutoffs
#>               SRMR
#> cutoff 0.001 0.037
#> cutoff 0.01  0.037
#> cutoff 0.05  0.037
#> cutoff 0.1   0.036
#> 
#> $decisions
#>                  SRMR
#> cutoff 0.001 rejected
#> cutoff 0.01  rejected
#> cutoff 0.05  rejected
#> cutoff 0.1   rejected
#> 
#> $replications
#> [1] 10
#> 
#> $comment
#> [1] "Recommendations based on flexible cutoffs and Mai et al. (2021)"

SRMR is recommended due to the purpose, focus, and sample size (n = 502) in line with the recommendations by Mai et al. (2021). Hence, the lone fit indicator we need to report is SRMR. The function also returns the type of the fit indicator, which is guessed from index_guess and the actual value of the SRMR in the model.

Additionally, the function provides a sensitivity analysis for different values of uncertainty, .001 (.1 percent), .01 (1 percent), .05 (5 percent) and .10 (10 percent) and makes a decision given the cutoff. For completeness, replications, the number of non-converging models, and their share are also provided.

The result found here demonstrates the consequences of uncertainty for the decision. When one is very conservative and assumes a high Type I error (.10), the cutoff (.036 for 100 replications) is lower than the actual value (.038) and hence the present model should be rejected. When one is very lenient and feels safe with the model (.001), the cutoff (.040 for 100 replications) is higher than the actual value and hence the model can be confirmed.

Let us assume for a moment that Babakus and Boller had not been as exploratory as they were and simply looked for confirmation of an established measurement model. So, we change the purpose argument to established.

recommend(fits.single, purpose = "established")
#> $recommended
#>     type fit.values
#> CFI  GoF       0.96
#> 
#> $decisions
#> [1] "confirmed"
#> 
#> $comment
#> [1] "Recommendations based on fixed cutoffs and Mai et al. (2021)"

Now the recommendation changes to CFI with a fixed cutoff. Consequently, no uncertainty is provided (as they are fixed) and the recommendation is to confirm the model because the actual value (.960) is above the fixed cutoff of .95. This demonstrates two things. First, the recommend function also recommends fixed cutoffs when it is acceptable to do so (see Mai et al. 2021). Second, it shows what happens when assuming established models (and hence a low importance of Type II error): Type I errors become more important. Compare this with the lenient uncertainty before (.001, i.e., .040 for 100 replications), from the first recommendation, where SRMR also confirms the model. In other words, we feigned to be very certain by assuming the model to be an established model (ignoring the doubts) and hence got a very determined answer.

Finally, for exploratory investigations, one can also override the recommendations programmed into the function by using the override argument. This however requires users to provide one or more indices with the argument index (otherwise, an error is returned).

recommend(fits.single,
          override = TRUE,
          index = c("CFI", "SRMR"))
#> Warning in recommend(fits.single, override = TRUE, index = c("CFI", "SRMR")):
#> The number of replications is lower than the recommended minimum of 500.
#> Consider with care.
#> $recommended
#>      type fit.values
#> CFI   GoF      0.960
#> SRMR  BoF      0.038
#> 
#> $cutoffs
#>                CFI  SRMR
#> cutoff 0.001 0.978 0.037
#> cutoff 0.01  0.978 0.037
#> cutoff 0.05  0.978 0.037
#> cutoff 0.1   0.980 0.036
#> 
#> $decisions
#>                   CFI     SRMR
#> cutoff 0.001 rejected rejected
#> cutoff 0.01  rejected rejected
#> cutoff 0.05  rejected rejected
#> cutoff 0.1   rejected rejected
#> 
#> $replications
#> [1] 10
#> 
#> $comment
#> [1] "Override mode"

In the example, we provide CFI and SRMR and get the appropriate recommendations for them. Unsurprisingly, the recommendations do not change much (SRMR is confirmed for levels of .001 and .01 uncertainty for 100 replications, otherwise the model is rejected). Please note that purpose and focus are without function in this case, as the recommendations by Mai et al. (2021) are overridden.

Further applications

Relative fit comparison

A popular feature in CBSEM is to compare nested models, for example testing some type of invariance between groups, comparing alternative theoretical models or discriminant validity testing (see next chapter for the latter). So far, we allow users to incorporate a second model by specifying the mod2 argument in function gen_fit. In this case, the resulting fits from function are not vectors in a list of the length of replications, but a matrix with two rows. Function flex_co then produces a slightly different output displaying the flexible cutoffs for both models as well as the difference between the two models.

Let us assume that the first two factors F1 and F2 might be orthogonal (independent). Hence, we constrain the factor correlation between both to zero (F1 ~~ 0 * F2).

subset(parameterestimates(res, standardized = T), lhs == "F1" &
         rhs == "F2")
#>    lhs op rhs   est    se     z pvalue ci.lower ci.upper std.lv std.all std.nox
#> 46  F1 ~~  F2 0.217 0.038 5.736      0    0.143    0.291   0.36    0.36    0.36
mod.con <- "
F1 =~ Q5 + Q7 + Q8
F2 =~ Q2 + Q4
F3 =~ Q10 + Q11 + Q12 + Q13 + Q18 + Q19 + Q20 + Q21 + Q22
F4 =~ Q1 + Q17
F5 =~ Q6 + Q14 + Q15 + Q16
F1 ~~ 0 * F2
"
fits.con <- gen_fit(
  mod1 = mod,
  mod2 = mod.con,
  x = bb1992,
  rep = 10
)
flex_co(fits = fits.con,
        index = c("CFI", "SRMR"),
        alpha.lev = .05)
#> Warning in flex_co(fits = fits.con, index = c("CFI", "SRMR"), alpha.lev =
#> 0.05): The number of replications is lower than the recommended minimum of 500.
#> Consider with care.
#> $cutoff
#>               CFI       SRMR
#> Model 1 0.9808747 0.03280321
#> Model 2 0.9640482 0.05264762
#> 
#> $difference
#>         CFI        SRMR 
#>  0.01682647 -0.01984441 
#> 
#> $index
#> [1] "CFI"  "SRMR"
#> 
#> $alpha
#> [1] 0.05
#> 
#> $gof
#>   CFI  SRMR 
#>  TRUE FALSE 
#> 
#> $replications
#> [1] 10
#> 
#> $`number of non-converging models`
#> [1] 0
#> 
#> $`share of non-converging models`
#> [1] 0
fitmeasures(res, fit.measures = c("cfi", "srmr")) - fitmeasures(cfa(model = mod.con, data = bb1992), fit.measures = c("cfi", "srmr"))
#>    cfi   srmr 
#>  0.011 -0.037

Since the correlation between both factors was only .36, it is likely that the models show a comparable fit. Indeed, the flexible cutoffs are slightly worsened in the constrained model 2 (CFI = .956, SRMR = .056) compared to model 1 (CFI = .969, SRMR = .035), yielding a small difference (delta CFI = .012, delta SRMR = -.022). Consider that this difference is larger than the present difference of CFI between the models (.011) and smaller than present difference of SRMR between the models (-.037). That is, CFI was - as expected - less sensitive to the constraint than SRMR. Put simply, flexible cutoffs suggest that the change in fit by CFI is acceptable (flexible cutoff: .012 > present difference: .011), but the change in fit by SRMR is not (flexible cutoff: -.020 > present difference: -.037). We should reject the alternative model of orthogonal factors F1 and F2. Fixed cutoffs (CFI ≥ .95, SRMR ≤ .09) would still have accepted the alternative model. Please note that guidelines for relative fit comparisons in a contingent way - accounting for model size and sample size - are rare to find (e.g., Meade et al. 2008 for measurement invariance). Future investigations of the performance of flexible cutoffs for relative fit comparisons might be interesting.

As a technical side note, the flexible cutoffs here are slightly different from the ones described in chapter “Basic usage” since the type argument is automatically switched to “EM” (type = "EM") when two models are provided. When single cutoffs would be generated with this setting, the same flexible cutoffs would be found.

fits.proof <- gen_fit(
  mod1 = mod,
  x = bb1992,
  rep = 10,
  type = "EM"
)
flex_co(fits = fits.proof,
        index = c("CFI", "SRMR"),
        alpha.lev = .05)
#> Warning in flex_co(fits = fits.proof, index = c("CFI", "SRMR"), alpha.lev =
#> 0.05): The number of replications is lower than the recommended minimum of 500.
#> Consider with care.
#> $cutoff
#>        CFI       SRMR 
#> 0.98087471 0.03280321 
#> 
#> $index
#> [1] "CFI"  "SRMR"
#> 
#> $alpha
#> [1] 0.05
#> 
#> $gof
#>   CFI  SRMR 
#>  TRUE FALSE 
#> 
#> $replications
#> [1] 10
#> 
#> $`number of non-converging models`
#> [1] 0
#> 
#> $`share of non-converging models`
#> [1] 0

Discriminant validity testing

A special application of flexible cutoffs could be its ability to test discriminant validity. As Rönkkö & Cho (2020) highlighted, CBSEM could be useful to compare CFAs where a second alternative model merges, hereafter “merging”, two factors subject to an issue in discriminant validity (given a substantially high correlation between the factors). Alternatively, it is possible to compare an unconstrained model with a constrained model, where the correlation between the factors is set to a cutoff value (e.g., .9). We refer to this second principle as “constraining”. Instead of testing the chi-square difference between the original and the alternative model, flexible cutoffs for fit indicators might work equally well or even better, given the issues with the chi-square statistic (e.g., Niemand & Mai 2018).

Consider that in the example data, factors F4 and F5 are highly correlated (.831), possibly indicating an issue in discriminant validity. To investigate this, we can use the discriminant validity-related arguments of the gen_fit function, termed dv, dv.factors, merge.mod, and dv.cutoff. Argument dv simply tells the function to expect the application of discriminant validity testing. Argument dv.factors provides the factors that should be investigated, in this case F4 and F5. If this argument is missing, it is assumed that the first and second factor of the model should be investigated (F1 & F2). Hence, one should provide the names of the factors if that is not the case. Argument merge.mod is needed if merging should be applied. This argument calls an internal function that takes the original model, changes all indicators of the second factor (F5) specified under dv.factors to be indicators of the first factor (F4), and omits the second factor from estimation. Please note that it does not matter which factor is first or second, as both are merged anyway. Finally, dv.cutoff is required when one wants to apply constraining. The value for this cutoff can be between 0 and 1, but a warning is given if it is lower than .8, as this is generally accepted as the lower border of a discriminant validity issue (Rönkkö & Cho 2020, p. 30). Some guessing work is implemented here to make the arguments more convenient. For example, if one forgot the argument dv but sets merge.mod = TRUE, merging is still assumed. To make sure the user knows which approach is used, a message indicating the discriminant validity testing approach is returned.

subset(parameterestimates(res, standardized = TRUE), lhs == "F4" &
         rhs == "F5")
#>    lhs op rhs   est    se     z pvalue ci.lower ci.upper std.lv std.all std.nox
#> 55  F4 ~~  F5 0.398 0.042 9.417      0    0.316    0.481  0.831   0.831   0.831
fits.dv.con <- gen_fit(
  mod1 = mod,
  x = bb1992,
  rep = 10,
  dv = TRUE,
  dv.factors = c("F4", "F5"),
  dv.cutoff = .9
)
#> Constraining is selected as the discriminant validity testing option given the provided arguments.
fits.dv.merge <- gen_fit(
  mod1 = mod,
  x = bb1992,
  rep = 10,
  dv = TRUE,
  dv.factors = c("F4", "F5"),
  merge.mod = TRUE
)
#> Merging is selected as the discriminant validity testing option given the provided arguments.

flex_co(fits = fits.dv.con,
        index = "CFI",
        alpha.lev = .05)
#> Warning in flex_co(fits = fits.dv.con, index = "CFI", alpha.lev = 0.05): The
#> number of replications is lower than the recommended minimum of 500. Consider
#> with care.
#> $cutoff
#>               CFI
#> Model 1 0.9808747
#> Model 2 0.9805963
#> 
#> $difference
#> [1] 0.0002783622
#> 
#> $index
#> [1] "CFI"
#> 
#> $alpha
#> [1] 0.05
#> 
#> $gof
#>  CFI 
#> TRUE 
#> 
#> $replications
#> [1] 10
#> 
#> $`number of non-converging models`
#> [1] 0
#> 
#> $`share of non-converging models`
#> [1] 0
flex_co(fits = fits.dv.merge,
        index = "CFI",
        alpha.lev = .05)
#> Warning in flex_co(fits = fits.dv.merge, index = "CFI", alpha.lev = 0.05): The
#> number of replications is lower than the recommended minimum of 500. Consider
#> with care.
#> $cutoff
#>               CFI
#> Model 1 0.9808747
#> Model 2 0.9428451
#> 
#> $difference
#> [1] 0.03802959
#> 
#> $index
#> [1] "CFI"
#> 
#> $alpha
#> [1] 0.05
#> 
#> $gof
#>  CFI 
#> TRUE 
#> 
#> $replications
#> [1] 10
#> 
#> $`number of non-converging models`
#> [1] 0
#> 
#> $`share of non-converging models`
#> [1] 0

mod.dv.con <- "
F1 =~ Q5 + Q7 + Q8
F2 =~ Q2 + Q4
F3 =~ Q10 + Q11 + Q12 + Q13 + Q18 + Q19 + Q20 + Q21 + Q22
F4 =~ Q1 + Q17
F5 =~ Q6 + Q14 + Q15 + Q16
F4 ~~ .9 * F5
"
fitmeasures(
  cfa(
    model = mod.dv.con,
    data = bb1992,
    auto.fix.first = FALSE,
    std.lv = TRUE
  ),
  fit.measures = "cfi"
)
#>   cfi 
#> 0.959

mod.dv.merge <- "
F1 =~ Q5 + Q7 + Q8
F2 =~ Q2 + Q4
F3 =~ Q10 + Q11 + Q12 + Q13 + Q18 + Q19 + Q20 + Q21 + Q22
F4 =~ Q1 + Q17 + Q6 + Q14 + Q15 + Q16
"
fitmeasures(
  cfa(
    model = mod.dv.merge,
    data = bb1992
  ),
  fit.measures = "cfi"
)
#>   cfi 
#> 0.952

In this example, we asked for discriminant validity testing (dv = TRUE) and ran constraining first, constraining the correlation between the provided factors (dv.factors = c("F4", "F5")) to the cutoff of .9 (dv.cutoff = .9). Second, we ran merging, this time telling the function to merge both selected factors (merge.mod = TRUE).

Similar to the relative fit comparison case, the function returns tables for each replication, which then can be handled by the flex_co function. There are multiple options for how to assess discriminant validity subject to further investigations. As a simple test, we apply the following logic here (essentially a variant of a chi-squared difference test only with CFI): If two factors are clearly discriminant, their correlation is low, yielding a worse fit for the constrained or merged model compared to the original (unconstrained, not merged) model. This implies that the a GoF index, such as CFI, is smaller than the cutoff of the correct model (e.g., .75 < .95). In contrast, a BoF index, such as SRMR, would indicate discriminant validity if it is larger than the respective cutoff of the correct model (e.g., .15 > .05). In this example, the CFI for constraining is .959, while cutoffs for the correct model (model 1) are .969. In a nutshell, the CFI of the constrained model is out of the range of simulated CFIs for correct models (in this case, outside of 95% of all CFIs simulated). For merging, the same picture is found. CFI is .952, which is well below the cutoff of .969. Multiple other options are plausible, but we leave the point here.

Since the code to obtain these values is rather extensive and hence to make user’s work easier, we provide a recommendation function recommend_dv that incorporates the previously described considerations. As with the recommend function for single fit indicators, it requires a result from gen_fit with appropriate arguments and, optionally, names of the fit indices (CFI is used if not provided). Results are rounded to 3 digits, but can be changed to 1 to 5 digits if needed, for example by digits = 5.

recommend_dv(fits.dv.con)
#> Warning in recommend_dv(fits.dv.con): The number of replications is lower than
#> the recommended minimum of 500. Consider with care.
#> $cutoffs
#>     CFI       model alpha
#> 1 0.981    original 0.001
#> 2 0.981 constrained 0.001
#> 3 0.981    original 0.010
#> 4 0.981 constrained 0.010
#> 5 0.981    original 0.050
#> 6 0.981 constrained 0.050
#> 7 0.982    original 0.100
#> 8 0.982 constrained 0.100
#> 
#> $fit.values
#>     CFI       model
#> 1 0.960    original
#> 2 0.959 constrained
#> 
#> $differences
#>                CFI
#> cutoff 0.001 0.000
#> cutoff 0.01  0.000
#> cutoff 0.05  0.000
#> cutoff 0.1   0.000
#> fit          0.001
#> 
#> $decisions
#>                    CFI
#> cutoff 0.001 confirmed
#> cutoff 0.01  confirmed
#> cutoff 0.05  confirmed
#> cutoff 0.1   confirmed
#> 
#> $replications
#> [1] 10
#> 
#> $comment
#> [1] "Approach for discriminant validity testing: constraining. Discriminant validity is confirmed if the fit value from the constrained/merged model is smaller (GoF) / larger (BoF) than the respective cutoff of original model."
recommend_dv(fits.dv.merge)
#> Warning in recommend_dv(fits.dv.merge): The number of replications is lower than
#> the recommended minimum of 500. Consider with care.
#> $cutoffs
#>     CFI    model alpha
#> 1 0.981 original 0.001
#> 2 0.943   merged 0.001
#> 3 0.981 original 0.010
#> 4 0.943   merged 0.010
#> 5 0.981 original 0.050
#> 6 0.943   merged 0.050
#> 7 0.982 original 0.100
#> 8 0.944   merged 0.100
#> 
#> $fit.values
#>     CFI    model
#> 1 0.960 original
#> 2 0.952   merged
#> 
#> $differences
#>                CFI
#> cutoff 0.001 0.038
#> cutoff 0.01  0.038
#> cutoff 0.05  0.038
#> cutoff 0.1   0.038
#> fit          0.008
#> 
#> $decisions
#>                    CFI
#> cutoff 0.001 confirmed
#> cutoff 0.01  confirmed
#> cutoff 0.05  confirmed
#> cutoff 0.1   confirmed
#> 
#> $replications
#> [1] 10
#> 
#> $comment
#> [1] "Approach for discriminant validity testing: merging. Discriminant validity is confirmed if the fit value from the constrained/merged model is smaller (GoF) / larger (BoF) than the respective cutoff of original model."

The function returns the cutoffs for different levels of alpha and the two models, where the first model is the original model and the second model is termed constrained or merged based on the approach selected in the simulation. Further, the actual fit values are automatically provided. Hence, the constrained or merged model does not needed to be defined explicitly. Differences and decisions based on the different alpha values are provided as well. Finally, the number of replications as well as a comment highlighting the approach and an interpretation of the decision basis (the simple test) are given. Please note that the decisions are identical to the ones made by Rönkkö & Cho’s tool in semTools::discriminantValidity.

References

Babakus, E., & Boller, G. W. (1992). An empirical assessment of the SERVQUAL scale. Journal of Business Research, 24(3), 253–268. https://doi.org/10.1016/0148-2963(92)90022-4

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118

Mai, R., Niemand, T., & Kraus, S. (2021). A Tailor-Fit Model Evaluation Strategy for Better Decisions about Structural Equation Models. Technological Forecasting & Social Change, 173(December) 121142. https://doi.org/10.1016/j.techfore.2021.121142

Meade, A. W., Johnson, E. C., & Braddy, P. W. (2008). Power and sensitivity of alternative fit indices in tests of measurement invariance. Journal of Applied Psychology, 93(3), 568-592. https://doi.org/10.1037/0021-9010.93.3.568

Niemand, T., & Mai, R. (2018). Flexible cutoff values for fit indices in the evaluation of structural equation models. Journal of the Academy of Marketing Science, 46(6), 1148—1172. https://doi.org/10.1007/s11747-018-0602-9

Rönkkö, M., & Cho, E. (2020). An updated guideline for assessing discriminant validity. Organizational Research Methods. https://doi.org/10.1177/1094428120968614

Comment: V18012022