In this example, we will show how to use lslx to conduct semi-confirmatory factor analysis. The example uses the data set HolzingerSwineford1939 in the package lavaan. Hence, lavaan must be installed. In the following specification, x1 - x9 are assumed to be measurements of three latent factors: visual, textual, and speed.
model_fa <- "visual :=> x1 + x2 + x3
textual :=> x4 + x5 + x6
speed :=> x7 + x8 + x9
visual :~> x4 + x5 + x6 + x7 + x8 + x9
textual :~> x1 + x2 + x3 + x7 + x8 + x9
speed :~> x1 + x2 + x3 + x4 + x5 + x6
visual <=> fix(1)* visual
textual <=> fix(1)* textual
speed <=> fix(1)* speed"
The operator :=> means that the LHS latent factor is defined by the RHS observed variables. In particular, the corresponding loadings are freely estimated. The operator :~> also means that the LHS latent factor is defined by the RHS observed variables, but the corresponding loadings are treated as penalized coefficients. In this model, visual is mainly measured by x1 - x3, textual is mainly measured by x4 - x6, and speed is mainly measured by x7 - x9. However, the inclusion of the penalized loadings indicates that each measurement may not be influenced by only one latent factor. The operator <=> means that the LHS and RHS variables/factors are covaried. If the LHS and RHS variable/factor are the same, <=> specifies the variance of that variable/factor. For scale setting, visual <=> fix(1) * visual fixes the variance of visual at one. Details of model syntax can be found in the section of Model Syntax via ?lslx.
lslx is written as an R6 class. Every time we conduct analysis with lslx, an lslx object must be initialized. The following code initializes an lslx object named lslx_fa.
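The initialization call itself is missing from this rendering; based on the arguments described below (model and sample_data), it was presumably along the lines of this sketch:

```r
# Assumes the lslx and lavaan packages are installed.
library(lslx)

# Initialize an 'lslx' object from the model syntax defined above and the
# HolzingerSwineford1939 data set shipped with lavaan.
lslx_fa <- lslx$new(model = model_fa,
                    sample_data = lavaan::HolzingerSwineford1939)
```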
An 'lslx' R6 class is initialized via 'data' argument.
Response Variables: x1 x2 x3 x4 x5 x6 x7 x8 x9
Latent Factors: visual textual speed
Here, lslx is the object generator for lslx objects and $new() is the built-in method of lslx to generate a new lslx object. The initialization of lslx requires users to specify a model for model specification (argument model) and a data set to be fitted (argument sample_data). The data set must contain all the observed variables specified in the given model. It is also possible to initialize an lslx object via sample moments (see vignette("structural-equation-modeling")).
To see the model specification, we may use the $extract_specification() method.
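The specification table below can presumably be printed by calling this method directly on the initialized object:

```r
# Print the specification table stored in the 'lslx_fa' object.
lslx_fa$extract_specification()
```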
relation left right group reference matrix block type start
x1<-1/g x1<-1 x1 1 g FALSE alpha y<-1 free NA
x2<-1/g x2<-1 x2 1 g FALSE alpha y<-1 free NA
x3<-1/g x3<-1 x3 1 g FALSE alpha y<-1 free NA
x4<-1/g x4<-1 x4 1 g FALSE alpha y<-1 free NA
x5<-1/g x5<-1 x5 1 g FALSE alpha y<-1 free NA
x6<-1/g x6<-1 x6 1 g FALSE alpha y<-1 free NA
x7<-1/g x7<-1 x7 1 g FALSE alpha y<-1 free NA
x8<-1/g x8<-1 x8 1 g FALSE alpha y<-1 free NA
x9<-1/g x9<-1 x9 1 g FALSE alpha y<-1 free NA
x1<-visual/g x1<-visual x1 visual g FALSE beta y<-f free NA
x2<-visual/g x2<-visual x2 visual g FALSE beta y<-f free NA
x3<-visual/g x3<-visual x3 visual g FALSE beta y<-f free NA
x4<-visual/g x4<-visual x4 visual g FALSE beta y<-f pen NA
x5<-visual/g x5<-visual x5 visual g FALSE beta y<-f pen NA
x6<-visual/g x6<-visual x6 visual g FALSE beta y<-f pen NA
x7<-visual/g x7<-visual x7 visual g FALSE beta y<-f pen NA
x8<-visual/g x8<-visual x8 visual g FALSE beta y<-f pen NA
x9<-visual/g x9<-visual x9 visual g FALSE beta y<-f pen NA
x1<-textual/g x1<-textual x1 textual g FALSE beta y<-f pen NA
x2<-textual/g x2<-textual x2 textual g FALSE beta y<-f pen NA
x3<-textual/g x3<-textual x3 textual g FALSE beta y<-f pen NA
x4<-textual/g x4<-textual x4 textual g FALSE beta y<-f free NA
x5<-textual/g x5<-textual x5 textual g FALSE beta y<-f free NA
x6<-textual/g x6<-textual x6 textual g FALSE beta y<-f free NA
x7<-textual/g x7<-textual x7 textual g FALSE beta y<-f pen NA
x8<-textual/g x8<-textual x8 textual g FALSE beta y<-f pen NA
x9<-textual/g x9<-textual x9 textual g FALSE beta y<-f pen NA
x1<-speed/g x1<-speed x1 speed g FALSE beta y<-f pen NA
x2<-speed/g x2<-speed x2 speed g FALSE beta y<-f pen NA
x3<-speed/g x3<-speed x3 speed g FALSE beta y<-f pen NA
x4<-speed/g x4<-speed x4 speed g FALSE beta y<-f pen NA
x5<-speed/g x5<-speed x5 speed g FALSE beta y<-f pen NA
x6<-speed/g x6<-speed x6 speed g FALSE beta y<-f pen NA
x7<-speed/g x7<-speed x7 speed g FALSE beta y<-f free NA
x8<-speed/g x8<-speed x8 speed g FALSE beta y<-f free NA
x9<-speed/g x9<-speed x9 speed g FALSE beta y<-f free NA
visual<->visual/g visual<->visual visual visual g FALSE phi f<->f fixed 1
textual<->visual/g textual<->visual textual visual g FALSE phi f<->f free NA
speed<->visual/g speed<->visual speed visual g FALSE phi f<->f free NA
textual<->textual/g textual<->textual textual textual g FALSE phi f<->f fixed 1
speed<->textual/g speed<->textual speed textual g FALSE phi f<->f free NA
speed<->speed/g speed<->speed speed speed g FALSE phi f<->f fixed 1
x1<->x1/g x1<->x1 x1 x1 g FALSE phi y<->y free NA
x2<->x2/g x2<->x2 x2 x2 g FALSE phi y<->y free NA
x3<->x3/g x3<->x3 x3 x3 g FALSE phi y<->y free NA
x4<->x4/g x4<->x4 x4 x4 g FALSE phi y<->y free NA
x5<->x5/g x5<->x5 x5 x5 g FALSE phi y<->y free NA
x6<->x6/g x6<->x6 x6 x6 g FALSE phi y<->y free NA
x7<->x7/g x7<->x7 x7 x7 g FALSE phi y<->y free NA
x8<->x8/g x8<->x8 x8 x8 g FALSE phi y<->y free NA
x9<->x9/g x9<->x9 x9 x9 g FALSE phi y<->y free NA
label penalty set weight
x1<-1/g <NA> none 0 0
x2<-1/g <NA> none 0 0
x3<-1/g <NA> none 0 0
x4<-1/g <NA> none 0 0
x5<-1/g <NA> none 0 0
x6<-1/g <NA> none 0 0
x7<-1/g <NA> none 0 0
x8<-1/g <NA> none 0 0
x9<-1/g <NA> none 0 0
x1<-visual/g <NA> none 0 0
x2<-visual/g <NA> none 0 0
x3<-visual/g <NA> none 0 0
x4<-visual/g <NA> default 1 1
x5<-visual/g <NA> default 1 1
x6<-visual/g <NA> default 1 1
x7<-visual/g <NA> default 1 1
x8<-visual/g <NA> default 1 1
x9<-visual/g <NA> default 1 1
x1<-textual/g <NA> default 1 1
x2<-textual/g <NA> default 1 1
x3<-textual/g <NA> default 1 1
x4<-textual/g <NA> none 0 0
x5<-textual/g <NA> none 0 0
x6<-textual/g <NA> none 0 0
x7<-textual/g <NA> default 1 1
x8<-textual/g <NA> default 1 1
x9<-textual/g <NA> default 1 1
x1<-speed/g <NA> default 1 1
x2<-speed/g <NA> default 1 1
x3<-speed/g <NA> default 1 1
x4<-speed/g <NA> default 1 1
x5<-speed/g <NA> default 1 1
x6<-speed/g <NA> default 1 1
x7<-speed/g <NA> none 0 0
x8<-speed/g <NA> none 0 0
x9<-speed/g <NA> none 0 0
visual<->visual/g <NA> none 0 0
textual<->visual/g <NA> none 0 0
speed<->visual/g <NA> none 0 0
textual<->textual/g <NA> none 0 0
speed<->textual/g <NA> none 0 0
speed<->speed/g <NA> none 0 0
x1<->x1/g <NA> none 0 0
x2<->x2/g <NA> none 0 0
x3<->x3/g <NA> none 0 0
x4<->x4/g <NA> none 0 0
x5<->x5/g <NA> none 0 0
x6<->x6/g <NA> none 0 0
x7<->x7/g <NA> none 0 0
x8<->x8/g <NA> none 0 0
x9<->x9/g <NA> none 0 0
The row names show the coefficient names. The two most relevant columns are type, which shows the type of each coefficient, and start, which gives the starting value. In lslx, many extract-related methods are defined. These methods can be used to extract quantities stored in the lslx object. For details, please see the section of Extract-Related Methods via ?lslx.
After an lslx object is initialized, the method $fit() can be used to fit the specified model to the given data.
lslx_fa$fit(penalty_method = "mcp",
lambda_grid = seq(.01, .60, .01),
delta_grid = c(1.5, 3.0, Inf))
CONGRATS: Algorithm converges under EVERY specified penalty level.
Specified Tolerance for Convergence: 0.001
Specified Maximal Number of Iterations: 100
The fitting process requires users to specify the penalty method (argument penalty_method) and the considered penalty levels (arguments lambda_grid and delta_grid). In this example, the mcp penalty is implemented on the lambda grid seq(.01, .60, .01) and the delta grid c(1.5, 3, Inf). Note that in this example lambda = 0 is not considered because it may result in an unidentified model. All the fitting results will be stored in the fitting field of lslx_fa.
Unlike traditional SEM analysis, lslx fits the model to the data under all the considered penalty levels. To summarize the fitting results, a selector that determines an optimal penalty level must be specified. Available selectors can be found in the section of Penalty Level Selection via ?lslx. The following code summarizes the fitting results under the penalty level selected by the Bayesian information criterion (BIC).
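The summarizing call is not shown in this rendering; assuming the BIC selector is named "bic", it was presumably:

```r
# Summarize the fit under the penalty level selected by BIC.
lslx_fa$summarize(selector = "bic")
```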
General Information
number of observations 301
number of complete observations 301
number of missing patterns none
number of groups 1
number of responses 9
number of factors 3
number of free coefficients 30
number of penalized coefficients 18
Numerical Conditions
selected lambda 0.140
selected delta 1.500
selected step none
objective value 0.158
objective gradient absolute maximum 0.001
objective Hessian convexity 0.593
number of iterations 8.000
loss value 0.103
number of non-zero coefficients 34.000
degrees of freedom 20.000
robust degrees of freedom 20.638
scaling factor 1.032
Fit Indices
root mean square error of approximation (rmsea) 0.043
comparative fit index (cfi) 0.988
non-normed fit index (nnfi) 0.978
standardized root mean of residual (srmr) 0.030
Likelihood Ratio Test
statistic df p-value
unadjusted 30.901 20.000 0.057
mean-adjusted 29.946 20.000 0.071
Root Mean Square Error of Approximation Test
estimate lower upper
unadjusted 0.043 0.000 0.075
mean-adjusted 0.041 0.000 0.075
Coefficient Test (Std.Error = "sandwich")
Factor Loading
type estimate std.error z-value P(>|z|) lower upper
x1<-visual free 0.709 0.095 7.477 0.000 0.523 0.895
x2<-visual free 0.555 0.082 6.798 0.000 0.395 0.715
x3<-visual free 0.744 0.070 10.575 0.000 0.606 0.882
x4<-visual pen 0.000 - - - - -
x5<-visual pen -0.103 0.073 -1.413 0.158 -0.245 0.040
x6<-visual pen 0.000 - - - - -
x7<-visual pen -0.254 0.118 -2.147 0.032 -0.486 -0.022
x8<-visual pen 0.000 - - - - -
x9<-visual pen 0.323 0.075 4.290 0.000 0.175 0.471
x1<-textual pen 0.255 0.081 3.136 0.002 0.096 0.415
x2<-textual pen 0.000 - - - - -
x3<-textual pen 0.000 - - - - -
x4<-textual free 0.987 0.061 16.133 0.000 0.867 1.106
x5<-textual free 1.143 0.062 18.583 0.000 1.023 1.264
x6<-textual free 0.913 0.058 15.726 0.000 0.799 1.027
x7<-textual pen 0.000 - - - - -
x8<-textual pen 0.000 - - - - -
x9<-textual pen 0.000 - - - - -
x1<-speed pen 0.000 - - - - -
x2<-speed pen 0.000 - - - - -
x3<-speed pen 0.000 - - - - -
x4<-speed pen 0.000 - - - - -
x5<-speed pen 0.000 - - - - -
x6<-speed pen 0.000 - - - - -
x7<-speed free 0.824 0.105 7.816 0.000 0.617 1.031
x8<-speed free 0.731 0.069 10.605 0.000 0.596 0.866
x9<-speed free 0.499 0.063 7.961 0.000 0.376 0.622
Covariance
type estimate std.error z-value P(>|z|) lower upper
textual<->visual free 0.325 0.086 3.767 0.000 0.156 0.494
speed<->visual free 0.374 0.100 3.725 0.000 0.177 0.570
speed<->textual free 0.278 0.078 3.566 0.000 0.125 0.431
Variance
type estimate std.error z-value P(>|z|) lower upper
visual<->visual fixed 1.000 - - - - -
textual<->textual fixed 1.000 - - - - -
speed<->speed fixed 1.000 - - - - -
x1<->x1 free 0.671 0.113 5.951 0.000 0.450 0.892
x2<->x2 free 1.073 0.106 10.164 0.000 0.866 1.280
x3<->x3 free 0.719 0.091 7.944 0.000 0.542 0.896
x4<->x4 free 0.377 0.050 7.507 0.000 0.279 0.476
x5<->x5 free 0.415 0.060 6.869 0.000 0.297 0.534
x6<->x6 free 0.362 0.046 7.903 0.000 0.273 0.452
x7<->x7 free 0.596 0.109 5.462 0.000 0.382 0.810
x8<->x8 free 0.488 0.084 5.772 0.000 0.322 0.653
x9<->x9 free 0.540 0.064 8.430 0.000 0.415 0.666
Intercept
type estimate std.error z-value P(>|z|) lower upper
x1<-1 free 4.936 0.067 73.473 0.000 4.804 5.067
x2<-1 free 6.088 0.068 89.855 0.000 5.955 6.221
x3<-1 free 2.250 0.065 34.579 0.000 2.123 2.378
x4<-1 free 3.061 0.067 45.694 0.000 2.930 3.192
x5<-1 free 4.341 0.074 58.452 0.000 4.195 4.486
x6<-1 free 2.186 0.063 34.667 0.000 2.062 2.309
x7<-1 free 4.186 0.063 66.766 0.000 4.063 4.309
x8<-1 free 5.527 0.058 94.854 0.000 5.413 5.641
x9<-1 free 5.374 0.058 92.546 0.000 5.260 5.488
In this example, we can see that most penalized coefficients are estimated as zero under the selected penalty level, with a few exceptions such as x9<-visual, which shows the benefit of using the semi-confirmatory approach. The $summarize() method also shows the results of significance tests for the coefficients. In lslx, the default standard errors are calculated based on the sandwich formula whenever raw data are available. The sandwich formula is generally valid even when the model is misspecified and the data are not normal. However, it may not be valid after an optimal penalty level is selected.
lslx provides four methods for visualizing the fitting results. The method $plot_numerical_condition() shows the numerical condition under all the penalty levels. The following code plots the values of n_iter_out (number of iterations in the outer loop), objective_gradient_abs_max (maximum absolute value of the gradient of the objective function), and objective_hessian_convexity (minimum of the univariate approximate Hessian). The plot can be used to evaluate the quality of the numerical optimization.
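A sketch of the corresponding call, assuming the quantities listed above are what $plot_numerical_condition() displays by default:

```r
# Plot the numerical conditions across all penalty levels.
lslx_fa$plot_numerical_condition()
```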
The method $plot_information_criterion() shows the values of information criteria under all the penalty levels.
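A sketch of the corresponding call:

```r
# Plot the information criteria across all penalty levels.
lslx_fa$plot_information_criterion()
```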
The method $plot_fit_index() shows the values of fit indices under all the penalty levels.
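A sketch of the corresponding call:

```r
# Plot the fit indices across all penalty levels.
lslx_fa$plot_fit_index()
```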
The method $plot_coefficient() shows the solution paths of coefficients in a given block. The following code plots the solution paths of all coefficients in the block y<-f, which contains all the regression coefficients from latent factors to observed variables (i.e., factor loadings).
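A sketch of the corresponding call, assuming $plot_coefficient() takes a block argument as the text suggests:

```r
# Plot the solution paths of all factor loadings (block "y<-f").
lslx_fa$plot_coefficient(block = "y<-f")
```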
In lslx, many quantities related to SEM can be extracted by extract-related methods. For example, the loading matrix can be obtained by
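The call producing the loading matrix below is not shown; assuming a method named $extract_coefficient_matrix() with selector and block arguments (consistent with the extract-related naming scheme), it might look like:

```r
# Extract the estimated loading matrix (block "y<-f") under the
# BIC-selected penalty level.
lslx_fa$extract_coefficient_matrix(selector = "bic", block = "y<-f")
```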
$g
visual textual speed
x1 0.709 0.255 0.000
x2 0.555 0.000 0.000
x3 0.744 0.000 0.000
x4 0.000 0.987 0.000
x5 -0.103 1.143 0.000
x6 0.000 0.913 0.000
x7 -0.254 0.000 0.824
x8 0.000 0.000 0.731
x9 0.323 0.000 0.499
The model-implied covariance matrix and residual matrix can be obtained by
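Assuming methods named $extract_implied_cov() and $extract_residual_cov() (guesses consistent with the extract-related naming scheme), the two matrices below might be obtained via:

```r
# Model-implied covariance matrix under the BIC-selected penalty level.
lslx_fa$extract_implied_cov(selector = "bic")
# Residual covariance matrix (sample minus implied).
lslx_fa$extract_residual_cov(selector = "bic")
```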
$g
x1 x2 x3 x4 x5 x6 x7 x8 x9
x1 1.3560 0.4393 0.5892 0.479 0.474 0.444 0.0754 0.245 0.423
x2 0.4393 1.3806 0.4129 0.178 0.149 0.165 0.0298 0.152 0.283
x3 0.5892 0.4129 1.2728 0.239 0.200 0.221 0.0399 0.203 0.379
x4 0.4793 0.1779 0.2386 1.350 1.095 0.901 0.1443 0.200 0.240
x5 0.4741 0.1493 0.2002 1.095 1.657 1.013 0.1618 0.204 0.226
x6 0.4436 0.1647 0.2208 0.901 1.013 1.196 0.1336 0.185 0.223
x7 0.0754 0.0298 0.0399 0.144 0.162 0.134 1.1832 0.533 0.381
x8 0.2455 0.1515 0.2033 0.200 0.204 0.185 0.5328 1.022 0.453
x9 0.4235 0.2828 0.3792 0.240 0.226 0.223 0.3812 0.453 1.014
$g
x1 x2 x3 x4 x5 x6 x7 x8 x9
x1 -0.00234 0.03193 0.00930 -0.025585 0.03353 -0.011256 -0.009370 -0.018374 -0.03488
x2 0.03193 -0.00115 -0.03820 -0.031019 -0.06184 -0.082892 0.126516 0.041886 0.03875
x3 0.00930 -0.03820 -0.00206 0.030436 0.08788 -0.023276 -0.048421 -0.009070 0.00538
x4 -0.02558 -0.03102 0.03044 -0.000296 -0.00288 0.005221 -0.075402 0.074744 -0.00295
x5 0.03353 -0.06184 0.08788 -0.002877 -0.00322 -0.001204 0.018757 0.023529 -0.06892
x6 -0.01126 -0.08289 -0.02328 0.005221 -0.00120 -0.000274 -0.010490 0.019983 -0.01349
x7 -0.00937 0.12652 -0.04842 -0.075402 0.01876 -0.010490 0.000101 -0.002465 0.00786
x8 -0.01837 0.04189 -0.00907 0.074744 0.02353 0.019983 -0.002465 -0.000357 -0.00433
x9 -0.03488 0.03875 0.00538 -0.002951 -0.06892 -0.013493 0.007861 -0.004325 -0.00093
plsem() and S3 Methods

After version 0.6.3, the previous analysis can be equivalently conducted by plsem() with lavaan-style model syntax.
model_fa <- "visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
pen() * visual =~ x4 + x5 + x6 + x7 + x8 + x9
pen() * textual =~ x1 + x2 + x3 + x7 + x8 + x9
pen() * speed =~ x1 + x2 + x3 + x4 + x5 + x6
visual ~~ 1 * visual
textual ~~ 1 * textual
speed ~~ 1 * speed"
lslx_fa <- plsem(model = model_fa,
data = lavaan::HolzingerSwineford1939,
penalty_method = "mcp",
lambda_grid = seq(.01, .60, .01),
delta_grid = c(1.5, 3.0, Inf))
An 'lslx' R6 class is initialized via 'data' argument.
Response Variables: x1 x2 x3 x4 x5 x6 x7 x8 x9
Latent Factors: visual textual speed
CONGRATS: Algorithm converges under EVERY specified penalty level.
Specified Tolerance for Convergence: 0.001
Specified Maximal Number of Iterations: 100
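The section title mentions S3 methods; assuming a lavaan-style summary() S3 method is implemented for the fitted object, the summary output below was presumably produced by something like:

```r
# Hypothetical S3-style summary under the BIC-selected penalty level.
summary(lslx_fa, selector = "bic")
```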
General Information
number of observations 301
number of complete observations 301
number of missing patterns none
number of groups 1
number of responses 9
number of factors 3
number of free coefficients 30
number of penalized coefficients 18
Numerical Conditions
selected lambda 0.140
selected delta 1.500
selected step none
objective value 0.158
objective gradient absolute maximum 0.001
objective Hessian convexity 0.593
number of iterations 8.000
loss value 0.103
number of non-zero coefficients 34.000
degrees of freedom 20.000
robust degrees of freedom 20.641
scaling factor 1.032
Fit Indices
root mean square error of approximation (rmsea) 0.043
comparative fit index (cfi) 0.988
non-normed fit index (nnfi) 0.978
standardized root mean of residual (srmr) 0.030
Likelihood Ratio Test
statistic df p-value
unadjusted 30.902 20.000 0.056
mean-adjusted 29.942 20.000 0.071
Root Mean Square Error of Approximation Test
estimate lower upper
unadjusted 0.043 0.000 0.075
mean-adjusted 0.041 0.000 0.075
Coefficient Test (Std.Error = "sandwich")
Factor Loading
type estimate std.error z-value P(>|z|) lower upper
x1<-visual free 0.709 0.095 7.477 0.000 0.523 0.895
x2<-visual free 0.555 0.082 6.797 0.000 0.395 0.715
x3<-visual free 0.744 0.070 10.576 0.000 0.606 0.882
x4<-visual pen 0.000 - - - - -
x5<-visual pen -0.102 0.073 -1.412 0.158 -0.245 0.040
x6<-visual pen 0.000 - - - - -
x7<-visual pen -0.254 0.118 -2.147 0.032 -0.486 -0.022
x8<-visual pen 0.000 - - - - -
x9<-visual pen 0.323 0.075 4.288 0.000 0.175 0.471
x1<-textual pen 0.255 0.081 3.136 0.002 0.096 0.415
x2<-textual pen 0.000 - - - - -
x3<-textual pen 0.000 - - - - -
x4<-textual free 0.987 0.061 16.133 0.000 0.867 1.106
x5<-textual free 1.143 0.062 18.584 0.000 1.023 1.264
x6<-textual free 0.913 0.058 15.726 0.000 0.799 1.027
x7<-textual pen 0.000 - - - - -
x8<-textual pen 0.000 - - - - -
x9<-textual pen 0.000 - - - - -
x1<-speed pen 0.000 - - - - -
x2<-speed pen 0.000 - - - - -
x3<-speed pen 0.000 - - - - -
x4<-speed pen 0.000 - - - - -
x5<-speed pen 0.000 - - - - -
x6<-speed pen 0.000 - - - - -
x7<-speed free 0.823 0.105 7.812 0.000 0.617 1.030
x8<-speed free 0.731 0.069 10.604 0.000 0.596 0.866
x9<-speed free 0.499 0.063 7.960 0.000 0.376 0.622
Covariance
type estimate std.error z-value P(>|z|) lower upper
textual<->visual free 0.325 0.086 3.766 0.000 0.156 0.494
speed<->visual free 0.374 0.100 3.730 0.000 0.177 0.570
speed<->textual free 0.278 0.078 3.564 0.000 0.125 0.430
Variance
type estimate std.error z-value P(>|z|) lower upper
visual<->visual fixed 1.000 - - - - -
textual<->textual fixed 1.000 - - - - -
speed<->speed fixed 1.000 - - - - -
x1<->x1 free 0.671 0.113 5.950 0.000 0.450 0.892
x2<->x2 free 1.073 0.106 10.165 0.000 0.866 1.280
x3<->x3 free 0.719 0.091 7.943 0.000 0.542 0.896
x4<->x4 free 0.377 0.050 7.507 0.000 0.279 0.476
x5<->x5 free 0.415 0.060 6.869 0.000 0.297 0.534
x6<->x6 free 0.362 0.046 7.904 0.000 0.273 0.452
x7<->x7 free 0.596 0.109 5.467 0.000 0.383 0.810
x8<->x8 free 0.487 0.084 5.767 0.000 0.322 0.653
x9<->x9 free 0.540 0.064 8.431 0.000 0.414 0.666
Intercept
type estimate std.error z-value P(>|z|) lower upper
x1<-1 free 4.936 0.067 73.473 0.000 4.804 5.067
x2<-1 free 6.088 0.068 89.855 0.000 5.955 6.221
x3<-1 free 2.250 0.065 34.579 0.000 2.123 2.378
x4<-1 free 3.061 0.067 45.694 0.000 2.930 3.192
x5<-1 free 4.341 0.074 58.452 0.000 4.195 4.486
x6<-1 free 2.186 0.063 34.667 0.000 2.062 2.309
x7<-1 free 4.186 0.063 66.766 0.000 4.063 4.309
x8<-1 free 5.527 0.058 94.854 0.000 5.413 5.641
x9<-1 free 5.374 0.058 92.546 0.000 5.260 5.488