SEM Forests

Andreas Brandmaier

2022-05-13

Load data

Load affect dataset from the psychTools package. These are data from two studies conducted in the Personality, Motivation and Cognition Laboratory at Northwestern University to study affect dimensionality and the relationship to various personality dimensions.

library(psychTools)
#> Warning: Paket 'psychTools' wurde unter R Version 4.1.2 erstellt
data(affect)

knitr::kable(head(affect))
Study Film ext neur imp soc lie traitanx state1 EA1 TA1 PA1 NA1 EA2 TA2 PA2 NA2 state2 MEQ BDI
maps 3 18 9 7 10 3 24 22 24 14 26 2 6 5 7 4 NA NA 0.0476190
maps 3 16 12 5 8 1 41 40 9 13 10 4 4 14 5 5 NA NA 0.3333333
maps 3 6 5 3 1 2 37 44 1 14 4 2 2 15 3 1 NA NA 0.1904762
maps 3 12 15 4 6 3 54 40 5 15 1 0 4 15 0 2 NA NA 0.3846154
maps 3 14 2 5 6 3 39 67 12 20 7 13 14 15 16 13 NA NA 0.3809524
maps 1 6 15 2 4 5 51 38 9 14 5 1 7 12 2 2 NA NA 0.2380952

affect$Film <- as.factor(affect$Film)
affect$lie <- as.ordered(affect$lie)
affect$imp <- as.ordered(affect$imp)

Create simple model of state anxiety

The following code implements a simple SEM with only a single manifest variables and two parameters, the mean of state anxiety after having watched a movie (state2), \(\mu\), and the variance of state anxiety, \(\sigma^2\).

library(OpenMx)
manifests<-c("state2")
latents<-c()
model <- mxModel("Univariate Normal Model", 
type="RAM",
manifestVars = manifests,
latentVars = latents,
mxPath(from="one",to=manifests, free=c(TRUE), 
       value=c(50.0) , arrows=1, label=c("mu") ),
mxPath(from=manifests,to=manifests, free=c(TRUE), 
       value=c(100.0) , arrows=2, label=c("sigma2") ),
mxData(affect, type = "raw")
);

result <- mxRun(model)
#> Running Univariate Normal Model with 2 parameters

These are the estimates of the model when run on the entire sample:

summary(result)
#> Summary of Univariate Normal Model 
#>  
#> free parameters:
#>     name matrix    row    col  Estimate  Std.Error A
#> 1 sigma2      S state2 state2 115.05414 12.4793862  
#> 2     mu      M      1 state2  42.45118  0.8226717  
#> 
#> Model Statistics: 
#>                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
#>        Model:              2                    168              1289.158
#>    Saturated:              2                    168                    NA
#> Independence:              2                    168                    NA
#> Number of observations/statistics: 330/170
#> 
#> Information Criteria: 
#>       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
#> AIC:       953.1576               1293.158                 1293.194
#> BIC:       314.9100               1300.756                 1294.412
#> CFI: NA 
#> TLI: 1   (also known as NNFI) 
#> RMSEA:  0  [95% CI (NA, NA)]
#> Prob(RMSEA <= 0.05): NA
#> To get additional fit indices, see help(mxRefModels)
#> timestamp: 2022-05-13 11:13:15 
#> Wall clock time: 0.05101109 secs 
#> optimizer:  SLSQP 
#> OpenMx version number: 2.20.3 
#> Need help?  See help(mxSummary)

Forest

Create a forest control object that stores all tuning parameters of the forest. Note that we use only 5 trees for demo purposes. Please increase the number in real applications.

control <- semforest.control(num.trees = 5)
print(control)
#> SEM-Forest control:
#> -----------------
#> Number of Trees:  5 
#> Sampling:  subsample 
#> Comparisons per Node: 2 
#> 
#>  SEM-Tree control:
#>  <U+2594><U+2594><U+2594><U+2594><U+2594><U+2594><U+2594><U+2594><U+2594><U+2594> 
#> <U+25CF> Splitting Method: fair
#> <U+25CF> Alpha Level: 1
#> <U+25CF> Bonferroni Correction:FALSE
#> <U+25CF> Minimum Number of Cases: 20
#> <U+25CF> Maximum Tree Depth: NA
#> <U+25CF> Number of CV Folds: 5
#> <U+25CF> Exclude Heywood Cases: FALSE
#> <U+25CF> Test Invariance Alpha Level: NA
#> <U+25CF> Use all Cases: FALSE
#> <U+25CF> Verbosity: FALSE
#> <U+25CF> Progress Bar: TRUE
#> <U+25CF> Seed: NA

Now, run the forest using the control object:

forest <- semforest( model=model,
                     data = affect, 
                     control = control,
                     covariates = c("Study","Film", "state1",
                                    "PA2","NA2","TA2"))
#> > Model was not run. Estimating parameters now before running the forest.
#> Running Univariate Normal Model with 2 parameters
#> 
#> Beginning initial fit attempt
#> Running Univariate Normal Model with 2 parameters
#> 
#>  Lowest minimum so far:  1289.15758570645
#> 
#> Solution found
#> 
#>  Solution found!  Final fit=1289.1576 (started at 1387.7841)  (1 attempt(s): 1 valid, 0 errors)
#>  Start values from best fit:
#> 115.054144655482,42.4511764378938
#> <U+2714> Tree construction finished [took 32s].
#> <U+2714> Tree construction finished [took 30s].
#> <U+2714> Tree construction finished [took 32s].
#> <U+2714> Tree construction finished [took 28s].
#> <U+2714> Tree construction finished [took 21s].
#> <U+2714> Forest completed [took ~2.5min]

Variable importance

Next, we compute permutation-based variable importance.

vim <- varimp(forest)
print(vim, sort.values=TRUE)
#> Variable Importance
#>     Study      Film       PA2       NA2    state1       TA2 
#>  1.827936 11.281157 11.416595 16.697414 62.229144 72.004270
plot(vim)

From this, we can learn that variables such as NA2 representing negative affect (after the movie), TA2 representing tense arousal (after the movie), and state1 representing the state anxiety before having watched the movie, are the best predictors of difference in the distribution of state anxiety (in either mean, variance or both) after having watched the movie.