2022-05-24 @Atsushi Kawaguchi
In this vignette, the output is omitted. Please refer to the following book for the output.
Kawaguchi A. (2021). Multivariate Analysis for Neuroimaging Data. CRC Press.
The data are generated by the strsimdata function, which generates data by applying loadings that contain zero weights to randomly generated factors.
n = 100; seed = 2
dataset1 = strsimdata(n = n, ncomp=2,
Xps=c(4,4), Ztype="binary", seed=seed)
The number of subjects is 100 and the number of factors is 2. The generated explanatory variable matrix X has 2 blocks with 4 variables each; the number of blocks is determined by the length of the vector Xps, which gives the number of variables in each block. The Ztype argument controls the generation of the supervision vector Z, which here is binary. Multi-block data are stored as a list of data matrices.
X2 = dataset1$X;
Z = dataset1$Z
str(dataset1[c("X","Z")])
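The generating mechanism described above can be illustrated with a minimal base R sketch. It is not the implementation of strsimdata: the helper name make_block, the number of zeroed weights per component, and the noise level are all assumptions made for illustration only.

```r
# Illustrative only: NOT the strsimdata implementation.
set.seed(2)
n <- 100         # number of subjects
ncomp <- 2       # number of latent factors
Xps <- c(4, 4)   # number of variables in each block

# Randomly generated factors shared by all blocks (n x ncomp).
factors <- matrix(rnorm(n * ncomp), n, ncomp)

# Hypothetical helper: build one block of p variables from the factors.
make_block <- function(p, nzero = 1) {
  # Weights: one row per component, one column per variable.
  W <- matrix(rnorm(ncomp * p), ncomp, p)
  for (k in seq_len(ncomp)) {
    W[k, sample(p, nzero)] <- 0              # set some weights to zero
    W[k, ] <- W[k, ] / sqrt(sum(W[k, ]^2))   # normalise to unit length
  }
  # Assumed noise level (sd = 0.1) added to the factor structure.
  X <- factors %*% W + matrix(rnorm(n * p, sd = 0.1), n, p)
  list(X = X, W = W)
}

blocks <- lapply(Xps, make_block)
X_list <- lapply(blocks, `[[`, "X")   # multi-block data: a list of matrices

length(X_list)        # number of blocks, equal to length(Xps)
sapply(X_list, dim)   # one column per block: number of rows, number of columns
```

The point of the sketch is only that the length of Xps fixes the number of blocks and that each block is an ordinary n-row matrix held in a list.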
The weights for X are generated by normalizing normal random numbers so that each weight vector has length (Euclidean norm) 1. They are stored as follows.
dataset1$WX
The first element of the list contains the super weights and the second element contains the block weights. In the block weights, the rows correspond to components and the columns to variables.
The numbers of zero weights are as follows.
dataset1$nZeroX
dataset1$ZcoefX
Supervised multi-block PCA is performed by specifying not only X2 but also Z and the supervision parameter muX. First, select the number of components and the regularization parameter.
(opt212 = optparasearch(X2, Z=Z, muX=0.5,
search.method = "ncomp1st", criterion="BIC", whichselect="X"))
Then fit the supervised multi-block PCA with the msma function, using the selected number of components and regularization parameter.
(fit212 = msma(X2, Z=Z, muX=0.5, comp=opt212$optncomp,
lambdaX=opt212$optlambdaX))
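To give some intuition for how a regularization parameter can produce zero weights, the following base R sketch applies soft-thresholding to a hypothetical weight vector. Soft-thresholding is a common device in sparse methods and is assumed here for illustration; this vignette does not state how msma uses lambdaX internally, and the function name soft_threshold and the example weights are not part of the package.

```r
# Illustrative only: soft-thresholding is an assumed mechanism here,
# not a description of msma internals.
soft_threshold <- function(w, lambda) {
  sign(w) * pmax(abs(w) - lambda, 0)
}

w <- c(0.80, -0.45, 0.10, -0.05)   # hypothetical unpenalised weights

for (lambda in c(0, 0.2, 0.5)) {
  ws <- soft_threshold(w, lambda)
  ws <- ws / sqrt(sum(ws^2))       # renormalise to unit length
  cat("lambda =", lambda, ": weights =", round(ws, 3),
      "; zeros =", sum(ws == 0), "\n")
}
```

Under this assumption, a larger regularization parameter sets more weights exactly to zero, which is why its value is selected together with the number of components.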
The results of the first and the second components are as follows.
par(mfrow=c(2,2), oma = c(0, 0, 2, 0))
plot(fit212, axes = 1, plottype="bar",
block="super", XY="X")
plot(fit212, axes = 2, plottype="bar",
block="super", XY="X")
plot(fit212, axes = 1, plottype="bar",
block="block", XY="X")
plot(fit212, axes = 2, plottype="bar",
block="block", XY="X")
The relationship between the super score and the binary outcome Z is examined.
par(mfrow=c(1,2))
for(i in 1:2){
t1=t.test(fit212$ssX[,i]~Z)
boxplot(fit212$ssX[,i]~Z,
main=paste("Comp", i),
sub=paste("t-test p =", round(t1$p.value,4)))
}
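As a self-contained illustration of the test used above, the following sketch runs the same t.test and boxplot calls on simulated data. The outcome Z, the vector score, and the effect size 0.8 are hypothetical stand-ins and are unrelated to fit212.

```r
set.seed(1)
n <- 100
Z <- rep(c(0, 1), each = n / 2)     # hypothetical binary outcome
score <- rnorm(n, mean = 0.8 * Z)   # stand-in for one column of a super score

# With a formula, t.test compares the two groups defined by Z.
# By default R does not assume equal variances (Welch test).
t1 <- t.test(score ~ Z)
t1$method      # name of the test that was run
t1$estimate    # the two group means
t1$p.value

boxplot(score ~ Z, main = "Simulated score",
        sub = paste("t-test p =", round(t1$p.value, 4)))
```

Note that round(p, 4) displays very small p-values as 0, so the subtitle should be read as "p < 0.0001" in that case.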
The data are again generated by the strsimdata function, which generates data by applying loadings that contain zero weights to randomly generated factors.
dataset2 = strsimdata(n = n, ncomp=2, Xps=c(4,4),
Yps=c(3,5), Ztype="binary", cz=c(10,10), seed=seed)
The number of subjects is 100 and the number of factors is 2. The generated explanatory variable matrix X has 2 blocks with 4 variables each; the number of blocks is determined by the length of the vector that specifies the number of variables. The same is true for the objective variable matrix Y, which has 2 blocks with 3 and 5 variables, respectively. The supervision vector Z is also generated, here as a binary variable.
Multi-block data is a list of data matrices.
X2 = dataset2$X; Y2 = dataset2$Y
Z = dataset2$Z
str(dataset2[c("X","Y","Z")])
The weights for X are generated by normalizing normal random numbers so that each weight vector has length (Euclidean norm) 1. They are stored as follows.
dataset2$WX
The first element of the list contains the super weights and the second element contains the block weights. In the block weights, the rows correspond to components and the columns to variables.
The numbers of zero weights are as follows.
dataset2$nZeroX
The weights for Y are set in the same way as those for X.
dataset2$WY
dataset2$nZeroY
dataset2$ZcoefX
dataset2$ZcoefY
Here, Z is additionally specified and supervised sparse PLS is performed. The supervision parameters muX and muY are both set to 0.3.
(opt222 = optparasearch(X2, Y2, Z, muX=0.3, muY=0.3,
search.method = "ncomp1st", criterion="BIC",
criterion4ncomp="BIC", whichselect=c("X","Y")))
(fit222 = msma(X2, Y2, Z,
muX=0.3, muY=0.3, comp=opt222$optncomp,
lambdaX=opt222$optlambdaX, lambdaY=opt222$optlambdaY))
The results of the first and second components for X are as follows.
par(mfrow=c(2,2), oma = c(0, 0, 2, 0))
plot(fit222, axes = 1, plottype="bar",
block="super", XY="X")
plot(fit222, axes = 2, plottype="bar",
block="super", XY="X")
plot(fit222, axes = 1, plottype="bar",
block="block", XY="X")
plot(fit222, axes = 2, plottype="bar",
block="block", XY="X")
The results of the first and second components for Y are as follows.
par(mfrow=c(2,2), oma = c(0, 0, 2, 0))
plot(fit222, axes = 1, plottype="bar",
block="super", XY="Y")
plot(fit222, axes = 2, plottype="bar",
block="super", XY="Y")
plot(fit222, axes = 1, plottype="bar",
block="block", XY="Y")
plot(fit222, axes = 2, plottype="bar",
block="block", XY="Y")
par(mfrow=c(1,2))
for(i in 1:2) plot(fit222, axes = i, XY="XY")
The relationship between the super scores and the binary outcome Z is examined to assess the effect of supervision. The results with supervision are as follows.
par(mfrow=c(2,2))
for(xy in c("X","Y")){for(i in 1:2){
ss = fit222[[paste0("ss", xy)]][,i]
t1=t.test(ss~Z)
boxplot(ss~Z, main=paste(xy, "Comp", i),
sub=paste("t-test p =", round(t1$p.value,4)))
}}
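The p-values shown in the plot subtitles can also be collected into a small matrix for side-by-side comparison. The sketch below does this for a hypothetical stand-in object named fit, which holds simulated ssX and ssY score matrices; it is not the fitted object fit222, and the simulated effect size is an assumption.

```r
set.seed(1)
n <- 100
Z <- rep(c(0, 1), each = n / 2)   # hypothetical binary outcome

# Hypothetical stand-in for a fitted object: two score matrices (n x 2).
fit <- list(ssX = matrix(rnorm(n * 2, mean = 0.8 * Z), n, 2),
            ssY = matrix(rnorm(n * 2), n, 2))

# One t-test per component (rows) and per score type (columns).
pvals <- sapply(c(X = "X", Y = "Y"), function(xy) {
  ss <- fit[[paste0("ss", xy)]]
  apply(ss, 2, function(s) t.test(s ~ Z)$p.value)
})
rownames(pvals) <- paste("Comp", 1:2)

signif(pvals, 2)
```

Replacing fit with an actual fitted object that has ssX and ssY components would give the same kind of table for real results.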