This is a continuation of the eQTL analysis vignette. In that vignette, we used PCA to compute data-driven covariances. Here we demonstrate the use of an additional data-driven covariance, obtained via flash decomposition.
As in the eQTL analysis vignette, we simulate a toy data set:
```r
library(ashr)
library(mashr)
set.seed(1)
simdata = simple_sims(10000,5,1) # simulates data on 40k tests
# identify a subset of strong tests
m.1by1 = mash_1by1(mash_set_data(simdata$Bhat,simdata$Shat))
strong.subset = get_significant_results(m.1by1,0.05)
# identify a random subset of 5000 tests
random.subset = sample(1:nrow(simdata$Bhat),5000)
```
and create the random and strong data sets:
```r
data.temp = mash_set_data(simdata$Bhat[random.subset,],simdata$Shat[random.subset,])
Vhat = estimate_null_correlation_simple(data.temp)
rm(data.temp)
data.random = mash_set_data(simdata$Bhat[random.subset,],simdata$Shat[random.subset,],V=Vhat)
data.strong = mash_set_data(simdata$Bhat[strong.subset,],simdata$Shat[strong.subset,], V=Vhat)
```
We first perform the empirical Bayes matrix factorization via flashr, using its default settings.
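A sketch of this step using mashr's `cov_flash` interface (the exact arguments may vary across mashr versions):

```r
# compute data-driven covariance matrices from the strong tests
# via flash (empirical Bayes matrix factorization), default settings
U.f = cov_flash(data.strong)
```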
Alternatively, as suggested by Jason Willwerscheid for multi-tissue QTL studies, we can constrain the factors to be non-negative. The `tag` parameter customizes the names in the output list, and `var_type="constant"` is an additional parameter passed to the `flash` function, as suggested in the post above.
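A hedged sketch of the non-negative variant; the `factors="nonneg"` argument, and the pass-through of `var_type` to `flash`, are assumptions based on the description above and may differ in your mashr version:

```r
# constrain the flash factors to be non-negative; "tag" sets the names
# used for these matrices in the returned list, and var_type="constant"
# is passed through to the underlying flash call
U.f.nn = cov_flash(data.strong, factors = "nonneg", tag = "non_neg",
                   var_type = "constant")
```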
Now we fit mash to the random tests using both data-driven and canonical covariances.
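This step can be sketched as follows, assuming the flash-based covariances computed above are stored in the lists `U.f` and `U.f.nn` (names chosen here for illustration):

```r
# canonical covariance matrices, set up from the random subset
U.c = cov_canonical(data.random)
# fit mash to the random tests; outputlevel = 1 skips the posterior
# computations, since at this stage we only need the fitted prior g
m = mash(data.random, Ulist = c(U.f, U.f.nn, U.c), outputlevel = 1)
```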
Now we can compute posterior summaries, etc., for any subset of tests using the mash fit above. Here we do this for the strong tests.
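As in the eQTL analysis vignette, this is done by fixing the prior fitted on the random tests (assumed here to be stored in `m`):

```r
# compute posterior summaries for the strong tests, re-using the
# prior (g) estimated from the random subset
m2 = mash(data.strong, g = get_fitted_g(m), fixg = TRUE)
head(get_lfsr(m2))
```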