Introduction:
Epistasis Test in Meta-Analysis (ETMA) is a statistical method that uses summary data from genetic association studies to detect gene-gene interactions. The etma package provides a main function, ETMA, for detecting epistasis with this method, along with three complete example data sets.
Background:
Conventional genome-wide association studies (GWAS) have proven to be a successful strategy for identifying genetic variants associated with complex human traits. However, there is still a large heritability gap between GWAS and traditional family studies. This ‘missing heritability’ has been attributed in part to a lack of studies focused on epistasis, also called gene–gene interaction, because individual studies have often had insufficient sample sizes. Meta-analysis is a common method for increasing statistical power, but sufficiently detailed information is often difficult to obtain. A previous study employed a meta-regression-based method to detect epistasis, but it faced the challenge of inconsistent estimates. Here, we describe a Markov chain Monte Carlo-based method, called ‘Epistasis Test in Meta-Analysis’ (ETMA), which uses genotype summary data to obtain consistent estimates of epistasis effects in meta-analysis.
Installation:
Open the main R window and enter the following command to install the etma package (this assumes an internet connection and appropriate access rights on the computer):
install.packages("etma")
After installation, enter the following command to load the etma package:
library(etma)
Datasets:
Use the data command to load each data set and the head command to view its first rows, as follows. To analyze your own data, see help(read.table) for details on importing it. Use the help command (for example, help(data.GST)) to view the detailed definitions of the variables.
GSTs family and cancer
data(data.GST)
head(data.GST)
## Study Ethnicity Country Cancer case.GSTM1.0
## 1 Yri 2012 Caucasian Norway Hodgkin lymphoma 111
## 2 Van Hemelrijck 2012 Caucasian Switzerland Prostate cancer 98
## 3 Rudolph 2012 Caucasian German Colorectal cancer 822
## 4 Ramalhinho 2012 Caucasian Portugal Breast cancer 35
## 5 Ovsiannikov 2012 Caucasian Germany Bladder cancer 94
## 6 Oliveira 2012 Mix Brazil Ovarian cancer 84
## ctrl.GSTM1.0 case.GSTM1.1 ctrl.GSTM1.1 case.GSTT1.0 ctrl.GSTT1.0
## 1 567 110 477 189 965
## 2 172 105 188 168 296
## 3 844 932 923 1433 1459
## 4 76 66 45 54 97
## 5 113 102 122 163 188
## 6 90 48 42 93 98
## case.GSTT1.1 ctrl.GSTT1.1
## 1 31 50
## 2 35 64
## 3 313 308
## 4 47 24
## 5 33 47
## 6 39 34
PAH metabolism pathway and oral cancer
data(data.PAH)
head(data.PAH)
## Athour Year Country case.CYP1A1.0 case.CYP1A1.1 ctrl.CYP1A1.0
## 1 Sato 2000 Japan 68 74 90
## 2 Tanimoto 1999 Japan 32 68 62
## 3 Gronau 2003 Germany 55 18 94
## 4 Gatt?s 2006 Brazil 25 13 63
## 5 Cha 2007 USA 20 52 49
## 6 Matthias 1998 UK 110 14 165
## ctrl.CYP1A1.1 case.GSTM1.0 case.GSTM1.1 ctrl.GSTM1.0 ctrl.GSTM1.1
## 1 52 50 92 78 64
## 2 38 57 43 58 42
## 3 35 32 41 63 66
## 4 39 14 24 63 39
## 5 114 35 37 86 123
## 6 28 51 71 83 95
RAS and chronic kidney disease
data(data.RAS)
head(data.RAS)
## Author Year Race Tyep case.ACE.0 case.ACE.1
## 1 Su 2014 Asian combined 792 502
## 2 Shaikh 2014 Caucasian diabetic nephropathy 99 121
## 3 Pawlik 2014 Caucasian glomerulonephritis 126 154
## 4 Chen 2014 Asian combined 314 152
## 5 Zsom 2011 Caucasian combined 266 352
## 6 Huang 2010 Asian glomerulonephritis 49 45
## ctrl.ACE.0 ctrl.ACE.1 case.AGT.0 case.AGT.1 ctrl.AGT.0 ctrl.AGT.1
## 1 859 429 193 1101 230 1058
## 2 107 123 49 171 83 147
## 3 180 194 141 139 179 195
## 4 617 281 73 393 150 748
## 5 198 202 328 290 200 200
## 6 168 72 14 80 40 200
Simple example:
The main function of the etma package is ETMA. It analyzes gene-gene interaction from an n by 8 matrix (n is the number of studies) that holds, for each study, the counts of each SNP1 and SNP2 variant in the case and control groups. The inputs of the ETMA function are therefore: (1) the number of wild type of SNP1 in the case group, (2) the number of mutation type of SNP1 in the case group, (3) the number of wild type of SNP1 in the control group, (4) the number of mutation type of SNP1 in the control group, (5) the number of wild type of SNP2 in the case group, (6) the number of mutation type of SNP2 in the case group, (7) the number of wild type of SNP2 in the control group, and (8) the number of mutation type of SNP2 in the control group.
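As a rough illustration of this input layout, the sketch below rebuilds the first three rows of data.RAS (values copied from the head(data.RAS) output above) as a plain data frame and computes a crude per-study odds ratio for each SNP. The helper crude_or is a hypothetical name introduced here and is not part of etma. These marginal ORs are only a sanity check on the counts; they are not the ETMA interaction estimate.

```r
# First three studies of data.RAS, copied from the head() output above.
ras <- data.frame(
  case.ACE.0 = c(792, 99, 126),  case.ACE.1 = c(502, 121, 154),
  ctrl.ACE.0 = c(859, 107, 180), ctrl.ACE.1 = c(429, 123, 194),
  case.AGT.0 = c(193, 49, 141),  case.AGT.1 = c(1101, 171, 139),
  ctrl.AGT.0 = c(230, 83, 179),  ctrl.AGT.1 = c(1058, 147, 195)
)

# Hypothetical helper (not part of etma): crude OR of mutation vs wild type.
crude_or <- function(case0, case1, ctrl0, ctrl1) {
  (case1 * ctrl0) / (case0 * ctrl1)
}

or_ace <- with(ras, crude_or(case.ACE.0, case.ACE.1, ctrl.ACE.0, ctrl.ACE.1))
or_agt <- with(ras, crude_or(case.AGT.0, case.AGT.1, ctrl.AGT.0, ctrl.AGT.1))

# Check whether the case totals of the two SNPs agree within each study
# (they do for these three rows).
same_cases <- with(ras, case.ACE.0 + case.ACE.1 == case.AGT.0 + case.AGT.1)

print(round(cbind(or_ace, or_agt), 3))
```

Each row is one study and each of the eight columns is one of the counts listed above, in the same order in which they are passed to ETMA.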
Because ETMA is based on MCMC and a two-step iterative process, the main options of the ETMA function are: (1) the maximum number of iterations (default is 20), (2) the chain length used to obtain the study-level parameters in step 1 (default is 20,000), (3) the chain length used to obtain the global-level parameters in step 2 (default is 200,000), and (4) the starting seed of the algorithm (default is a random seed). Users can also choose whether to export the MCMC plots for each iteration.
The main outputs are: (1) the beta values (log ORs) of each SNP and of the interaction term, (2) the variance-covariance matrix of the beta values, and (3) the p matrix from the iteration process. From these outputs, we can calculate the ORs, their confidence intervals, and p values.
Use the ETMA command to analyze gene–gene interaction and save the results to ggint.toy. This toy example uses much shorter chains (iterations.step1=100, iterations.step2=300) than the defaults, so the computing time is only about 3 to 5 seconds.
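The calculation described above can be sketched in base R. The sketch uses the coefficients, standard errors, and degrees of freedom copied from the print(ggint.toy) output shown later in this section, rather than reading them from the fitted object, because the component names of the object returned by ETMA are not shown here. It assumes t-based intervals with the reported df, which appears to be consistent with the printed table.

```r
# Values copied from the print(ggint.toy) output (df = 31).
b  <- c(SNP1 = -0.00458, SNP2 = 0.08809, Interaction = 0.13528)
se <- c(SNP1 =  0.04044, SNP2 = 0.04787, Interaction = 0.06773)
df <- 31

t_value <- b / se
crit    <- qt(0.975, df)   # two-sided 95% critical value of the t distribution

result <- data.frame(
  OR      = exp(b),                      # log OR back to OR
  ci.l    = exp(b - crit * se),          # lower 95% limit
  ci.u    = exp(b + crit * se),          # upper 95% limit
  t.value = t_value,
  p.value = 2 * pt(-abs(t_value), df)    # two-sided p value
)
print(round(result, 4))
```

Exponentiating the interval limits on the log scale keeps the confidence interval for each OR strictly positive.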
ggint.toy=ETMA(case.ACE.0,case.ACE.1,ctrl.ACE.0,ctrl.ACE.1,
case.AGT.0,case.AGT.1,ctrl.AGT.0,ctrl.AGT.1,
data=data.RAS,iterations.step1=100,iterations.step2=300,
start.seed=1,show.detailed.plot=FALSE,show.final.plot=FALSE)
After the analysis, use the print and summary commands to view the results of the gene–gene interaction analysis.
print(ggint.toy)
## Epistasis Test in Meta-Analysis (ETMA)
## A MCMC algorithm for detecting gene-gene interaction in meta-analysis.
##
## This analysis include 34 studies. (df = 31)
##
## b se OR 95%ci.l 95%ci.u t value p value
## SNP1(mutation) -0.00458 0.04044 0.995 0.917 1.081 -0.1131 0.9106
## SNP2(mutation) 0.08809 0.04787 1.092 0.991 1.204 1.8402 0.0753
## Interaction 0.13528 0.06773 1.145 0.997 1.314 1.9974 0.0546
summary(ggint.toy)
## Epistasis Test in Meta-Analysis (ETMA)
## A MCMC algorithm for detecting gene-gene interaction in meta-analysis.
##
## This analysis include 34 studies. (df = 31)
##
## b se OR 95%ci.l 95%ci.u t value p value
## SNP1(mutation) -0.00458 0.04044 0.995 0.917 1.081 -0.1131 0.9106
## SNP2(mutation) 0.08809 0.04787 1.092 0.991 1.204 1.8402 0.0753
## Interaction 0.13528 0.06773 1.145 0.997 1.314 1.9974 0.0546
##
## OR 95%ci.l 95%ci.u t value p value
## SNP1(wild type) & SNP2(mutation) 1.092 0.991 1.204 1.8402 0.0753
## SNP1(mutation) & SNP2(wild type) 0.995 0.917 1.081 -0.1131 0.9106
## SNP1(mutation) & SNP2(mutation) 1.245 1.180 1.313 8.3543 <0.0001
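The second table of the summary output reports an OR for each genotype combination. As a hedged sketch, each row can be read as a linear combination of the three coefficients, with the combination carrying both mutations adding the interaction term. The function name combo_log_or and the weight vectors below are illustrative and are not part of etma. Because the covariances between coefficients are not printed, only the point estimates are reproduced here.

```r
# Coefficients copied from the summary(ggint.toy) output.
b <- c(SNP1 = -0.00458, SNP2 = 0.08809, Interaction = 0.13528)

# Hypothetical helper: log OR of a genotype combination for weights w.
combo_log_or <- function(b, w) sum(w * b)

# Weights: which coefficients apply to each genotype combination.
weights <- list(
  "SNP1(wild type) & SNP2(mutation)" = c(0, 1, 0),
  "SNP1(mutation) & SNP2(wild type)" = c(1, 0, 0),
  "SNP1(mutation) & SNP2(mutation)"  = c(1, 1, 1)
)

combo_or <- sapply(weights, function(w) exp(combo_log_or(b, w)))
print(round(combo_or, 3))

# With the full variance-covariance matrix V of the beta values,
# the standard error of a combination would be sqrt(t(w) %*% V %*% w).
```

This explains why the last row can be clearly significant even though no single coefficient is: its standard error depends on the covariances among the three estimates, not only on their individual standard errors.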
Complete example:
The following are complete examples. They use the default chain lengths of 20,000 in step 1 and 200,000 in step 2. Please note that each of them takes at least about 15 minutes to run, and one takes about 3 hours. The full chain lengths are necessary for real data analysis, so please use the default settings, as in the following calls, to analyze your own data.
GSTs family and cancer (note: the computing time for this example is about 3 h):
ggint1=ETMA(case.GSTM1.0,case.GSTM1.1,ctrl.GSTM1.0,ctrl.GSTM1.1,
case.GSTT1.0,case.GSTT1.1,ctrl.GSTT1.0,ctrl.GSTT1.1,
data=data.GST,start.seed=1,show.detailed.plot=TRUE,show.final.plot=TRUE)
print(ggint1)
summary(ggint1)
PAH metabolism pathway and oral cancer (note: the computing time for this example is about 15 min):
ggint2=ETMA(case.CYP1A1.0,case.CYP1A1.1,ctrl.CYP1A1.0,ctrl.CYP1A1.1,
case.GSTM1.0,case.GSTM1.1,ctrl.GSTM1.0,ctrl.GSTM1.1,
data=data.PAH,start.seed=1,show.detailed.plot=TRUE,show.final.plot=TRUE)
print(ggint2)
summary(ggint2)
RAS and chronic kidney disease (note: the computing time for this example is about 15 min):
ggint3=ETMA(case.ACE.0,case.ACE.1,ctrl.ACE.0,ctrl.ACE.1,
case.AGT.0,case.AGT.1,ctrl.AGT.0,ctrl.AGT.1,
data=data.RAS,start.seed=1,show.detailed.plot=TRUE,show.final.plot=TRUE)
print(ggint3)
summary(ggint3)