Demonstration of package heritEWAS

DNA methylation can be thought of as a type of “mark” on DNA that can affect gene expression. Most methylation marks are erased soon after conception but methylation is known to be effectively inherited from parents to their offspring in rare cases. The heritEWAS package provides functions to identify these heritable DNA methylation marks using the method of (Joo et al., 2018). This approach looks for Mendelian patterns of inheritance within families and is based on the relationship structure of each family and on methylation data for a subset of the family members (but no genotype data). This method can handle large, multi-generational families and it is optimised for methylation data at thousands or millions of methylation sites, such as the data generated by epigenome-wide association studies (EWAS) of related individuals.

install.packages("heritEWAS")

library(heritEWAS)

To use this package, you will need two sets of data (described in more detail in the help page for the function genotype_combinations):

The pedigree data should look something like the following (where extra variables like aff and age can be included but they will be ignored):

head(ped)
#>   family  indiv mother father sex aff age typed
#> 1    aey aey001   <NA>   <NA>   M   0  28     0
#> 2    aey aey002 aey005 aey006   F   0  41     1
#> 3    aey aey003 aey002 aey001   F   0  70     1
#> 4    aey aey004 aey002 aey001   F   0  12     0
#> 5    aey aey005   <NA>   <NA>   F   0  18     1
#> 6    aey aey006 aey008 aey009   M   0  13     1
unique(ped$family)
#>  [1] "aey" "awz" "beu" "ety" "ibz" "kph" "ljd" "mps" "msn" "msp" "nmc" "nqc"
#> [13] "opq" "qor" "rpb" "rvn" "thv" "wkt" "yiu" "yvf"

# Colnames are the individual IDs of the pedigree data
M_values[1:5, 1:5]
#>                aey002     aey003     aey005     aey006     aey007
#> cg18584561 -3.8530708 -4.8507816 -3.9702189 -4.5818866 -4.2766219
#> cg01074083 -2.3250118 -3.3565441 -0.7297221  1.3038464 -0.1940888
#> cg27639199 -4.0406730 -4.0770266 -3.6542961 -3.8844059 -3.6013861
#> cg26773954 -0.7413546 -1.1655354  0.9955952 -1.0797262 -0.8736215
#> cg04417708 -0.2233104 -0.2838067  1.7158682 -0.3687787 -0.4975771

The main goal of the package heritEWAS is to calculate a statistic \(\Delta l\) for each methylation site. This statistic measures the strength of evidence that the site’s M-values follow a Mendelian pattern of inheritance within the families, with larger values of \(\Delta l\) corresponding to more heritable methylation sites. This statistic is the difference in maximised log-likelihoods of two statistical models, and can be interpreted as a difference in the Bayesian information criteria of the two models; see (Joo et al. 2018).

The most time-consuming part of the calculation of \(\Delta l\) is the same for all methylation sites, so the heritEWAS package calculates this part once, stores the output, then re-uses this calculation for each methylation site.
This part of the calculation is performed by the function genotype_combinations():

typed_genos <- genotype_combinations(ped)
#> Family 1 (aey): Number of genotype combinations to check: 104 
#> Family 2 (awz): Number of genotype combinations to check: 16 
#> Family 3 (beu): Number of genotype combinations to check: 96 
#> Family 4 (ety): Number of genotype combinations to check: 66 
#> Family 5 (ibz): Number of genotype combinations to check: 192 
#> Family 6 (kph): Number of genotype combinations to check: 64 
#> Family 7 (ljd): Number of genotype combinations to check: 16 
#> Family 8 (mps): Number of genotype combinations to check: 68 
#> Family 9 (msn): Number of genotype combinations to check: 66 
#> Family 10 (msp): Number of genotype combinations to check: 132 
#> Family 11 (nmc): Number of genotype combinations to check: 7 
#> Family 12 (nqc): Number of genotype combinations to check: 64 
#> Family 13 (opq): Number of genotype combinations to check: 256 
#> Family 14 (qor): Number of genotype combinations to check: 33 
#> Family 15 (rpb): Number of genotype combinations to check: 68 
#> Family 16 (rvn): Number of genotype combinations to check: 16 
#> Family 17 (thv): Number of genotype combinations to check: 16 
#> Family 18 (wkt): Number of genotype combinations to check: 104 
#> Family 19 (yiu): Number of genotype combinations to check: 68 
#> Family 20 (yvf): Number of genotype combinations to check: 192

The results are stored in a named list of data frames, with one data frame per family. Each data frame gives the probability of each possible combination of genotypes for those family members with methylation data (i.e. those with typed = 1). The possible genotypes for each person are 0 and 1, corresponding to non-carriers and carriers (respectively) of a hypothetical genetic variant that controls methylation at a given methylation site under one of the two statistical models used to define \(\Delta l\). Impossible genotype combinations (those with a probability of 0) are excluded from the output of genotype_combinations().

str(typed_genos)
#> List of 20
#>  $ aey:'data.frame': 36 obs. of  9 variables:
#>   ..$ p     : num [1:36] 0.93458 0.00935 0.00467 0.00467 0.00234 ...
#>   ..$ aey002: num [1:36] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ aey003: num [1:36] 0 1 0 0 0 0 1 1 0 0 ...
#>   ..$ aey005: num [1:36] 0 0 1 1 1 1 1 1 0 0 ...
#>   ..$ aey006: num [1:36] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ aey007: num [1:36] 0 0 0 1 0 1 0 1 0 0 ...
#>   ..$ aey008: num [1:36] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ aey009: num [1:36] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ aey012: num [1:36] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ awz:'data.frame': 13 obs. of  5 variables:
#>   ..$ p     : num [1:13] 0.95714 0.00476 0.00476 0.00476 0.00952 ...
#>   ..$ awz002: int [1:13] 0 0 0 0 0 1 1 1 1 1 ...
#>   ..$ awz007: int [1:13] 0 0 0 0 1 0 0 0 0 1 ...
#>   ..$ awz011: int [1:13] 0 0 1 1 0 0 0 1 1 0 ...
#>   ..$ awz013: int [1:13] 0 1 0 1 0 0 1 0 1 0 ...
#>  $ beu:'data.frame': 35 obs. of  8 variables:
#>   ..$ p     : num [1:35] 0.93925 0.00935 0.00467 0.00234 0.00234 ...
#>   ..$ beu001: num [1:35] 0 0 0 0 0 0 0 1 1 1 ...
#>   ..$ beu002: num [1:35] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ beu003: num [1:35] 0 0 0 0 0 0 0 0 0 1 ...
#>   ..$ beu004: num [1:35] 0 0 0 0 0 0 0 0 1 0 ...
#>   ..$ beu006: num [1:35] 0 0 0 1 1 1 1 0 0 0 ...
#>   ..$ beu007: num [1:35] 0 0 1 0 0 1 1 0 0 0 ...
#>   ..$ beu010: num [1:35] 0 1 0 0 1 0 1 0 0 0 ...
#>  $ ety:'data.frame': 51 obs. of  8 variables:
#>   ..$ p     : num [1:51] 0.940421 0.004673 0.001168 0.000876 0.000292 ...
#>   ..$ ety003: num [1:51] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ ety004: num [1:51] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ ety005: num [1:51] 0 0 0 0 0 0 0 1 1 1 ...
#>   ..$ ety006: num [1:51] 0 0 0 1 1 1 1 0 0 1 ...
#>   ..$ ety007: num [1:51] 0 0 1 0 0 1 1 0 1 0 ...
#>   ..$ ety011: num [1:51] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ ety013: num [1:51] 0 1 0 0 1 0 1 0 0 0 ...
#>  $ ibz:'data.frame': 99 obs. of  9 variables:
#>   ..$ p     : num [1:99] 0.943925 0.009346 0.009346 0.000876 0.000292 ...
#>   ..$ ibz003: num [1:99] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ ibz004: num [1:99] 0 1 0 0 0 0 0 0 0 0 ...
#>   ..$ ibz006: num [1:99] 0 0 0 1 1 1 1 1 1 1 ...
#>   ..$ ibz007: num [1:99] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ ibz008: num [1:99] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ ibz009: num [1:99] 0 0 0 0 0 0 0 1 1 1 ...
#>   ..$ ibz011: num [1:99] 0 0 0 0 0 1 1 0 0 1 ...
#>   ..$ ibz016: num [1:99] 0 0 1 0 1 0 1 0 1 0 ...
#>  $ kph:'data.frame': 64 obs. of  7 variables:
#>   ..$ p     : num [1:64] 0.94911 0.00506 0.00506 0.00506 0.00149 ...
#>   ..$ kph001: int [1:64] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ kph003: int [1:64] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ kph004: int [1:64] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ kph005: int [1:64] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ kph009: int [1:64] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ kph011: int [1:64] 0 1 0 1 0 1 0 1 0 1 ...
#>  $ ljd:'data.frame': 13 obs. of  5 variables:
#>   ..$ p     : num [1:13] 0.95714 0.00476 0.00476 0.00476 0.00238 ...
#>   ..$ ljd004: int [1:13] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ ljd005: int [1:13] 0 0 0 0 1 1 1 1 0 1 ...
#>   ..$ ljd009: int [1:13] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ ljd010: int [1:13] 0 1 0 1 0 1 0 1 0 0 ...
#>  $ mps:'data.frame': 68 obs. of  8 variables:
#>   ..$ p     : num [1:68] 0.936186 0.009492 0.000438 0.000146 0.001606 ...
#>   ..$ mps001: num [1:68] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ mps003: num [1:68] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ mps005: num [1:68] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ mps006: num [1:68] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ mps008: num [1:68] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ mps011: num [1:68] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ mps016: num [1:68] 0 1 0 1 0 1 0 1 0 1 ...
#>  $ msn:'data.frame': 50 obs. of  8 variables:
#>   ..$ p     : num [1:50] 0.936332 0.00993 0.000876 0.000876 0.000292 ...
#>   ..$ msn001: num [1:50] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ msn003: num [1:50] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ msn005: num [1:50] 0 0 0 0 0 0 1 1 1 1 ...
#>   ..$ msn007: num [1:50] 0 0 1 1 1 1 0 0 1 1 ...
#>   ..$ msn010: num [1:50] 0 1 0 0 1 1 0 1 0 0 ...
#>   ..$ msn011: num [1:50] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ msn013: num [1:50] 0 0 0 1 0 1 0 0 0 1 ...
#>  $ msp:'data.frame': 132 obs. of  9 variables:
#>   ..$ p     : num [1:132] 0.931732 0.000365 0.000365 0.000365 0.001825 ...
#>   ..$ msp001: num [1:132] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ msp004: num [1:132] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ msp008: num [1:132] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ msp009: num [1:132] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ msp010: num [1:132] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ msp011: num [1:132] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ msp013: int [1:132] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ msp014: int [1:132] 0 1 0 1 0 1 0 1 0 1 ...
#>  $ nmc:'data.frame': 6 obs. of  5 variables:
#>   ..$ p     : num [1:6] 0.94286 0.00952 0.00952 0.00952 0.00952 ...
#>   ..$ nmc001: num [1:6] 0 0 1 0 1 0
#>   ..$ nmc002: num [1:6] 0 1 1 0 0 0
#>   ..$ nmc003: num [1:6] 0 0 0 1 1 0
#>   ..$ nmc010: num [1:6] 0 0 0 0 0 1
#>  $ nqc:'data.frame': 49 obs. of  7 variables:
#>   ..$ p     : num [1:49] 0.94217 0.00292 0.00292 0.00292 0.00292 ...
#>   ..$ nqc004: int [1:49] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ nqc005: int [1:49] 0 0 0 0 0 0 0 0 0 1 ...
#>   ..$ nqc008: int [1:49] 0 0 0 0 0 0 0 0 1 0 ...
#>   ..$ nqc009: int [1:49] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ nqc012: int [1:49] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ nqc013: int [1:49] 0 1 0 1 0 1 0 1 0 0 ...
#>  $ opq:'data.frame': 151 obs. of  9 variables:
#>   ..$ p     : num [1:151] 0.93925 0.00117 0.00117 0.00117 0.00117 ...
#>   ..$ opq002: int [1:151] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ opq004: int [1:151] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ opq005: int [1:151] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ opq008: int [1:151] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ opq011: int [1:151] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ opq013: int [1:151] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ opq014: int [1:151] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ opq015: int [1:151] 0 1 0 1 0 1 0 1 0 1 ...
#>  $ qor:'data.frame': 26 obs. of  7 variables:
#>   ..$ p     : num [1:26] 0.92775 0.00573 0.00573 0.00573 0.00573 ...
#>   ..$ qor002: num [1:26] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ qor005: num [1:26] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ qor010: num [1:26] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ qor011: num [1:26] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ qor014: num [1:26] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ qor017: num [1:26] 0 1 0 1 0 1 0 1 0 1 ...
#>  $ rpb:'data.frame': 44 obs. of  8 variables:
#>   ..$ p     : num [1:44] 0.945238 0.000595 0.000595 0.000595 0.000595 ...
#>   ..$ rpb001: num [1:44] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ rpb003: num [1:44] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ rpb004: num [1:44] 0 0 0 0 0 1 1 1 1 1 ...
#>   ..$ rpb006: num [1:44] 0 1 1 1 1 0 1 1 1 1 ...
#>   ..$ rpb007: num [1:44] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ rpb009: int [1:44] 0 0 0 1 1 0 0 0 1 1 ...
#>   ..$ rpb014: int [1:44] 0 0 1 0 1 0 0 1 0 1 ...
#>  $ rvn:'data.frame': 13 obs. of  5 variables:
#>   ..$ p     : num [1:13] 0.95714 0.00476 0.00476 0.00476 0.00952 ...
#>   ..$ rvn005: int [1:13] 0 0 0 0 0 1 1 1 1 1 ...
#>   ..$ rvn007: int [1:13] 0 0 0 0 1 0 0 0 0 1 ...
#>   ..$ rvn009: int [1:13] 0 0 1 1 0 0 0 1 1 0 ...
#>   ..$ rvn013: int [1:13] 0 1 0 1 0 0 1 0 1 0 ...
#>  $ thv:'data.frame': 16 obs. of  5 variables:
#>   ..$ p     : num [1:16] 0.95595 0.00357 0.00357 0.00357 0.00357 ...
#>   ..$ thv005: int [1:16] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ thv006: int [1:16] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ thv011: int [1:16] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ thv012: int [1:16] 0 1 0 1 0 1 0 1 0 1 ...
#>  $ wkt:'data.frame': 45 obs. of  9 variables:
#>   ..$ p     : num [1:45] 0.942857 0.004762 0.004762 0.000595 0.000595 ...
#>   ..$ wkt004: num [1:45] 0 1 1 1 1 1 1 1 1 1 ...
#>   ..$ wkt007: num [1:45] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ wkt008: num [1:45] 0 0 0 1 1 1 1 1 1 1 ...
#>   ..$ wkt011: num [1:45] 0 0 1 0 0 0 0 0 0 0 ...
#>   ..$ wkt012: num [1:45] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ wkt013: num [1:45] 0 0 0 0 0 0 0 1 1 1 ...
#>   ..$ wkt014: num [1:45] 0 0 0 0 0 1 1 0 0 1 ...
#>   ..$ wkt015: num [1:45] 0 0 0 0 1 0 1 0 1 0 ...
#>  $ yiu:'data.frame': 47 obs. of  8 variables:
#>   ..$ p     : num [1:47] 0.93575 0.00117 0.00117 0.00117 0.00584 ...
#>   ..$ yiu001: num [1:47] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ yiu003: num [1:47] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ yiu005: num [1:47] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ yiu008: num [1:47] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ yiu011: num [1:47] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ yiu014: int [1:47] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ yiu018: int [1:47] 0 1 0 1 0 1 0 1 0 1 ...
#>  $ yvf:'data.frame': 63 obs. of  9 variables:
#>   ..$ p     : num [1:63] 0.93224 0.00234 0.00234 0.00234 0.00234 ...
#>   ..$ yvf001: num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ yvf004: num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ yvf006: num [1:63] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ yvf007: num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ yvf008: num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ yvf011: int [1:63] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ yvf013: int [1:63] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ yvf014: int [1:63] 0 1 0 1 0 1 0 1 0 1 ...

Given the genotype probabilities, we can use the package’s main function ML_estimates() to compute \(\Delta l\) for each site. The output is a data frame with rows corresponding to methylation sites and columns giving details about certain fitted models (see the help page of the function ML_estimates for more details). In particular, the column delta.l gives the statistic \(\Delta l\), which measures how heritable each methylation site is.

Demonstration of package heritEWAS

References