DNA methylation can be thought of as a type of “mark” on DNA that can affect gene expression. Most methylation marks are erased soon after conception but methylation is known to be effectively inherited from parents to their offspring in rare cases. The heritEWAS
package provides functions to identify these heritable DNA methylation marks using the method of (Joo et al., 2018). This approach looks for Mendelian patterns of inheritance within families and is based on the relationship structure of each family and on methylation data for a subset of the family members (but no genotype data). This method can handle large, multi-generational families and it is optimised for methylation data at thousands or millions of methylation sites, such as the data generated by epigenome-wide association studies (EWAS) of related individuals.
You can install the released version of heritEWAS from CRAN with:
To use this package, you will need two sets of data (described in more detail in the help page for the function genotype_combinations
):
A data frame containing the pedigree data, i.e. the relationship structure of the families. Each row of the data frame corresponds to a person, and the columns correspond to each person’s individual identifier (indiv
) and the identifiers of his or her mother (mother
) and father (father
), as well as a family identifier (family
) and a binary flag (typed
) which is 1
for people who have methylation data available. No family should contain a pedigree loop, such as one caused by inbreeding.
A matrix of the M-values, with rows corresponding to methylation sites (i.e. CpG probes) and columns corresponding to people. The column names should match the individual identifiers of people in the pedigree data with typed = 1
.
The pedigree data should look something like the following (where extra variables like aff
and age
can be included but they will be ignored):
head(ped)
#> family indiv mother father sex aff age typed
#> 1 aey aey001 <NA> <NA> M 0 28 0
#> 2 aey aey002 aey005 aey006 F 0 41 1
#> 3 aey aey003 aey002 aey001 F 0 70 1
#> 4 aey aey004 aey002 aey001 F 0 12 0
#> 5 aey aey005 <NA> <NA> F 0 18 1
#> 6 aey aey006 aey008 aey009 M 0 13 1
unique(ped$family)
#> [1] "aey" "awz" "beu" "ety" "ibz" "kph" "ljd" "mps" "msn" "msp" "nmc" "nqc"
#> [13] "opq" "qor" "rpb" "rvn" "thv" "wkt" "yiu" "yvf"
And the M-values matrix should look something like:
# Colnames are the individual IDs of the pedigree data
M_values[1:5, 1:5]
#> aey002 aey003 aey005 aey006 aey007
#> cg18584561 -3.8530708 -4.8507816 -3.9702189 -4.5818866 -4.2766219
#> cg01074083 -2.3250118 -3.3565441 -0.7297221 1.3038464 -0.1940888
#> cg27639199 -4.0406730 -4.0770266 -3.6542961 -3.8844059 -3.6013861
#> cg26773954 -0.7413546 -1.1655354 0.9955952 -1.0797262 -0.8736215
#> cg04417708 -0.2233104 -0.2838067 1.7158682 -0.3687787 -0.4975771
The main goal of the package heritEWAS
is to calculate a statistic \(\Delta l\) for each methylation site. This statistic measures the strength of evidence that the site’s M-values follow a Mendelian pattern of inheritance within the families, with larger values of \(\Delta l\) corresponding to more heritable methylation sites. This statistic is the difference in maximised log-likelihoods of two statistical models, and can be interpreted as a difference in the Bayesian information criteria of the two models; see (Joo et al. 2018).
The most time-consuming part of the calculation of \(\Delta l\) is the same for all methylation sites, so the heritEWAS
package calculates this part once, stores the output, then re-uses this calculation for each methylation site.
This part of the calculation is performed by the function genotype_combinations()
:
typed_genos <- genotype_combinations(ped)
#> Family 1 (aey): Number of genotype combinations to check: 104
#> Family 2 (awz): Number of genotype combinations to check: 16
#> Family 3 (beu): Number of genotype combinations to check: 96
#> Family 4 (ety): Number of genotype combinations to check: 66
#> Family 5 (ibz): Number of genotype combinations to check: 192
#> Family 6 (kph): Number of genotype combinations to check: 64
#> Family 7 (ljd): Number of genotype combinations to check: 16
#> Family 8 (mps): Number of genotype combinations to check: 68
#> Family 9 (msn): Number of genotype combinations to check: 66
#> Family 10 (msp): Number of genotype combinations to check: 132
#> Family 11 (nmc): Number of genotype combinations to check: 7
#> Family 12 (nqc): Number of genotype combinations to check: 64
#> Family 13 (opq): Number of genotype combinations to check: 256
#> Family 14 (qor): Number of genotype combinations to check: 33
#> Family 15 (rpb): Number of genotype combinations to check: 68
#> Family 16 (rvn): Number of genotype combinations to check: 16
#> Family 17 (thv): Number of genotype combinations to check: 16
#> Family 18 (wkt): Number of genotype combinations to check: 104
#> Family 19 (yiu): Number of genotype combinations to check: 68
#> Family 20 (yvf): Number of genotype combinations to check: 192
The results are stored in a named list of data frames, with one data frame per family. Each data frame gives the probability of each possible combination of genotypes for those family members with methylation data (i.e. those with typed = 1
). The possible genotypes for each person are 0
and 1
, corresponding to non-carriers and carriers (respectively) of a hypothetical genetic variant that controls methylation at a given methylation site under one of the two statistical models used to define \(\Delta l\). Impossible genotype combinations (those with a probability of 0
) are excluded from the output of genotype_combinations()
.
str(typed_genos)
#> List of 20
#> $ aey:'data.frame': 36 obs. of 9 variables:
#> ..$ p : num [1:36] 0.93458 0.00935 0.00467 0.00467 0.00234 ...
#> ..$ aey002: num [1:36] 0 0 0 0 1 1 1 1 0 0 ...
#> ..$ aey003: num [1:36] 0 1 0 0 0 0 1 1 0 0 ...
#> ..$ aey005: num [1:36] 0 0 1 1 1 1 1 1 0 0 ...
#> ..$ aey006: num [1:36] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ aey007: num [1:36] 0 0 0 1 0 1 0 1 0 0 ...
#> ..$ aey008: num [1:36] 0 0 0 0 0 0 0 0 1 1 ...
#> ..$ aey009: num [1:36] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ aey012: num [1:36] 0 0 0 0 0 0 0 0 0 1 ...
#> $ awz:'data.frame': 13 obs. of 5 variables:
#> ..$ p : num [1:13] 0.95714 0.00476 0.00476 0.00476 0.00952 ...
#> ..$ awz002: int [1:13] 0 0 0 0 0 1 1 1 1 1 ...
#> ..$ awz007: int [1:13] 0 0 0 0 1 0 0 0 0 1 ...
#> ..$ awz011: int [1:13] 0 0 1 1 0 0 0 1 1 0 ...
#> ..$ awz013: int [1:13] 0 1 0 1 0 0 1 0 1 0 ...
#> $ beu:'data.frame': 35 obs. of 8 variables:
#> ..$ p : num [1:35] 0.93925 0.00935 0.00467 0.00234 0.00234 ...
#> ..$ beu001: num [1:35] 0 0 0 0 0 0 0 1 1 1 ...
#> ..$ beu002: num [1:35] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ beu003: num [1:35] 0 0 0 0 0 0 0 0 0 1 ...
#> ..$ beu004: num [1:35] 0 0 0 0 0 0 0 0 1 0 ...
#> ..$ beu006: num [1:35] 0 0 0 1 1 1 1 0 0 0 ...
#> ..$ beu007: num [1:35] 0 0 1 0 0 1 1 0 0 0 ...
#> ..$ beu010: num [1:35] 0 1 0 0 1 0 1 0 0 0 ...
#> $ ety:'data.frame': 51 obs. of 8 variables:
#> ..$ p : num [1:51] 0.940421 0.004673 0.001168 0.000876 0.000292 ...
#> ..$ ety003: num [1:51] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ ety004: num [1:51] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ ety005: num [1:51] 0 0 0 0 0 0 0 1 1 1 ...
#> ..$ ety006: num [1:51] 0 0 0 1 1 1 1 0 0 1 ...
#> ..$ ety007: num [1:51] 0 0 1 0 0 1 1 0 1 0 ...
#> ..$ ety011: num [1:51] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ ety013: num [1:51] 0 1 0 0 1 0 1 0 0 0 ...
#> $ ibz:'data.frame': 99 obs. of 9 variables:
#> ..$ p : num [1:99] 0.943925 0.009346 0.009346 0.000876 0.000292 ...
#> ..$ ibz003: num [1:99] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ ibz004: num [1:99] 0 1 0 0 0 0 0 0 0 0 ...
#> ..$ ibz006: num [1:99] 0 0 0 1 1 1 1 1 1 1 ...
#> ..$ ibz007: num [1:99] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ ibz008: num [1:99] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ ibz009: num [1:99] 0 0 0 0 0 0 0 1 1 1 ...
#> ..$ ibz011: num [1:99] 0 0 0 0 0 1 1 0 0 1 ...
#> ..$ ibz016: num [1:99] 0 0 1 0 1 0 1 0 1 0 ...
#> $ kph:'data.frame': 64 obs. of 7 variables:
#> ..$ p : num [1:64] 0.94911 0.00506 0.00506 0.00506 0.00149 ...
#> ..$ kph001: int [1:64] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ kph003: int [1:64] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ kph004: int [1:64] 0 0 0 0 0 0 0 0 1 1 ...
#> ..$ kph005: int [1:64] 0 0 0 0 1 1 1 1 0 0 ...
#> ..$ kph009: int [1:64] 0 0 1 1 0 0 1 1 0 0 ...
#> ..$ kph011: int [1:64] 0 1 0 1 0 1 0 1 0 1 ...
#> $ ljd:'data.frame': 13 obs. of 5 variables:
#> ..$ p : num [1:13] 0.95714 0.00476 0.00476 0.00476 0.00238 ...
#> ..$ ljd004: int [1:13] 0 0 0 0 0 0 0 0 1 1 ...
#> ..$ ljd005: int [1:13] 0 0 0 0 1 1 1 1 0 1 ...
#> ..$ ljd009: int [1:13] 0 0 1 1 0 0 1 1 0 0 ...
#> ..$ ljd010: int [1:13] 0 1 0 1 0 1 0 1 0 0 ...
#> $ mps:'data.frame': 68 obs. of 8 variables:
#> ..$ p : num [1:68] 0.936186 0.009492 0.000438 0.000146 0.001606 ...
#> ..$ mps001: num [1:68] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ mps003: num [1:68] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ mps005: num [1:68] 0 0 0 0 0 0 0 0 1 1 ...
#> ..$ mps006: num [1:68] 0 0 0 0 1 1 1 1 0 0 ...
#> ..$ mps008: num [1:68] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ mps011: num [1:68] 0 0 1 1 0 0 1 1 0 0 ...
#> ..$ mps016: num [1:68] 0 1 0 1 0 1 0 1 0 1 ...
#> $ msn:'data.frame': 50 obs. of 8 variables:
#> ..$ p : num [1:50] 0.936332 0.00993 0.000876 0.000876 0.000292 ...
#> ..$ msn001: num [1:50] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ msn003: num [1:50] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ msn005: num [1:50] 0 0 0 0 0 0 1 1 1 1 ...
#> ..$ msn007: num [1:50] 0 0 1 1 1 1 0 0 1 1 ...
#> ..$ msn010: num [1:50] 0 1 0 0 1 1 0 1 0 0 ...
#> ..$ msn011: num [1:50] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ msn013: num [1:50] 0 0 0 1 0 1 0 0 0 1 ...
#> $ msp:'data.frame': 132 obs. of 9 variables:
#> ..$ p : num [1:132] 0.931732 0.000365 0.000365 0.000365 0.001825 ...
#> ..$ msp001: num [1:132] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ msp004: num [1:132] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ msp008: num [1:132] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ msp009: num [1:132] 0 0 0 0 0 0 0 0 1 1 ...
#> ..$ msp010: num [1:132] 0 0 0 0 1 1 1 1 0 0 ...
#> ..$ msp011: num [1:132] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ msp013: int [1:132] 0 0 1 1 0 0 1 1 0 0 ...
#> ..$ msp014: int [1:132] 0 1 0 1 0 1 0 1 0 1 ...
#> $ nmc:'data.frame': 6 obs. of 5 variables:
#> ..$ p : num [1:6] 0.94286 0.00952 0.00952 0.00952 0.00952 ...
#> ..$ nmc001: num [1:6] 0 0 1 0 1 0
#> ..$ nmc002: num [1:6] 0 1 1 0 0 0
#> ..$ nmc003: num [1:6] 0 0 0 1 1 0
#> ..$ nmc010: num [1:6] 0 0 0 0 0 1
#> $ nqc:'data.frame': 49 obs. of 7 variables:
#> ..$ p : num [1:49] 0.94217 0.00292 0.00292 0.00292 0.00292 ...
#> ..$ nqc004: int [1:49] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ nqc005: int [1:49] 0 0 0 0 0 0 0 0 0 1 ...
#> ..$ nqc008: int [1:49] 0 0 0 0 0 0 0 0 1 0 ...
#> ..$ nqc009: int [1:49] 0 0 0 0 1 1 1 1 0 0 ...
#> ..$ nqc012: int [1:49] 0 0 1 1 0 0 1 1 0 0 ...
#> ..$ nqc013: int [1:49] 0 1 0 1 0 1 0 1 0 0 ...
#> $ opq:'data.frame': 151 obs. of 9 variables:
#> ..$ p : num [1:151] 0.93925 0.00117 0.00117 0.00117 0.00117 ...
#> ..$ opq002: int [1:151] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ opq004: int [1:151] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ opq005: int [1:151] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ opq008: int [1:151] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ opq011: int [1:151] 0 0 0 0 0 0 0 0 1 1 ...
#> ..$ opq013: int [1:151] 0 0 0 0 1 1 1 1 0 0 ...
#> ..$ opq014: int [1:151] 0 0 1 1 0 0 1 1 0 0 ...
#> ..$ opq015: int [1:151] 0 1 0 1 0 1 0 1 0 1 ...
#> $ qor:'data.frame': 26 obs. of 7 variables:
#> ..$ p : num [1:26] 0.92775 0.00573 0.00573 0.00573 0.00573 ...
#> ..$ qor002: num [1:26] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ qor005: num [1:26] 0 0 0 0 0 0 0 0 1 1 ...
#> ..$ qor010: num [1:26] 0 0 0 0 1 1 1 1 0 0 ...
#> ..$ qor011: num [1:26] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ qor014: num [1:26] 0 0 1 1 0 0 1 1 0 0 ...
#> ..$ qor017: num [1:26] 0 1 0 1 0 1 0 1 0 1 ...
#> $ rpb:'data.frame': 44 obs. of 8 variables:
#> ..$ p : num [1:44] 0.945238 0.000595 0.000595 0.000595 0.000595 ...
#> ..$ rpb001: num [1:44] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ rpb003: num [1:44] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ rpb004: num [1:44] 0 0 0 0 0 1 1 1 1 1 ...
#> ..$ rpb006: num [1:44] 0 1 1 1 1 0 1 1 1 1 ...
#> ..$ rpb007: num [1:44] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ rpb009: int [1:44] 0 0 0 1 1 0 0 0 1 1 ...
#> ..$ rpb014: int [1:44] 0 0 1 0 1 0 0 1 0 1 ...
#> $ rvn:'data.frame': 13 obs. of 5 variables:
#> ..$ p : num [1:13] 0.95714 0.00476 0.00476 0.00476 0.00952 ...
#> ..$ rvn005: int [1:13] 0 0 0 0 0 1 1 1 1 1 ...
#> ..$ rvn007: int [1:13] 0 0 0 0 1 0 0 0 0 1 ...
#> ..$ rvn009: int [1:13] 0 0 1 1 0 0 0 1 1 0 ...
#> ..$ rvn013: int [1:13] 0 1 0 1 0 0 1 0 1 0 ...
#> $ thv:'data.frame': 16 obs. of 5 variables:
#> ..$ p : num [1:16] 0.95595 0.00357 0.00357 0.00357 0.00357 ...
#> ..$ thv005: int [1:16] 0 0 0 0 0 0 0 0 1 1 ...
#> ..$ thv006: int [1:16] 0 0 0 0 1 1 1 1 0 0 ...
#> ..$ thv011: int [1:16] 0 0 1 1 0 0 1 1 0 0 ...
#> ..$ thv012: int [1:16] 0 1 0 1 0 1 0 1 0 1 ...
#> $ wkt:'data.frame': 45 obs. of 9 variables:
#> ..$ p : num [1:45] 0.942857 0.004762 0.004762 0.000595 0.000595 ...
#> ..$ wkt004: num [1:45] 0 1 1 1 1 1 1 1 1 1 ...
#> ..$ wkt007: num [1:45] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ wkt008: num [1:45] 0 0 0 1 1 1 1 1 1 1 ...
#> ..$ wkt011: num [1:45] 0 0 1 0 0 0 0 0 0 0 ...
#> ..$ wkt012: num [1:45] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ wkt013: num [1:45] 0 0 0 0 0 0 0 1 1 1 ...
#> ..$ wkt014: num [1:45] 0 0 0 0 0 1 1 0 0 1 ...
#> ..$ wkt015: num [1:45] 0 0 0 0 1 0 1 0 1 0 ...
#> $ yiu:'data.frame': 47 obs. of 8 variables:
#> ..$ p : num [1:47] 0.93575 0.00117 0.00117 0.00117 0.00584 ...
#> ..$ yiu001: num [1:47] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ yiu003: num [1:47] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ yiu005: num [1:47] 0 0 0 0 0 0 0 0 1 1 ...
#> ..$ yiu008: num [1:47] 0 0 0 0 1 1 1 1 0 0 ...
#> ..$ yiu011: num [1:47] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ yiu014: int [1:47] 0 0 1 1 0 0 1 1 0 0 ...
#> ..$ yiu018: int [1:47] 0 1 0 1 0 1 0 1 0 1 ...
#> $ yvf:'data.frame': 63 obs. of 9 variables:
#> ..$ p : num [1:63] 0.93224 0.00234 0.00234 0.00234 0.00234 ...
#> ..$ yvf001: num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ yvf004: num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ yvf006: num [1:63] 0 0 0 0 0 0 0 0 1 1 ...
#> ..$ yvf007: num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ yvf008: num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ yvf011: int [1:63] 0 0 0 0 1 1 1 1 0 0 ...
#> ..$ yvf013: int [1:63] 0 0 1 1 0 0 1 1 0 0 ...
#> ..$ yvf014: int [1:63] 0 1 0 1 0 1 0 1 0 1 ...
Given the genotype probabilities, we can use the package’s main function ML_estimates()
to compute \(\Delta l\) for each site. The output is a data frame with rows corresponding to methylation sites and columns giving details about certain fitted models (see the help page of the function ML_estimates
for more details). In particular, the column delta.l
gives the statistic \(\Delta l\), which measures how heritable each methylation site is.
MLEs <- ML_estimates(typed_genos, M_values, ncores = 2)
#> Loading data
#> Computing log-likelihood for the standard mixture model
#> Computing log-likelihood for the Mendelian model
head(MLEs)
#> methylation.site delta.l ll.null ll.mix ll.mendel mu.null
#> 3 cg27639199 118.09775 -271.71287 -264.2143 -146.1166 -2.6183850
#> 1 cg18584561 90.00927 -237.56529 -231.8907 -141.8814 -3.2970715
#> 5 cg04417708 54.40022 -249.14781 -241.2967 -186.8965 0.5123842
#> 486 cg20592836 49.01353 -287.01184 -275.2788 -226.2653 -2.2974237
#> 380 cg06536614 47.62643 48.11356 103.9804 151.6069 -2.8489881
#> 6 cg05865327 26.85557 -262.41616 -257.5835 -230.7279 0.2555829
#> sd.null mu0.mendel sd0.mendel mu1.mendel sd1.mendel mu0.mix sd0.mix
#> 3 2.0214565 -3.7764376 0.21607159 0.4513136 1.3477016 -2.5365336 2.06441478
#> 1 1.5481184 -4.2069460 0.25838259 -0.8795158 0.7369130 -3.2561007 1.56614986
#> 5 1.6947389 -0.4606702 0.68962203 3.0981023 0.1964459 0.3379929 1.60045489
#> 486 2.2780993 -1.1515825 2.30146074 -4.0355995 0.1204863 -2.1302684 2.31615821
#> 380 0.1661564 -2.9096112 0.02112367 -2.6747390 0.2544657 -2.9061301 0.05273047
#> 6 1.8798420 -0.5085749 1.44145564 2.7821332 0.1835955 0.1361195 1.83725157
#> mu1.mix sd1.mix
#> 3 -3.811011 0.09404426
#> 1 -4.292643 0.01360576
#> 5 3.190368 0.06669716
#> 486 -4.047780 0.04947619
#> 380 -2.453558 0.14302755
#> 6 2.828646 0.12484766
Joo JE, Dowty JG, Milne RL, Wong EM, Dugue PA, English D, Hopper JL, Goldgar DE, Giles GG, Southey MC, kConFab. Heritable DNA methylation marks associated with susceptibility to breast cancer. Nat Commun. 2018 Feb 28;9(1):867.