We developed a novel software package (SMDIC) that enables automated identification of Somatic Mutation-Driven Immune Cell. The operation modes include inference of the relative abundance matrix of tumor-infiltrating immune cells, detection of differential abundance immune cells concerning the gene mutation status, conversion of the abundance matrix of significantly dysregulated cells into two binary matrices (one for up-regulated and one for down-regulated cells), identification of somatic mutation-driven immune cells by comparing the gene mutation status with each immune cell in the binary matrices across all samples, and visualization of immune cell abundance of samples in different mutation status for each gene. Here we present the flow diagram of SMDIC.
SMDIC has three main functions (flow diagram):
(A) inferring the relative abundance matrix of tumor-infiltrating immune cells
(B) detecting differential abundance immune cells concerning a particular gene mutation status and converting the abundance matrix of significantly dysregulated immune cells into two binary matrices (one for up-regulated and one for down-regulated cells)
(C) identifying somatic mutation-driven immune cells by comparing the gene mutation status with each immune cell in the binary matrices across all samples.
This vignette illustrates how to easily use the SMDIC package. With the use of functions in this package, users could identify the immune cells driven by somatic mutations in the tumor microenvironment.
This package provides the exp2cell
function to use gene expression profile to quantify the cell abundance matrix.
This package provides the maf2matrix
function to use mutation annotation file (MAF) format data to build a binary mutations matrix.
This package provides the mutcorcell
function to identify the immune cells driven by somatic mutations in the tumor microenvironment.
This package provides the plotwaterfall
function to plot the waterfall for mutation genes that drive immune cells.
This package provides the plotCoocMutex
function to plot the co-occurrence and mutual exclusivity plots for mutation genes that drive immune cells.
This package provides the heatmapcell
function to draw clustered heatmaps for the cells driven by a somatic mutation.
This package provides the survcell
function to draw Kaplan–Meier survival curves in the above-median and below-median groups for cell risk score.
We can use the function GetExampleData
to return example data and environment variables.
The function exp2cell can use gene expression profiles to quantify cell abundance matrix. ‘exp2cell’ provides three methods for estimating the relative infiltration abundance of different cell types in the tumor microenvironment (TME), which including xCell, ssGSEA estimated method proposed by Şenbabaoğlu et al. and CIBERSORT.Through these methods, the relative abundance matrix of immune cells is inferred with cells as the row and samples as the column.
We selected the breast cancer gene expression profile data in the GDC TCGA database for exp2cell and obtained the results of the cell abundance matrix. The commands are as follows.
library(SMDIC)
#get breast cancer gene expression profile.
exp.example<-GetExampleData("exp.example")
# perform the exp2cell method. The method must be one of "xCell","ssGSEA" and "CIBERSORT".
cellmatrix.example<-exp2cell(exp.example,method="ssGSEA")
#> Warning in .local(expr, gset.idx.list, ...): 659 genes with constant
#> expression values throuhgout the samples.
#get the result of the exp2cell function
#view the first six rows and six columns of the cell abundance matrix.
head(cellmatrix.example)
#> TCGA.A7.A13G.01 TCGA.A7.A26E.01 TCGA.A1.A0SQ.01
#> aDC -0.1721896 -0.2365829 -0.09769543
#> B cells -0.1785598 -0.2060056 -0.21060641
#> CD8 T cells 0.2603276 0.2939651 0.32180621
#> Cytotoxic cells -0.1710623 -0.1744474 -0.13125948
#> DC -0.2110966 -0.2248506 -0.16642146
#> Eosinophils 0.1456574 0.1297275 0.16402532
#> TCGA.GI.A2C9.11 TCGA.E9.A1NE.01 TCGA.A2.A0SX.01
#> aDC -0.11102476 0.30829474 0.255985640
#> B cells -0.15950503 -0.06773615 -0.007057109
#> CD8 T cells 0.34343045 0.42561320 0.337348145
#> Cytotoxic cells 0.04400304 0.29537554 0.125879324
#> DC 0.16974204 -0.01435208 0.109405174
#> Eosinophils 0.12760198 0.13450722 0.068610081
#> TCGA.EW.A1PD.01 TCGA.BH.A0HN.01 TCGA.E2.A1IU.01
#> aDC -0.13083749 -0.06885779 -0.008594315
#> B cells -0.25506334 -0.25029785 -0.124540116
#> CD8 T cells 0.30512749 0.28520165 0.317656631
#> Cytotoxic cells -0.16228083 -0.15758420 -0.048766592
#> DC -0.08489845 -0.26360357 -0.158910587
#> Eosinophils 0.11624230 0.09294411 0.125777697
#> TCGA.BH.A0BO.01
#> aDC -0.08444126
#> B cells -0.19944249
#> CD8 T cells 0.32995068
#> Cytotoxic cells -0.09360501
#> DC 0.04338637
#> Eosinophils 0.16156528
The somatic mutation data used in the package is the mutation annotation file (MAF) format. We extract the non-silent somatic mutations (nonsense mutation, missense mutation, frame-shift indels, splice site, nonstop mutation, translation start site, inframe indels) in protein-coding regions, and built a binary mutations matrix, in which 1 represents any mutation occurs in a particular gene in a particular sample, otherwise the element is 0. The genes with a given mutation frequency greater than a threshold value (one percent as the default value) are retained for the following analysis.
We selected the breast cancer somatic mutation data in the GDC TCGA database for maf2matrix and obtained the results of a binary mutation matrix. The commands are as follows.
# get the path of the mutation annotation file.
maf <- system.file("extdata","example.maf.gz",package = "SMDIC")
# perform the maf2matrix method.
mutmatrix.example<-maf2matrix(maffile = maf,percent = 0.01)
#get the result of the exp2cell function
#view the first six rows and six columns of the binary mutations matrix
head(mutmatrix.example)[1:6,1:6]
#> TCGA.E9.A22G.01 TCGA.OL.A6VO.01 TCGA.AR.A1AP.01 TCGA.BH.A0AW.01
#> TP53 1 1 1 1
#> TP53BP2 0 0 0 0
#> TP53BP1 0 0 0 0
#> PCDH10 0 0 0 0
#> CDH1 0 0 0 0
#> CDH10 0 0 0 0
#> TCGA.E2.A1LG.01 TCGA.A2.A04U.01
#> TP53 1 1
#> TP53BP2 0 0
#> TP53BP1 0 0
#> PCDH10 1 0
#> CDH1 0 0
#> CDH10 0 0
For a particular mutation gene, we compare the binary mutation vector with each binary cell abundance vector in the up-regulated or down-regulated immune cell-matrix respectively, and a 2×2 contingency table is constructed.Fisher’s exact test is applied to recover the cells that had drastic mutation-correlated up-regulated or down-regulated response. This process is repeated for each immune cell in the binary abundance matrices. To correct for multiple comparisons, we adjust the exact test p-values by using the false discovery rate (FDR) method proposed by Benjamini and Hochberg. The immune cells with the default FDR<0.05 are deemed as statistically significant mutation-correlated, and which may be driven by the somatic mutation. We repeat the above process for each mutated gene.
The “mutcorcell” function is implemented to identify somatic mutation-specific immune cell response by inputting the abundance matrix of immune cells and binary mutations matrix. This function firstly detects differential abundance of immune cells concerning a particular gene mutation status with the Significance Analysis of Microarrays (SAM) method. Function mutcorcell
detects the differential immune cells concerning a particular gene mutation status and construction of two binary matrices based on the abundance matrix of significant differential immune cell (one for up-regulated and one for down-regulated), identification of somatic mutation-driven immune cells by comparing the gene mutation status with each immune cell in the binary matrices.
We selected the abundance matrix of immune cells and binary mutations matrix for mutcorcell and obtained the results of A list of four matrices.
* A binary numerical matrix which shows the immune cells driven by somatic mutant gene;
* Two numerical matrices which show the pvalue and fdr of the immune cells driven by somatic mutant gene;
* A character matrix which shows the cell responses of the immune cells driven by a somatic mutant gene.
The commands are as follows.
# get breast cancer cell abundance matrix, which can be the result of exp2cell function.
cellmatrix<-GetExampleData("cellmatrix")
# get breast cancer binary mutations matrix, which can be the result of maf2matrix function.
mutmatrix<-GetExampleData("mutmatrix")
# perform the function `mutcorcell`.
mutcell<-mutcorcell(cellmatrix= cellmatrix,mutmatrix = mutmatrix,fisher.adjust = TRUE)
#get the result of the `mutcorcell` function
mutcell<-GetExampleData("mutcell")
# the binary numerical matrix which shows the immune cells driven by somatic mutant gene.
mutcell$mut_cell[1:6,1:6]
#the numerical matrix which shows the pvalue of the immune cells driven by a somatic mutant gene
#mutcell$mut_cell_p
#the numerical matrix which show the fdr of the immune cells driven by somatic mutant gene
#mutcell$mut_cell_fdr
#the character matrix which shows the cell responses of the immune cells driven by a somatic mutant gene."up" means up-regulation, "down" means down-regulation, and "0" means no significant adjustment relationship
#mutcell$mut_cell_cellresponses
tip: The binary matrix can not only come from the maf2matrix function but also any binary mutations matrix, in which 1 represents any mutation occurs in a particular gene in a particular sample, otherwise the element is 0.
Function mutcellsummary produces result summaries of the results of mutcorcell function. The summary has four columns. The first column is gene names, the second column is the cells driven by the gene, the third column is the number of cells driven by the gene, the fourth column is mutation rates of the gene.
We selected the result of mutcorcell function, a binary mutations matrix and a cell abundance matrix for mutcellsummary and obtained the summaries of mutcorcell function.
The commands are as follows.
# perform the function mutcellsummary
summary<-mutcellsummary(mutcell =mutcell,mutmatrix = mutmatrix,cellmatrix = cellmatrix)
# get the result of the mutcellsummary function
head(summary)
#> gene
#> 1 TP53
#> 2 CDH1
#> 3 NBPF14
#> 4 OBSCN
#> 5 NF1
#> 6 KCNA4
#> cells
#> 1 Astrocytes,CD8+ naive T-cells,CD8+ Tem,DC,Keratinocytes,Macrophages,Macrophages M1,Melanocytes,NK cells,pDC,Pericytes,Plasma cells,pro B-cells,Sebocytes,Tgd cells,Th1 cells,Th2 cells,Tregs
#> 2 Adipocytes,CD4+ Tcm,Chondrocytes,CMP,Endothelial cells,Fibroblasts,HSC,ly Endothelial cells,Megakaryocytes,Mesangial cells,mv Endothelial cells,NKT
#> 3 CD8+ Tem,DC,iDC,Keratinocytes,pro B-cells,Sebocytes
#> 4 Epithelial cells,Keratinocytes,pro B-cells,Sebocytes,Th1 cells
#> 5 aDC,B-cells,Memory B-cells,pDC,Plasma cells
#> 6 CD8+ T-cells,CD8+ Tcm,Class-switched memory B-cells,NK cells,pDC
#> count mut rate
#> 1 18 0.34394251
#> 2 12 0.12936345
#> 3 6 0.01334702
#> 4 5 0.02977413
#> 5 5 0.03696099
#> 6 5 0.01129363
Function gene2cellsummary produces result summaries of the immune cells driven by a somatic mutation. The summaries are a matrix that shows the short name, full name, pvalue, fdr, cell responses(up or down) of the cells driven by a somatic mutation.
We selected the result of mutcorcell function, the method we use in exp2cell and a gene we are interested in for gene2cellsummary ,and obtained the result summaries of the immune cells driven by a somatic mutation.
The commands are as follows.
# perform the function gene2cellsummary
gene2cellsummary(gene="TP53",method="xCell",mutcell = mutcell)
#> gene cell name full name
#> Astrocytes TP53 Astrocytes Astrocytes
#> CD8+ naive T-cells TP53 CD8+ naive T-cells CD8+ naive T-cells
#> CD8+ Tem TP53 CD8+ Tem CD8+ effector memory T-cells
#> DC TP53 DC Dendritic cells
#> Keratinocytes TP53 Keratinocytes Keratinocytes
#> Macrophages TP53 Macrophages Macrophages
#> Macrophages M1 TP53 Macrophages M1 Macrophages M1
#> Melanocytes TP53 Melanocytes Melanocytes
#> NK cells TP53 NK cells NK cells
#> pDC TP53 pDC Plasmacytoid dendritic cells
#> Pericytes TP53 Pericytes Pericytes
#> Plasma cells TP53 Plasma cells Plasma cells
#> pro B-cells TP53 pro B-cells pro B-cells
#> Sebocytes TP53 Sebocytes Sebocytes
#> Tgd cells TP53 Tgd cells Gamma delta T-cells
#> Th1 cells TP53 Th1 cells Type 1 T-helper cells
#> Th2 cells TP53 Th2 cells Type 2 T-helper cells
#> Tregs TP53 Tregs Regulatory T-cells
#> pvalue fdr cell responses
#> Astrocytes 1.010916e-07 1.162554e-06 up
#> CD8+ naive T-cells 3.992545e-03 1.311836e-02 up
#> CD8+ Tem 1.748251e-02 4.467753e-02 up
#> DC 1.604871e-02 4.385580e-02 up
#> Keratinocytes 2.331616e-07 1.981328e-06 up
#> Macrophages 1.820606e-04 1.196398e-03 up
#> Macrophages M1 6.327255e-09 9.701790e-08 up
#> Melanocytes 5.155187e-03 1.580924e-02 up
#> NK cells 5.359277e-04 2.465267e-03 up
#> pDC 2.076020e-03 8.121323e-03 up
#> Pericytes 3.445410e-04 1.981111e-03 up
#> Plasma cells 1.620758e-02 4.385580e-02 up
#> pro B-cells 2.637040e-03 9.331064e-03 up
#> Sebocytes 1.510706e-09 3.474624e-08 up
#> Tgd cells 2.118606e-03 8.121323e-03 up
#> Th1 cells 2.584341e-07 1.981328e-06 up
#> Th2 cells 8.859960e-13 4.075581e-11 up
#> Tregs 4.775752e-04 2.440940e-03 up
We provide a set of visual analysis functions including heatmapcell
,plotwaterfall
,plotCoocMutex
,and survcell
.
Function heatmapcell
is a function to draw clustered heatmaps for the cells driven by a somatic mutation.
The commands are as follows.
# load dependent package.
require(pheatmap)
#> Loading required package: pheatmap
# plot significant up-regulation or down-regulation cells heat map specific for breast cancer
heatmapcell(gene = "TP53",mutcell = mutcell,cellmatrix = cellmatrix,mutmatrix = mutmatrix)
We use the TCGA breast cancer sample data set to display the waterfall and co-occurrence and mutual exclusivity plots, and users can also enter the cancer data set they are interested in.
Function plotwaterfall
and plotCoocMutex
provide visualization of the mutation genes correlated with immune cells using a waterfall plot and mutually exclusive or co-occurring plot.
The commands are as follows.
#maf<-"dir"
#tips: dir is the name of the mutation annotation file (MAF) format data. It must be an absolute path or the name relative to the current working directory.
#plot the waterfall for mutation genes which drive immune cells
#plotwaterfall(maffile = maf,mutcell.summary = summary,cellnumcuoff =4)
#plot the co-occurrence and mutual exclusivity plots for mutation genes that drive immune cells.
#plotCoocMutex(maffile = maf,mutcell.summary = summary,cellnumcuoff =4)
#view the result of the plotwaterfall function
knitr::include_graphics("../inst/plotwaterfall.jpeg")
#view the result of the plotCoocMutex function
knitr::include_graphics("../inst/plotCoocMutex.jpeg")
Function survcell
draws Kaplan–Meier survival curves in the above-median and below-median groups for cell risk score. The cell risk score is calculated by the weighted mean of cells driven by a gene mutation, where the weight of cells is estimated by the “Univariate” or “Multivariate” cox regression.
# get the result of `mutcorcell` function.
mutcell<-GetExampleData("mutcell")
# get the result of `exp2cell` function.
cellmatrix<-GetExampleData("cellmatrix")
#get the survival data, the first column is the sample name, the second column is the survival time, and the third is the survival event.
surv<-GetExampleData("surv")
#draw Kaplan–Meier survival curves
survcell(gene ="TP53",mutcell=mutcell,cellmatrix=cellmatrix,surv=surv,palette = c("#E7B800", "#2E9FDF"))