This vignette illustrates how to easily use the ICDS package. This package can identifying cancer risk subpathways which can explain the disturbed biological functions in more detail and accurately.
This package provides the getExpp
,getMethp
,getCnvp
function to calculate p-values or corrected p-values for each gene.
This package provides the coverp2zscore
,combinep_two
,combinep_three
function to Convert p-values or corrected p-values to z-scores.
This package provides the FindSubPath
function to search for interested subpathways in each entire pathway.
This package provides the opt_subpath
function to Optimize interested subpathways.
This package provides the Permutation
function to to calculate statistical significance for these interested subpathways .
We can use function getExampleData to return example data and environment variables, such as the data of exp_data.
library(ICDS)
# obtain the expression profile data
exp_data<-GetExampleData("exp_data")
#view first six rows and six colmns of data
exp_data[1:6, 1:6]
#obtain the labels of the samples of the expression profile, the label vector is a vector of 0/1s,
# 0 represents the case sample and 1 represents the control sample
label1<-GetExampleData("label1")
#view first ten label
label1[1:10]
## [1] 0 0 0 0 0 0 0 0 0 0
we used the Student’s t-test to calculate the p-value for expression level and methylation level of each gene in the tumor and normal samples.label,0 represents normal samples;1 represents cancer samples.
#calculate p-values or corrected p-values for each gene
exp.p<-getExpp(exp_data,label = label1,p.adjust = FALSE)
label2<-GetExampleData("label2")
meth_data<-GetExampleData("meth_data")
meth.p<-getMethp(meth_data,label = label2,p.adjust = FALSE)
exp.p[1:10,]
For copy number data, we can calculate p-value through the following way:
#obtain Copy number variation data
cnv_data<-GetExampleData("cnv_data")
#obtion amplified genes
amp_gene<-GetExampleData("amp_gene")
#obtion deleted genes
del_gene<-GetExampleData(("del_gene"))
#calculate p-values or corrected p-values for each gene
cnv.p<-getCnvp(exp_data,cnv_data,amp_gene,del_gene,p.adjust=FALSE,method="fdr")
cnv.p[1:10,]
With the above data, we constructed an integrative gene z-score (Z) based three datasets.
Firstly,the gene score calculated through combined p-values of Fisher’s Inverse chi-square tests. This method computes a combined statistic S from the p-values of the difference coefficient obtained from three individual datasets .Usually,the statistic S follows a chi-square distribution with 2k degrees,we can calculate the null hypothesis p-value of the statistic S.
Secondly,we convert the p-value to z-score according to Gaussian distribution , which is taken as the gene z-score(Z) .
#obtain the p-values of expression profile data,methylation profile data and Copy number variation data
exp.p<-GetExampleData("exp.p")
meth.p<-GetExampleData("meth.p")
cnv.p<-GetExampleData("cnv.p")
#calculate z-scores for p-values of each kind of data
zexp<-coverp2zscore(exp.p)
zmeth<-coverp2zscore(meth.p)
zcnv<-coverp2zscore(cnv.p)
#combine two kinds of p-values,then,calculate z-score for them
zz<-combinep_two(exp.p,meth.p)
#combine three kinds of p-values,then,calculate z-score for them
zzz<-combinep_three(exp.p,meth.p,cnv.p)
zzz[1:6,]
For a priori KEGG path, we perform a greedy algorithm to search for interested subpathways.We provide two statistical test methods of the subpathway, which one is whole gene-based perturbation, and the other is the local gene perturbation in a particular pathway.
#obtain z-score of each gene
zzz<-GetExampleData("zzz")
zzz[1:10,]
require(graphite)
zz<-GetExampleData("zzz")
#subpathdata<-FindSubPath(zz) #only show
Optimize interested subpathways.If the number of genes shared by the two pathways accounted for more than the Overlap ratio of each pathway genes,than combine two pathways.
subpathdata<-GetExampleData("subpathdata")
keysubpathways<-opt_subpath(subpathdata,zz,overlap=0.6)
head(keysubpathways)
## SubpathwayID pathway
## [1,] "path00010_1" "Glycolysis / Gluconeogenesis"
## [2,] "path00010_2" "Glycolysis / Gluconeogenesis"
## [3,] "path00010_3" "Glycolysis / Gluconeogenesis"
## [4,] "path00010_4" "Glycolysis / Gluconeogenesis"
## [5,] "path00010_5" "Glycolysis / Gluconeogenesis"
## [6,] "path00010_6" "Glycolysis / Gluconeogenesis"
## Subgene
## [1,] "G6PC/HK3/PGM1/HKDC1/GPI/FBP1/ALDOA/G6PC2/GALM/G6PC3"
## [2,] "ADH5/ADH1B/ADH1A/ALDH3B2/ALDH2/AKR1A1/ALDH7A1/ALDH3B1/ALDH1B1/ALDH1A3"
## [3,] "PKLR/ENO1/LDHA/PGAM1/MINPP1/PCK1/LDHAL6A/LDHAL6B"
## [4,] "ACSS1/ALDH3B2/ADH1B/ADH1A/ALDH2/DLAT/ACSS2"
## [5,] "PFKP/ALDOA/GAPDH/FBP1/GPI/ALDOC/GAPDHS"
## [6,] "PGAM4/ENO1/PGAM1/MINPP1/PCK1/PGK2"
## Size SubpathwayZScore
## [1,] "10" "20.8492758044976"
## [2,] "10" "20.9317587858665"
## [3,] "8" "19.0899605586423"
## [4,] "7" "19.7078248333379"
## [5,] "7" "18.3020024323356"
## [6,] "6" "16.7772761794305"
the perturbation test was used to calculate statistical significance for these interested subpathways.
keysubpathways<-Permutation(keysubpathways,zz,nperm1=100,method1=TRUE,nperm2=100,method2=FALSE)
head(keysubpathways)
## SubpathwayID pathway
## [1,] "path00010_1" "Glycolysis / Gluconeogenesis"
## [2,] "path00010_2" "Glycolysis / Gluconeogenesis"
## [3,] "path00010_3" "Glycolysis / Gluconeogenesis"
## [4,] "path00010_4" "Glycolysis / Gluconeogenesis"
## [5,] "path00010_5" "Glycolysis / Gluconeogenesis"
## [6,] "path00010_6" "Glycolysis / Gluconeogenesis"
## Subgene
## [1,] "G6PC/HK3/PGM1/HKDC1/GPI/FBP1/ALDOA/G6PC2/GALM/G6PC3"
## [2,] "ADH5/ADH1B/ADH1A/ALDH3B2/ALDH2/AKR1A1/ALDH7A1/ALDH3B1/ALDH1B1/ALDH1A3"
## [3,] "PKLR/ENO1/LDHA/PGAM1/MINPP1/PCK1/LDHAL6A/LDHAL6B"
## [4,] "ACSS1/ALDH3B2/ADH1B/ADH1A/ALDH2/DLAT/ACSS2"
## [5,] "PFKP/ALDOA/GAPDH/FBP1/GPI/ALDOC/GAPDHS"
## [6,] "PGAM4/ENO1/PGAM1/MINPP1/PCK1/PGK2"
## Size SubpathwayZScore p1 fdr1
## [1,] "10" "20.8492758044976" "0.05" "0.365714285714286"
## [2,] "10" "20.9317587858665" "0.08" "0.465454545454545"
## [3,] "8" "19.0899605586423" "0.05" "0.365714285714286"
## [4,] "7" "19.7078248333379" "0.01" "0.150588235294118"
## [5,] "7" "18.3020024323356" "0.08" "0.465454545454545"
## [6,] "6" "16.7772761794305" "0.07" "0.448"
require(graphite)
require(org.Hs.eg.db)
subpID<-unlist(strsplit("ACSS1/ALDH3B2/ADH1B/ADH1A/ALDH2/DLAT/ACSS2","/"))
pathway.name="Glycolysis / Gluconeogenesis"
zzz<- GetExampleData("zzz")
PlotSubpathway(subpID=subpID,pathway.name=pathway.name,zz=zzz)