Genekitr is a gene analysis toolkit based on R.
Search: gene-related information (e.g. location, gene name, GC content, gene biotype …) and PubMed records
Convert: ID conversion among Symbol & Alias, NCBI Entrez, Ensembl, UniProt and Microarray probe
Analysis: users can select gene sets of interest from hundreds of gene sets for both model and non-model species, including GO (BP, CC and MF), KEGG (pathway, module, enzyme, network, drug and disease), WikiPathway, MSigDB, EnrichrDb, Reactome, MeSH, DisGeNET, Disease Ontology (DO), Network of Cancer Genes (NCG) (v6 and v7) and COVID-19. Gene enrichment analysis (GSA) includes both over-representation analysis (ORA) and gene set enrichment analysis (GSEA). ORA supports multi-group comparison.
Plot: 14 GO plots, 7 KEGG plots, 5 GSEA plots, 2 Venn plots and 1 Volcano plot, with flexible control over text, color, border, axis and legend. All plot functions take a data frame as input and also support results from the GeneOntology website. Feel free to make your own plots.
Export: easily export multiple data sets as separate sheets in one Excel file
For more details, please refer to this site.
New features are available for version > 1.0.0
install.packages("genekitr")
remotes::install_github("GangLiLab/genekitr")
remotes::install_git("https://gitee.com/genekitr/pacakge_genekitr")
https://www.genekitr.fun/
Genes are the basic unit of omics research, much like cells in our body.
However, working with genes can be tedious.
Here, I want to tell you a story about Mr. Doodle, a computational biology student. Now let us welcome our host, Mr. Doodle, to introduce his daily work with his PI…
PI gave Doodle 30 genes and asked him to check their locations (preferably with sequences) and exact names. Doodle searched NCBI for each gene one by one and copied and pasted the results into Excel. Doodle sent the file to PI one hour later, and PI smiled, “Well done! Now I have another 50!”
PI gave Doodle a DEG (differentially expressed gene) matrix and a target gene list file, and asked him to find out whether the target genes were up-regulated after treatment. After a while, Doodle found that PDL1 was missing from the matrix even though it was in the gene list. “Do we have the PDL1 gene?” he asked PI, and PI smiled, “Of course! You need to check gene CD274 instead of PDL1, which is an alias!”
Doodle was confused: how can one distinguish an official gene name from an alias?
Doodle took the up-regulated gene symbols from the last DEG matrix for KEGG analysis. KEGG only supports Entrez IDs, so he needed to convert the symbols to Entrez IDs. He found that some symbols had no matching Entrez ID, even though NCBI listed one. Doodle remembered that he was using org.db v3.12, while the current version was v3.15. After he updated the annotation package, he finally got all the IDs matched.
Doodle wonders whether there is a way to get up-to-date results without checking the annotation version himself every time.
PI ran the enrichment analysis himself on the GeneOntology website and asked Doodle to visualize the result. “Could you please draw a pathway bubble plot? Also, I want the x-axis to show FoldEnrichment,” PI smiled. Doodle wanted to use the clusterProfiler R package for the plot, but he found that it only accepts its own object. So he bit the bullet and coded the plot himself with ggplot2.
Doodle wonders why there is no tool that supports standard enrichment data frames.
Doodle finished the bubble plot at last and sent it to PI. After 15 minutes, PI sent him a message with a smile: “The plot text seems too small. Could you also give me a white background with a 4 pt border?” Doodle adjusted the ggplot theme function and spent 10 minutes on the changes. After a while, PI sent another message: “I saw the second version; maybe the border is too thick. Could you replot?”
Doodle wonders whether there is a function that could handle the plot theme for him, so he does not have to change the code again and again.
Once Doodle had the GO enrichment analysis result, PI asked him to think about how to present it nicely. Doodle found that every tool has its own specific plots. For example, WEGO can compare BP, CC, and MF terms; GOplot has a chord plot that shows the relationship between genes and GO terms; clusterProfiler supports enrichment maps and networks, which explore the relationships among enriched terms. One big problem is that their input data formats are not compatible, so it is inconvenient to draw WEGO plots from clusterProfiler objects.
Doodle wonders whether there is a way to bring the beautiful plots from different tools together under one universal data format.
Doodle finished the differential expression analysis and the GO/KEGG enrichment analysis, and PI asked him to send over all the result files. Doodle first saved the results into three Excel files named “DEG_data.xlsx,” “GO_enrich.xlsx,” and “KEGG_enrich.xlsx”. He then packed the three files into one zipped folder, named it with the date, and sent it to PI. After a while, PI sent him a message: “Could you put all three results into one Excel file?”
Doodle wonders whether there is a way to save all the data into one file without much manual work.
If you have ever had one or more problems like Mr. Doodle's, try genekitr!
To be updated…
If you are interested in genekitr, you are welcome to contribute your ideas as follows:
Use genekitr.Rproj to open RStudio
Edit the source code in the R/ folder
Run devtools::check() to make sure there are no errors, warnings or notes