A common problem in R is labelling scatter plots with large numbers of points and/or labels. We provide a utility for easy labelling of scatter plots and quick plotting of volcano plots and MA plots for gene expression analyses. Using an interactive shiny and plotly interface, users can hover over points to see where specific points are located and click on points to easily label them. Labels can be toggled on/off simply by clicking. An input box and batch input window provides an easy way to label points by name. Labels can be dragged around the plot to place them optimally. Notably we provide an easy way to export directly to PDF for publication.
Install from CRAN
install.packages("easylabel")
library(easylabel)
Install from Github
devtools::install_github("myles-lewis/easylabel")
library(easylabel)
If you wish to use the optional useQ
argument with easyVolcano()
and easyMAplot()
, you will need to install additional package qvalue
from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("qvalue")
If you wish to use the optional fullGeneNames
argument, you will need to install packages AnnotationDbi
and org.Hs.eg.db
from Bioconductor:
BiocManager::install("AnnotationDbi")
BiocManager::install("org.Hs.eg.db")
Use easylabel()
to open a shiny app and plot and label scatter plots. A table of the main data is supplied in the Table tab for easy viewing of data.
data(mtcars)
easylabel(mtcars, x = 'mpg', y = 'wt',
colScheme = 'royalblue')
The output_shiny
option enables users to switch between invoking the shiny app or directly outputting a plotly object. For example, to extract a plotly figure with draggable annotations:
data(mtcars)
p1 <- easylabel(mtcars, x = 'mpg', y = 'wt', col = 'cyl',
startLabels = rownames(mtcars)[mtcars$gear == 5],
output_shiny = FALSE) %>%
layout(yaxis = list(zeroline = FALSE))
p2 <- easylabel(mtcars, x = 'mpg', y = 'drat', col = 'vs',
colScheme = c("dodgerblue", "orange"),
startLabels = rownames(mtcars)[mtcars$gear == 5],
output_shiny = FALSE) %>%
layout(xaxis = list(zeroline = FALSE))
plotly::subplot(p1, p2, nrows = 2, shareY = TRUE, titleX = TRUE, margin = 0.05)
Note that while the labels and lines can still be dragged and moved in the exported plotly object, clicking on points to add them, exporting to PDF and borders around label text, which are all shiny/ PDF output features, are not available from a pure plotly object.
Plotly objects can also be exported as a return value by pressing the ‘Export plotly & exit’ button in the shiny app. In this case the object retains the labelled points and their edited positions.
Similar to ggplot2 and plotly, colour can either be set as a single colour, using colScheme = 'blue'
or can be set to change with the level of a factor variable within data
by setting col
. Transparency can be altered by setting alpha
. Currently colour gradients are not supported.
easylabel(mtcars, x = 'mpg', y = 'wt',
col = 'cyl')
Marker shapes can either be set as a single shape using shapeScheme = 21
, based on the base graphics pch
symbol scheme (see points()
), or shapes can be set to change with the level of a factor variable within data
by setting shape
. Shapes and colours can be combined.
# gapminder data set
if(!require(gapminder)) {install.packages("gapminder")}
library(gapminder)
easylabel(gapminder[gapminder$year == 2007, ], x = 'gdpPercap', y = 'lifeExp',
col = 'continent', shape = 'continent',
size = 10,
labs = 'country',
zeroline = FALSE)
The size of all markers can be adjusted by setting size
to a single number e.g. size = 6
(default 8). Size can be assigned to a column of continuous data within data
to produce a bubble chart. The size of points is automatically scaled to fit within a range of point sizes, determined by scaleRange
.
library(gapminder)
easylabel(gapminder[gapminder$year == 2007, ], x = 'gdpPercap', y = 'lifeExp',
col = 'continent', labs = 'country',
size = 'pop',
alpha = 0.6,
zeroline = FALSE)
By default, easylabel()
takes the rownames of the main dataframe as the label names. Any column can be selected for label names by setting labs
to the name of a column within data
, as for example in the Gapminder dataset plots above.
Axis titles can be set using xlab
and ylab
. These accept R expressions to allow maths symbols. This is primarily designed to work when exporting plots to PDF via base graphics, since plotly does not natively understand expressions.
easylabel(xymatrix, x = 'x', y = 'y', col = 'col',
colScheme = c('darkgrey', 'green3', 'gold3', 'blue'),
xlab = expression("log"[2] ~ " fold change post-Rituximab"),
ylab = expression("log"[2] ~ " fold change post-Tocilizumab"),
showgrid = TRUE)
xlim
and ylim
allow control over the range of each axis. Outlying points are shown as diamonds. Outliers can be toggled off using showOutliers = FALSE
. Outlier symbol, colour and line width can be set using outlier_shape
, outlier_col
and outlier_lwd
respectively.
Axis tick marks can be controlled by setting xaxp
or yaxp
to a vector of form c(x1, x2, n)
where x1
and x2
are the limits of the tick marks and n
is the number of intervals between tick marks.
For full customisation of axis tick coordinates and labels use xticks
or yticks
, which are specified as a list of two named vectors at = ...
and labels = ...
.
# example axis ticks
if(!require(gapminder)) {install.packages("gapminder")}
library(gapminder)
easylabel(gapminder[gapminder$year == 2007, ], x = 'gdpPercap', y = 'lifeExp',
col = 'continent', shape = 'continent',
size = 10,
labs = 'country',
zeroline = FALSE,
xaxp = c(0, 50000, 10),
yaxp = c(40, 85, 9),
showgrid = TRUE)
Gridlines can be shown by setting showgrid = TRUE
. If showgrid
is set to "x"
or "y"
then gridlines are shown only for that axis. Density of gridlines can be controlled by setting xaxp
or yaxp
(see Axis tick marks).
Black lines through the origin can be controlled by setting zeroline
to TRUE
or FALSE
.
Dashed grey horizontal lines can be added at specific levels by setting hline
with a vector of values. Similarly dashed grey vertical lines can be added by setting vline
with a vector of values.
Setting showLegend
to TRUE
or FALSE
can be used to show or hide the legend.
Legends can be positioned by setting legendxy
. This gives a vector of coordinates for the position of the legend in plotly paper reference with c(0, 0)
being the bottom left corner and c(1, 1)
being the top right corner of the plot window.
Plotly has unusual behaviour in that the x coordinate always aligns the left side of the legend. However, the y coordinate aligns the top, middle or bottom of the legend dependent on whether it is in the top, middle or bottom 1/3 of the plot window. So c(1, 0)
positions the legend in the bottom right corner outside the right margin of the plot, while c(1, 0.5)
centre aligns the legend around the centre of y axis.
Further control of plotting can be achieved by passing other arguments to plot() via R’s ...
argument. For example, a box around the plot can be added using bty = 'o'
(see par()
).
Base graphics’ panel.last
argument is a flexible way to add plotting commands. This provides a way to add trend lines, loess lines or any additional features such as extra legends to the plot after the points are plotted, but before labels are drawn. Note that panel.last
only works when saving to PDF, and is not available in plotly.
# example adding a trend line using panel.last
fit <- lm(xymatrix$y ~ xymatrix$x)
easylabel(xymatrix, x = 'x', y = 'y', col = 'col',
colScheme = c('darkgrey', 'green3', 'gold3', 'blue'),
xlab = expression("log"[2] ~ " fold change post-Rituximab"),
ylab = expression("log"[2] ~ " fold change post-Tocilizumab"),
showgrid = TRUE, fullGeneNames = TRUE,
panel.last = {
abline(fit, col='red')
})
This argument is used to reduce the number of scatter points displayed by plotly, to prevent plotly becoming sluggish or unusable, which happens above about 200,000 points. To plot large datasets such as genomic data, we recommend adding a logical column to your dataset which filters out points of interest for inspecting and labelling via plotly. plotly_filter
specifies the name of this column. Saving to PDF retains the full dataset and all points will be plotted. This argument is used by the easyManhattan()
function (see Manhattan plot).
Use the easyVolcano()
function to quickly plot a volcano plot from DESeq2 or EdgeR objects. The useQ
argument will switch to using q values for FDR.
# Example DESeq2 object
head(volc1)
## baseMean log2FoldChange lfcSE stat pvalue
## 5_8S_rRNA 0.01625099 0.02726923 2.9811510 0.009147217 0.99270168
## 5S_rRNA 3.66321881 0.05989097 0.4115732 0.145517159 0.88430257
## 7SK 12.96667458 -0.46156215 1.3668722 -0.337677614 0.73560615
## A1BG 290.75694559 0.23486554 0.3514587 0.668259368 0.50396804
## A1BG-AS1 170.77906789 0.11800652 0.2001594 0.589562618 0.55548392
## A1CF 4.56855647 -1.23495475 0.4855212 -2.543565265 0.01097276
## padj
## 5_8S_rRNA NA
## 5S_rRNA 0.9823854
## 7SK 0.9528927
## A1BG 0.8956205
## A1BG-AS1 0.9117136
## A1CF 0.4531638
# Example limma object
head(volc2)
## logFC AveExpr t P.Value adj.P.Val B
## A1BG 0.7697662 6.389039 4.692064 0.0011239293 0.008610736 -1.16895727
## A1BG-AS1 0.5298897 3.610850 4.688657 0.0011293516 0.008638889 -0.85241114
## AAAS -0.4162720 5.349529 -5.578986 0.0003398373 0.004148831 0.17585449
## AACS -0.2049703 3.690271 -1.465073 0.1768461859 0.299064278 -5.88763988
## AAGAB 0.2970478 5.538436 5.446618 0.0004033869 0.004553688 -0.02478892
## AAK1 0.1969886 1.861988 1.286456 0.2302992497 0.362402197 -5.74356735
# Typical DESeq2 workflow
volc1 <- results(dds)
easyVolcano(volc1, useQ = TRUE)
If you want to specify your own columns within the dataset set x
to the column containing log fold change, y
to the column containing raw p values and padj
to the column containing adjusted p values. labs
can be used to specify the column containing label names, otherwise this defaults to rownames(data)
.
# Manually specify columns
easyVolcano(volc1, x = 'log2FoldChange', y = 'pvalue', padj = 'padj')
# To use nominal unadjusted p value for significant genes
easyVolcano(volc1, x = 'log2FoldChange', y = 'pvalue')
Note that if you specify y
for a column of p values but leave padj
blank, the function assumes you are using nominal unadjusted p values for the cutoff for significance.
Use the easyMAplot()
function to quickly plot an MA plot from DESeq2 or EdgeR objects.
easyMAplot(volc2, useQ = TRUE)
The fullGeneNames
argument will use Bioconductor package AnnotationDbi
and the org.Hs.eg.db
human gene database to expand gene symbols in the Table tab. Both will need to be installed from Bioconductor.
BiocManager::install("AnnotationDbi")
BiocManager::install("org.Hs.eg.db")
easyVolcano(volc1, useQ = TRUE, fullGeneNames = TRUE)
Full gene names are also shown when hovering over points in the scatter plot, which can make it easier to label points of interest. For mouse genes, install org.Mm.eg.db
and set AnnotationDb = org.Mm.eg.db
(note the lack of quotes).
BiocManager::install("org.Mm.eg.db")
library(org.Mm.eg.db)
easyVolcano(volc1,
fullGeneNames = TRUE,
AnnotationDb = org.Mm.eg.db)
For volcano plots a simple colour scheme with downregulated genes in blue and upregulated genes in red can be rendered by setting fccut = 0
.
easyVolcano(volc2,
useQ = TRUE,
fccut = 0,
main = "Volcano title")
A title can be added using main = "Title"
. The font size of the title can be modified using cex.main = 2
(default 1.2) if saving to PDF.
You can add left and right sided titles using Ltitle
and Rtitle
to explain the direction of effect for up/downregulation. The use of expression
in the example below shows how to add left/right arrow symbols to the titles. The symbols only appear in the saved PDF - they are not compatible with plotly.
LRtitle_side = 1
puts these titles on the bottom while LRtitle_side = 3
puts them on the top.
cex.lab
controls font size for these titles as well as axis titles. cex.axis
controls font size for axis numbering.
easyVolcano(volc1,
useQ = TRUE, fullGeneNames = TRUE,
Ltitle = expression(symbol("\254") ~ "Non-responder"),
Rtitle = expression("Responder" ~ symbol("\256")),
LRtitle_side = 1,
cex.lab = 0.9, cex.axis = 0.8,
fccut = c(1, 2), fdrcutoff = 0.2,
ylim = c(0, 6), xlim = c(-5, 5),
colScheme = c('darkgrey', 'blue', 'orange', 'red'))
The FDR cutoff is set using fdrcutoff
(volcano plots allow a single value, while MA plots allow multiple values). In order to force significant genes to be shown based on unadjusted P values, this can be achieved by a workaround setting both y
and padj
manually to the unadjusted p value column. Note this does mean the legend incorrectly states ‘FDR <’ etc.
easyVolcano(volc1, y = 'pvalue', padj = 'pvalue', fdrcutoff = 0.01)
xlim
and ylim
allow control over the range of each axis. Outlying points are shown as diamonds (see example above). Outliers can be toggled off using showOutliers = FALSE
. Outlier symbol, colour and line width can be set using outlier_shape
, outlier_col
and outlier_lwd
respectively.
The colour scheme system has been expanded to allow multiple fold change cut-offs. In the example above the colours are symmetrical.
In the next 2 plots, the colours range from blue for downregulated genes, through to red for upregulated genes. Vertical lines can be added using vline
.
colScheme <- c('darkgrey', 'blue', 'lightblue', 'orange', 'red')
easyVolcano(volc1, fccut = 1, fdrcutoff = 0.2,
ylim = c(0, 6), xlim = c(-5, 5),
colScheme = colScheme,
vline = c(-1, 1))
The next example has 6 colours and also shows how to remove the white outlines around points using outline_col = NA
and use transparency instead via alpha
.
library(RColorBrewer)
colScheme <- c('darkgrey', brewer.pal(9, 'RdYlBu')[c(9:7, 3:1)])
easyVolcano(volc1, fccut = c(1, 2), fdrcutoff = 0.2,
ylim = c(0, 6), xlim = c(-5, 5),
colScheme = colScheme,
alpha = 0.75, outline_col = NA)
Similarly 6 colours can be applied to MA plots using 3 levels of cut-off for FDR (note the colour scheme is in a slightly different order).
colScheme <- c('darkgrey', brewer.pal(9, 'RdYlBu')[c(7:9, 3:1)])
easyMAplot(volc2, fdrcutoff = c(0.05, 0.01, 0.001), size = 6, useQ = TRUE,
alpha = 0.75, outline_col = NA,
colScheme = colScheme)
Label lines can be altered using the argument labelDir
or by selecting the Label direction pulldown menu in the shiny app. A couple of examples are shown below:
easyVolcano(volc1, labelDir = "horiz")
easyMAplot(volc1, labelDir = "vert")
Label line colour can be set using line_col
. Label text colour can be set using text_col
. Each of these can be set to match the colour of each point by setting each to "match"
. Default label line length can be altered by changing lineLength
(default 75 in pixels). cex.text
can be changed to alter the font size of labels (default 0.72).
Rectangles can be drawn around the label text, using rectangles = TRUE
, and will appear when saving to PDF (they are not supported in plotly). Rectangle fill colour can be set using rect_col
and rectangle border colour can be set using border_col
. Use border_col = NA
to turn off rectangle borders. The amount of padding in pixels around label text can be specified by setting padding
. The amount of roundedness in pixels of rectangle corners is specified by border_radius
.
# Simple outlines
easyVolcano(volc2, useQ = TRUE, fccut = 0,
rectangles = TRUE)
# Red outlined labels, rounded ends
easyVolcano(volc2, useQ = TRUE, fullGeneNames = TRUE,
rectangles = TRUE,
padding = 5,
border_radius = 10,
line_col = 'red',
border_col = 'red',
text_col = 'red')
# Transparent grey rectangles, rounded ends
easyMAplot(volc2, fdrcut = c(0.05, 0.01, 0.001), size = 6, useQ = TRUE,
alpha = 0.75, outline_col = NA,
fullGeneNames = TRUE,
colScheme = c('darkgrey', brewer.pal(9, 'RdYlBu')[c(7:9, 3:1)]),
rectangles = TRUE,
border_col = NA,
padding = 5,
rect_col = adjustcolor('grey', alpha.f = 0.6),
border_radius = 20)
# White text on black background, no rounding
easyVolcano(volc2, useQ = TRUE, fullGeneNames = TRUE,
fccut = 0,
rectangles = TRUE,
padding = 4,
border_radius = 0,
rect_col = 'black',
text_col = 'white',
border_col = NA)
While label text and label lines can be adjusted by setting text_col
and line_col
to individual colours, label text and label lines can also be set to mirror the colour of each point by setting text_col = "match"
and line_col = "match"
respectively. Label box fill colour or rectangle outline colour can also be set to match each point’s colour by setting rect_col = "match"
or border_col = "match"
(label boxes are not supported in plotly).
# Label text and label lines match point colours
easylabel(gapminder[gapminder$year == 2007, ], x = 'gdpPercap', y = 'lifeExp',
col = 'continent', labs = 'country',
size = 'pop',
alpha = 0.6,
line_col = "match", text_col = "match",
zeroline = FALSE, showgrid = "y")
# Rectangle fill colour and label line match point colours, rounded rectangles
easylabel(gapminder[gapminder$year == 2007, ], x = 'gdpPercap', y = 'lifeExp',
col = 'continent', labs = 'country',
size = 'pop',
alpha = 0.6,
line_col = "match", text_col = "white",
rectangles = TRUE, border_col = NA,
rect_col = "match", border_radius = 20, padding = 5,
zeroline = FALSE, showgrid = "y")
Manhattan plots can be labelled using the function easyManhattan()
. Plotly struggles with more than about 100,000 points, so we use a filtering system to reduce the number of points shown in the plotly interactive plot, while the full plot with all points is produced when saving to PDF. An example is shown below:
# Manhattan plot using SLE GWAS data from Bentham et al 2015
# FTP download full summary statistics from
# https://www.ebi.ac.uk/gwas/studies/GCST003156
library(data.table)
SLE_gwas <- fread('../bentham_2015_26502338_sle_efo0002690_1_gwas.sumstats.tsv')
# Simple Manhattan plot
easyManhattan(SLE_gwas)
# 4 colours for chromosomes
easyManhattan(SLE_gwas, chromCols = RColorBrewer::brewer.pal(4, 'Paired'))
By default Manhattan plots display only the top 100,000 points by p value in the plotly window and plot the top 1 million SNPs when saving to PDF. These defaults can be altered by setting npoints
and nplotly
. Setting npoints
to NA
will plot all points when saving to PDF.
# Examples
# 12 colours for chromosomes, no separate colour for significant points
easyManhattan(SLE_gwas, chromCols = RColorBrewer::brewer.pal(12, 'Paired'),
sigCol = NA)
# Label peaks automatically, add horizontal gridlines
easyManhattan(SLE_gwas, npeaks = 20, showgrid = "y")
# Vertical version
easyManhattan(SLE_gwas, transpose = TRUE, height = 1000, width = 600)
# Chr1 only
easyManhattan(SLE_gwas[SLE_gwas$chrom == 1, ])
# Add symbols for the significant SNPs
easyManhattan(SLE_gwas, chromCols = RColorBrewer::brewer.pal(4, 'Paired'),
size = 8,
shape = 'col',
shapeScheme = c(rep(20, 4), 18))
To look in a specific genomic region
# Create a locus plot over one chromosomal region
library(plotly)
p1 = easyManhattan(SLE_gwas[SLE_gwas$chrom == 6 &
SLE_gwas$pos >= 28e6 &
SLE_gwas$pos <= 34e6, ],
output_shiny = FALSE, labs = "rsid",
startLabels=c("rs115466242", "rs2853999"),
npeaks = 3)
# To annotate genes in that region
source("https://raw.githubusercontent.com/KatrionaGoldmann/BioOutputs/master/R/bio_gene_locations.R")
library(ggbio)
library(gginnards)
library(ggrepel)
if (! "EnsDb.Hsapiens.v75" %in% rownames(installed.packages()))
BiocManager::install("EnsDb.Hsapiens.v75")
p2 = bio_gene_locations(6, c(28e6, 34e6),
subset_genes = c('HLA-F', 'HLA-G', 'HLA-A', 'HLA-E',
'HLA-C', 'HLA-B', 'HLA-DRA',
'HLA-DRB5', 'HLA-DRB1', 'HLA-DQA1',
'HLA-DQB1', 'HLA-DQA2', 'HLA-DQB2',
'HLA-DOB', 'HLA-DMB', 'HLA-DMA',
'HLA-DOA', 'HLA-DPA1', 'HLA-DPB1'))
plotly::subplot(p1, p2$plotly_location %>% layout(yaxis=list(range=c(0.25, 2))),
shareY = T, titleX = T, margin=0.05,
nrows=2, heights=c(0.7, 0.3))
scattergl
function, which uses webGL, seems to have a limit of around 400,000 scatter points per plot. Above this level, the shiny/plotly app becomes sluggish or freezes and becomes unusable. We recommend limiting plots to around 100,000 points using the plotly_filter
argument. See plotly_filter.