README.md
In KatrionaGoldmann/BioOutputs: Visualisation of Biological Data

BioOutputs

This package contains common R scripts I use in my day to day data analysis of biological data. The scripts are primarily for plotting and visualisation, with some data organisation thrown in as well.

Bi-Results

Correlation Plot

Frequency Table

Module Plot

Treatment Group Comparisons

Volcano Plot

Switching Gene Ids Fold Change Heatmap

Significance Boxplots

Sankey Plots

Capitalize Phrases Module Scores

dplyr ggplot2 ggrepel ComplexHeatmap RColorBrewer Rmisc ggpubr gtools grid pBrackets biomaRt

First install devtools to allow installation from gitub and any other required packages.

install.packages("devtools")
library("devtools")

library(devtools)
library(knitr)

Now install the BioOutputs package.

install_github("KatrionaGoldmann/BioOutputs")
library("BioOutputs")

Create a correlation plot. Taken from kassambara/ggpubr just changed the default arguments.

So we can use the classic example with the mtcars data frames:

kable(head(mtcars))

mpg cyl disp hp drat wt qsec vs am gear carb Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1

bio_corr(mtcars, "qsec", "wt")

The bio_frequency() function generates a frequency table from factor or character vector columns in a data frame. This has the following arguments:

Argument data A data frame containing columns to be counted columns Column names or indices to be counted in data freq.percent Whether the table should include frequency counts, percentages or both (options = c("freq", "percent", "both")). Default="both" include.na Include NA values (options are TRUE/FALSE, default=TRUE) remove.vars Character vector of variables not to be included in the counts (e.g. remove.vars = c("") remove blanks from the count)

Then if we want to see the breakdown of, say, the gear column in mtcars we can apply:

kable(bio_frequency(mtcars, "gear"))

3 4 5 Total gear 15 (47%) 12 (38%) 5 (16%) n = 32

And if wanted we can remove one variable from the table. This is useful if we have unknowns or the likes.

kable(bio_frequency(mtcars, "gear", remove.vars=c("5")))

3 4 Total gear 15 (56%) 12 (44%) n = 27

This function generates a volcano plot from a top table using ggplot. The function contains many parameters, use ?bio_volcano to interogate.

Argument toptable A data frame containing p value and fold change columns for parameters compared across multiple groups. The p value column should be named "pvalue". fc.col The column name which stores the fold change. Should be in the log2 format (default="log2FC") padj.col The column which contains adjusted p-values. If NULL adjusted pvalues will be calculated padj.method correction method. Options include: c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"). Default="fdr padj.cutoff The cutoff for adjusted pvalues. This adds a horizontal line of significance (default=NULL) fc.cutoff The log2(fold change) significance cut-off (default=1) marker.colour Character vector of four colours to map to the volcano plot. In the order non-significanct, fold-change significant, pvalue significant, significant in fold-change and pvalues (default=c("grey60", "olivedrab", "salmon", "darkturquoise")) label.p.cutoff The cutoff for adjusted pvalues for labelling (default=NULL). Not recommended if many significant rows. label.row.indices Indices of rows to be labelled (default=NULL) label.colour Colour of labels (default="black") legend.labs A character vector for theThe legend label names (default=c("Not Significant", "FC>fc.cutoff", "Padjfc.cutoff& PadjLets look at the ALS patients carrying the C9ORF72 data set

toptable <- read.table("https://gist.githubusercontent.com/stephenturner/806e31fce55a8b7175af/raw/1a507c4c3f9f1baaa3a69187223ff3d3050628d4/results.txt", sep="", header=T)
rownames(toptable) = toptable$Gene

bio_volcano(toptable, fc.col="log2FoldChange", label.row.indices=1:10, main="ALS patients carrying the C9ORF72", add.lines=TRUE)

Lets look at the leukemia data set

BiocManager::install("leukemiasEset", version = "3.8")

library(leukemiasEset)
library(limma)

data(leukemiasEset)
ourData <- leukemiasEset[, leukemiasEset$LeukemiaType %in% c("ALL", "NoL")]
ourData$LeukemiaType <- factor(ourData$LeukemiaType)

design <- model.matrix(~ ourData$LeukemiaType)
fit <- lmFit(ourData, design)
fit <- eBayes(fit)
toptable <- topTable(fit)
toptable$pvalue = toptable$P.Value

bio_volcano(toptable, fc.col="logFC", label.row.indices=1:10, main="leukemia", add.lines=FALSE)

This function colours data according to whether it is below or above a defined plane. The plane is plotted as a line and data can be output as either a line or markers/points.

Argument x x column name in data y y column name in data df.data Data frame containing x and y columns x.plane column name for the x axis in df.plane y.plane column name for the y axis in df.plane df.plane Date frame modelling the plane stepwise logical whether to plot the cutoff plane as stepwise or smoothed colours colour vector for higher, lower and plane values (default=c("green", "red", "grey) respectively) inc.equal logical whethere points on the line should be counted as above (dafault=TRUE) labels label for the markers (default=c("above", "below")) type type of plot for data (options include point (dafault), line, stepwise)

Lets look at some beaver temperature data.

data(beavers)
df.plane = beaver1
df.data = beaver2
df.plane$temp <- df.plane$temp + 0.5
bio_bires(x="time", y="temp", df.data, x.plane="time", y.plane="temp", df.plane)

This function splits expression data into customisable modules and averages over catagories in a given variable.

Argument exp data frame containing the expression data mod.list A list of modules. Each element contains the list of genes for a modules. The gene names must match the rownames in the exp dataframe. meta Dataframe where each column contains an annotation/tract for samples in the heatmap. The order of samples in meta must match that of exp. cluster.rows The method to use for clustering of rows cols Chacter vector, or named vector to fix the order, defining the colours of each mean.var group. main Title of heatmap show.names Show row names. Logical. mean.subjects Logical to determine whether to add a row for the mean value for all subjects in a group split.var Character defining the meta column to average (mean) over.

In this example we will look at two Li modules and a custom one I made up.

exp = rld.syn
meta <- rld.metadata.syn

bio_mods(exp=exp, mod.list=mod_list, meta=meta, mean.var = "Pathotype", cluster.rows = FALSE)

This function plots a line graph comparing two treatment groups over time.

Argument df Data frame containing the data for both groups group.col Column name which corresponding to the group of values in df y.col Column name corresponding to the y-axis values in df x.col Column name corresponding to the x-axis values in df id.col Column name which corresponds to subject ID main Title of pot cols Character vector for colours of lines p.col Colour of p-value text

Lets look at kidney disease survival in this example from the survival package.

library(survival)
data(kidney)

df <- kidney
df <- df[df$time < 150, ]

bio_treatmentGroups(df, group.col="status", y.col="frail", x.col="time")

This function switches gene (or snp) ids using biomart

Argument IDs list of the Ids you want to convert IDFrom What format these IDs are in (default Ensembl) IDTo What format you want the IDs converted to (default gene names) mart The biomart to use. Typically, for humans you will want ensembl (default). Alternatives can be found at listEnsembl() dataset you want to use. To see the different datasets available within a biomaRt you can e.g. do: mart = useEnsembl('ENSEMBL_MART_ENSEMBL'), followed by listDatasets(mart). attributes list of variables you want output

For example genes TNF and A1BG:

IDs = c("TNF", "A1BG")
#kable(bio_geneid(IDs, IDFrom='hgnc_symbol', IDTo = 'ensembl_transcript_id'))

This function exports a complex heatmap looking at the expression of different modules which can be customised.

Argument exp Expression Data var Vector classing samples by variables prefix Prefix to Heatmap titles stars whether pvales should be written as numeric or start (default=FALSE) overlay pvalues on fold change Heatmap or besidde (default = TRUE) logp Whether or not to log the pvalues ... Other parameters to pass to Complex Heatmap

exp = rld.syn
meta <- rld.metadata.syn

bio_fc_heatmap(exp=exp, var=meta$Pathotype)

Creates boxplots showing the significance between groups.

Argument data Data frame containing columns x and y x, y x and y variable names for drawing. p.cutoff plot p-value if above p.cutoff threshold. To include all comparisons set as NULL. stars Logical. Whether significance shown as numeric or stars method a character string indicating which method to be used for comparing means.c("t.test", "wilcox.test") star.vals a list of arguments to pass to the function symnum for symbolic number coding of p-values. For example, the dafault is symnum.args <- list(cutpoints = c(0, 0.0001, 0.001, 0.01, 0.05, 1), symbols = c('****', '', '', '', 'ns')). In other words, we use the following convention for symbols indicating statistical significance: ns: p > 0.05; : p <= 0.05; : p <= 0.01; : p <= 0.001; ****: p <= 0.0001

Lets look at the iris data:

bio_boxplots(iris, x="Species", y= "Sepal.Width", p.cutoff = 0.0001)

bio_boxplots(iris, x="Species", y= "Sepal.Width", NULL, stars=TRUE)

Creates a sankey plot with heatmaps showing how individuals progress over time.

Argument samp.orders A list of named vectors for sample order at each timepoint. Vector names must correspond to matchable ids.

library(lme4)
library(ComplexHeatmap)
data(sleepstudy)
data = sleepstudy
data = data[data$Days %in% c(0, 1, 2), ]
data = data[data$Days !=1 | data$Subject != "308", ]
time.col="Days"
exp.cols = "Reaction"
sub.col = "Subject"

row.order=list()
for(i in unique(data[, time.col])){
  hm = Heatmap(data[data[, time.col] == i, exp.cols])
  row.order[[paste('timepoint', as.character(i))]] = setNames(unlist(row_order(hm)), data[data[, time.col] == i, sub.col])
}

bio_sankey(samp.order=row.order)

Converts strings to appropriate format for titles according to Chicago style.

Argument titles A character vector of phrases to be converted to titles exeption.words A character vector of words with case to be forced (for example abbreviations and roman numerals) replace.chars A named list of characters to replace in title. This works in order of appearance. E.g. c("\."=" ") replaces fullstops with spaces.

shakespeare_plays =c("all's well that ends well", "as you like it","comedy of errors","love's labour's lost", "measure for measure", "merchant of venice","merry wives of windsor","midsummer night's dream","much ado about nothing","taming of the shrew", "tempest", "twelfth night","two gentlemen of verona", "winter's tale", "henry iv, part i","henry iv, part ii", "henry v", "henry vi, part i","henry vi, part ii", "henry vi, part iii", "henry viii","king john", "pericles","richard ii", "richard iii", "antony and cleopatra","coriolanus","cymbeline", "hamlet","julius caesar", "king lear", "macbeth (the scottish play)", "othello", "romeo and juliet","timon of athens", "titus andronicus", "troilus and cressida")
exception.words= c("I", "II", "III", "IV", "V", "VI") # Pass in exceptions

bio_capitalize(as.character(shakespeare_plays[grepl("henry", shakespeare_plays)]), exception.words)

Calculates the module expression from a list of genes.

Argument exp data frame containing the expression data mod.list A list of modules. Each element contains the list of genes for a modules. The gene names must match the rownames in the exp dataframe.

bio_mods(exp=exp, mod.list=mod_list)

KatrionaGoldmann/BioOutputs documentation built on May 21, 2022, 1:24 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

KatrionaGoldmann/BioOutputs
Visualisation of Biological Data

README.md
In KatrionaGoldmann/BioOutputs: Visualisation of Biological Data

BioOutputs

Gallery

Required packages

Install Package

bio_corr

bio_frequency

bio_volcano

bio_bires

bio_mods

bio_treatmentGroups

bio_geneid

bio_fc_heatmap

bio_boxplots

bio_sankey

bio_capitalize

bio_modscores

R Package Documentation

Browse R Packages

We want your feedback!

KatrionaGoldmann/BioOutputs Visualisation of Biological Data

README.md In KatrionaGoldmann/BioOutputs: Visualisation of Biological Data

BioOutputs

Gallery

Required packages

Install Package

bio_corr

bio_frequency

bio_volcano

bio_bires

bio_mods

bio_treatmentGroups

bio_geneid

bio_fc_heatmap

bio_boxplots

bio_sankey

bio_capitalize

bio_modscores

R Package Documentation

Browse R Packages

We want your feedback!

KatrionaGoldmann/BioOutputs
Visualisation of Biological Data

README.md
In KatrionaGoldmann/BioOutputs: Visualisation of Biological Data