knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(LRTools)

Created by Lionel Rohner

Current package version: 0.1.0.0000 (beta)

Contact: lionel.rohner@gmail.com

Introduction

This is a small collection of the most useful functions I wrote during my master project. The package consists of functions that are necessary for the bioinformatics pipeline that I used to analyze Illumina HM450 and EPIC data. The package also includes other useful functions that can be helpful when analyzing DNA methylation data from these platforms.

Installation & Dependecies

To install this package use : devtools::install_github("LionelRohner/LRTools")
Following packages are required to use all functions of this package: matrixStats, pwr, ChAMP, ChAMPdata, akmedoids, ggplot2 If there are errors during the installation of ChAMP and ChAMPdata, try to install these packages first using :

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("ChAMP")
BiocManager::install("ChAMPdata")

Functions

Beta_To_M

Conversion of beta-values to M-values.

Description:

This function performs a logit transform ($M=log2(beta)-log2(beta-1)$) on beta-values as single values or in vector, matrix, or data.frame form. M-values are not defined for beta-values equal to 0 or 1. A warning message will be displayed in case of presence of 0 and 1 in the matrix.

Arguments

Return: ####

Examples:####

vecBeta <- runif(100,0,1)
vecM <- Beta_To_M(vecBeta)
head(vecM)

champ.load_extended

Modified champ.load function from the \emph{ChAMP} package.

Description:

This is a modification of the function "champ.load" from the \emph{ChAMP} package, which allows to use the Noob background correction. The original function from \emph{ChAMP} loads data from IDAT files to calculate intensity and produce quality control images. Furthermore, a new argument has been added that allows to load a specific sample sheet using a regular expression. This avoids the errors of the normal loading function when multiple csv files are found in the working directory. Only the new arguments are presented here. The other arguments can be found in the \emph{ChAMP} manual here: https://rdrr.io/bioc/ChAMP/man/champ.load.html. The modification are minor, see line 94 and 104 to 111, everything else is from the \emph{ChAMP} package created by Yuan Tian \link{cre,aut}, Tiffany Morris \link{ctb}, Lee Stirling \link{ctb}, Andrew Feber \link{ctb}, Andrew Teschendorff \link{ctb}, Ankur Chakravarthy \link{ctb}. \emph{ChAMP} is licensed under GPL-3.

Arguments

Return: ####

Examples:####

myLoad_450K <- champ.load_extended(method = "minfi",
                                 arraytype = "450K",
                                 sampleSheet.csv ="^someSampleSheet.csv$",
                                 preproc = "Noob",
                                 dyeMethod = "single")

checkBMIQ

Plot the beta distribution of Infinium I and II probes before and after BMIQ-correction.

Description:

This function can be used to evaluate the beta-value distribution of Infinium I and II probes before and after probe bias correction with BMIQ. Although implemented to check BMIQ performance, this function can be used to evaluate Infinium I and II probe distribution at any step of the pipeline and is also not limited to BMIQ-corrected beta-value matrices. By default, the function represents the mean density of Infinium I and II probes across all samples. It may be advisable to check for updates to \emph{ChAMP} and \emph{ChAMPdata} as new EPIC manifests are released from time to time.

Arguments

Return: ####

Examples:####

checkBMIQ(betaBefore = beta,
                    betaAfter = beta_BMIQcorrected,
                    array = "450K")

cohensD_Paired

Calculate Cohen's D effect size for paired data.

Description:

This function calculates Cohen's D for paired data. The formula was taken from the \emph{rstatix} package.

Arguments

Return: ####

Examples:####

before <- rnorm(1000, 10, 2)
after <- rnorm(1000,20, 4)
D <- before - after
meanD <- mean(D)
sigmaD <- sd(D)
cohensD_Paired(meanD = meanD,
               sigmaD = sigmaD)

cohensD

Calculate standard Cohen's D effect size.

Description:

This function calculates Cohen's D, an effect size measure. If sample sizes are not equal use the unbiased modifier.

Arguments

Return: ####

Examples:####

GroupA <- rnorm(1000,10, 2)
GroupB <- rnorm(1000,20, 4)

meanGroupA <- mean(GroupA)
meanGroupB <- mean(GroupB)

sigmaGroupA <- sd(GroupA)
sigmaGroupB <- sd(GroupB)

cohensD(meanA = meanGroupA,
        meanB = meanGroupB,
        sdA = sigmaGroupA,
        sdB = sigmaGroupB,
        nA = 1000, nB = 1000,
        unbiased = TRUE,
        verbose = TRUE)

exVarPlot

Visualize explained variance of PC-1 and PC-2 of a matrix contaning only the most variable rows.

Description:

To be honest, I am not entirely sure how informative this function is. I used it to see how much the explained variance of the PCs varies in the lower tail in order to choose a suitable number of the most variable CpG-probes. This number can be quite arbitrary, and if this number is in a region of the distribution that fluctuates a lot, it may be wiser to reconsider the number of most variable rows. Explained variance is calculated as the ratio of eigenvalues of PC-1 or PC-2 divided by the sum of all eigenvalues as proposed here: https://stats.stackexchange.com/questions/254592/calculating-pca-variance-explained/254598.

Arguments

Return: ####

Examples:####

A <- data.frame(rnorm(1000,0.2,0.01),runif(1000,0.2,0.75),rnorm(1000,0.75,0.05),
                          rnorm(1000,0.2,0.01),runif(1000,0.2,0.6),rnorm(1000,0.6,0.05))

rownames(A) <- paste("cg",sample(10000000:99999999,1000,replace = FALSE), sep ="")
colnames(A) <- rep(paste("Sample_",1:6,sep=""))

A <- as.matrix(A)

X <- exVarPlot(mat=A,
               lowerLim = 10,
               upperLim = 1000)
head(X)

NamesDeltaBetaCut

Extraction of CpG-probes IDs or indices based on delta beta-values.

Description:

The purpose of this function is to find the row names or indices of the CpG probe that have a certain absolute delta beta-value difference between two groups of choice. The function is restricted to comparison of two groups.

Arguments

Return: ####

Examples:####

# create some random matrix
A <- data.frame(runif(1000,0.2,1),runif(1000,0.3,1),runif(1000,0.5,0.99),
                runif(1000,0.1,0.7),runif(1000,0.2,0.6),runif(1000,0,0.6))

rownames(A) <- paste("cg",sample(10000000:99999999,1000,replace = FALSE), sep ="")
colnames(A) <- rep(paste("Sample_",1:6,sep=""))

pd <- cbind.data.frame(colnames(A),c(rep("Group_1",3),rep("Group_2",3)))
colnames(pd) <- c("Sample_Names","Groups")

inclusion <- NamesDeltaBetaCut(pd = pd,
                               feature = pd$Groups,
                               beta = A,
                               group1_Name = "Group_1",
                               group2_Name = "Group_2",
                               deltaBetaThreshold = 0.2,
                               returnIndeces = FALSE,
                               sampleNames = "Sample_Names")

# Plot results
A_filtered <- A[rownames(A) %in% inclusion,]
Group1 <- paste("^",pd[which(pd$Groups == "Group_1"), "Sample_Names"],"$",
                collapse = "|", sep = "")
Group2 <- paste("^",pd[which(pd$Groups == "Group_2"), "Sample_Names"],"$",
                collapse = "|", sep = "")

meanG1 <- rowMeans(A[,grep(Group1, colnames(A))])
meanG2 <- rowMeans(A[,grep(Group2, colnames(A))])

hist(meanG1-meanG2, xlab = "delta beta-values",
     main = "All (grey) vs. filtered delta beta-values (red)")

meanG1_filtered <- rowMeans(A_filtered[,grep(Group1, colnames(A_filtered))])
meanG2_filtered <- rowMeans(A_filtered[,grep(Group2, colnames(A_filtered))])

hist(meanG1_filtered-meanG2_filtered, col = rgb(0.8,0.1,0.1,0.3),add = TRUE)

powerCalc

Power and sample size estimation.

Description:

The purpose of this function is to evaluate the power and estimated sample size across all CpG probes present in a beta-value or M-value matrix. The calculation of power and sample size estimation are based on the t-test functions of the \emph{pwr} package and can only be performed between features that have two levels, e.g., primary PanNET versus metastasis. PowerCalc works for normal two-group designs or paired designs. The core functions used in powerCalc are from the \emph{pwr} package created by Stephane Champely \link{aut}, Claus Ekstrom \link{ctb}, Peter Dalgaard \link{ctb}, Jeffrey Gill \link{ctb}, Stephan Weibelzahl \link{ctb}, Aditya Anandkumar \link{ctb}, Clay Ford \link{ctb}, Robert Volcic \link{ctb}, and Helios De Rosario \link{cre}. \emph{pwr} is licensed under GPL (>= 3).

Arguments

Return: ####

Examples:####

A <- data.frame(runif(1000,0.5,1),runif(1000,0.4,0.8),runif(1000,0.6,0.99),
                runif(1000,0.1,0.5),runif(1000,0.2,0.6),runif(1000,0,0.4))

colnames(A) <- rep(paste("Sample_",1:6,sep=""))
pd <- data.frame(c(rep("Group_1",3),rep("Group_2",3)))
colnames(pd) <- "Groups"

A <- as.matrix(A)

PC <- powerCalc(betaMatrix = A,
                type = "unpaired",
                pdGroups = pd$Groups,
                nameGroups = c("Group_1","Group_2"),
                M = TRUE,
                cutOff = 0.1)
head(PC)

powerCalcPlot

Plotting PowerCalc Results.

Description:

Plotting function for powerCalc sample estimation. Generates a plot of the estimated sample size as a function of the CpG probes that have a power of 0.8.

Arguments

Return: ####

Examples:####

# Standard Example

A <- data.frame(runif(1000,0.5,1),runif(1000,0.4,0.8),runif(1000,0.6,0.99),
                runif(1000,0.1,0.5),runif(1000,0.2,0.6),runif(1000,0,0.4))

colnames(A) <- rep(paste("Sample_",1:6,sep=""))
pdA <- data.frame(c(rep("Group_1",3),rep("Group_2",3)))
colnames(pdA) <- "Groups"

A <- as.matrix(A)

powerCalcA <-powerCalc(betaMatrix = A,
                       type = "unpaired",
                       pdGroups = pdA$Groups,
                       nameGroups = c("Group_1","Group_2"),
                       cutOff = 0.1)

B <- data.frame(runif(1000,0.8,1),runif(1000,0.8,0.9),runif(1000,0.7,0.9),
                runif(1000,0.05,0.3),runif(1000,0.2,0.3),runif(1000,0,0.3))

colnames(B) <- rep(paste("Sample_",1:6,sep=""))
pdB <- data.frame(c(rep("Group_1",3),rep("Group_2",3)))
colnames(pdB) <- "Groups"

B <- as.matrix(B)

powerCalcB <-powerCalc(betaMatrix = B,
                       type = "unpaired",
                       pdGroups = pdB$Groups,
                       nameGroups = c("Group_1","Group_2"),
                       cutOff = 0.1)

powerCalcPlot(powerCalcA, powerCalcB)

# Example with returnCleanPlotObj and explicitNames

g <- powerCalcPlot(powerCalcA, powerCalcB,
                   returnCleanPlotObj = TRUE,
                   explicitNames = c("Some","Thing"))
library(ggplot2)
g + geom_line(aes(color = Group)) + labs(title = "my Title") # + etc as you like

topN_variableX

Find top most variable row entries in a microarray data matrix.

Description:

This function can be used to extract the rows that have the greatest variance for further PCA analysis.

Arguments

Return: ####

Examples:####

A <- data.frame(rnorm(1000,0.2,0.01),runif(1000,0.2,0.75),rnorm(1000,0.75,0.05),
                          rnorm(1000,0.2,0.01),runif(1000,0.2,0.6),rnorm(1000,0.6,0.05))

rownames(A) <- paste("cg",sample(10000000:99999999,1000,replace = FALSE), sep ="")
colnames(A) <- rep(paste("Sample_",1:6,sep=""))

A <- as.matrix(A)

top1K <- topN_variableX(mat = A,
                        topN = 1000,
                        plot = TRUE)

head(top1K)


LionelRohner/LRTools documentation built on Dec. 17, 2021, 1:10 a.m.