remove_covariate: Removing Covariate Effect form Expression Data
In vladpetyuk/vp.misc: Miscellaneous functions used by Vlad Petyuk for proteomics data analysis

Description Usage Arguments Functions Note See Also Examples

The main purpose of this function is to remove batch effect from the data. Batch can be associated with different days of sample processing (as factor) or with run order (continuous). Can also be used to remove any unwanted effects from the data.

remove_covariate(x, cov_name)

correct_batch_effect(
  m,
  batch_name,
  least_count_threshold = 2,
  BPPARAM = bpparam(),
  ...
)

remove_batch_effect(x, batch_name, ref_level = NULL, subset_by = c(NULL, NULL))

correct_batch_effect_empiricalBayesLM(
  x,
  removed_cov_name,
  retained_cov_name = NULL,
  least_proportion_threshold = 0.5,
  ...
)

`x`	MSnSet or ExpressionSet object
`cov_name`	covariate name. Must be in pData(x). At this point it can be only one name.
`batch_name`	same thing as covariate name. Using "batch" instead of "covariate" to keep it consistent with 'ComBat'. Must be in pData(x). At this point it can be only one name.
`least_count_threshold`	minimum number of feature observations required per batch. The default values is 2, the minimum 'ComBat' can handle safely.
`BPPARAM`	BiocParallelParam for parallel operation. Default is bpparam(). Use bpparam("SerialParam") if you want to restrict it to only one thread.
`...`	other arguments for `ComBat`
`ref_level`	In case a certain factor level should be reference and kept at zero bias. Default is NULL, i.e. none.
`subset_by`	vector of two strings from varLabels(x). First is the variable name for subsetting the data. Second is the variable value to retain.
`removed_cov_name`	covariate name to be removed. Must be in pData(x).
`retained_cov_name`	covariate name to be included in the model but not removed. Must be in pData(x).
`least_proportion_threshold`	minimum proportion of feature observations required for the whole dataset. The default value is 50%.

correct_batch_effect: wrapper around sva::ComBat
remove_batch_effect: A flexible batch correction function
correct_batch_effect_empiricalBayesLM: An empirical Bayesian approach to batch correction with discrete or continuous covariates

The algorithm essentially uses an LM. The reason for re-inventing the wheel is presense of missing values in proteomics datasets more then usual.

ComBat empiricalBayesLM

# example 1
set.seed(1)
means <- rep(c(1,2,3), each=3)
nrows <- 5
e <- matrix(rep(means, nrows), ncol=length(means), byrow=T)
e <- e + 
    matrix(rnorm(Reduce(`*`, dim(e)), sd=0.3), ncol=length(means), byrow=T)

# add missing values in increasing frequency
extreme <- 10 # controls how quickly increases propotion of NAs
# 1 - means it will reach 100% by the end
# 2 - means only 50% will be missing by the last row
# N - is 1/N-th
freqs <- (1:nrow(e)-1)/(extreme*(nrow(e)-1))
mis <- t(sapply(freqs, rbinom, n=ncol(e), size = 1))

mis[mis == 1] <- NA
e[5,8:9] <- NA
e[4,4:6] <- NA
e <- e + mis
image(e) 
library("ggplot2"); library("reshape2")
ggplot(melt(e), aes(x=Var1, y=Var2, fill=value)) + geom_raster()

# generating factors
facs <- gl(length(unique(means)),length(means)/length(unique(means)))
# alternative is correction for continuous variable
cova <- seq_along(means)

library("Biobase")
m <- ExpressionSet(e)
pData(m)$pesky <- facs
pData(m)$runorder <- cova
m2 <- remove_covariate(m, "pesky")
m3 <- remove_covariate(m, "runorder")

image(exprs(m))
image(exprs(m2))
image(exprs(m3))

# Example 2 (real-world)
data(cptac_oca)
# let's test for iTRAQ_Batch effect
res <- eset_lm(oca.set, "y ~ iTRAQ_Batch", "y ~ 1")
# not too strong, but there
hist(res$p.value, 50)
image_msnset(oca.set, facetBy="iTRAQ_Batch")
oca.fixed <- remove_covariate(oca.set, "iTRAQ_Batch")
res <- eset_lm(oca.fixed, "y ~ iTRAQ_Batch", "y ~ 1")
hist(res$p.value, 50)
image_msnset(oca.fixed, facetBy="iTRAQ_Batch")

# Example for correct_batch_effect
data("cptac_oca") # oca.set object
plot_pca_v3(oca.set, phenotype = 'Batch')
oca.set.2 <- correct_batch_effect(oca.set, batch_name = "Batch")
plot_pca_v3(oca.set.2, phenotype = 'Batch')

data("cptac_oca") # oca.set object
plot_pca_v3(oca.set, phenotype = 'Batch')
oca.set.2 <- remove_batch_effect(oca.set,
                                 batch_name = "Batch", ref_level="X14",
                                 subset_by=c("tumor_stage","IIIC"))
plot_pca_v3(oca.set.2, phenotype = 'Batch')

# Example for correct_batch_effect_empiricalBayesLM
data("cptac_oca") # oca.set object
plot_pca_v3(oca.set, phenotype = 'Batch')
oca.set.2 <- correct_batch_effect_empiricalBayesLM(oca.set,
                                 removed_cov_name = "Batch",
                                 retained_cov_name = "tumor_stage")
plot_pca_v3(oca.set.2, phenotype = 'Batch')

vladpetyuk/vp.misc documentation built on June 25, 2021, 6:35 a.m.

vladpetyuk/vp.misc index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

vladpetyuk/vp.misc
Miscellaneous functions used by Vlad Petyuk for proteomics data analysis

remove_covariate: Removing Covariate Effect form Expression Data
In vladpetyuk/vp.misc: Miscellaneous functions used by Vlad Petyuk for proteomics data analysis

Description

Usage

Arguments

Functions

Note

See Also

Examples

Related to remove_covariate in vladpetyuk/vp.misc...

R Package Documentation

Browse R Packages

We want your feedback!

vladpetyuk/vp.misc Miscellaneous functions used by Vlad Petyuk for proteomics data analysis

remove_covariate: Removing Covariate Effect form Expression Data In vladpetyuk/vp.misc: Miscellaneous functions used by Vlad Petyuk for proteomics data analysis

Description

Usage

Arguments

Functions

Note

See Also

Examples

Related to remove_covariate in vladpetyuk/vp.misc...

R Package Documentation

Browse R Packages

We want your feedback!

vladpetyuk/vp.misc
Miscellaneous functions used by Vlad Petyuk for proteomics data analysis

remove_covariate: Removing Covariate Effect form Expression Data
In vladpetyuk/vp.misc: Miscellaneous functions used by Vlad Petyuk for proteomics data analysis