remove_covariate: Removing Covariate Effect form Expression Data

Description Usage Arguments Functions Note See Also Examples

View source: R/remove_covariate.R

Description

The main purpose of this function is to remove batch effect from the data. Batch can be associated with different days of sample processing (as factor) or with run order (continuous). Can also be used to remove any unwanted effects from the data.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
remove_covariate(x, cov_name)

correct_batch_effect(
  m,
  batch_name,
  least_count_threshold = 2,
  BPPARAM = bpparam(),
  ...
)

remove_batch_effect(x, batch_name, ref_level = NULL, subset_by = c(NULL, NULL))

correct_batch_effect_empiricalBayesLM(
  x,
  removed_cov_name,
  retained_cov_name = NULL,
  least_proportion_threshold = 0.5,
  ...
)

Arguments

x

MSnSet or ExpressionSet object

cov_name

covariate name. Must be in pData(x). At this point it can be only one name.

batch_name

same thing as covariate name. Using "batch" instead of "covariate" to keep it consistent with 'ComBat'. Must be in pData(x). At this point it can be only one name.

least_count_threshold

minimum number of feature observations required per batch. The default values is 2, the minimum 'ComBat' can handle safely.

BPPARAM

BiocParallelParam for parallel operation. Default is bpparam(). Use bpparam("SerialParam") if you want to restrict it to only one thread.

...

other arguments for ComBat

ref_level

In case a certain factor level should be reference and kept at zero bias. Default is NULL, i.e. none.

subset_by

vector of two strings from varLabels(x). First is the variable name for subsetting the data. Second is the variable value to retain.

removed_cov_name

covariate name to be removed. Must be in pData(x).

retained_cov_name

covariate name to be included in the model but not removed. Must be in pData(x).

least_proportion_threshold

minimum proportion of feature observations required for the whole dataset. The default value is 50%.

Functions

Note

The algorithm essentially uses an LM. The reason for re-inventing the wheel is presense of missing values in proteomics datasets more then usual.

See Also

ComBat empiricalBayesLM

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
# example 1
set.seed(1)
means <- rep(c(1,2,3), each=3)
nrows <- 5
e <- matrix(rep(means, nrows), ncol=length(means), byrow=T)
e <- e + 
    matrix(rnorm(Reduce(`*`, dim(e)), sd=0.3), ncol=length(means), byrow=T)

# add missing values in increasing frequency
extreme <- 10 # controls how quickly increases propotion of NAs
# 1 - means it will reach 100% by the end
# 2 - means only 50% will be missing by the last row
# N - is 1/N-th
freqs <- (1:nrow(e)-1)/(extreme*(nrow(e)-1))
mis <- t(sapply(freqs, rbinom, n=ncol(e), size = 1))

mis[mis == 1] <- NA
e[5,8:9] <- NA
e[4,4:6] <- NA
e <- e + mis
image(e) 
library("ggplot2"); library("reshape2")
ggplot(melt(e), aes(x=Var1, y=Var2, fill=value)) + geom_raster()

# generating factors
facs <- gl(length(unique(means)),length(means)/length(unique(means)))
# alternative is correction for continuous variable
cova <- seq_along(means)

library("Biobase")
m <- ExpressionSet(e)
pData(m)$pesky <- facs
pData(m)$runorder <- cova
m2 <- remove_covariate(m, "pesky")
m3 <- remove_covariate(m, "runorder")

image(exprs(m))
image(exprs(m2))
image(exprs(m3))

# Example 2 (real-world)
data(cptac_oca)
# let's test for iTRAQ_Batch effect
res <- eset_lm(oca.set, "y ~ iTRAQ_Batch", "y ~ 1")
# not too strong, but there
hist(res$p.value, 50)
image_msnset(oca.set, facetBy="iTRAQ_Batch")
oca.fixed <- remove_covariate(oca.set, "iTRAQ_Batch")
res <- eset_lm(oca.fixed, "y ~ iTRAQ_Batch", "y ~ 1")
hist(res$p.value, 50)
image_msnset(oca.fixed, facetBy="iTRAQ_Batch")

# Example for correct_batch_effect
data("cptac_oca") # oca.set object
plot_pca_v3(oca.set, phenotype = 'Batch')
oca.set.2 <- correct_batch_effect(oca.set, batch_name = "Batch")
plot_pca_v3(oca.set.2, phenotype = 'Batch')

data("cptac_oca") # oca.set object
plot_pca_v3(oca.set, phenotype = 'Batch')
oca.set.2 <- remove_batch_effect(oca.set,
                                 batch_name = "Batch", ref_level="X14",
                                 subset_by=c("tumor_stage","IIIC"))
plot_pca_v3(oca.set.2, phenotype = 'Batch')

# Example for correct_batch_effect_empiricalBayesLM
data("cptac_oca") # oca.set object
plot_pca_v3(oca.set, phenotype = 'Batch')
oca.set.2 <- correct_batch_effect_empiricalBayesLM(oca.set,
                                 removed_cov_name = "Batch",
                                 retained_cov_name = "tumor_stage")
plot_pca_v3(oca.set.2, phenotype = 'Batch')

vladpetyuk/vp.misc documentation built on June 25, 2021, 6:35 a.m.