cleaningY: Regress out covariates

Description Usage Arguments Value Author(s) Examples

View source: R/cleaningY.R

Description

Regress out covariates such as surrogate variables or principal components.

Usage

1
cleaningY(y, mod, P)

Arguments

y

A matrix such as the outcome matrix from sva or a gene expression matrix.

mod

A full rank model matrix.

P

The number of SVs or PCs to protect based on the column order. For example, 'P=2' would keep the intercept term and a case vs diagnosis term in a model that is ~ Dx + more covariates.

Value

An object of the same type as 'y' with the SVs/PCs regressed out.

Author(s)

Rafael Irizarry, Leonardo Collado-Torres (examples)

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
## Define a model generating function for 30 'samples'
set.seed(20190827)
model_fun <- function(x) {
    ## Baseline + a group effect (2 groups) + a second covariate effect
    rnorm(30) +
    c(rnorm(15, mean = 3), rnorm(15, mean = 1)) +
    c(rnorm(5, sd = 0.5), rnorm(5, sd = 0.2, mean = 0.5),
        rnorm(5, sd = 0.2, mean = 0.9))
}

## Generate the data for 20 'genes'
y <- t(sapply(1:20, model_fun))

## Define the phenotype data for these 30 'samples'
pheno <- data.frame(
    group = rep(c('A', 'B'), each = 15),
    batch = rep(1:3, each = 5)
)

## Define a full model
mod <- with(pheno, model.matrix(~ group + batch))

## Check the raw data for gene 1
boxplot(y[1, ] ~ pheno$group, ylab = 'Gene 1 Raw Expr')

## Now regress out the batch covariate from the gene expression matrix
y_clean_p2 <- cleaningY(y, mod, P = 2)

## Check the cleaned data for gene 1 (with P = 2)
boxplot(y_clean_p2[1, ] ~ pheno$group, ylab = 'Gene 1 Clean Expr (P = 2)')

## Or regress out the group and batch effects
y_clean_p3 <- cleaningY(y, mod, P = 1)

## Check the cleaned data for gene 1 (with P = 3)
boxplot(y_clean_p3[1, ] ~ pheno$group, ylab = 'Gene 1 Clean Expr (P = 3)')




## The function also supports NAs observations as detailed below

## Make one observation 0, clean the data
y[1, 1] <- 0
y_clean_p2_0 <- cleaningY(y, mod, P = 2)
## then NA and clean again
y[1, 1] <- NA
y_clean_p2_NA <- cleaningY(y, mod, P = 2)

## Compare the results
corner(y_clean_p2_0)
corner(y_clean_p2_NA)

## They are identical except for that NA in [1, 1]
table(y_clean_p2_0 -  y_clean_p2_NA, useNA = 'ifany')

## Compared to the original y, there are differences since we lost
## one observation which affects all of the first row of the cleaned Y
y_clean_p2[1, ] - y_clean_p2_NA[1, ]
all(y_clean_p2[-1, ] - y_clean_p2_NA[-1, ] == 0)

blackthornrx/rbnctools documentation built on Feb. 6, 2021, 12:27 a.m.