calc.confounding.level: Level of confounding calculation


Description

Calculate the level of confounding between handling effects and the sample group of interest for a dataset. First, principal component analysis is applied to the non-biological subset of the data. The first five principal components are then each used to build a simple linear regression model to predict the sample group. The highest adjusted R-squared among these models is returned as the level of confounding.
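The procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's source: the function name sketch.confounding.level and the n.pc argument are hypothetical, the sample group is assumed to have two levels coded as 0/1, one regression is assumed to be fitted per principal component, and k_pc is returned here as the index of the best component, which may differ from what the package returns.

```r
# Hypothetical sketch of the described procedure; not the package implementation.
sketch.confounding.level <- function(data, group.id, nbe.genes, n.pc = 5) {
  # Keep only the probes flagged as non-biological (rows are probes).
  nbe.data <- data[nbe.genes, , drop = FALSE]

  # PCA treats rows as observations, so transpose to put samples in rows.
  pca <- prcomp(t(nbe.data), center = TRUE, scale. = FALSE)
  n.pc <- min(n.pc, ncol(pca$x))

  # Assumption: a two-level sample group, coded as 0/1 for the regression.
  y <- as.numeric(factor(group.id)) - 1

  # One simple linear regression per principal component;
  # record the adjusted R-squared of each fit.
  adj.r2 <- sapply(seq_len(n.pc), function(k) {
    summary(lm(y ~ pca$x[, k]))$adj.r.squared
  })

  # Assumption: k_pc is reported as the index of the best component.
  list(locc = max(adj.r2), k_pc = which.max(adj.r2))
}

# Simulated data: 200 probes by 40 samples, two groups of 20 samples.
set.seed(1)
n.probe <- 200
group.id <- rep(c("E", "V"), each = 20)
data <- matrix(rnorm(n.probe * length(group.id)), nrow = n.probe)
nbe.genes <- rep(c(TRUE, FALSE), length.out = n.probe)

# Shift the non-biological probes in group "V" to mimic handling effects
# that are strongly confounded with the sample group.
data[nbe.genes, group.id == "V"] <- data[nbe.genes, group.id == "V"] + 3

res <- sketch.confounding.level(data, group.id, nbe.genes)
```

In this simulation the shift is aligned with the sample group, so a value of res$locc near 1 would indicate strong confounding, while values near 0 would indicate little or none.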

Usage

calc.confounding.level(data, group.id, nbe.genes)

Arguments

data

microarray dataset. It must have rows as probes and columns as samples.

group.id

a vector of sample-group labels for each sample of the dataset.

nbe.genes

a vector of non-biological genes used to filter the dataset. Non-biological genes are indicated as TRUE, otherwise as FALSE. The vector must have an equal length to the number of probes in the dataset.

Value

a list of two elements:

locc

the level of confounding

k_pc

the most correlated principal component of the non-biological genes in the dataset with the sample group

References

Leek J., Scharpf R., Bravo H., et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet 11:733-9, 2010.

Examples

## Not run: 
biological.effect <- estimate.biological.effect(uhdata = uhdata.pl)
handling.effect <- estimate.handling.effect(uhdata = uhdata.pl,
                             nuhdata = nuhdata.pl)

ctrl.genes <- unique(rownames(uhdata.pl))[grep("NC", unique(rownames(uhdata.pl)))]

biological.effect.nc <- biological.effect[!rownames(biological.effect)
  %in% ctrl.genes, ]
handling.effect.nc <- handling.effect[!rownames(handling.effect) %in% ctrl.genes, ]

group.id <- substr(colnames(biological.effect.nc), 7, 7)

biological.effect.train.ind <- colnames(biological.effect.nc)[c(
  sample(which(group.id == "E"), size = 64),
  sample(which(group.id == "V"), size = 64))]
handling.effect.train.ind <- colnames(handling.effect.nc)[c(1:64, 129:192)]

# randomly create a Boolean vector for nbe.genes (for illustration only)
nbe.genes <- sample(c(TRUE, FALSE), size = nrow(biological.effect.nc), replace = TRUE)

calc.confounding.level(data = biological.effect.nc[, biological.effect.train.ind],
                       group.id = substr(biological.effect.train.ind, 7, 7),
                       nbe.genes = nbe.genes)

## End(Not run)

LXQin/precision documentation built on May 11, 2019, 6:24 p.m.